Ranged Image Imports, Including Produced Data


The steps and workflow tips below are recommended by the Nextpoint Data Strategy Team for importing produced data or data being migrated from another platform. You can also watch our Advanced Import training video here for additional information.

All steps assume a ranged image import. Such an import typically consists of single-page TIFF/JPG image files named by their Bates numbers, together with document-level text files, any included natives, and a load file containing the document breaks and metadata.

WHAT DOES A RANGED IMAGE IMPORT LOOK LIKE?

The data set will contain up to 3 folders (at the very least, this type of data set must contain an "Images" folder):

IMAGES - this folder contains the document pages, each a one-page image file

TEXT - this folder contains the OCR text information, and can be either one text file per page, or one text file per document

NATIVES - this folder contains any native files that accompany each document

Image files will be in the .tif or .jpg format, and the files will be named by their Bates numbers. If included, the OCR text and native files will also be named by the corresponding Bates numbers. Here is what a common single-page image data set looks like:

[Screenshots: 1.png, 2.png, 3.png]
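The folder layout described above can be checked before upload with a short script. This is only a sketch under stated assumptions, not a Nextpoint tool: the function name check_production_layout is hypothetical, and accepting the .tiff and .jpeg spellings alongside .tif and .jpg is an assumption. TEXT and NATIVES are optional, so only IMAGES is checked.

```python
from pathlib import Path

# Assumption: .tiff and .jpeg are accepted spellings of the .tif/.jpg formats.
IMAGE_EXTENSIONS = {".tif", ".tiff", ".jpg", ".jpeg"}


def check_production_layout(root):
    """Return a list of human-readable problems found under `root`.

    Hypothetical helper: it only checks the one hard requirement stated
    above (an IMAGES folder holding single-page image files).
    """
    root = Path(root)
    problems = []
    images = root / "IMAGES"
    if not images.is_dir():
        problems.append("missing required IMAGES folder")
        return problems

    # Productions often nest pages in subfolders, so search recursively.
    pages = sorted(p for p in images.rglob("*") if p.is_file())
    if not pages:
        problems.append("IMAGES folder is empty")
    for page in pages:
        if page.suffix.lower() not in IMAGE_EXTENSIONS:
            problems.append(f"unexpected file type in IMAGES: {page.name}")
    return problems
```

Calling check_production_layout on the production folder returns an empty list when nothing looks wrong.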


 PRODUCED DATA IMPORT APPROACH

    1. Download metadata file and convert from DAT to CSV.
      1. See instructions below in Exhibit A
    2. Open the CSV and review the first two or three columns to confirm whether they contain Bates numbers or a general control ID.
      1. Typically, these are the bates_start/bates_end or beg_doc/end_doc columns

    3. Open the source production folder and navigate to the IMAGES folder.  Check for individual tiffs/jpgs.
      1. If the folder is named something else (e.g. TIFFS), rename it to IMAGES
    4. Open Settings >> Coding >> Fields in your Nextpoint database.
    5. Load File Configuration / Document Boundaries: Check whether the load file has begattach/endattach, production begin/production end, etc.

      1. These beginning and ending numbers will identify which range of single page tiff/jpgs should be pulled to create individual documents (one document per row)
      2. If begattach and endattach fields do not already exist under Settings >> Coding >> Fields, add them as Freeform fields.
    6. Load File Configuration / Tvfileid and Tvfamilyid: Check if every row has a begattach value and populate tvfileid and tvfamilyid columns.  These values are vital in establishing family relationships in emails after import.
      1. Tvfileid = Bates Start and tvfamilyid = Begattach. You can simply copy/paste Bates Start and BegAttach values to new columns and rename accordingly. 

      2. Add tvfileid and tvfamilyid under Settings >> Coding >> Fields.
    7. Load File Configuration / Metadata fields: Review column headers and compare to the Fields in the database.
      1. Set up Fields: For any header in your load file, you will need to have a corresponding Field.

        • If there is already a field set up in the database, use that.
        • If not, create a new Field named the same as your load file headers.
      2. Be aware of default Fields and Document Attributes which exist (and do not need to be set up).
        • See list below in Exhibit B.
        • If a header value in your load file matches a header in Exhibit B, you do not need to set up a corresponding field, but do make sure they match exactly.
      3. Be aware of Protected System Fields!
        • For any of the fields listed below in Exhibit C, you must rename the fields in your load file and set up a corresponding Field.
      4. Text and Native Paths: Nextpoint needs to know which text and native files to grab and line up with their respective document image(s). This is accomplished with the text_file and native_file column headers, which contain the path to and name of the text and native files, respectively.

        • IMPORTANT: These two columns MUST be named text_file and native_file for the import to work correctly
        • Check to make sure the paths are correct. The paths should start from where the load file is going to be saved, likely the parent production folder, at the same level as IMAGES/TEXT/NATIVES. For example, a path would start with TEXT/ or NATIVES/, NOT ./ or /
      5. Specific Field Notes
        • Subject: Refers to an email, so we recommend renaming it to Email Subject
        • Title: Refers to a document, so we recommend renaming it to Efile Title
    8. After all Fields are squared away, it is critical to the import’s success to save your load file as nextpoint_load_file.csv
    9. Check your database SETTINGS for Deduplication
    10. Place load file at the root of your production folder and then upload production folder to Nextpoint File Room:

      [Screenshot: FileRoomImportPlacement.png]
    11. Import your Production Folder
    12. Once Import is complete, email support@nextpoint.com and request “Family Linking on batch: ##”.  This will prompt the team to visually pair parent emails and attachments in the database via the tvfileid and tvfamilyid fields.
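
Steps 6 through 8 above can be sketched in Python for those who would rather script them than edit the CSV by hand. This is a hedged illustration, not a Nextpoint utility: the function name prepare_load_file is hypothetical, and the default column names bates_start and begattach are assumptions that must be changed to match your load file's headers. The sketch copies Bates Start into tvfileid and Begattach into tvfamilyid, flags rows with no begattach value, flags text_file/native_file paths that start with ./ or / or that do not resolve from the production folder, and saves the result as nextpoint_load_file.csv at the root of that folder.

```python
import csv
from pathlib import Path


def prepare_load_file(source_csv, production_root,
                      bates_col="bates_start", family_col="begattach"):
    """Write nextpoint_load_file.csv and return a list of warnings.

    Hypothetical helper; bates_col and family_col are assumed header
    names and should be adjusted to the load file being processed.
    """
    production_root = Path(production_root)
    with open(source_csv, newline="", encoding="utf-8-sig") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        fieldnames = list(reader.fieldnames or [])

    # Add the two family-linking columns if they are not present yet.
    for name in ("tvfileid", "tvfamilyid"):
        if name not in fieldnames:
            fieldnames.append(name)

    warnings = []
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        row["tvfileid"] = row[bates_col]
        row["tvfamilyid"] = row[family_col]
        if not row[family_col]:
            warnings.append(f"line {line_no}: empty {family_col}")

        # Paths must be relative to where the load file is saved.
        for path_col in ("text_file", "native_file"):
            rel = row.get(path_col) or ""
            if not rel:
                continue
            if rel.startswith(("./", "/")):
                warnings.append(f"line {line_no}: {path_col} starts with ./ or /")
            elif not (production_root / rel).is_file():
                warnings.append(f"line {line_no}: {path_col} not found: {rel}")

    out_path = production_root / "nextpoint_load_file.csv"
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return warnings
```

Review the returned warnings before uploading. The sketch does not rename protected system fields or create Fields in the database; those steps still need to be done as described above.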

EXHIBITS

Exhibit A: Converting a DAT Load File to CSV

For a successful import to Nextpoint, load files need to be saved in CSV (comma-separated values) format. DAT files can be converted to CSV manually by replacing the delimiters in a text editor such as TextPad. First, open your .dat file in TextPad.

The values in your original .dat file will likely be separated by the sequence þ□þ: a thorn, the □ character (ASCII 20), and another thorn. Your coding values will look like this:

þBEGPRODþ□þENDPRODþ□þBEGPROD_ATTþ□þENDPROD_ATTþ□þCustodianþ

Using the functions in TextPad, you can do a find-replace (F8) on these characters. Use the following sequence:

  1. Find-replace all " (quote) with "" (two quotes). This escapes any existing quotes and ensures that your line breaks remain consistent.
  2. Find-replace all þ (thorn) with " (quote)
  3. Find-replace all □ (ASCII 20) with , (comma)
  4. All values will now be separated by "," (quote comma quote):

    "BEGPROD","ENDPROD","BEGPROD_ATT","ENDPROD_ATT","Custodian"

  5. Save file as: nextpoint_load_file.csv.
  6. Open the new .csv file in Excel. Your values will be separated into columns (based on the new comma-quote positions).
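
The find-replace sequence above can also be scripted. The following is a minimal Python sketch, not a Nextpoint tool: the function name dat_to_csv and its parameters are hypothetical. It assumes the DAT file uses the thorn qualifier and ASCII 20 delimiter shown above, and that the file's encoding is known; UTF-8 is assumed by default, and some exports use a different encoding such as Latin-1, in which case pass that encoding instead. Python's csv module doubles any embedded quotes when writing, which corresponds to step 1.

```python
import csv

DELIMITER = "\x14"    # ASCII 20, displayed as a box in most editors
QUALIFIER = "\u00fe"  # thorn, wraps every value in the DAT file


def dat_to_csv(dat_path, csv_path="nextpoint_load_file.csv",
               encoding="utf-8-sig"):
    """Convert a thorn/ASCII-20 delimited DAT file to a quoted CSV.

    Hypothetical helper. Returns the number of records written,
    including the header row.
    """
    count = 0
    # newline="" lets the csv module handle line endings itself.
    with open(dat_path, newline="", encoding=encoding) as src, \
         open(csv_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src, delimiter=DELIMITER, quotechar=QUALIFIER)
        # QUOTE_ALL wraps every value in quotes, as in the manual steps.
        writer = csv.writer(dst, quoting=csv.QUOTE_ALL)
        for record in reader:
            if not record:  # skip blank lines
                continue
            writer.writerow(record)
            count += 1
    return count
```

After running dat_to_csv("production.dat"), where production.dat stands in for your own file name, open the resulting nextpoint_load_file.csv in Excel and confirm that the values fall into the expected columns.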

 

Have Questions?

Produced data and migration imports can be difficult. If you need additional training or assistance from the Nextpoint Data Strategy team, please contact us via email at support@nextpoint.com.
