Outlined below are the steps and workflow tips recommended by the Nextpoint Data Strategy Team to manage the import of produced data or data being migrated from another platform. You can also watch our Advanced Import training video here for additional information.
All steps outlined are based on the assumption of a ranged image import. The typical format for such an import is single-page tiff/jpg image files named by their Bates number, together with document-level text files, any included natives, and document breaks/metadata contained in a load file.
See below for a breakdown of what a ranged image import typically looks like.
The data set will contain up to 3 folders (at the very least, this type of data set must contain an "Images" folder):
IMAGES - this folder contains the document pages, each a one-page image file
TEXT - this folder contains the OCR text information, and can be either one text file per page, or one text file per document
NATIVES - this folder contains any native files that accompany each document
Image files will be in .tif or .jpg format and will be named by their Bates numbers. If included, the OCR text and native files will also be named by the corresponding Bates numbers. Here is what a common single-page image data set looks like:
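The layout described above can be sanity-checked with a short script before you start. This is only a minimal sketch: the function name `summarize_production` is hypothetical, and it assumes the folders are named exactly IMAGES, TEXT, and NATIVES and sit directly under the production folder.

```python
from pathlib import Path

# Extensions treated as page images; adjust if your production differs.
IMAGE_EXTENSIONS = {".tif", ".tiff", ".jpg", ".jpeg"}
FOLDERS = ("IMAGES", "TEXT", "NATIVES")


def summarize_production(root):
    """List the files found in each expected folder under `root`.

    Raises FileNotFoundError if the IMAGES folder is missing, since a
    ranged image import needs at least that folder.
    """
    root = Path(root)
    if not (root / "IMAGES").is_dir():
        raise FileNotFoundError(f"No IMAGES folder found under {root}")

    summary = {}
    for folder in FOLDERS:
        path = root / folder
        if path.is_dir():
            # Paths are reported relative to the folder, with forward slashes.
            summary[folder] = sorted(
                p.relative_to(path).as_posix()
                for p in path.rglob("*")
                if p.is_file()
            )
        else:
            summary[folder] = None  # TEXT and NATIVES are optional

    # Anything in IMAGES that is not a tif/jpg deserves a second look.
    summary["non_image_files"] = [
        name
        for name in summary["IMAGES"]
        if Path(name).suffix.lower() not in IMAGE_EXTENSIONS
    ]
    return summary
```

Calling `summarize_production("path/to/production")` returns a dictionary you can inspect for missing folders or stray files before uploading.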
- Download metadata file and convert from DAT to CSV.
- See instructions in "Exhibit A" tab above
- Open the source production folder and navigate to the IMAGES folder. Check for individual tiffs/jpgs.
- Load File Configuration / Document Boundaries: Open your CSV and review the first two or three columns to confirm whether they contain Bates numbers or a general control ID.
- These beginning and ending numbers will identify which range of single page tiff/jpgs should be pulled to create individual documents (one document per row).
- If these columns are named something other than bates_start and bates_end (e.g. bates_begin/end, prod_beg/end, etc.), please rename them to bates_start and bates_end, respectively.
- These beginning and ending attachment numbers are vital to establishing email family relationships via Family Linking as mentioned below in step 11.
- If these fields are named something other than begattach and endattach (e.g. prod_begin/end, production_beg_attach/production_end_attach, etc.), please rename them to begattach and endattach, respectively.
- If begattach and endattach fields do not already exist under Settings >> Coding >> Fields, add them as Freeform fields.
- Set up Fields: For any header in your load file, you will need to have a corresponding Field.
- If there is already a field set up in the database, use that.
- If not, create a new Field named the same as your load file headers.
- Be aware of default Fields and Document Attributes which exist (and do not need to be set up).
- See list in "Exhibit B" tab above.
- If a header value in your load file matches a header in Exhibit B, you do not need to set up a corresponding field but do make sure they match exactly.
- Be aware of Protected System Fields!
- For any of the fields listed in "Exhibit C" tab above, it is necessary to rename the fields in your load file and set up a corresponding Field.
- Text and Native Paths: Nextpoint needs to know which text and native files to grab and line up with their respective document image(s). This is accomplished with the text_file and native_file column headers, which contain the path to and name of the text and native files, respectively.
- IMPORTANT: These two columns MUST be named text_file and native_file for the import to work correctly.
- Check to make sure the paths are correct. The paths should start from where the load file is going to be saved, likely the parent production folder, at the same level as IMAGES/TEXT/NATIVES. For example, a path would start with TEXT/ or NATIVES/, NOT ./ or /
- Specific Field Notes
- Subject: Refers to an email, so we recommend renaming it to Email Subject
- Title: Refers to a document, so we recommend renaming it to Efile Title
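The header renaming and path checks in the steps above can be sketched in a few lines of Python. Everything named here is illustrative rather than part of Nextpoint: `HEADER_ALIASES` is a hypothetical starting list built from the examples in this article, and `normalize_headers` and `find_path_problems` are hypothetical helper names. The sketch assumes header matching can safely ignore case and surrounding spaces.

```python
from pathlib import Path

# Hypothetical alias map, built from the examples in this article.
# Extend it with whatever header names your vendor actually uses.
HEADER_ALIASES = {
    "bates_begin": "bates_start",
    "prod_beg": "bates_start",
    "prod_end": "bates_end",
    "production_beg_attach": "begattach",
    "production_end_attach": "endattach",
    "subject": "Email Subject",
    "title": "Efile Title",
}


def normalize_headers(headers):
    """Rename known aliases; leave every other header as it is (trimmed)."""
    return [
        HEADER_ALIASES.get(header.strip().lower(), header.strip())
        for header in headers
    ]


def find_path_problems(rows, base_dir):
    """Check the text_file and native_file values of each load file row.

    `rows` is an iterable of dicts (for example from csv.DictReader) and
    `base_dir` is the folder where the load file will be saved.
    Returns a list of (row_number, column, value, problem) tuples, where
    row 2 is the first data row (row 1 being the header).
    """
    problems = []
    for row_number, row in enumerate(rows, start=2):
        for column in ("text_file", "native_file"):
            value = (row.get(column) or "").strip()
            if not value:
                continue  # an empty path is allowed, nothing to check
            if value.startswith(("/", "./")):
                problems.append(
                    (row_number, column, value, "path should start with TEXT/ or NATIVES/")
                )
            elif not (Path(base_dir) / value).is_file():
                problems.append((row_number, column, value, "file not found"))
    return problems
```

A typical use would be to read the CSV with `csv.DictReader`, pass its rows to `find_path_problems` together with the production folder, and fix anything reported before starting the import.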
For a successful import to Nextpoint, load files need to be saved in CSV (comma-separated values) format. DAT files can be manually converted to CSV by replacing the delimiters in a text editor such as TextPad. So, first open your .dat file with TextPad.
The values in your original .dat file will likely be separated by the symbols þ□þ: a thorn, the control character displayed here as □ (ASCII 20), and another thorn. So, your coding values look like this:
Using the functions in TextPad, you can do a find-replace (F8) on these characters. Use the following sequence:
- Find-replace all " (quote) with "" (double quote). This will ensure that your line breaks remain consistent.
- Find-replace all þ (thorn) with " (quote)
- Find-replace all □ (ASCII 20) with , (comma)
- This may read as <0x14> if working in Sublime Text as mentioned below
- All values will now be separated by "," (quote comma quote):
- Save file as: nextpoint_load_file.csv.
- Open the new .csv file in Excel. Your values will be separated into columns (based on the new comma-quote positions).
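If you prefer a script to manual find-replace, the same conversion can be sketched with Python's standard `csv` module, which lets you read with þ as the quote character and ASCII 20 as the delimiter. The function name `dat_to_csv` is hypothetical, and the default encoding is an assumption: many DAT files are UTF-8, but some are exported in other encodings such as Windows-1252, in which case pass `encoding="cp1252"`.

```python
import csv


def dat_to_csv(dat_path, csv_path, encoding="utf-8-sig"):
    """Convert a þ/ASCII-20 delimited DAT file to a quoted CSV file.

    Returns the number of rows written, including the header row.
    The "utf-8-sig" default reads UTF-8 with or without a byte order mark.
    """
    rows_written = 0
    with open(dat_path, newline="", encoding=encoding) as source, \
            open(csv_path, "w", newline="", encoding="utf-8") as target:
        # þ (thorn) wraps each value; ASCII 20 (hex 0x14) separates values.
        reader = csv.reader(source, delimiter="\x14", quotechar="\u00fe")
        # QUOTE_ALL wraps every value in quotes and doubles any embedded
        # quote characters, matching the manual steps above.
        writer = csv.writer(target, quoting=csv.QUOTE_ALL)
        for row in reader:
            writer.writerow(row)
            rows_written += 1
    return rows_written
```

For example, `dat_to_csv("production.dat", "nextpoint_load_file.csv")` writes the CSV next to the script. It is still worth opening the result in Excel to confirm the columns line up.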
Troubleshooting: "My find-replace is not working correctly."
Some clients have had trouble with certain characters being inexplicably replaced during this process, which alters the resulting CSV (e.g. "" replacing the character sequence "th"). If you notice anomalies in your resulting CSV, some users have found Sublime Text to be a helpful alternative text editor.
For any of the below-listed fields, you do not need to set up a new Field under SETTINGS > Coding. Instead, if you have a header value in Row 1 of your load file, make sure that the load file value matches the below default fields exactly.
Values can be imported into the following default fields by using matching load file headers:
| Bates Start | Created Date Time |
| Bates End | Modified Date Time |
| Bates Range Start | Last Print Date |
| Bates Range End | File Name |
| Email Subject | File Path |
| Email Author | Root Folder |
| Mailbox Path | Document Date |
| Email Thread Index | Text File |
| Document Title | Native File |
| Document Last Author | |
Note: Custodian/Custodians are visible under Settings > Import > Custodians
Data cannot be imported into any of the below fields because they are generated by the Nextpoint application. If you’d like to map any of the below values into your database, you will need to set up a field with a different name. Commonly replaced fields are listed below, with suggested replacement names in parentheses:
| Filename (use existing File Name) | Exhibit_stamped_as |
| Filepath (use existing File Path) | Expansive_hash |
| Confidentiality (Conf_Status) | Highlight_issues (Annotation_Issues) |
| Confidentiality_Status (Conf_Status) | Highlight_notes (Annotation_Notes) |
| Created_at (use existing Created_Date_Time) | Incoming_wire_id |
| Deposition_names | Original_filename (use existing File Name) |
| Domain (EmailDomain) | Redacted (Isredacted) |
| Email_message_id (MessageID) | Relevancy_Status (Relevancy) |
| Email_Thread (ThreadID) | Updated_at (Timeupdated) |
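To catch protected names before importing, the table above can be turned into a simple lookup. This is a sketch only: `protected_header_conflicts` is a hypothetical helper, the dictionary simply restates the table (with `None` where no replacement is suggested), and comparing names without regard to case is an assumption made to be on the safe side.

```python
# Protected system fields from the table above, mapped to the suggested
# replacement name (None where the table suggests none).
PROTECTED_FIELDS = {
    "Filename": "File Name",
    "Filepath": "File Path",
    "Confidentiality": "Conf_Status",
    "Confidentiality_Status": "Conf_Status",
    "Created_at": "Created_Date_Time",
    "Deposition_names": None,
    "Domain": "EmailDomain",
    "Email_message_id": "MessageID",
    "Email_Thread": "ThreadID",
    "Exhibit_stamped_as": None,
    "Expansive_hash": None,
    "Highlight_issues": "Annotation_Issues",
    "Highlight_notes": "Annotation_Notes",
    "Incoming_wire_id": None,
    "Original_filename": "File Name",
    "Redacted": "Isredacted",
    "Relevancy_Status": "Relevancy",
    "Updated_at": "Timeupdated",
}


def protected_header_conflicts(headers):
    """Return {header: suggested_replacement} for headers that collide
    with a protected system field (compared case-insensitively)."""
    lookup = {name.lower(): suggestion for name, suggestion in PROTECTED_FIELDS.items()}
    conflicts = {}
    for header in headers:
        key = header.strip().lower()
        if key in lookup:
            conflicts[header] = lookup[key]
    return conflicts
```

Running it on the first row of your load file, for example `protected_header_conflicts(["bates_start", "Domain"])`, reports which headers need to be renamed and, where the table offers one, a suggested new name.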