Outlined below are the steps and workflow tips recommended by the Nextpoint Data Strategy Team to manage the import of produced data or data being migrated from another platform. You can also watch our Advanced Import training video here for additional information.
All steps outlined are based on the assumption of a ranged image import. The typical format for such an import is single-page tiff/jpg image files named by their Bates number, together with document-level text files, any included natives, and document breaks/metadata contained in a load file.
See below for a breakdown of what a ranged image import typically looks like.
The data set will contain up to 3 folders (at the very least, this type of data set must contain an "Images" folder):
IMAGES - this folder contains the document pages, each a one-page image file
TEXT - this folder contains the OCR text information, and can be either one text file per page, or one text file per document
NATIVES - this folder contains any native files that accompany each document
Image pages will be in .tif or .jpg format, and the files will be named by their Bates numbers. If included, the OCR text and native files will also be named by the Bates start number of the document they represent. Here is what a common single-page image data set looks like:
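If you would like to confirm that a production folder follows this layout before you begin, a short script can do the counting for you. The Python sketch below is illustrative only and is not a Nextpoint tool. The function name `summarize_production` is hypothetical, folder names are matched case-insensitively as an assumption, and the first/last page values assume zero-padded Bates numbers that sort alphabetically.

```python
from pathlib import Path

# Image extensions described in this guide (single-page tiff/jpg files).
IMAGE_EXTENSIONS = {".tif", ".tiff", ".jpg", ".jpeg"}


def summarize_production(root):
    """Count the files in the IMAGES, TEXT, and NATIVES folders under root."""
    root = Path(root)
    # Assumption: match folder names case-insensitively (IMAGES, Images, ...).
    folders = {p.name.upper(): p for p in root.iterdir() if p.is_dir()}
    if "IMAGES" not in folders:
        raise FileNotFoundError(f"No IMAGES folder found in {root}")

    # Image files are named by Bates number, so the file stem is the page number.
    pages = sorted(
        p.stem
        for p in folders["IMAGES"].rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
    summary = {
        "image_pages": len(pages),
        "first_page": pages[0] if pages else None,
        "last_page": pages[-1] if pages else None,
    }
    # TEXT and NATIVES are optional, so a missing folder counts as zero files.
    for name in ("TEXT", "NATIVES"):
        folder = folders.get(name)
        summary[name.lower() + "_files"] = (
            sum(1 for p in folder.rglob("*") if p.is_file()) if folder else 0
        )
    return summary
```

Calling `summarize_production("path/to/production")` returns a dictionary of counts that you can compare against the page and document totals in the producing party's cover letter, if one was provided.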
It is recommended you review these required considerations for produced data imports prior to proceeding with the following steps.
- Download the .DAT metadata load file from your produced data set and convert from a .DAT format to .CSV format.
- This is likely stored in the DATA subdirectory of your source production folder.
- See instructions for converting your .DAT to a .CSV in "Exhibit A" tab above or watch the video linked here.
- Open the source production folder and navigate to the IMAGES folder. Check for individual tiffs/jpgs.
- In some cases, the image files are provided as PDFs named by their Bates start. This is OK.
- Load File Configuration / Document Boundaries: Open the .CSV load file you previously converted (from the .DAT) and review the first two or three columns to confirm whether they contain Bates numbers or a general control ID.
- These beginning and ending numbers identify which range of single-page tiffs/jpgs should be pulled to create individual documents (one document per row).
- If these columns in your load file are named something other than bates_start and bates_end (e.g. bates_begin/end, prod_beg/end, etc.), please rename them to bates_start and bates_end, respectively.
- The beginning and ending attachment numbers are vital to establishing email family relationships via Family Linking, as mentioned below in step 11.
- If these fields in your load file are named something other than begattach and endattach (e.g. prod_begin/end, production_beg_attach/production_end_attach, etc.), please rename them to begattach and endattach, respectively.
- In your Nextpoint database, if there are not already begattach and endattach fields under Settings >> Coding >> Fields, add them as Freeform fields.
- Set up Fields: For any header in your load file, you will need to have a corresponding Field in your Nextpoint database.
- If there is already a field set up in the database, use that.
- See this list of default fields and document attributes which already exist in each Nextpoint database. This information can also be found in the "Exhibit B" tab above.
- If a header value in your load file matches a field from the list in Exhibit B, you do not need to set up a corresponding field in your Nextpoint database. However, do make sure your load file header matches the list in Exhibit B exactly.
- After completing the above, if there are any remaining headers in your load file without a corresponding field in your database, create a new Field named the same as your load file headers.
- Be aware of Protected System Fields! For any of the fields listed in "Exhibit C" tab above, it is necessary to rename the fields in your load file and set up a corresponding Field.
- If there is already a field set up in the database, use that.
- Text and Native Paths: Nextpoint needs to know which text and native files to pull and line up with their respective document image(s). This is accomplished by using the text_file and/or native_file column headers, which contain the path to, and name of, the text and native files, respectively.
- IMPORTANT: These two columns MUST be named text_file and native_file for the import to work correctly.
- Check to make sure the paths are relative to where the load file will be saved. See more information on relative paths here.
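The header renaming and relative-path checks in the steps above can also be scripted for large load files. The following Python sketch is one possible approach and not a Nextpoint utility. The `HEADER_RENAMES` mapping and both function names are hypothetical, and the vendor header names in the mapping are only the examples mentioned in this guide, so extend the mapping to match your own load file. It assumes the CSV is saved as UTF-8 and that paths may use Windows-style backslashes.

```python
import csv
from pathlib import Path

# Hypothetical mapping of vendor header names to the names this guide requires.
HEADER_RENAMES = {
    "bates_begin": "bates_start",
    "prod_beg": "bates_start",
    "prod_end": "bates_end",
    "production_beg_attach": "begattach",
    "production_end_attach": "endattach",
}


def rename_headers(headers, renames=HEADER_RENAMES):
    """Return headers with known vendor names replaced by the required names."""
    return [renames.get(h.strip().lower(), h.strip()) for h in headers]


def check_load_file(csv_path):
    """Return (renamed_headers, missing_paths) for a CSV load file.

    missing_paths lists (row_number, column, value) for each text_file or
    native_file entry that does not resolve relative to the load file's folder.
    """
    csv_path = Path(csv_path)
    base = csv_path.parent  # paths must be relative to where the load file is saved
    missing = []
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        reader = csv.reader(f)
        headers = rename_headers(next(reader))
        # Row 1 is the header row, so data rows start at 2.
        for row_number, row in enumerate(reader, start=2):
            record = dict(zip(headers, row))
            for column in ("text_file", "native_file"):
                value = record.get(column, "").strip()
                if not value:
                    continue
                # Assumption: convert Windows-style backslashes before checking.
                relative = Path(value.replace("\\", "/"))
                if relative.is_absolute() or not (base / relative).is_file():
                    missing.append((row_number, column, value))
    return headers, missing
```

An empty `missing_paths` list suggests that every text and native path resolves from the load file's location. Any entries returned point to the row and column to correct before importing.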
For a successful import to Nextpoint, load files need to be saved in CSV (comma-separated values) format. DAT files can be manually converted to CSV by replacing the delimiters in a text editor such as TextPad. To begin, open your .dat file with TextPad.
The values in your original .dat file will likely be separated by the symbols þ□þ: a thorn, the ASCII 20 control character (often displayed as □), and another thorn. So, your coding values look like this:
Using the functions in TextPad, you can do a find-replace (F8) on these characters. Use the following sequence:
- Find-replace all " (quote) with "" (double quote). This will ensure that your line breaks remain consistent.
- Find-replace all þ (thorn) with " (quote)
- Find-replace all □ (ASCII 20) with , (comma)
- This may read as <0x14> if working in Sublime Text as mentioned below
- This will result in all values being now separated by "," (quote comma quote):
- Save file as: nextpoint_load_file.csv.
- If given the option to select a particular plain text encoding when saving, select Unicode (UTF-8).
- Open the new .csv file in Excel. Your values will be separated into columns (based on the new comma-quote positions).
Troubleshooting: "My find-replace is not working correctly."
Some clients have had trouble in the past with certain characters being inexplicably replaced during this process, which alters the resulting CSV (e.g. "" replacing the character sequence "th"). If you notice anomalies in your resulting CSV, some users have found Sublime Text to be a helpful alternative text editor.
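If a text editor keeps producing unexpected replacements, the same three find-replace steps can be applied with a short script instead. The Python sketch below is an illustration and not a Nextpoint tool. The function names are hypothetical, and the `source_encoding` default is an assumption: if your .dat file was not saved as UTF-8, pass the encoding your vendor used. The output is written as UTF-8, as recommended above.

```python
THORN = "\u00fe"    # þ, which wraps each value in the .dat file
FIELD_SEP = "\x14"  # ASCII 20, which separates values in the .dat file


def dat_to_csv_text(dat_text):
    """Apply the three find-replace steps, in order, to the .dat contents."""
    text = dat_text.replace('"', '""')   # 1. quote -> double quote (done first)
    text = text.replace(THORN, '"')      # 2. thorn -> quote
    text = text.replace(FIELD_SEP, ",")  # 3. ASCII 20 -> comma
    return text


def convert_dat_file(dat_path, csv_path, source_encoding="utf-8-sig"):
    """Read a .dat load file and write the converted CSV as UTF-8."""
    # newline="" keeps the original line endings untouched in both directions.
    with open(dat_path, encoding=source_encoding, newline="") as src:
        converted = dat_to_csv_text(src.read())
    with open(csv_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(converted)
```

For example, `convert_dat_file("load_file.dat", "nextpoint_load_file.csv")` would write the converted file next to the original, where `load_file.dat` stands in for your own file name. The order of the replacements matters: existing quotes are doubled before any thorns become quotes, so quotes inside a value are not mistaken for the new delimiters.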
For any of the below-listed fields, you do not need to set up a new Field under SETTINGS > Coding. Instead, if you have a header value in Row 1 of your load file, make sure that the load file value matches the below default fields exactly.
Load file headers that match the names below import directly into the corresponding default fields:
|Bates Start||Created Date Time|
|Bates End||Modified Date Time|
|Bates Range Start||Last Print Date|
|Bates Range End||File Name|
|Email Subject||File Path|
|Email Author||Root Folder|
|Mailbox Path||Document Date|
|Email Thread Index||Text File|
|Document Title||Native File|
|Document Last Author|
Note: Custodian/Custodians are visible under Settings > Import > Custodians
Data cannot be imported into any of the below fields because they are generated by the Nextpoint application. If you'd like to map any of the below values into your database, you will need to set up a field with a different name. Commonly replaced fields are listed below, with suggested replacement names in parentheses:
|Filename (use existing File Name)||Exhibit_stamped_as|
|Filepath (use existing File Path)||Expansive_hash|
|Confidentiality (Conf_Status)||Highlight_issues (Annotation_Issues)|
|Confidentiality_Status (Conf_Status)||Highlight_notes (Annotation_Notes)|
|Created_at (use existing Created_Date_Time)||Incoming_wire_id|
|Deposition_names||Original_filename (use existing File Name)|
|Domain (EmailDomain)||Redacted (Isredacted)|
|Email_message_id (MessageID)||Relevancy_Status (Relevancy)|
|Email_Thread (ThreadID)||Updated_at (Timeupdated)|
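To spot protected names before importing, you can compare your load file headers against the list above. The Python sketch below is illustrative and not a Nextpoint feature. The `PROTECTED_FIELDS` dictionary is transcribed from the table above, with `None` where the table suggests no replacement. The function name is hypothetical, and comparing headers case-insensitively is an assumption made to be cautious.

```python
# Protected system fields and their suggested replacements from the table above.
# None means the table lists no suggested replacement name.
PROTECTED_FIELDS = {
    "filename": "File Name",
    "filepath": "File Path",
    "confidentiality": "Conf_Status",
    "confidentiality_status": "Conf_Status",
    "created_at": "Created_Date_Time",
    "deposition_names": None,
    "domain": "EmailDomain",
    "email_message_id": "MessageID",
    "email_thread": "ThreadID",
    "exhibit_stamped_as": None,
    "expansive_hash": None,
    "highlight_issues": "Annotation_Issues",
    "highlight_notes": "Annotation_Notes",
    "incoming_wire_id": None,
    "original_filename": "File Name",
    "redacted": "Isredacted",
    "relevancy_status": "Relevancy",
    "updated_at": "Timeupdated",
}


def find_protected_headers(headers):
    """Return {header: suggested replacement or None} for protected names."""
    flagged = {}
    for header in headers:
        # Assumption: ignore case and surrounding spaces when comparing.
        key = header.strip().lower()
        if key in PROTECTED_FIELDS:
            flagged[header] = PROTECTED_FIELDS[key]
    return flagged
```

Any header returned by `find_protected_headers` should be renamed in the load file, and a matching Field should exist in your database, before you start the import.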