Topics Below:
- What is a load file?
- ESI and self-contained files
- Data processed to images, including previously produced data
- Important load file column headers
- Compiling a list of files to create a load file
What is a load file?
A load file tells the Cloud processors which documents to look for and where they are located, and it allows all of their metadata to be imported.
The content for your load file will come either from an existing load file, which you will need to convert, or from a directory listing of your files that you generate manually.
Modify your load file in Excel, then "Save As" a .csv file. Your load file must be named nextpoint_load_file.csv in order for Nextpoint to recognize it as the load file for your import.
The two main elements of a load file are the column headers and the coding values. The column headers correspond to the document coding fields in your Nextpoint case that need to be populated with metadata. The coding values in the load file contain the information that will be mapped (sent) to those fields for each document. Each row below the header row represents 1 document in Nextpoint.
* Spaces are not recommended in load file headers, since differences in character encoding can make the text inconsistent. So, where a coding field in Nextpoint has a space, the corresponding column header must have an underscore in its place.
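As a small illustration of the underscore rule, the following Python sketch turns a coding field name into a column header. The function name is hypothetical, and capitalization is left untouched because the note above only addresses spaces.

```python
def to_column_header(field_name: str) -> str:
    """Replace each run of whitespace in a coding field name with one underscore."""
    # str.split() with no arguments also trims leading and trailing whitespace.
    return "_".join(field_name.split())


print(to_column_header("Document Date"))
```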
Load files can be very simple or quite complex, depending on the complexity of your data set. To start constructing your load file, first identify what type of data you are dealing with:
- ESI and self-contained files - emails/PST/mailbox containers, PDFs and other file types where each file = 1 document
- Data processed to images, including previously produced data - single-page image files (TIFF or JPG) that get assembled into documents in Nextpoint, based on boundaries defined in a load file
* see processes below
ESI and self-contained files
1. Which documents need to be imported, and where are they?
Nextpoint needs to know what files it is looking for, and where to find them. This is accomplished by using an image_file column header. The information in the image_file column contains the path (location) of the file, and the file name itself.
2. What title should Nextpoint give the document?
Nextpoint needs to know what to call the document, or what value to place in the subject/title field. This is accomplished by using a title column header.
So, in order to import a set of multi-page files to Nextpoint, your load file would need to look like this: Row 1 holds the column headers, and Rows 2-7 hold the coding values that correspond to each column header for the 6 documents being imported. These are the minimum required column headers for such a document import:
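A load file of this shape can also be generated with a short script instead of Excel. The Python sketch below is only an illustration: the file names and titles are hypothetical placeholders, and the forward-slash path separator is an assumption. The image_file and title headers and the 6 document rows come from the description above.

```python
import csv
import io

# Hypothetical documents: (path relative to the load file, title).
# File names, titles, and the "/" separator are assumptions for illustration.
DOCUMENTS = [
    ("Evidence/Contracts/master_agreement.pdf", "Master Agreement"),
    ("Evidence/Contracts/amendment_1.pdf", "Amendment 1"),
    ("Evidence/Email/custodian_a.pst", "Custodian A Mailbox"),
    ("Evidence/Email/custodian_b.pst", "Custodian B Mailbox"),
    ("Evidence/Reports/q1_report.docx", "Q1 Report"),
    ("Evidence/Reports/q2_report.docx", "Q2 Report"),
]


def build_load_file(documents):
    """Return CSV text with a header row and one row per document."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["image_file", "title"])  # Row 1: column headers
    writer.writerows(documents)               # Rows 2-7: coding values
    return buffer.getvalue()


print(build_load_file(DOCUMENTS))
```

Saving the returned text as nextpoint_load_file.csv would give a file in the layout described above.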
In this example, you would save the load file and place it in the same location as the "Evidence" folder in your data set. When you select the folder "Import" as your upload, the path information in the image_file column of your load file tells the Nextpoint application where to start digging for each source file:
Beyond the required column headers illustrated above, you can add any other coding information that you would like to be mapped to your documents in Nextpoint. Remember that in order to send additional coding values to your documents in Nextpoint during import, the coding fields need to be created first in your Nextpoint database. The exceptions are the standard fields that are already built in to Nextpoint:
- Bates
- Shortcut
- Author
- Document Type
- Title
- Document Date
* all this means is that you don't need to add these fields to your Nextpoint database - they still need to be part of your load file
Here is an example of a load file with additional coding, and the way that the first processed document (from Row 2) would look after the import:
3. Loading native files with a placeholder: You can also use a load file to import native files with a placeholder image. Just use the image_file column (for your placeholder image file), then include a native_file column for the native/original file. The native file will be uploaded to Nextpoint with the placeholder file as its image page:
Data processed to images, including previously produced data
First, the data set will contain up to 3 folders (at the very least, this type of data set must contain an IMAGES folder):
IMAGES - this folder contains the document pages, each a one-page image file
TEXT - this folder contains the OCR text information, and can be either one text file per page, or one text file per document
NATIVES - this folder contains any native files that accompany each document
Image file pages will be in the .tif or .jpg format, and the files will be named by their Bates numbers. If included, the OCR text and native files will also be named by the corresponding Bates numbers. Here is what a common single-page image data set looks like:
This data structure is constant among most imports of this type, so Nextpoint has built in some automation to allow for more efficient utilization of load files:
1. Which documents need to be imported, and where are they?
When single-page image files are located in a folder called IMAGES and are named by their Bates values, you only need to use the column headers bates_start and bates_end. The combination of these two headers serves two functions:
- To tell Nextpoint which page files to use to assemble each document image (the page boundaries for each document). Nextpoint will automatically find the image files based on the Bates numbers, since they are in an IMAGES folder.
- To assign the appropriate Bates values to each document as it is processed into Nextpoint.
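To preview which page files a bates_start and bates_end pair covers, the range can be expanded with a few lines of Python. This is a sketch under stated assumptions, not Nextpoint's own logic: it assumes each Bates value is a prefix followed by a zero-padded number, and that both values share the same prefix. The function name and sample values are hypothetical.

```python
import re

# A Bates value is assumed to be any prefix followed by trailing digits.
BATES_PATTERN = re.compile(r"^(.*?)(\d+)$")


def pages_in_range(bates_start: str, bates_end: str) -> list[str]:
    """List every Bates number from bates_start to bates_end, inclusive."""
    start = BATES_PATTERN.match(bates_start)
    end = BATES_PATTERN.match(bates_end)
    if not start or not end or start.group(1) != end.group(1):
        raise ValueError("Bates values must share a prefix and end in digits")
    prefix = start.group(1)
    width = len(start.group(2))  # keep the zero padding of the first value
    first, last = int(start.group(2)), int(end.group(2))
    if last < first:
        raise ValueError("bates_end comes before bates_start")
    return [f"{prefix}{number:0{width}d}" for number in range(first, last + 1)]


# Each value corresponds to one page file in IMAGES, such as ABC000001.tif.
print(pages_in_range("ABC000001", "ABC000003"))
```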
2. What title should Nextpoint give the document?
Nextpoint needs to know what to call the document, or what value to place in the subject/title field. This is accomplished by using a title column header.
3. Which OCR text file corresponds to each document?
Nextpoint needs to know what text file to grab, and apply to each document image as the OCR text. This is accomplished by using a text_file column header, which contains the path to and name of the text file.
4. Which native file corresponds to each document (where applicable)?
Nextpoint needs to know what native file to grab, and apply to each document image that requires an accompanying native. This is accomplished by using a native_file column header, which contains the path to and name of the native file.
5. Make sure that no OCR text (.txt) files are used as image page files:
Occasionally, text files are mixed into the same folder with image files. Since they are all named by Bates number, that means the only difference between an OCR text file and an image page is the file extension. Therefore you need to tell Nextpoint to use only files with a certain extension as document image pages. This is easily accomplished with a column header called image_extension, containing the value tif|jpg in every row.
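The effect of the tif|jpg value can be pictured as a filter over the files in the folder. The Python sketch below assumes the value is a pipe-separated list of extensions and that matching ignores case. How Nextpoint parses the value internally is not stated here, and the function name and file names are hypothetical.

```python
import os


def image_pages(file_names, image_extension="tif|jpg"):
    """Keep only the files whose extension appears in the pipe-separated list."""
    allowed = {extension.lower() for extension in image_extension.split("|")}
    pages = []
    for name in file_names:
        # splitext returns (".tif" style) extensions; drop the leading dot.
        extension = os.path.splitext(name)[1].lstrip(".").lower()
        if extension in allowed:
            pages.append(name)
    return pages


mixed_folder = ["ABC000001.tif", "ABC000001.txt", "ABC000002.jpg", "ABC000002.txt"]
print(image_pages(mixed_folder))
```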
* See other important Nextpoint load file column headers below.
So, in order to import a set of single-page image files to Nextpoint, your load file would need to look like this: Row 1 holds the column headers, and Rows 2-5 hold the coding values that correspond to each column header for the 4 documents being imported. These are the minimum required column headers for such a document import:
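A load file along these lines could be generated with the Python sketch below. The column set is assembled from the five questions above and may be more than your import needs. The Bates values, titles, and file names are hypothetical, and the sketch assumes one text file per document, named by the first Bates number.

```python
import csv
import io

HEADERS = ["bates_start", "bates_end", "title",
           "text_file", "native_file", "image_extension"]

# Hypothetical documents: (first Bates, last Bates, title, native file or "").
DOCUMENTS = [
    ("ABC000001", "ABC000003", "Letter", ""),
    ("ABC000004", "ABC000004", "Spreadsheet", "NATIVES/ABC000004.xlsx"),
    ("ABC000005", "ABC000009", "Report", ""),
    ("ABC000010", "ABC000012", "Memo", ""),
]


def build_image_load_file(documents):
    """Return CSV text with a header row and one row per document."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=HEADERS, lineterminator="\n")
    writer.writeheader()
    for bates_start, bates_end, title, native_file in documents:
        writer.writerow({
            "bates_start": bates_start,
            "bates_end": bates_end,
            "title": title,
            # Assumption: one text file per document, named by first Bates number.
            "text_file": f"TEXT/{bates_start}.txt",
            "native_file": native_file,  # left blank when there is no native
            "image_extension": "tif|jpg",
        })
    return buffer.getvalue()


print(build_image_load_file(DOCUMENTS))
```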
In this example, you would save the load file and place it in the same location as the IMAGES, NATIVES, and TEXT folders in your data set. When you select the folder "Image_Import" as your upload, the information in the bates_start column of your load file tells the Nextpoint application how to find each document image page (IMAGES), native file (NATIVES), and OCR text file (TEXT):
Important Load File Column Headers
IMAGE FIELDS | |
image_dir | The folder where the image files are located |
image_extension | The extension that the image files have, especially useful if text files are in the same folder as image files |
image_file | The name of the image file, for multipage documents |
image_range_start | The name of the first image file for the document. Used when several single page images (TIFFs & JPGs usually) comprise a document. * not necessary if pages are named by Bates number, and you have bates_start column |
image_range_end | The name of the last image file for the document. Used when several single page images (TIFFs & JPGs usually) comprise a document. * not necessary if pages are named by Bates number, and you have bates_end column |
NATIVE FILE FIELDS | |
native_dir | The folder where the native files are located |
native_file | The name of the native file |
OCR FIELDS | |
text | This column header is used when the OCR text is contained in the load file itself, rather than in separate text files |
text_dir | The folder where the text files are located |
text_file | The name of the text file, where a single text file contains text for all pages of the document |
text_range_start | The name of the first text file for the document. Used when a single page text file exists for each page of a document |
text_range_end | The name of the last text file for the document. Used when a single page text file exists for each page of a document |
OTHER FIELDS | |
bates_start | The Bates number of the first page of the document. Also specifies the image range start when pages are named by Bates number |
bates_end | The Bates number of the last page of the document. Also specifies the image range end when pages are named by Bates number |
Compiling a List of Files to Create a Load File
To create a load file, you will need a list of files to include, and if applicable, the folders that they reside in. If you don't have this information in a list, you can easily create one.
Windows
1. Right click on the Root Folder, select "Open Command Window Here".
(If your PC does not have this option, download the plugin here.)
In Windows 7, you may need to hold the Shift key to get this option.
2. A command window will appear. On the Command Line, type the following: dir /b /s /a-d >>list.txt
3. This will create a text file called "list.txt" in the root folder that contains the full path of each file within the root folder and its subfolders (the /a-d switch leaves the folders themselves out of the list). It looks like this:
4. Open "list.txt" in a text editor and do a Find and Replace to remove the extra pathing information, and you'll have a list of all the files in the folder, and their subfolder pathing, if necessary.
5. Use this list to create a load file for your documents. Click here for information on how to create a simple load file.
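The Find and Replace in step 4 can also be scripted. The Python sketch below strips a root folder from each line of the listing and skips list.txt in case it appears in its own listing. The function name and the sample paths are hypothetical.

```python
from pathlib import PureWindowsPath


def relative_entries(lines, root):
    """Return each listed file path relative to the root folder."""
    root_path = PureWindowsPath(root)
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        path = PureWindowsPath(line)
        # Skip the listing file itself if it sits directly in the root folder.
        if path.parent == root_path and path.name.lower() == "list.txt":
            continue
        # relative_to raises ValueError if a path is not under the root.
        entries.append(str(path.relative_to(root_path)))
    return entries


sample = [
    r"C:\Cases\Import\Evidence\memo.pdf",
    r"C:\Cases\Import\list.txt",
]
print(relative_entries(sample, r"C:\Cases\Import"))
```

In practice the lines would come from reading list.txt, and the result could be pasted into the image_file column.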
Mac
1. Open the utility, Terminal.
2. Type "find " (be sure to include the space), then drag the folder you want to list into the Terminal window and press Return.
3. Terminal will give you a list of all the file names and pathing information for the folder, which you can copy and paste into a text editor and put in the format needed for your load file.
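If Python is available, the same kind of listing can be produced on either Windows or Mac with one script. This is a minimal sketch, not part of Nextpoint: the function name is hypothetical, and it returns file paths relative to the folder you pass in, so no extra pathing has to be removed afterward.

```python
import os


def list_files(root):
    """Return the path of every file under root, relative to root, sorted."""
    found = []
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            full_path = os.path.join(folder, name)
            found.append(os.path.relpath(full_path, root))
    return sorted(found)


# Lists the files under the current working directory.
for entry in list_files("."):
    print(entry)
```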