Load Files


What is a load file? 

A load file tells the Cloud processors which documents to look for and where they are located, and it allows all of their metadata to be imported.

The content for your load file will come from either an existing load file which you will need to convert, or by manually generating a directory listing of your files.

Modify your load file in Excel, then "Save As" a .csv file. Your load file must be named nextpoint_load_file.csv in order for Nextpoint to recognize it as the load file for your import.

The two main elements of a load file are the column headers and the coding values. The column headers correspond with document coding fields in your Nextpoint case, which need to be populated with metadata. The coding values on the load file contain the information that will be mapped (sent) to those fields for each document. Each row of a load file = 1 document in Nextpoint. 

* Spaces are not recommended in load file headers, because differences in character encoding can make the text inconsistent. Where a coding field in Nextpoint has a space, use an underscore in its place in the corresponding column header.
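The underscore rule can be applied mechanically when you have many coding fields. The following is a minimal sketch, not a Nextpoint tool: the function name to_column_header is hypothetical, and the sketch keeps the original letter case because this article does not say whether case matters in headers.

```python
def to_column_header(field_name: str) -> str:
    """Replace the spaces in a coding field name with underscores."""
    # split() with no arguments also collapses repeated spaces, so
    # "Document  Type" does not turn into "Document__Type".
    return "_".join(field_name.split())


fields = ["Document Type", "Document Date", "Author"]
headers = [to_column_header(name) for name in fields]
print(headers)  # ['Document_Type', 'Document_Date', 'Author']
```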

Load files can be very simple or quite complex, depending on the complexity of your data set. To start constructing your load file, first identify what type of data you are dealing with:

  • ESI and self-contained files - emails/PST/mailbox containers, PDFs and other file types where each file = 1 document
  • Data processed to images, including previously produced data - single-page image files (TIFF or JPG) that get assembled into documents in Nextpoint, based on boundaries defined in a load file

* see processes below

 


ESI and self-contained files

1. Which documents need to be imported, and where are they?

Nextpoint needs to know what files it is looking for, and where to find them. This is accomplished by using an image_file column header. The information in the image_file column contains the path (location) of the file, and the file name itself.  

2. What title should Nextpoint give the document?

Nextpoint needs to know what to call the document, or what value to place in the subject/title field. This is accomplished by using a title column header.

So in order to import a set of multi-page files to Nextpoint, your load file would need to look like the example below. Row 1 contains the column headers. Rows 2-7 contain the coding values that correspond to each column header for the 6 documents being imported. These are the minimum required load file columns for such a document import:

 

VypWQB1CUT.png

 

In this example, you would save the load file and place it in the same location as the "Evidence" folder in your data set. When you select the folder "Import" as your upload, the path information in the image_file column of your load file tells the Nextpoint application where to start digging for each source file:

 3Kx8zVtocu.png
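If you would rather generate this kind of load file than type it in Excel, a short script can write the same two columns. This is only a sketch under stated assumptions: the file paths and titles are made up, the function name build_load_file is hypothetical, and the forward-slash path style is an assumption rather than something this article specifies.

```python
import csv
import io


def build_load_file(documents):
    """Return CSV text: one header row, then one row per document."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["image_file", "title"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(documents)
    return buffer.getvalue()


# Hypothetical documents; replace with your own paths and titles.
documents = [
    {"image_file": "Evidence/contract.pdf", "title": "Contract"},
    {"image_file": "Evidence/Emails/mailbox.pst", "title": "Mailbox"},
]

csv_text = build_load_file(documents)
print(csv_text)
```

To use the result, save the text as nextpoint_load_file.csv in the same location as the "Evidence" folder.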

 

Beyond the required column headers illustrated above, you can add any other coding information that you would like to be mapped to your documents in Nextpoint. Remember that in order to send additional coding values to your documents during import, the coding fields must first be created in your Nextpoint database. The exceptions are the standard fields that are already built in to Nextpoint:

  • Bates 
  • Shortcut
  • Author
  • Document Type
  • Title
  • Document Date 

* all this means is that you don't need to add these fields to your Nextpoint database - they still need to be part of your load file

Here is an example of a load file with additional coding, and the way that the first processed document (from Row 2) would look after the import:

 MCcfZWuvHf.png

 

Psv8imk6M2.png

 

3. Loading native files with a placeholder: You can also use a load file to import native files with a placeholder image. Just use the image_file column (for your placeholder image file), then include a native_file column for the native/original file. The native file will be uploaded to Nextpoint with the placeholder file as its image page:

 

uMyX0ILDHG.png
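The placeholder arrangement described above amounts to two columns per row. Here is a hedged sketch of how such rows could be written. The paths are invented for illustration, the function name placeholder_load_file is hypothetical, and any further columns your import needs (such as title) are left out for brevity.

```python
import csv
import io


def placeholder_load_file(pairs):
    """Build CSV text from (placeholder_image, native_file) pairs."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["image_file", "native_file"])
    for placeholder, native in pairs:
        # image_file holds the placeholder page; native_file holds the original.
        writer.writerow([placeholder, native])
    return buffer.getvalue()


# Hypothetical file names.
pairs = [
    ("Placeholders/budget_placeholder.pdf", "Natives/budget.xlsx"),
    ("Placeholders/model_placeholder.pdf", "Natives/model.xlsx"),
]
placeholder_csv = placeholder_load_file(pairs)
print(placeholder_csv)
```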

 


Data processed to images, including previously produced data

First, the data set will contain up to 3 folders (at the very least, this type of data set must contain an IMAGES folder):

IMAGES - this folder contains the document pages, each a one-page image file

TEXT - this folder contains the OCR text information, and can be either one text file per page, or one text file per document

NATIVES - this folder contains any native files that accompany each document

Image file pages will be in the .tif or .jpg format, and the files will be named by their Bates numbers. If included, the OCR text and native files will also be named by the corresponding Bates numbers. Here is what a common single-page image data set looks like:

 

1.png

 

2.png

 

3.png

 

This data structure is constant among most imports of this type, so Nextpoint has built in some automation to allow for more efficient utilization of load files:

1. Which documents need to be imported, and where are they?

When single-page image files are located in a folder called IMAGES and are named by their Bates values, you only need to use the column headers bates_start and bates_end. The combination of these two headers serves two functions:

  • To tell Nextpoint which page files to use to assemble each document image (the page boundaries for each document). Nextpoint will automatically find the image files based on the Bates numbers, since they are in an IMAGES folder.
  • To assign the appropriate Bates values to each document as it is processed into Nextpoint.
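To make the idea of page boundaries concrete, the sketch below expands a bates_start and bates_end pair into the page names that fall between them. It illustrates the concept only and is not Nextpoint's actual implementation. It assumes Bates values consist of a fixed prefix followed by a zero-padded number, and the name expand_bates_range is hypothetical.

```python
import re

# A Bates value is treated as <prefix><digits>, for example ABC000001.
BATES_PATTERN = re.compile(r"^(.*?)(\d+)$")


def expand_bates_range(bates_start, bates_end):
    """List every Bates value from bates_start to bates_end, inclusive."""
    start = BATES_PATTERN.match(bates_start)
    end = BATES_PATTERN.match(bates_end)
    if start is None or end is None:
        raise ValueError("Bates values must end in digits")
    if start.group(1) != end.group(1):
        raise ValueError("bates_start and bates_end have different prefixes")
    first, last = int(start.group(2)), int(end.group(2))
    if last < first:
        raise ValueError("bates_end comes before bates_start")
    width = len(start.group(2))  # keep the zero padding of the first page
    return [f"{start.group(1)}{n:0{width}d}" for n in range(first, last + 1)]


print(expand_bates_range("ABC000001", "ABC000003"))
# ['ABC000001', 'ABC000002', 'ABC000003']
```

Under these assumptions, a row with bates_start ABC000001 and bates_end ABC000003 describes one three-page document.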

 

2. What title should Nextpoint give the document?

Nextpoint needs to know what to call the document, or what value to place in the subject/title field. This is accomplished by using a title column header.

 

3. Which OCR text file corresponds to each document?

Nextpoint needs to know what text file to grab, and apply to each document image as the OCR text. This is accomplished by using a text_file column header, which contains the path to and name of the text file.

 

4. Which native file corresponds to each document (where applicable)?

Nextpoint needs to know what native file to grab, and apply to each document image that requires an accompanying native. This is accomplished by using a native_file column header, which contains the path to and name of the native file.

 

5. Make sure that no OCR text (.txt) files are used as image page files:

Occasionally, text files are mixed into the same folder with image files. Since they are all named by Bates number, that means the only difference between an OCR text file and an image page is the file extension. Therefore you need to tell Nextpoint to use only files with a certain extension as document image pages. This is easily accomplished with a column header called image_extension, containing the value tif|jpg in every row.
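The effect of the image_extension value can be pictured as a simple filter over file names. The sketch below is an illustration, not Nextpoint's code: the function name is_image_page is hypothetical, and treating extensions as case-insensitive is an assumption.

```python
def is_image_page(filename, image_extension="tif|jpg"):
    """Return True if the file's extension is listed in image_extension."""
    allowed = {ext.lower() for ext in image_extension.split("|")}
    _, dot, ext = filename.rpartition(".")
    # A name with no dot has no extension, so it cannot be an image page.
    return bool(dot) and ext.lower() in allowed


# Hypothetical folder contents where text and image files are mixed.
files = ["ABC0001.tif", "ABC0001.txt", "ABC0002.jpg"]
pages = [name for name in files if is_image_page(name)]
print(pages)  # ['ABC0001.tif', 'ABC0002.jpg']
```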

* For other important Nextpoint load file column headers, see "Important Load File Column Headers" below.

So in order to import a set of single-page image files to Nextpoint, your load file would need to look like the example below. Row 1 contains the column headers. Rows 2-5 contain the coding values that correspond to each column header for the 4 documents being imported. These are the minimum required load file columns for such a document import:

 

4.png

 

In this example, you would save the load file and place it in the same location as the IMAGES, NATIVES, and TEXT folders in your data set. When you select the folder "Image_Import" as your upload, the information in the bates_start column of your load file tells the Nextpoint application how to find each document image page (IMAGES), native file (NATIVES), and OCR text file (TEXT):

 

5.png 
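A load file for single-page images can also be generated from a list of document boundaries. The sketch below uses the column headers discussed in this section. The Bates values, titles, and paths are invented, the function name image_load_file is hypothetical, and it assumes one text file per document named after the first Bates number, which may not match your data set.

```python
import csv
import io

HEADERS = ["bates_start", "bates_end", "title",
           "text_file", "native_file", "image_extension"]


def image_load_file(documents):
    """Return CSV text with one row per document."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=HEADERS, lineterminator="\n")
    writer.writeheader()
    for doc in documents:
        writer.writerow({
            "bates_start": doc["bates_start"],
            "bates_end": doc["bates_end"],
            "title": doc["title"],
            # Assumption: the text file is named by the first Bates number.
            "text_file": f"TEXT/{doc['bates_start']}.txt",
            # Left blank when the document has no accompanying native.
            "native_file": doc.get("native_file", ""),
            "image_extension": "tif|jpg",
        })
    return buffer.getvalue()


# Hypothetical documents.
documents = [
    {"bates_start": "ABC0001", "bates_end": "ABC0003", "title": "Letter"},
    {"bates_start": "ABC0004", "bates_end": "ABC0004", "title": "Budget",
     "native_file": "NATIVES/ABC0004.xls"},
]
image_csv = image_load_file(documents)
print(image_csv)
```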

 


Important Load File Column Headers

IMAGE FIELDS

image_dir - The folder where the image files are located

image_extension - The extension that the image files have, especially useful if text files are in the same folder as image files

image_file - The name of the image file, for multipage documents

image_range_start - The name of the first image file for the document. Used when several single page images (TIFFs & JPGs usually) comprise a document.
* not necessary if pages are named by Bates number, and you have bates_start column

image_range_end - The name of the last image file for the document. Used when several single page images (TIFFs & JPGs usually) comprise a document.
* not necessary if pages are named by Bates number, and you have bates_end column

NATIVE FILE FIELDS

native_dir - The folder where the native files are located

native_file - The name of the native file

OCR FIELDS

text - This column header is used when the OCR text is contained in the load file itself, rather than in separate text files

text_dir - The folder where the text files are located

text_file - The name of the text file, where a single text file contains text for all pages of the document

text_range_start - The name of the first text file for the document. Used when a single page text file exists for each page of a document

text_range_end - The name of the last text file for the document. Used when a single page text file exists for each page of a document

OTHER FIELDS

bates_start - The Bates number of the first page of the document. Also specifies the image range start when pages are named by Bates number

bates_end - The Bates number of the last page of the document. Also specifies the image range end when pages are named by Bates number

 

Compiling a List of Files to Create a Load File

To create a load file, you will need a list of files to include, and if applicable, the folders that they reside in. If you don't have this information in a list, you can easily create one. 

Windows


1. Right click on the Root Folder, select "Open Command Window Here"
(If your PC does not have this option, download the plugin here.)
In Windows 7, you may need to hold the Shift key to get this option.

2. A command window will appear. On the Command Line, type the following:  dir /b /s /a-d >>list.txt

3. This will create a text file called "list.txt" in the root folder. It contains the full path of each file within the root folder and its subfolders (the /a-d switch leaves out the folder entries themselves), and looks like this:

list.text-1.jpg

4. Open "list.txt" in a text editor and do a Find and Replace to remove the extra pathing information, and you'll have a list of all the files in the folder, and their subfolder pathing, if necessary.

5. Use this list to create a load file for your documents. Click here for information on how to create a simple load file.

Mac

1. Open the utility, Terminal.

2. Type "find " (be sure to include the space), then drag the folder you want a list from into the Terminal window and press Return.

3. Terminal will give you a list of all the file names and pathing information for the folder, that you can copy and paste into a text editor to get in the format needed for your load file.
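On either platform, a short script can produce the same listing with the extra pathing already removed. This is a sketch rather than an official Nextpoint utility: the function name list_files is hypothetical, and writing paths with forward slashes is an assumption you may need to change for your import.

```python
from pathlib import Path


def list_files(root):
    """Return every file under root, as paths relative to root."""
    root = Path(root)
    return sorted(
        path.relative_to(root).as_posix()  # drops the leading folder path
        for path in root.rglob("*")
        if path.is_file()  # skip folder entries, like dir /a-d
    )


# Prints one relative path per line for the current folder.
for relative_path in list_files("."):
    print(relative_path)
```

Run it from the root folder of your data set, then paste the output into the image_file column of your load file.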
