
Including OCR with an Upload

Ben Wolf January 29, 2010

***Nextpoint automatically OCRs all of your documents upon upload.

If you'd like to include existing OCR data with your uploads, there are two methods: 1) as a column in a load file, or 2) using a .map file to match field names so that they correspond to the images. Which method you choose depends on the nature of your OCR information. We've laid out a few scenarios and solutions below.

1. Your OCR text is included in the load file

Often an export from a database such as Concordance or Summation will include the OCR text for each document as a field in the load file. In this case, you simply need to add a row in your .map file (see image below). In this example, OCR_Text is the name of the column header in your load file.

map_file_for_ocr_samples1.png
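Before uploading, it can help to confirm that the OCR column is actually populated. The following is a minimal Python sketch, not a Nextpoint tool. It assumes a comma-delimited load file with a header row; BegDoc and EndDoc are hypothetical column names used only for illustration, and rows_missing_ocr is a helper name we made up.

```python
import csv
import io

def rows_missing_ocr(load_file, ocr_column="OCR_Text"):
    """Return 1-based data-row numbers whose OCR column is empty."""
    reader = csv.DictReader(load_file)
    if ocr_column not in (reader.fieldnames or []):
        raise ValueError(f"column {ocr_column!r} not found in load file header")
    return [i for i, row in enumerate(reader, start=1)
            if not (row[ocr_column] or "").strip()]

# Hypothetical load file contents (assumed comma-delimited).
sample = io.StringIO(
    "BegDoc,EndDoc,OCR_Text\n"
    "ABC0001,ABC0002,Quarterly report text\n"
    "ABC0003,ABC0003,\n"
)
print(rows_missing_ocr(sample))  # [2]
```

If your export uses a different delimiter, csv.DictReader accepts a delimiter argument.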

2. You have a single folder of text files whose file names match the names of your images

In this case, you will add a row in your map file to show where to find the folder that holds these text files (see image below). In this example, TEXT_FOLDER is the name of the column header in your load file. Be sure to include the actual folder name in each row of your load file.

map_file_for_ocr_samples2.png
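Since this scenario depends on file names lining up, a quick check for images with no matching text file can save a failed import. This is a minimal Python sketch under our own assumptions: names are compared without their extensions and without regard to case, and images_without_text is a hypothetical helper, not part of Nextpoint.

```python
from pathlib import Path

def images_without_text(image_names, text_names):
    """Return image names that have no text file with the same base name."""
    # Compare base names only (extension dropped), case-insensitively.
    text_stems = {Path(name).stem.lower() for name in text_names}
    return sorted(n for n in image_names
                  if Path(n).stem.lower() not in text_stems)

# Hypothetical file names for illustration.
images = ["ABC0001.tif", "ABC0002.tif", "ABC0003.tif"]
texts = ["ABC0001.txt", "ABC0003.txt"]
print(images_without_text(images, texts))  # ['ABC0002.tif']
```

In practice, the two lists could come from os.listdir on your image folder and your text folder.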

3. You have multiple folders of text files whose file names match the names of your images

In this case, you will include fields in your load file that show the starting and ending pages of the text files, and you will add a row in your map file to show where to find those columns. In this example, TEXT_START and TEXT_END are the names of the column headers in your load file. The values in each row of your load file should also match that row's Bates Start and Bates End fields.

map_file_for_ocr_samples3.png
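One way to check that the text ranges line up with the Bates ranges is to compare the columns row by row. This is a minimal Python sketch, assuming a comma-delimited load file and assuming that "match" means the values are identical. BATES_START and BATES_END are hypothetical header names for the Bates Start and Bates End fields, and mismatched_ranges is a helper name we made up.

```python
import csv
import io

def mismatched_ranges(load_file,
                      bates_cols=("BATES_START", "BATES_END"),
                      text_cols=("TEXT_START", "TEXT_END")):
    """Return 1-based data-row numbers where the text range differs from the Bates range."""
    bad = []
    for i, row in enumerate(csv.DictReader(load_file), start=1):
        bates = tuple((row.get(c) or "").strip() for c in bates_cols)
        text = tuple((row.get(c) or "").strip() for c in text_cols)
        if bates != text:
            bad.append(i)
    return bad

# Hypothetical load file contents (assumed comma-delimited).
sample = io.StringIO(
    "BATES_START,BATES_END,TEXT_START,TEXT_END\n"
    "ABC0001,ABC0004,ABC0001,ABC0004\n"
    "ABC0005,ABC0009,ABC0005,ABC0008\n"
)
print(mismatched_ranges(sample))  # [2]
```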

4. You have text files that do not match the file names of your images

In this case, you will add a row in your map file to show which column contains the text file that goes with each image. In this example, OCR_FILES is the name of the column header in your load file. Be sure to include the actual text file name in each row of your load file.

map_file_for_ocr_samples4.png
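Because the text file names here are typed into the load file, it is worth confirming that each one refers to a file you actually have. This is a minimal Python sketch, assuming a comma-delimited load file. BATES_START is a hypothetical column name, and missing_text_files is a helper name we made up.

```python
import csv
import io

def missing_text_files(load_file, available_files, column="OCR_FILES"):
    """Return text file names listed in the load file but absent from available_files."""
    available = set(available_files)
    return [row[column] for row in csv.DictReader(load_file)
            if row[column] not in available]

# Hypothetical load file contents and file names.
sample = io.StringIO(
    "BATES_START,OCR_FILES\n"
    "ABC0001,scan_a.txt\n"
    "ABC0002,scan_b.txt\n"
)
print(missing_text_files(sample, ["scan_a.txt"]))  # ['scan_b.txt']
```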

 
 