Import Troubleshooting
- Near Dupe FAQ
- Common PST Questions & Answers
- Common Overlay Errors and Solutions
- FAQ: How does Nextpoint handle tracked changes on Word Documents?
- Upload & Import FAQ's
- Common Import Warnings and Solutions
- FAQ: The date printed on my document is today's date. Why?
- FAQ: Why is the Title of my Document NOT the File Name?
- FAQ: How is Document Search Text Obtained?
FREQUENTLY ASKED QUESTIONS
NEAR DUPE
How does our Near Dupe Algorithm work?
Our near-duplicate identification tool is like a highly intelligent filter for your documents. It's designed to sift through vast amounts of content and spot documents that are almost the same but not quite – think of it as finding twins in a crowd.
Smart Scanning with Shingling: Imagine each document is turned into a unique pattern or fingerprint. Our tool creates these fingerprints in such a way that similar documents have similar patterns.
Intelligent Matching with Jaccard Similarity: Just like matching fingerprints, our tool compares these patterns to find matches. It's smart enough to know that documents don't need to be exactly the same to be considered a match; they just need to be very close.
Accuracy Meets Speed: Our technology ensures that this matching is done quickly and accurately, so you don't have to worry about duplicates cluttering your system or missing out on important, unique documents.
Customizable: We know that different cases have different needs. That's why our tool lets you decide how similar documents need to be in order to be considered duplicates.
With Nextpoint’s near-duplicate feature, your document review will be cleaner, more organized, and more efficient, saving time and money.
________________________________________________________________________________________________________________
How are scores calculated? What do they mean?
We score each pair of documents based on their Jaccard similarity and apply a threshold to determine if they are near-duplicates. This threshold can be adjusted based on the desired sensitivity of the duplication detection.
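To make the scoring concrete, here is a minimal Python sketch of shingling and Jaccard similarity. It is an illustration only, not Nextpoint's implementation: the three-word shingle size, the 0.8 default threshold, and the function names are assumptions chosen for the example.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0  # assumption: two empty documents are treated as identical
    return len(a & b) / len(a | b)


def is_near_duplicate(text_a, text_b, threshold=0.8, k=3):
    """True when the similarity score meets the (adjustable) threshold."""
    return jaccard(shingles(text_a, k), shingles(text_b, k)) >= threshold


doc_a = "the quick brown fox jumps over the lazy dog"
doc_b = "the quick brown fox jumps over the lazy cat"
score = jaccard(shingles(doc_a), shingles(doc_b))
print(f"{score:.2f}")  # 0.75: six shared shingles out of eight distinct ones
```

In this sketch, lowering the threshold to 0.7 would treat the two sample sentences as near-duplicates, while the 0.8 default would not. That is the sensitivity trade-off the adjustable threshold controls.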
________________________________________________________________________________________________________________
What is considered a Standard Near Duplicate Analysis? What makes it more custom?
A Standard Near Duplicate Analysis involves comparing every document to every other document currently in the database and identifying those that are not identical but share a high degree of similarity. Then, at the document-level, we will cluster those which are similar to the current document view and provide a similarity score for each.
Aspects which may make a Near Duplicate analysis more custom include, but are not limited to:
- Comparison of two particular data sets
- E.g. folder A to folder B where one document set is the “master”
- Reviewing from a bulk perspective vs. document by document
- E.g. I want to see clusters of documents in my grid view
- Those targeted at minimizing your review (eliminating or setting aside documents)
- E.g. “I don’t want to review documents from this document set produced to me if I’ve already reviewed them in my own native client data.”
Near Dupe FAQ
Why did my PST import with errors and/or warnings? Is there anything I can do to avoid them?
Because of the complexity of mailboxes and the variety of formats they can be exported from, PSTs are notoriously error prone. Some issues are unavoidable but these steps can minimize issues:
- Import PSTs one at a time.
- Look into the PST processing tab.
"Missing" errors could mean:
- We don't support the extraction of certain file types from PSTs (things like Teams Messages and PersonMetadata often show as extraction errors).
- There is some corruption in the PST that caused our processors to skip those files.
- There is an error in the PST's index and all files were actually extracted.
To address these extraction "errors" we recommend the following:
- Download the PST
- Open it in Outlook or another compatible PST processor.
- Confirm the file count in the folder the "missing" files are from.
- Determine if the files in this folder could be relevant to your case
- Extract the missing files from that folder (or the entire folder if they cannot be individually identified)
- Import the additional individual emails into your database (with deduplication on if importing the entire folder).
We can also complete these steps for you as a billable service. Please contact support@nextpoint.com for details.
Why do several of the "emails" extracted from my PST appear blank?
Email files that appear to be blank often consist of iCalendar/vCalendar or Contact files. These files are atypical and, because of their unusual formatting, we do not fully support the extraction of all of the metadata that may appear in the files themselves. To check this, click the three-dot menu in the document viewer, select "view original files", and download the file. If the native still appears blank after download, click into the properties of the email and you may notice mention of calendar invites or contacts.
It is possible to have a custom service project done that would convert these to a plain text file that will be imaged and can be searched, similar to what you see below. Please reach out to support@nextpoint.com for more information.
What can be done about Teams Messages that do not extract on import?
We do not currently extract Teams messages from PST files on import. However, it is possible to extract them separately as loose files and import them into your database.
To do this, open the email file in Outlook and find the folder called "TeamsMessagesData". Select all of the items and copy/paste them into a folder on your local computer. You can then upload that folder to the File Room and import.
Why is the subject metadata in my grid view blank?
Sometimes emails will come in without a subject or title field. This can happen with atypical or potentially corrupted files, but you can still click into the document and download the original file if it hasn't imaged within our system. We are working on automatically correcting these blank subject fields to reflect "<no subject>" and can run a script through your import to correct this so that every document is clickable. Another workaround is to add the Nextpoint ID to your grid view (through settings), which you can then click on to view the document in question.
Why do the PST files appear as documents in my database and why is this file "blank"?
Similar to importing an email with an attached zip file, Nextpoint preserves container files as their own "documents" while also extracting their contents. These container files are generally NOT produced alongside their contents, because they are duplicative of the extracted contents and could contain information that is otherwise withheld as non-responsive or privileged. These files can be removed from review folders or databases without affecting the status of their imported contents. They appear "blank" because they are simply the shells that contain other documents (which are extracted and imported).
What do errors/warnings related to PST imports mean and how should I address them?
Check out this support article for details about PST import errors and warnings.
Common PST Questions & Answers
Overlay Results Reporting
After any Overlay processing is complete, you will be notified of the processing results in two different locations.
The first is the status within the main Overlay batch list, located via DATA > Overlays (MORE > Data > Overlays in a Litigation database). Here, the statuses will read as green Complete, yellow Complete With Errors, or red Error.
Anything marked with a Complete With Errors or Error status should be reviewed for further action.
The second location where you can gather additional details on the processing error is within the Overlay batch details page. This can be accessed by clicking on the name of the batch > the Overlay Errors tab.
Here, you will find a description of what caused the error. Outlined below is a list of the various types of errors and how to address them in your Nextpoint Discovery or Litigation database.
Table of Contents
- Skipped Duplicate Document in load file
- Load file row did not match any documents in the database.
- An Unknown error occurred. Please check the load file formatting.
- Encoding error in load file. Please convert to UTF-8.
- Unable to parse the load file. Please check the formatting.
Complete with Error Messages
A Yellow Complete with Error Overlay status indicates that part, but not all, of your overlay processed as expected. The errors will need to be reviewed and addressed as necessary.
"Skipped Duplicate Document in load file."
What does this mean?
Your overlay file included the same document key more than once.
What is a document key? A document key provides Nextpoint with instructions for locating documents in your database which will be updated with information during the overlay. This will always be a Bates start number or Nextpoint Document ID.
Therefore, if your overlay file contains varying information for the same document (key), Nextpoint cannot determine which information should be prioritized. We will overlay the first instance of the document key we encounter, but the second instance will be skipped and the overlay processing will continue on.
What can be done to correct it?
Review the Overlay batch details page, and specifically the Overlay errors tab contained within.
For any document key reported as a duplicate, verify the desired data was overlaid. If so, no further action is required. If not, we recommend you prepare a pared-down overlay file containing only the document keys that do not have the correct information and run an additional overlay.
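If you would like to catch repeated document keys before running an overlay, a short script can list them. The Python sketch below is one possible approach, not a Nextpoint tool. It assumes a comma-delimited load file with a header row, and the key column name "BegBates" is hypothetical, so substitute the name your file uses.

```python
import csv
import io
from collections import Counter


def find_duplicate_keys(load_file_text, key_field):
    """Return {document key: occurrence count} for keys that appear more than once."""
    reader = csv.DictReader(io.StringIO(load_file_text))
    counts = Counter()
    for row in reader:
        key = (row.get(key_field) or "").strip()
        if key:  # rows with an empty key are ignored here
            counts[key] += 1
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical sample data: ABC0001 appears twice with different values.
sample = (
    "BegBates,Author\n"
    "ABC0001,Smith\n"
    "ABC0002,Jones\n"
    "ABC0001,Lee\n"
)
print(find_duplicate_keys(sample, "BegBates"))  # {'ABC0001': 2}
```

Since only the first instance of a key is overlaid, any key this reports is worth resolving to a single row before you upload the file.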
"Load file row did not match any documents in the database."
What does this mean?
Your overlay file included a document key which does not exist in your database.
What is a document key? A document key provides Nextpoint with instructions for locating documents in your database which will be updated with information during the overlay. This will always be a Bates start number or Nextpoint Document ID.
Therefore, if the document key cannot be located during processing, it is not possible to overlay information.
What can be done to correct it?
Review the Overlay batch details page, and specifically the Overlay errors tab contained within.
For any document key reported as not found, verify the information (spelling, number, etc.). If the same typo appears throughout the overlay file, we recommend you correct the file and rerun the overlay. If the document key does not exist in the database, it is worth evaluating where that information should have gone and preparing a modified overlay file to run in a separate overlay batch.
Please Note: If your Overlay batch details page reports "Load file row did not match any documents in the database. Key field: . [EMPTY ROW]", it is likely your overlay file contained what appears to be blank rows, but in fact, there are empty cells which are read by our processors and thus reported as a non-match. If you encounter the aforementioned warning, we recommend you review the full processing report as illustrated below. If the only error is the empty rows, you can ignore the warning and move forward with your review.
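If you would rather remove the empty rows than ignore the warning, the following Python sketch shows one way to strip them before upload. It assumes a comma-delimited overlay file, and the function name is illustrative rather than part of Nextpoint.

```python
import csv
import io


def drop_empty_rows(load_file_text):
    """Return the load file text without rows whose cells are all empty or whitespace."""
    reader = csv.reader(io.StringIO(load_file_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # Keep the row only if at least one cell has visible content.
        if any(cell.strip() for cell in row):
            writer.writerow(row)
    return out.getvalue()


# Hypothetical sample: the third and fourth lines look blank in a spreadsheet.
sample = "BegBates,Author\nABC0001,Smith\n,\n , \nABC0002,Jones\n"
print(drop_empty_rows(sample))
```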
Error Messages
A red Error Overlay status indicates that your overlay did not process. The root cause of the error will need to be addressed and your overlay run again.
"An Unknown error occurred. Please check the load file formatting."
What does this mean?
Likely a network timeout issue, but the root cause is unidentifiable.
What can be done to correct it?
Please retry your Overlay. If doing so does not resolve the error, please contact support@nextpoint.com for further data assessment.
"Encoding error in load file. Please convert to UTF-8."
What does this mean?
If you receive this warning, Nextpoint was unable to parse your overlay file for processing due to the encoding.
What can be done to correct it?
To resolve, please save your overlay file with UTF-8 Encoding. Open your load file in a Text Editor (Sublime text shown below) > File > Save with Encoding > UTF-8. Preferred is UTF-8 without BOM, but UTF-8 with BOM should also resolve the issue.
Note: The steps to update encoding may vary depending on the text editor you are using. If not intuitive in your text editor of choice, locate instructions by searching in any browser for "how to update encoding in [insert text editor name]".
Once saved, upload to File Room in the same location where your original overlay file is located. If the file name is the same, the correctly encoded overlay file will overwrite your initial overlay file.
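If you prefer a script to a text editor, the conversion can also be sketched in Python. This is a minimal example under stated assumptions: the original encoding of your file is unknown to us, so the Windows-1252 fallback ("cp1252") is only a guess that you may need to change, and the function name and file names are hypothetical.

```python
from pathlib import Path


def to_utf8_no_bom(src, dst, fallback_encoding="cp1252"):
    """Rewrite the file at src as UTF-8 without a BOM, saving the result to dst."""
    raw = Path(src).read_bytes()
    try:
        # Already valid UTF-8: "utf-8-sig" also strips a leading BOM if present.
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Assumption: anything that is not UTF-8 is in the fallback encoding.
        text = raw.decode(fallback_encoding)
    # Writing bytes keeps the file's original line endings untouched.
    Path(dst).write_bytes(text.encode("utf-8"))


# Example call (hypothetical file names):
# to_utf8_no_bom("overlay.csv", "overlay_utf8.csv")
```

Spot-check accented characters and delimiters in the converted file before uploading. If the fallback guess is wrong, those are usually the first things to look garbled.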
"Unable to parse the load file. Please check the formatting."
What does this mean?
If you receive this warning, Nextpoint was unable to parse your overlay file. It could be due to the encoding as described in the aforementioned error, but it can also be indicative of an issue with the data being imported. It can be a number of things that cause this to happen, each unique to the data set.
What can be done to correct it?
To resolve, please first try to save your overlay file with UTF-8 Encoding and rerun your overlay. Open your load file in a Text Editor (Sublime text shown below) > File > Save with Encoding > UTF-8. Preferred is UTF-8 without BOM, but UTF-8 with BOM should also resolve the issue.
Note: The steps to update encoding may vary depending on the text editor you are using. If not intuitive in your text editor of choice, locate instructions by searching in any browser for "how to update encoding in [insert text editor name]".
Once saved, upload to File Room in the same location where your original overlay file is located. If the file name is the same, the correctly encoded overlay file will overwrite your initial overlay file.
If the above does not resolve the error, please contact support@nextpoint.com for further data assessment.
Common Overlay Errors and Solutions
Question:
How does Nextpoint handle tracked changes and comments on Word Documents? Is there a search I can run to locate redlined documents?
Response:
Tracked changes (and comments) on Word documents are only imaged in Nextpoint if they are viewable/on at the time of import. Currently, there is not a searchable field indicative of whether a document has tracked changes.
That said, there are two ways to go about ensuring you are reviewing (and potentially producing) all tracked changes. The first is a series of steps Nextpoint users can take to review and replace images with tracked changes in their Nextpoint database. The second incorporates the Nextpoint Engagement team for assistance. Both options are outlined below:
Nextpoint User Solution
- Search for all word documents via the search file_extension:doc*
- You may consider adding your search results to a folder via a Bulk Action for a more streamlined review of the Word Documents.
- Navigate through the various Word Documents and access the native(s) by clicking Download Original at the top right of each document.
- Once the native is downloaded, review locally in Word application with Track Changes and/or Comments enabled.
- As needed, print each Word Document to PDF.
- If image replacement is needed, navigate to the applicable document in Nextpoint and click Document Options > View Document Pages > Delete All Pages.
- Lastly, once the 'old' pages are removed, click Document Options > Add Pages. Drag and drop the corresponding PDF you printed in the earlier step and click Import Pages.
Nextpoint Engagement Team Assistance
If internal time and/or resources are not available to you, Nextpoint Engagement team is available to assist via a services request. Our team would manage the aforementioned process of re-imaging all Word Docs with tracked changes/comments and replacing in your database in a more automated fashion.
Please contact your Client Success Director if you would like to discuss further assistance.
FAQ: How does Nextpoint handle tracked changes on Word Documents?
Topics Below:
- Is there a restriction on the size of my files?
- Is there a page count limit for uploaded documents?
- Upload times seem to vary, why is that?
- Do I have to load by custodian?
- Import times seem to vary, why is that?
- How do I import LiveNote files?
- How do I import Concordance/Summation files?
- How do I import PDFs and/or TIFFs?
- What are some of the common load file formats Nextpoint supports?
- Why is there additional processing time after my upload completes?
- Why are my email times displayed in UTC (Coordinated Universal Time)?
- Can I navigate away from the upload screen before it completes?
- What is a standard import vs. extended import?
- Are there particular specifications for uploading scanned documents?
Is there a restriction on the size of my files?
Although we can accept files up to 5GB, we recommend that you keep your file sizes smaller than 1GB. Keeping files more compact has advantages:
- Smaller files take less time to upload, populating your data quicker while reducing the chance of a network interruption disrupting your upload.
- For privacy and security reasons, some of our validation cannot be conducted until your file has been completely received by our systems. Avoiding extremely large files shortens the time to validation.
See our File Room & Import Best Practices Checklist linked here for further tips and tricks.
Is there a page count limit for uploaded documents?
There is NO limit. Documents for upload can be any number of pages.
Upload times seem to vary, why is that?
The largest obstacle to faster uploads is your network connection. In many cases uploads will be significantly faster at work (business lines are typically larger than at home). You may also notice a small performance boost when connected to the network by wire (vs. wireless).
* Currently, Nextpoint supports Internet Explorer version 11 or later. We highly recommend switching to the most recent version of Google Chrome, Mozilla Firefox or Microsoft Internet Explorer.
Do I have to load by custodian?
No, you are not required to load by custodian, but it is recommended to utilize the Custodian assignment feature during import. It is important to apply the custodian to an import batch so users can 1) analyze, search and isolate documents for particular custodian(s) and 2) include this information in a production export.
Import times seem to vary, why is that?
The largest obstacle to faster uploads is your network connection. In many cases, uploads will be significantly faster at work (business lines are typically larger than at home). You may also notice a small performance boost when connected to the network by wire (vs. wireless). If you experience slow speeds, please review the following linked topic on speed testing and troubleshooting.
How do I import LiveNote files?
Export your depositions as .ptf or .ptx files. Follow this link to learn how to batch load depositions.
How do I import Concordance/Summation files?
Follow this link to learn how to import documents with a load file from platforms such as Concordance or Summation.
How do I import PDFs and/or TIFFs?
- Follow this link to learn how to upload tiffs/jpgs with a load file.
- This topic covers the specific considerations for PDF imports when a load file is present.
- Loose PDFs which are not part of a document set produced to you can be imported via our Multiple Files Import Workflow.
What are some of the common load file formats Nextpoint supports?
We currently support imports from Trial Director (.oll) and Concordance (.dat/.csv) load files, as well as the EDRM XML format.
For exports, we support .oll, .dat, .csv, .dii, .lfp, the Opticon .log format, and a Summation .csv/ascii format.
You also have the option of using our services to convert load files from other formats.
Why is there additional processing time after my upload completes?
When your upload has completed, our server has received your entire file. At that point, final validation is performed before your file is cataloged and stored. During your upload, we display an estimate of how much additional time this will take (after the upload status bar is full).
Why are my email times displayed in UTC (Coordinated Universal Time) in Nextpoint, but my current time zone when I view the native?
When you import native emails into Nextpoint, you may notice that the “Email Sent” and “Email Received” metadata is displayed in UTC (Coordinated Universal Time). Coordinated Universal Time is the primary time standard by which the world regulates clocks and time. All emails will have this UTC time embedded in their contents because it is the standardized time with which all time zones can universally compare.
Relatedly, you may notice the time zone on the image of the email differs from the time zone in the metadata. This is because we convert the email dates to UTC for storing in our database as metadata, but email imaging relies on the email client specifications. That being said, you can see in the example below, the time displayed on the Email Image (with the -0500 offset) aligns with the 14:13:59 time displayed in the Email Metadata.
Email Image
Email Metadata
When a user downloads the original file of an email imaged in Nextpoint, and opens the email in an application such as Outlook or Thunderbird, the applications will parse the UTC time embedded in the original text file, and read the time stamp in the time zone of the user.
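The relationship between an offset time and UTC can be checked with a few lines of Python. This is only an illustration of the arithmetic, not how Nextpoint processes email. The header value below is hypothetical, chosen so that it reproduces the 14:13:59 UTC time from the example above.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def email_date_to_utc(date_header):
    """Parse an email Date header value and return the same instant in UTC."""
    local = parsedate_to_datetime(date_header)  # keeps the -0500 style offset
    return local.astimezone(timezone.utc)


# Hypothetical header: 09:13:59 at a -0500 offset is 14:13:59 UTC.
sent = email_date_to_utc("Sun, 15 Jan 2023 09:13:59 -0500")
print(sent.isoformat())  # 2023-01-15T14:13:59+00:00
```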
Can I navigate away from the upload screen before it completes?
When uploading to the File Room, the answer to this is No. Once an upload has started, navigating away from the upload page will cause any progress to be lost. Be patient if you are uploading a lot of files.
Along the same lines, check “Disable Session Expiration” on the login page before larger uploads to avoid Nextpoint signing you out after 30 minutes of inactivity.
If you would like to continue working, open a new browser window or tab. You can continue your work there while your file upload window continues in the background.
Once an import batch has been initiated and is queued for processing, you can navigate away and importing will proceed as expected.
See our File Room & Import Best Practices Checklist linked here for further tips and tricks.
What is a standard import vs. an extended import?
Standard imports meet the following criteria:
- Documents as images with corresponding load files. Images must be named as contained in load file. Up to 3 load files per GB of data and a maximum of 25 database fields, OR
- Native files without additional coding. No load file required. Import includes custodian (if listed), folder path from received media, and document metadata.
Extended imports do not meet the above requirements and require an additional Client Success Services estimate before import.
If you have received a produced data set or have data being migrated from a different platform, we recommend reviewing the Data Planning and Advanced Imports webinar and our Ranged Image Import Instructions.
Scanning specifications for uploading documents
To make for an easy batch upload, follow these guidelines when scanning your documents.
Standard Specifications
- Logical document unitization/breaks must be captured and maintained
- Document relationships must be conveyed, including bound documents
- Relationship information must be populated in loadfile through Begattach/Endattach fields
- Maintain following “source” information, if applicable:
- Custodian;
- Box Number;
- Folder/Binder Name;
- And any other contextual information the parties involved may find useful
- All photographs, charts, graphs, and any other document where there would be a loss of integrity if the original format was not preserved, must be scanned in color. All other documents can be in black and white.
- Scan in direct size proportions (i.e., size for size)
- Scan as text reads (i.e., vertical v. horizontal)
- All covers, spines, tabs, standard language, duplicate carbons, annotations not directly on the document (i.e., Post-Its), etc. must be scanned on their own page, with a relationship indication to the document(s) it is referencing on the load file provided
- Any media found must be discussed amongst parties involved for proper protocol
- Any additional non-standard scanning metadata must be agreed to amongst parties prior to any scanning
Electronic Format
- General Considerations:
- All data must be delivered in a structured format
- All scanned collections should be converted to TIFF images, affiliated with a control number, and include fully searchable text files
- File names cannot contain embedded spaces or special characters (including the comma)
- Images:
- Black and White - 300 DPI; Group IV; Single-Page TIFF Files
- Color - JPEG
- All TIFF images must have a unique file name, correlating to the control number in the load file (i.e., Bates number)
- The number of TIFF files per folder should not exceed 500 files
- Text:
- A text path field must be included in the load file, providing the path and name of the extracted text file corresponding to each document
- Each text file must be named by the same control number as the image file it corresponds to
- Do not include the actual text in the load file
- The number of text files per folder should not exceed 1,000 files
- Load File:
- File Format: CSV or DAT accepted
- First line of loadfile must be a header row, identifying each field name provided
- Date fields must be provided in the following format: mm/dd/yyyy
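A couple of the rules above are easy to pre-check with a script before delivery. The Python sketch below covers the mm/dd/yyyy date format and the file-name rule. The specification does not define which characters count as "special", so the allowed set here (letters, digits, period, hyphen, underscore) is our assumption, as are the function names.

```python
import re
from datetime import datetime

# Assumption: only these characters are treated as safe in file names.
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def is_valid_date(value):
    """True when value is a real calendar date written as mm/dd/yyyy."""
    if not re.fullmatch(r"\d{2}/\d{2}/\d{4}", value):
        return False  # wrong shape, e.g. 3/7/2021 or 2021-03-07
    try:
        datetime.strptime(value, "%m/%d/%Y")
    except ValueError:
        return False  # right shape but not a real date, e.g. 13/01/2021
    return True


def is_safe_file_name(name):
    """True when name has no spaces, commas, or other characters outside the safe set."""
    return SAFE_NAME.fullmatch(name) is not None


print(is_valid_date("03/07/2021"))        # True
print(is_valid_date("3/7/2021"))          # False
print(is_safe_file_name("ABC0001.tif"))   # True
print(is_safe_file_name("ABC 0001.tif"))  # False
```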
Upload & Import FAQ's
Error Messages:
- "Incomplete PST file import/extraction."
- "Unable to extract unprocessable files from PST."
- "Email format was downgraded to plain text."
- "Extracted page count didn't match expectation."
- "File was too large to process."
- "Skipped document with invalid page range."
- "Skipped loadfile line for missing file."
- "Truncated data in..."
- "Unable to create document."
- "Unable to create page."
- "Your import reached an unexpected error."
- "Encoding error in load file, please convert to UTF-8."
- "Filtered duplicate file."
"Incomplete PST file import/extraction."
Same error as "Unable to extract unprocessable files from PST."
What does this mean?
Nextpoint is able to process most PST files. If you have received a PST Error in your Batch Report, there is corruption within the PST and at least one file has not been extracted correctly. The PST file most likely needs to be repaired and uploaded again.
What can be done to correct it?
To repair a PST you can follow this short tutorial from Microsoft: How to repair your Outlook PST. Please make sure to make a backup copy of the original PST file before attempting repair.
If the unextracted files within the PST have occurred in locations that are of no consequence to your review, (e.g. Calendar, Tasks, etc.) you may choose to ignore the errors and proceed. We urge you to please review the errors carefully before continuing.
If you need additional assistance, please contact Nextpoint support at support@nextpoint.com.
Warning: If you need Nextpoint to help repair PST errors and reprocess all the files in the PST, do not start your review until we re-import the new batch.
"Email format was downgraded to plain text."
What does this mean?
These emails contain anomalies that prevented Nextpoint from processing their HTML code normally. When this happens, Nextpoint processes the plain text version of the email instead.
Frequently, this leads to successful processing of the basic text content of these emails, but without embedded images or styling.
What can be done to correct it?
If the styling or embedded images are not important to your review, it may be OK to ignore this error. However, you should first spot-check these emails to verify that their text is intact.
If the content of these emails does not look acceptable to you, please contact our Client Success team at support@nextpoint.com to inquire about reprocessing them.
"Extracted page count didn't match expectation."
What does this mean?
Occasionally, this is not a serious issue. For example, an Excel spreadsheet may have different page counts on different computer print settings.
In more common scenarios, the document may be corrupt, password protected or too large or complex to be processed. The document was still imported into Nextpoint, but may not contain any content.
What can be done to correct it?
Download and attempt to open the native outside of Nextpoint.
- If the document opens and contains content: Try to "print to PDF" the document's pages and "Add pages" to your import so that you can review the pages in Nextpoint.
- If the document does not contain any content: Try to re-import the original source file and add those pages to the document later on.
- If the file is corrupt: Corrupt files typically report a size of 0KB or will trigger an error message when you try to open them. Sometimes files are corrupted during a transfer from one disk to another. Ask your source to re-send the file or provide a better copy.
- If the file is password protected: Unlock the file on your computer and upload a copy without password protection. Ask the custodian or your client's IT administrator for the password.
Note: If an image of the file isn't necessary, an image-exception-placeholder can also be substituted for the exhibit.
"File was too large to process."
What does this mean?
The file size or resolution is unusually large, and it isn't practical for review software to image it.
What can be done to correct it?
Frequently, our clients decide to use a placeholder for these documents and review them in their native format by downloading them to a local computer.
"Skipped document with invalid page range."
Same error as "Skipped loadfile line for missing file."
What does this mean?
The image range start/end (or Bates start/end) did not make sense sequentially (e.g., DAN00076 - DAN00043, or DAN00076 - JIM00084). It is most likely a loadfile error.
What can be done to correct it?
Correct the image range of the specified document(s) and create a new loadfile containing the impacted rows.
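To find problem rows before re-importing, a range check along these lines may help. This Python sketch assumes each Bates number is a prefix followed by trailing digits, as in the examples above. The function names are illustrative and this is not Nextpoint's own validation logic.

```python
import re

# A Bates number is treated here as a prefix followed by trailing digits.
BATES = re.compile(r"^(.*?)(\d+)$")


def split_bates(value):
    """Split 'DAN00076' into ('DAN', 76); raise ValueError if there are no trailing digits."""
    match = BATES.match(value.strip())
    if not match:
        raise ValueError(f"not a Bates number: {value!r}")
    return match.group(1), int(match.group(2))


def is_valid_range(start, end):
    """True when start and end share a prefix and end does not come before start."""
    start_prefix, start_num = split_bates(start)
    end_prefix, end_num = split_bates(end)
    return start_prefix == end_prefix and start_num <= end_num


print(is_valid_range("DAN00043", "DAN00076"))  # True
print(is_valid_range("DAN00076", "DAN00043"))  # False: end precedes start
print(is_valid_range("DAN00076", "JIM00084"))  # False: prefixes differ
```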
"Truncated data in (field name)."
What does this mean?
A field's value exceeded the character limit for that particular field and thus the text was not fully populated. This warning is applicable to any text field (recipients, cc, bcc, shortcut, custom text fields etc). Further information on the character limits for various field types is located here >>.
What can be done to correct it?
If you are importing native files and determine it is important to populate more than the character limit for any particular field, please contact your Account Director or our Services team at support@nextpoint.com for options to expand the character limit.
If you are importing produced data with a load file, and anticipate certain fields to contain a lot of text, you can avoid this by setting those fields as the “Paragraph” field type, which is less susceptible to truncation.
"Unable to create document."
What does this mean?
There was most likely a system or network error during processing.
What can be done to correct it?
Import the file again. If you get the same message, try importing the file using a loadfile that includes any desired coding information.
"Unable to create page."
What does this mean?
There was most likely a system or network error during processing. As a result, the document was imaged, but does not appear to include all its original pages.
What can be done to correct it?
Examine the loadfile for this import and locate the document with missing pages. Import these pages into Nextpoint, where they can be added to the previously imaged document and positioned accordingly.
"Your import reached an unexpected error."
If you receive a notice of error, and the screen displays "Your import reached an unexpected error", this message is indicative of an issue with the data being imported. It can be a number of things that cause this to happen, each unique to the data set.
If you receive this error with a corresponding "no anomalies" message, please contact support@nextpoint.com for further data assessment.
"Encoding error in load file, please convert to UTF-8."
If you receive this warning, Nextpoint was unable to parse your load file for processing due to the encoding.
To resolve, please save your load file with UTF-8 encoding. Open your load file in a text editor (for example, Sublime Text) > File > Save with Encoding > UTF-8. UTF-8 without BOM is preferred, but UTF-8 with BOM should also resolve the issue.
Note: The steps to update encoding may vary depending on the text editor you are using. If not intuitive in your text editor of choice, locate instructions by searching in any browser for "how to update encoding in [insert text editor name]".
Once saved, upload the file to the File Room in the same location as your original load file. If the name is the same, the correctly encoded load file will overwrite your initial load file.
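If you prefer to script the conversion instead of using a text editor, a minimal Python sketch follows. The function names are hypothetical, and the fallback to Windows-1252 (`cp1252`) for files that are neither UTF-16 nor valid UTF-8 is an assumption; change it to match the tool that produced your load file.

```python
import codecs
from pathlib import Path


def detect_encoding(raw):
    """Guess the encoding of raw load file bytes."""
    # A UTF-16 byte order mark is an unambiguous signal.
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    try:
        # "utf-8-sig" accepts UTF-8 with or without a BOM.
        raw.decode("utf-8-sig")
        return "utf-8-sig"
    except UnicodeDecodeError:
        # Assumption: anything else is Windows-1252.
        return "cp1252"


def convert_to_utf8(src, dst):
    """Re-save src as UTF-8 without BOM at dst; return the guessed encoding."""
    raw = Path(src).read_bytes()
    encoding = detect_encoding(raw)
    text = raw.decode(encoding)
    # Writing bytes keeps the original line endings untouched.
    Path(dst).write_bytes(text.encode("utf-8"))
    return encoding
```

Calling `convert_to_utf8("load_file.csv", "load_file_utf8.csv")` writes a new copy and leaves the original in place, so you can compare the two before uploading.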
"Filtered duplicate file."
If you receive the "Filtered duplicate file" warning, you've attempted to import duplicate files in the same import batch. Regardless of your deduplication settings, you are not able to import the same file multiple times in the same batch. The file will be imported once and any relevant metadata that is different (like file path) will be merged on import. If you need to import the same file multiple times, you will need to import it in separate batches.
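To see ahead of time which files in a batch might trigger this warning, you can group files by content hash. The sketch below is an assumption-laden illustration: it treats two files as duplicates when their SHA-256 digests match, which may not be the same test Nextpoint applies, and `find_duplicate_files` is a hypothetical helper name.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicate_files(folder):
    """Return lists of paths under folder whose contents are byte-identical."""
    by_digest = defaultdict(list)
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            # Reads each file fully into memory; fine for a sketch,
            # but very large files would be better hashed in chunks.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Each returned group lists the paths that share identical content, which can help you decide whether to split them across separate batches.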
Common Import Warnings and Solutions
Question:
I am viewing a document in Nextpoint, and the date shown printed on the face of the document is either a) today's date OR b) the date it was downloaded to my computer before I uploaded to Nextpoint. I downloaded the original/native file and can see the original date intact. Why is the date changing when I import to Nextpoint?
Response:
This document was more than likely set up as a word/document template that populates the image with the current date.
If you want to re-image this, you can download the original, change the date, print to pdf and re-upload the image to replace the existing image. For further information on how to replace individual pages of a document in Nextpoint, please see our help center topic linked here.
FAQ: The date printed on my document is today's date. Why?
In the fall of 2018, Nextpoint released a comprehensive export template library, coupled with a metadata fields expansion project. The addition of 15 new metadata fields means clients gained access to more refined searching and exporting, and that we are able to extract more information than ever before from your data. It also means that certain fields previously populated by default from one document attribute may be populated differently now. More specifically, this affects the blue Document Link you see when you are viewing documents from your Grid View (see a full list of fields and their metadata priority list here).
One question we have received more often since the release of the aforementioned project is "Why is the title of my document NOT the file name?". Let's take a closer look at the what and the why:
Question:
"I renamed XXXX documents with descriptions before importing to Nextpoint. Once the documents were imported, I noticed that the document link name (in the grid view) is pulling in a metadata field instead of the document name I gave it prior to importing. Can you tell me why this happens?"
Response:
The blue document link you see in your Grid View is not actually the title of the document, but rather a field which is populated based on a prioritized list of metadata attributes. The priority of this list is:
- Subject/Title
- Original File Name
- Untitled
What this means is that if a Subject/Title is present in the underlying metadata, we will populate the blue document link with that value. If there is no Title present, we will populate with the Original File Name. If neither the Title nor the Original File Name is present, we will populate with the value "Untitled".
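The priority order above can be illustrated with a short Python sketch. This is not Nextpoint's implementation: `resolve_document_link` is a hypothetical name, and treating a blank or whitespace-only value as "not present" is an assumption.

```python
def resolve_document_link(subject_title, original_file_name):
    """Pick the document link text using the priority list described above."""
    # Priority: Subject/Title, then Original File Name, then "Untitled".
    for candidate in (subject_title, original_file_name):
        # Assumption: None or blank values count as "not present".
        if candidate and candidate.strip():
            return candidate.strip()
    return "Untitled"
```

For example, a PDF renamed to a Bates number that still carries an embedded Title would display that Title, because Subject/Title is checked first.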
Question:
"What can I do to change this or how can I avoid this in the future?"
Response:
Solution 1
Add the Subject/Title, File Name, Shortcut, or any combination of such, to your Grid View Template. This will provide further visibility into other names you may have "given" the file(s) prior to import.
Solution 2
If you are importing with Bates in the File Name (e.g. PDFs produced to you) and they rename to something else upon import, you can create a simple directory listing, turn that information into a simple CSV load file, and then import your PDFs with that newly developed load file. This will not only allow you to maintain the PDFs named as their Bates start but also search on the Bates numbers/ranges of that data.
To create the load file, take the following steps:
- Create a directory listing of your folder of documents to be imported.
- Open Excel and copy/paste the directory listing information into Column A so that each document is a row in your Excel sheet (further referenced as your load file).
- Add a row at the top and name the first column header (A1) image_file
- Name your second column header (B1) bates_start
- Insert the formula =left(a2,8) in B2 and drag to the bottom of your table so all rows are filled.
  - Note: If your Bates has fewer than 8 characters, you may need to modify the number component in the above-listed formula.
- Save as nextpoint_load_file.csv, in line with the directory from which the file was saved.
After saving, upload your parent folder to your Nextpoint File Room (unzipped), and import it.
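If you would rather not build the load file by hand in Excel, the steps above can be approximated with a short Python sketch. The column headers `image_file` and `bates_start` and the 8-character prefix come from the steps above; the function name, the choice to list only file names (not subfolders), and the output location you pass in are assumptions for illustration.

```python
import csv
from pathlib import Path

BATES_LENGTH = 8  # mirrors =left(a2,8); adjust to your Bates length


def build_load_file(folder, output_path, bates_length=BATES_LENGTH):
    """Write a CSV load file listing each file in folder with its Bates start."""
    folder = Path(folder)
    output_path = Path(output_path)
    # Directory listing: file names only, excluding the load file itself.
    names = sorted(
        p.name
        for p in folder.iterdir()
        if p.is_file() and p.resolve() != output_path.resolve()
    )
    with output_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["image_file", "bates_start"])
        for name in names:
            # Same idea as the Excel formula: first N characters of the name.
            writer.writerow([name, name[:bates_length]])
    return output_path
```

As with the Excel formula, this only produces a sensible `bates_start` when every file name begins with its Bates number, so review the output before importing.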
Solution 3
Contact our Product Support team for extended assistance in creating the load file outlined in Solution 2 or for an ESI Protocol Consultation.
FAQ: Why is the Title of my Document NOT the File Name?
How does Nextpoint obtain a document's search text?
Nextpoint uses the Tesseract OCR engine, an open-source OCR engine originally developed by Hewlett-Packard and later maintained with sponsorship from Google. If you would like further information on the specifics of the Tesseract OCR engine, this Wikipedia article may be beneficial.
During processing, Nextpoint adds text to the database in three ways, and in the following order.
- First, if search text is provided alongside a load file, that text will be prioritized and mapped to each document.
- Next, if no page text is provided, but the document has embedded text that can be extracted, that text is added to the database (e.g. PDFs with embedded text).
- Lastly, if no page text can be extracted directly from the file, Nextpoint will OCR individual pages for their search text.
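The order of precedence above can be sketched in Python as follows. This is only an illustration of the fallback logic, not Nextpoint's processing code; `choose_search_text` and its parameters are hypothetical, and `run_ocr` stands in for whatever OCR step would be invoked.

```python
def choose_search_text(load_file_text, embedded_text, run_ocr):
    """Return (source, text) following the three-step priority described above.

    run_ocr is a callable that is only invoked when neither load file
    text nor embedded text is available.
    """
    if load_file_text:
        return "load_file", load_file_text
    if embedded_text:
        return "embedded", embedded_text
    return "ocr", run_ocr()
```

Passing the OCR step as a callable reflects the point of the ordering: OCR only runs when no text is available from the other two sources.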
What does OCR mean?
OCR is short for Optical Character Recognition, a technology used to recognize text inside images, such as scanned documents and photos. Once the text is recognized (OCR'd), it becomes editable and searchable data.
How does Nextpoint handle foreign language text?
Nextpoint supports language extraction for files with the text present in a load file and/or files with existing extracted text (#'s 1 & 2 above). When we extract, the text is available within the document electronically for us to utilize, and no OCR is required, thus allowing us to utilize text from more languages.
The following languages are supported:
- Chinese - Simplified
- Chinese - Traditional
- German
- English
- Math / equation detection
- French
- Italian
- Korean
- Dutch; Flemish
- Spanish; Castilian
Currently, OCR is supported for English only, but Nextpoint can support additional languages on a custom basis. Please contact your Account Director or support@nextpoint.com for further information.
Need OCR for a foreign language not on the above list, or have further questions related to OCR?
Please feel free to contact our support team at support@nextpoint.com.