Near Dupe Detection



One of the most important features of Nextpoint is its ability to reduce your review time by detecting duplicates and removing them from your review set. In the past, this worked only for exact copies of files in the same format.

Now, Nextpoint can also detect near duplicate documents to further support your review. Exact deduplication is not effective when documents change format (like an email that is printed to PDF) or are produced. Near dupe detection takes the OCRed text of a document and compares it to the other documents in the database, giving each one a similarity score. When viewing a document, you can then see the documents whose similarity scores fall above a specific threshold. This allows you to folder or code similar documents to aid your review. Additional near dupe detection options, like comparing only specific folders or generating a master set, are also available on a custom basis.
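To make the idea of a similarity score concrete, here is a minimal sketch of one common way to compare extracted text: word shingles with Jaccard overlap, scaled to 0-100. This is an illustration only. The article does not describe the algorithm Nextpoint uses, and the function names `shingles` and `similarity_score` are hypothetical.

```python
from __future__ import annotations

import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles from lowercased, punctuation-free text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity_score(text_a: str, text_b: str, k: int = 3) -> int:
    """Illustrative 0-100 score: Jaccard overlap of word shingles.

    Assumption: this is NOT Nextpoint's scoring method, only a sketch
    of how text-based near duplicate scoring can work.
    """
    a, b = shingles(text_a, k), shingles(text_b, k)
    if not a and not b:
        return 100  # two empty texts are treated as identical
    return round(100 * len(a & b) / len(a | b))


# An email and its printed-to-PDF text differ in format, not in words.
email_text = "Please review the attached contract before Friday."
pdf_text = "please review the attached contract before friday"
score = similarity_score(email_text, pdf_text)
```

Because the comparison works on normalized words rather than file bytes, two copies of the same content in different formats can still score highly, which is the case exact deduplication misses.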

Here is how Nextpoint's Near Dupe Detection works:

When reviewing a document, click the "Contact Us" button to request that near dupe detection be enabled. At this time, near dupe detection is an add-on feature, available at an additional cost, that is enabled as needed in each of your databases.

Near Duplicate - In App Request_3.jpg

When you click the button, a Nextpoint team member will follow up with you about your needs, and our developers will enable near dupe detection.

Viewing Near Dupes

Near Dupe 1.png

To view near dupes once the feature is enabled:

1. Open the "Related Documents" tab.

2. Click on the "Duplicates" tab.

3. Exact duplicates appear first in this section, followed by near dupes. In the "Near Dupe" section, each near dupe is shown with its similarity score in a box to the right of the document pill.

4. You can bulk code these near dupes just as you can other types of related documents in this tab.

5. You can also change your "Duplicates" tab view in the filter, either by turning the exact duplicate section on or off or by modifying the similarity score threshold for your near dupe section.

Near Dupe 2.png

If a document has many near dupes, they are grouped by similarity score and can be added to a grid view. Notice the links in the near dupe section for similarity scores >= 80, 85, 90, and 95. Clicking one of these links runs a search that includes the NPID number of the source document and the similarity score floor:

near_duplicates:(89552 and similiarity_score:>=95)
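If you want to adjust the score floor by hand, the search string follows a simple pattern. The sketch below assembles it for a given NPID and floor. The helper name `near_dupe_query` is hypothetical, and the field names (including their spelling) are copied exactly from the example above, so confirm the syntax in your own database before relying on it.

```python
def near_dupe_query(npid: int, min_score: int) -> str:
    """Build a near dupe search string in the format shown in this article.

    Assumption: field names are reproduced verbatim from the article's
    example search; this helper is not part of Nextpoint itself.
    """
    if not 0 <= min_score <= 100:
        raise ValueError("min_score must be between 0 and 100")
    return f"near_duplicates:({npid} and similiarity_score:>={min_score})"


# One search string per score floor offered as a link in the near dupe section.
queries = [near_dupe_query(89552, floor) for floor in (80, 85, 90, 95)]
```

Pasting one of these strings into the search bar and changing the number after `>=` is the manual equivalent of clicking a different similarity link.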

Near Dupe 3.png

From the grid view, you can run bulk actions on these similar documents. You can also manually adjust the similarity score floor in the search bar to change which documents are returned.

For additional information about near dupe detection or to schedule a demo, please reach out to us or click the "Enable Near Dupe Detection" button in your database.


