Overview
Nextpoint reduces review time by detecting duplicate documents and removing them from the review set. Previously, this applied only to exact copies in the same file format.
Near duplicate detection extends this functionality by identifying documents with similar content, even when formats differ (for example, an email printed to PDF). The feature compares OCRed text across documents and assigns a similarity score.
When reviewing a document, you can view related documents above a defined similarity threshold, making it easier to folder or code similar materials. Additional options, such as folder-specific comparisons or master set creation, are available on a custom basis.
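To make the idea of a similarity score and threshold concrete, here is a minimal sketch in Python. It is an illustration only and is not Nextpoint's scoring method, which this article does not describe. It assumes a 0 to 100 scale, uses the standard library's difflib as a stand-in comparison, and the function names similarity_score and near_duplicates are hypothetical.

```python
from difflib import SequenceMatcher


def similarity_score(text_a: str, text_b: str) -> int:
    """Return an illustrative 0-100 similarity score for two text strings.

    Assumption: difflib's ratio stands in for the real comparison, which
    is not documented here.
    """
    # Lowercase and collapse whitespace so formatting differences
    # (for example, line breaks introduced by OCR) do not lower the score.
    norm_a = " ".join(text_a.lower().split())
    norm_b = " ".join(text_b.lower().split())
    return round(SequenceMatcher(None, norm_a, norm_b).ratio() * 100)


def near_duplicates(source_text: str, candidates: dict, threshold: int = 80) -> list:
    """Return (doc_id, score) pairs at or above the threshold, highest first."""
    scored = [
        (doc_id, similarity_score(source_text, text))
        for doc_id, text in candidates.items()
    ]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

In this sketch, raising the threshold narrows the list to documents that are closer to the source text, which mirrors how the similarity threshold narrows the related documents you see during review.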
How to Access Near Duplicate Detection
While reviewing a document, click the Contact Us button to request near duplicate detection. Near duplicate detection is an add-on feature available for an additional cost and must be enabled per database.
After submitting the request, a Nextpoint team member will follow up to understand your needs, and our developers will enable near duplicate detection.
Viewing Near Duplicates
Once the feature is enabled, follow these steps to view near duplicates:
- Open the Related Documents tab.
- Click the Duplicates tab.
- Exact duplicates appear first, followed by near duplicates. In the Near Duplicates section, each document displays a similarity score to the right of the document pill.
- Bulk code near duplicates just as you would other related documents in this tab.
- Modify the Duplicates tab view using filters to show or hide exact duplicates and adjust the similarity score threshold for near duplicates.
If a document has many near duplicates, they are grouped by similarity score and can be added to a grid view. Links are available for similarity thresholds of 80, 85, 90, and 95. Clicking one of these links runs a search that includes the source document’s NPID and the selected similarity score floor:
near_duplicates:(89552 and similarity_score:>=95)
From the grid view, you can perform bulk actions on similar documents. You may also manually adjust the similarity score threshold directly in the search bar.
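If you adjust the threshold in the search bar often, it can help to see how the search string is put together. The following Python sketch builds the same string shown above from a source document's NPID and a similarity score floor. The helper names near_duplicate_query and threshold_queries are hypothetical and are not part of Nextpoint; only the search syntax comes from the example in this article.

```python
# The four similarity thresholds that have links in the Duplicates tab.
LINKED_THRESHOLDS = (80, 85, 90, 95)


def near_duplicate_query(npid: int, min_score: int) -> str:
    """Build a near duplicate search string for one source document.

    Hypothetical helper: it only formats text in the syntax shown in
    this article and does not contact Nextpoint.
    """
    return f"near_duplicates:({npid} and similarity_score:>={min_score})"


def threshold_queries(npid: int) -> dict:
    """Map each linked threshold to its search string."""
    return {score: near_duplicate_query(npid, score) for score in LINKED_THRESHOLDS}


if __name__ == "__main__":
    # Print one search string per linked threshold for the example NPID.
    for score, query in threshold_queries(89552).items():
        print(score, query)
```

To use a threshold without a link, pass a different floor to near_duplicate_query and paste the resulting string into the search bar.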
For additional information about near duplicate detection or to schedule a demo, contact support@nextpoint.com or click the Enable Near Dupe Detection button within your database.