FREQUENTLY ASKED QUESTIONS
NEAR DUPE
How does our Near Dupe Algorithm work?
Our near-duplicate identification tool is like a highly intelligent filter for your documents. It's designed to sift through vast amounts of content and spot documents that are almost the same but not quite – think of it as finding twins in a crowd.
Smart Scanning with Shingling: Each document is broken into short, overlapping sequences of text called shingles. Taken together, a document's shingles act like a fingerprint, so similar documents produce similar fingerprints.
Intelligent Matching with Jaccard Similarity: Just like matching fingerprints, our tool compares these sets of shingles and measures how much they overlap. Documents don't need to be exactly the same to be considered a match; they just need to be very close.
Accuracy Meets Speed: Our technology ensures that this matching is done quickly and accurately, so you don't have to worry about duplicates cluttering your system or missing out on important, unique documents.
Customizable: We know that different cases have different needs. That's why our tool lets you decide how similar documents need to be in order to be considered duplicates.
With Nextpoint’s near-duplicate feature, your document review will be cleaner, more organized, and more efficient, saving time and money.
________________________________________________________________________________________________________________
How are scores calculated? What do they mean?
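We score each pair of documents by their Jaccard similarity: the content the two documents share, divided by the total distinct content across both. Jaccard similarity ranges from 0 (nothing in common) to 1 (the same content), so a higher score means a closer match. We then apply a threshold to decide whether a pair counts as near-duplicates. This threshold can be raised or lowered depending on how sensitive you want the duplicate detection to be.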
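The scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not Nextpoint's implementation: it assumes word-level shingles of length k = 3, and the function names (shingles, jaccard, is_near_duplicate) and the threshold values are hypothetical.

```python
def shingles(text, k=3):
    """Return the set of overlapping k-word sequences (shingles) in text."""
    words = text.lower().split()
    if len(words) < k:
        # Assumption: a very short document becomes a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |A & B| / |A | B|, from 0.0 to 1.0."""
    union = a | b
    if not union:
        return 1.0  # Assumed convention: two empty documents count as the same.
    return len(a & b) / len(union)


def is_near_duplicate(text_a, text_b, threshold=0.8, k=3):
    """Return (score, flag), where flag is True if score meets the threshold."""
    score = jaccard(shingles(text_a, k), shingles(text_b, k))
    return score, score >= threshold


doc_a = "the quick brown fox jumps over the lazy dog"
doc_b = "the quick brown fox jumps over the sleepy dog"
score, match = is_near_duplicate(doc_a, doc_b, threshold=0.5)
print(round(score, 3), match)  # 0.556 True
```

The two sample sentences share 5 of 9 distinct shingles, so they match at a threshold of 0.5 but would not at the stricter 0.8. Changing one word affects every shingle that contains it, which is why a single edit lowers the score more than one might expect.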
________________________________________________________________________________________________________________
What is considered a Standard Near Duplicate Analysis? What makes it more custom?
A Standard Near Duplicate Analysis compares every document to every other document currently in the database and identifies those that are not identical but share a high degree of similarity. Then, at the document level, we cluster the documents that are similar to the one you are currently viewing and provide a similarity score for each.
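As a rough picture of the standard analysis, the sketch below compares every pair of documents and records, for each document, the others that score at or above a threshold. It is an assumption-laden illustration only: the names word_shingles and similar_documents, the two-word shingles, and the 0.5 threshold are hypothetical, and a comparison across a large database would need a faster approach than checking every pair directly.

```python
from itertools import combinations


def word_shingles(text, k=2):
    """Set of overlapping k-word sequences; empty if the text has fewer than k words."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 0))}


def similar_documents(docs, threshold=0.5, k=2):
    """Map each document id to a list of (other_id, score) pairs at or above threshold."""
    sets = {doc_id: word_shingles(text, k) for doc_id, text in docs.items()}
    related = {doc_id: [] for doc_id in docs}
    for a, b in combinations(docs, 2):
        if docs[a] == docs[b]:
            continue  # Exact duplicates are not *near* duplicates; skipped here.
        union = sets[a] | sets[b]
        score = len(sets[a] & sets[b]) / len(union) if union else 0.0
        if score >= threshold:
            related[a].append((b, score))
            related[b].append((a, score))
    return related


docs = {
    "A": "please review the attached contract draft today",
    "B": "please review the attached contract draft tomorrow",
    "C": "lunch menu for friday",
}
for doc_id, matches in similar_documents(docs).items():
    print(doc_id, matches)
```

Here documents A and B list each other with a score of 5/7, while C has no near-duplicates.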
Aspects which may make a Near Duplicate analysis more custom include, but are not limited to:
- Comparing two particular data sets
  - E.g. folder A to folder B, where one document set is the “master”
- Reviewing in bulk rather than document by document
  - E.g. “I want to see clusters of documents in my grid view.”
- Analyses aimed at minimizing your review (eliminating or setting aside documents)
  - E.g. “I don’t want to review documents from a set produced to me if I’ve already reviewed them in my own native client data.”
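The first and last examples above (comparing one set against a “master” set to avoid re-reviewing documents) could look something like the following sketch. Everything here is hypothetical: the function name already_reviewed, the two-word shingles, the 0.6 threshold, and the sample data are illustrative assumptions rather than a description of how a custom analysis is actually run.

```python
def word_shingles(text, k=2):
    """Set of overlapping k-word sequences; empty if the text has fewer than k words."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 0))}


def jaccard(a, b):
    """Shared shingles divided by total distinct shingles (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def already_reviewed(master, produced, threshold=0.6, k=2):
    """For each produced document, return its best master match if it meets the threshold."""
    master_sets = {m_id: word_shingles(text, k) for m_id, text in master.items()}
    matches = {}
    for p_id, text in produced.items():
        p_set = word_shingles(text, k)
        best_id, best_score = None, 0.0
        for m_id, m_set in master_sets.items():
            score = jaccard(p_set, m_set)
            if score > best_score:
                best_id, best_score = m_id, score
        if best_id is not None and best_score >= threshold:
            matches[p_id] = (best_id, best_score)
    return matches


master = {
    "M1": "quarterly sales report for the northeast region",
    "M2": "meeting notes from the board",
}
produced = {
    "P1": "quarterly sales report for the northeast region draft",
    "P2": "holiday party invitation",
}
print(already_reviewed(master, produced))
```

In this sample, P1 is matched to M1 and could be set aside, while P2 has no counterpart in the master set and would still need review.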