Data Mining - Glossary


Assigned Slice: The slice(s) that a specific search in the Search Table is assigned to.

Conditions: Specific parameters placed on a slice to limit what a search returns (e.g. custodians or a date range to include).

Created On: The date that a Search was created and added to the Search Table.

Doc Hit Count: The number of documents that contain one or more direct search hits. In a report with multiple lines of syntax, counts may overlap because a single document can hit directly on more than one term.
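
As a rough illustration of the overlap described above, the sketch below uses made-up terms and document IDs (the names `hits_by_line`, `summed`, and `distinct` are hypothetical and do not come from the product). It shows why adding up per-line Doc Hit Counts can give a larger number than the count of distinct documents hit.

```python
# Hypothetical report: each line of syntax maps to the set of
# document IDs that hit directly on it (made-up data).
hits_by_line = {
    "contract": {1, 2, 3},
    "invoice": {3, 4},
}

# Doc Hit Count for each line of syntax.
doc_hit_counts = {line: len(docs) for line, docs in hits_by_line.items()}

# Adding the per-line counts counts document 3 twice.
summed = sum(doc_hit_counts.values())

# Counting each document once, however many lines it hits on.
distinct = len(set().union(*hits_by_line.values()))

print(summed, distinct)
```

Here the per-line counts add up to 5, while only 4 distinct documents are hit, because document 3 hits on both lines.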

Early Case Assessment (ECA): The process of evaluating the strengths and weaknesses of a case prior to investing substantial resources in litigation.

Early Data Assessment (EDA): A subset of ECA that involves using data analytics and advanced eDiscovery filtering techniques to understand the contents of digital data at the outset of a matter. The primary goal of an Early Data Assessment project is to conduct an initial review and identify specific documents or other pieces of evidence that establish the strengths and weaknesses of the litigation position.

Family Count: The total number of documents when full families (parent emails and all of their attachments) are included with the search hits.

File Count: The number of direct hits on a search or documents included in a slice. This number does not include family members of hits unless those family members also hit on the search or are themselves included in the slice.
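
To make the difference between File Count and Family Count concrete, here is a minimal sketch on made-up data. The mapping `family_of` and the variable names are assumptions for illustration only; they are not how the product stores or computes these values.

```python
# Hypothetical data: document ID -> family ID. A parent email and
# its attachments share one family ID.
family_of = {1: "A", 2: "A", 3: "A", 4: "B", 5: "C", 6: "C"}

# Documents that hit directly on the search (made-up).
direct_hits = {2, 5}

# File Count: direct hits only.
file_count = len(direct_hits)

# Family Count: every document whose family contains a direct hit.
hit_families = {family_of[doc] for doc in direct_hits}
family_count = sum(1 for fam in family_of.values() if fam in hit_families)

print(file_count, family_count)
```

In this example two documents hit directly, but pulling in their full families (three documents in family A and two in family C) brings the total to five.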

Noise Words (also known as stop words): A list of common words that are not indexed and therefore not searchable (e.g. a search for “the project” would only search on the word “project”).
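
The effect of noise words on a search can be sketched as follows. The word list and the function `searchable_terms` are hypothetical; the product's actual noise word list and indexing behavior may differ.

```python
# Illustrative noise word list only; not the product's actual list.
NOISE_WORDS = {"the", "a", "an", "of", "and"}

def searchable_terms(query):
    """Return the words in a query that would still be searched on."""
    return [word for word in query.lower().split() if word not in NOISE_WORDS]

result = searchable_terms("the project")
print(result)
```

With this list, "the" is dropped and only "project" remains as a searchable word.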

S3 Repository: The data storage location for Data Mining projects (exactly like the file room in a Nextpoint database). 

Search Proportion: The percentage of documents that hit directly on a line of syntax (a search). This is calculated out of the total number of searchable documents in the slice. Search proportion is also known as "inclusiveness."
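
The calculation described above can be sketched as a simple percentage. The function name and the sample figures are assumptions for illustration, and returning 0 for an empty slice is a choice made for this sketch rather than documented product behavior.

```python
def search_proportion(doc_hit_count, searchable_docs):
    """Percentage of searchable documents in the slice that hit directly."""
    if searchable_docs == 0:
        # Assumption for this sketch: treat an empty slice as 0 percent.
        return 0.0
    return 100.0 * doc_hit_count / searchable_docs

# Made-up figures: 250 direct hits out of 10,000 searchable documents.
proportion = search_proportion(250, 10000)
print(proportion)
```

With these figures the search proportion is 2.5 percent.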

Search/Searches: A search is a line of terms (words), phrases, Boolean connectors, and specialized syntax used to find specific documents. Each search can also have conditions that limit its reach (e.g. a date range or a specific set of custodians).

Slice: A group of searches run in bulk. A slice can also have conditions, which are applied to each search line it contains.

Term: A single word or phrase within the syntax of a search.

Uniqueness: The number of documents that hit directly on the current term and on no other term.
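
One way to picture uniqueness is as a set difference: the documents hit by the current term minus the documents hit by any other term. The sketch below uses made-up terms and document IDs, and the function `uniqueness` is hypothetical rather than part of the product.

```python
# Hypothetical data: term -> set of document IDs with a direct hit.
hits_by_term = {
    "contract": {1, 2, 3},
    "invoice": {3, 4},
    "payment": {4, 5},
}

def uniqueness(term, hits):
    """Count documents hit by `term` and by no other term."""
    other_docs = set().union(
        *(docs for other, docs in hits.items() if other != term)
    )
    return len(hits[term] - other_docs)

unique_counts = {term: uniqueness(term, hits_by_term) for term in hits_by_term}
print(unique_counts)
```

Here "contract" has a uniqueness of 2 (documents 1 and 2), "invoice" has 0 because both of its documents also hit on another term, and "payment" has 1 (document 5).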



