Logging In and Project Setup
Why Data Mining?
As data volumes and discoverable sources continue to grow, getting a quick understanding of what is in your data before fully processing it will speed up your overall review and create fewer headaches along the way. Just as your email inbox may filter out spam so that you see only pertinent messages, Data Mining helps attorneys eliminate the noise before they even lay eyes on their information.
Because each firm’s environment for Data Mining is unique and independent, Data Mining is highly secure and incredibly fast. We ingest data for Data Mining at rates of ½ to 1 terabyte per hour.
- Sign in:
- (New Users) At this time, new users to Data Mining must be added by Nextpoint. Reach out to your Client Success Director or email@example.com for more information or to get started.
- Similar to your Nextpoint database login, Data Mining uses two-factor authentication. The first time you log in, after you enter your email address and password, you will be asked to enter a verification PIN sent to your email address.
- Should you need to reset your password, click on the "Forgot Password" link above the password field. You will then receive an email with a link to reset your password and return to the login screen.
- Create/Select Project:
- Once you log in, you will be prompted to “Select an Existing Project” (if any exist) or “Create a New Project.” Clicking “Select an Existing Project” will take you to that project's dashboard.
- If you select “Create a New Project,” you will be prompted to type in the Project Name, Client Number, and Matter Number. Note: All data will be processed in UTC.
- Then click “Create.”
- Importing Data:
- You will then be taken to the project's dashboard where you will be prompted to “Import Data.”
From the settings tab, you can also update your settings, manage users, add database file rooms, and change your password.
The General tab allows you to edit some of your profile information, email your CS director, or change the time zone of your profile. Changing the time zone will change how the dates are viewed within the app. All data will continue to be processed in UTC time.
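To see how a profile time zone affects only the display of a date, consider the minimal Python sketch below. It is an illustration rather than how the app is built: the function name `display_in_profile_zone` and the fixed UTC-6 offset are assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone


def display_in_profile_zone(processed_utc: datetime, profile_zone: timezone) -> datetime:
    """Return the same instant expressed in the profile's time zone."""
    return processed_utc.astimezone(profile_zone)


# A document date as stored after processing (always UTC).
processed = datetime(2023, 3, 1, 2, 30, tzinfo=timezone.utc)

# Hypothetical profile setting: a fixed UTC-6 offset (no daylight saving handling).
profile_zone = timezone(timedelta(hours=-6), "UTC-6")

shown = display_in_profile_zone(processed, profile_zone)
print(shown.isoformat())  # 2023-02-28T20:30:00-06:00
```

The stored value never changes. The same instant is simply shown as February 28 at 8:30 PM in the profile's zone instead of March 1 at 2:30 AM UTC, which is why a document's calendar date can appear to shift when you change your time zone.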
The S3 Locations tab allows you to view current file rooms and other S3 storage instances that this project has access to for imports and exports. You can add new locations here by clicking on the "Add New Location" button and inserting the access keys and location information in the resulting pop-up.
The location name can be anything that helps you identify this location. For a Nextpoint database, the other information can be found under Settings > Imports. If you have not used your S3 credentials in a Nextpoint database, you may need to reach out to firstname.lastname@example.org so that we can set your credentials.
You can also add new locations on the fly from within an import or export.
The Security tab allows you to update your password.
The Users tab allows you to view all of the users who have accessed your account and projects, including the date they were added and the date they last accessed the account. If you are your firm's Data Mining administrator (usually the first person added on a Data Mining account), you can also add new users in this tab by clicking the "Invite New User" button. Adding the new user's name and email address will automatically send them an email inviting them to the account. If they are a new user, they will be prompted to set up their account before accessing the project.
Or view one of the other support resources in the Data Mining series:
Data Mining – Getting Started
The dashboard gives a high-level visual overview of the data that has been imported into your project. You can choose to see how your data breaks down for a specific import or across all of your data. This can help you make informed decisions for future imports prior to creating searches and slices for export.
Project Dashboard Cards
Valuable aspects of the overview include breakdowns of your data within cards such as:
Imported documents are categorized as follows:
- Fully Searchable – The metadata was successfully extracted from these files and their content was OCR'd to create searchable text.
- Metadata Only – The metadata was successfully extracted from these files, but they are of a type that would not generate searchable OCR'd text, so only the metadata is searchable.
- Unsupported – These file types are not supported by Nextpoint's Data Mining app, so neither the metadata nor any text in these files will be searchable.
- Unknown – These file types are unknown/unreadable to Nextpoint's Data Mining app and could not be processed. Neither their metadata nor any text in these files will be searchable. This may also include files that have readability issues (e.g. a corrupt eml file or an encrypted spreadsheet).
Individual documents may contain multiple languages. Each document is categorized based on the "dominant" language – or most prevalent language – detected in the text.
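The idea of a "dominant" language can be sketched in a few lines of Python. This is only an illustration: how the app actually measures prevalence is not described here, so the per-segment word counts and the function name `dominant_language` are assumptions.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def dominant_language(segments: Iterable[Tuple[str, int]]) -> Optional[str]:
    """Pick the language with the most words across all detected segments.

    Each segment is a (language_code, word_count) pair, assumed to come
    from some upstream language detector.
    """
    totals = Counter()
    for language, word_count in segments:
        totals[language] += word_count
    if not totals:
        return None  # no text, so no language to report
    return totals.most_common(1)[0][0]


# Hypothetical document: mostly English with some Spanish and French passages.
segments = [("en", 420), ("es", 75), ("en", 130), ("fr", 12)]
print(dominant_language(segments))  # en
```

Even though this sample document contains three languages, it would be counted once, under English.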
Importing errors are categorized as follows:
- Unsupported Type – This file type is not supported by the Data Mining app and cannot be processed.
- Unsupported Size – The file size is too large to process.
- DeNIST – Computer system files on the National Software Reference Library (NSRL) list, which are not generally user-generated and are therefore often not relevant to litigation.
- Error – An error occurred while processing this file and/or extracting its metadata.
- Other – File-specific issues that make files unreadable (e.g. corruption, encryption, or empty files).
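DeNISTing in general works by comparing each file's hash against a reference list of known system-file hashes. The Python sketch below shows that general technique only. The choice of SHA-1, the function names, and the tiny stand-in hash list are assumptions for illustration and do not describe the app's internals.

```python
import hashlib
from typing import Set


def sha1_hex(content: bytes) -> str:
    """Return the SHA-1 digest of the file content as a hex string."""
    return hashlib.sha1(content).hexdigest()


def is_denist_match(content: bytes, known_hashes: Set[str]) -> bool:
    """True when the file's hash appears in the reference list."""
    return sha1_hex(content) in known_hashes


# Hypothetical stand-in for a reference list of known system-file hashes.
system_file = b"example operating system file contents"
known_hashes = {sha1_hex(system_file)}

print(is_denist_match(system_file, known_hashes))                     # True
print(is_denist_match(b"memo drafted by a custodian", known_hashes))  # False
```

Because the comparison is made on content hashes, a system file is recognized even if it has been renamed or moved.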
The data timeline shows the frequency of documents through your data's date range. It will include custodian frequency along the timeline (if applicable).
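Conceptually, a timeline like this is a count of documents per time bucket, optionally split by custodian. The following Python sketch illustrates the idea. The monthly bucket size, the function name `monthly_frequency`, and the sample documents are assumptions, not a description of how the dashboard is computed.

```python
from collections import Counter
from datetime import date
from typing import Iterable, Tuple


def monthly_frequency(documents: Iterable[Tuple[date, str]]) -> Counter:
    """Count documents per (month, custodian) pair."""
    counts = Counter()
    for doc_date, custodian in documents:
        counts[(doc_date.strftime("%Y-%m"), custodian)] += 1
    return counts


# Hypothetical documents: (document date, custodian).
documents = [
    (date(2021, 1, 5), "Alice"),
    (date(2021, 1, 19), "Alice"),
    (date(2021, 1, 22), "Bob"),
    (date(2021, 2, 3), "Bob"),
]

timeline = monthly_frequency(documents)
for (month, custodian), count in sorted(timeline.items()):
    print(month, custodian, count)
```

A sudden gap or spike in these counts for a custodian is the kind of pattern the timeline makes easy to spot before you build searches.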
Or view one of the other support resources in the Data Mining series:
Data Mining – Project Dashboard
Assigned Slice: The slice(s) that a specific search in the Search Table is assigned to.
Conditions: Specific parameters placed on a slice to limit the return of a search (e.g. custodians or a date range to include).
Created On: The date that a Search was created and added to the Search Table.
Doc Hit Count: The number of documents that directly have one or more search hits. In the context of a report with multiple lines of syntax, overlap may occur if a document directly hits on more than one term.
Early Case Assessment (ECA): The process of evaluating the strengths and weaknesses of a case prior to investing substantial resources in litigation.
Early Data Assessment (EDA): A subset of ECA that involves using data analytics and advanced eDiscovery filtering techniques to understand the contents of digital data at the outset of a matter. The primary goal of an Early Data Assessment project is to conduct an initial review and identify specific documents or other pieces of evidence that establish the strengths and weaknesses of the litigation position.
Family Count: The number of total documents included when full families (parent emails and all of their attachments) are included with the search hits.
File Count: The number of direct hits on a search or included in a slice. This number does not include family members of hits that do not also hit on the search or were included in the slice.
Noise Words (also known as Stop Words): A list of common words that are not indexed and therefore not searchable (e.g. a search for “the project” would only search on the word “project”).
S3 Repository: The data storage location for Data Mining projects (exactly like the file room in a Nextpoint database).
Search Proportion: The percentage of documents that hit directly on a line of syntax (a search). This is calculated out of the total number of searchable documents in the slice. Search proportion is also known as "inclusiveness."
Search/Searches: A search is a line of terms (words), phrases, boolean connectors, and specialized syntax used to find specific documents. Each search can also have conditions that limit the reach of that search (e.g. a date range or a specific set of custodians).
Slice: A group of searches run in bulk. A slice can also have conditions that are applied to each search line it contains.
Term: A single word or phrase within the syntax of a search.
Uniqueness: The number of direct document hits where only the current term is hit and no other.
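Several of the counts defined above can be illustrated with simple set arithmetic. The Python sketch below follows the glossary definitions of Search Proportion, Uniqueness, and Family Count. The function names, document IDs, and family assignments are hypothetical and are not taken from the app.

```python
from typing import Dict, Set


def search_proportion(hits: Set[str], searchable_docs: Set[str]) -> float:
    """Percentage of searchable documents in the slice that hit directly."""
    if not searchable_docs:
        return 0.0
    return 100.0 * len(hits & searchable_docs) / len(searchable_docs)


def uniqueness(term: str, hits_by_term: Dict[str, Set[str]]) -> int:
    """Number of documents hit by this term and by no other term."""
    others: Set[str] = set()
    for other_term, docs in hits_by_term.items():
        if other_term != term:
            others |= docs
    return len(hits_by_term[term] - others)


def family_count(hits: Set[str], family_of: Dict[str, str]) -> int:
    """Number of documents once full families of the hits are included."""
    hit_families = {family_of[doc] for doc in hits}
    return sum(1 for family in family_of.values() if family in hit_families)


# Hypothetical slice of eight searchable documents in four families.
family_of = {
    "d1": "f1", "d2": "f1",
    "d3": "f2", "d4": "f2", "d5": "f2",
    "d6": "f3",
    "d7": "f4", "d8": "f4",
}
searchable_docs = set(family_of)
hits_by_term = {
    "contract": {"d1", "d2", "d3"},
    "invoice": {"d3", "d4"},
}

print(search_proportion(hits_by_term["contract"], searchable_docs))  # 37.5
print(uniqueness("contract", hits_by_term))                          # 2
print(family_count(hits_by_term["invoice"], family_of))              # 3
```

In this sample, document d3 hits on both terms, so it counts toward each term's Doc Hit Count but toward neither term's Uniqueness.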