Data Mining
Reporting and Exporting
During a Data Mining project, you may need to run reports to share with the court, your team, and/or Opposing Counsel. You will likely also need to export culled data for review in your Nextpoint database. Here is how you can export from the Data Mining tool:
Exporting Reports
- Navigate to the "Export" tab in your Data Mining instance.
- You have the option to toggle between "S3 Exports" and "Reports." Select the "Reports" option.
- Click on the "Generate Report" button.
- Name the report you intend to export.
- Select the "Report Type" you would like to produce. Currently, the report options are "Import Summary Report," "Search Hit Report," and "Error Breakdown Report."
- Click the "Next" button.
- If you selected the "Search Hit Report," you will be taken to a list of the slices created for your data set. You can select one slice to populate the search hit report.
- Then click the “Generate” button.
- If you selected the "Import Summary Report" or the "Error Summary Report," you will be taken to a list of previous import batches. You can select one or more import batches to generate either of these reports.
- Then click the "Generate" button.
- Once the report finishes processing, it will appear in the Report List (it will remain greyed out while processing). You can access the report by clicking on the hyperlinked name of the report.
- From there you can download the report as a PDF.
Report Options:
- Search Term Hit Report – A report that contains all project searches to which the files in the data set are responsive. Users can isolate specific searches and report back on only those they want included by slicing the data to include those searches. See the Sample Search Term Hit Report below.
- Import Summary Report - A report of basic information on a list of selected import batches including data on size, total documents per batch, dates that the import batches were run and processing time. The report also includes size and file counts for the overall data set and the combined selected import batches. See the Sample Import Summary Report below.
- Error Summary Report - A list of selected batches with a summary of processing errors by type, a list of archives that failed to import, and a summary of each import batch including size, document count, the number of archive errors in that import batch, and the processing date and time. The report also includes size and file counts for the overall data set and the combined selected import batches. See the Sample Error Breakdown Report below.
Exporting Data
- Navigate to the "Export" tab.
- Select the “S3 Exports” Tab at the top of the page. On this page you also have access to information about all of your previous exports.
- To generate a new export, click on the “New Export” button.
- A window will appear asking you to name your export.
- You can then select the “Slices” radio button. You also have the ability to export a previous import by selecting the “Imports” radio button in the rare case that you may need to export a full set of files you imported.
- Select the slice(s) you want to export.
- Click the “Next" button.
- You will see the name you’ve chosen and the slice(s) you’ve selected to export. There you can assign the destination location of the export from a list of Nextpoint database file rooms already entered into the Data Mining platform. You can also add a new S3 location (e.g. an additional database’s file room) by selecting the “Add New Location” option and entering the AWS S3 Keys.
- Click the “Next” button.
- Select the folder to which you want to export the data set. Currently, you must create the folder/subfolder in your database’s file room prior to exporting to it (as you cannot create a folder during the export process).
- Click the "Export" button and your export will begin to run.
- The “Status” column will indicate when your export is complete. Once complete, you can click on the hamburger (3 dot) menu next to the export to view details about it (size, file count, timing, exporting user, and export location, as well as the slice(s) included in the export). From this window you may also edit the name of the export, download an error report about the export, or delete it from your list of exports. Deleting the export from this list will NOT delete it from the location it was exported to. Once the "Status" shows as "Complete," you should have access to your data in the file room or S3 location chosen for the export.
Note
Currently, exports only include loose files and parent emails. On import into your Nextpoint database, you can import this set as "multiple files." Attachments and metadata will be extracted from the parent emails on import.
Next up: Data Mining - Glossary
Or view one of the other support resources in the Data Mining series:
Data Mining - Project Dashboard
Data Mining – Uploading and Importing Data
Data Mining - Searching and Slicing Data
Data Mining - Exporting Reports and Data
Search
Search Builder
- Building your search:
- In the search builder input field (1), you can manually enter your searches or paste from your external documentation into the field.
- In the search builder, each line item is equal to one search. If you would like to start a new search within the input field simply press enter/return on your keyboard.
- Note: We strongly suggest running searches together in sets, rather than individually, when possible. This is the most time- and cost-efficient way to search.
- For example, it is considerably faster to run 100 searches together than it is to run them individually.
- You can also work out the syntax in any outside text editor and copy/paste them into the search builder.
- Applying conditions to your search:
- Conditions set in the builder will apply to all searches within the input field.
- Date Range (2) searches are inclusive of the input dates (so 10/29/2022 to 10/31/2022 would include 3 total days)
- Custodian (3) and File type (4) searches that include more than one value allow for any listed value to return a hit. So a custodian parameter that includes John Smith and Jane Doe only requires that either John OR Jane be the custodian of the file in question.
- Assigning your search to a slice:
- Before creating your search, you will need to assign your search to a slice. By selecting the “Assign to Slice” button (5), a modal will appear prompting you to either create a new slice for this set of search terms or to assign them to an existing slice.
- This is a required step. It helps you group searches and further filter your data so you can compare and contrast slices and make educated decisions about what you would like to export out of Data Mining later in your process.
- Once a slice is assigned, the "Create" button (6) will activate, allowing the user to run the search.
Search Table
The search table showcases a list of all of the searches you have created within this project.
- Here you can see how each line item within the search builder (as mentioned in the previous step) appears as its own row with related data, conditions applied, and slice assigned.
- Each line item includes specific data relating to that search including file and family count, uniqueness and search proportion. See Glossary.
- "0" results means that there were no hits for that search.
- Empty results means that the search has not run yet. Click the “Calculate Results” button to view the results of the search.
- "Error calculating results" means that an internal error occurred on this search. Users should reach out to the support team to identify the issue and possible next steps. If you would like to retry these searches, please copy them to builder, edit as needed, and run them again,
- If you select a slice in the “Assigned Slice(s)” column of the search page, you will be shown the Slice Details modal, which gives you more detailed insight into how your slice, as well as the conditions applied, affects your overall data counts.
- You can review, compare, and contrast these slices and the data that they yield for context as to what you may want to export later on.
- The search table lists all of your “searches,” meaning the individual searches on each row of a search set. Each row includes specific information about that search run against the data with its conditions, and the counts will update every time your searches update. New searches will only calculate hits after clicking the "Calculate Results" button (2).
- The "Calculate Results" button refreshes both new and old searches with updated hit counts based on all documents currently in the database. If you have multiple searches (or even multiple slices) to run you should add them all to the search table before clicking the "Calculate Results" button.
- If you click on a slice in the “Assigned Slice(s)” column of the search page, you will be shown the details of this slice including conditions applied, total file and family counts, and counts for each of the terms included in the slice.
- The "Copy to Builder" button will copy selected terms to the builder where they can be edited and rerun with modified conditions. This button will only be active when one or more terms is selected.
- The "Export CSV" button will generate a CSV file with all of the search terms and count data from the chart. This button is only active when no specific terms are selected as it will return data for the entire chart.
- Searches refresh after new searches run, new files are imported, and whenever the "Calculate Results" button is pushed. The "Last Updated" date/time lets the user know the last time that the searches were updated.
Next up: Data Mining Search Guide
Or view one of the other support resources in the Data Mining series:
Data Mining - Project Dashboard
Data Mining – Uploading and Importing Data
Data Mining - Searching and Slicing Data
Data Mining uses a powerful search syntax called dtSearch. There are differences between dtSearch and the search syntax employed in Nextpoint databases, so some translation may be required.
Documents, including scans, are searchable after processing is completed. Consultation on terms and syntax is available for an additional hourly charge.
Like in a Nextpoint database, Data Mining uses boolean searching for text searches. A "boolean" search request consists of a group of words or phrases linked by connectors such as AND and OR that indicate the relationship between them.
Examples:
Search Request | Meaning
apple and pear | both words must be present
apple or pear | either word can be present
apple w/5 pear | "apple" must occur within 5 words of "pear"
apple not w/12 pear | "apple" must occur, but not within 12 words of "pear"
apple and not pear | "apple" must be present and "pear" cannot be present
name contains smith | the field name must contain smith
apple w/5 xfirstword | apple must occur in the first five words of the document
apple w/5 xlastword | apple must occur in the last five words of the document
Warning
Exact phrases should be set off by quotation marks.
"test phrase" OR single OR word
If you use more than one connector (and, or, contains, etc.), you should use parentheses to indicate precisely what you want to search for. For example, apple and pear or orange could mean (apple and pear) or orange, or it could mean apple and (pear or orange). For best results, always enclose expressions with connectors in parentheses. Example:
(apple and pear) or (name contains smith)
Search terms may include the following special characters:
Character | Meaning
? | matches any single character
= | matches any single digit
* | matches any number of characters
% | fuzzy search (see Fuzzy Searching below)
# | phonic search (see Phonic Searching below)
~ | stemming (see Stemming below)
& | synonym search (see Synonym Searching below)
~~ | numeric range search (see Numeric Range Searching below)
## | regular expression search (see Regular Expressions below)
Fuzzy Searching
Fuzzy searching will find a word even if it is misspelled. For example, a fuzzy search for apple will find appple. Fuzzy searching can be useful when you are searching text that may contain typographical errors (such as emails), or for text that has been scanned using optical character recognition (OCR).
Add fuzziness selectively using the % character. The number of % characters you add determines the number of differences dtSearch will ignore when searching for a word. The position of the % characters determines how many letters at the start of the word have to match exactly. Examples:
ba%nana
Word must begin with ba and have at most one difference between it and banana.
b%%anana
Word must begin with b and have at most two differences between it and banana.
Phonic Searching
Phonic searching looks for a word that sounds like the word you are searching for and begins with the same letter. For example, a phonic search for Smith will also find Smithe and Smythe.
To ask dtSearch to search for a word phonically, put a # in front of the word in your search request. Examples:
#smith
#johnson
Stemming
Stemming extends a search to cover grammatical variations on a word. For example, a search for fish would also find fishing. A search for applied would also find applying, applies, and apply.
To add stemming selectively, add a ~ at the end of words that you want stemmed in a search. Example: apply~
The stemming rules included with dtSearch are designed to work with the English language.
Synonym Searching
Synonym searching finds synonyms of a word that you include in a search request. For example, a search for fast would also find quickly. You can enable synonym searching selectively by adding the & character after certain words in your request. Example:
improve& w/5 search
Numeric Range Searching
A numeric range search is a search for any numbers that fall within a specified range. To add a numeric range component to a search request, enter the upper and lower bounds of the search separated by ~~ like this:
apple w/5 12~~17
This request would find any document containing apple within 5 words of a number between 12 and 17.
Notes
- A numeric range search includes the upper and lower bounds (so 12 and 17 would be retrieved in the above example).
- Numeric range searches only work with integers greater than or equal to zero, and less than 2,147,483,648
- For purposes of numeric range searching, decimal points and commas are treated as spaces and minus signs are ignored. For example, -123,456.78 would be interpreted as: 123 456 78 (three numbers).
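As a quick illustration of that last rule (a sketch only, not dtSearch's actual tokenizer), treating commas and decimal points as spaces and ignoring minus signs splits -123,456.78 into three separate numbers:

# Illustration of the rule above: commas and periods are treated as spaces,
# and minus signs are ignored, so "-123,456.78" becomes three numbers.
import re

raw = "-123,456.78"
tokens = [t for t in re.split(r"[,.\s]+", raw.replace("-", "")) if t]
print(tokens)  # ['123', '456', '78']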
Regular Expressions
Regular expression searching provides a way to search for advanced combinations of characters. A regular expression included in a search request must be quoted and must begin with ##.
Examples:
Apple and "##199[0-9]"
This would hit on a file containing the word "Apple" and the number 1994 (or 1990, 1991...1999).
Apple and "##19[0-9]+"
This would hit on a file containing the word "Apple" and the number 194 (or 1964 or 1983302002...).
Special characters in a regular expression are:
Regular expression | Effect
. (period) | Matches any single character. Example: "sampl." would match "sample" or "samplZ"
\ | Treat next character literally. Example: in "\$100", the \ indicates that the pattern is "$100", not end-of-line ($) followed by "100"
[abc] | Brackets indicate a set of characters, one of which must be present. For example, "sampl[ae]" would match "sample" or "sampla", but not "samplx"
[a-z] | Inside brackets, a dash indicates a range of characters. For example, "[a-z]" matches any single lower-case letter.
[^a-z] | Indicates any character except the ones in the bracketed range.
.* (period, asterisk) | An asterisk means "0 or more" of something, so .* would match any string of characters, or nothing
.+ (period, plus) | A plus means "1 or more" of something, so .+ would match any string of at least one character
[a-z]+ | Any sequence of one or more lower-case letters.
Limitations
- A regular expression must match a single whole word. For example, a search for "##app.*ie" would not find "apple pie".
- Only letters and numbers are searchable. Characters that are not indexed as letters are not searchable even using regular expressions, because the index does not contain any information about them.
- Because the dtSearch index does not store information about line breaks, searches that include beginning-of-line or end-of-line regular expression criteria (^ and $) will not work.
- No case or other conversion is done on regular expressions, so a regular expression must match the case of the information stored in the index. If an index is case-insensitive, all letters in the regular expression must be lower-case. If a character is not searchable in the index, then it cannot be included as a searchable character in the regular expression. Non-searchable characters in a regular expression are not ignored as they are in other search expressions.
Performance
A regular expression is like the * wildcard character in its effect on search speed: the closer to the front of a word the expression is, the more it will slow searching. "appl.*" will be nearly as fast as "apple", while ".*pple" will be much slower.
Searching for numbers
The = wildcard, which matches a single digit, is faster than regular expressions for matching patterns of numbers. For example, to search for a social security number, you could use "=== == ====" instead of the equivalent regular expression.
For additional information about dtSearch syntax, review the following documentation (from which this search guide was adapted): https://support.dtsearch.com/webhelp/dtsearch/search_requests_overview.htm
Next up: Data Mining - Exporting Reports and Data
Or view one of the other support resources in the Data Mining series:
Data Mining - Project Dashboard
Data Mining – Uploading and Importing Data
Data Mining Search Guide
Importing
The first step in mining your data is to upload and import it into the Data Mining tool.
Step 1: Getting Started with Imports
- First Time User without imported data: As a first time user, you will be immediately prompted to import your data from the Dashboard.
- Returning User with existing data: To import new data, simply navigate to the import tab and select “New Import”.
Step 2: Naming your Import and Selecting Your Source for Data Mining
In order to import your files directly into the Data Mining project, users have the ability to add any outside S3 source, including their Nextpoint database(s). Once a location has been added and successfully verified, the source is saved and available for all future imports. Additionally, each Data Mining project comes with a pre-created Data Mining Repository which can be used directly to house source data.
Naming the Import and Selecting an Existing Source
- Name your import. This name will appear later on your import batch list, so make the name clear and unique to this import data set.
- Select the source of the data (your Data Mining s3 repository, a Nextpoint File Room, or an external s3 Repository). If you need to add a new source location, check out the next section "Adding a New Source Location".
- Click the "Next" button.
Adding a New Source Location
Amazon s3 sources are virtual data storage locations used for housing large data sets. Your Discovery or Litigation Nextpoint File Room is an example of an s3 location. To add a non-Data Mining s3 location (like a Nextpoint File Room) to your data mining project:
- Click on “Add New” at the bottom of a new import window.
- Name your new s3 location (e.g. "Hoven v. Enron Discovery Database").
- Copy and Paste your AWS Access Key ID into the textbox below that option. In a Nextpoint database, all of these can be found in the “Settings” tab under “Import” in the "File Room" section. For more information about accessing your AWS keys and File Room Path, visit this support article.
- Copy and Paste your Secret Access Key into the text box below that option.
- Copy and Paste your File Room Path into the textbox below that option.
- Click “Add” and confirm that the system was able to verify your credentials. You should see the word "Success" in green with a checkmark next to it by the new source you added.
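If you would like to sanity-check a set of AWS keys before adding them as a source, the minimal sketch below (using Python's boto3 SDK) simply lists one object from the bucket. The bucket and prefix values are placeholders taken from your File Room Path, and Data Mining performs its own verification regardless.

# Optional pre-check of S3 credentials before adding them as a source.
# The bucket and prefix arguments are placeholders -- substitute the values
# from your File Room Path (s3://<bucket>/<prefix>/).
import boto3
from botocore.exceptions import ClientError

def verify_s3_credentials(access_key_id, secret_access_key, bucket, prefix=""):
    """Return True if the keys can list objects under the given bucket/prefix."""
    s3 = boto3.client(
        "s3",
        aws_access_key_id=access_key_id,
        aws_secret_access_key=secret_access_key,
    )
    try:
        s3.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
        return True
    except ClientError as err:
        print(f"Verification failed: {err}")
        return False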
DM S3 Repository
If you choose to import directly from your Data Mining s3 repository, a tool tip on the import screen labeled “How do I transfer files into my Data Mining repository?” will guide you through how to pull your data into your DM repository. The required AWS Access Key ID, Secret Access Key, and File Path will be provided here for input into your external sources.
Using these keys, you can use any standard S3 transfer tool to move your data into your repository.
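For example, if you prefer to script the transfer yourself, a minimal sketch using Python's boto3 SDK is below; the local folder, bucket, and prefix names are placeholders for the values provided in the tool tip.

# Minimal sketch: copy a local folder into the Data Mining repository over S3.
# The access keys, bucket, and prefix are placeholders for the values shown in
# the "How do I transfer files into my Data Mining repository?" tool tip.
import os
import boto3

def upload_folder(local_folder, bucket, prefix, access_key_id, secret_access_key):
    s3 = boto3.client(
        "s3",
        aws_access_key_id=access_key_id,
        aws_secret_access_key=secret_access_key,
    )
    for root, _dirs, files in os.walk(local_folder):
        for name in files:
            local_path = os.path.join(root, name)
            # Preserve the folder structure under the repository prefix.
            relative = os.path.relpath(local_path, local_folder).replace(os.sep, "/")
            s3.upload_file(local_path, bucket, prefix.rstrip("/") + "/" + relative)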
If you get an error when adding an external s3 location after adding your keys, it could be because of a CORS error. If this occurs, take the following steps to add a CORS configuration to an s3 bucket:
- Sign in to the AWS Management Console and open the Amazon S3 console at https://console.aws.amazon.com/s3/.
- In the Buckets list, choose the name of the bucket that you want to create a bucket policy for.
- Choose Permissions.
- In the Cross-origin resource sharing (CORS) section, choose Edit.
- In the CORS configuration editor text box, type or copy and paste a new CORS configuration, or edit an existing configuration:
[ { "AllowedHeaders": [ "authorization", "content-length", "content-md5", "content-type", "host", "origin", "x-amz-acl", "x-amz-content-sha256", "x-amz-date", "x-amz-meta-path", "x-amz-meta-qqfilename", "x-amz-security-token", "x-amz-server-side-encryption", "x-amz-user-agent", "amz-sdk-invocation-id", "amz-sdk-request", "x-amz-bucket-region", "x-amz-expected-bucket-owner" ], "AllowedMethods": [ "GET", "POST", "PUT", "HEAD" ], "AllowedOrigins": [ "*" ], "ExposeHeaders": [ "ETag" ], "MaxAgeSeconds": 3000 } ]
- The CORS configuration is a JSON document, and the text that you type in the editor must be valid JSON. For more information, see AWS's CORS configuration documentation.
- Choose Save changes.
Still Getting Errors?
An AWS IAM policy grants permission to list and download objects from an S3 bucket. The following bucket policy can help you set up those permissions. Note: this will only work for imports, not exports.
- Sign in to the AWS Management Console and open the Amazon S3 console at https://console.aws.amazon.com/s3/.
- In the Buckets list, choose the name of the bucket that you want to create a bucket policy for.
- Choose Permissions.
- In the Bucket Policy Section, choose Edit.
- In the policy editor text box, type or copy and paste the following, updated to include information about your bucket:
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": "s3:GetObject", "Resource": "arn:aws:s3:::bucketname/*" }, { "Effect": "Allow", "Action": "s3:ListBucket", "Resource": "arn:aws:s3:::bucketname" } ] }
If uploading the data yourself is not possible or you have questions about your specific situation, reach out to your client success manager for other options.
Note
Linking data from your file room to the Data Mining tool will create a copy of the data in the Data Mining repository. If you are uploading new data, we recommend placing it directly into your Data Mining repository. If the data has already been uploaded to your database's file room, it is fine to utilize this option for data mining.
Step 3: Selecting Data for Import
- Review the selections from the previous step, such as "Import Name" and "Source Selected." The data within the selected source can be seen in the table above.
- You have the option (not a required field) to assign a custodian(s) to the import (if applicable). At this time, custodians added to a batch are assigned to all files in the batch (so a custodian cannot be assigned to only certain parts of a batch). Custodians can be edited or added after the import completes as well (see the custodians section below for details).
- Select the folder or file for import. At this time only a single folder or file is eligible for import in one batch.
- Click “Import.”
- The import list will show the batch as "queued" (waiting in line to start processing) and then “processing” until the batch is “complete.” On very rare occasions a batch will show as “failed,” at which point you should contact the support team to identify the issue with the import.
- To download a csv of your import batch list, click on “Export CSV” at the bottom of the batch list.
- If you click on the hamburger (3 dot) menu next to any import batch, you have the option to "Edit Import Details" or "Download CSV Error Report."
Considerations for Importing
Full text + Metadata: pst, zip, mbox, eml, msg, jpg, png, tiff, bmp, gif, rtf, txt, doc, docx, xls, xlsx, ppt, pptx, dat, data, csv, htm, html, mht, mhtml, xml
Metadata only: mp3, wav, flac, mp4, m4v, m4a, mov, mpg
Can be Identified, not processed: ics, vcf, flv, pnm, pbm, pgm, ppm, ps, svg, emlx, mbx, anything encrypted
Universal Fields: import_path, ancestry, file_type, file_size, md5, s3_path, status, searchability, project_id, batch_id, file_id, family_id, unique_id
File Type Based*: mailbox_path, author, content_type, creation_date, creator, subject, language, email_from, email_to, email_cc, email_bcc, email_subject, email_date, email_content_transfer_encoding, email_content_type, email_in_reply_to, email_message_id, email_thread_index, email_thread_topic, has_children, family_date
Nextpoint will assign custodians upon request. Please note that the custodian of a piece of data is not intrinsic to that data; rather, it is an employee or other person or group with ownership, custody, or control over potentially relevant information. For example, an individual custodian's electronically stored information (ESI) usually includes their mail file, whereas a group custodian's ESI may include a shared network folder. Due to this, custodians cannot be assigned without direction as to how the data was collected.
Email archives collected and combined into a single PST file with multiple folders can be split among multiple custodians after processing has been completed. Assignment of more than 10 custodians in a single import may be billed as an additional hourly charge.
To add or edit the custodian(s) assigned to an import batch, click on the 3 dots next to the import and select the option to "Edit Assigned Custodians". Then click on the "-" next to an existing custodian to remove them, or click on the "+ Assign Custodian" link to add a new or existing custodian. Once you select a new custodian for the batch, click the "Assign" button, and their name should appear on the list of existing custodians. Click the "Save" button to add the new custodian to the data from this batch.
Documents are standardized and processed in Coordinated Universal Time (UTC) unless otherwise requested. This time zone is used for all date filters and to standardize any datetime metadata fields. A time zone offset from GMT can also be provided in document metadata; for example, if the data was processed in GMT-5, this field would be populated with -5.00.
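As a simple illustration of that standardization (not the actual processing code), a timestamp captured at GMT-5 normalizes to UTC like this:

# Illustration only: normalizing a GMT-5 timestamp to UTC.
from datetime import datetime, timedelta, timezone

local_time = datetime(2022, 10, 29, 9, 30, tzinfo=timezone(timedelta(hours=-5)))
print(local_time.astimezone(timezone.utc))  # 2022-10-29 14:30:00+00:00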
The master date of a document is the date used for filtering and date restrictions. For emails and their attachments, the master date is generated from the date sent of the parent email; for efiles, it is the last modified date.
When applying date restrictions, the kept documents are inclusive of the chosen date (master date as described above).
We deduplicate documents based on email message ID or MD5 hash (if no email message ID is available). Any files having matching email message IDs (or MD5 hashes) will be deduplicated, only one native copy will be stored in the system, and their metadata will be merged by default. That said, documents within different document families will not be deduplicated to split up the family. So attachments with matching MD5 hashes but attached to two different emails will be retained as separate documents. Deduplication is done globally within each project, across all batches and custodians.
Currently, this feature cannot be turned off or customized.
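A minimal sketch of that rule (illustrative only, not Nextpoint's implementation) is below: top-level items key on email message ID when present, otherwise on MD5, duplicates keep one native copy with merged metadata, and attachments are never deduplicated on their own, so they stay with their parent families.

# Illustrative sketch of the deduplication rule described above (not the actual
# implementation). Each "family" is a parent email or loose file plus any
# attachments; attachments are never deduplicated independently of their parent.
import hashlib

def dedup_key(doc):
    return doc.get("email_message_id") or hashlib.md5(doc["native_bytes"]).hexdigest()

def deduplicate(families):
    """families: list of dicts like {"parent": doc, "attachments": [docs]}."""
    seen, kept = {}, []
    for family in families:
        key = dedup_key(family["parent"])
        if key in seen:
            # Duplicate: keep the first native copy, merge metadata such as custodians.
            seen[key]["parent"].setdefault("custodians", set()).update(
                family["parent"].get("custodians", set())
            )
        else:
            seen[key] = family
            kept.append(family)
    return kept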
Upon import into the Discovery platform, Nextpoint dedupes email families and loose files globally across all custodians. To do so, an MD5 hash value is generated for each item: for emails, from Date Sent, Sender Name, Sender Email Address, Recipient Email Addresses, Display To, Display CC, Display BCC, Subject, Body, Attachment Names, and Attachment Size; for loose files, from the bit stream of that file.
Archives with zero extracted files or mismatched expected file count (coming soon) will be addressed on import in a quality control pass. Individual file processing and indexing errors will not be addressed, only reported upon.
- Video/Audio Transcription*
- Language Detection (will occur on all imports) and Translation*
- Image Recognition*
- Entity Recognition/PII*
*These services may incur additional costs. Reach out to your client success representative for details.
Next up: Data Mining - Project Dashboard
Or view one of the other support resources in the Data Mining series:
Data Mining - Searching and Slicing Data
Data Mining – Uploading and Importing Data
Logging-in and Project Setup
Why Data Mining?
As data volumes and discoverable sources continue to grow, getting a quick understanding of what is in your data before fully processing it will speed up your overall review process and create fewer headaches during your review. Similar to how your email inbox may be set up to pre-filter your spam, allowing you to ONLY view pertinent information – Data Mining helps attorneys eliminate the noise before they even lay eyes on their information.
Because each firm’s environment for Data Mining is unique and independent, Data Mining is highly secure and incredibly fast. We ingest data for Data Mining at a rate of up to ½ to 1 terabyte per hour.
Getting Started
- Sign in:
- (New Users) At this time, new users to Data Mining must be added by Nextpoint. Reach out to your Client Success Director or support@nextpoint.com for more information or to get started.
- Similar to your Nextpoint database login, Data Mining uses two factor authentication. Once you type in your email address and password, you will be asked to type in a verification pin sent to your email address the first time you log in.
- Should you need to reset your password, click on the "Forgot Password" link above the password field. You will then receive an email with a link to reset your password and return to the login screen.
- Create/Select Project:
- Once you log in, you will be prompted to “Select an Existing Project” (if any exist) or “Create a New Project.” Clicking "Select an Existing Project" will take you to that project's dashboard.
- If you select “Create a New Project,” you will be prompted to type in the Project Name, Client Number, and Matter Number. Note: All Data will be processed using UTC time.
- Then click “Create.”
- Importing Data:
- You will then be taken to the project's dashboard where you will be prompted to “Import Data.”
Updating Settings
You can also update settings, manage users, add database file rooms, and change passwords in the Settings tab.
The General tab allows you to edit some of your profile information, email your CS director, or change the time zone of your profile. Changing the time zone will change how the dates are viewed within the app. All data will continue to be processed in UTC time.
The S3 Locations tab allows you to view current file rooms and other S3 storage instances that this project has access to for imports and exports. You can add new locations here by clicking on the "Add New Location" button and inserting the access keys and location information in the resulting pop-up.
The location name can be anything that will help you reference this location. The other information can be found for a Nextpoint database under Settings > Imports. If you have not used your s3 credentials in a Nextpoint database, you may need to reach out to support@nextpoint.com so that we can set your credentials.
You can also add new locations on the fly from within an import or export.
The Security tab allows you to update your password.
The Users tab allows you to view all of the users who have accessed your account and projects, including the date they were added and the last accessed date. If you are your firm's Data Mining administrator (usually the first person added on a Data Mining account), you are also able to add new users in this tab by hitting the "Invite New User" button. Adding the new user's name and email address will automatically send them an email inviting them to the account. If they are a new user, they will be prompted to set up their account before accessing the project.
Next up: Data Mining – Uploading and Importing Data
Or view one of the other support resources in the Data Mining series:
Data Mining - Project Dashboard
Data Mining - Searching and Slicing Data
Data Mining - Exporting Reports and Data
Data Mining – Getting Started
The dashboard gives a high level visual overview of the data that has been imported into your project. You can choose to see how your data breaks down via a specific import, or holistically with all your data. This can help you make informed decisions for future imports prior to creating searches and slices for export.
Project Dashboard Cards
Valuable aspects of the overview include breakdowns of your data within cards covering searchability, language, import errors, and the data timeline.
Imported documents are categorized as follows:
- Fully Searchable – The metadata was successfully extracted from these files and their content was OCR'd to create searchable text.
- Metadata Only – The metadata was successfully extracted from these files, but they are of a type that would not generate searchable OCR'd text, so only the metadata is searchable.
- Unsupported – These file types are not supported by Nextpoint's Data Mining app, so neither the metadata nor any text in these files will be searchable.
- Unknown – These file types are unknown/unreadable to Nextpoint's Data Mining app and could not be processed. Neither their metadata nor any text in these files will be searchable. This may also include files that have readability issues (e.g. a corrupt eml file or an encrypted spreadsheet)
Individual documents may contain multiple languages. Each document is categorized based on the "dominant" language – or most prevalent language – detected in the text.
Importing errors are categorized as follows:
- Unsupported Type – This file type is not supported by the Data Mining app and cannot be processed.
- Unsupported Size – The file size is too large to process.
- DeNIST – Computer system files matching the NIST NSRL (National Software Reference Library) list. These are not generally user generated and are therefore often not relevant to most litigation.
- Error – An error occurred while processing this file and/or extracting its metadata.
- Other – File specific issues that make them unreadable (e.g. corruption, encryption, empty files...)
The data timeline shows the frequency of documents through your data's date range. It will include custodian frequency along the timeline (if applicable).
Next up: Data Mining – Uploading and Importing Data
Or view one of the other support resources in the Data Mining series:
Data Mining - Searching and Slicing Data
Data Mining – Project Dashboard
Assigned Slice: The slice(s) that a specific search in the Search Table is assigned to.
Conditions: Specific parameters placed on a slice to limit the return of a search (e.g. custodians or a date range to include)
Created On: The date that a Search was created and added to the Search Table.
Doc Hit Count: The number of documents that directly have one or more search hits. In the context of a report with multiple lines of syntax, overlap may occur if a document directly hits on more than one term.
Early Case Assessment (ECA): The process of evaluating the strengths and weaknesses of a case prior to investing substantial resources in litigation.
Early Data Assessment (EDA): A subset of ECA that involves using data analytics and advanced eDiscovery filtering techniques to understand the contents of digital data at the outset of a matter. The primary goal of an Early Data Assessment project is to conduct an initial review and identify specific documents or other pieces of evidence that establish the strengths and weaknesses of the litigation position.
Family Count: The number of total documents included when full families (parent emails and all of their attachments) are included with the search hits.
File Count: The number of direct hits on a search or included in a slice. This number does not include family members of hits that do not also hit on the search or were included in the slice.
Noise Words (also known as Stop words): A list of common words that are not indexed and therefore not searchable. (e.g. “the project” would only search on the word “project”)
S3 Repository: The data storage location for Data Mining projects (exactly like the file room in a Nextpoint database).
Search Proportion: The percentage of documents that hit directly on a line of syntax (a search). This is calculated out of the total number of searchable documents in the slice. Search proportion is also known as "inclusiveness."
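For example (illustrative figures), if a slice contains 10,000 searchable documents and a search hits 250 of them directly, that search's proportion is 250 / 10,000 = 2.5%.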
Search/Searches: A search is a line of terms (words), phrases, boolean connectors, and specialized syntax used to find specific documents. Each search can also have conditions that limit the reach of that search (for example, a date range or a specific set of custodians).
Slice: Groups of searches run in bulk that can also have conditions that will be applied to each search line contained within.
Term: A single word or phrase within the syntax of a search.
Uniqueness: The number of direct document hits where only the current term is hit and no other.
View one of the other support resources in the Data Mining series:
Data Mining - Project Dashboard
Data Mining – Uploading and Importing Data
Data Mining - Searching and Slicing Data
Data Mining - Exporting Reports and Data