Data Mining – Uploading and Importing Data


The first step in mining your data is to upload and import it into the Data Mining tool. 

Step 1: Getting Started with Imports

  • First-time user without imported data: As a first-time user, you will be prompted to import your data as soon as you open the Dashboard.
  • Returning user with existing data: To import new data, navigate to the Import tab and select “New Import”.

Step 2: Naming your Import and Selecting Your Source for Data Mining

To import files directly into a Data Mining project, users can add outside S3 sources, including their Nextpoint database(s). Once a location has been added and successfully verified, the source is saved and available for all future imports. Additionally, each Data Mining project comes with a pre-created Data Mining Repository that can be used directly to house source data.

Naming the Import and Selecting an Existing Source


    1. Name your import. This name will appear later on your import batch list, so make the name clear and unique to this import data set.
    2. Select the source of the data (your Data Mining S3 repository, a Nextpoint File Room, or an external S3 repository). If you need to add a new source location, see the next section, "Adding a New Source Location". 
    3. Click the "Next" button.

Adding a New Source Location


Amazon S3 sources are virtual data storage locations used for housing large data sets. Your Discovery or Litigation Nextpoint File Room is an example of an S3 location. To add a non-Data Mining S3 location (such as a Nextpoint File Room) to your Data Mining project: 

  1. Click on “Add New” at the bottom of a new import window. 
  2. Name your new s3 location (e.g. "Hoven v. Enron Discovery Database").
  2. Copy and paste your AWS Access Key ID into the textbox below that option. In a Nextpoint database, the AWS Access Key ID, Secret Access Key, and File Room Path can all be found in the “Settings” tab under “Import” in the "File Room" section. For more information about accessing your AWS keys and File Room Path, visit this support article.
  3. Copy and paste your Secret Access Key into the textbox below that option.
  4. Copy and paste your File Room Path into the textbox below that option. 
  6. Click “Add” and confirm that the system was able to verify your credentials. You should see the word "Success" in green with a checkmark next to it by the new source you added. 
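A File Room Path is an s3-style location. As a rough illustration of how such a path breaks down, the hypothetical helper below (not part of any Nextpoint tooling) splits an s3 URI into a bucket name and a key prefix:

```python
def parse_file_room_path(path: str) -> tuple:
    """Split an s3-style URI such as 's3://bucket/prefix/' into
    (bucket, prefix). Hypothetical helper for illustration only."""
    if not path.startswith("s3://"):
        raise ValueError("expected an s3:// URI")
    bucket, _, prefix = path[len("s3://"):].partition("/")
    return bucket, prefix
```

Keeping the bucket and prefix separate is also how most S3 tools (the AWS CLI, boto3) expect a location to be supplied.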

DM S3 Repository

If you choose to import directly from your Data Mining s3 repository, a tool tip on the import screen labeled “How do I transfer files into my Data Mining repository?” will guide you through how to pull your data into your DM repository. The required AWS Access Key ID, Secret Access Key, and File Path will be provided here for input into your external sources.


With these keys, you can use any of the tools listed below to transfer your data into your repository. 

Source Errors

If you get an error after adding your keys for an external S3 location, it may be caused by a missing CORS configuration. If this occurs, take the following steps to add a CORS configuration to an S3 bucket: 

  1. Sign in to the AWS Management Console and open the Amazon S3 console.
  2. In the Buckets list, choose the name of the bucket that you want to create a bucket policy for.
  3. Choose Permissions.
  4. In the Cross-origin resource sharing (CORS) section, choose Edit.
  5. In the CORS configuration editor text box, type or copy and paste a new CORS configuration, or edit an existing one. For example (example values shown; adjust the origins and methods for your environment): 
        [
            {
                "AllowedHeaders": ["*"],
                "AllowedMethods": ["GET", "PUT", "POST"],
                "AllowedOrigins": ["*"],
                "ExposeHeaders": [],
                "MaxAgeSeconds": 3000
            }
        ]

6. The CORS configuration is a JSON file. The text that you type in the editor must be valid JSON. For more information, see CORS configuration.

7. Choose Save changes.
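Because the editor rejects anything that is not valid JSON (step 6), it can help to sanity-check a configuration before pasting it in. The sketch below is illustrative: the example values are hypothetical, and the required-key set is a minimal assumption rather than AWS's full validation.

```python
import json

# Hypothetical example configuration; adjust origins/methods for your setup.
cors_config = """
[
    {
        "AllowedHeaders": ["*"],
        "AllowedMethods": ["GET", "PUT", "POST"],
        "AllowedOrigins": ["*"],
        "ExposeHeaders": [],
        "MaxAgeSeconds": 3000
    }
]
"""

def validate_cors(text: str) -> list:
    """Parse editor text and check each rule for a minimal set of keys
    (an assumption: AllowedMethods and AllowedOrigins)."""
    rules = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    for rule in rules:
        missing = {"AllowedMethods", "AllowedOrigins"} - rule.keys()
        if missing:
            raise ValueError(f"rule missing keys: {missing}")
    return rules
```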

Still Getting Errors?

An AWS IAM policy grants permission to list and download objects from an S3 bucket. The following policy can help you set up those AWS permissions. Note: this will not work for exports, only imports.

  1. Sign in to the AWS Management Console and open the Amazon S3 console.
  2. In the Buckets list, choose the name of the bucket that you want to create a bucket policy for.
  3. Choose Permissions.
  4. In the Bucket Policy Section, choose Edit
  5. In the Bucket Policy editor text box, type or copy and paste the following, updated to include information about your bucket (replace "bucketname" with the name of your bucket):
      {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Effect": "Allow",
                  "Action": "s3:GetObject",
                  "Resource": "arn:aws:s3:::bucketname/*"
              },
              {
                  "Effect": "Allow",
                  "Action": "s3:ListBucket",
                  "Resource": "arn:aws:s3:::bucketname"
              }
          ]
      }

If uploading the data yourself is not possible or you have questions about your specific situation, reach out to your client success manager for other options. 


Linking data from your file room to the Data Mining tool will create a copy of the data in the Data Mining repository. If you are uploading new data, we recommend placing it directly into your Data Mining repository. If the data has already been uploaded to your database's file room, it is fine to utilize this option for data mining. 


Step 3: Selecting Data for Import


  1. Review the selections from the previous step, such as the Import Name and Selected Source; the data within the selected source can be seen in the table above.
  2. You have the option (not required) to assign one or more custodians to the import, if applicable. At this time, custodians added to a batch are assigned to all files in the batch (a custodian cannot be assigned to only certain parts of a batch). Custodians can also be edited or added after the import completes (see the custodians section below for details).
  3. Select the folder or file for import. At this time, only a single folder or file is eligible for import in one batch. 
  4. Click “Import.”
  5. The import list will show the batch as “queued” (waiting in line to start processing) and then “processing” until the batch is “complete.” On rare occasions a batch will show as “failed,” at which point you should contact the support team to identify the issue with the import. 
  6. To download a csv of your import batch list, click on “Export CSV” at the bottom of the batch list.
  7. If you click on the 3-dot menu next to any import batch, you have the option to "Edit Import Details" or "Download CSV Error Report."


Considerations for Importing

Supported file types


Full text + Metadata  |  Metadata only  |  Can be Identified, not processed

(The full list of supported file types for each category is not reproduced here. Encrypted files of any type can be identified, but not processed.)


Extracted Metadata
Data mining extracts the following metadata on import:
Universal Fields  |  File Type Based*

*Fields from this category are dependent on file type. (The field lists for each category are not reproduced here.)
Password Protected Files
Files may be password protected. Password-protected archives prevent documents from being extracted, and password-protected individual files may prevent indexing of that file. It is most efficient to provide any known passwords prior to processing.

Nextpoint will assign custodians upon request. Please note that the custodian of a piece of data is not intrinsic to that data, rather it is an employee or other person or group with ownership, custody, or control over potentially relevant information. For example, an individual custodian's electronically stored information (ESI) usually includes their mail file, whereas a group custodian's ESI may include a shared network folder. Due to this, custodians cannot be assigned without direction as to how the data was collected. 

Email archives collected and combined into a single PST file with multiple folders can be split among multiple custodians after processing has been completed. Assignment of more than 10 custodians in a single import may be billed as an additional hourly charge.

To add or edit the custodian(s) assigned to an import batch, click on the 3 dots next to the import and select the option to "Edit Assigned Custodians". Then click on the "-" next to an existing custodian to remove them, or click on the "+ Assign Custodian" link to add a new or existing custodian. Once you select a new custodian for the batch, click "Assign", and their name should appear on the list of existing custodians. Click the "Save" button to add the new custodian to the data from this batch. 



Dates and Time zones

Documents are standardized and processed into Coordinated Universal Time (UTC) unless otherwise requested. This time zone is used for all date filters and to standardize any datetime metadata fields. A time zone offset from GMT can be provided in document metadata to record the time zone the data was processed in; for example, if the data was processed in GMT-5, this field would be populated with -5.00.
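The normalization described above can be sketched with Python's standard datetime module, assuming the offset is expressed in hours from GMT (e.g. -5.00):

```python
from datetime import datetime, timedelta, timezone

def to_utc(naive_local: datetime, offset_hours: float) -> datetime:
    """Attach the provided GMT offset to a naive datetime, then convert
    the value to UTC for consistent filtering."""
    local = naive_local.replace(tzinfo=timezone(timedelta(hours=offset_hours)))
    return local.astimezone(timezone.utc)
```

For example, noon recorded in GMT-5 normalizes to 5:00 PM UTC.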

Master Date

The master date of a document is the date used for filtering and date restrictions. For emails and their attachments, the master date is generated from the date sent of the parent email; for efiles, it is the last modified date. 

When applying date restrictions, the kept documents are inclusive of the chosen date (master date as described above).
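The inclusive boundary behavior can be expressed as a simple comparison (a sketch; the function and field names are hypothetical):

```python
from datetime import date

def keep_document(master_date: date, start: date, end: date) -> bool:
    """Date restrictions are inclusive: a document whose master date falls
    exactly on the chosen start or end date is kept."""
    return start <= master_date <= end
```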


We deduplicate documents based on email message ID or MD5 hash (if no email message ID is available). Any files having matching email message IDs (or MD5 hashes) will be deduplicated, only one native copy will be stored in the system, and their metadata will be merged by default. That said, documents within different document families will not be deduplicated to split up the family. So attachments with matching MD5 hashes but attached to two different emails will be retained as separate documents. Deduplication is done globally within each project, across all batches and custodians. 

Currently, this feature cannot be turned off or customized.
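The dedup rule described above — key on email message ID when present, otherwise on MD5 hash, keep one native copy, and merge metadata — can be sketched as follows (document fields such as message_id are hypothetical names, not Nextpoint's actual schema):

```python
import hashlib

def dedupe(docs: list) -> list:
    """Keep one copy per key (email message ID if present, otherwise the
    MD5 of the file bytes) and merge metadata from later duplicates."""
    kept = {}
    for doc in docs:
        key = doc.get("message_id") or hashlib.md5(doc["content"]).hexdigest()
        if key in kept:
            kept[key]["metadata"].update(doc["metadata"])  # merge; single native copy
        else:
            kept[key] = doc
    return list(kept.values())
```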

Upon import into the Discovery platform, Nextpoint dedupes email families and loose files globally across all custodians. To do so, an MD5 hash value is generated: for emails, from the Date Sent, Sender Name, Sender Email Address, Recipient Email Addresses, Display To, Display CC, Display BCC, Subject, Body, Attachment Names, and Attachment Size; for loose files, from the bit stream of the file.
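The email hash construction can be sketched as follows; note that the field names, concatenation order, and separator are assumptions, since the article lists the fields but not how they are joined internally:

```python
import hashlib

# Field names below are illustrative stand-ins for the fields listed above.
EMAIL_HASH_FIELDS = ("date_sent", "sender_name", "sender_email",
                     "recipient_emails", "display_to", "display_cc",
                     "display_bcc", "subject", "body",
                     "attachment_names", "attachment_size")

def email_dedupe_hash(email: dict) -> str:
    """MD5 over the concatenated email fields (separator is an assumption)."""
    joined = "\x1f".join(str(email.get(f, "")) for f in EMAIL_HASH_FIELDS)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()
```

Any change to a hashed field (e.g. the subject) produces a different hash, so such messages are not treated as duplicates.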

Import QC

Archives with zero extracted files or a mismatched expected file count (coming soon) will be addressed on import in a quality control pass. Individual file processing and indexing errors will not be addressed, only reported.

Additional Options
  • Video/Audio Transcription*
  • Language Detection (will occur on all imports) and Translation*
  • Image Recognition*
  • Entity Recognition/PII*

*These services may incur additional costs. Reach out to your client success representative for details. 


Next up: Data Mining - Project Dashboard

Or view one of the other support resources in the Data Mining series:

Data Mining – Getting Started

Data Mining - Searches and Search Groups

Data Mining Search Guide

Data Mining - Exporting Reports and Data

Data Mining - Glossary
