Importing
The first step in mining your data is to upload and import it into the Nextpoint EDA tool.
Step 1: Getting Started with Imports
- First Time User without imported data: As a first time user, you will be immediately prompted to import your data from the Dashboard.
- Returning User with existing data: To import new data, simply navigate to the import tab and select “New Import”.
Step 2: Naming your Import and Selecting Your Source for Nextpoint EDA
To import files directly into a Nextpoint EDA project, you can add any outside S3 source, including your Nextpoint database(s). Once a location has been added and successfully verified, the source is saved and remains available for all future imports. Additionally, each Nextpoint EDA project comes with a pre-created Nextpoint EDA Repository that can be used directly to house source data.
Naming the Import and Selecting an Existing Source
- Name your import. This name will appear later on your import batch list, so make the name clear and unique to this import data set.
- Select the source of the data (your Nextpoint EDA S3 repository, a Nextpoint File Room, or an external S3 repository). If you need to add a new source location, see the next section, "Adding a New Source Location".
- Click the "Next" button.
Adding a New Source Location
Amazon s3 sources are virtual data storage locations used for housing large data sets. Your Discovery or Litigation Nextpoint File Room is an example of an s3 location. To add a non-Nextpoint EDA s3 location (like a Nextpoint File Room) to your Nextpoint EDA project:
- Click on “Add New” at the bottom of a new import window.
- Name your new s3 location (e.g. "Hoven v. Enron Discovery Database").
- Copy and paste your AWS Access Key ID into the text box below that option. In a Nextpoint database, the Access Key ID, Secret Access Key, and File Room Path can all be found in the “Settings” tab under “Import” in the "File Room" section. For more information about accessing your AWS keys and File Room Path, visit this support article.
- Copy and Paste your Secret Access Key into the text box below that option.
- Copy and Paste your File Room Path into the textbox below that option.
- Click “Add” and confirm that the system was able to verify your credentials. You should see the word "Success" in green with a checkmark next to it by the new source you added.
Nextpoint EDA S3 Repository
If you choose to import directly from your Nextpoint EDA s3 repository, a tool tip on the import screen labeled “How do I transfer files into my Nextpoint EDA repository?” will guide you through how to pull your data into your Nextpoint EDA repository. The required AWS Access Key ID, Secret Access Key, and File Path will be provided here for input into your external sources.
Using these keys you can use any of the tools listed below to transfer your data into your repository.
If you get an error when adding an external s3 location after adding your keys, it could be because of a CORS error. If this occurs, take the following steps to add a CORS configuration to an s3 bucket:
- Sign in to the AWS Management Console and open the Amazon S3 console at https://console.aws.amazon.com/s3/.
- In the Buckets list, choose the name of the bucket that you want to add a CORS configuration to.
- Choose Permissions.
- In the Cross-origin resource sharing (CORS) section, choose Edit.
- In the CORS configuration editor text box, type or copy and paste a new CORS configuration, or edit an existing configuration:
[
  {
    "AllowedHeaders": [
      "authorization",
      "content-length",
      "content-md5",
      "content-type",
      "host",
      "origin",
      "x-amz-acl",
      "x-amz-content-sha256",
      "x-amz-date",
      "x-amz-meta-path",
      "x-amz-meta-qqfilename",
      "x-amz-security-token",
      "x-amz-server-side-encryption",
      "x-amz-user-agent",
      "amz-sdk-invocation-id",
      "amz-sdk-request",
      "x-amz-bucket-region",
      "x-amz-expected-bucket-owner"
    ],
    "AllowedMethods": ["GET", "POST", "PUT", "HEAD"],
    "AllowedOrigins": ["*"],
    "ExposeHeaders": ["ETag"],
    "MaxAgeSeconds": 3000
  }
]
- The CORS configuration is a JSON file, so the text that you type in the editor must be valid JSON. For more information, see CORS configuration.
- Choose Save changes.
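Because the editor only accepts valid JSON, it can help to check the configuration locally before pasting it in. The following is a minimal sketch using Python's standard `json` module. The helper name `check_cors_config` is hypothetical, and the checks cover only the basics (valid JSON, a top-level array, and the `AllowedMethods` and `AllowedOrigins` keys that S3 requires in each rule). It is not a full validator.

```python
import json

# HTTP methods that S3 accepts in a CORS rule.
S3_CORS_METHODS = {"GET", "PUT", "POST", "DELETE", "HEAD"}


def check_cors_config(text):
    """Return a list of problems found in a CORS configuration string.

    An empty list means the text parsed as JSON and passed these basic checks.
    """
    try:
        rules = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]

    if not isinstance(rules, list):
        return ["top level must be a JSON array of rules"]

    problems = []
    for i, rule in enumerate(rules):
        if not isinstance(rule, dict):
            problems.append(f"rule {i}: must be a JSON object")
            continue
        # Each rule needs at least one allowed method and one allowed origin.
        for key in ("AllowedMethods", "AllowedOrigins"):
            if not rule.get(key):
                problems.append(f"rule {i}: missing or empty {key}")
        for method in rule.get("AllowedMethods") or []:
            if method not in S3_CORS_METHODS:
                problems.append(f"rule {i}: unsupported method {method!r}")
    return problems


# Abbreviated version of the configuration shown above.
sample = """
[
  {
    "AllowedHeaders": ["authorization", "content-type"],
    "AllowedMethods": ["GET", "POST", "PUT", "HEAD"],
    "AllowedOrigins": ["*"],
    "ExposeHeaders": ["ETag"],
    "MaxAgeSeconds": 3000
  }
]
"""
print(check_cors_config(sample))  # an empty list means no problems were found
```

If the function reports a problem, correct the text before choosing Save changes.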
Still Getting Errors?
If errors persist, the bucket may be missing the required permissions. The AWS policy below grants permission to list and download objects from an S3 bucket, and may help you set up the AWS permissions. Note: this will work only for imports, not exports.
- Sign in to the AWS Management Console and open the Amazon S3 console at https://console.aws.amazon.com/s3/.
- In the Buckets list, choose the name of the bucket that you want to create a bucket policy for.
- Choose Permissions.
- In the Bucket policy section, choose Edit.
- In the policy editor text box, type or copy and paste the following, updated to include information about your bucket:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::bucketname/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::bucketname"
    }
  ]
}
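The bucket name appears in two places in this policy, and the two Resource values differ: the object-level action uses the `/*` suffix, while the bucket-level action does not. As a rough sketch, the policy can be generated for your own bucket so that both values stay in sync. The function name `build_import_policy` and the bucket name `my-import-bucket` are placeholders for illustration only.

```python
import json


def build_import_policy(bucket_name):
    """Return the list-and-download policy from this article for one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Download objects: applies to the objects inside the bucket.
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
            {
                # List objects: applies to the bucket itself.
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
        ],
    }


# "my-import-bucket" is a placeholder; substitute your own bucket name.
policy = build_import_policy("my-import-bucket")
print(json.dumps(policy, indent=2))
```

The printed JSON can then be pasted into the policy editor.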
If uploading the data yourself is not possible or you have questions about your specific situation, reach out to your client success manager for other options.
Note
Linking data from your file room to the Nextpoint EDA tool will create a copy of the data in the Nextpoint EDA repository. If you are uploading new data, we recommend placing it directly into your Nextpoint EDA repository. If the data has already been uploaded to your database's file room, it is fine to utilize this option for Nextpoint EDA.
Step 3: Selecting Data for Import
- Review your selections from the previous step, such as "Import Name" and "Source Selected". The data within the selected source is listed in the table.
- Optionally, assign one or more custodians to the import. At this time, custodians added to a batch are assigned to all files in the batch, so a custodian cannot be assigned to only certain parts of a batch. Custodians can also be edited or added after the import completes (see the custodians section below for details).
- Select the folder or file for import. At this time, only a single folder or file is eligible for import in one batch.
- Click “Import.”
- The import list will show the batch as "queued" (waiting in line to start processing) and then “processing” until the batch is “complete.” On very rare occasions, a batch will show as “failed”, at which point you should contact the support team to identify the issue with the import.
- To download a csv of your import batch list, click on “Export CSV” at the bottom of the batch list.
- If you click on the hamburger (3 dot) menu next to any import batch, you have the option to "Edit Import Details" or "Download CSV Error Report."
Considerations for Importing
Full text + Metadata | Metadata only | Can be identified, not processed |
pst, zip, mbox, eml, msg, jpg, png, tiff, bmp, gif, rtf, txt, doc, docx, xls, xlsx, ppt, pptx, dat, data, csv, htm, html, mht, mhtml, xml | mp3, wav, flac, mp4, m4v, m4a, mov, mpg | ics, vcf, flv, pnm, pbm, pgm, ppm, ps, svg, emlx, mbx, anything encrypted |
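To estimate how a collection will be handled before you import it, the table can be turned into a simple lookup. The sketch below classifies files by extension only, which is an assumption made for illustration: the article does not say how Nextpoint EDA identifies file types, and encrypted files cannot be detected from a file name. The function name `processing_level` is hypothetical.

```python
from pathlib import PurePath

# Extensions copied from the table above.
FULL_TEXT_AND_METADATA = {
    "pst", "zip", "mbox", "eml", "msg", "jpg", "png", "tiff", "bmp", "gif",
    "rtf", "txt", "doc", "docx", "xls", "xlsx", "ppt", "pptx", "dat", "data",
    "csv", "htm", "html", "mht", "mhtml", "xml",
}
METADATA_ONLY = {"mp3", "wav", "flac", "mp4", "m4v", "m4a", "mov", "mpg"}
IDENTIFIED_NOT_PROCESSED = {
    "ics", "vcf", "flv", "pnm", "pbm", "pgm", "ppm", "ps", "svg", "emlx", "mbx",
}


def processing_level(filename):
    """Map a file name to the table column its extension falls under."""
    ext = PurePath(filename).suffix.lstrip(".").lower()
    if ext in FULL_TEXT_AND_METADATA:
        return "full text + metadata"
    if ext in METADATA_ONLY:
        return "metadata only"
    if ext in IDENTIFIED_NOT_PROCESSED:
        return "identified, not processed"
    return "not listed"


for name in ("Q3 Report.DOCX", "voicemail.wav", "invite.ics", "model.dwg"):
    print(name, "->", processing_level(name))
```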
Universal Fields | File Type Based* |
import_path, ancestry, file_type, file_size, md5, s3_path, status, searchability, project_id, batch_id, file_id, family_id, unique_id | mailbox_path, author, content_type, creation_date, creator, subject, language, email_from, email_to, email_cc, email_bcc, email_subject, email_date, email_content_transfer_encoding, email_content_type, email_in_reply_to, email_message_id, email_thread_index, email_thread_topic, has_children, family_date |
Nextpoint will assign custodians upon request. Please note that the custodian of a piece of data is not intrinsic to that data; rather, a custodian is an employee or other person or group with ownership, custody, or control over potentially relevant information. For example, an individual custodian's electronically stored information (ESI) usually includes their mail file, whereas a group custodian's ESI may include a shared network folder. Because of this, custodians cannot be assigned without direction as to how the data was collected.
Email archives collected and combined into a single PST file with multiple folders can be split among multiple custodians after processing has been completed. Assignment of more than 10 custodians in a single import may be billed as an additional hourly charge.
To add or edit the custodian(s) assigned to an import batch, click on the 3 dots next to the import and select the option to "Edit Assigned Custodians". Then click on the "-" next to an existing custodian to remove them, or click on the "+ Assign Custodian" link to add a new or existing custodian. Once you select a new custodian for the batch, click "Assign", and their name should appear on the list of existing custodians. Click the "Save" button to add the new custodian to the data from this batch.
Documents are standardized and processed in Coordinated Universal Time (UTC) unless otherwise requested. This time zone is used for all date filters and to standardize any datetime metadata fields. A time zone offset can be provided in document metadata; it records the offset from GMT in which the data was processed. For example, if the data was processed in GMT-5, this field would be populated with -5.00.
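To illustrate how an offset such as -5.00 relates a local timestamp to UTC, here is a minimal sketch using Python's standard `datetime` module. The function name `to_utc` and the sample timestamp are made up for the example, and this is not a description of how Nextpoint EDA performs the conversion internally.

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_dt, offset_hours):
    """Attach a fixed GMT offset to a naive datetime and convert it to UTC."""
    tz = timezone(timedelta(hours=offset_hours))
    return local_dt.replace(tzinfo=tz).astimezone(timezone.utc)


# A timestamp recorded as 09:30 in GMT-5 (offset value -5.00).
sent = to_utc(datetime(2023, 3, 14, 9, 30), -5.00)
print(sent.isoformat())  # 2023-03-14T14:30:00+00:00
```

A fixed offset does not account for daylight saving changes, so the same source location may carry different offsets at different times of year.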
The master date of a document is the date used for filtering and date restrictions. For emails and their attachments, the master date is generated from the sent date of the parent email; for e-files, it is generated from the last modified date.
Date restrictions are inclusive: documents whose master date (as described above) falls on the chosen date are kept.
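The master date rule and the inclusive date restriction can be sketched as follows. The dictionary keys `date_sent` and `last_modified` and both function names are hypothetical, chosen for illustration, and treating both ends of the range as inclusive is an assumption.

```python
from datetime import date


def master_date(doc, parent=None):
    """Pick the master date for a document.

    Attachments inherit the parent email's sent date, emails use their own
    sent date, and loose e-files use their last modified date.
    """
    if parent is not None:        # attachment: inherit from the parent email
        return parent["date_sent"]
    if doc.get("date_sent"):      # email
        return doc["date_sent"]
    return doc["last_modified"]   # loose e-file


def within_restriction(d, start, end):
    """Return True when d falls inside the range, both ends inclusive."""
    return start <= d <= end


email = {"date_sent": date(2020, 1, 15)}
attachment = {"last_modified": date(2019, 6, 1)}
efile = {"last_modified": date(2020, 2, 1)}

start, end = date(2020, 1, 15), date(2020, 1, 31)
for label, d in (
    ("email", master_date(email)),
    ("attachment", master_date(attachment, parent=email)),
    ("efile", master_date(efile)),
):
    print(label, d, within_restriction(d, start, end))
```

Here the attachment is kept even though it was last modified in 2019, because its master date comes from the parent email, and the email itself is kept because its sent date falls on the first day of the range.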
We deduplicate documents based on email message ID, or on MD5 hash if no email message ID is available. Files with matching email message IDs (or MD5 hashes) are deduplicated: only one native copy is stored in the system, and their metadata is merged by default. That said, documents are not deduplicated when doing so would split up a document family, so attachments with matching MD5 hashes that are attached to two different emails are retained as separate documents. Deduplication is done globally within each project, across all batches and custodians.
Currently, this feature cannot be turned off or customized.
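The deduplication rules above can be approximated in a few lines. This is a sketch under stated assumptions, not Nextpoint's implementation: the record fields (`id`, `parent_id`, `message_id`, `md5`, `custodians`) are hypothetical, parents are assumed to appear before their attachments, and metadata merging is reduced to combining custodian names.

```python
def dedup_key(doc):
    """Prefer the email message ID; fall back to the MD5 hash."""
    if doc.get("message_id"):
        return ("message_id", doc["message_id"])
    return ("md5", doc["md5"])


def deduplicate(documents):
    """Keep the first copy of each top-level document plus its whole family."""
    seen = {}         # dedup key -> kept top-level document
    kept_ids = set()  # ids of kept top-level documents
    kept = []
    for doc in documents:
        if doc.get("parent_id") is None:
            key = dedup_key(doc)
            if key in seen:
                # Duplicate: merge its custodians into the kept copy.
                seen[key]["custodians"] |= doc["custodians"]
                continue
            seen[key] = doc
            kept_ids.add(doc["id"])
            kept.append(doc)
        elif doc["parent_id"] in kept_ids:
            # Family members stay with their parent, even when their MD5
            # matches a file in another family.
            kept.append(doc)
    return kept


docs = [
    {"id": 1, "parent_id": None, "message_id": "<a@x>", "md5": "m1", "custodians": {"Lee"}},
    {"id": 2, "parent_id": 1, "message_id": None, "md5": "att", "custodians": {"Lee"}},
    {"id": 3, "parent_id": None, "message_id": "<b@x>", "md5": "m2", "custodians": {"Kim"}},
    {"id": 4, "parent_id": 3, "message_id": None, "md5": "att", "custodians": {"Kim"}},
    {"id": 5, "parent_id": None, "message_id": "<a@x>", "md5": "m1", "custodians": {"Kim"}},
    {"id": 6, "parent_id": 5, "message_id": None, "md5": "att", "custodians": {"Kim"}},
]
result = deduplicate(docs)
print([d["id"] for d in result])       # [1, 2, 3, 4]
print(sorted(result[0]["custodians"]))  # ['Kim', 'Lee']
```

Documents 2 and 4 share an MD5 hash but belong to different emails, so both are retained. Email 5 duplicates email 1, so its family is dropped and its custodian is merged into the kept copy.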
Upon import into the Discovery platform, Nextpoint dedupes email families and loose files globally across all custodians. To do so, an MD5 hash value is generated. For emails, the hash is generated from Date Sent, Sender Name, Sender Email Address, Recipient Email Addresses, Display To, Display CC, Display BCC, Subject, Body, Attachment Names, and Attachment Size. For loose files, it is generated from the bit stream of the file.
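As a rough illustration of the two kinds of hash, the sketch below uses Python's standard `hashlib`. The field names, their order, the text encoding, and the separator byte are all assumptions; the article lists which email fields contribute to the hash but not how they are normalized or combined, so these values will not match the hashes Nextpoint generates.

```python
import hashlib

# Hypothetical field names standing in for the email fields listed above.
EMAIL_HASH_FIELDS = [
    "date_sent", "sender_name", "sender_email", "recipient_emails",
    "display_to", "display_cc", "display_bcc", "subject", "body",
    "attachment_names", "attachment_size",
]


def email_md5(email):
    """Hash the listed email fields in a fixed order."""
    h = hashlib.md5()
    for field in EMAIL_HASH_FIELDS:
        h.update(str(email.get(field, "")).encode("utf-8"))
        h.update(b"\x1f")  # separator so adjacent fields cannot run together
    return h.hexdigest()


def file_md5(data):
    """Hash the raw bytes (bit stream) of a loose file."""
    return hashlib.md5(data).hexdigest()


first = {"date_sent": "2020-01-15T14:30:00Z", "subject": "Q3 forecast", "body": "See attached."}
copy = dict(first)
edited = dict(first, subject="Q3 forecast (revised)")

print(email_md5(first) == email_md5(copy))    # True: identical fields, identical hash
print(email_md5(first) == email_md5(edited))  # False: any field change alters the hash
print(file_md5(b""))                          # d41d8cd98f00b204e9800998ecf8427e
```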
Archives with zero extracted files or mismatched expected file count (coming soon) will be addressed on import in a quality control pass. Individual file processing and indexing errors will not be addressed, only reported upon.
- Video/Audio Transcription*
- Language Detection (will occur on all imports) and Translation*
- Image Recognition*
- Entity Recognition/PII*
*These services may incur additional costs. Reach out to your client success representative for details.
Next up: Nextpoint EDA - Project Dashboard
Or view one of the other support resources in the Nextpoint EDA series:
Nextpoint EDA – Getting Started
Nextpoint EDA - Searches and Search Groups