Nextpoint EDA – Uploading and Importing Data


The first step in mining your data is to upload and import it into the Nextpoint EDA tool. 

Step 1: Getting Started with Imports

  • First-time user without imported data: As a first-time user, you will be prompted immediately to import your data from the Dashboard.
  • Returning user with existing data: To import new data, navigate to the Import tab and select “New Import”.

Step 2: Naming Your Import and Selecting Your Source for Nextpoint EDA

To import files directly into a Nextpoint EDA project, you can add any external S3 source, including your Nextpoint database(s). Once a location has been added and successfully verified, it is saved and available for all future imports. Each Nextpoint EDA project also comes with a pre-created Nextpoint EDA Repository that can be used directly to house source data.

Naming the Import and Selecting an Existing Source


    1. Name your import. This name will appear later on your import batch list, so make the name clear and unique to this import data set.
    2. Select the source of the data (your Nextpoint EDA S3 repository, a Nextpoint File Room, or an external S3 repository). If you need to add a new source location, see the next section, "Adding a New Source Location".
    3. Click the "Next" button.

Adding a New Source Location


Amazon S3 sources are cloud storage locations used to house large data sets. Your Discovery or Litigation Nextpoint File Room is one example of an S3 location. To add an S3 location other than your Nextpoint EDA repository (such as a Nextpoint File Room) to your Nextpoint EDA project:

  1. Click “Add New” at the bottom of the new import window.
  2. Name your new S3 location (e.g., "Hoven v. Enron Discovery Database").
  3. Copy and paste your AWS Access Key ID into the text box below that option. In a Nextpoint database, the Access Key ID, Secret Access Key, and File Room Path can all be found in the “Settings” tab under “Import,” in the "File Room" section. For more information about accessing your AWS keys and File Room Path, see the related Nextpoint support article.
  4. Copy and paste your Secret Access Key into the text box below that option.
  5. Copy and paste your File Room Path into the text box below that option.
  6. Click “Add” and confirm that the system verified your credentials. A green checkmark and the word "Success" should appear next to the new source.
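A key that was truncated or pasted with a stray space will fail verification in step 6. If you want to check your values before clicking “Add,” the Python sketch below is one way to do it. It is a hypothetical helper (check_credentials is not part of Nextpoint EDA) and it assumes the common AWS formats: a 20-character access key ID beginning with "AKIA" (long-term keys) or "ASIA" (temporary keys), and a 40-character secret access key.

```python
import re

# Hypothetical pre-check for copy-paste mistakes; NOT how Nextpoint EDA verifies credentials.
# Assumed formats: 20-character access key IDs starting with "AKIA" (long-term)
# or "ASIA" (temporary), and 40-character secret access keys.
ACCESS_KEY_ID_RE = re.compile(r"^(AKIA|ASIA)[A-Z0-9]{16}$")
SECRET_KEY_RE = re.compile(r"^[A-Za-z0-9/+=]{40}$")


def check_credentials(access_key_id: str, secret_access_key: str) -> list[str]:
    """Return a list of likely problems with pasted credentials (empty if none found)."""
    problems = []
    for label, value in (("Access Key ID", access_key_id),
                         ("Secret Access Key", secret_access_key)):
        if value != value.strip():
            problems.append(f"{label} has leading or trailing whitespace")
    if not ACCESS_KEY_ID_RE.match(access_key_id.strip()):
        problems.append("Access Key ID does not look like a 20-character AKIA/ASIA key")
    if not SECRET_KEY_RE.match(secret_access_key.strip()):
        problems.append("Secret Access Key does not look like a 40-character secret")
    return problems


if __name__ == "__main__":
    # AWS's published example credentials, not real keys.
    print(check_credentials("AKIAIOSFODNN7EXAMPLE",
                            "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"))  # prints []
```

An empty list only means the values look well formed; the “Success” check in Nextpoint EDA is still what confirms the keys actually work.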

Nextpoint EDA S3 Repository

If you choose to import directly from your Nextpoint EDA S3 repository, a tooltip on the import screen labeled “How do I transfer files into my Nextpoint EDA repository?” explains how to move your data into the repository. It also provides the AWS Access Key ID, Secret Access Key, and File Path that you will need to enter into your external transfer tool.


Using these keys, you can transfer your data into your repository with any tool that supports uploading to Amazon S3 with an access key.

Source Errors

If you get an error when adding an external S3 location after entering your keys, the cause may be the bucket's CORS (cross-origin resource sharing) configuration. To add a CORS configuration to an S3 bucket:

  1. Sign in to the AWS Management Console and open the Amazon S3 console at https://console.aws.amazon.com/s3/.
  2. In the Buckets list, choose the name of the bucket you want to add the CORS configuration to.
  3. Choose Permissions.
  4. In the Cross-origin resource sharing (CORS) section, choose Edit.
  5. In the CORS configuration editor text box, type or paste the following configuration, or edit the existing configuration to match it:
[
    {
        "AllowedHeaders": [
            "authorization",
            "content-length",
            "content-md5",
            "content-type",
            "host",
            "origin",
            "x-amz-acl",
            "x-amz-content-sha256",
            "x-amz-date",
            "x-amz-meta-path",
            "x-amz-meta-qqfilename",
            "x-amz-security-token",
            "x-amz-server-side-encryption",
            "x-amz-user-agent",
            "amz-sdk-invocation-id",
            "amz-sdk-request",
            "x-amz-bucket-region",
            "x-amz-expected-bucket-owner"
        ],
        "AllowedMethods": [
            "GET",
            "POST",
            "PUT",
            "HEAD"
        ],
        "AllowedOrigins": [
            "*"
        ],
        "ExposeHeaders": [
            "ETag"
        ],
        "MaxAgeSeconds": 3000
    }
]

  6. The CORS configuration is a JSON document, so the text you enter in the editor must be valid JSON. For more information, see "CORS configuration" in the Amazon S3 documentation.
  7. Choose Save changes.
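Since the editor only accepts valid JSON, you may find it easier to check an edited configuration before pasting it. The Python sketch below (validate_cors is a hypothetical helper) parses the text, checks that each rule has the AllowedMethods and AllowedOrigins keys S3 requires, and checks that it uses only the methods S3 supports in CORS rules (GET, PUT, POST, DELETE, HEAD). It is a local sanity check, not a replacement for the S3 console's own validation.

```python
import json

# Methods Amazon S3 accepts in a CORS rule.
VALID_METHODS = {"GET", "PUT", "POST", "DELETE", "HEAD"}


def validate_cors(text: str) -> list[str]:
    """Parse a CORS configuration and return a list of problems (empty if it looks valid)."""
    try:
        rules = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(rules, list) or not rules:
        return ["configuration must be a non-empty JSON array of rules"]
    problems = []
    for i, rule in enumerate(rules):
        if not isinstance(rule, dict):
            problems.append(f"rule {i} is not a JSON object")
            continue
        # Every S3 CORS rule needs at least one allowed method and origin.
        for key in ("AllowedMethods", "AllowedOrigins"):
            if not rule.get(key):
                problems.append(f"rule {i} is missing {key}")
        bad = set(rule.get("AllowedMethods", [])) - VALID_METHODS
        if bad:
            problems.append(f"rule {i} has unsupported methods: {sorted(bad)}")
    return problems


if __name__ == "__main__":
    sample = ('[{"AllowedHeaders": ["authorization"], "AllowedMethods": ["GET", "PUT"], '
              '"AllowedOrigins": ["*"]}]')
    print(validate_cors(sample))       # prints []
    print(validate_cors(sample[:-1]))  # reports invalid JSON (missing closing bracket)
```

To check the full configuration above, paste it between triple quotes and pass it to validate_cors.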

Still Getting Errors?

If errors persist, the bucket may not grant permission to list and download its objects. The bucket policy below grants those two permissions (s3:ListBucket and s3:GetObject). Note: this policy only allows reading from the bucket, so it works for imports but not for exports.

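  1. Sign in to the AWS Management Console and open the Amazon S3 console at https://console.aws.amazon.com/s3/.
  2. In the Buckets list, choose the name of the bucket that you want to create a bucket policy for.
  3. Choose Permissions.
  4. In the Bucket policy section, choose Edit.
  5. In the policy editor text box, type or paste the following policy. Replace bucketname with the name of your bucket, and replace the example Principal ARN (account 111122223333, user import-user) with the ARN of the IAM user whose access keys you entered. Bucket policies must include a Principal element.

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": { "AWS": "arn:aws:iam::111122223333:user/import-user" },
          "Action": "s3:GetObject",
          "Resource": "arn:aws:s3:::bucketname/*"
        },
        {
          "Effect": "Allow",
          "Principal": { "AWS": "arn:aws:iam::111122223333:user/import-user" },
          "Action": "s3:ListBucket",
          "Resource": "arn:aws:s3:::bucketname"
        }
      ]
    }

  6. Choose Save changes.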
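The two statements need different Resource ARNs: s3:ListBucket applies to the bucket itself (no "/*"), while s3:GetObject applies to the objects inside it ("/*"). Mixing them up is an easy mistake when editing by hand. The Python sketch below is a hypothetical helper that fills in a bucket name and principal for you; the bucket name and IAM user ARN in the usage line are placeholders. Amazon S3 requires a Principal element in bucket policies, so the helper adds one to each statement.

```python
import json


def read_only_bucket_policy(bucket: str, principal_arn: str) -> str:
    """Build a bucket policy letting one IAM principal list a bucket and download its objects."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:GetObject",
                # Object-level permission: applies to every key in the bucket.
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:ListBucket",
                # Bucket-level permission: the ARN has no "/*" suffix.
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    # Placeholder values: substitute your own bucket name and IAM user ARN.
    print(read_only_bucket_policy("example-bucket",
                                  "arn:aws:iam::111122223333:user/import-user"))
```

Because the policy only contains read actions, the output matches the note above: it supports imports, not exports.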

If uploading the data yourself is not possible or you have questions about your specific situation, reach out to your client success manager for other options. 

Note

Linking data from your File Room to the Nextpoint EDA tool creates a copy of that data in the Nextpoint EDA repository. If you are uploading new data, we recommend placing it directly into your Nextpoint EDA repository. If the data has already been uploaded to your database's File Room, it is fine to link it from there for Nextpoint EDA.

 

Step 3: Selecting Data for Import


  1. Review your selections from the previous step, such as "Import Name" and "Source Selected." The data within the selected source is shown in the table.
  2. Optionally, assign one or more custodians to the import (if applicable). Custodians added to a batch are currently assigned to all files in the batch, so a custodian cannot be assigned to only part of a batch. Custodians can also be added or edited after the import completes (see the Custodians section below for details).
  3. Select the folder or file to import. Currently, only a single folder or file can be imported per batch.
  4. Click “Import.”
  5. The import list shows the batch as "queued" (waiting to start processing), then “processing,” and finally “complete.” On very rare occasions a batch will show as “failed”; if this happens, contact the support team so they can identify the issue with the import.
  6. To download a CSV of your import batch list, click “Export CSV” at the bottom of the batch list.
  7. Click the three-dot menu next to any import batch to "Edit Import Details" or "Download CSV Error Report."

 

Considerations for Importing

Supported File Types

 

Full text + Metadata | Metadata only | Can be Identified, not processed

pst

zip

mbox

eml

msg

jpg

png

tiff

bmp

gif

pdf

rtf

txt

doc

docx

xls

xlsx

ppt

pptx

dat

data

csv

htm

html

mht

mhtml

xml

mp3

wav

flac

mp4

m4v

m4a

mov

mpg

ics

vcf

flv

pnm

pbm

pgm

ppm

ps

svg

emlx

mbx

anything encrypted

 

Extracted Metadata
Nextpoint EDA extracts the following metadata on import:
Universal Fields | File Type Based*

import_path

ancestry

file_type

file_size

md5

s3_path

status

searchability

project_id

batch_id

file_id

family_id

unique_id

mailbox_path

author

content_type

creation_date

creator

subject

language

email_from

email_to

email_cc

email_bcc

email_subject

email_date

email_content_transfer_encoding

email_content_type

email_in_reply_to

email_message_id

email_thread_index

email_thread_topic

has_children

family_date

*Fields in this category depend on file type.

Password-Protected Files

Files may be password protected. A password-protected archive prevents the documents inside it from being extracted, and a password-protected individual file may not be indexed. It is most efficient to provide any known passwords before processing.

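Because encrypted archives block extraction, finding them before upload lets you gather passwords ahead of processing. The Python sketch below is a hypothetical local check for ZIP archives only: it reads bit 0 of each entry's general-purpose flag, which the ZIP format uses to mark an entry as encrypted. It does not detect password-protected PST files, PDFs, or Office documents, and it is not part of Nextpoint EDA.

```python
import zipfile


def encrypted_members(zip_path: str) -> list[str]:
    """Return the names of entries in a ZIP archive that are password protected."""
    with zipfile.ZipFile(zip_path) as archive:
        # Bit 0 of an entry's general-purpose flag marks it as encrypted.
        return [info.filename for info in archive.infolist() if info.flag_bits & 0x1]


if __name__ == "__main__":
    import sys

    # Usage: python check_zip.py archive1.zip archive2.zip ...
    for path in sys.argv[1:]:
        locked = encrypted_members(path)
        if locked:
            print(f"{path}: {len(locked)} password-protected entries, e.g. {locked[0]}")
```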
Custodians

Nextpoint will assign custodians upon request. Note that a custodian is not intrinsic to the data itself; rather, a custodian is an employee or other person or group with ownership, custody, or control over potentially relevant information. For example, an individual custodian's electronically stored information (ESI) usually includes their mail file, whereas a group custodian's ESI may include a shared network folder. For this reason, custodians cannot be assigned without direction about how the data was collected.

Email archives that were collected and combined into a single PST file with multiple folders can be split among multiple custodians after processing is complete. Assigning more than 10 custodians in a single import may incur an additional hourly charge.

To add or edit the custodian(s) assigned to an import batch, click the three dots next to the import and select "Edit Assigned Custodians." Click the "-" next to an existing custodian to remove them, or click the "+ Assign Custodian" link to add a new or existing custodian. After selecting a custodian, click "Assign"; their name should then appear in the list of existing custodians. Click "Save" to apply the change to the data in this batch.


 

Dates and Time Zones

Documents are processed and standardized in Coordinated Universal Time (UTC) unless otherwise requested. UTC is used for all date filters and to standardize all datetime metadata fields. The offset from GMT of the time zone the data was processed in can also be recorded in the document metadata. For example, if the data was processed in GMT-5, this field would be populated with -5.00.
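The UTC normalization and the offset value can be illustrated with Python's standard datetime module. This is a sketch, not Nextpoint's processing code, and to_utc and offset_label are hypothetical names. The "-5.00" format comes from the example above; the sketch assumes it means decimal hours, which the article only confirms for whole-hour offsets.

```python
from datetime import datetime, timedelta, timezone


def to_utc(dt: datetime) -> datetime:
    """Normalize a timezone-aware datetime to UTC, the default processing time zone."""
    if dt.tzinfo is None:
        raise ValueError("datetime must be timezone-aware")
    return dt.astimezone(timezone.utc)


def offset_label(dt: datetime) -> str:
    """Format dt's offset from GMT as decimal hours, e.g. GMT-5 -> "-5.00" (assumed format)."""
    offset = dt.utcoffset()
    if offset is None:
        raise ValueError("datetime must be timezone-aware")
    return f"{offset.total_seconds() / 3600:.2f}"


if __name__ == "__main__":
    # An email sent at 9:30 in GMT-5.
    sent = datetime(2001, 10, 15, 9, 30, tzinfo=timezone(timedelta(hours=-5)))
    print(to_utc(sent).isoformat())  # prints 2001-10-15T14:30:00+00:00
    print(offset_label(sent))        # prints -5.00
```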

Master Date

A document's master date is the date used for filtering and date restrictions. For emails and their attachments, the master date is the sent date of the parent email; for efiles (loose electronic files), it is the last modified date.

Date restrictions are inclusive: documents whose master date falls on a chosen boundary date are kept.
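The master date rule and the inclusive restriction can be sketched as follows. The Doc record and its field names are hypothetical, the datetimes are assumed to be timezone-aware, and treating documents with no master date as excluded is an assumption the article does not address.

```python
from dataclasses import dataclass
from datetime import date, datetime, timezone
from typing import Optional


@dataclass
class Doc:
    # Hypothetical document record; field names are illustrative only.
    doc_id: str
    is_email_family: bool                     # True for emails and their attachments
    parent_sent: Optional[datetime] = None    # sent date of the parent email
    last_modified: Optional[datetime] = None  # last modified date (efiles)


def master_date(doc: Doc) -> Optional[datetime]:
    """Parent email's sent date for email families; last modified date for efiles."""
    return doc.parent_sent if doc.is_email_family else doc.last_modified


def within_range(doc: Doc, start: date, end: date) -> bool:
    """Inclusive restriction: a document dated on start or end is kept."""
    md = master_date(doc)
    if md is None:
        return False  # assumption: undated documents are excluded
    # Compare calendar days in UTC, the default processing time zone.
    return start <= md.astimezone(timezone.utc).date() <= end


if __name__ == "__main__":
    late = Doc("efile-1", False,
               last_modified=datetime(2001, 12, 2, 23, 59, tzinfo=timezone.utc))
    print(within_range(late, date(2001, 11, 1), date(2001, 12, 2)))  # prints True
```

Note that an attachment inherits its parent email's sent date, so its own last modified date never affects whether it is kept.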

Deduplication

Documents are deduplicated by email message ID or, if no message ID is available, by MD5 hash. When files share a message ID (or MD5 hash), only one native copy is stored in the system and, by default, their metadata is merged. However, deduplication never splits up a document family: attachments with matching MD5 hashes that are attached to two different emails are retained as separate documents. Deduplication is performed globally within each project, across all batches and custodians.

Currently, this feature cannot be turned off or customized.

Upon import into the Discovery platform, Nextpoint dedupes email families and loose files globally across all custodians. To do so, it generates an MD5 hash value: for emails, from the Date Sent, Sender Name, Sender Email Address, Recipient Email Addresses, Display To, Display CC, Display BCC, Subject, Body, Attachment Names, and Attachment Size; for loose files, from the file's bit stream.
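The sketch below models the rules from the first Deduplication paragraph: a message ID when one exists, otherwise an MD5 hash; one kept copy per key across all batches; merged metadata; and no splitting of families. The Item record is hypothetical, custodians stand in for merged metadata, and the field-based email hash described for the Discovery platform is not modeled.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Item:
    # Hypothetical top-level item (an email family or a loose file), not Nextpoint's data model.
    name: str
    content: bytes
    message_id: Optional[str] = None
    custodians: set = field(default_factory=set)
    attachments: list = field(default_factory=list)  # children always stay with their parent


def dedupe_key(item: Item) -> str:
    """Use the email message ID when available, otherwise the MD5 hash of the file's bytes."""
    if item.message_id:
        # Prefixes keep a message ID from ever colliding with an MD5 digest.
        return "msgid:" + item.message_id
    return "md5:" + hashlib.md5(item.content).hexdigest()


def deduplicate(items: list) -> list:
    """Keep the first copy of each key across all batches and merge custodians into it.

    Only top-level items are compared, so an attachment is never pulled out of its
    family, even if an identical attachment exists under a different email.
    Note: the kept Item is updated in place.
    """
    kept: dict = {}
    for item in items:
        key = dedupe_key(item)
        if key in kept:
            kept[key].custodians |= item.custodians
        else:
            kept[key] = item
    return list(kept.values())


if __name__ == "__main__":
    loose_a = Item("memo.docx", b"same bytes", custodians={"Lay"})
    loose_b = Item("memo_copy.docx", b"same bytes", custodians={"Skilling"})
    unique = deduplicate([loose_a, loose_b])
    print(len(unique), sorted(unique[0].custodians))  # prints 1 ['Lay', 'Skilling']
```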

Import QC

During a quality control pass on import, archives that yield zero extracted files, or (coming soon) a file count that does not match the expected count, will be addressed. Individual file processing and indexing errors are reported but not addressed.
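Zero-file archives are caught during the quality control pass, but you can run a similar check before uploading. The Python sketch below is a hypothetical local pre-check for ZIP archives only (PST and MBOX files would need other tools); qc_archive is an illustrative name, and the expected count is a number you supply yourself, for example from your collection records.

```python
import zipfile
from typing import Optional


def qc_archive(zip_path: str, expected_count: Optional[int] = None) -> list[str]:
    """Flag a ZIP archive that contains no files, or a different number of files than expected."""
    with zipfile.ZipFile(zip_path) as archive:
        # Directory entries carry no content, so only count real files.
        count = sum(1 for info in archive.infolist() if not info.is_dir())
    issues = []
    if count == 0:
        issues.append("archive contains zero files")
    if expected_count is not None and count != expected_count:
        issues.append(f"expected {expected_count} files, found {count}")
    return issues


if __name__ == "__main__":
    import sys

    # Usage: python qc_zip.py archive.zip [expected_count]
    expected = int(sys.argv[2]) if len(sys.argv) > 2 else None
    print(qc_archive(sys.argv[1], expected) or "no issues found")
```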

Additional Options
  • Video/Audio Transcription*
  • Language Detection (will occur on all imports) and Translation*
  • Image Recognition*
  • Entity Recognition/PII*

*These services may incur additional costs. Reach out to your client success representative for details. 

 

Next up: Nextpoint EDA - Project Dashboard

Or view one of the other support resources in the Nextpoint EDA series:

Nextpoint EDA – Getting Started

Nextpoint EDA - Searches and Search Groups

Nextpoint EDA - Search Guide

Nextpoint EDA - Exporting Reports and Data

Nextpoint EDA - Glossary
