Data Mining – Uploading and Importing Data

The first step in mining your data is to upload and import it into the Data Mining tool. 

Step 1: Getting Started with Imports

  • First-Time User: As a first-time user, you will be prompted to import your data as soon as you reach the Dashboard.
  • Returning User: To import new data, navigate to the Import tab and select “New Import.”

Step 2: Naming your Import and Selecting Your Source for Data Mining

To import files directly into a Data Mining project, you can add any outside S3 source, including your Nextpoint database(s). Once a location has been added and successfully verified, the source is saved and remains available for all future imports. Each Data Mining project also comes with a pre-created Data Mining Repository, which can house source data directly.

Naming the Import and Selecting an Existing Source

Import_P1.png

    1. Name your import. This name will appear later on your import batch list, so make the name clear and unique to this import data set.
    2. Select the source of the data (your Data Mining s3 repository, a Nextpoint File Room, or an external s3 Repository). If you need to add a new source location, see “Adding a New Source Location” below.
    3. Click the "Next" button.

Adding a New Source Location

Add_s3_locationFilledNEW.png

To add a non-data mining s3 location to your data mining project: 

  1. Click on “Add New” at the bottom of a new import window. 
  2. Name your new s3 location (e.g. Hoven v. Enron Discovery Database).
  3. Copy and Paste in your AWS Access Key ID.
  4. Copy and Paste in your Secret Access Key.
  5. Copy and Paste in your File Room Path. In a Nextpoint database, all of these can be found in the “Settings” tab under “Import” in the "File Room" section. For more information about accessing your AWS keys, visit this support article.
  6. Click “Add” and confirm that the system was able to verify your credentials. A green “Success” with a checkmark should appear next to the new source you added.

If you select your Data Mining s3 repository, a tool tip on the import screen labeled “How do I transfer files into my Data Mining repository?” will give you the AWS Access Key ID, Secret Access Key, and File Path for that repository. With these keys, you can use any of the same tools listed below to transfer your data into your repository.
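
For readers who prefer to script the transfer, the following is a minimal sketch in Python. It assumes the third-party boto3 library, which this article does not name, and it assumes the File Path from the tool tip has a "bucket/prefix" shape. The function names and any bucket, prefix, and folder values are hypothetical placeholders; substitute the values shown in the tool tip.

```python
from pathlib import Path, PurePosixPath


def split_file_path(file_path: str) -> tuple[str, str]:
    """Split an assumed 'bucket/prefix' style File Path into (bucket, prefix)."""
    cleaned = file_path.removeprefix("s3://").strip("/")
    bucket, _, prefix = cleaned.partition("/")
    return bucket, prefix


def build_key(prefix: str, local_root: Path, local_file: Path) -> str:
    """Map a local file to an object key under the repository prefix."""
    relative = local_file.relative_to(local_root).as_posix()
    return str(PurePosixPath(prefix) / relative) if prefix else relative


def upload_folder(local_root: Path, file_path: str,
                  access_key_id: str, secret_access_key: str) -> int:
    """Upload every file under local_root and return how many were sent."""
    import boto3  # third-party; imported here so the helpers above run without it

    client = boto3.client(
        "s3",
        aws_access_key_id=access_key_id,
        aws_secret_access_key=secret_access_key,
    )
    bucket, prefix = split_file_path(file_path)
    count = 0
    for local_file in sorted(p for p in local_root.rglob("*") if p.is_file()):
        client.upload_file(str(local_file), bucket,
                           build_key(prefix, local_root, local_file))
        count += 1
    return count
```

Keeping the folder layout in the object keys means the uploaded data appears in the repository with the same folder structure it had locally, which makes it easier to pick a single folder for an import batch.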

If uploading the data yourself is not possible, reach out to your client success manager for other options. 

Note

Linking data from your file room to the Data Mining tool will create a copy of the data in the Data Mining repository. If you are uploading new data, we recommend placing it directly into your Data Mining repository.

DM S3 Repository

If you choose to import directly from your Data Mining s3 repository, a tool tip on the import screen labeled “How do I transfer files into my Data Mining repository?” will guide you through how to pull your data into your DM repository. The required AWS Access Key ID, Secret Access Key, and File Path will be provided here for input into your external sources.

s3_tool_tip.png

 

Step 3: Selecting Data for Import

      Select_File_Update.png

  1. Review your selections from the previous step, such as the import name and the selected location/file path. The data within the selected source appears in the table below.
  2. Optionally, assign one or more custodians to the import (if applicable). At this time, custodians added to a batch are assigned to all files in the batch, so a custodian cannot be assigned to only certain parts of a batch.
  3. Select the folder or file for import. At this time only a single folder or file is eligible for import in one batch.
  4. Click “Import.”

      Import_List_Update.png

  5. The import list will show the batch as “processing” until the batch is “complete.” On very rare occasions a batch will show as “failed,” at which point you should contact the support team to identify the issue with the import.
  6. To download a csv of your import batch list, click on “Export CSV” at the bottom of the batch list.
  7. If you click on the hamburger (3 dot) menu next to any import batch, you have the option to "Edit Import Details" or "Download CSV Error Report." The error report will 

 

Considerations for Importing

Supported file types

 

Full text + Metadata Metadata only Can be Identified, not processed

pst

zip

mbox

eml

msg

jpg

png

tiff

bmp

gif

pdf

rtf

txt

doc

docx

xls

xlsx

ppt

pptx

dat

data

csv

htm

html

mht

mhtml

xml

mp3

wav

flac

mp4

m4v

m4a

mov

mpg

ics

vcf

flv

pnm

pbm

pgm

ppm

ps

svg

emlx

mbx

anything encrypted

 

Password Protected Files

Files may be password protected. Password protected archives prevent documents from being extracted. Password protected individual files may prevent indexing of that file. It is most efficient to provide any possible passwords prior to processing.
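
To find out before uploading which passwords to gather, you can scan ZIP archives locally. The sketch below uses only Python's standard zipfile module; the function name is hypothetical, and the check covers ZIP entries only, not PST files or other formats.

```python
import zipfile


def encrypted_members(archive) -> list[str]:
    """Return names of ZIP entries whose encryption flag is set.

    `archive` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        # Bit 0 of each entry's general-purpose flag marks it as encrypted.
        return [info.filename for info in zf.infolist() if info.flag_bits & 0x1]
```

Listing entries does not require the password, so this check works on protected archives without attempting to open their contents.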

Custodians

Nextpoint will assign custodians upon request. Please note that the custodian of a piece of data is not intrinsic to that data; rather, the custodian is an employee or other person or group with ownership, custody, or control over potentially relevant information. For example, an individual custodian's electronically stored information (ESI) usually includes their mail file, whereas a group custodian's ESI may include a shared network folder. Because of this, custodians cannot be assigned without direction as to how the data was collected.

Email archives collected and combined into a single PST file with multiple folders can be split among multiple custodians after processing has been completed. Assignment of more than 10 custodians in a single import may be billed as an additional hourly charge.

Dates and Time zones

Documents are standardized and processed into Coordinated Universal Time (UTC) unless otherwise requested. This time zone is used for all date filters and to standardize any datetime metadata fields. A time zone offset can be provided in document metadata, expressed as the offset from GMT in which the data was processed. For example, if the data was processed in GMT-5, this field would be populated with -5.00.
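
The relationship between a local timestamp, its offset, and the standardized UTC value can be sketched as follows. This is an illustration using Python's standard library, not the tool's actual processing code; the function names are hypothetical, and reading the offset as decimal hours is an assumption based on the -5.00 example.

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_naive: datetime, offset_hours: float) -> datetime:
    """Interpret a naive timestamp at the given offset and convert it to UTC."""
    local_tz = timezone(timedelta(hours=offset_hours))
    return local_naive.replace(tzinfo=local_tz).astimezone(timezone.utc)


def format_offset(offset_hours: float) -> str:
    """Render an offset in the style of the -5.00 example (assumed decimal hours)."""
    return f"{offset_hours:.2f}"


# An email sent at 9:30 AM in GMT-5 is standardized to 2:30 PM UTC.
sent_utc = to_utc(datetime(2021, 3, 1, 9, 30), -5)
```

Because date filters use the standardized value, a document sent late in the evening in a negative-offset time zone can fall on the following calendar day in UTC.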

Master Date

The master date of a document is the date used for filtering and date restrictions. For emails and their attachments, the master date is generated from the sent date of the parent email; for efiles, it is generated from the last modified date.

Date restrictions are inclusive: documents whose master date (as described above) falls on the chosen date are kept.
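
The master date rule and the inclusive date restriction can be expressed as a short sketch. The function and parameter names below are hypothetical, and the code only illustrates the logic described above; it is not the tool's implementation.

```python
from datetime import date, datetime
from typing import Optional


def master_date(last_modified: datetime,
                parent_date_sent: Optional[datetime] = None) -> datetime:
    """Emails and their attachments use the parent email's sent date;
    efiles (no parent email) fall back to the last modified date."""
    return parent_date_sent if parent_date_sent is not None else last_modified


def within_restriction(master: datetime, start: date, end: date) -> bool:
    """Keep a document when its master date falls on or between the chosen dates."""
    return start <= master.date() <= end
```

For example, an attachment last modified in 2019 but sent on an email dated June 30, 2021 takes the 2021 date, so a restriction ending June 30, 2021 keeps it.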

Deduplication

We deduplicate documents based on email message ID, or on MD5 hash if no email message ID is available. When files have matching email message IDs (or MD5 hashes), they are deduplicated: only one native copy is stored in the system, and their metadata is merged by default. That said, documents will not be deduplicated if doing so would split up a document family, so attachments with matching MD5 hashes that are attached to two different emails are retained as separate documents. Deduplication is done globally within each project, across all batches and custodians.
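
One way to picture this behavior is the sketch below. It is an interpretation, not the actual implementation: it assumes deduplication is decided at the top level (emails and loose files) so that attachments always stay with their parent, and it uses custodians as the example of merged metadata. The `Doc` class and function names are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Doc:
    name: str
    content: bytes
    message_id: Optional[str] = None             # present for emails
    custodians: set = field(default_factory=set)
    attachments: list = field(default_factory=list)


def dedup_key(doc: Doc) -> str:
    """Prefer the email message ID; fall back to an MD5 hash of the content."""
    if doc.message_id:
        return "msgid:" + doc.message_id
    return "md5:" + hashlib.md5(doc.content).hexdigest()


def deduplicate(top_level_docs: list) -> list:
    """Keep one copy per key and merge custodians from the duplicates.

    Attachments are never compared on their own, so families stay intact.
    """
    kept = {}
    for doc in top_level_docs:
        key = dedup_key(doc)
        if key in kept:
            kept[key].custodians |= doc.custodians
        else:
            kept[key] = doc
    return list(kept.values())
```

Under these assumptions, the same report attached to two different emails survives twice, once in each family, while two copies of the same email collapse into one record listing both custodians.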

Currently, this feature cannot be turned off or customized.

Upon import into the Discovery platform, Nextpoint dedupes email families and loose files globally across all custodians. To do so, an MD5 hash value is generated. For emails, the hash is generated from Date Sent, Sender Name, Sender Email Address, Recipient Email Addresses, Display To, Display CC, Display BCC, Subject, Body, Attachment Names, and Attachment Size. For loose files, it is generated from the bit stream of the file.
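
As a rough illustration of hashing an email from the fields listed above, consider the sketch below. The dictionary keys, the field order, the separator, and the text encoding are all assumptions made for the example; the exact way Nextpoint combines these fields is not described here, so hashes from this sketch will not match the platform's values.

```python
import hashlib

# Hypothetical key names for the fields named in the article.
EMAIL_HASH_FIELDS = (
    "date_sent", "sender_name", "sender_email", "recipient_emails",
    "display_to", "display_cc", "display_bcc", "subject", "body",
    "attachment_names", "attachment_size",
)


def email_md5(email: dict) -> str:
    """Hash the listed email fields, joined with a separator (assumed)."""
    joined = "\x1f".join(str(email.get(name, "")) for name in EMAIL_HASH_FIELDS)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def loose_file_md5(data: bytes) -> str:
    """Hash a loose file from its raw bytes."""
    return hashlib.md5(data).hexdigest()
```

The practical consequence is that two emails differing in any listed field, even a single character of the subject, produce different hashes and are treated as distinct, while loose files match only when their bytes are identical regardless of file name.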

Import QC

Archives with zero extracted files or mismatched expected file count (coming soon) will be addressed on import in a quality control pass. Individual file processing and indexing errors will not be addressed, only reported upon.

Additional Options

  • Video/Audio Transcription*
  • Language Detection (will occur on all imports) and Translation*
  • Image Recognition*
  • Entity Recognition/PII*

*These services may incur additional costs. Reach out to your client success representative for details. 

Next up: Data Mining - Project Dashboard

Or view one of the other support resources in the Data Mining series:

Data Mining – Getting Started

Data Mining - Searching and Slicing Data

 

Data Mining Search Guide

Data Mining - Exporting Reports and Data

Data Mining - Glossary
