Import Types and Import Data Settings


Below, we take a closer look at the different Import Types and Import Data Settings components of our improved import experience.  Click here to review the complete guided import workflow.

Table of Contents

  1. Import Types
  2. Import Data Settings

Imports are available only to users with Advanced user permissions.

 


Import Types

When importing data into Nextpoint, there are four different Import Types.  Each import type has corresponding pre-set recommendations for Deduplication and DeNIST settings.

If you initiate the import process by selecting your files from the File Room, Nextpoint will detect the type of data you selected (single mailbox, loose files, or produced data with load file) and automatically set the Import Type in Step 1 of the guided import workflow. You will then be taken directly to Step 2, Import Data Settings.

If you initiate the import process by navigating to DATA > Imports, you will be taken to Step 1 to make your Import Type selection.  Since you have not yet selected your files for import, Nextpoint needs to know what type of data you intend to import.  After making this selection, you will rejoin the 'Import selected files from File Room' workflow at Step 2, Import Data Settings.

ImportWizard_Initiate_Import_Import_Types.png

 

A closer look at the different Import Types

Single Mailbox

Any single mailbox container file (pst, mbox).

Our recommended best practice is to import one mailbox at a time.  You may import multiple mailboxes at once, but your import will be recognized as a Multiple Files Import Type.  Please note that only one custodian assignment is allowed per import batch, so at a minimum, consider limiting each mailbox import batch to a single custodian.

Multiple Files

Any single non-mailbox file, any selection of multiple loose files (including PDFs, Office files, mailboxes, and archives), or any folder that does not contain a nextpoint_load_file.csv in the first level.

Production with Load File

Any folder containing a load file titled nextpoint_load_file.csv in the first level.

Our recommended best practice for production data sets is to upload to the File Room unzipped.  If you upload a zipped production, then it will be considered a Multiple Files Import Type.  

Manual

Any type of file selection.  This import type will be most applicable with the upcoming release of our load file mapper.  While we are simplifying the load file mapping process, we recognize some users may have existing workflows for produced data imports which they would like to maintain and bypass the load file mapper.
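
The detection rules described above for File Room selections can be summarized in a short sketch. This is only an illustration of the rules as written in this article, not Nextpoint's actual implementation; the function name, its parameters, and the way a selection is represented are all assumptions made for the example. Manual is not detected automatically, so it does not appear here.

```python
from pathlib import PurePosixPath

# Values taken from this article; the data model below is hypothetical.
MAILBOX_EXTENSIONS = {".pst", ".mbox"}
LOAD_FILE_NAME = "nextpoint_load_file.csv"


def detect_import_type(selected_files, first_level_names=None):
    """Guess the Import Type for a File Room selection.

    selected_files: names of the selected files (empty if a folder was selected).
    first_level_names: names found in the first level of a selected folder,
        or None when no folder was selected.
    """
    if first_level_names is not None:
        # A folder counts as a production only if the load file sits in its first level.
        if LOAD_FILE_NAME in first_level_names:
            return "Production with Load File"
        return "Multiple Files"

    if len(selected_files) == 1:
        suffix = PurePosixPath(selected_files[0]).suffix.lower()
        if suffix in MAILBOX_EXTENSIONS:
            return "Single Mailbox"

    # Any other single file, or any selection of multiple files
    # (including multiple mailboxes or a zipped production).
    return "Multiple Files"
```

For example, under these assumptions a zipped production such as `production.zip` falls through to Multiple Files, which matches the recommendation to upload production sets unzipped.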

 


Import Data Settings

Once your files have been selected for import,  you will be navigated to Step 2 of the import sequence, Import Data Settings.  Here, you will verify and/or outline settings applicable to your current import. 

Import_Wizard_Import_data_settings.png

Import Data Settings include the following:

  1. Type of Import:  If you initiated your import from the File Room, verify the Type of Import selected.  To modify the import type, click the pencil icon and you will be returned to Step 1 of the sequence to make your selection.
  2. Selected Files for Import: If you initiated your import from the File Room, verify the selected files.  To modify your selection, click the folder icon to access the pop-up file picker, which is populated by the File Room contents.
  3. Batch Name: Recommended for most efficient tracking once the data has been imported. 
  4. Assign Custodian on Import: Search the list of existing custodians or add a new one via the profile + icon.
  5. Add to Folder on Import: Search the list of existing folders or add a new one via the folder + icon.
  6. Deduplication and DeNIST Detection: Pre-set recommendations for Deduplication and DeNIST settings will be populated based on which type of data is detected from the File Room (or selected in Step 1).

    If you would like to modify the recommended settings, make sure the applicable toggle is turned on and click the gear to open the settings pop-up.

    Continue below for a complete list of Import Types and associated Deduplication + DeNIST settings.

 

A closer look at Deduplication and DeNIST Detection

The complete list of Import Types and their associated Deduplication and DeNIST settings is outlined below:

Import Type               | Deduplication Setting                        | DeNIST Setting
Manual                    | Dedupe - OFF                                 | DeNIST - OFF
Single Mailbox            | Dedupe - ON, File Match - ON, Context - ON   | DeNIST - OFF
Multiple Files            | Dedupe - ON, File Match - ON, Context - ON   | DeNIST - ON, Tag - ON
Production with Load File | Dedupe - OFF                                 | DeNIST - OFF
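
If you track these defaults in your own import checklists or scripts, the table above could be captured as a simple lookup. The following is a minimal sketch only; the `ImportDefaults` class and its field names are hypothetical and are not part of Nextpoint. Where Dedupe or DeNIST is OFF, the sub-settings are recorded as False here because they do not apply, which is an assumption of this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImportDefaults:
    """Hypothetical container for the pre-set recommendations in the table."""
    dedupe: bool
    file_match: bool
    context: bool
    denist: bool
    denist_tag: bool


# One entry per Import Type, mirroring the table in this article.
DEFAULTS = {
    "Manual": ImportDefaults(False, False, False, False, False),
    "Single Mailbox": ImportDefaults(True, True, True, False, False),
    "Multiple Files": ImportDefaults(True, True, True, True, True),
    "Production with Load File": ImportDefaults(False, False, False, False, False),
}
```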

 

Deduplication

Deduplication at the time of import prevents existing documents and email families from entering your database multiple times. The deduplication settings selected in the import workflow determine the definition of 'Duplicate' for the import batch at hand.

When deduplication is turned off at the time of import, no deduplication will occur and all files will be imported.

When deduplication is turned on, there are two criteria settings to consider:

  1. File Match Criteria: First, we use the File Match Criteria to identify potential duplicates.  This setting compares MD5 and Email Message ID values, and any files with matching values are treated as potential duplicates, which are then evaluated under the Context Criteria setting.  If no match is found, the file is imported. 

    Include Email Message ID in the File Match Criteria for more aggressive deduping. To dedupe conservatively, turn off the include Email-Message-ID option so that only content hash matches are considered duplicates.
  2. Context Criteria: If a file has been flagged as a potential duplicate during processing because of matching File Match Criteria, we then consider the Context Setting.
    1. If Context is OFF, we take everything flagged as potential duplicates due to File Match, and merge field values which may conflict (e.g. file_path of file A is different than file B).  We keep the first copy of the file which entered the database and discard the other(s).
    2. If Context is ON, we take any sets of duplicates and handle field value conflicts accordingly:
      • If fields from our Conflict Field List do not match in a set of duplicates, we import both.
      • If fields from our Merge Field List do not match in a set of duplicates, we merge the mismatched values into the respective field and only keep the first copy of the file which entered the database. 
      • If fields from our Ignore Field List do not match in a set of duplicates, we do nothing with the fields and only keep the first copy of the file which entered the database.

Conflict Field List | Merge Field List     | Ignore Field List
author              | document_last_author | email_sent
bcc                 | document_subject     | email_subject
created_date_time   | email_author         | last_print_date
document_author     | email_message_id     | modified_date_time
document_date       | email_reply_id       |

Note: All deduplication is evaluated at the family level.  If a loose file is added to your case and that same file is later added as part of a larger email family (or vice versa), no deduplication will occur.
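
To make the File Match and Context logic above easier to follow, here is a rough sketch of how the described decisions could play out. It is an illustration under stated assumptions, not Nextpoint's implementation: documents are modeled as plain dictionaries, all function names are hypothetical, merged values are collected in a made-up `merged_values` entry, a field "does not match" whenever the two values differ, and family-level handling is not modeled. With Context OFF, the sketch merges every differing field, which is one reading of the description above.

```python
# Field lists as given in this article. Ignore-list fields need no code:
# mismatches in them are simply left alone.
CONFLICT_FIELDS = ["author", "bcc", "created_date_time", "document_author", "document_date"]
MERGE_FIELDS = ["document_last_author", "document_subject", "email_author",
                "email_message_id", "email_reply_id"]


def is_file_match(a, b, include_message_id=True):
    """File Match Criteria: same MD5, or (optionally) same Email Message ID."""
    if a.get("md5") and a.get("md5") == b.get("md5"):
        return True
    if include_message_id:
        message_id = a.get("email_message_id")
        return bool(message_id) and message_id == b.get("email_message_id")
    return False


def has_conflict(a, b):
    """True if any Conflict Field List value differs between the two copies."""
    return any(a.get(f) != b.get(f) for f in CONFLICT_FIELDS)


def merge_field(kept, dup, field):
    """Record a differing value from the discarded copy on the kept copy."""
    new = dup.get(field)
    if new is not None and new != kept.get(field):
        values = kept.setdefault("merged_values", {}).setdefault(field, [])
        if new not in values:
            values.append(new)


def import_batch(database, batch, dedupe=True, include_message_id=True, context=True):
    """Add batch documents to database (a list), applying the dedupe settings."""
    for doc in batch:
        matches = [d for d in database
                   if dedupe and is_file_match(d, doc, include_message_id)]
        if not matches:
            database.append(dict(doc))  # no potential duplicate: import
        elif context:
            kept = next((m for m in matches if not has_conflict(m, doc)), None)
            if kept is None:
                database.append(dict(doc))  # conflict fields differ: import both
            else:
                for field in MERGE_FIELDS:
                    merge_field(kept, doc, field)
        else:
            # Context OFF: keep the first copy and merge any differing values.
            for field in doc:
                if field != "md5":
                    merge_field(matches[0], doc, field)
    return database
```

In this sketch, two copies with the same MD5 but different author values are both imported when Context is ON, while the same pair collapses to a single record (with the second author recorded under `merged_values`) when Context is OFF.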

 

DeNIST Detection

DeNIST provides a way to filter known, unnecessary files from uploaded data. During import and processing, files are checked against the National Institute of Standards and Technology (NIST) National Software Reference Library, and matching files are either tagged or removed from the import, depending on the option selected.

When DeNIST Detection is turned on, there are two options to consider:

  1. Tagging: Files found to match a DeNIST record will be imported and processed as usual, but will be assigned an additional "NIST" tag.
  2. Filtering: Files found to match a DeNIST record will be removed entirely from imports.
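
The difference between the two options can be illustrated with a small sketch. This is a simplified example, not how Nextpoint performs the check: the function name, the file representation, and the use of MD5 as the lookup hash are assumptions, and `known_hashes` stands in for a set of hashes taken from the NIST reference data.

```python
import hashlib


def apply_denist(files, known_hashes, mode="tag"):
    """Tag or filter files whose content hash appears in known_hashes.

    files: list of dicts with 'name' and 'content' (bytes); hypothetical model.
    mode: "tag" keeps matching files and adds a "NIST" tag;
          "filter" drops matching files from the import.
    """
    if mode not in ("tag", "filter"):
        raise ValueError("mode must be 'tag' or 'filter'")

    kept = []
    for f in files:
        digest = hashlib.md5(f["content"]).hexdigest()  # assumed hash type
        if digest in known_hashes:
            if mode == "filter":
                continue  # Filtering: removed entirely from the import
            # Tagging: imported as usual, with an additional "NIST" tag
            f = {**f, "tags": [*f.get("tags", []), "NIST"]}
        kept.append(f)
    return kept
```

Tagging is the more cautious of the two, since matching files stay in the import and remain available for review.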

 


To return to the complete import workflow, click here >>
