Data collection can be the most complex and technically rigorous of all eDiscovery phases.
It involves the extraction of potentially relevant electronically stored information (ESI) from its native source into a separate, secure repository for review. The collection process should be comprehensive without being over-inclusive. It should preserve the integrity of the data, the chain of custody, and the authenticity of the documents, all without disrupting the operations of the organization or individual.
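One common way to support integrity and chain-of-custody claims is to record a cryptographic hash of every collected file at the time of collection. The following is a minimal sketch, assuming the collected files sit in a local folder; the function names `sha256_of` and `build_manifest` are hypothetical and are not part of any Nextpoint tool.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root, custodian):
    """List every file under root with its size, hash, and the time it was hashed."""
    root = Path(root)
    rows = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rows.append({
            "custodian": custodian,  # hypothetical field: who the data belongs to
            "relative_path": path.relative_to(root).as_posix(),
            "size_bytes": path.stat().st_size,
            "sha256": sha256_of(path),
            "hashed_at_utc": datetime.now(timezone.utc).isoformat(),
        })
    return rows
```

Re-running the same hash later and comparing it with the manifest is one way to show that a file has not changed since it was collected.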
General Considerations for Collections
- Consider and understand your source(s): Identify your key custodians, where their data is located, and the accessibility of that information (e.g. do you need a username and password to access?). Certain source types may require special considerations when collecting to ensure you collect the entirety of the data set. For example, when collecting emails, you may want to collect both server and local copies to ensure all emails are collected.
- Consider your collection method: In conjunction with the above consideration of the who, what and where for your different sources, it is also important to consider how you will collect from each source. Outlined below are three different approaches for collections:
  - Employee self-collection (riskiest): Most employees aren’t technically savvy and are highly likely to make errors or overlook key documents. Several courts have also questioned whether employee self-collection constitutes a ‘defensible’ eDiscovery response.
  - IT collection: Your IT team understands the data and technology landscape and has the technical skill to extract everything needed, but make sure the legal team gives them clear guidance on what specifically to target. Otherwise, they are more likely to collect very broadly.
  - External collection: An outside expert is likely to have proven procedures and the necessary tools and skill to perform a collection that will withstand the highest levels of judicial scrutiny.
- Collect only what you need: More data collected means more data to process, and ultimately to review. And that all adds up to more money spent on eDiscovery. Instead, develop strong preservation and early case assessment processes, and target your collections so that you are only collecting the potentially relevant ESI—nothing more or less.
- Be Proactive: It’s always in your best interest—financially and procedurally—to be proactive in assessing your needs and determining if outside resources will be needed. Even if outside assistance or experts ultimately are not needed, it’s important to give your internal IT team early notice that a big project is potentially looming, so they can plan resources accordingly.
- Phase Your Collections: In a phased collection strategy, data is prioritized so that only the highly relevant data is collected immediately. Less relevant data is collected only when absolutely needed.
- Avoid Collecting Archived Mailboxes: Whenever possible, avoid collecting from mailboxes in an archived state, as doing so can produce unexpected email metadata, especially with Microsoft 365. For example, if a sender's email address is email@example.com, it could be reflected incorrectly in an archived state as firstname.lastname@example.org instead.
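The phased approach described above can be sketched as ranking each source and grouping sources by rank, so that only the top tier is collected immediately. This is an illustration under stated assumptions: the `Source` record, its fields, and the 1 to 3 priority scale are hypothetical, and the ranking itself would come from the legal team.

```python
from dataclasses import dataclass


@dataclass
class Source:
    custodian: str
    location: str   # e.g. "mailbox", "laptop", "shared drive"
    priority: int   # hypothetical scale: 1 = highly relevant, 3 = least relevant


def plan_phases(sources):
    """Group sources into phases keyed by priority, lowest number first."""
    phases = {}
    for source in sorted(sources, key=lambda s: (s.priority, s.custodian)):
        phases.setdefault(source.priority, []).append(source)
    return phases


sources = [
    Source("Lee", "laptop", 2),
    Source("Kim", "mailbox", 1),
    Source("Lee", "mailbox", 1),
]
phases = plan_phases(sources)
# phases[1] holds the sources to collect now; later phases wait until needed.
```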
Tactical Best Practices
- Collected Mailbox File Size: Nextpoint recommends keeping your collected mailbox files (e.g. PST, Mbox) under 10 GB when possible, with a maximum file size of 20 GB. Doing so will speed up processing, lessen the chance of corruption, and improve error correction when needed. If you have more data to collect than the recommended size, segment it into smaller sets prior to or during collection.
- If the client is self-collecting, it is recommended they do not forward data to you via email as attachments. Utilize Nextpoint’s request files instead to ensure you maintain an accurate and easily traceable chain of custody.
- Maintain clear custodian ownership when collecting so that information can be effectively assigned during import (e.g. avoid a mass collection across multiple custodians into one pst/mbox).
- For remote collections, consider how the collection will be accomplished, whether you have the custodians' login information, and the timing of the collection. Planning these details ensures the mailbox custodian has as little downtime as possible.
- Maintain clear organization in the File Room. This will ease the import and subsequent quality control processes when moving your data from the File Room into your Nextpoint database.
- If you are working with text messages, consider how you would like to organize and review the data prior to collection (e.g. a separate document for each message, a single spreadsheet with all messages, both, or something in between, such as one spreadsheet per conversation). These requirements may affect your collection method.
- If you are working with data from proprietary software, that proprietary software will likely be necessary if you would like to review images of the files. It is important to consider if the party collecting these files can obtain an image during the collection process and/or if an image will need to be generated post-collection. For more information on Nextpoint’s Custom Imaging Services, please contact email@example.com.
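The mailbox file size guidance above (under 10 GB when possible, 20 GB at most) can be checked before files are uploaded. The sketch below assumes that "GB" means 1024^3 bytes, which the guidance does not specify, and the function names and the `.pst`/`.mbox` suffix list are hypothetical.

```python
from pathlib import Path

GIB = 1024 ** 3
RECOMMENDED_BYTES = 10 * GIB   # "under 10 GB when possible"
MAXIMUM_BYTES = 20 * GIB       # "maximum file size of 20 GB"


def classify_size(size_bytes):
    """Return 'ok', 'over_recommended', or 'over_maximum' for a file size."""
    if size_bytes > MAXIMUM_BYTES:
        return "over_maximum"
    if size_bytes >= RECOMMENDED_BYTES:
        return "over_recommended"
    return "ok"


def check_mailbox_files(folder, suffixes=(".pst", ".mbox")):
    """Map each mailbox file directly inside folder to its size classification."""
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in suffixes:
            results[path.name] = classify_size(path.stat().st_size)
    return results
```

Any file reported as `over_recommended` or `over_maximum` is a candidate for segmenting into smaller sets before import.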
- What parties are involved?
- What deadlines have been agreed upon, to date?
- Have any preservation steps been taken?
- Who are your key custodians and where are they located?
- Do any of the identified custodians have direct IT resources available?
- What are each custodian's key sources (e.g. email, phone, tablet, company server)?
- How accessible is each identified key source (e.g. is it password protected)?
- Which collection method is preferred/necessary for each source (e.g. self v. external v. remote)?
- Do you anticipate the authenticity of any evidence may come into question during the course of your matter?
- Is there a priority hierarchy that can be created from all identified custodians and their respective sources?
- Are there any parameters to be applied at the time of collection (e.g. date range)?
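A collection-time parameter such as a date range can be expressed as a simple filter. The sketch below is illustrative only: it assumes email messages whose `Date` header is available as text, the function name `in_date_range` is hypothetical, and treating a header with no time zone as UTC is an assumption that should be confirmed for your matter.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def in_date_range(date_header, start, end):
    """Return True or False for a parseable Date header, or None if it cannot be parsed."""
    try:
        sent = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        return None  # unparseable date: flag for manual review instead of dropping
    if sent.tzinfo is None:
        sent = sent.replace(tzinfo=timezone.utc)  # assumption: treat as UTC
    return start <= sent <= end


# Example range: calendar year 2023, expressed in UTC.
start = datetime(2023, 1, 1, tzinfo=timezone.utc)
end = datetime(2023, 12, 31, 23, 59, 59, tzinfo=timezone.utc)
```

Returning `None` for unparseable dates keeps those messages visible for review, which is safer than silently excluding them from a collection.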