Microsoft Purview- Paint By Numbers Series (Part 5b) - Premium eDiscovery Data Sources & Collections
Published Dec 19 2022 10:29 AM




Before we start, please note that if you want to see a table of contents for all the sections of this blog and their various Purview topics, you can locate it at the following link:

Microsoft Purview- Paint By Numbers Series (Part 0) - Overview - Microsoft Tech Community



This document is not meant to replace any official documentation.  Official documents are continually updated and maintained by Microsoft Corporation.  If there is a discrepancy between this document and what you find in the Compliance User Interface (UI) or in the official documentation, you should always defer to that official documentation and contact your Microsoft Account team as needed.  Links to that documentation are referenced both in the document steps and in the appendix.

All of the following steps should be done with test data, and where possible, testing should be performed in a test environment.  Testing should never be performed against production data.


Target Audience

The Advanced eDiscovery (AeD) section of this blog series is aimed at legal and HR officers who need to understand how to perform a basic investigation.


Document Scope

Once a case is created, you will need to add the data sources to be searched, and then you will need to run a collection, meaning the actual search with search criteria.



This document does not cover any other aspect of Microsoft E5 Compliance, including:

  • Data Classification
  • Information Protection
  • Data Loss Prevention (DLP) for Exchange, OneDrive, Devices
  • Data Lifecycle Management (retention and disposal)
  • Records Management (retention and disposal)
  • Premium eDiscovery
    • Overview and Settings
    • Case Creation and Case Settings
    • Review Sets
    • Communications
    • Holds
    • Processing
    • Exports
    • Jobs
  • Insider Risk Management (IRM)
  • Priva
  • Advanced Audit
  • Microsoft Cloud App Security (MCAS)
  • Information Barriers
  • Communication Compliance
  • Licensing

It is presumed that you have a pre-existing understanding of what Microsoft E5 Compliance does and how to navigate the User Interface (UI).

It is also presumed you are using an existing Sensitive Information Type (SIT) or a SIT you have created for your testing.


If you wish to set up and test any of the other aspects of Microsoft E5 Compliance, please refer to Part 1 of this blog series (listed in the link below) for the latest entries to this blog.  That webpage will be updated with any new walkthroughs or Compliance-relevant information, as time allows.


Microsoft Compliance - Paint By Numbers Series (Part 1) - Sensitive Information Types - Microsoft Te...


Use Case

There are many use cases for Advanced eDiscovery.  For the sake of simplicity, we will use the following: Your organization has a Human Resources investigation against a specific user.



  • Data Sources – These are the locations (EXO, SPO, OneDrive) where searches will be performed.  They include the custodians (users) being investigated, not the users performing the investigation.
  • Collections – This is the actual search being performed.  Collection criteria can include user, keyword, date, etc.
  • Review Sets – Once a collection/search has been performed, the data must be reviewed.  This tab is where you review the data and run secondary searches.
  • Communications – If the HR or legal team wishes, they can notify the user that they are under investigation.  You can also set up reminder notifications in this section of the UI. 
    • Note - This task is optional.
  • Hold – Once the data has been collected/searched or reviewed, either all or part of the data can be placed on legal hold.  This means that the data cannot be permanently deleted by the end user.  If they try to delete it, only their reference to the data is removed and the data itself is placed into a hidden hold directory.
  • Processing – This tab is related to the indexing of data in your production environment.  You would use this if you are not finding data that you expect and you need to re-run indexing activities.
    • Note - This task is optional.
  • Exports #1 – When referring to the tab, this lets you export the data from the case to a laptop or desktop.
  • Export #2 – This is also the term used for exporting a .CSV report.
  • Jobs – This provides a list of every job run in eDiscovery (example – Collection, Review, Processing, Export, etc.).  This is useful if you launch an activity and want to monitor its status in real time.
  • Settings – High-level analytics, settings, reports, etc.
  • Custodian – This is the individual being investigated.




  • Core vs Advanced eDiscovery (high level overview)
    • Core eDiscovery – This allows for searching and export of data only.  It is perfect for basic “search and export” needs of data.  It is not the best tool for data migration or HR and/or Legal case management and workflows.
    • Advanced eDiscovery – This tool is best used as a first and second pass tool to cull the data before handing that same data to outside counsel or another legal entity.  This tool provides a truer workflow for discovery, review, and export of data, along with reporting and redacting of data.
  • If you are not familiar with the Electronic Discovery Reference Model (EDRM), I recommend you learn more about it as it is a universal workflow for eDiscoveries in the United States.  The link is in the appendix.
  • For my test, I am using a file named “1-MB-Test-SSN-1-AeD” with the phrase “Friedrich Conrad Rontgen invented the X-Ray” inside it. This file name stands for 1MB file with SSN information for Advanced eDiscovery testing.
  • We will not be using all of the tabs available in an AeD case.
  • How do user deletes of data work with AeD?
    • If the end user deletes the data on their end and there IS NO Hold, then the data will be placed into the recycle bin of the corresponding application.
    • If the end user deletes the data on their end and there IS a Hold, then the data will NOT be placed into the recycle bin of the corresponding application.  However, the user's reference to the data will be deleted, so they will believe that the data is deleted.
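
The two delete cases above can be sketched as a small toy model.  This is only an illustration of the behavior described in this blog, not how the service is implemented; the Location class and its visible, recycle_bin, and preserved fields are hypothetical names made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Location:
    """Toy model of one data location (hypothetical; not a Purview API)."""
    on_hold: bool
    visible: set = field(default_factory=set)      # what the end user can see
    recycle_bin: set = field(default_factory=set)  # normal deleted-items area
    preserved: set = field(default_factory=set)    # hidden hold directory

    def user_delete(self, item: str) -> None:
        """Apply an end-user delete following the two cases described above."""
        if item not in self.visible:
            return                        # nothing to delete
        self.visible.remove(item)         # the user's reference always goes away
        if self.on_hold:
            self.preserved.add(item)      # hold: data is kept out of the user's sight
        else:
            self.recycle_bin.add(item)    # no hold: data goes to the recycle bin


# Same file, deleted once without a hold and once with a hold.
no_hold = Location(on_hold=False, visible={"1-MB-Test-SSN-1-AeD"})
held = Location(on_hold=True, visible={"1-MB-Test-SSN-1-AeD"})
no_hold.user_delete("1-MB-Test-SSN-1-AeD")
held.user_delete("1-MB-Test-SSN-1-AeD")
```

In both cases the file disappears from the user's view; the only difference is where the data ends up.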



If you have performed Part 1 of this blog series (creating a Sensitive Information Type), then you have everything you need.  If you have not done that part of the blog, you will need to populate your test environment with test data for the steps to follow.


First Investigation Steps

Now that you have configured the case and case settings, it is time to look into 1) who will be investigated or what locations of your tenant will be searched (Data Sources) and 2) what criteria will be applied to the investigation (Collections).


Configure Data Sources


There are 2 ways to indicate what data sources will be searched: custodian or location.



  1. Select the Data Sources tab and then click Add Data Source.  You will have several options.  We will choose Add new custodians.  This allows you to search across multiple Office 365 applications for a user.
    1. Note – Import custodians imports a list of custodians via a CSV spreadsheet.  We will not be covering this in depth.  You can find information on this in the Appendix and Links section below.





  1. Type the name of the custodian you want to search.  I will only be selecting one user at this time, Pradeep.




  1. Select your Hold Setting.  The Hold Setting indicates which users’ data will be placed on automatic hold when searched.  If you do not select Hold for a user, the user’s data will be searched but not placed automatically on legal hold.




  1. In the Review section of the wizard, you will see what data locations are being searched and which are placed on automatic hold. 
    1. Note #1 – Any data location associated with that user will have a number 1 associated with it.  If there is no number associated with the data location, then the user was not found to have any data in that location.  Automatic Hold will be placed on locations where the user has data, per the 2nd step of the wizard.
    2. Note #2 – When you edit a custodian, you can change or clear the setting in this screen.
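
The rule in Note #1 can be sketched in a few lines.  This is a minimal illustration under the assumption that each location simply carries a count; the held_locations function and the location names used here are hypothetical and are not part of the Purview UI or any API.

```python
def held_locations(location_counts: dict, hold_enabled: bool) -> list:
    """Return the locations that would be placed on automatic hold.

    location_counts maps a location name to the number shown next to it in
    the Review step (0 means the user has no data in that location).
    """
    if not hold_enabled:
        return []  # Hold was not selected for this custodian
    return [name for name, count in location_counts.items() if count > 0]


# Hypothetical Review-step counts for one custodian.
counts = {"Exchange mailbox": 1, "OneDrive": 1, "SharePoint": 0}
print(held_locations(counts, hold_enabled=True))
```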



  1. If you are content with what you see, click Submit.  Then click Done on the next screen.



  1. If you wish to search specific locations, and not just users and their associated data locations, you can select Add data locations.




  1. You can add SharePoint, Exchange, or M365 connected apps locations.  I will add the default SharePoint location of “The Landing”, which is one of my pre-populated SharePoint sites.  I will not be adding an Exchange location.




  1. Then click Add.



Run a Collection


  1. Now we will run a collection (i.e., a search) of data.  Select the Collections tab.  Click Add Collection, and choose Standard collection.




  1. Give the collection a name and description and click Next.  In my example, I’ve entered the name of the inventor of the X-Ray machine.




  1. For the custodians being searched (Custodial Data Sources), you can search either a) specific custodians assigned to this case or b) all users associated with your case.  I will choose All users.  Click Next.




  1. Next are Non-custodial data sources.  These are sites, groups and other sources that are not associated with the custodians that you might want to add to your search.  For now, accept the default and select Next.



  1. If you want to add other locations, other than those associated with the user via their Identity, then you can add them in the Additional Locations part of the wizard.  For example, you can search Sarah Smith’s email in addition to Pradeep’s by adding her mailbox in this section.  Accept the default and click Next.
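
The last three wizard steps together decide where the collection looks.  As a rough sketch, assuming the scope is simply the combination of the three choices, it could be modeled as below.  The collection_scope function and the location labels are hypothetical names for illustration only.

```python
def collection_scope(custodian_locations, selected_custodians=None,
                     non_custodial=(), additional=()):
    """Return the sorted list of locations a collection would search.

    custodian_locations maps each custodian in the case to their locations.
    selected_custodians=None models the "All users" choice in the wizard.
    """
    if selected_custodians is None:
        selected_custodians = list(custodian_locations)
    scope = set()
    for custodian in selected_custodians:
        scope.update(custodian_locations[custodian])
    scope.update(non_custodial)   # sites, groups, etc. not tied to a custodian
    scope.update(additional)      # for example, another user's mailbox
    return sorted(scope)


case = {"Pradeep": ["mailbox:pradeep", "onedrive:pradeep"]}
print(collection_scope(case,
                       non_custodial=["site:The Landing"],
                       additional=["mailbox:sarah"]))
```

Accepting the defaults in the walkthrough corresponds to leaving non_custodial and additional empty, so only the custodians' own locations are searched.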





  1. We have come at long last to the search criteria itself.  In this section labeled Conditions, you can run searches based on keyword or other conditions.
  2. For my test, I am using a file named “1-MB-Test-SSN-1-AeD” with the phrase “Friedrich Conrad Rontgen invented the X-Ray” inside it.  The file name stands for 1MB file with SSN information for Advanced eDiscovery testing.  I will search against the three names of this inventor.
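
If you prefer to prepare a keyword string ahead of time, the three names can be joined into one query.  This is a minimal sketch that assumes the keyword box accepts Keyword Query Language (KQL) style syntax with uppercase Boolean operators and double-quoted phrases; the build_keyword_query helper is a hypothetical name, and you should confirm the syntax against the official documentation before relying on it.

```python
def build_keyword_query(terms, operator="OR"):
    """Join search terms into a single keyword query string."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    cleaned = [t.strip() for t in terms if t.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty term is required")
    # Multi-word terms are wrapped in double quotes so they match as a phrase.
    quoted = [f'"{t}"' if " " in t else t for t in cleaned]
    return f" {operator} ".join(quoted)


print(build_keyword_query(["Friedrich", "Conrad", "Rontgen"]))
# Friedrich OR Conrad OR Rontgen
```

Using OR returns items that contain any one of the names; switching the operator to AND would require all three to appear in the same item.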





  1. Here is a list of those other conditions you can choose from.




  1. Select the criteria that you want to search.  When you are ready, click Next.
    1. Note – A common initial search is to search a user or set of users and a date range.  Then run a secondary search on a narrower date range, keywords, a subset of users, etc.  In Advanced eDiscovery, we will do those sorts of searches in the Review Sets tab, which is next.
  2. Next is Save Draft or Collection.  Here you have the option to save this collection as a draft (meaning the data is not yet committed to a review set) or you can collect items into a review set.  We will choose the latter (Collect items and add to review set), and I will add it to a new Review set.
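
The two-pass pattern from the note in the previous step (a broad search by user and date range, then a narrower search inside those results) can be sketched with plain Python over in-memory records.  Everything here is made up for illustration: the ITEMS list, the first_pass and second_pass functions, and the field names are hypothetical, and the real searches are performed in the Collections and Review Sets tabs.

```python
from datetime import date

# Hypothetical sample items standing in for mail and documents.
ITEMS = [
    {"owner": "pradeep", "sent": date(2022, 3, 1),
     "text": "Friedrich Conrad Rontgen invented the X-Ray"},
    {"owner": "pradeep", "sent": date(2022, 9, 15), "text": "Lunch menu"},
    {"owner": "sarah", "sent": date(2022, 3, 2), "text": "Rontgen notes"},
    {"owner": "pradeep", "sent": date(2021, 12, 31), "text": "Rontgen draft"},
]


def first_pass(items, users, start, end):
    """Broad search: selected users within a date range."""
    return [i for i in items
            if i["owner"] in users and start <= i["sent"] <= end]


def second_pass(items, keywords, start, end):
    """Narrow search: tighter date range plus at least one keyword match."""
    lowered = [k.lower() for k in keywords]
    return [i for i in items
            if start <= i["sent"] <= end
            and any(k in i["text"].lower() for k in lowered)]


broad = first_pass(ITEMS, {"pradeep"}, date(2022, 1, 1), date(2022, 12, 31))
narrow = second_pass(broad, ["Rontgen"], date(2022, 1, 1), date(2022, 6, 30))
print(len(broad), len(narrow))
```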





  1. Note #1 – If you have a case with multiple collections, you might decide to add a collection to a pre-existing Review Set.  I do not have one here and so will use a new review set.
  2. Note #2 – If adding to an existing Review Set, you can select Additional collection settings.  Again, we will not do those here, as those options are also found in the Review Set section of the eDiscovery tool.
  3. Note #3 – Placing data in a Review Set does not place that data on hold.  That is performed in the Hold tab, which will allow you to place a “hold in place” action on data.  We will not be performing that in this blog.
  1. Under Collection ingestion scale, I will choose the first option, Add all collection to review set.
    1. Note – you can choose to add only part of the collection to a review set, if you wish.



  1. Now review your collection and select Submit and then Done.





Click Done and move to the Review Sets tab.


Appendix and Links

















Last update: Dec 19 2022 10:27 AM