Pre-migration queries related to data discovery and file analysis
Hi Team,
A scenario involves migrating approximately 25 TB of data from on‑premises file shares to SharePoint. Before the migration, a discovery phase is required to understand the composition of the data. The goal is to identify file types (Microsoft Office documents, PDFs, images, etc.) without applying any labels at this stage. The discovery requirements include:
- Identification of file types
- Detection of duplicate or redundant files
- Identification of embedded UNC paths, macros, and document links
- Detection of applications running directly from file shares
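For context, the first two requirements (file types and duplicates) can be approximated outside Purview with a plain script. The following is a minimal Python sketch, not a Purview capability: the function names `file_digest` and `inventory` are hypothetical, file type is inferred from the extension only, and "duplicate" is assumed to mean byte-identical content (same SHA-256 hash).

```python
import hashlib
import os
from collections import Counter, defaultdict


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def inventory(root):
    """Walk `root` and return (extension counts, groups of duplicate paths)."""
    ext_counts = Counter()
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # unreadable or vanished file: skip it
            ext = os.path.splitext(name)[1].lower() or "(none)"
            ext_counts[ext] += 1
            by_size[size].append(path)

    # Only files that share a size can be identical, so hash just those.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            try:
                by_hash[file_digest(path)].append(path)
            except OSError:
                continue

    duplicates = [sorted(group) for group in by_hash.values() if len(group) > 1]
    return ext_counts, duplicates
```

It could be called as `counts, dups = inventory(r"\\server\share")`, where the share path is a placeholder. Grouping by size before hashing avoids reading files that cannot have a duplicate, which matters at the 25 TB scale, but the script does not cover embedded UNC paths, macros, or links.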
Guidance is needed on which Microsoft Purview components (for example, the on‑premises scanner or the Data Map) can support these discovery requirements, and on whether Purview can meet all of them.
In particular, can Purview detect duplicate or redundant files, and if so, which module or capability enables this?
Additionally, since Purview allows downloading only up to 10,000 logs at a time, what would be the best approach to obtain discovery logs for a dataset of this size (25 TB)?
Thank you!