Performance in scanning

Question

We are trying to search for CUI data on internal file stores. Last week, I decided to run another discovery scan, this time using ALL instead of Policy Only. It took much longer, left the scanner server in an almost unusable state, and didn’t really give any more information than the first one did.

Based on my research, we need to define and set the policy before we run scans. This is the information tip from the Purview scanner settings:

Scan started at: 2026-05-20 22:54:06Z

Scan ended at: 2026-05-24 16:16:51Z

Scan duration: 3 days, 17 hours, 22 minutes, 45 seconds

Scan id: 93acb922-e2ac-4fb7-b259-d6184e7aa434

Repository: \\cab-filesrv-01.fg.com\Departments. Enforce mode is Off

Scanned files:3509640

Actions:

Classified:3369456

Classified as Public:14

Classified as Fg Private:3369442

Labeled:0

Remove label:0

Protected:0

Remove protection:0

Files with matched information types:572895

Skipped due to - No match:0

Skipped due to - Not supported:0

Skipped due to - Already labeled:0

Skipped due to - Already scanned:0

Skipped due to - Require justification:0

Skipped due to - Unknown reason:0

Skipped due to - Excluded:98833

Skipped due to - Attribute:0

Failed:41318

aladinh · Answer

Hi sagedogusa,

We’ve seen similar behavior in large repositories. In practice, Policy Only often provides the best balance between visibility and performance. We typically reserve ALL scans for initial discovery, policy validation, or major policy changes, then use targeted scans to validate and refine classifications. I’d also be curious about the root cause of the ~41k failed files.
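One way to start on the root cause of the failed files is to bucket the failure messages by keyword and see which bucket dominates. The snippet below is only a sketch: the keyword lists, the `bucket` and `tally` helpers, and the sample lines are all made up for illustration and are not real scanner output, so adjust them to whatever your scanner reports actually contain.

```python
from collections import Counter

# Hypothetical keyword buckets (assumption): replace with the wording
# that actually appears in your scanner's failure messages.
PATTERNS = {
    "access_denied": ("access denied", "access is denied", "unauthorized"),
    "timeout": ("timeout", "timed out"),
}


def bucket(message: str) -> str:
    """Return the name of the first bucket whose keyword appears in the message."""
    text = message.lower()
    for name, needles in PATTERNS.items():
        if any(needle in text for needle in needles):
            return name
    return "other"


def tally(messages):
    """Count how many messages fall into each bucket."""
    return Counter(bucket(m) for m in messages)


# Invented sample lines, NOT real scanner log output.
sample = [
    "Access is denied: a.docx",
    "Operation timed out while reading b.xlsx",
    "File is corrupted: c.pdf",
    "Access denied for d.pptx",
]

counts = tally(sample)
for name, n in counts.most_common():
    print(name, n)
```

If one bucket accounts for most of the failures, that is usually the place to look first (for example, the scanner account's permissions on the share).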

ammar0 · Answer

Hi sagedogusa,

This is expected behavior when switching from Policy Only to ALL: you're essentially classifying every file against every built-in sensitive information type, which is significantly heavier on both CPU and I/O.

A few things worth checking:

Scan scope: 3.5M files in ~4 days is actually within normal range for an on-prem scanner, but the 41K failures are worth investigating. Pull the scanner log and look for access denied or timeout patterns.

Policy Only is usually the right call for CUI discovery: define your SIT-based policy targeting the CUI types you care about (ITAR, NIST 800-171 categories, etc.), then run in enforce-off mode. You get the match data without the full classification overhead.

98K excluded files: it's worth reviewing your exclusion rules to make sure you're not inadvertently skipping containers where CUI might live.

The 572K files with matched information types is your real number to work with. What types are matching? That'll tell you whether your policy is scoped correctly before you run another full scan.
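To put the numbers from the scan report in perspective, here is a quick back-of-the-envelope calculation. It only uses the timestamps and counts quoted in the question; the variable names and the `pct` helper are just for illustration.

```python
from datetime import datetime, timezone

# Timestamps and counts copied from the scan report in the question.
start = datetime(2026, 5, 20, 22, 54, 6, tzinfo=timezone.utc)
end = datetime(2026, 5, 24, 16, 16, 51, tzinfo=timezone.utc)
report = {
    "scanned": 3509640,
    "matched": 572895,   # files with matched information types
    "excluded": 98833,   # skipped due to exclusion rules
    "failed": 41318,
}


def pct(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


duration_s = (end - start).total_seconds()
files_per_second = report["scanned"] / duration_s
failed_pct = pct(report["failed"], report["scanned"])
excluded_pct = pct(report["excluded"], report["scanned"])
matched_pct = pct(report["matched"], report["scanned"])

print(f"duration:   {duration_s:.0f} s")
print(f"throughput: {files_per_second:.1f} files/s")
print(f"failed:     {failed_pct:.2f} %")
print(f"excluded:   {excluded_pct:.2f} %")
print(f"matched:    {matched_pct:.2f} %")
```

That works out to roughly 11 files per second, with a little over 1% of files failing, under 3% excluded, and about 16% matching at least one information type.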
