Strong authentication controls like MFA significantly reduce account compromise — but they don’t eliminate the risk of password exposure.
In many organizations, users still interact with legacy systems, third‑party tools, or service accounts that rely on password‑only authentication. When those credentials are shared or stored in plain text — whether accidentally or out of convenience — they introduce a serious security risk.
Microsoft Purview helps organizations identify and protect sensitive information using Sensitive Information Types (SITs). While built‑in detections provide a solid foundation, certain scenarios benefit from organization‑specific context and policy‑driven patterns.
This post walks through how to extend password detection using a custom regex pattern — allowing you to identify strong passwords stored in plain text and respond before exposure turns into an incident.
The Challenge: Passwords Still Appear in Everyday Content
Despite user awareness training and improved security posture, passwords still surface in places like:
- Emails shared for “quick access”
- Documents stored in collaboration sites
- Notes created during troubleshooting
- Spreadsheets used for credential tracking
Even a single exposed password — especially for non‑MFA‑protected systems — can lead to unauthorized access or data leakage.
Extending Password Detection to Align with Organizational Policies
Microsoft Purview includes built‑in patterns to detect generic password formats. These offer a strong baseline and are effective for broad protection scenarios.
However, many organizations define specific password standards and want detection logic that reflects both those standards and the way passwords are typically referenced in their content. For example:
- Enforcing minimum and maximum password length
- Requiring complexity (letters, digits, special characters)
- Detecting passwords only when explicitly referenced, such as near the word password
- Reducing false positives from random strong strings (API keys, hashes, tokens)
In these cases, custom regex‑based Sensitive Information Types allow organizations to build on existing protection and apply targeted, high‑confidence detection.
Detection Requirements for This Scenario
In this example, we want to identify passwords that meet all of the following criteria:
✔ Minimum length: 10 characters
✔ Maximum length: 20 characters
✔ Must contain:
- At least one alphabet character
- At least one digit
- At least one special character
✔ Must appear in close proximity (within 2 characters) to a keyword such as:
- password
- pwd
- passcode
This ensures we’re detecting intentional password disclosures, not unrelated strong strings.
In this scenario, the detection logic is intentionally split across three components:
- Primary element – Detects password length and structure
- First supporting element – Validates password complexity rules
- Second supporting element (keywords) – Adds human context using proximity
This structured design ensures that detection aligns closely with real‑world password disclosure patterns.
Detection Architecture Overview
| Component | Purpose |
| --- | --- |
| Primary Element | Identifies candidate password strings |
| Supporting Element (Complexity) | Confirms password strength |
| Supporting Element (Keywords) | Confirms contextual intent |
Primary Element: Password Length Identification
The primary element focuses purely on identifying potential password strings based on length.
Regex Pattern
\S{10,20}
What this enforces
- No whitespace characters
- Minimum length: 10 characters
- Maximum length: 20 characters
Proximity Configuration
- Distance between Primary and Supporting Element: 1 character
This ensures that the supporting complexity patterns evaluate directly against the same string, rather than unrelated values nearby.
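As a rough illustration of what the primary pattern picks up, the sketch below runs it through Python's `re` module. Python is only a stand-in here: Purview's own regex engine and matching rules may differ, and the helper name `find_candidates` is hypothetical.

```python
import re

# Primary-element pattern from the post: 10 to 20 non-whitespace characters.
PRIMARY = re.compile(r"\S{10,20}")


def find_candidates(text: str) -> list[str]:
    """Return every non-overlapping run that the primary pattern matches."""
    return PRIMARY.findall(text)


# "Password:" is only 9 characters, so just the value after it is a candidate.
print(find_candidates("Password: P@ssW0rd123!"))

# No token reaches 10 characters, so nothing is returned.
print(find_candidates("short pwd: abc"))

# The pattern is not anchored, so a 22-character token yields a 20-character slice.
print(find_candidates("RandomStrongString123!"))
```

In this stand-in, a token longer than 20 characters still produces a match on its first 20 characters. The primary element alone therefore does not enforce the maximum length on whole tokens; it only proposes candidates for the supporting elements to confirm.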
First Supporting Element: Password Complexity Validation
The first supporting element ensures that the detected string meets organizational password complexity requirements.
All of the following patterns are grouped within the same supporting element. No proximity is configured between them, because they all evaluate the same primary match.
Complexity Patterns Included
| Requirement | Regex Pattern |
| --- | --- |
| At least one uppercase letter | [A-Z] |
| At least one lowercase letter | [a-z] |
| At least one digit | [0-9] |
| Allowed character set | [A-Za-z0-9!@#$%^&*()_+\-=]{10,} |
| At least one special character | [!@#$%&*+=] |
This approach avoids relying on a single large regex, making the detection more readable, maintainable, and auditable.
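The effect of keeping the checks separate can be sketched in Python by running each pattern on its own and reporting which ones fail. This is a minimal illustration, not Purview's evaluation logic; the dictionary keys and the `failed_checks` helper are hypothetical names.

```python
import re

# The five complexity patterns from the post, each kept as its own small check.
COMPLEXITY_PATTERNS = {
    "uppercase": re.compile(r"[A-Z]"),
    "lowercase": re.compile(r"[a-z]"),
    "digit": re.compile(r"[0-9]"),
    "allowed_run": re.compile(r"[A-Za-z0-9!@#$%^&*()_+\-=]{10,}"),
    "special": re.compile(r"[!@#$%&*+=]"),
}


def failed_checks(candidate: str) -> list[str]:
    """Return the names of the complexity checks the candidate does not satisfy."""
    return [
        name
        for name, pattern in COMPLEXITY_PATTERNS.items()
        if not pattern.search(candidate)
    ]


for value in ("P@ssW0rd123!", "alllowercase1!", "Passw0rd_12"):
    print(value, failed_checks(value))
```

Because each rule is a separate pattern, a failure points at the specific requirement that was missed. The last sample also shows that the special-character class is narrower than the allowed character set: an underscore is allowed in the string but does not count as a special character.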
Second Supporting Element: Keyword Context (Human Intent)
To further improve accuracy, a second supporting element is used to ensure the password appears in a meaningful, human context.
Keyword List (Case‑Insensitive)
credential
password
pwd
pswd
Keywords are configured in case‑insensitive mode to match variations such as Password, PWD, or Pswd.
(You can adjust the keyword list and the proximity value to suit your needs.)
Proximity Configuration
- Proximity value: 30 characters
Why 30 Characters?
This value accounts for:
- Maximum keyword length: 10 characters
- Maximum password length: 20 characters
This ensures that the keyword and the password appear within the same sentence or fragment, for example:
Password: P@ssW0rd123!
credential=Adm1n#Secure
pwd -> Qwerty@2024!
It avoids triggering on:
RandomStrongString123!
API_KEY = A9$kLmZpQw
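To see how the three elements interact on the samples above, here is a minimal Python simulation. It rests on several assumptions: Python's `re` stands in for Purview's engine, keywords are matched as case-insensitive substrings, and proximity is measured between the start offsets of the keyword and the candidate. Purview may measure the window differently, and `is_flagged` is a hypothetical helper.

```python
import re

PRIMARY = re.compile(r"\S{10,20}")
COMPLEXITY = [
    re.compile(p)
    for p in (
        r"[A-Z]",
        r"[a-z]",
        r"[0-9]",
        r"[A-Za-z0-9!@#$%^&*()_+\-=]{10,}",
        r"[!@#$%&*+=]",
    )
]
KEYWORDS = re.compile(r"credential|password|pwd|pswd", re.IGNORECASE)
PROXIMITY = 30  # characters, the value configured in the post


def is_flagged(text: str) -> bool:
    """True if a complex candidate sits within PROXIMITY characters of a keyword."""
    keyword_starts = [m.start() for m in KEYWORDS.finditer(text)]
    for candidate in PRIMARY.finditer(text):
        value = candidate.group()
        # Every complexity pattern must match somewhere in the candidate.
        if not all(p.search(value) for p in COMPLEXITY):
            continue
        # Assumption: distance is the gap between the two match start offsets.
        if any(abs(candidate.start() - k) <= PROXIMITY for k in keyword_starts):
            return True
    return False


samples = [
    "Password: P@ssW0rd123!",
    "credential=Adm1n#Secure",
    "pwd -> Qwerty@2024!",
    "RandomStrongString123!",
    "API_KEY = A9$kLmZpQw",
]
for sample in samples:
    print(f"{sample!r}: {is_flagged(sample)}")
```

Under these assumptions the first three samples are flagged and the last two are not, because neither contains a keyword. In the `credential=...` sample the candidate has no whitespace separating it from the keyword, so the matched string includes the keyword itself.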
How This Comes Together in Microsoft Purview
When implemented as a custom Sensitive Information Type:
- The primary element detects candidate passwords
- The first supporting element confirms password strength
- The second supporting element confirms user intent via keywords
- Proximity rules ensure all components relate to the same disclosure
This SIT can then be used across:
- Data Loss Prevention (DLP)
- Endpoint DLP
- Auto‑labelling
- Email and collaboration workload protection
Why This Design Is Effective
This structured approach allows organizations to:
- Detect real password disclosures with high confidence
- Align detection with internal password policy
- Reduce false positives from random strong strings
- Apply protection consistently across Microsoft 365 workloads
- Maintain a clean, auditable detection design
Most importantly, it extends Microsoft Purview’s native capabilities without changing the underlying security model.
Final Takeaway
Even in environments with strong authentication controls, password exposure remains a real risk — especially for legacy and third‑party systems.
By combining length validation, complexity enforcement, and contextual keyword proximity, Microsoft Purview enables precise and scalable password detection, helping organizations identify and protect sensitive credentials before they are misused.