Microsoft Security Community Blog

Detecting Plain‑Text Password Exposure Using Custom Regex in Microsoft Purview

samsul_ahamed
Apr 20, 2026

Strong authentication controls like MFA significantly reduce account compromise — but they don’t eliminate the risk of password exposure.
In many organizations, users still interact with legacy systems, third‑party tools, or service accounts that rely on password‑only authentication. When those credentials are shared or stored in plain text — whether accidentally or out of convenience — they introduce a serious security risk.

Microsoft Purview helps organizations identify and protect sensitive information using Sensitive Information Types (SITs). While built‑in detections provide a solid foundation, certain scenarios benefit from organization‑specific context and policy‑driven patterns.

This post walks through how to extend password detection using a custom regex pattern — allowing you to identify strong passwords stored in plain text and respond before exposure turns into an incident.

The Challenge: Passwords Still Appear in Everyday Content

Despite user awareness training and improved security posture, passwords still surface in places like:

  • Emails shared for “quick access”
  • Documents stored in collaboration sites
  • Notes created during troubleshooting
  • Spreadsheets used for credential tracking

Even a single exposed password — especially for non‑MFA‑protected systems — can lead to unauthorized access or data leakage.

Extending Password Detection to Align with Organizational Policies

Microsoft Purview includes built‑in patterns to detect generic password formats. These offer a strong baseline and are effective for broad protection scenarios.

However, many organizations define their own password standards and want detection logic that reflects both those standards and the way passwords are typically referenced in their content. For example:

  • Enforcing minimum and maximum password length
  • Requiring complexity (letters, digits, special characters)
  • Detecting passwords only when explicitly referenced, such as near the word password
  • Reducing false positives from random strong strings (API keys, hashes, tokens)

In these cases, custom regex‑based Sensitive Information Types allow organizations to build on existing protection and apply targeted, high‑confidence detection.

Detection Requirements for This Scenario

In this example, we want to identify passwords that meet all of the following criteria:

✔ Minimum length: 10 characters
✔ Maximum length: 20 characters
✔ Must contain:

  • At least one alphabetic character
  • At least one digit
  • At least one special character

✔ Must appear in close proximity to a keyword (within the 30-character proximity window configured later in this post), such as:

  • password
  • pwd
  • passcode
This ensures we’re detecting intentional password disclosures, not unrelated strong strings.

In this scenario, the detection logic is intentionally split across three components:

  1. Primary element – Detects password length and structure
  2. First supporting element – Validates password complexity rules
  3. Second supporting element (keywords) – Adds human context using proximity

This structured design ensures that detection aligns closely with real‑world password disclosure patterns.

Detection Architecture Overview

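| Component                       | Purpose                               |
|---------------------------------|---------------------------------------|
| Primary Element                 | Identifies candidate password strings |
| Supporting Element (Complexity) | Confirms password strength            |
| Supporting Element (Keywords)   | Confirms contextual intent            |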

Primary Element: Password Length Identification

The primary element focuses purely on identifying potential password strings based on length.

Regex Pattern

\S{10,20}

What this enforces

  • No whitespace characters
  • Minimum length: 10 characters
  • Maximum length: 20 characters
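
To see what the primary pattern picks up on its own, here is a minimal sketch in Python. It uses Python's `re` module purely for illustration, and that is an assumption: Purview's own regex engine may treat token boundaries differently. The helper name `find_candidates` is hypothetical.

```python
import re

# Primary element pattern from this post: 10 to 20 non-whitespace characters.
CANDIDATE = re.compile(r"\S{10,20}")


def find_candidates(text):
    """Return (start_offset, matched_string) for every candidate in text.

    Hypothetical helper for illustration; not part of any Purview API.
    """
    return [(m.start(), m.group()) for m in CANDIDATE.finditer(text)]


if __name__ == "__main__":
    for sample in ("Password: P@ssW0rd123!", "short pw", "RandomStrongString123!"):
        print(repr(sample), "->", find_candidates(sample))
```

In Python, a 22-character token such as RandomStrongString123! still produces a 20-character match, so the length pattern alone does not exclude longer strings. The supporting elements do the rest of the filtering.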

Proximity Configuration

  • Distance between Primary and Supporting Element: 1 character

This ensures that the supporting complexity patterns evaluate directly against the same string, rather than unrelated values nearby.

First Supporting Element: Password Complexity Validation

The first supporting element ensures that the detected string meets organizational password complexity requirements.

All the following patterns are grouped within the same supporting element, and no internal proximity is configured (as they evaluate the same primary value).

Complexity Patterns Included

| Requirement                    | Regex Pattern                   |
|--------------------------------|---------------------------------|
| At least one uppercase letter  | [A-Z]                           |
| At least one lowercase letter  | [a-z]                           |
| At least one digit             | [0-9]                           |
| Allowed character set          | [A-Za-z0-9!@#$%^&*()_+\-=]{10,} |
| At least one special character | [!@#$%&*+=]                     |

This approach avoids relying on a single large regex, making the detection more readable, maintainable, and auditable.
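
The complexity patterns can be tried against a candidate string with a short Python sketch. It assumes every pattern must match (a logical AND), which is how the requirements read in this post. The function name `failed_checks` and the dictionary labels are hypothetical and are not Purview settings.

```python
import re

# Complexity patterns copied from the table in this post.
# The dictionary keys are illustrative labels only.
COMPLEXITY_PATTERNS = {
    "uppercase": re.compile(r"[A-Z]"),
    "lowercase": re.compile(r"[a-z]"),
    "digit": re.compile(r"[0-9]"),
    "allowed_charset": re.compile(r"[A-Za-z0-9!@#$%^&*()_+\-=]{10,}"),
    "special": re.compile(r"[!@#$%&*+=]"),
}


def failed_checks(candidate):
    """Return the labels of the patterns that do NOT match the candidate.

    An empty list means the candidate satisfies every complexity pattern.
    Assumption: all patterns are required (AND), as described in the post.
    """
    return [
        name
        for name, pattern in COMPLEXITY_PATTERNS.items()
        if not pattern.search(candidate)
    ]


if __name__ == "__main__":
    for value in ("P@ssW0rd123!", "NoDigitsHere!!", "A9$kLmZpQw"):
        print(value, "->", failed_checks(value) or "passes all checks")
```

Note that A9$kLmZpQw passes every check as well, which is why complexity alone cannot separate passwords from keys or tokens and the keyword element is still needed.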

Second Supporting Element: Keyword Context (Human Intent)

To further improve accuracy, a second supporting element is used to ensure the password appears in a meaningful, human context.

Keyword List (Case‑Insensitive)

credential

password

pwd

pswd

Keywords are configured in case‑insensitive mode to match variations such as Password, PWD, or Pswd.

(You can adjust the keyword list and the proximity value to suit your needs.)

Proximity Configuration

  • Proximity value: 30 characters

Why 30 Characters?

This value accounts for:

  • Maximum keyword length: 10 characters
  • Maximum password length: 20 characters

This requires the keyword and the password to appear within the same sentence or fragment, for example:

Password: P@ssW0rd123!

credential=Adm1n#Secure

pwd -> Qwerty@2024!

It avoids triggering on:

RandomStrongString123!

API_KEY = A9$kLmZpQw
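
The examples above can be checked with a rough Python sketch of the keyword window. Two things here are assumptions: the window is measured from the start of the earlier element to the end of the later one (matching the 10 + 20 reasoning in this post), and keywords are matched as case-insensitive substrings. Purview may measure proximity and match keywords differently. The complexity check is left out to keep the focus on proximity, and `has_keyword_context` is a hypothetical name.

```python
import re

# Keyword list and proximity value taken from this post.
KEYWORDS = re.compile(r"credential|password|pwd|pswd", re.IGNORECASE)
CANDIDATE = re.compile(r"\S{10,20}")  # primary element pattern
PROXIMITY = 30  # characters


def has_keyword_context(text):
    """Return True if any candidate and any keyword fit in one 30-char window.

    Assumption: the window spans from the earliest start to the latest end
    of the (keyword, candidate) pair. This is a sketch, not Purview's logic.
    """
    keyword_spans = [m.span() for m in KEYWORDS.finditer(text)]
    for cand in CANDIDATE.finditer(text):
        c_start, c_end = cand.span()
        for k_start, k_end in keyword_spans:
            window = max(c_end, k_end) - min(c_start, k_start)
            if window <= PROXIMITY:
                return True
    return False


if __name__ == "__main__":
    samples = [
        "Password: P@ssW0rd123!",
        "credential=Adm1n#Secure",
        "pwd -> Qwerty@2024!",
        "RandomStrongString123!",
        "API_KEY = A9$kLmZpQw",
    ]
    for sample in samples:
        print(repr(sample), "->", has_keyword_context(sample))
```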

How This Comes Together in Microsoft Purview

When implemented as a custom Sensitive Information Type:

  • The primary element detects candidate passwords
  • The first supporting element confirms password strength
  • The second supporting element confirms user intent via keywords
  • Proximity rules ensure all components relate to the same disclosure
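
As a rough end-to-end illustration of how the three elements combine, the sketch below chains them in Python. It is an approximation under stated assumptions, not Purview's evaluation logic: the 1-character proximity between the primary element and the complexity patterns is modeled by running those patterns on the candidate string itself, and the keyword window is measured across the combined span of keyword and candidate. `detect_passwords` is a hypothetical name.

```python
import re

# Patterns, keywords, and proximity value as listed in this post.
CANDIDATE = re.compile(r"\S{10,20}")
COMPLEXITY = [
    re.compile(p)
    for p in (
        r"[A-Z]",
        r"[a-z]",
        r"[0-9]",
        r"[A-Za-z0-9!@#$%^&*()_+\-=]{10,}",
        r"[!@#$%&*+=]",
    )
]
KEYWORDS = re.compile(r"credential|password|pwd|pswd", re.IGNORECASE)
KEYWORD_PROXIMITY = 30  # characters


def detect_passwords(text):
    """Return candidate strings that satisfy all three elements.

    Sketch only: complexity is checked on the candidate itself, and a
    keyword counts as "near" when keyword plus candidate span <= 30 chars.
    """
    keyword_spans = [m.span() for m in KEYWORDS.finditer(text)]
    hits = []
    for cand in CANDIDATE.finditer(text):
        value = cand.group()
        # First supporting element: every complexity pattern must match.
        if not all(p.search(value) for p in COMPLEXITY):
            continue
        # Second supporting element: a keyword within the proximity window.
        c_start, c_end = cand.span()
        near_keyword = any(
            max(c_end, k_end) - min(c_start, k_start) <= KEYWORD_PROXIMITY
            for k_start, k_end in keyword_spans
        )
        if near_keyword:
            hits.append(value)
    return hits


if __name__ == "__main__":
    print(detect_passwords("Password: P@ssW0rd123!"))
    print(detect_passwords("API_KEY = A9$kLmZpQw"))
```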

This SIT can then be used across:

  • Data Loss Prevention (DLP)
  • Endpoint DLP
  • Auto‑labelling
  • Email and collaboration workload protection

Why This Design Is Effective

This structured approach allows organizations to:

  • Detect real password disclosures with high confidence
  • Align detection with internal password policy
  • Reduce false positives from random strong strings
  • Apply protection consistently across Microsoft 365 workloads
  • Maintain a clean, auditable detection design

Most importantly, it extends Microsoft Purview’s native capabilities without changing the underlying security model.

Final Takeaway

Even in environments with strong authentication controls, password exposure remains a real risk — especially for legacy and third‑party systems.

By combining length validation, complexity enforcement, and contextual keyword proximity, Microsoft Purview enables precise and scalable password detection, helping organizations identify and protect sensitive credentials before they are misused.

Updated Apr 20, 2026
Version 2.0