microsoft purview
570 TopicsAccelerate Your Security Copilot Readiness with Our Global Technical Workshop Series
The Security Copilot Technical Customer Readiness team is delivering free, virtual hands-on workshops year-round, available across multiple time zones to fit global schedules. These sessions are designed specifically for technical practitioners who want to deepen their AI for Security expertise with Microsoft Entra, Intune, Microsoft Purview, and Microsoft Threat Protection. What You’ll Learn Our workshop series combines scenario-based instruction, live demos, hands-on exercises, and expert Q&A to help you operationalize Security Copilot across your security stack. These sessions are all moderated by experts from Microsoft’s engineering teams and are aligned with the latest Security Copilot capabilities. Who Should Attend These workshops are ideal for: Security Architects & Engineers SOC Analysts Identity & Access Management Engineers Endpoint & Device Admins Compliance & Risk Practitioners Partner Technical Consultants Customer technical teams adopting AI powered defense Every session delivers 100% technical content, designed to accelerate real-world Security Copilot adoption. Register now for these upcoming Security Copilot Virtual Workshops Start building Security Copilot skills—choose the product area and time zone that works best for you. Please take note of pre-requisites for each workshop in the registration page Security Copilot Virtual Workshop: Copilot in Intune January 14, 2026 8:00-9:00 AM (PST) - register here January 15, 2026 2:00 – 3:30 PM (AEDT) - register here Note this is an Asia Pacific optimized delivery. Time conversion: 4:00-5:30 PM NZDT; 11:00-12:30 AM GMT +8; 8:30-10:00 AM IST; Jan. 14 7:00-8:30 PM PST Security Copilot Virtual Workshop: Copilot in Purview January 21, 2026 8:00 – 9:30 AM (PST) - register here January 22, 2026 2:00 – 3:30 AEDT - register here Note this is an Asia Pacific optimized delivery. Time conversion: 4:00-5:30 PM NZDT; 11:00-12:30 AM GMT +8; 8:30-10:00 AM IST; Jan. 14 7:00-8:30 PM PST Security Copilot Virtual Workshop: Copilot in Defender Sign in and click 'follow' above this blog to be notified on new delivery dates, or bookmark this page and check back in. Security Copilot Virtual Workshop: Copilot in Entra Sign in and click 'follow' above this blog to be notified on new delivery dates, or bookmark this page and check back in. ______________ Learn and Engage with the Microsoft Security Community Log in and follow this Microsoft Security Community Blog and post/ interact in the Microsoft Security Community discussion spaces. Follow = Click the heart in the upper right when you're logged in 🤍 Join the Microsoft Security Community and be notified of upcoming events, product feedback surveys, and more. Get early access to Microsoft Security products and provide feedback to engineers by joining the Microsoft Customer Connection Community. Learn about the Microsoft MVP Program. Join the Microsoft Security Community LinkedIn and the Microsoft Entra Community LinkedInAlways‑on Diagnostics for Purview Endpoint DLP: Effortless, Zero‑Friction troubleshooting for admins
Historically, some security teams have struggled with the challenge of troubleshooting issues with endpoint DLP. Investigations can often slow down because reproducing issues, collecting traces, and aligning on context can be tedious. With always-on diagnostics in Purview endpoint data loss prevention (DLP), our goal has been simple: make troubleshooting seamless, and effortless—without ever disrupting the information worker. Today, we’re excited to share new enhancements to always-on diagnostics for Purview endpoint DLP. This is the next step in our journey to modernize supportability in Microsoft Purview and dramatically reduce admin friction during investigations. Where We Started: Introduction of continuous diagnostic collection Earlier this year, we introduced continuous diagnostic trace collection on Windows endpoints (support for macOS endpoints coming soon). This eliminated the single largest source of friction: the need to reproduce issues. With this capability: Logs are captured persistently for up to 90 days Information workers no longer need admin permissions to retrieve traces Admins can submit complete logs on the first attempt Support teams can diagnose transient or rare issues with high accuracy In just a few months, we saw resolution times drop dramatically. The message was clear: Always-on diagnostics is becoming a new troubleshooting standard. Our Newest Enhancements: Built for Admins. Designed for Zero Friction. The newest enhancements to always-on diagnostics unlock the most requested capability from our IT and security administrators: the ability to retrieve and upload always-on diagnostic traces directly from devices using the Purview portal — with no user interaction required. This means: Admins can now initiate trace uploads on demand No interruption to information workers and their productivity No issue reproduction sessions, minimizing unnecessary disruption and coordination Every investigation starts with complete context Because the traces are already captured on-device, these improvements now help complete the loop by giving admins a seamless, portal-integrated workflow to deliver logs to Microsoft when needed. This experience is now fully available for customers using endpoint DLP on Windows. Why This Matters As a product team, our success is measured not just by usage, but by how effectively we eliminate friction for customers. Always-on diagnostics minimizes the friction and frustration that has historically affected some customers. - No more asking your employee or information worker to "can you reproduce that?" and share logs - No more lost context - No more delays while logs are collected after the fact How it Works Local trace capture Devices continuously capture endpoint DLP diagnostic data in a compressed, proprietary format, and this data stays solely on the respective device based on the retention period and storage limits configured by the admin. Users no longer need to reproduce issues during retrieval—everything the investigation requires is already captured on the endpoint. Admin-triggered upload Admins can now request diagnostic uploads directly from the Purview portal, eliminating the need to disrupt users. Upload requests can be initiated from multiple entry points, including: Alerts (Data Loss Prevention → Alerts → Events) Activity Explorer (Data Loss Prevention → Explorers → Activity explorer) Device Policy Status Page (Settings → Device onboarding → Devices) From any of these locations, admins can simply choose Request device log, select the date range, add a brief description, and submit the request. Once processed, the device’s always-on diagnostic logs are securely uploaded to Microsoft telemetry as per customer-approved settings. Admins can include the upload request number in their ticket with Microsoft Support, and sharing this number removes the need for the support engineer to ask for logs again during the investigation. This workflow ensures investigations start with complete diagnostic context. Privacy & compliance considerations Data is only uploaded during admin-initiated investigations Data adheres to our published diagnostic data retention policies Logs are only accessible to the Microsoft support team, not any other parties We Want to Hear From You Are you using always-on diagnostics? We'd love to hear about your experience. Share your feedback, questions, or success stories in the Microsoft Tech Community, or reach out to our engineering team directly. Making troubleshooting effortless—so you can focus on what matters, not on chasing logs.Security Copilot Skilling Series
Starting this October, Security Copilot joins forces with your favorite Microsoft Security products in a skilling series miles above the rest. The Security Copilot Skilling Series is your opportunity to strengthen your security posture through threat detection, incident response, and leveraging AI for security automation. These technical skilling sessions are delivered live by experts from our product engineering teams. Come ready to learn, engage with your peers, ask questions, and provide feedback. Upcoming sessions are noted below and will be available on-demand on the Microsoft Security Community YouTube channel. Coming Up January 22 | Security Copilot Skilling Series | Building Custom Agents: Unlocking Context, Automation, and Scale Speakers: Innocent Wafula, Sean Wesonga, and Sebuh Haileleul Microsoft Security Copilot already features a robust ecosystem of first-party and partner-built agents, but some scenarios require solutions tailored to your organization’s specific needs and context. In this session, you'll learn how the Security Copilot agent builder platform and MCP servers empower you to create tailored agents that provide context-aware reasoning and enterprise-scale solutions for your unique scenarios. January 28 | Security Copilot in Purview Technical Deep Dive Speakers: Patrick David, Thao Phan, Alexandra Roland Discover how AI-powered alert triage agents for Data Loss Prevention (DLP) and Insider Risk Management (IRM) are transforming incident response and compliance workflows. Explore new Data Security Posture Management (DSPM) capabilities that deliver deeper insights and automation to strengthen your security posture. This session will showcase real-world scenarios and actionable strategies to help you protect sensitive data and simplify compliance. Now On-Demand December 18 | What's New in Security Copilot for Defender Speaker: Doug Helton Discover the latest innovations in Microsoft Security Copilot embedded in Defender that are transforming how organizations detect, investigate, and respond to threats. This session will showcase powerful new capabilities—like AI-driven incident response, contextual insights, and automated workflows—that help security teams stop attacks faster and simplify operations. Why Attend: Stay Ahead of Threats: Learn how cutting-edge AI features accelerate detection and remediation. Boost Efficiency: See how automation reduces manual effort and improves SOC productivity. Get Expert Insights: Hear directly from product leaders and explore real-world use cases. Don’t miss this opportunity to future-proof your security strategy and unlock the full potential of Security Copilot in Defender! December 4 | Discussion of Ignite Announcements Speakers: Zineb Takafi, Mike Danoski and Oluchi Chukwunwere, Priyanka Tyagi, Diana Vicezar, Thao Phan, Alex Roland, and Doug Helton Ignite 2025 is all about driving impact in the era of AI—and security is at the center of it. In this session, we’ll unpack the biggest Security Copilot announcements from Ignite on agents and discuss how Copilot capabilities across Intune, Entra, Purview, and Defender deliver end-to-end protection. November 13 | Microsoft Entra AI: Unlocking Identity Intelligence with Security Copilot Skills and Agents Speakers: Mamta Kumar, Sr. Product Manager; Margaret Garcia Fani, Sr. Product Manager This session will demonstrate how Security Copilot in Microsoft Entra transforms identity security by introducing intelligent, autonomous capabilities that streamline operations and elevate protection. Customers will discover how to leverage AI-driven tools to optimize conditional access, automate access reviews, and proactively manage identity and application risks - empowering them into a more secure, and efficient digital future. October 30 | What's New in Copilot in Microsoft Intune Speaker: Amit Ghodke, Principal PM Architect, CxE CAT MEM Join us to learn about the latest Security Copilot capabilities in Microsoft Intune. We will discuss what's new and how you can supercharge your endpoint management experience with the new AI capabilities in Intune. October 16 | What’s New in Copilot in Microsoft Purview Speaker: Patrick David, Principal Product Manager, CxE CAT Compliance Join us for an insider’s look at the latest innovations in Microsoft Purview —where alert triage agents for DLP and IRM are transforming how we respond to sensitive data risks and improve investigation depth and speed. We’ll also dive into powerful new capabilities in Data Security Posture Management (DSPM) with Security Copilot, designed to supercharge your security insights and automation. Whether you're driving compliance or defending data, this session will give you the edge. October 9 | When to Use Logic Apps vs. Security Copilot Agents Speaker: Shiv Patel, Sr. Product Manager, Security Copilot Explore how to scale automation in security operations by comparing the use cases and capabilities of Logic Apps and Security Copilot Agents. This webinar highlights when to leverage Logic Apps for orchestrated workflows and when Security Copilot Agents offer more adaptive, AI-driven responses to complex security scenarios. All sessions will be published to the Microsoft Security Community YouTube channel - Security Copilot Skilling Series Playlist __________________________________________________________________________________________________________________________________________________________________ Looking for more? Keep up on the latest information on the Security Copilot Blog. Join the Microsoft Security Community mailing list to stay up to date on the latest product news and events. Engage with your peers one of our Microsoft Security discussion spaces.1.6KViews1like0CommentsAggregate alerts not showing up for Email DLP
Hi, I’m unable to see the “Aggregate alerts” option while configuring an Email DLP policy, although the same option is visible for Endpoint DLP. The available license is Microsoft 365 E5 Information Protection and DLP (add-on). If this is a licensing limitation, why am I still able to see the option for Endpoint DLP but not for Email DLP? Screen short showing option for Endpoint DLP alertsSolved31Views0likes2CommentsTest DLP Policy: On-Prem
We have DLP policies based on SIT and it is working well for various locations such as Sharepoint, Exchange and Endpoint devices. But the DLP policy for On-Prem Nas shares is not matching when used with Microsoft Information Protection Scanner. DLP Rule: Conditions Content contains any of these sensitive info types: Credit Card Number U.S. Bank Account Number U.S. Driver's License Number U.S. Individual Taxpayer Identification Number (ITIN) U.S. Social Security Number (SSN) The policy is visible to the Scanner and it is being logged as being executed MSIP.Lib MSIP.Scanner (30548) Executing policy: Data Discovery On-Prem, policyId: 85........................ and the MIP reports are listing files with these SITs The results Information Type Name - Credit Card Number U.S. Social Security Number (SSN) U.S. Bank Account Number Action - Classified Dlp Mode -- Test Dlp Status -- Skipped Dlp Comment -- No match There is no other information in logs. Why is the DLP policy not matching and how can I test the policy ? thanks53Views0likes1CommentData Quality Error (Internal Service Error)
I am facing an issue while running the DQ scan, when i tried doing manual scan and scheduled scans both time i faced Internal Service Error. ( DataQualityInternalError Internal service error occurred .Please retry or contact Microsoft support ) Data Profiling is running successfully but for none of the asset, DQ is working. After the lineage patch which MS had fixed, they had introduced Custom SQL option to create a rule, and after that only i am facing this issue. Is anyone else also facing the same? I tried with different data sources (ADLS, and Synapse) its same for both. If anyone has an idea, do share it here, it will be helpful.29Views0likes1CommentA Quick Look at Purview Data Security Investigations
During the quiet holiday period, I tested the new Purview Data Security Investigations (DSI) solution, which seems to be put together from bits of Microsoft 365 together with Security Copilot and some generative AI. Assembling new solutions from existing components makes sense because it reduces engineering effort. Without real data, it's hard to know how effective DSI is, but the cost of an investigation came as a real surprise. https://office365itpros.com/2026/01/06/data-security-investigation/29Views0likes1CommentMicrosoft Copilot Studio vs. Microsoft Foundry: Building AI Agents and Apps
Microsoft Copilot Studio and Microsoft Foundry (often referred to as Azure AI Foundry) are two key platforms in Microsoft’s AI ecosystem that allow organizations to create custom AI agents and AI-enabled applications. While both share the goal of enabling businesses to build intelligent, task-oriented “copilot” solutions, they are designed for different audiences and use cases. To help you decide which path suits your organization, this blog provides an educational comparison of Copilot Studio vs. Azure AI Foundry, focusing on their unique strengths, feature parity and differences, and key criteria like control requirements, preferences, and integration needs. By understanding these factors, technical decision-makers, developers, IT admins, and business leaders can confidently select the right platform or even a hybrid approach for their AI agent projects. Copilot Studio and Azure AI Foundry: At a Glance Copilot Studio is designed for business teams, pro‑makers, and IT admins who want a managed, low‑code SaaS environment with plug‑and‑play integrations. Microsoft Foundry is built for professional developers who need fine‑grained control, customization, and integration into their existing application and cloud infrastructure. And the good news? Organizations often use both and they work together beautifully. Feature Parity and Key Differences While both platforms can achieve similar outcomes, they do so via different means. Here’s a high-level comparison of Copilot Studio and Azure AI Foundry: Factor Copilot Studio (SaaS, Low-Code) Microsoft (Azure) AI Foundry (PaaS, Pro-Code) Target Users & Skills Business domain experts, IT pros, and “pro-makers” comfortable with low-code tools. Little to no coding is required for building agents. Ideal for quick solutions within business units. Professional developers, software engineers, and data scientists with coding/DevOps expertise. Deep programming skills needed for custom code, DevOps, and advanced AI scenarios. Suited for complex, large-scale AI projects. Platform Model Software-as-a-Service – fully managed by Microsoft. Agents and tools are built and run in Microsoft’s cloud (M365/Copilot service) with no infrastructure to manage. Simplified provisioning, automatic updates, and built-in compliance with Microsoft 365 environment. Platform-as-a-Service, runs in your Azure subscription. You deploy and manage the agent’s infrastructure (e.g. Azure compute, networking, storage) in your cloud. Offers full control over environment, updates, and data residency. Integration & Data Out-of-box connectors & data integrations for Microsoft 365 (SharePoint, Outlook, Teams) and 3rd-party SaaS via Power Platform connectors. Easy integration with business systems without coding, ideal for leveraging existing M365 and Power Platform assets. Data remains in Microsoft’s cloud (with M365 compliance and Purview governance) by default. Deep custom integration with any system or data source via code. Natively works with Azure services (Azure SQL, Cosmos DB, Functions, Kubernetes, Service Bus, etc.) and can connect to on-prem or multi-cloud resources via custom connectors. Suitable when data/code must stay in your network or cloud for compliance or performance reasons. Development Experience Low-code, UI-driven development. Build agents with visual designers and prompt editors. No-code orchestration through Topics (conversational flows) and Agent Flows (Power Automate). Rich library of pre-built components (tools/capabilities) that are auto-managed and continuously improved by Microsoft (e.g. Copilot connectors for M365, built-in tool evaluations). Emphasizes speed and simplicity over granular control. Code-first development. Offers web-based studio plus extensive SDKs, CLI, and VS Code integration for coding agents and custom tools. Supports full DevOps: you can use GitHub/Azure DevOps for CI/CD, custom testing, version control, and integrate with your existing software development toolchain. Provides maximum flexibility to define bespoke logic, but requires more time and skill, sacrificing immediate simplicity for long-term extensibility. Control & Governance Managed environment – minimal configuration needed. Governance is handled via Microsoft’s standard M365 admin centers: e.g. Admin Center, Entra ID, Microsoft Purview, Defender for identity, access, auditing, and compliance across copilots. Updates and performance optimizations (e.g. tool improvements) are applied automatically by Microsoft. Limited need (or ability) to tweak infrastructure or model behavior under the hood – fits organizations that want Microsoft to manage the heavy lifting. Microsoft Foundry provides a pro‑code, Azure‑native environment for teams that need full control over the agent runtime, integrations, and development workflow. Full stack control – you manage how and where agents run. Customizable governance using Azure’s security & monitoring tools: Azure AD (identity/RBAC), Key Vault, network security (private endpoints, VNETs), plus integrated logging and telemetry via Azure Monitor, App Insights, etc. Foundry includes a developer control plane for observing, debugging, and evaluating agents during development and runtime. This is ideal for organizations requiring fine-grained control, custom compliance configurations, and rigorous LLMOps practices. Deployment Channels One-click publishing to Microsoft 365 experiences (Teams, Outlook), web chat, SharePoint, email, and more – thanks to native support for multiple channels in Copilot Studio. Everything runs in the cloud; you don’t worry about hosting the bot. Flexible deployment options. Foundry agents can be exposed via APIs or the Activity Protocol, and integrated into apps or custom channels using the M365 Agents SDK. Foundry also supports deploying agents as web apps, containers, Azure Functions, or even private endpoints for internal use, giving teams freedom to run agents wherever needed (with more setup). Control and customization Copilot Studio trades off fine-grained control for simplicity and speed. It abstracts away infrastructure and handles many optimizations for you, which accelerates development but limits how deeply you can tweak the agent’s behavior. Azure Foundry, by contrast, gives you extensive control over the agent’s architecture, tools and environment – at the cost of more complex setup and effort. Consider your project’s needs: Does it demand custom code, specialized model tuning or on-premises data? If yes, Foundry provides the necessary flexibility. Common Scenarios · HR or Finance teams building departmental AI assistants · Sales operations automating workflows and knowledge retrieval · Fusion teams starting quickly without developer-heavy resources Copilot Studio gives teams a powerful way to build agents quickly without needing to set up compute, networking, identity or DevOps pipeline · Embedding agents into production SaaS apps · If team uses professional developer frameworks (Semantic Kernel, LangChain, AutoGen, etc.) · Building multi‑agent architectures with complex toolchains · You require integration with existing app code or multi-cloud architecture. · You need full observability, versioning, instrumentation or custom DevOps. Foundry is ideal for software engineering teams who need configurability, extensibility and industrial-grade DevOps. Benefits of Combined Use: Embracing Hybrid approach One important insight is that Copilot Studio and Foundry are not mutually exclusive. In fact, Microsoft designed them to be interoperable so that organizations can use both in tandem for different parts of a solution. This is especially relevant for large projects or “fusion teams” that include both low-code creators and pro developers. The pattern many enterprises land on: Developers build specialized tools / agents in Foundry Makers assemble user-facing workflow experience in Copilot Studio Agents can collaborate via agent-to-agent patterns (including A2A, where applicable) Using both platforms together unlocks the best of both worlds: Seamless User Experience: Copilot Studio provides a polished, user-friendly interface for end-users, while Azure AI Foundry handles complex backend logic and data processing. Advanced AI Capabilities: Leverage Azure AI Foundry’s extensive model library and orchestration features to build sophisticated agents that can reason, learn, and adapt. Scalability & Flexibility: Azure AI Foundry’s cloud-native architecture ensures scalability for high-demand scenarios, while Copilot Studio’s low-code approach accelerates development cycles. For the customers who don’t want to decide up front, Microsoft introduced a unified approach for scaling agent initiatives: Microsoft Agent Pre-Purchase Plan (P3) as part of the broader Agent Factory story, designed to reduce procurement friction across both platforms. Security & Compliance using Microsoft Purview Microsoft Copilot Studio: Microsoft Purview extends enterprise-grade security and compliance to agents built with Microsoft Copilot Studio by bringing AI interaction governance into the same control plane you use for the rest of Microsoft 365. With Purview, you can apply DSPM for AI insights, auditing, and data classification to Copilot Studio prompts and responses, and use familiar compliance capabilities like sensitivity labels, DLP, Insider Risk Management, Communication Compliance, eDiscovery, and Data Lifecycle Management to reduce oversharing risk and support investigations. For agents published to non-Microsoft channels, Purview management can require pay-as-you-go billing, while still using the same Purview policies and reporting workflows teams already rely on. Microsoft Foundry: Microsoft Purview integrates with Microsoft Foundry to help organizations secure and govern AI interactions (prompts, responses, and related metadata) using Microsoft’s unified data security and compliance capabilities. Once enabled through the Foundry Control Plane or through Microsoft Defender for Cloud in Microsoft Azure Portal, Purview can provide DSPM for AI posture insights plus auditing, data classification, sensitivity labels, and enforcement-oriented controls like DLP, along with downstream compliance workflows such as Insider Risk, Communication Compliance, eDiscovery, and Data Lifecycle Management. This lets security and compliance teams apply consistent policies across AI apps and agents in Foundry, while gaining visibility and governance through the same Purview portal and reports used across the enterprise. Conclusion When it comes to Copilot Studio vs. Azure AI Foundry, there is no universally “best” choice – the ideal platform depends on your team’s composition and project requirements. Copilot Studio excels at enabling functional business teams and IT pros to build AI assistants quickly in a managed, compliant environment with minimal coding. Azure AI Foundry shines for developer-centric projects that need maximal flexibility, custom code, and deep integration with enterprise systems. The key is to identify what level of control, speed, and skill your scenario calls for. Use both together to build end-to-end intelligent systems that combine ease of use with powerful backend intelligence. By thoughtfully aligning the platform to your team’s strengths and needs, you can minimize friction and maximize momentum on your AI agent journey delivering custom copilot solutions that are both quick to market and built for the long haul Resources to explore Copilot Studio Overview Microsoft Foundry Use Microsoft Purview to manage data security & compliance for Microsoft Copilot Studio Use Microsoft Purview to manage data security & compliance for Microsoft Foundry Optimize Microsoft Foundry and Copilot Credit costs with Microsoft Agent pre-purchase plan Accelerate Innovation with Microsoft Agent FactorySecuring the AI Pipeline – From Data to Deployment
In our first post, we established why securing AI workloads is mission-critical for the enterprise. Now, we turn to the AI pipeline—the end-to-end journey from raw data to deployed models—and explore why every stage must be fortified against evolving threats. As organizations accelerate AI adoption, this pipeline becomes a prime target for adversaries seeking to poison data, compromise models, or exploit deployment endpoints. Enterprises don’t operate a single “AI system”; they run interconnected pipelines that transform data into decisions across a web of services, models, and applications. Protecting this chain demands a holistic security strategy anchored in Zero Trust for AI, supply chain integrity, and continuous monitoring. In this post, we map the pipeline, identify key attack vectors at each stage, and outline practical defenses using Microsoft’s security controls—spanning data governance with Purview, confidential training environments in Azure, and runtime threat detection with Defender for Cloud. Our guidance aligns with leading frameworks, including the NIST AI Risk Management Framework and MITRE ATLAS, ensuring your AI security program meets recognized standards while enabling innovation at scale. A Security View of the AI Pipeline Securing AI isn’t just about protecting a single model—it’s about safeguarding the entire pipeline that transforms raw data into actionable intelligence. This pipeline spans multiple stages, from data collection and preparation to model training, validation, and deployment, each introducing unique risks that adversaries can exploit. Data poisoning, model tampering, and supply chain attacks are no longer theoretical—they’re real threats that can undermine trust and compliance. By viewing the pipeline through a security lens, organizations can identify these vulnerabilities early and apply layered defenses such as Zero Trust principles, data lineage tracking, and runtime monitoring. This holistic approach ensures that AI systems remain resilient, auditable, and aligned with enterprise risk and regulatory requirements. Stages & Primary Risks Data Collection & Ingestion Sources: enterprise apps, data lakes, web, partners. Key risks: poisoning, PII leakage, weak lineage, and shadow datasets. Frameworks call for explicit governance and provenance at this earliest stage. [nist.gov] Data Prep & Feature Engineering Risks: backdoored features, bias injection, and transformation tampering that evades standard validation. ATLAS catalogs techniques that target data, features, and preprocessing. [atlas.mitre.org] Model Training / Fine‑Tuning Risks: model theft, inversion, poisoning, and compromised compute. Confidential computing and isolated training domains are recommended. [learn.microsoft.com] Validation & Red‑Team Testing Risks: tainted validation sets, overlooked LLM‑specific risks (prompt injection, unbounded consumption), and fairness drift. OWASP’s LLM Top 10 highlights the unique classes of generative threats. [owasp.org] Registry & Release Management Risks: supply chain tampering (malicious models, dependency confusion), unsigned artifacts, and missing SBOM/AIBOM. [codesecure.com], [github.com] Deployment & Inference Risks: adversarial inputs, API abuse, prompt injection (direct & indirect), data exfiltration, and model abuse at runtime. Microsoft has documented multi‑layer mitigations and integrated threat protection for AI workloads. [techcommun…rosoft.com], [learn.microsoft.com] Reference Architecture (Zero Trust for AI) The Reference Architecture for Zero Trust in AI establishes a security-first blueprint for the entire AI pipeline—from raw data ingestion to model deployment and continuous monitoring. Its importance lies in addressing the unique risks of AI systems, such as data poisoning, model tampering, and adversarial attacks, which traditional security models often overlook. By embedding Zero Trust principles at every stage—governance with Microsoft Purview, isolated training environments, signed model artifacts, and runtime threat detection—organizations gain verifiable integrity, regulatory compliance, and resilience against evolving threats. Adopting this architecture ensures that AI innovations remain trustworthy, auditable, and aligned with business and compliance objectives, ultimately accelerating adoption while reducing risk and safeguarding enterprise reputation. Below is a visual of what this architecture looks like: Why this matters: Microsoft Purview establishes provenance, labels, and lineage Azure ML enforces network isolation Confidential Computing protects data-in-use Responsible AI tooling addresses safety & fairness Defender for Cloud adds runtime AI‑specific threat detection Azure ML Model Monitoring closes the loop with drift and anomaly detection. [microsoft.com], [azure.microsoft.com], [learn.microsoft.com], [learn.microsoft.com], [learn.microsoft.com], [learn.microsoft.com], [learn.microsoft.com], [learn.microsoft.com] Stage‑by‑Stage Threats & Concrete Mitigations (with Microsoft Controls) Data Collection & Ingestion - Attack Scenarios Data poisoning via partner feed or web‑scraped corpus; undetected changes skew downstream models. Research shows Differential Privacy (DP) can reduce impact but is not a silver bullet. Differential Privacy introduces controlled noise into training data or model outputs, making it harder for attackers to infer individual data points and limiting the influence of any single poisoned record. This helps reduce the impact of targeted poisoning attacks because malicious entries cannot disproportionately affect the model’s parameters. However, DP is not sufficient on its own for several reasons: Aggregate poisoning still works: DP protects individual records, but if an attacker injects a large volume of poisoned data, the cumulative effect can still skew the model. Utility trade-offs: Adding noise to achieve strong privacy guarantees often degrades model accuracy, creating tension between security and performance. Doesn’t detect malicious intent: DP doesn’t validate data quality or provenance—it only limits exposure. Poisoned data can still enter the pipeline undetected. Vulnerable to sophisticated attacks: Techniques like backdoor poisoning or gradient manipulation can bypass DP protections because they exploit model behavior rather than individual record influence. Bottom line, DP is a valuable layer for privacy and resilience, but it must be combined with data validation, anomaly detection, and provenance checks to effectively mitigate poisoning risks. [arxiv.org], [dp-ml.github.io] Sensitive data drift into training corpus (PII/PHI), later leaking through model inversion. NIST RMF calls for privacy‑enhanced design and provenance from the outset. When personally identifiable information (PII) or protected health information (PHI) unintentionally enters the training dataset—often through partner feeds, logs, or web-scraped sources—it creates a latent risk. If the model memorizes these sensitive records, adversaries can exploit model inversion attacks to reconstruct or infer private details from outputs or embeddings. [nvlpubs.nist.gov] Mitigations & Integrations Classify & label sensitive fields with Microsoft Purview Use Purview’s automated scanning and classification to detect PII, PHI, financial data, and other regulated fields across your data estate. Apply sensitivity labels and tags to enforce consistent governance policies. [microsoft.com] Enable lineage across Microsoft Fabric/Synapse/SQL Implement Data Loss Prevention (DLP) rules to block unauthorized movement of sensitive data and prevent accidental leaks. Combine this with role-based access control (RBAC) and attribute-based access control (ABAC) to restrict who can view, modify, or export sensitive datasets. Integrate with SOC and DevSecOps Pipelines Feed Purview alerts and lineage events into your SIEM/XDR workflows for real-time monitoring. Automate policy enforcement in CI/CD pipelines to ensure models only train on approved, sanitized datasets. Continuous Compliance Monitoring Schedule recurring scans and leverage Purview’s compliance dashboards to validate adherence to regulatory frameworks like GDPR, HIPAA, and NIST RMF. Maintain dataset hashes and signatures; store lineage metadata and approvals before a dataset can enter training (Purview + Fabric). [azure.microsoft.com] For externally sourced data, sandbox ingestion and run poisoning heuristics; if using Data Privacy (DP)‑training, document tradeoffs (utility vs. robustness). [aclanthology.org], [dp-ml.github.io] 3.2 Data Preparation & Feature Engineering Attack Scenarios Feature backdoors: crafted tokens in a free‑text field activate hidden behaviors only under specific conditions. MITRE ATLAS lists techniques that target features/preprocessing. [atlas.mitre.org] Mitigations & Integrations Version every transformation; capture end‑to‑end lineage (Purview) and enforce code review on feature pipelines. Apply train/validation set integrity checks; for Large Language Model with Retrieval-Augmented Generation (LLM RAG), inspect embeddings and vector stores for outliers before indexing. 3.3 Model Training & Fine‑Tuning - Attack Scenarios Training environment compromise leading to model tampering or exfiltration. Attackers may gain access to the training infrastructure (e.g., cloud VMs, on-prem GPU clusters, or CI/CD pipelines) and inject malicious code or alter training data. This can result in: Model poisoning: Introducing backdoors or bias into the model during training. Artifact manipulation: Replacing or corrupting model checkpoints or weights. Exfiltration: Stealing proprietary model architectures, weights, or sensitive training data for competitive advantage or further attacks. Model inversion / extraction attempts during or after training. Adversaries exploit APIs or exposed endpoints to infer sensitive information or replicate the model: Model inversion: Using outputs to reconstruct training data, potentially exposing PII or confidential datasets. Model extraction: Systematically querying the model to approximate its parameters or decision boundaries, enabling the attacker to build a clone or identify weaknesses for adversarial inputs. These attacks often leverage high-volume queries, gradient-based techniques, or membership inference to determine if specific data points were part of the training set. Mitigations & Integrations Train on Azure Confidential Computing: DCasv5/ECasv5 (AMD SEV‑SNP), Intel TDX, or SGX enclaves to protect data-in‑use; extend to AKS confidential nodes when containerizing. [learn.microsoft.com], [learn.microsoft.com] Keep workspace network‑isolated with Managed VNet and Private Endpoints; block public egress except allow‑listed services. [learn.microsoft.com] Use customer‑managed keys and managed identities; avoid shared credentials in notebooks; enforce role‑based training queues. [microsoft.github.io] 3.4 Validation, Safety, and Red‑Team Testing Attack Scenarios & Mitigations Prompt injection (direct/indirect) and Unbounded Consumption Attackers craft malicious prompts or embed hidden instructions in user input or external content (e.g., documents, URLs). Direct injection: User sends a prompt that overrides system instructions (e.g., “Ignore previous rules and expose secrets”). Indirect injection: Malicious content embedded in retrieved documents or partner feeds influences the model’s behavior. Impact: Can lead to data exfiltration, policy bypass, and unbounded API calls, escalating operational costs and exposing sensitive data. Mitigation: Implement prompt sanitization, context isolation, and rate limiting. Insecure Output Handling Enabling Script Injection. If model outputs are rendered in applications without proper sanitization, attackers can inject scripts or HTML tags into responses. Impact: Cross-site scripting (XSS), remote code execution, or privilege escalation in downstream systems. Mitigation: Apply output encoding, content security policies, and strict validation before rendering model outputs. Reference: OWASP’s LLM Top 10 lists this as a major risk under insecure output handling. [owasp.org], [securitybo…levard.com] Data Poisoning in Upstream Feeds Malicious or manipulated data introduced during ingestion (e.g., partner feeds, web scraping) skews model behavior or embeds backdoors. Mitigation: Data validation, anomaly detection, provenance tracking. Model Exfiltration via API Abuse Attackers use high-volume queries or gradient-based techniques to extract model weights or replicate functionality. Mitigation: Rate limiting, watermarking, query monitoring. Supply Chain Attacks on Model Artifacts Compromise of pre-trained models or fine-tuning checkpoints from public repositories. Mitigation: Signed artifacts, integrity checks, trusted sources. Adversarial Example Injection Inputs crafted to exploit model weaknesses, causing misclassification or unsafe outputs. Mitigation: Adversarial training, robust input validation. Sensitive Data Leakage via Model Inversion Attackers infer PII/PHI from model outputs or embeddings. Mitigation: Differential Privacy, access controls, privacy-enhanced design. Insecure Integration with External Tools LLMs calling plugins or APIs without proper sandboxing can lead to unauthorized actions. Mitigation: Strict permissioning, allowlists, and isolation. Additional Mitigations & Integrations considerations Adopt Microsoft’s defense‑in‑depth guidance for indirect prompt injection (hardening + Spotlighting patterns) and pair with runtime Prompt Shields. [techcommun…rosoft.com] Evaluate models with Responsible AI Dashboard (fairness, explainability, error analysis) and export RAI Scorecards for release gates. [learn.microsoft.com] Build security gates referencing MITRE ATLAS techniques and OWASP GenAI controls into your MLOps pipeline. [atlas.mitre.org], [owasp.org] 3.5 Registry, Signing & Supply Chain Integrity - Attack Scenarios Model supply chain risk: backdoored pre‑trained weights Attackers compromise publicly available or third-party pre-trained models by embedding hidden behaviors (e.g., triggers that activate under specific inputs). Impact: Silent backdoors can cause targeted misclassification or data leakage during inference. Mitigation: Use trusted registries and verified sources for model downloads. Perform model scanning for anomalies and backdoor detection before deployment. [raykhira.com] Dependency Confusion Malicious actors publish packages with the same name as internal dependencies to public repositories. If build pipelines pull these packages, attackers gain code execution. Impact: Compromised training or deployment environments, leading to model tampering or data exfiltration. Mitigation: Enforce private package registries and pin versions. Validate dependencies against allowlists. Unsigned Artifacts Swapped in the Registry If model artifacts (weights, configs, containers) are not cryptographically signed, attackers can replace them with malicious versions. Impact: Deployment of compromised models or containers without detection. Mitigation: Implement artifact signing and integrity verification (e.g., SHA256 checksums). Require signature validation in CI/CD pipelines before promotion to production. Registry Compromise Attackers gain access to the model registry and alter metadata or inject malicious artifacts. Mitigation: RBAC, MFA, audit logging, and registry isolation. Tampered Build Pipeline CI/CD pipeline compromised to inject malicious code during model packaging or containerization. Mitigation: Secure build environments, signed commits, and pipeline integrity checks. Poisoned Container Images Malicious base images used for model deployment introduce vulnerabilities or malware. Mitigation: Use trusted container registries, scan images for CVEs, and enforce image signing. Shadow Artifacts Attackers upload artifacts with similar names or versions to confuse operators and bypass validation. Mitigation: Strict naming conventions, artifact fingerprinting, and automated validation. Additional Mitigations & Integrations considerations Store models in Azure ML Registry with version pinning; sign artifacts and publish SBOM/AI‑BOM metadata for downstream verifiers. [microsoft.github.io], [github.com], [codesecure.com] Maintain verifiable lineage and attestations (policy says: no signature, no deploy). Emerging work on attestable pipelines reinforces this approach. [arxiv.org] 3.6 Secure Deployment & Runtime Protection - Attack Scenarios Adversarial inputs and prompt injections targeting your inference APIs or agents Attackers craft malicious queries or embed hidden instructions in user input or retrieved content to manipulate model behavior. Impact: Policy bypass, sensitive data leakage, or execution of unintended actions via connected tools. Mitigation: Prompt sanitization and isolation (strip unsafe instructions). Context segmentation for multi-turn conversations. Rate limiting and anomaly detection on inference endpoints. Jailbreaks that bypass safety filters Attackers exploit weaknesses in safety guardrails by chaining prompts or using obfuscation techniques to override restrictions. Impact: Generation of harmful, disallowed, or confidential content; reputational and compliance risks. Mitigation: Layered safety filters (input + output). Continuous red-teaming and adversarial testing. Dynamic policy enforcement based on risk scoring. API abuse and model extraction. High-volume or structured queries designed to infer model parameters or replicate its functionality. Impact: Intellectual property theft, exposure of proprietary model logic, and enabling downstream attacks. Mitigation: Rate limiting and throttling. Watermarking responses to detect stolen outputs. Query pattern monitoring for extraction attempts. [atlas.mitre.org] Insecure Integration with External Tools or Plugins LLM agents calling APIs without sandboxing can trigger unauthorized actions. Mitigation: Strict allowlists, permission gating, and isolated execution environments. Model Output Injection into Downstream Systems Unsanitized outputs rendered in apps or dashboards can lead to XSS or command injection. Mitigation: Output encoding, validation, and secure rendering practices. Runtime Environment Compromise Attackers exploit container or VM vulnerabilities hosting inference services. Mitigation: Harden runtime environments, apply OS-level security patches, and enforce network isolation. Side-Channel Attacks Observing timing, resource usage, or error messages to infer sensitive details about the model or data. Mitigation: Noise injection, uniform response timing, and error sanitization. Unbounded Consumption Leading to Cost Escalation Attackers flood inference endpoints with requests, driving up compute costs. Mitigation: Quotas, usage monitoring, and auto-scaling with cost controls. Additional Mitigations & Integrations considerations Deploy Managed Online Endpoints behind Private Link; enforce mTLS, rate limits, and token‑based auth; restrict egress in managed VNet. [learn.microsoft.com] Turn on Microsoft Defender for Cloud – AI threat protection to detect jailbreaks, data leakage, prompt hacking, and poisoning attempts; incidents flow into Defender XDR. [learn.microsoft.com] For Azure OpenAI / Direct Models, enterprise data is tenant‑isolated and not used to train foundation models; configure Abuse Monitoring and Risks & Safety dashboards, with clear data‑handling stance. [learn.microsoft.com], [learn.microsoft.com], [learn.microsoft.com] 3.7 Post‑Deployment Monitoring & Response - Attack Scenarios Data/Prediction Drift silently degrades performance Over time, input data distributions change (e.g., new slang, market shifts), causing the model to make less accurate predictions without obvious alerts. Impact: Reduced accuracy, operational risk, and potential compliance violations if decisions become unreliable. Mitigation: Continuous drift detection using statistical tests (KL divergence, PSI). Scheduled model retraining and validation pipelines. Alerting thresholds for performance degradation. Fairness Drift Shifts Outcomes Across Cohorts Model performance or decision bias changes for specific demographic or business segments due to evolving data or retraining. Impact: Regulatory risk (GDPR, EEOC), reputational damage, and ethical concerns. Mitigation: Implement bias monitoring dashboards. Apply fairness metrics (equal opportunity, demographic parity) in post-deployment checks. Trigger remediation workflows when drift exceeds thresholds. Emergent Jailbreak Patterns evolve over time Attackers discover new prompt injection or jailbreak techniques that bypass safety filters after deployment. Impact: Generation of harmful or disallowed content, policy violations, and security breaches. Mitigation: Behavioral anomaly detection on prompts and outputs. Continuous red-teaming and adversarial testing. Dynamic policy updates integrated into inference pipelines. Shadow Model Deployment Unauthorized or outdated models running in production environments without governance. Mitigation: Registry enforcement, signed artifacts, and deployment audits. Silent Backdoor Activation Backdoors introduced during training activate under rare conditions post-deployment. Mitigation: Runtime scanning for anomalous triggers and adversarial input detection. Telemetry Tampering Attackers manipulate monitoring logs or metrics to hide drift or anomalies. Mitigation: Immutable logging, cryptographic integrity checks, and SIEM integration. Cost Abuse via Automated Bots Bots continuously hit inference endpoints, driving up operational costs unnoticed. Mitigation: Rate limiting, usage analytics, and anomaly-based throttling. Model Extraction Over Time Slow, distributed queries across months to replicate model behavior without triggering rate limits. Mitigation: Long-term query pattern analysis and watermarking. Additional Mitigations & Integrations considerations Enable Azure ML Model Monitoring for data drift, prediction drift, data quality, and custom signals; route alerts to Event Grid to auto‑trigger retraining and change control. [learn.microsoft.com], [learn.microsoft.com] Correlate runtime AI threat alerts (Defender for Cloud) with broader incidents in Defender XDR for a complete kill‑chain view. [learn.microsoft.com] Real‑World Scenarios & Playbooks Scenario A — “Clean” Model, Poisoned Validation Symptom: Model looks great in CI, fails catastrophically on a subset in production. Likely cause: Attacker tainted validation data so unsafe behavior was never detected. ATLAS documents validation‑stage attacks. [atlas.mitre.org] Playbook: Require dual‑source validation sets with hashes in Purview lineage; incorporate RAI dashboard probes for subgroup performance; block release if variance exceeds policy. [microsoft.com], [learn.microsoft.com] Scenario B — Indirect Prompt Injection in Retrieval-Augmented Generation (RAG) Symptom: The assistant “quotes” an external PDF that quietly exfiltrates secrets via instructions in hidden text. Playbook: Apply Microsoft Spotlighting patterns (delimiting/datamarking/encoding) and Prompt Shields; enable Defender for Cloud AI alerts and remediate via Defender XDR. [techcommun…rosoft.com], [learn.microsoft.com] Scenario C — Model Extraction via API Abuse Symptom: Spiky usage, long prompts, and systematic probing. Playbook: Enforce rate/shape limits; throttle token windows; monitor with Defender for Cloud and block high‑risk consumers; for OpenAI endpoints, validate Abuse Monitoring telemetry and adjust content filters. [learn.microsoft.com], [learn.microsoft.com] Product‑by‑Product Implementation Guide (Quick Start) Data Governance & Provenance Microsoft Purview Data Governance GA: unify cataloging, lineage, and policy; integrate with Fabric; use embedded Copilot to accelerate stewardship. [microsoft.com], [azure.microsoft.com] Secure Training Azure ML with Managed VNet + Private Endpoints; use Confidential VMs (DCasv5/ECasv5) or SGX/TDX where enclave isolation is required; extend to AKS confidential nodes for containerized training. [learn.microsoft.com], [learn.microsoft.com] Responsible AI Responsible AI Dashboard & Scorecards for fairness/interpretability/error analysis—use as release artifacts at change control. [learn.microsoft.com] Runtime Safety & Threat Detection Azure AI Content Safety (Prompt Shields, groundedness, protected material detection) + Defender for Cloud AI Threat Protection (alerts for leakage/poisoning/jailbreak/credential theft) integrated to Defender XDR. [ai.azure.com], [learn.microsoft.com] Enterprise‑grade LLM Access Azure OpenAI / Direct Models: data isolation, residency (Data Zones), and clear privacy commitments for commercial & public sector customers. [learn.microsoft.com], [azure.microsoft.com], [blogs.microsoft.com] Monitoring & Continuous Improvement Azure ML Model Monitoring (drift/quality) + Event Grid triggers for auto‑retrain; instrument with Application Insights for latency/reliability. [learn.microsoft.com] Policy & Governance: Map → Measure → Manage (NIST AI RMF) Align your controls to NIST’s four functions: Govern: Define AI security policies: dataset admission, cryptographic signing, registry controls, and red‑team requirements. [nvlpubs.nist.gov] Map: Inventory models, data, and dependencies (Purview catalog + SBOM/AIBOM). [microsoft.com], [github.com] Measure: RAI metrics (fairness, explainability), drift thresholds, and runtime attack rates (Defender/Content Safety). [learn.microsoft.com], [learn.microsoft.com] Manage: Automate mitigations: block unsigned artifacts, quarantine suspect datasets, rotate keys, and retrain on alerts. [nist.gov] What “Good” Looks Like: A 90‑Day Hardening Plan Days 0–30: Establish Foundations Turn on Purview scans across Fabric/SQL/Storage; define sensitivity labels + DLP. [microsoft.com] Lock Azure ML workspaces into Managed VNet, Private Endpoints, and Managed Identity. [learn.microsoft.com], [microsoft.github.io] Move training to Confidential VMs for sensitive projects. [learn.microsoft.com] Days 31–60: Shift‑Left & Gate Releases Integrate RAI Dashboard/Scorecards into CI; add ATLAS + OWASP LLM checks to release gates. [learn.microsoft.com], [atlas.mitre.org], [owasp.org] Require SBOM/AIBOM and artifact signing for models. [codesecure.com], [github.com] Days 61–90: Runtime Defense & Observability Enable Defender for Cloud – AI Threat Protection and Azure AI Content Safety; wire alerts to Defender XDR. [learn.microsoft.com], [ai.azure.com] Roll out Model Monitoring (drift/quality) with auto‑retrain triggers via Event Grid. [learn.microsoft.com] FAQ: Common Leadership Questions Q: Do differential privacy and adversarial training “solve” poisoning? A: They reduce risk envelopes but do not eliminate attacks—plan for layered defenses and continuous validation. [arxiv.org], [dp-ml.github.io] Q: How do we prevent indirect prompt injection in agentic apps? A: Combine Spotlighting patterns, Prompt Shields, least‑privilege tool access, explicit consent for sensitive actions, and Defender for Cloud runtime alerts. [techcommun…rosoft.com], [learn.microsoft.com] Q: Can we use Azure OpenAI without contributing our data to model training? A: Yes—Azure Direct Models keep your prompts/completions private, not used to train foundation models without your permission; with Data Zones, you can align residency. [learn.microsoft.com], [azure.microsoft.com] Closing As your organization scales AI, the pipeline is the perimeter. Treat every stage—from data capture to model deployment—as a control point with verifiable lineage, signed artifacts, network isolation, runtime detection, and continuous risk measurement. But securing the pipeline is only part of the story—what about the models themselves? In our next post, we’ll dive into hardening AI models against adversarial attacks, exploring techniques to detect, mitigate, and build resilience against threats that target the very core of your AI systems. Key Takeaway Securing AI requires protecting the entire pipeline—from data collection to deployment and monitoring—not just individual models. Zero Trust for AI: Embed security controls at every stage (data governance, isolated training, signed artifacts, runtime threat detection) for integrity and compliance. Main threats and mitigations by stage: Data Collection: Risks include poisoning and PII leakage; mitigate with data classification, lineage tracking, and DLP. Data Preparation: Watch for feature backdoors and tampering; use versioning, code review, and integrity checks. Model Training: Risks are environment compromise and model theft; mitigate with confidential computing, network isolation, and managed identities. Validation & Red Teaming: Prompt injection and unbounded consumption are key risks; address with prompt sanitization, output encoding, and adversarial testing. Supply Chain & Registry: Backdoored models and dependency confusion; use trusted registries, artifact signing, and strict pipeline controls. Deployment & Runtime: Adversarial inputs and API abuse; mitigate with rate limiting, context segmentation, and Defender for Cloud AI threat protection. Monitoring: Watch for data/prediction drift and cost abuse; enable continuous monitoring, drift detection, and automated retraining. References NIST AI RMF (Core + Generative AI Profile) – governance lens for pipeline risks. [nist.gov], [nist.gov] MITRE ATLAS – adversary tactics & techniques against AI systems. [atlas.mitre.org] OWASP Top 10 for LLM Applications / GenAI Project – practical guidance for LLM‑specific risks. [owasp.org] Azure Confidential Computing – protect data‑in‑use with SEV‑SNP/TDX/SGX and confidential GPUs. [learn.microsoft.com] Microsoft Purview Data Governance – GA feature set for unified data governance & lineage. [microsoft.com] Defender for Cloud – AI Threat Protection – runtime detections and XDR integration. [learn.microsoft.com] Responsible AI Dashboard / Scorecards – fairness & explainability in Azure ML. [learn.microsoft.com] Azure AI Content Safety – Prompt Shields, groundedness detection, protected material checks. [ai.azure.com] Azure ML Model Monitoring – drift/quality monitoring & automated retraining flows. [learn.microsoft.com] #AIPipelineSecurity; #AITrustAndSafety; #SecureAI; #AIModelSecurity; #AIThreatModeling; #SupplyChainSecurity; #DataSecurityMicrosoft Purview Data Governance - Authoring Custom Data Quality rules using expression languages
The cost of poor-quality data runs into millions of dollars in direct losses. When indirect costs—such as missed opportunities—are included, the total impact is many times higher. Poor data quality also creates significant societal costs. It can lead customers to pay higher prices for goods and services and force citizens to bear higher taxes due to inefficiencies and errors. In critical domains, the consequences can be severe. Defective or inaccurate data can result in injury or loss of life, for example due to medication errors or incorrect medical procedures, especially as healthcare increasingly relies on data- and AI-driven decision-making. Students may be unfairly denied admission to universities because of errors in entrance exam scoring. Consumers may purchase unsafe or harmful food products if nutritional labels are inaccurate or misleading. Research and industry measurements show that 20–35 percent of an organization’s operating revenue is often wasted on recovering from process failures, data defects, information scrap, and rework caused by poor data quality (Larry P. English, Information Quality Applied). Data Quality Rules To maintain high-quality data, organizations must continuously measure and monitor data quality and understand the negative impact of poor-quality data on their specific use cases. Data quality rules play a critical role in objectively measuring, enforcing, and quantifying data quality, enabling organizations to improve trust, reduce risk, and maximize the value of their data assets. Data Quality (DQ) rules define how data should be structured, related, constrained, and validated so it can be trusted for operational, analytical, and AI use cases. Data quality rules are essential guidelines that organizations establish to ensure the accuracy, consistency, and completeness of their data. These rules fall into four major categories: Business Entity rules, Business Attribute rules, Data Dependency rules, and Data Validity rules (Ref: Informit.com/articles). Business Entity Rules These rules ensure that core business objects (such as Customer, Order, Account, or Product) are well-defined and correctly related. Business entity rules prevent duplicate records, broken relationships, and incomplete business processes. Business Entity Rules Definition Example Uniqueness Every entity instance must be uniquely identifiable. Each customer must have a unique Customer ID that is never NULL. Duplicate customer records indicate poor data quality. Cardinality Defines how many instances of one entity can relate to another. One customer can place many orders (one-to-many), but an order belongs to exactly one customer. Optionality Defines whether a relationship is mandatory or optional. An order must be linked to a customer (mandatory), but a customer may exist without having placed any orders (optional). Business Attribute Rules These rules focus on individual data elements (columns/fields) within business entities. Attribute rules ensure consistency, interpretability, and prevent invalid or meaningless values. Business Attribute Rules Definition Example Data Inheritance Attributes defined in a supertype must be consistent across subtypes. An Account Number remains the same whether the account is Checking or Savings. Data Domains Attribute values must conform to allowed formats or ranges. · State Code must be one of the 50 U.S. state abbreviations · Age must be between 0 and 120 · Date must follow CCYY/MM/DD format Data Dependency Rules These rules define logical and conditional relationships between entities and attributes. Data dependency rules enforce business logic and prevent contradictory or illogical data states. Data Dependency Rules Definition Example Entity Relationship Dependency The existence of one relationship depends on another condition. Orders cannot be placed for customers with a “Delinquent” status. Attribute Dependency The value of one attribute depends on others. · If Loan Status = “Funded,” then Loan Amount > 0 and Funding Date is required · Pay Amount = Hours Worked × Hourly Rate · If Monthly Salary > 0, then Commission Rate must be NULL Data Validity Rules These rules ensure that actual data values are complete, correct, accurate, precise, unique, and consistent. Validity rules ensure data is trustworthy for reporting, regulatory compliance, and AI/ML models. Data Validity Rules Definition Example Completeness Required records, relationships, attributes, and values must exist. No NULLs in mandatory fields like Customer ID or Order Date. Correctness & Accuracy Values must reflect real-world truth and business rules. A customer’s credit limit must align with approved financial records. Precision Data must be stored with the required level of detail. Interest rates stored to four decimal places if required for calculations. Uniqueness No duplicate records, keys, definitions, or overloaded columns. A “Customer Type Code” column should not mix customer types and shipping methods. Consistency Duplicate or redundant data must match everywhere it appears. Customer address stored in multiple systems must be identical. Compliance PII and sensitive data Check and validate personal information like credit card, passport number, national id, bank account, etc. System Rules Microsoft Purview Data Quality provides both system (out-of-the-box) rules and custom rules, along with an AI-enabled data quality rule recommendation feature. Together, these capabilities help organizations effectively measure, monitor, and improve data quality by applying the right set of data quality rules. System (out-of-the-box) rules cover the majority of business attribute and data validity scenarios. List of the system rules are illustrated below (see the screenshot below). Custom Rules Custom rules allow you to define validations that evaluate one or more values within a row, enabling complex, context-aware data quality checks tailored to specific business requirements. Custom rules support all four major categories of data quality rules: Business Entity rules, Business Attribute rules, Data Dependency rules, and Data Validity rules. You can use regular expression language, Azure Data Factory expression, and SQL expression language to create custom rules. Purview Data Quality custom rule has three parts: Row expression: This Boolean expression applies to each row that the filter expression approves. If this expression returns true, the row passes. If it returns false, the row fails. Filter expression: This optional condition narrows down the dataset on which the row condition is evaluated. You activate it by selecting the Use filter expression checkbox. This expression returns a Boolean value. The filter expression applies to a row and if it returns true, then that row is considered for the rule. If the filter expression returns false for that row, then it means that row is ignored for the purposes of this rule. The default behavior of the filter expression is to pass all rows, so if you don't specify a filter expression, all rows are considered. Null expression: Checks how NULL values should be handled. This expression returns to a Boolean that handles cases where data is missing. If the expression returns true, the row expression isn't applied. Each part of the rule works similarly to existing Microsoft Purview Data Quality conditions. A rule only passes if the row expression evaluates to TRUE for the dataset that matches the filter expression and handles missing values as specified in the null expression. Examples: Ensure that the location of the salesperson is correct. Azure data factory expression language is used to author this rule. 2. Ensure "fare Amount" is positive and "trip Distance" is valid. SQL expression language is used to author this rule. 3. For each trip, check if the fare is above the average for its payment type. SQL expression language is used to author this rule. Together, above listed four categories of data quality rules: Prevent errors at the source Enforce business logic Improve trust in analytics and AI Reduce remediation costs downstream In short, high-quality data is not accidental—it is enforced through well-defined data quality rules across entities, attributes, relationships, and values. References Create Data Quality Rules in Unified Catalog | Microsoft Learn Expression builder in mapping data flows - Azure Data Factory & Azure Synapse | Microsoft Learn Expression Functions in the Mapping Data Flow - Azure Data Factory & Azure Synapse | Microsoft Learn http://www.informit.com/articles/article.aspx?p=399325&seqNum=3 Information Quality Applied, Larry P. English