The cost of poor-quality data runs into millions of dollars in direct losses. When indirect costs—such as missed opportunities—are included, the total impact is many times higher. Poor data quality also creates significant societal costs. It can lead customers to pay higher prices for goods and services and force citizens to bear higher taxes due to inefficiencies and errors.
In critical domains, the consequences can be severe. Defective or inaccurate data can result in injury or loss of life, for example due to medication errors or incorrect medical procedures, especially as healthcare increasingly relies on data- and AI-driven decision-making. Students may be unfairly denied admission to universities because of errors in entrance exam scoring. Consumers may purchase unsafe or harmful food products if nutritional labels are inaccurate or misleading.
Research and industry measurements show that 20–35 percent of an organization’s operating revenue is often wasted on recovering from process failures, data defects, information scrap, and rework caused by poor data quality (Larry P. English, Information Quality Applied).
Figure 1: Data quality rules

Data Quality Rules
To maintain high-quality data, organizations must continuously measure and monitor data quality and understand the negative impact of poor-quality data on their specific use cases. Data quality rules play a critical role in objectively measuring, enforcing, and quantifying data quality, enabling organizations to improve trust, reduce risk, and maximize the value of their data assets.
Data Quality (DQ) rules define how data should be structured, related, constrained, and validated so it can be trusted for operational, analytical, and AI use cases. Organizations establish these rules to ensure the accuracy, consistency, and completeness of their data. The rules fall into four major categories: Business Entity rules, Business Attribute rules, Data Dependency rules, and Data Validity rules (Ref: Informit.com/articles).
Business Entity Rules
These rules ensure that core business objects (such as Customer, Order, Account, or Product) are well-defined and correctly related. Business entity rules prevent duplicate records, broken relationships, and incomplete business processes.
| Business Entity Rules | Definition | Example |
| --- | --- | --- |
| Uniqueness | Every entity instance must be uniquely identifiable. | Each customer must have a unique Customer ID that is never NULL. Duplicate customer records indicate poor data quality. |
| Cardinality | Defines how many instances of one entity can relate to another. | One customer can place many orders (one-to-many), but an order belongs to exactly one customer. |
| Optionality | Defines whether a relationship is mandatory or optional. | An order must be linked to a customer (mandatory), but a customer may exist without having placed any orders (optional). |
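The entity rules above can be expressed as executable checks. The following is a minimal sketch in Python; the record layout, column names, and sample data are all hypothetical, invented for illustration.

```python
# Hypothetical customer and order records; field names are illustrative only.
customers = [
    {"customer_id": "C001", "name": "Alice"},
    {"customer_id": "C002", "name": "Bob"},
    {"customer_id": "C002", "name": "Bobby"},   # duplicate key -> uniqueness violation
    {"customer_id": None,   "name": "Charlie"}, # NULL key -> uniqueness violation
]
orders = [
    {"order_id": "O1", "customer_id": "C001"},
    {"order_id": "O2", "customer_id": "C999"},  # no such customer -> optionality violation
]

def check_uniqueness(rows, key):
    """Every entity instance must have a non-NULL, unique identifier."""
    seen, violations = set(), []
    for row in rows:
        value = row[key]
        if value is None or value in seen:
            violations.append(row)
        seen.add(value)
    return violations

def check_mandatory_relationship(children, fk, parents, pk):
    """Each child row must reference an existing parent (the mandatory side)."""
    parent_keys = {p[pk] for p in parents}
    return [c for c in children if c[fk] not in parent_keys]

dup = check_uniqueness(customers, "customer_id")
orphans = check_mandatory_relationship(orders, "customer_id", customers, "customer_id")
print(len(dup), len(orphans))  # 2 1
```

Cardinality (one-to-many) falls out of the same pattern: the foreign key lives on the order, so each order points at exactly one customer while a customer may be referenced by many orders.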
Business Attribute Rules
These rules focus on individual data elements (columns/fields) within business entities. Attribute rules ensure consistency, interpretability, and prevent invalid or meaningless values.
| Business Attribute Rules | Definition | Example |
| --- | --- | --- |
| Data Inheritance | Attributes defined in a supertype must be consistent across subtypes. | An Account Number remains the same whether the account is Checking or Savings. |
| Data Domains | Attribute values must conform to allowed formats or ranges. | State Code must be one of the 50 U.S. state abbreviations; Age must be between 0 and 120; Date must follow CCYY/MM/DD format. |
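Domain rules like these translate directly into membership, range, and format checks. A minimal sketch in Python, assuming hypothetical validators (the state list is abbreviated for brevity):

```python
import re

# Illustrative domain constraints; these are not Purview rule syntax.
STATE_CODES = {"AL", "AK", "AZ", "CA", "NY", "TX"}       # abbreviated sample of the 50 codes
DATE_RE = re.compile(r"^\d{4}/\d{2}/\d{2}$")             # CCYY/MM/DD

def valid_state(code):
    """State Code must be one of the allowed abbreviations."""
    return code in STATE_CODES

def valid_age(age):
    """Age must be an integer between 0 and 120."""
    return isinstance(age, int) and 0 <= age <= 120

def valid_date(value):
    """Date must follow the CCYY/MM/DD format."""
    return bool(DATE_RE.match(value))

print(valid_state("CA"), valid_age(130), valid_date("2024/07/01"))  # True False True
```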
Data Dependency Rules
These rules define logical and conditional relationships between entities and attributes. Data dependency rules enforce business logic and prevent contradictory or illogical data states.
| Data Dependency Rules | Definition | Example |
| --- | --- | --- |
| Entity Relationship Dependency | The existence of one relationship depends on another condition. | Orders cannot be placed for customers with a “Delinquent” status. |
| Attribute Dependency | The value of one attribute depends on others. | If Loan Status = “Funded,” then Loan Amount > 0 and Funding Date is required; Pay Amount = Hours Worked × Hourly Rate; if Monthly Salary > 0, then Commission Rate must be NULL. |
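Attribute dependencies are conditional predicates over several fields of the same row. A minimal sketch of the loan and payroll examples above, with hypothetical field names:

```python
# Field names below are assumptions for illustration.
def check_loan(row):
    """If Loan Status = 'Funded', Loan Amount must be > 0 and Funding Date present."""
    if row["loan_status"] == "Funded":
        return row["loan_amount"] > 0 and row["funding_date"] is not None
    return True  # rule only constrains funded loans

def check_pay(row, tol=0.01):
    """Pay Amount must equal Hours Worked x Hourly Rate, within rounding tolerance."""
    return abs(row["pay_amount"] - row["hours_worked"] * row["hourly_rate"]) <= tol

good = {"loan_status": "Funded", "loan_amount": 5000, "funding_date": "2024/01/15"}
bad = {"loan_status": "Funded", "loan_amount": 0, "funding_date": None}
pay = {"pay_amount": 200.0, "hours_worked": 10, "hourly_rate": 20.0}
print(check_loan(good), check_loan(bad), check_pay(pay))  # True False True
```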
Data Validity Rules
These rules ensure that actual data values are complete, correct, accurate, precise, unique, and consistent. Validity rules ensure data is trustworthy for reporting, regulatory compliance, and AI/ML models.
| Data Validity Rules | Definition | Example |
| --- | --- | --- |
| Completeness | Required records, relationships, attributes, and values must exist. | No NULLs in mandatory fields like Customer ID or Order Date. |
| Correctness & Accuracy | Values must reflect real-world truth and business rules. | A customer’s credit limit must align with approved financial records. |
| Precision | Data must be stored with the required level of detail. | Interest rates stored to four decimal places if required for calculations. |
| Uniqueness | No duplicate records, keys, definitions, or overloaded columns. | A “Customer Type Code” column should not mix customer types and shipping methods. |
| Consistency | Duplicate or redundant data must match everywhere it appears. | Customer address stored in multiple systems must be identical. |
| Compliance | PII and other sensitive data must be identified and handled according to policy. | Detect and validate personal information such as credit card numbers, passport numbers, national IDs, and bank account numbers. |
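The consistency rule above means comparing the same attribute wherever it is duplicated. A minimal sketch, with invented system names and records:

```python
# Hypothetical copies of the same customer addresses held in two systems.
crm = {"C001": "12 Main St", "C002": "34 Oak Ave"}
billing = {"C001": "12 Main St", "C002": "34 Oak Avenue"}  # drifted copy

def consistency_violations(a, b):
    """Return keys whose duplicated values no longer match across the two stores."""
    return sorted(k for k in a.keys() & b.keys() if a[k] != b[k])

print(consistency_violations(crm, billing))  # ['C002']
```

In practice the comparison would normalize values first (case, whitespace, abbreviations) so that only genuine mismatches are flagged.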
System Rules
Microsoft Purview Data Quality provides both system (out-of-the-box) rules and custom rules, along with an AI-enabled data quality rule recommendation feature. Together, these capabilities help organizations effectively measure, monitor, and improve data quality by applying the right set of rules. System rules cover the majority of business attribute and data validity scenarios; they are shown in the screenshot below.
Figure 2: System rules

Custom Rules
Custom rules allow you to define validations that evaluate one or more values within a row, enabling complex, context-aware data quality checks tailored to specific business requirements. Custom rules support all four major categories of data quality rules: Business Entity rules, Business Attribute rules, Data Dependency rules, and Data Validity rules.
You can use regular expression language, Azure Data Factory expression language, or SQL expression language to create custom rules. A Purview Data Quality custom rule has three parts:
- Row expression: A Boolean expression applied to each row that passes the filter expression. If it returns true, the row passes; if it returns false, the row fails.
- Filter expression: An optional Boolean condition that narrows the dataset on which the row expression is evaluated. You activate it by selecting the Use filter expression checkbox. If the filter expression returns true for a row, that row is considered by the rule; if it returns false, the row is ignored. If you don't specify a filter expression, all rows are considered.
- Null expression: A Boolean expression that defines how NULL values are handled. If it returns true for a row, the row expression isn't applied to that row.
Each part of the rule works similarly to existing Microsoft Purview Data Quality conditions. A rule only passes if the row expression evaluates to TRUE for the dataset that matches the filter expression and handles missing values as specified in the null expression.
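The interaction of the three parts can be sketched as follows. This is a minimal Python model of the pass/fail semantics described above, not Purview's actual evaluation engine, and the sample rows and column names are invented:

```python
# Minimal sketch of how filter, null, and row expressions combine per row.
def evaluate_rule(rows, row_expr, filter_expr=None, null_expr=None):
    results = {}  # row index -> "pass" | "fail" | "skipped"
    for i, row in enumerate(rows):
        if filter_expr is not None and not filter_expr(row):
            results[i] = "skipped"   # filter returned false -> row ignored by the rule
        elif null_expr is not None and null_expr(row):
            results[i] = "skipped"   # null expression true -> row expression not applied
        else:
            results[i] = "pass" if row_expr(row) else "fail"
    return results

rows = [
    {"fare": 12.5, "distance": 3.1},
    {"fare": -4.0, "distance": 1.0},
    {"fare": None, "distance": 2.0},
]
out = evaluate_rule(
    rows,
    row_expr=lambda r: r["fare"] > 0,
    null_expr=lambda r: r["fare"] is None,
)
print(out)  # {0: 'pass', 1: 'fail', 2: 'skipped'}
```

Treating null-expression matches as "skipped" is one reasonable reading of "the row expression isn't applied"; how such rows count toward the quality score is a product-level detail not modeled here.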
Examples:
1. Ensure that the location of the salesperson is correct. Azure Data Factory expression language is used to author this rule.
2. Ensure "fare Amount" is positive and "trip Distance" is valid. SQL expression language is used to author this rule.
Figure 4: Ensuring fare SQL expression is correct

3. For each trip, check if the fare is above the average for its payment type. SQL expression language is used to author this rule.
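The per-payment-type average check in the third example can be sketched outside Purview as well. The following Python sketch mirrors what a SQL `AVG() OVER (PARTITION BY payment_type)` comparison would compute; the trip rows and column names are invented for illustration:

```python
from collections import defaultdict

# Hypothetical taxi-trip rows.
trips = [
    {"payment_type": "card", "fare": 10.0},
    {"payment_type": "card", "fare": 20.0},
    {"payment_type": "cash", "fare": 8.0},
    {"payment_type": "cash", "fare": 4.0},
]

# Average fare per payment type.
totals = defaultdict(lambda: [0.0, 0])
for t in trips:
    totals[t["payment_type"]][0] += t["fare"]
    totals[t["payment_type"]][1] += 1
avg = {k: s / n for k, (s, n) in totals.items()}

# Flag trips whose fare exceeds the average for their payment type.
above = [t for t in trips if t["fare"] > avg[t["payment_type"]]]
print(above)  # [{'payment_type': 'card', 'fare': 20.0}, {'payment_type': 'cash', 'fare': 8.0}]
```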
Figure 5: Ensure fare correctness

Together, the four categories of data quality rules described above:
- Prevent errors at the source
- Enforce business logic
- Improve trust in analytics and AI
- Reduce remediation costs downstream
In short, high-quality data is not accidental—it is enforced through well-defined data quality rules across entities, attributes, relationships, and values.
References
- Create Data Quality Rules in Unified Catalog | Microsoft Learn
- Expression builder in mapping data flows - Azure Data Factory & Azure Synapse | Microsoft Learn
- Expression Functions in the Mapping Data Flow - Azure Data Factory & Azure Synapse | Microsoft Learn
- http://www.informit.com/articles/article.aspx?p=399325&seqNum=3
- Information Quality Applied, Larry P. English