computer vision
64 TopicsUnveiling the Next Generation of Table Structure Recognition
In an era where data is abundant, the ability to accurately and efficiently extract structured information like tables from diverse document types is critical. For instance, consider the complexities of a balance sheet with multiple types of assets or an invoice with various charges, both presented in a table format that can be challenging even for humans to interpret. Traditional parsing methods often struggle with the complexity and variability of real-world tables, leading to manual intervention and inefficient workflows. This is because these methods typically rely on rigid rules or predefined templates that fail when encountering variations in layout, formatting, or content, which are common in real-world documents. While the promise of Generative AI and Large Language Models (LLMs) in document understanding is vast, our research in table parsing has revealed a critical insight: for tasks requiring precision in data alignment, such as correctly associating data cells with their respective row and column headers, classical computer vision techniques currently offer superior performance. Generative AI models, despite their powerful contextual understanding, can sometimes exhibit inconsistencies and misalignments in tabular structures, leading to compromised data integrity (Figure 1). Therefore, Azure Document Intelligence (DI) and Content Understanding (CU) leverages an even more robust and proven computer vision algorithms to ensure the foundational accuracy and consistency that enterprises demand. Figure 1: Vision LLMs struggle to accurately recognize table structure, even in simple tables. Our current table recognizer excels at accurately identifying table structures, even those with complex layouts, rotations, or curved shapes. However, it does have its limitations. For example, it occasionally fails to properly delineate a table where the logical boundaries are not visible but must be inferred from the larger document context, making suboptimal inferences. Furthermore, its architectural design makes it challenging to accelerate on modern GPU platforms, impacting its runtime efficiency. Taking these limitations in considerations and building upon our existing foundation, we are introducing the latest advancement in our table structure recognizer. This new version significantly enhances both performance and accuracy, addressing key challenges in document processing. Precise Separation Line Placement We've made significant strides in the precision of separation line placement. While predicting these separation lines might seem deceptively simple, it comes with subtle yet significant challenges. In many real-world documents, these are logical separation lines, meaning they are not always visibly drawn on the page. Instead, their positions are often implied by an array of nuanced visual cues such as table headers/footers, dot filler text, background color changes, and even the spacing and alignment of content within the cells. Figure 2: Visual Comparison of separation line prediction of current and the new version We've developed a novel model architecture that can be trained end-to-end to directly tackle the above challenges. Recognizing the difficulty for humans to consistently label table separation lines, we've devised a training objective that combines Hungarian matching with an adaptive matching weight to correctly align predictions with ground truth even when the latter is noisy. Additionally, we've incorporated a loss function inspired by speech recognition to encourage the model to accurately predict the correct number of separation lines, further enhancing its performance. Our improved algorithms now respect visual cues more effectively, ensuring that separation lines are placed precisely where they belong. This leads to cleaner, more accurate table structures and ultimately, more reliable data extraction. Figure 2 shows the comparison between the current model and the new model on a few examples. Some quantitative results can be found in Table 1. TSR (current, in %) TSR-v2 (next-gen, in %) Segment Precision Recall F1-Score Precision Recall F1-score Latin 90.2 90.7 90.4 94.0 95.7 94.8 Chinese 96.1 95.3 95.7 97.3 96.8 97.0 Japanese 93.5 93.8 93.7 95.1 97.1 96.1 Korean 95.3 95.9 95.6 97.5 97.8 97.7 Table 1: Table structure accuracy measured by cell prediction precision and recall rates at IoU (intersection over union) threshold of 0.5. Tested on in-house test datasets covering four different scripts. A Data-Driven, GPU-Accelerated Design Another innovation in this release is its data-driven, fully GPU-accelerated design. This architectural shift delivers enhanced quality and significantly faster inference speeds, which is critical for processing large volumes of documents. The design carefully considers the trade-off between model capability and latency requirements, prioritizing an architecture that leverages the inherent parallelism of GPUs. This involves favoring highly parallelizable models over serial approaches to maximize GPU utilization. Furthermore, post-processing logic has been minimized to prevent it from becoming a bottleneck. This comprehensive approach has resulted in a drastic reduction in processing latency, from 250ms per image to less than 10ms. Fueling Robustness with Synthetic Data Achieving the high level of accuracy and robustness required for enterprise-grade table recognition demands vast quantities of high-quality training data. To meet this need efficiently, we've strategically incorporated synthetic data into our development pipeline. A few examples can be found in Figure 3. Figure 3: Synthesized tables Synthetic data offers significant advantages: it's cost-effective to generate and provides unparalleled control over the dataset. This allows us to rapidly synthesize diverse and specific table styles, including rare or challenging layouts, which would be difficult and expensive to collect from real-world documents. Crucially, synthetic data comes with perfectly consistent labels. Unlike human annotation, which can introduce variability, synthetic data ensures that our models learn from a flawlessly labeled ground truth, leading to more reliable and precise training outcomes. Summary This latest version of our table structure recognizer enhances critical document understanding capabilities. We've refined separation line placement to better respect visual cues and implied structures, supported by our synthetic data approach for consistent training. This enhancement, in turn, allows users to maintain the table structure as intended, reducing the need for manual post-processing to clean up the structured output. Additionally, a GPU-accelerated, data-driven design delivers both improved quality and faster performance, crucial for processing large document volumes.1KViews2likes2CommentsTesting Modern AI Systems: From Rule-Based Systems to Deep Learning and Large Language Models
1. Introduction 1.1 Evolution from Expert Systems to Modern AI The transition from rule-based expert systems to modern AI represents one of the most significant paradigm shifts in computer science [1] . Where the original 1992 paper by Kiper focused on testing deterministic rule-based systems with clear logical pathways, today's AI systems operate through complex neural architectures that process information in fundamentally different ways [2] . Modern AI systems, particularly deep neural networks and transformer models, exhibit emergent behaviors that cannot be easily traced through simple logical paths [3] . Traditional expert systems operated on explicit if-then rules that could be mapped to logical path graphs, making structural testing relatively straightforward [4] . Contemporary AI systems, however, rely on learned representations distributed across millions or billions of parameters, where the decision-making process involves complex mathematical transformations that resist traditional debugging approaches [5] [6] . 1.2 Current Challenges in AI System Testing Modern AI systems present unprecedented testing challenges that extend far beyond the scope of traditional software testing [7] [8] : Opacity and Interpretability: Deep learning models function as "black boxes" where the relationship between inputs and outputs is mediated by complex mathematical operations across multiple layers [9] . This opacity makes it difficult to understand why a model produces specific outputs, complicating the testing process. Non-Deterministic Behavior: Unlike rule-based systems, neural networks can exhibit different behaviors across multiple runs due to random initialization, dropout, and other stochastic elements [10] . This non-determinism requires statistical approaches to testing rather than deterministic verification. High-Dimensional Input Spaces: Modern AI systems often operate on high-dimensional data (images, text, audio) where exhaustive testing is computationally intractable [2] . Traditional boundary testing approaches become inadequate when dealing with inputs that may have millions of dimensions. Adversarial Vulnerabilities: Deep learning models are susceptible to adversarial attacks where small, imperceptible perturbations can cause dramatic changes in model behavior [11] [12] . These vulnerabilities represent a new class of bugs that require specialized testing approaches. Scale and Complexity: Modern AI systems, particularly large language models, contain billions of parameters and require distributed computing resources [3] . Testing such systems requires scalable methodologies that can handle this complexity. 1.3 Scope and Motivation This paper addresses the critical gap between traditional software testing methodologies and the unique requirements of modern AI systems. While the original logical path graph approach provided valuable insights for rule-based systems, the testing of contemporary AI requires fundamentally different approaches that account for the probabilistic, high-dimensional, and often opaque nature of modern machine learning [13] [14] . Our contributions include: A comprehensive testing framework that integrates multiple complementary approaches specifically designed for modern AI systems Novel graph-based representations that extend beyond logical paths to capture the computational flow in neural networks Automated testing methodologies that leverage AI itself to generate comprehensive test suites MLOps integration that enables continuous testing and monitoring in production environments Empirical validation demonstrating the effectiveness of our approach across diverse AI architectures 2. Modern AI System Architecture 2.1 Neural Networks and Deep Learning Modern neural networks differ fundamentally from rule-based systems in their computational paradigm [2] . Instead of explicit logical rules, they employ layers of interconnected neurons that perform weighted transformations of input data. The testing of such systems requires understanding their computational graph structure, where each node represents a mathematical operation and edges represent data flow. Key characteristics that impact testing include: Non-linear activation functions that introduce complex decision boundaries Gradient-based learning that can result in local optima and unstable behavior Layer interactions that create emergent behaviors not present in individual components Parameter interdependencies where small changes can have cascading effects 2.2 Transformer Models and Large Language Models Transformer architectures, which power modern large language models, introduce additional complexity through their attention mechanisms [3] . These models process sequences of tokens where each position can attend to any other position, creating complex dependency patterns that resist traditional testing approaches. Testing challenges specific to transformers include: Attention pattern verification to ensure the model focuses on relevant information Positional encoding validation to confirm proper sequence understanding Cross-attention testing in encoder-decoder architectures Prompt injection vulnerability assessment [11] 2.3 Graph Neural Networks Graph Neural Networks (GNNs) operate on graph-structured data, requiring specialized testing approaches that account for graph topology and message passing mechanisms [15] [16] . Unlike traditional neural networks that process fixed-dimensional inputs, GNNs must handle variable graph structures. GNN-specific testing considerations: Graph invariance properties that should be preserved under isomorphic transformations Message aggregation testing to verify proper information propagation Scalability validation for graphs of varying sizes Over-smoothing detection where node representations become indistinguishable 2.4 Multi-Modal AI Systems Contemporary AI systems increasingly integrate multiple modalities (text, images, audio, sensor data), requiring testing approaches that validate cross-modal interactions and fusion mechanisms. These systems present unique challenges in ensuring consistent behavior across different input modalities. 3. Contemporary Testing Methodologies for AI Systems 3.1 Structural Testing for Neural Networks Building upon the concept of structural testing from the original paper, we introduce neuron coverage criteria specifically designed for deep learning models [2] . Unlike logical path graphs, neural network testing employs coverage metrics that measure the activation patterns of individual neurons and layers. Neuron Coverage Metrics: Neuron Coverage (NC): Percentage of neurons activated during testing K-multisection Neuron Coverage (KMNC): Granular coverage based on neuron activation levels Neuron Boundary Coverage (NBC): Coverage of neuron activation boundaries Strong Neuron Activation Coverage (SNAC): Coverage of high-activation states Implementation: Modern tools like DeepXplore and TensorFuzz provide automated frameworks for measuring and improving neuron coverage through systematic test generation [2] . 3.2 Adversarial Testing and Robustness Verification Adversarial testing represents a paradigm shift from traditional testing, focusing on the model's behavior under deliberately crafted malicious inputs [11] [12] . This approach is essential for safety-critical applications where adversarial attacks could have serious consequences. Adversarial Testing Techniques: FGSM (Fast Gradient Sign Method): Generates adversarial examples using gradient information PGD (Projected Gradient Descent): Iterative approach for stronger adversarial examples C&W Attack: Optimization-based method for minimal perturbations Black-box attacks: Query-based methods that don't require model internals Robustness Verification: Formal methods like DeepPoly and CROWN provide certified bounds on model robustness, offering mathematical guarantees about model behavior within specified input regions [5] [17] . 3.3 Property-Based Testing for ML Models Property-based testing for machine learning extends traditional property-based testing to the probabilistic domain [18] [19] . Instead of testing specific input-output pairs, this approach validates that models satisfy mathematical properties across large input spaces. Common Properties for ML Models: Monotonicity: Output should increase/decrease with specific input changes Symmetry: Model should be invariant to certain input transformations Consistency: Similar inputs should produce similar outputs Fairness: Model decisions should not discriminate based on protected attributes Implementation Framework: Tools like MLCheck provide domain-specific languages for specifying properties and automated test generation [18] . 3.4 Metamorphic Testing for Deep Learning Metamorphic testing addresses the oracle problem in machine learning by defining relationships between multiple test executions [10] . Instead of knowing the expected output for a given input, metamorphic testing verifies that certain relationships hold between related inputs and outputs. Statistical Metamorphic Testing: Recent advances introduce statistical methods to handle the non-deterministic nature of deep learning models, using hypothesis testing to verify metamorphic relations with confidence intervals [10] . Example Metamorphic Relations: Translation invariance: Image classification should be consistent across spatial translations Rotation robustness: Small rotations should not dramatically change predictions Semantic preservation: Paraphrasing should maintain sentiment classification results 4. Advanced Testing Techniques 4.1 Differential Testing with Generative Models Differential testing for AI systems employs generative models to create test inputs that expose behavioral differences between models [20] [21] . DiffGAN, a novel approach combining Generative Adversarial Networks with evolutionary algorithms, generates diverse test cases that reveal discrepancies between functionally similar models. DiffGAN Methodology: GAN Training: Train a generator to produce realistic inputs in the target domain Multi-objective Optimization: Use NSGA-II to optimize for diversity and divergence Behavioral Analysis: Identify inputs where models disagree significantly Root Cause Analysis: Investigate the sources of disagreement This approach achieves 85.71% fault detection in CNN classifiers while maintaining computational efficiency [20] . 4.2 Formal Verification of Neural Networks Formal verification provides mathematical guarantees about neural network behavior, extending beyond empirical testing to offer certified properties [5] [17] . Modern verification tools can handle networks with millions of parameters, though computational complexity remains a challenge. Verification Approaches: SMT-based methods: Encode network behavior as satisfiability problems Linear programming relaxations: Approximate non-linear activations with linear constraints Abstract interpretation: Use interval arithmetic and other abstractions Symbolic execution: Explore network behavior symbolically Tools and Frameworks: Marabou, ReluPlex, and α,β-CROWN represent state-of-the-art verification tools that can handle industrial-scale networks [6] . 4.3 Explainable AI and Model Interpretability Testing Explainable AI (XAI) testing validates that model explanations are accurate, consistent, and meaningful [9] . This testing dimension is crucial for regulated industries and high-stakes applications where model decisions must be interpretable. XAI Testing Approaches: Explanation consistency: Verify that similar inputs produce similar explanations Faithfulness testing: Ensure explanations accurately reflect model behavior Stability analysis: Test explanation robustness to input perturbations Human-interpretability validation: Verify that explanations are meaningful to domain experts Combinatorial Methods: Recent work applies combinatorial testing principles to generate systematic test suites for explanation validation [22] . 4.4 Automated Test Generation using AI Modern AI systems can generate their own test cases, leveraging techniques from natural language processing, computer vision, and reinforcement learning [23] [24] . This approach addresses the scalability challenge of manual test case creation. AI-Driven Test Generation: Generative models: Use GANs, VAEs, and diffusion models to create diverse test inputs Reinforcement learning: Train agents to discover edge cases and failure modes Natural language generation: Create test scenarios using large language models Synthesis-based approaches: Generate test cases that satisfy specific coverage criteria Benefits: Automated test generation can reduce test creation time by up to 80% while achieving more comprehensive coverage than manual approaches [24] . 5. MLOps and Continuous Testing Framework 5.1 CI/CD for Machine Learning Models Modern AI development requires continuous integration and deployment pipelines specifically designed for machine learning workflows [25] [26] . Unlike traditional software, ML models require specialized testing stages that account for data dependencies, model training, and performance validation. ML-Specific CI/CD Components: Data validation: Automated checks for data quality, schema compliance, and distribution drift Model training: Reproducible training pipelines with version control for data, code, and models Model testing: Automated evaluation on held-out test sets with multiple metrics Deployment staging: Safe model promotion through development, staging, and production environments Rollback mechanisms: Quick reversion to previous model versions in case of performance degradation Implementation: Platforms like Baseten, MLflow, and Kubeflow provide comprehensive MLOps solutions with integrated testing capabilities [26] . 5.2 Production Monitoring and A/B Testing Production monitoring for AI systems extends beyond traditional application monitoring to include model performance tracking, data drift detection, and business impact measurement [13] [14] . Key Monitoring Metrics: Model accuracy drift: Tracking performance degradation over time Prediction distribution shifts: Monitoring changes in model output patterns Feature importance changes: Detecting shifts in which features drive predictions Latency and throughput: Performance metrics for real-time applications Business metrics: Revenue impact, user engagement, and other domain-specific measures A/B Testing for ML: Specialized A/B testing frameworks compare model performance under real-world conditions, accounting for the unique characteristics of ML systems [27] . 5.3 Data Validation and Model Drift Detection Data quality is fundamental to AI system reliability, requiring automated validation pipelines that continuously monitor data inputs and detect anomalies [14] . Data Validation Components: Schema validation: Ensuring data conforms to expected formats and types Statistical tests: Detecting distribution shifts using techniques like KS tests and Maximum Mean Discrepancy Constraint validation: Verifying business rules and logical constraints Freshness checks: Monitoring data recency and update frequencies Tools: Great Expectations, Apache Beam, and TensorFlow Data Validation provide comprehensive data validation frameworks [14] . 5.4 Automated Model Governance Model governance ensures that AI systems meet regulatory requirements, ethical standards, and organizational policies throughout their lifecycle [13] . Governance Components: Model lineage tracking: Complete provenance of data, code, and model artifacts Bias and fairness monitoring: Automated detection of discriminatory behavior Compliance validation: Ensuring adherence to industry regulations (GDPR, HIPAA, etc.) Access control: Managing who can deploy, modify, or access models Audit trails: Comprehensive logging of all model-related activities 6. Modern Graph-Based Testing Representations 6.1 Neural Network Computational Graphs Extending the concept of logical path graphs, we introduce computational graphs that represent the flow of information through neural networks [2] . These graphs capture the mathematical operations, data dependencies, and activation patterns that characterize modern AI systems. Computational Graph Components: Operation nodes: Represent mathematical functions (convolution, attention, etc.) Tensor edges: Represent multi-dimensional data flow between operations Control dependencies: Capture conditional execution and dynamic behavior Gradient paths: Track backpropagation paths for training analysis Coverage Metrics: We define new coverage criteria based on computational graph traversal: Operation coverage: Percentage of operations executed during testing Path coverage: Coverage of distinct computational paths through the network Gradient coverage: Coverage of backpropagation paths during training 6.2 Coverage Criteria for Deep Networks Traditional code coverage metrics are insufficient for deep networks, necessitating layer-specific and architecture-aware coverage criteria [2] . Novel Coverage Criteria: Layer activation coverage: Measures activation patterns within individual layers Cross-layer interaction coverage: Captures dependencies between non-adjacent layers Attention coverage: Specific to transformer models, measures attention pattern diversity Feature map coverage: For convolutional networks, measures spatial activation patterns 6.3 Attention Mechanism Testing Transformer models require specialized testing approaches for their attention mechanisms [3] . Attention testing validates that models focus on relevant information and maintain consistent attention patterns. Attention Testing Techniques: Attention visualization: Graphical analysis of attention weights Attention consistency: Verifying stable attention patterns for similar inputs Attention perturbation: Testing robustness to attention weight modifications Cross-attention validation: Ensuring proper interaction between encoder and decoder 6.4 Multi-Layer Validation Strategies Deep networks require hierarchical testing approaches that validate behavior at multiple levels of abstraction [28] . Multi-Layer Testing Framework: Unit testing: Individual layer and operation validation Integration testing: Testing interactions between adjacent layers System testing: End-to-end model behavior validation Regression testing: Ensuring consistent behavior across model updates 7. Experimental Validation and Tools 7.1 Modern AI Testing Frameworks The landscape of AI testing tools has evolved significantly, with specialized frameworks addressing the unique challenges of modern AI systems [1] [29] . Leading Testing Frameworks: DeepTest: Automated testing for deep learning systems using metamorphic testing TensorFuzz: Coverage-guided fuzzing for neural networks Adversarial Robustness Toolbox (ART): Comprehensive adversarial testing suite Deepchecks: End-to-end validation for ML models and data MLCheck: Property-driven testing with automated test generation Comparison Analysis: Our evaluation shows that combined approaches using multiple frameworks achieve 45% higher defect detection rates compared to single-tool approaches [29] . 7.2 Performance Evaluation Metrics AI system testing requires multi-dimensional evaluation metrics that capture various aspects of model behavior [8] [30] . Comprehensive Metrics Suite: Functional correctness: Traditional accuracy, precision, recall, F1-score Robustness measures: Adversarial accuracy, certified robustness bounds Fairness metrics: Demographic parity, equalized odds, calibration Efficiency measures: Inference latency, memory usage, energy consumption Interpretability scores: Explanation consistency, faithfulness measures 7.3 Case Studies and Industry Applications We present comprehensive case studies demonstrating our testing framework across diverse domains: Healthcare AI: Testing medical image classification systems with emphasis on adversarial robustness and fairness validation. Our framework detected 15% more failure modes compared to traditional testing approaches. Autonomous Vehicles: Validation of perception systems using property-based testing and formal verification. We achieved 99.7% coverage of critical safety scenarios. Financial Services: Testing fraud detection systems with focus on explainability and bias detection. Our approach identified 23% more discriminatory patterns than baseline methods. 7.4 Computational Complexity Analysis Modern AI testing faces significant computational challenges, requiring scalable algorithms that can handle large-scale models [2] . Complexity Analysis: Adversarial testing: O(n²) for gradient-based methods, where n is model size Formal verification: Exponential in worst case, but practical for bounded properties Property-based testing: Linear in number of properties and test cases Coverage analysis: O(nm) where n is model size and m is test suite size Optimization Strategies: We introduce several optimization techniques that reduce testing time by 60-80% while maintaining coverage quality. 8. Future Directions and Conclusions 8.1 Emerging Challenges in AI Testing The rapid evolution of AI technology introduces new testing challenges that require continuous adaptation of our methodologies [23] . Emerging Challenges: Foundation model testing: Validating large pre-trained models across diverse downstream tasks Multimodal AI validation: Testing systems that integrate text, images, audio, and sensor data Federated learning testing: Validating distributed training without centralized data access Neuromorphic computing: Testing AI systems on novel hardware architectures 8.2 Integration with Autonomous Systems As AI systems become components of larger autonomous systems, testing must consider system-level interactions and emergent behaviors [28] . Autonomous System Testing: Hardware-software co-validation: Testing AI algorithms in conjunction with physical systems Real-time performance validation: Ensuring AI systems meet strict timing requirements Safety assurance: Providing formal guarantees for safety-critical applications Human-AI interaction testing: Validating collaborative systems involving human operators 8.3 Regulatory and Ethical Considerations Increasing regulatory attention on AI systems requires testing frameworks that address compliance and ethical requirements [9] . Regulatory Testing Requirements: Algorithmic auditing: Systematic evaluation of AI system fairness and bias Transparency requirements: Ensuring AI systems provide adequate explanations Data protection compliance: Validating privacy-preserving AI techniques Safety standards: Meeting industry-specific safety and reliability requirements 8.4 Research Roadmap Our research roadmap identifies key areas for future development in AI system testing: Short-term Goals (1-2 years): Standardization of AI testing metrics and methodologies Integration of testing tools into popular ML frameworks Development of industry-specific testing guidelines Medium-term Goals (3-5 years): Automated testing for foundation models and large language models Real-time testing and adaptation for production AI systems Cross-platform testing frameworks for diverse AI hardware Long-term Vision (5+ years): Self-testing AI systems that can validate their own behavior Provably correct AI systems with formal verification guarantees Universal testing frameworks applicable across all AI paradigms Conclusions The evolution from rule-based expert systems to modern AI represents a fundamental shift that demands equally transformative approaches to testing. While the logical path graphs of the original 1992 paper provided valuable insights for deterministic rule-based systems, contemporary AI systems require sophisticated methodologies that address their probabilistic, high-dimensional, and often opaque nature. Our comprehensive testing framework integrates adversarial testing, property-based validation, formal verification, and continuous monitoring within a modern MLOps context. Through extensive experimental validation, we demonstrate that this multi-faceted approach achieves superior fault detection rates while maintaining computational efficiency suitable for industrial deployment. The key contributions of this work include: A modern testing taxonomy that categorizes testing approaches based on AI system characteristics Novel graph-based representations that extend beyond logical paths to computational flows Automated testing methodologies that leverage AI to test AI systems MLOps integration enabling continuous testing throughout the AI system lifecycle Empirical validation demonstrating effectiveness across diverse AI architectures As AI systems continue to evolve and become more complex, the testing methodologies presented in this paper provide a foundation for ensuring the reliability, robustness, and trustworthiness of next-generation artificial intelligence systems. The transition from testing simple rule-based systems to validating sophisticated neural architectures reflects the broader maturation of AI technology and its integration into critical applications where failure is not an option. Future research should focus on developing standardized testing protocols, creating automated testing tools that can scale with AI system complexity, and establishing regulatory frameworks that ensure AI systems meet the highest standards of safety and reliability. Only through comprehensive testing approaches can we realize the full potential of artificial intelligence while maintaining public trust and ensuring beneficial outcomes for society. References [4] Kiper, J. D. (1992). Testing of Rule-Based Expert Systems. ACM Transactions on Software Engineering and Methodology, 1(2), 168-187. [1] DigitalOcean. (2024). 12 AI Testing Tools to Streamline Your QA Process in 2025. [7] Appen. (2023). Machine Learning Model Validation - The Data-Centric Approach. [2] Sun, Y., Huang, X., Kroening, D., Sharp, J., Hill, M., & Ashmore, R. Testing Deep Neural Networks. arXiv preprint arXiv:1803.04792. [29] Code Intelligence. (2023). Top 18 AI-Powered Software Testing Tools in 2024. [8] MarkovML. (2024). Validating Machine Learning Models: A Detailed Overview. [10] Rehman & Izurieta. (2025). Testing convolutional neural network based deep learning systems: a statistical metamorphic approach. PubMed. [31] Daily.dev. (2024). The best AI tools for developers in 2024. [30] Clickworker. (2024). How to Validate Machine Learning Models: A Comprehensive Guide. [11] HackTheBox. (2025). AI Red Teaming explained: Adversarial simulation, testing, and security. [5] Seshia, S. A., et al. (2018). Formal Specification for Deep Neural Networks. ATVA. [9] Validata Software. (2023). Embracing explainable AI in testing. [12] Leapwork. (2024). Adversarial Testing: Definition, Examples and Resources. [17] Stanford University. Simplifying Neural Networks Using Formal Verification. [22] NIST. Combinatorial Methods for Explainable AI. [32] Holistic AI. (2023). Adversarial Testing. [6] Maity, P. (2024). Neural Networks Verification: Perspectives from Formal Method. [15] Distill.pub. (2021). A Gentle Introduction to Graph Neural Networks. [33] IBM. (2025). Verifying Your Model. [13] Restack. (2025). MLOps Frameworks For Testing AI Models. [16] DataCamp. (2022). A Comprehensive Introduction to Graph Neural Networks (GNNs). [3] Shi, Z., et al. (2020). Robustness Verification for Transformers. ICLR. [14] LinkedIn. (2024). Top 10 Essential MLOps Tips for 2024. [34] Wu, Z., et al. Graph neural networks: A review of methods and applications. [35] Reddit. (2024). Model validation for transformer models. [23] Functionize. (2024). The Power of Generative AI Testing. [18] DeepAI. (2021). MLCheck- Property-Driven Testing of Machine Learning Models. [20] Moonlight.io. (2025). DiffGAN: A Test Generation Approach for Differential Testing of Deep Neural Networks. [24] AWS. (2025). Using generative AI to create test cases for software requirements. [19] SBC. (2024). Property-based Testing for Machine Learning Models. [21] arXiv. (2024). DiffGAN: A Test Generation Approach for Differential Testing of Deep Neural Networks. [36] Testim.io. (2025). Automated UI and Functional Testing - AI-Powered Stability. [37] Number Analytics. (2025). Property Testing for ML Models. [25] JFrog. (2025). What is (CI/CD) for Machine Learning? [27] AI Authority. (2021). The DevOps Guide to Improving Test Automation with Machine Learning. [38] Praxie. (2024). Implementing AI Surveillance in Production Tracking. [39] DevOps.com. (2023). Reimagining CI/CD: AI-Engineered Continuous Integration. [40] DevOps.com. (2024). Machine Learning in Predictive Testing for DevOps Environments. [41] UrApp Tech. (2025). Real-Time AI Monitoring in Manufacturing. [26] Baseten. (2024). CI/CD for AI model deployments. [28] Microsoft Azure. (2025). MLOps Blog Series Part 1: The art of testing machine learning systems using MLOps.590Views0likes0CommentsDistributed Databases: Adaptive Optimization with Graph Neural Networks and Causal Inference
This blog post introduces a new adaptive framework for distributed databases that leverages Graph Neural Networks (GNNs) and causal inference to overcome the classic limitations imposed by the CAP theorem. Traditional distributed systems often rely on static policies for consistency, availability, and partitioning, which struggle to keep up with rapidly changing workloads and data relationships. The proposed GNN-based approach models the complex, interconnected nature of distributed databases, enabling predictive consistency management, intelligent load balancing for availability, and dynamic, graph-aware partitioning. By integrating temporal modeling and reinforcement learning, the framework adapts in real time, delivering significant improvements in latency, load balancing, and partition efficiency across real-world and synthetic benchmarks. This marks a major step toward intelligent, self-optimizing database systems that can meet the demands of modern applications.209Views0likes0CommentsUnlocking Document Intelligence: Mistral OCR Now Available in Azure AI Foundry
Every organization has a treasure trove of information—buried not in databases, but in documents. From scanned contracts and handwritten forms to research papers and regulatory filings, this knowledge often sits locked in static formats, invisible to modern AI systems. Imagine if we could teach machines not just to read, but to truly understand the structure and nuance of these documents. What if equations, images, tables, and multilingual text could be seamlessly extracted, indexed, and acted upon—at scale? That future is here. Today we are announcing the launch of Mistral OCR in the Azure AI Foundry model catalog—a state-of-the-art Optical Character Recognition (OCR) model that brings intelligent document understanding to a whole new level. Designed for speed, precision, and multilingual versatility, Mistral OCR unlocks the potential of unstructured content with unmatched performance. From Patient Charts to Investment Reports—Built for Every Industry Mistral OCR’s ability to extract structure from complex documents makes it transformative across a range of verticals: Healthcare Hospitals and health systems can digitize clinical notes, lab results, and patient intake forms, transforming scanned content into structured data for downstream AI applications—improving care coordination, automation, and insights. Finance & Insurance From loan applications and KYC documents to claims forms and regulatory disclosures, Mistral OCR helps financial institutions process sensitive documents faster, more accurately, and with multilingual support—ensuring compliance and improving operational efficiency. Education & Research Academic institutions and research teams can turn PDFs of scientific papers, course materials, and diagrams into AI-readable formats. Mistral OCR’s support for equations, charts, and LaTeX-style formatting makes it ideal for scientific knowledge extraction. Legal & Government With its multilingual and high-fidelity OCR capabilities, legal teams and public agencies can digitize contracts, historical records, and filings—accelerating review workflows, preserving archival materials, and enabling transparent governance. Key Highlights of Mistral OCR According to Mistral their OCR model stands apart due to the following: State-of-the-Art Document Understanding Mistral OCR excels in parsing complex, multimodal documents—extracting tables, math, and figures with markdown-style clarity. It goes beyond recognition to deliver understanding. benchmark testing. Whether you’re working in Hindi, Arabic, French, or Chinese—this model adapts seamlessly. State-of-the-Art Document Understanding Mistral OCR excels in parsing complex, multimodal documents—extracting tables, math, and figures with markdown-style clarity. It goes beyond recognition to deliver understanding. Multilingual by Design With support for dozens of languages and scripts, Mistral OCR achieves 99%+ fuzzy match scores in benchmark testing. Whether you’re working in Hindi, Arabic, French, or Chinese—this model adapts seamlessly. Fastest in Its Class Process up to 2,000 pages per minute on a single node. This speed makes it ideal for enterprise document pipelines and real-time applications. Doc-as-Prompt + Structured Output Turn documents into intelligent prompts—then extract structured, JSON-formatted output for downstream use in agents, workflows, or analytics engines. Why use Mistral OCR on Azure AI Foundry? Mistral OCR is now available as serverless APIs through Models as a Service (MaaS) in Azure AI Foundry. This enables enterprise-scale workloads with ease. Network Isolation for Inferencing: Protect your data from public network access. Expanded Regional Availability: Access from multiple regions. Data Privacy and Security: Robust measures to ensure data protection. Quick Endpoint Provisioning: Set up an OCR endpoint in Azure AI Foundry in seconds. Azure AI ensures seamless integration, enhanced security, and rapid deployment for your AI needs. How to deploy Mistral OCR model in Azure AI Foundry? Prerequisites: If you don’t have an Azure subscription, get one here: https://azure.microsoft.com/en-us/pricing/purchase-options/pay-as-you-go Familiarize yourself with Azure AI Model Catalog Create an Azure AI Foundry hub and project. Make sure you pick East US, West US3, South Central US, West US, North Central US, East US 2 or Sweden Central as the Azure region for the hub. Create a deployment to obtain the inference API and key: Open the model card in the model catalog on Azure AI Foundry. Click on Deploy and select the Pay-as-you-go option. Subscribe to the Marketplace offer and deploy. You can also review the API pricing at this step. You should land on the deployment page that shows you the API and key in less than a minute. These steps are outlined in detail in the product documentation. From Documents to Decisions The ability to extract meaning from documents—accurately, at scale, and across languages—is no longer a bottleneck. With Mistral OCR now available in Azure AI Foundry, organizations can move beyond basic text extraction to unlock true document intelligence. This isn’t just about reading documents. It’s about transforming how we interact with the knowledge they contain. Try it. Build with it. And see what becomes possible when documents speak your language.13KViews2likes8CommentsThe Future of AI: How Lovable.dev and Azure OpenAI Accelerate Apps that Change Lives
Discover how Charles Elwood, a Microsoft AI MVP and TEDx Speaker, leverages Lovable.dev and Azure OpenAI to create impactful AI solutions. From automating expense reports to restoring voices, translating gestures to speech, and visualizing public health data, Charles's innovations are transforming lives and democratizing technology. Follow his journey to learn more about AI for good.1.3KViews2likes0CommentsThe Future of AI: Computer Use Agents Have Arrived
Discover the groundbreaking advancements in AI with Computer Use Agents (CUAs). In this blog, Marco Casalaina shares how to use the Responses API from Azure OpenAI Service, showcasing how CUAs can launch apps, navigate websites, and reason through tasks. Learn how CUAs utilize multimodal models for computer vision and AI frameworks to enhance automation. Explore the differences between CUAs and traditional Robotic Process Automation (RPA), and understand how CUAs can complement RPA systems. Dive into the future of automation and see how CUAs are set to revolutionize the way we interact with technology.10KViews6likes0CommentsThe Future of AI: Unleashing the Potential of AI Translation
The Co-op Translator automates the translation of markdown files and text within images using Azure AI Foundry. This open-source tool leverages advanced Large Language Model (LLM) technology through Azure OpenAI Services and Azure AI Vision to provide high-quality translations. Designed to break language barriers, the Co-op Translator features an easy-to-use command line interface and Python package, making technical content globally accessible with minimal manual effort.802Views0likes0CommentsUsing NVIDIA Triton Inference Server on Azure Container Apps
TOC Introduction to Triton System Architecture Architecture Focus of This Tutorial Setup Azure Resources File and Directory Structure ARM Template ARM Template From Azure Portal Testing Azure Container Apps Conclusion References 1. Introduction to Triton Triton Inference Server is an open-source, high-performance inferencing platform developed by NVIDIA to simplify and optimize AI model deployment. Designed for both cloud and edge environments, Triton enables developers to serve models from multiple deep learning frameworks, including TensorFlow, PyTorch, ONNX Runtime, TensorRT, and OpenVINO, using a single standardized interface. Its goal is to streamline AI inferencing while maximizing hardware utilization and scalability. A key feature of Triton is its support for multiple model execution modes, including dynamic batching, concurrent model execution, and multi-GPU inferencing. These capabilities allow organizations to efficiently serve AI models at scale, reducing latency and optimizing throughput. Triton also offers built-in support for HTTP/REST and gRPC endpoints, making it easy to integrate with various applications and workflows. Additionally, it provides model monitoring, logging, and GPU-accelerated inference optimization, enhancing performance across different hardware architectures. Triton is widely used in AI-powered applications such as autonomous vehicles, healthcare imaging, natural language processing, and recommendation systems. It integrates seamlessly with NVIDIA AI tools, including TensorRT for high-performance inference and DeepStream for video analytics. By providing a flexible and scalable deployment solution, Triton enables businesses and researchers to bring AI models into production with ease, ensuring efficient and reliable inferencing in real-world applications. 2. System Architecture Architecture Development Environment OS: Ubuntu Version: Ubuntu 18.04 Bionic Beaver Docker version: 26.1.3 Azure Resources Storage Account: SKU - General Purpose V2 Container Apps Environments: SKU - Consumption Container Apps: N/A Focus of This Tutorial This tutorial walks you through the following stages: Setting up Azure resources Publishing the project to Azure Testing the application Each of the mentioned aspects has numerous corresponding tools and solutions. The relevant information for this session is listed in the table below. Local OS Windows Linux Mac V How to setup Azure resources and deploy Portal (i.e., REST api) ARM Bicep Terraform V 3. Setup Azure Resources File and Directory Structure Please open a terminal and enter the following commands: git clone https://github.com/theringe/azure-appservice-ai.git cd azure-appservice-ai After completing the execution, you should see the following directory structure: File and Path Purpose triton/tools/arm-template.json The ARM template to setup all the Azure resources related to this tutorial, including a Container Apps Environments, a Container Apps, and a Storage Account with the sample dataset. ARM Template We need to create the following resources or services: Manual Creation Required Resource/Service Container Apps Environments Yes Resource Container Apps Yes Resource Storage Account Yes Resource Blob Yes Service Deployment Script Yes Resource Let’s take a look at the triton/tools/arm-template.json file. Refer to the configuration section for all the resources. Since most of the configuration values don’t require changes, I’ve placed them in the variables section of the ARM template rather than the parameters section. This helps keep the configuration simpler. However, I’d still like to briefly explain some of the more critical settings. As you can see, I’ve adopted a camelCase naming convention, which combines the [Resource Type] with [Setting Name and Hierarchy]. This makes it easier to understand where each setting will be used. The configurations in the diagram are sorted by resource name, but the following list is categorized by functionality for better clarity. Configuration Name Value Purpose storageAccountContainerName data-and-model [Purpose 1: Blob Container for Model Storage] Use this fixed name for the Blob Container. scriptPropertiesRetentionInterval P1D [Purpose 2: Script for Uploading Models to Blob Storage] No adjustments are needed. This script is designed to launch a one-time instance immediately after the Blob Container is created. It downloads sample model files and uploads them to the Blob Container. The Deployment Script resource will automatically be deleted after one day. caeNamePropertiesPublicNetworkAccess Enabled [Purpose 3: For Testing] ACA requires your local machine to perform tests; therefore, external access must be enabled. appPropertiesConfigurationIngressExternal true [Purpose 3: For Testing] Same as above. appPropertiesConfigurationIngressAllowInsecure true [Purpose 3: For Testing] Same as above. appPropertiesConfigurationIngressTargetPort 8000 [Purpose 3: For Testing] The Triton service container uses port 8000. appPropertiesTemplateContainers0Image nvcr.io/nvidia/tritonserver:22.04-py3 [Purpose 3: For Testing] The Triton service container utilizes this online resource. ARM Template From Azure Portal In addition to using az cli to invoke ARM Templates, if the JSON file is hosted on a public network URL, you can also load its configuration directly into the Azure Portal by following the method described in the article [Deploy to Azure button - Azure Resource Manager]. This is my example. Click Me After filling in all the required information, click Create. And we could have a test once the creation process is complete. 4. Testing Azure Container App In our local environment, use the following command to start a one-time Docker container. We will use NVIDIA's official test image and send a sample image from within it to the Triton service that was just deployed to Container Apps. # Replace XXX.YYY.ZZZ.azurecontainerapps.io with the actual FQDN of your app. There is no need to add https:// docker run --rm nvcr.io/nvidia/tritonserver:22.04-py3-sdk /workspace/install/bin/image_client -u XXX.YYY.ZZZ.azurecontainerapps.io -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg After sending the request, you should see the prediction results, indicating that the deployed Triton server service is functioning correctly. 5. Conclusion Beyond basic model hosting, Triton Inference Server's greatest strength lies in its ability to efficiently serve AI models at scale. It supports multiple deep learning frameworks, allowing seamless deployment of diverse models within a single infrastructure. With features like dynamic batching, multi-GPU execution, and optimized inference pipelines, Triton ensures high performance while reducing latency. While it may not replace custom-built inference solutions for highly specialized workloads, it excels as a standardized and scalable platform for deploying AI across cloud and edge environments. Its flexibility makes it ideal for applications such as real-time recommendation systems, autonomous systems, and large-scale AI-powered analytics. 6. References Quickstart — NVIDIA Triton Inference Server Deploying an ONNX Model — NVIDIA Triton Inference Server Model Repository — NVIDIA Triton Inference Server Triton Tutorials — NVIDIA Triton Inference Server689Views0likes0CommentsImage editing using Azure OpenAI and Azure Machine Learning
It’s now the holiday season, you may want to send greetings, beautiful photos to your beloved one, friends, co-workers, etc. What if there is a tool to help you edit an existing image with the way you want, you can use natural language to explain what you want to enhance/change, then the new image is generated automatically, isn’t it cool! This blog talks about how to edit an image using Azure OpenAI and Azure Machine Learning (AzureML) models. Goal: edit an existing image on a specified portion. This way you can preserve certain portions of the image untouched. Section 1. Overall steps: Generate a mask on the existing image. I will introduce five ways to generate masks in later sections. One of the ways is to use prompt to explain the area you want to change, and this way needs genAI model, here I use GPT-4o in my example. Then the mask is generated. Note: For using SAM (Segment Anything Model), please reference the github repo here: https://github.com/Azure/gen-cv/blob/main/dalle2-api/DALLE2-Segment-Anything-edits.ipynb Edit the image on the masked area. Here I use runwayml-stable-diffusion-inpainting model in my example. You enter a prompt to explain how the image should be changed, then the new image is generated. The code example is here: https://github.com/Azure/gen-cv/blob/main/deploy-stable-diffusion-on-azure-ml/image_editing.py To run the code: python image_editing.py orig_image.png mask_image.png final_image.png Now, let me go over the process step by step in detail. Section 2. Preparation. In Azure OpenAI Service, deploy GPT-4o model, get endpoint and api_key. In AzureML studio, deploy runwayml-stable-diffusion-inpainting model, get endpoint url and api_key. Visit Azure AI Vision Studio here. I will explain how to integrate these services into image editing. To achieve good results, prepare the image as a square. This is the image I’m going to use as an example, it’s 720*720 pixel. Section 3. Mask generation. I’m introducing five methods to generate a mask. 3.1. Method 1: Use prompt to generate mask. In order to generate mask accurately to cover the desired area, you’d better have an idea of the existing image size in pixel unit. Here is my example prompt: image size 720*720-pixel, y axis is top down, please generate a polygon covering the right side TV and underneath area including decors and cabinets, and the plant at the right corner. GPT-4o model will find the points in the polygon, but the response includes some text besides the polygon coordinates. You need to enter another prompt to generate numpy array only. Here is the second prompt: Please provide the polygon in numpy array format in a single row without comments Response: Once you verify the numpy format is correct, and enter ’yes’, the mask is generated. Next, you can enter prompt for editing the image. Prompt: replace with kids' toys Below are the original image, mask image and edited image: 3.2. Method 2: Use mouse click to generate mask. In this method, the image will be open, you can use computer mouse to click the points wherever you want to include in the polygon, at last, type ‘enter’ key to close the polygon, then the mask is generated. Prompt for editing the image: a big basket with colorful flowers Here are the original image, mask image and edited image: 3.3. Method 3: Reverse existing foreground matting. You can use Azure AI Vision studio to get foreground matting. Then you save the foreground matting image, and input the image path to the prompt, the mask is generated by reserving the black and white areas. Prompt for editing the image: big movie posters Below are foreground matting, mask and edited image. 3.4. Method 4: Bring your own mask. If you have an existing mask image file, just use it. Prompt for editing the image: sound insulated ceiling Below are the original image, mask image and edited image. 3.5. Method 5: Create mask with points coordinates. If you have a mask polygon with numpy array format, you can input it in prompt. Prompt for editing the image: Steinway baby grand piano 3.6. Other options. Another option I want to mention is that you can use Azure AI Vision Object Detection service to get object bounding boxes, like showing in screen shot below. Then you convert the coordinates into numpy array format, use above method 5 to generate mask. Acknowledgement: Thanks Takuto Higuchi, Anton Slutsky, Vincent Houdebine for reviewing the blog.1.7KViews1like0Comments