The Future of AI: Horses for Courses - Task-Specific Models and Content Understanding
Task-specific models are designed to excel at specific use cases, offering highly specialized solutions that can be more efficient and cost-effective than general-purpose models. These models are optimized for particular tasks, resulting in faster performance and lower latency, and they often do not require prompt engineering or fine-tuning.

Unveiling the Next Generation of Table Structure Recognition
In an era where data is abundant, the ability to accurately and efficiently extract structured information like tables from diverse document types is critical. For instance, consider the complexities of a balance sheet with multiple types of assets or an invoice with various charges, both presented in a table format that can be challenging even for humans to interpret. Traditional parsing methods often struggle with the complexity and variability of real-world tables, leading to manual intervention and inefficient workflows. This is because these methods typically rely on rigid rules or predefined templates that fail when encountering variations in layout, formatting, or content, which are common in real-world documents.

While the promise of Generative AI and Large Language Models (LLMs) in document understanding is vast, our research in table parsing has revealed a critical insight: for tasks requiring precision in data alignment, such as correctly associating data cells with their respective row and column headers, classical computer vision techniques currently offer superior performance. Generative AI models, despite their powerful contextual understanding, can sometimes exhibit inconsistencies and misalignments in tabular structures, leading to compromised data integrity (Figure 1). Therefore, Azure Document Intelligence (DI) and Content Understanding (CU) leverage more robust, proven computer vision algorithms to ensure the foundational accuracy and consistency that enterprises demand.

Figure 1: Vision LLMs struggle to accurately recognize table structure, even in simple tables.

Our current table recognizer excels at accurately identifying table structures, even those with complex layouts, rotations, or curved shapes. However, it does have its limitations. For example, it occasionally fails to properly delineate a table whose logical boundaries are not visible but must be inferred from the larger document context, and in those cases it can make suboptimal inferences. Furthermore, its architectural design makes it challenging to accelerate on modern GPU platforms, impacting its runtime efficiency. Taking these limitations into consideration and building upon our existing foundation, we are introducing the latest advancement in our table structure recognizer. This new version significantly enhances both performance and accuracy, addressing key challenges in document processing.

Precise Separation Line Placement

We've made significant strides in the precision of separation line placement. While predicting these separation lines might seem deceptively simple, it comes with subtle yet significant challenges. In many real-world documents, these are logical separation lines, meaning they are not always visibly drawn on the page. Instead, their positions are often implied by an array of nuanced visual cues such as table headers/footers, dot filler text, background color changes, and even the spacing and alignment of content within the cells.

Figure 2: Visual comparison of separation line prediction between the current and the new version.

We've developed a novel model architecture that can be trained end to end to tackle these challenges directly. Recognizing how difficult it is for humans to label table separation lines consistently, we've devised a training objective that combines Hungarian matching with an adaptive matching weight to correctly align predictions with ground truth even when the latter is noisy.
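To make the matching idea concrete, here is a minimal sketch of assigning predicted separation lines to noisy ground-truth lines with Hungarian matching, using SciPy's linear_sum_assignment. The cost function and the confidence-based weighting are illustrative assumptions, not the production model's actual formulation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_separation_lines(pred_pos, pred_conf, gt_pos, weight=2.0):
    """Assign predicted separation lines to ground-truth lines.

    pred_pos:  (P,) predicted line positions (e.g., normalized y-offsets of row separators)
    pred_conf: (P,) predicted confidence for each line
    gt_pos:    (G,) ground-truth positions, possibly noisy
    weight:    matching weight trading off localization error vs. confidence (assumed form)
    """
    # Pairwise localization cost between every prediction and every ground-truth line.
    dist = np.abs(pred_pos[:, None] - gt_pos[None, :])
    # Low-confidence predictions pay a higher cost, so confident lines are matched first.
    cost = weight * dist - np.log(pred_conf + 1e-8)[:, None]
    # Hungarian matching: minimum-cost one-to-one assignment (handles P != G).
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows.tolist(), cols.tolist()))

# Toy usage: three predicted row separators vs. two (noisy) labeled ones.
pred = np.array([0.21, 0.48, 0.80])
conf = np.array([0.95, 0.40, 0.90])
gt = np.array([0.20, 0.78])
print(match_separation_lines(pred, conf, gt))  # -> [(0, 0), (2, 1)]
```

In a training loop, the matched pairs would feed a regression loss on line position, while unmatched predictions would be penalized, but those details are omitted here.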
Additionally, we've incorporated a loss function inspired by speech recognition to encourage the model to predict the correct number of separation lines, further enhancing its performance.

Our improved algorithms now respect visual cues more effectively, ensuring that separation lines are placed precisely where they belong. This leads to cleaner, more accurate table structures and, ultimately, more reliable data extraction. Figure 2 shows a comparison between the current model and the new model on a few examples. Some quantitative results can be found in Table 1.

| Segment | TSR (current, %): Precision / Recall / F1 | TSR-v2 (next-gen, %): Precision / Recall / F1 |
|---|---|---|
| Latin | 90.2 / 90.7 / 90.4 | 94.0 / 95.7 / 94.8 |
| Chinese | 96.1 / 95.3 / 95.7 | 97.3 / 96.8 / 97.0 |
| Japanese | 93.5 / 93.8 / 93.7 | 95.1 / 97.1 / 96.1 |
| Korean | 95.3 / 95.9 / 95.6 | 97.5 / 97.8 / 97.7 |

Table 1: Table structure accuracy measured by cell prediction precision and recall rates at an IoU (intersection over union) threshold of 0.5, tested on in-house test datasets covering four different scripts.

A Data-Driven, GPU-Accelerated Design

Another innovation in this release is its data-driven, fully GPU-accelerated design. This architectural shift delivers enhanced quality and significantly faster inference speeds, which is critical for processing large volumes of documents. The design carefully considers the trade-off between model capability and latency requirements, prioritizing an architecture that leverages the inherent parallelism of GPUs. This means favoring highly parallelizable models over serial approaches to maximize GPU utilization. Furthermore, post-processing logic has been minimized to prevent it from becoming a bottleneck. This comprehensive approach has resulted in a drastic reduction in processing latency, from 250 ms per image to less than 10 ms.

Fueling Robustness with Synthetic Data

Achieving the high level of accuracy and robustness required for enterprise-grade table recognition demands vast quantities of high-quality training data. To meet this need efficiently, we've strategically incorporated synthetic data into our development pipeline. A few examples can be found in Figure 3.

Figure 3: Synthesized tables.

Synthetic data offers significant advantages: it is cost-effective to generate and provides unparalleled control over the dataset. This allows us to rapidly synthesize diverse and specific table styles, including rare or challenging layouts that would be difficult and expensive to collect from real-world documents. Crucially, synthetic data comes with perfectly consistent labels. Unlike human annotation, which can introduce variability, synthetic data ensures that our models learn from a flawlessly labeled ground truth, leading to more reliable and precise training outcomes.

Summary

This latest version of our table structure recognizer enhances critical document understanding capabilities. We've refined separation line placement to better respect visual cues and implied structures, supported by our synthetic data approach for consistent training. This, in turn, allows users to maintain the table structure as intended, reducing the need for manual post-processing to clean up the structured output. Additionally, a GPU-accelerated, data-driven design delivers both improved quality and faster performance, which is crucial for processing large document volumes.
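As a closing illustration, here is a minimal sketch of how cell-level precision, recall, and F1 at an IoU threshold of 0.5 (the metric reported in Table 1) could be computed on your own predictions. The box representation and the greedy one-to-one matching are simplifying assumptions; the in-house benchmark implementation may differ.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) cell bounding box

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def cell_prf(pred: List[Box], gt: List[Box], thr: float = 0.5):
    """Greedy one-to-one matching of predicted cells to ground-truth cells at IoU >= thr."""
    matched_gt, tp = set(), 0
    for p in pred:
        best_j, best_iou = -1, 0.0
        for j, g in enumerate(gt):
            if j in matched_gt:
                continue
            score = iou(p, g)
            if score > best_iou:
                best_j, best_iou = j, score
        if best_iou >= thr:
            matched_gt.add(best_j)
            tp += 1
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gt) if gt else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy usage: two predicted cells, two ground-truth cells, one good overlap.
pred = [(0, 0, 10, 10), (12, 0, 20, 10)]
gt = [(0, 0, 10, 10), (30, 0, 40, 10)]
print(cell_prf(pred, gt))  # (0.5, 0.5, 0.5)
```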
The Future of AI: Fine-Tuning Llama 3.1 8B on Azure AI Serverless, why it's so easy & cost efficient

In this article, you will learn how to fine-tune the Llama 3.1 8B model using RAFT and LoRA with Azure AI Serverless Fine-Tuning for efficient, cost-effective model customization.
The Future of AI: The paradigm shifts in Generative AI Operations

Dive into the transformative world of Generative AI Operations (GenAIOps) with Microsoft Azure. Discover how businesses are overcoming the challenges of deploying and scaling generative AI applications. Learn about the innovative tools and services Azure AI offers, and how they empower developers to create high-quality, scalable AI solutions. Explore the paradigm shift from MLOps to GenAIOps and see how continuous improvement practices ensure your AI applications remain cutting-edge. Join us on this journey to harness the full potential of generative AI and drive operational excellence.
The Future of AI: Maximize your fine-tuned model performance with the new Azure AI Evaluation SDK

In this article, we will explore how to effectively evaluate fine-tuned AI models using the new Azure AI Evaluation SDK. This comprehensive guide is the fourth part of our series on making large language model distillation easier. We delve into the importance of model evaluation, outline a systematic process for assessing the performance of a distilled student model against a baseline model, and demonstrate the use of advanced metrics provided by Azure's SDK. Join us as we navigate the intricacies of AI evaluation and provide insights for continuous model improvement and operational efficiency.
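As a rough illustration of the kind of metric the Azure AI Evaluation SDK exposes, here is a minimal sketch using the azure-ai-evaluation package's RelevanceEvaluator to score a single response from a fine-tuned (student) model. The endpoint, deployment name, and exact parameter names are assumptions; check the SDK documentation for your installed version.

```python
from azure.ai.evaluation import RelevanceEvaluator

# Judge-model configuration (hypothetical endpoint and deployment names).
model_config = {
    "azure_endpoint": "https://<your-aoai-resource>.openai.azure.com",
    "api_key": "<your-api-key>",
    "azure_deployment": "gpt-4o",
}

relevance = RelevanceEvaluator(model_config=model_config)

# Score one response from the fine-tuned student model against the user query.
score = relevance(
    query="Summarize the key risks in this contract.",
    response="The main risks are an unlimited liability clause and a short cure period.",
)
print(score)  # e.g., a dict with a relevance score; exact keys depend on the SDK version
```

Running the same evaluators over outputs from both the student and the baseline model gives the side-by-side comparison the article describes.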
The Future of AI: Generative AI for...Time Series Forecasting?!? A Look at Nixtla TimeGEN-1

Have you ever wondered how meteorologists predict tomorrow's weather, or how businesses anticipate future sales? These predictions rely on analyzing patterns over time, known as time series forecasts. With advancements in artificial intelligence, forecasting the future has become more accurate and accessible than ever before.

Understanding Time Series Forecasting

Time series data is a collection of observations recorded at specific time intervals. Examples include daily temperatures, monthly sales figures, or hourly website visitors. By examining this data, we can identify trends and patterns that help us predict future events. Forecasting involves using mathematical models to analyze past data and make informed guesses about what comes next.

Traditional Forecasting Methods: ARIMA and Prophet

Two of the most popular traditional methods for time series forecasting are ARIMA and Prophet. ARIMA, which stands for AutoRegressive Integrated Moving Average, predicts future values based on past data. It involves making the data stationary by removing trends and seasonal effects, then applying statistical techniques. However, ARIMA requires manual setup of parameters like trends and seasonality, which can be complex and time-consuming. It's best suited for simple, one-variable data with minimal seasonal changes.

Prophet, a forecasting tool developed by Facebook (now Meta), automatically detects trends, seasonality, and holiday effects in the data, making it more user-friendly than ARIMA. Prophet works well with data that has strong seasonal patterns and doesn't need as much historical data. However, it may struggle with more complex patterns or irregular time intervals.

Introducing Nixtla TimeGEN-1: A New Era in Forecasting

Nixtla TimeGEN-1 represents a significant advancement in time series forecasting. Unlike traditional models, TimeGEN-1 is a generative pretrained transformer model, much like the GPT models, but rather than working with language, it is specifically designed for time series data. It has been trained on over 100 billion data points from various fields such as finance, weather, energy, and web data. This extensive training allows TimeGEN-1 to handle a wide range of data types and patterns.

One of the standout features of TimeGEN-1 is its ability to perform zero-shot inference. This means it can make accurate predictions on new datasets without needing additional training. It can also be fine-tuned on specific datasets for even better accuracy. TimeGEN-1 handles irregular data effortlessly, working with missing timestamps or uneven intervals. Importantly, it doesn't require users to manually specify trends or seasonal components, making it accessible even to those without deep technical expertise. The transformer architecture of TimeGEN-1 enables it to capture complex patterns in data that traditional models might miss. It brings the power of advanced machine learning to time series forecasting (and related tasks like anomaly detection), making the process more efficient and accurate.
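To show what zero-shot usage can look like in practice, here is a minimal sketch using Nixtla's Python client against a TimeGEN-1 endpoint. The endpoint URL, API key, and data are placeholders, and the exact client interface may differ slightly between SDK versions.

```python
import pandas as pd
from nixtla import NixtlaClient

# Hypothetical Azure-hosted TimeGEN-1 endpoint and key; substitute your own deployment values.
client = NixtlaClient(
    base_url="https://<your-timegen-endpoint>.inference.ai.azure.com",
    api_key="<your-api-key>",
)

# A small monthly sales series; in practice this would come from your own data source.
df = pd.DataFrame({
    "ds": pd.date_range("2023-01-01", periods=24, freq="MS"),
    "y": [120, 132, 128, 141, 150, 149, 160, 172, 168, 175, 190, 205,
          210, 222, 218, 231, 240, 239, 250, 262, 258, 265, 280, 295],
})

# Zero-shot forecast: no training step, just one call with the forecast horizon.
forecast = client.forecast(df=df, h=6, time_col="ds", target_col="y")
print(forecast.head())
```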
Real-World Comparison: TimeGEN-1 vs. ARIMA and Prophet

To test these claims, I decided to run an experiment to compare the performance of TimeGEN-1 with ARIMA and Prophet. I used a retail dataset where the actual future values were known, which in data science parlance is known as a "backtest." In my dataset, ARIMA struggled to predict future values accurately due to its limitations with complex patterns. Prophet performed better than ARIMA by automatically detecting some patterns, but its predictions still didn't quite hit the mark. TimeGEN-1, however, delivered predictions that closely matched the actual data, significantly outperforming both ARIMA and Prophet.

The accuracy of these models was measured using metrics like Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). TimeGEN-1 had the lowest MAE and RMSE, indicating higher accuracy. This experiment highlights how TimeGEN-1 can provide more precise forecasts, even when compared to established methods.

The Team Behind TimeGEN-1: Nixtla

Nixtla is a company dedicated to making advanced predictive insights accessible to everyone. It was founded by a team of experts passionate about simplifying forecasting processes while maintaining high accuracy and efficiency. The team includes Max Mergenthaler Canseco, CEO; Azul Garza, CTO; and Cristian Challu, CSO, experts in the forecasting field with extensive experience in machine learning and software engineering. Their collective goal is to simplify the forecasting process, making powerful tools available to users with varying levels of technical expertise. By integrating TimeGEN-1 into easy-to-use APIs, they ensure that businesses and individuals can leverage advanced forecasting without needing deep machine learning knowledge.

The Azure AI Model Catalog

TimeGEN-1 is one of the 1700+ models that are now available in the Azure AI model catalog. The model catalog is continuously updated with the latest advancements, like TimeGEN-1, ensuring that users have access to the most cutting-edge tools. Its user-friendly interface makes it easy to navigate and deploy models, and Azure's cloud infrastructure provides the scalability needed to run these models, allowing users to handle large datasets and complex computations efficiently. In the following video, I show how Data Scientists and Developers can build time series forecasting models using data stored in Microsoft Fabric paired with the Nixtla TimeGEN-1 model.

The introduction of Nixtla TimeGEN-1 marks a transformative moment in time series forecasting. Whether you're a data scientist, a business owner, or a student interested in AI, TimeGEN-1 opens up new possibilities for understanding and predicting future trends. Explore TimeGEN-1 and thousands of other models through the Azure AI model catalog today!
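To make the backtest scoring described above concrete, here is a minimal sketch of computing MAE and RMSE over a held-out window. The numbers are made up for illustration and are not the retail dataset from the experiment.

```python
import numpy as np

def mae(actual, predicted):
    """Mean Absolute Error: average magnitude of forecast errors."""
    actual, predicted = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))

def rmse(actual, predicted):
    """Root Mean Squared Error: penalizes large errors more heavily than MAE."""
    actual, predicted = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Hypothetical held-out actuals and forecasts from three models (illustrative values only).
actual = [300, 310, 305, 320, 330, 340]
forecasts = {
    "ARIMA":     [280, 285, 290, 295, 300, 305],
    "Prophet":   [295, 300, 300, 310, 318, 325],
    "TimeGEN-1": [301, 308, 306, 318, 331, 338],
}

for name, pred in forecasts.items():
    print(f"{name:10s}  MAE={mae(actual, pred):6.2f}  RMSE={rmse(actual, pred):6.2f}")
```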