Vector Drift in Azure AI Search: Three Hidden Reasons Your RAG Accuracy Degrades After Deployment
What Is Vector Drift?

Vector drift occurs when embeddings stored in a vector index no longer accurately represent the semantic intent of incoming queries. Because vector similarity search depends on relative semantic positioning, even small changes in models, data distribution, or preprocessing logic can significantly affect retrieval quality over time. Unlike schema drift or data corruption, vector drift is subtle:

- The system continues to function
- Queries return results
- But relevance steadily declines

Cause 1: Embedding Model Version Mismatch

What Happens
Documents are indexed using one embedding model, while query embeddings are generated using another. This typically happens due to:

- Model upgrades
- Shared Azure OpenAI resources across teams
- Inconsistent configuration between environments

Why This Matters
Embeddings generated by different models:

- Exist in different vector spaces
- Are not mathematically comparable
- Produce misleading similarity scores

As a result, documents that were previously relevant may no longer rank correctly.

Recommended Practice
A single vector index should be bound to one embedding model and one dimension size for its entire lifecycle. If the embedding model changes, the index must be fully re-embedded and rebuilt.

Cause 2: Incremental Content Updates Without Re-Embedding

What Happens
New documents are continuously added to the index, while existing embeddings remain unchanged. Over time, new content introduces:

- Updated terminology
- Policy changes
- New product or domain concepts

Because semantic meaning is relative, the vector space shifts, but older vectors do not.
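The binding recommended under Cause 1 can be sketched as a small guard applied at query time. This is a minimal illustration only: the contract values, function names, and guard logic below are assumptions for the sketch, not part of the Azure AI Search API.

```python
import math

# Illustrative "index contract": one embedding model, one dimension,
# one chunking strategy per index. All values here are placeholders.
INDEX_CONTRACT = {
    "embedding_model": "text-embedding-3-large",
    "dimensions": 4,            # toy size for the sketch
    "chunk_version": "v1-512tok",
}

def cosine_similarity(a, b):
    # Plain cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def safe_similarity(query_vec, doc_vec, query_model):
    # Similarity scores are only meaningful when query and document
    # vectors come from the same model and dimension; refuse otherwise.
    if query_model != INDEX_CONTRACT["embedding_model"]:
        raise ValueError("Embedding model mismatch: re-embed and rebuild the index.")
    if len(query_vec) != INDEX_CONTRACT["dimensions"]:
        raise ValueError("Dimension mismatch: query vector does not fit the index.")
    return cosine_similarity(query_vec, doc_vec)

print(safe_similarity([1, 0, 0, 0], [1, 0, 0, 0], "text-embedding-3-large"))  # prints 1.0
```

The same contract can carry a chunking version tag, so a change in chunking logic is treated the same way as a model change: rebuild, don't mix.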
Observable Impact

- Recently indexed documents dominate retrieval results
- Older but still valid content becomes harder to retrieve
- Recall degrades without obvious system errors

Practical Guidance
Treat embeddings as living assets, not static artifacts:

- Schedule periodic re-embedding for stable corpora
- Re-embed high-impact or frequently accessed documents
- Trigger re-embedding when domain vocabulary changes meaningfully

Declining similarity scores or reduced citation coverage are often early signals of drift.

Cause 3: Inconsistent Chunking Strategies

What Happens
Chunk size, overlap, or parsing logic is adjusted over time, but previously indexed content is not updated. The index ends up containing chunks created using different strategies.

Why This Causes Drift
Different chunking strategies produce:

- Different semantic density
- Different contextual boundaries
- Different retrieval behavior

This inconsistency reduces ranking stability and makes retrieval outcomes unpredictable.

Governance Recommendation
Chunking strategy should be treated as part of the index contract:

- Use one chunking strategy per index
- Store chunk metadata (for example, chunk_version)
- Rebuild the index when chunking logic changes

Design Principles

- Versioned embedding deployments
- Scheduled or event-driven re-embedding pipelines
- Standardized chunking strategy
- Retrieval quality observability
- Prompt and response evaluation

Key Takeaways

- Vector drift is an architectural concern, not a service defect
- It emerges from model changes, evolving data, and preprocessing inconsistencies
- Long-lived RAG systems require embedding lifecycle management
- Azure AI Search provides the controls needed to mitigate drift effectively

Conclusion
Vector drift is an expected characteristic of production RAG systems. Teams that proactively manage embedding models, chunking strategies, and retrieval observability can maintain reliable relevance as their data and usage evolve.
Recognizing and addressing vector drift is essential to building and operating robust AI solutions on Azure.

Further Reading
The following Microsoft resources provide additional guidance on vector search, embeddings, and production-grade RAG architectures on Azure.

- Azure AI Search – Vector Search Overview: https://learn.microsoft.com/azure/search/vector-search-overview
- Azure OpenAI – Embeddings Concepts: https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/embeddings?view=foundry-classic&tabs=csharp
- Retrieval-Augmented Generation (RAG) Pattern on Azure: https://learn.microsoft.com/en-us/azure/search/retrieval-augmented-generation-overview?tabs=videos
- Azure Monitor – Observability Overview: https://learn.microsoft.com/azure/azure-monitor/overview

Gemma 4 now available in Microsoft Foundry
Open-source models have become a core part of how innovative AI teams stay competitive: experimenting with the latest architectures and often fine-tuning on proprietary data to achieve lower latency and cost. Today, we're happy to announce that the Gemma 4 family, Google DeepMind's newest model family, is now available in Microsoft Foundry via the Hugging Face collection. Azure customers can now discover, evaluate, and deploy Gemma 4 inside their Azure environment with the same policies they rely on for every other workload.

Foundry is the only hyperscaler platform where developers can access OpenAI, Anthropic, Gemma, and more than 11,000 other models under a single control plane. Through our close collaboration with Hugging Face, Gemma 4 joining that collection continues Microsoft's push to bring customers the widest selection of models from any cloud, and it fits in line with our enhanced investments in open-source development.

Frontier Intelligence, open-source weights

Released by Google DeepMind on April 2, 2026, Gemma 4 is built from the same research foundation as Gemini 3 and packaged as open weights under an Apache 2.0 license. Key capabilities across the Gemma 4 family:

- Native multimodal: Text + image + video inputs across all sizes; analyze video by processing sequences of frames; audio input on edge models (E2B, E4B)
- Enhanced reasoning & coding capabilities: Multi-step planning, deep logic, and improvements in math and instruction-following enabling autonomous agents
- Trained for global deployment: Pretrained on 140+ languages with support for 35+ languages out of the box
- Long context: Context windows of up to 128K tokens (E2B/E4B) and 256K tokens (26B A4B/31B) allow developers to reason across extensive codebases, lengthy documents, or multi-session histories

Why choose Foundry?

Foundry is built to give developers breadth: access to models from major model providers, open and proprietary, under one roof.
Stay within Azure to work with leading models. When you deploy through Foundry, models run inside your Azure environment and are subject to the same network policies, identity controls, and audit processes your organization already has in place.

- Managed online endpoints handle serving, scaling, and monitoring without manually setting up and managing the underlying infrastructure.
- Serverless deployment with Azure Container Apps allows developers to deploy and run containerized applications while reducing infrastructure management and saving costs.
- Gated model access integrates directly with Hugging Face user tokens, so models that require license acceptance remain compliant and can be accessed without manual approvals.
- Foundry Local lets you run optimized Hugging Face models directly on your own hardware using the same model catalog and SDK patterns as your cloud deployments. Read the documentation here: https://aka.ms/foundrylocal and https://aka.ms/HF/foundrylocal

Microsoft's approach to Responsible AI is grounded in our AI principles of fairness, reliability and safety, privacy and security, inclusiveness, transparency, and accountability. Microsoft Foundry provides governance controls, monitoring, and evaluation capabilities to help organizations deploy new models responsibly in production environments.

What are teams building with Gemma 4 in Foundry?

Gemma 4's combination of multimodal input, agentic function calling, and long context offers a wide range of production use cases:

- Document intelligence: Processing PDFs, charts, invoices, and complex tables using native vision capabilities
- Multilingual enterprise apps: 140+ natively trained languages, ideal for multinational customer support and content platforms, as well as language learning tools for grammar correction and writing practice
- Long-context analytics: Reasoning across entire codebases, legal documents, or multi-session conversation histories

Getting started

Try Gemma 4 in Microsoft Foundry today.
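As a sketch of what a first call to a deployed Gemma 4 endpoint might look like: Foundry model deployments expose a chat-completions route, and the snippet below builds such a request with only the standard library. The endpoint URL, deployment name, header scheme, and exact route should be verified against the Foundry inference documentation; they are assumptions here.

```python
import json
import urllib.request

def build_chat_request(endpoint, api_key, model, prompt):
    # Builds an HTTP request against an (OpenAI-compatible) chat
    # completions route. Route and auth header are illustrative.
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{endpoint}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
        method="POST",
    )

def ask(endpoint, api_key, model, prompt):
    # Sends the request and extracts the reply, assuming the standard
    # chat-completions response shape.
    req = build_chat_request(endpoint, api_key, model, prompt)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Swap in your own deployment's endpoint, key, and model name (for example, the name you gave the Gemma 4 deployment in Foundry) before calling `ask`.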
New models from Hugging Face continue to roll out to Foundry on a regular basis through our ongoing collaboration. If there's a model you want to see added, let us know here. Stay connected to our developer community on Discord and stay up to date on what is new in Foundry through the Model Mondays series.

Introducing MAI-Image-2-Efficient: Faster, More Efficient Image Generation
Building on our momentum

Just last week, we celebrated a major milestone: the public preview launch of three new first-party Microsoft AI models in Microsoft Foundry: MAI-Image-2, MAI-Voice-1, and MAI-Transcribe-1. Together, they represent a comprehensive multimedia AI stack purpose-built for developers that spans image generation, natural speech synthesis, and enterprise-grade transcription across 25 languages. The response from the developer community has been incredible, and we're not slowing down.

Fast on the heels of that launch, we're thrilled to introduce the next addition to the MAI image generation family: MAI-Image-2-Efficient, or Image-2e for short. It's now available in public preview in Microsoft Foundry and MAI Playground.

What makes MAI-Image-2-Efficient unique?

MAI-Image-2-Efficient is built on the same architecture as MAI-Image-2, the model that debuted at #3 on the Arena.ai leaderboard for image model families. Based on customer feedback, we've improved it and engineered it for speed and efficiency. It's up to 22% faster with 4x more efficiency compared to MAI-Image-2 when normalized by latency and GPU usage [1]. It also outpaces leading text-to-image models by 40% on average [2]. In short, MAI-Image-2-Efficient gives developers more output for less compute, unlocking a whole new category of use cases.

Who is MAI-Image-2-Efficient for?

MAI-Image-2-Efficient is designed for builders who need high-quality image generation at speed and scale. Here are the top use cases where MAI-Image-2-Efficient shines:

- High-volume production workflows: E-commerce platforms, media companies, and marketing teams often need to generate thousands of images per day as part of their business processes, for targeted advertisements, concept art, and mood boards. MAI-Image-2-Efficient's superior efficiency means larger batches at lower GPU cost, so your team can think and iterate as fast as you want and reach the end product faster.
- Real-time and conversational experiences: When users expect images to appear mid-conversation (in a chatbot, a creative copilot, or an AI-powered design tool), every millisecond counts. Thanks to its lower latency, MAI-Image-2-Efficient serves as an excellent backbone for interactive applications that require fast response times.
- Rapid prototyping and creative iteration: MAI-Image-2-Efficient enables your team to quickly and affordably test new pipelines, experiment with creative ideas, or refine prompts. You don't need the complete model to validate a concept; what you need is speed, and that's exactly what MAI-Image-2-Efficient provides.

MAI-Image-2 vs. MAI-Image-2-Efficient: which should you use?

MAI-Image-2-Efficient and MAI-Image-2 are built for different strengths, so choosing the right model depends on the needs of your workflow.

- MAI-Image-2-Efficient is the ideal choice for high-volume workflows where latency and speed are priorities. If your pipeline needs to generate images quickly and at scale, MAI-Image-2-Efficient delivers without compromise.
- MAI-Image-2 is the recommended option when your images require precise, detailed text rendering, or when scenes demand the deepest photorealistic contrast and smoothness.

The two models also have distinct visual signatures:

- MAI-Image-2-Efficient renders with sharpness and defined lines, making it a strong choice for illustration, animation, and photoreal images designed to grab attention.
- MAI-Image-2 delivers smoother, more nuanced contrast, making it the go-to for photorealistic imagery that prioritizes depth and subtlety.

Try it today

MAI-Image-2-Efficient is available now in Microsoft Foundry and MAI Playground. For builders in Foundry, MAI-Image-2-Efficient starts at $5 USD per 1M tokens for text input and $19.50 USD per 1M tokens for image output. And this is just the beginning. We have more exciting announcements lined up; stay tuned for what we're bringing to Microsoft Build 2026.
References:

[1] As tested on April 13, 2026. Compared to MAI-Image-2 when normalized by latency and GPU usage. Throughput per GPU vs MAI-Image-2 on NVIDIA H100 at 1024×1024; measured with optimized batch sizes and matched latency targets. Results vary with batch size, concurrency, and latency constraints.

[2] As tested on April 13, 2026. Compared to Gemini 3.1 Flash (high reasoning), Gemini 3.1 Flash Image, and Gemini 3 Pro Image: measured at p50 latency via AI Studio API (1:1, 1K images; minimal reasoning unless noted; web search disabled). MAI-Image-2, MAI-Image-2-Efficient, GPT-Image-1.5-High: measured at p50 latency via Foundry API.

Simplifying Image Classification with Azure AutoML for Images: A Practical Guide
1. The Challenge of Traditional Image Classification

Anyone who has worked with computer vision knows the drill: you need to classify images, so you dive into TensorFlow or PyTorch, spend days architecting a convolutional neural network, experiment with dozens of hyperparameters, and hope your model generalizes well. It's time-consuming, requires deep expertise, and often feels like searching for a needle in a haystack. What if there was a better way?

2. Enter Azure AutoML for Images

Azure AutoML for Images is a game-changer in the computer vision space. It's a feature within Azure Machine Learning that automatically builds high-quality vision models from your image data with minimal code. Think of it as having an experienced ML engineer working alongside you, handling all the heavy lifting while you focus on your business problem.

What Makes AutoML for Images Special?

1. Automatic Model Selection: Instead of manually choosing between ResNet, EfficientNet, or dozens of other architectures, AutoML for Images (Azure ML) evaluates multiple state-of-the-art deep learning models and selects the best one for your specific dataset. It's like having access to an entire model zoo with an intelligent curator.

2. Intelligent Hyperparameter Tuning: The system doesn't just pick a model; it optimizes it. Learning rates, batch sizes, augmentation strategies, and more are automatically tuned to squeeze out the best possible performance. What would take weeks of manual experimentation happens in hours.

3. Built-in Best Practices: Data preprocessing, augmentation techniques, and training strategies that would require extensive domain knowledge are pre-configured and applied automatically. You get enterprise-grade ML without needing to be an ML expert.
Key Capabilities

The repository demonstrates several powerful features:

- Multi-class and Multi-label Classification: Whether you need to classify an image into a single category or tag it with multiple labels, AutoML manages both scenarios seamlessly.
- Format Flexibility: Works with standard image formats including JPEG and PNG, making it easy to integrate with existing datasets.
- Full Transparency: Unlike black-box solutions, you maintain complete visibility and control over the training process. You can monitor metrics, understand model decisions, and fine-tune as needed.
- Production-Ready Deployment: Once trained, models can be easily deployed to Azure endpoints, ready to serve predictions at scale.

Real-World Applications

The practical applications are vast:

- E-commerce: Automatically categorize product images for better search and recommendations.
- Healthcare: Classify medical images for diagnostic support.
- Manufacturing: Detect defects in production line images.
- Agriculture: Identify crop diseases or estimate yield from aerial imagery.
- Content Moderation: Automatically flag inappropriate visual content.

3. A Practical Example: Metal Defect Detection

The repository includes a complete end-to-end example of detecting defects in metal surfaces, a critical quality control task in manufacturing. The notebooks demonstrate how to:

1. Download and organize image data from sources like Kaggle
2. Create training and validation splits with proper directory structure
3. Upload data to Azure ML as versioned datasets
4. Configure GPU compute that scales based on demand
5. Train multiple models with automated hyperparameter tuning
6. Evaluate results with comprehensive metrics and visualizations
7. Deploy the best model as a production-ready REST API
8. Export to ONNX for edge deployment scenarios

The metal defect use case is particularly instructive because it mirrors real industrial applications where quality control is critical but expertise is scarce.
The notebooks show how a small team can build production-grade computer vision systems without a dedicated ML research team.

Getting Started: What You Need

The prerequisites are straightforward:

- An Azure subscription (free tier available for experimentation)
- An Azure Machine Learning workspace
- Python 3.7 or later

That's it. No local GPU clusters to configure, no complex deep learning frameworks to master.

Repository Structure

The repository is thoughtfully organized into three progressive notebooks:

1. Downloading images.ipynb
   - Shows how to acquire and prepare image datasets
   - Demonstrates proper directory structure for classification tasks
   - Includes data exploration and visualization techniques
   - Link: image-classification-azure-automl-for-images/1. Downloading images.ipynb at main · retkowsky/image-classification-azure-automl-for-images

2. Azure ML AutoML for Images.ipynb
   - The core workflow: connect to Azure ML, upload data, configure training
   - Covers both simple model training and advanced hyperparameter tuning
   - Shows how to evaluate models and select the best performing one
   - Demonstrates deployment to managed online endpoints
   - Link: image-classification-azure-automl-for-images/2. Azure ML AutoML for Images.ipynb at main · retkowsky/image-classification-azure-automl-for-images

3. Edge with ONNX local model.ipynb
   - Exports trained models to ONNX format
   - Shows how to run inference locally without cloud connectivity
   - Perfect for edge computing and IoT scenarios
   - Link: image-classification-azure-automl-for-images/3. Edge with ONNX local model.ipynb at main · retkowsky/image-classification-azure-automl-for-images

Each Python notebook is self-contained with clear explanations, making it easy to understand each step of the process. You can run them sequentially to build a complete solution, or jump to specific sections relevant to your use case.

The Developer Experience

What sets this approach apart is the developer experience. The repository provides Python notebooks that guide you through the entire workflow.
You're not just reading documentation; you're working with practical, runnable examples that demonstrate real scenarios. Let's walk through the code to see how straightforward this actually is.

Use-case description

This image classification model is designed to identify and classify defects on metal surfaces in a manufacturing context. We want to classify defective images into Crazing, Inclusion, Patches, Pitted, Rolled & Scratches.

All code and images are available here: retkowsky/image-classification-azure-automl-for-images: Azure AutoML for images — Image classification

Step 1: Connect to Azure ML Workspace

First, establish a connection to your Azure ML workspace using Azure credentials:

```python
import os

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

print("Connection to the Azure ML workspace…")
credential = DefaultAzureCredential()
ml_client = MLClient(
    credential,
    os.getenv("subscription_id"),
    os.getenv("resource_group"),
    os.getenv("workspace"),
)
print("✅ Done")
```

That's it.

Step 2: Upload Your Dataset

Upload your image dataset to Azure ML. The code handles this elegantly:

```python
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import Data

# TRAIN_DIR: local folder containing the training images (defined earlier)
my_images = Data(
    path=TRAIN_DIR,
    type=AssetTypes.URI_FOLDER,
    description="Metal defects images for images classification",
    name="metaldefectimagesds",
)
uri_folder_data_asset = ml_client.data.create_or_update(my_images)

print("🖼️ Information:")
print(uri_folder_data_asset)
print("\n🖼️ Path to folder in Blob Storage:")
print(uri_folder_data_asset.path)
```

Your local images are now versioned data assets in Azure, ready for training.

Step 3: Create GPU Compute Cluster

AutoML needs compute power.
Here's how you create a GPU cluster that auto-scales:

```python
from azure.ai.ml.entities import AmlCompute
from azure.core.exceptions import ResourceNotFoundError

compute_name = "gpucluster"

try:
    _ = ml_client.compute.get(compute_name)
    print("✅ Found existing Azure ML compute target.")
except ResourceNotFoundError:
    print(f"🛠️ Creating a new Azure ML compute cluster '{compute_name}'…")
    compute_config = AmlCompute(
        name=compute_name,
        type="amlcompute",
        size="Standard_NC16as_T4_v3",    # GPU VM
        idle_time_before_scale_down=1200,
        min_instances=0,                 # Scale to zero when idle
        max_instances=4,
    )
    ml_client.begin_create_or_update(compute_config).result()
    print("✅ Done")
```

The cluster scales from 0 to 4 instances based on workload, so you only pay for what you use.

Step 4: Configure AutoML Training

Now comes the magic. Here's the entire configuration for an AutoML image classification job using a specific model (here a resnet34). It is also possible to access all the available models from the image classification AutoML library: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-auto-train-image-models?view=azureml-api-2&tabs=python#supported-model-architectures

```python
from azure.ai.ml import automl

image_classification_job = automl.image_classification(
    compute=compute_name,
    experiment_name=exp_name,
    training_data=my_training_data_input,
    validation_data=my_validation_data_input,
    target_column_name="label",
)

# Set training parameters
image_classification_job.set_limits(timeout_minutes=60)
image_classification_job.set_training_parameters(model_name="resnet34")
```

That's approximately 10 lines of code to configure what would traditionally require hundreds of lines and deep expertise.

Step 5: Hyperparameter Tuning (Optional)

Want to explore multiple models and configurations?
```python
from azure.ai.ml import automl
from azure.ai.ml.automl import ClassificationPrimaryMetrics, SearchSpace
from azure.ai.ml.sweep import BanditPolicy, Choice, Uniform

image_classification_job = automl.image_classification(
    compute=compute_name,                      # Compute cluster
    experiment_name=exp_name,                  # Azure ML job
    training_data=my_training_data_input,      # Training
    validation_data=my_validation_data_input,  # Validation
    target_column_name="label",                # Target
    primary_metric=ClassificationPrimaryMetrics.ACCURACY,  # Metric
    tags={
        "usecase": "metal defect",
        "type": "computer vision",
        "product": "azure ML",
        "ai": "image classification",
        "hyper": "YES",
    },
)

image_classification_job.set_limits(
    timeout_minutes=60,       # Timeout in minutes
    max_trials=5,             # Max number of models
    max_concurrent_trials=2,  # Concurrent trainings
)

image_classification_job.extend_search_space([
    SearchSpace(
        model_name=Choice(["vitb16r224", "vits16r224"]),
        learning_rate=Uniform(0.001, 0.01),  # LR
        number_of_epochs=Choice([15, 30]),   # Epochs
    ),
    SearchSpace(
        model_name=Choice(["resnet50"]),
        learning_rate=Uniform(0.001, 0.01),  # LR
        layers_to_freeze=Choice([0, 2]),     # Layers to freeze
    ),
])

image_classification_job.set_sweep(
    sampling_algorithm="Random",  # Random sampling of hyperparameter combinations
    early_termination=BanditPolicy(
        evaluation_interval=2,  # The model is evaluated every 2 iterations
        slack_factor=0.2,       # A run 20% worse than the best so far may be terminated
        delay_evaluation=6,     # Wait until 6 iterations complete before evaluating runs
    ),
)
```

AutoML will now automatically try different model architectures, learning rates, and augmentation strategies to find the best configuration.

Step 6: Launch Training

Submit the job and monitor progress:

```python
# Submit the job
returned_job = ml_client.jobs.create_or_update(image_classification_job)
print(f"✅ Created job: {returned_job}")

# Stream the logs in real-time
ml_client.jobs.stream(returned_job.name)
```

While training runs, you can monitor metrics, view logs, and track progress through the Azure ML Studio UI or programmatically.
Step 7: Results

Once training completes, you can inspect metrics and compare runs in the Azure ML Studio UI.

Step 8: Deploy to Production

Once training completes, deploy the best model as a REST endpoint:

```python
from azure.ai.ml.entities import ManagedOnlineEndpoint

# Create endpoint configuration
online_endpoint_name = "metal-defects-classification"
endpoint = ManagedOnlineEndpoint(
    name=online_endpoint_name,
    description="Metal defects image classification",
    auth_mode="key",
    tags={"usecase": "metal defect", "type": "computer vision"},
)

# Deploy the endpoint
ml_client.online_endpoints.begin_create_or_update(endpoint).result()
```

Your model is now a production API endpoint, ready to classify images at scale.

Beyond the Cloud: Edge Deployment with ONNX

One of the most powerful aspects of this approach is flexibility in deployment. The repository includes a third notebook demonstrating how to export your trained model to ONNX (Open Neural Network Exchange) format for edge deployment. This means you can:

- Deploy models on IoT devices for real-time inference without cloud connectivity
- Reduce latency by processing images locally on edge hardware
- Lower costs by eliminating constant cloud API calls
- Ensure privacy by keeping sensitive images on-premises

The ONNX export process is straightforward and integrates seamlessly with the AutoML workflow. Your cloud-trained model can run anywhere ONNX Runtime is supported, from Raspberry Pi devices to industrial controllers.

```python
import onnxruntime

# Load the ONNX model
session = onnxruntime.InferenceSession("model.onnx")

# Name of the model's input tensor
input_name = session.get_inputs()[0].name

# Run inference locally (image_data: a preprocessed array matching
# the model's expected input shape, prepared earlier in the notebook)
results = session.run(None, {input_name: image_data})
```

This cloud-to-edge workflow is particularly valuable for manufacturing, retail, and remote monitoring scenarios where edge processing is essential.

Interactive webapp for image classification

Interpreting model predictions

The deployed endpoint returns a base64-encoded image string if both model_explainability and visualizations are set to True.

Why This Matters
In the AI era, the competitive advantage isn't about who can build the most complex models; it's about who can deploy effective solutions fastest. Azure AutoML for Images democratizes computer vision by making sophisticated ML accessible to a broader audience. Small teams can now accomplish what previously required dedicated ML specialists. Prototypes that took months can be built in days. And the quality? Often on par with or better than manually crafted solutions, thanks to AutoML's systematic approach and access to cutting-edge techniques.

What the Code Reveals

Looking at the actual implementation reveals several important insights:

- Minimal Boilerplate: The entire training pipeline, from data upload to model deployment, requires less than 50 lines of meaningful code. Compare this to traditional PyTorch or TensorFlow implementations that often exceed several hundred lines.
- Built-in Best Practices: Notice how the code automatically manages concerns like data versioning, experiment tracking, and compute auto-scaling. These aren't afterthoughts; they're integral to the platform.
- Production-Ready from Day One: The deployed endpoint isn't a prototype. It includes authentication, scaling, monitoring, and all the infrastructure needed for production workloads. You're building production systems, not demos.
- Flexibility Without Complexity: The simple API hides complexity without sacrificing control. Need to specify a particular model architecture? One parameter. Want hyperparameter tuning? Add a few lines. The abstraction level is perfectly calibrated.
- Observable and Debuggable: The `.stream()` method and comprehensive logging mean you're never in the dark about what's happening. You can monitor training progress, inspect metrics, and debug issues, all critical for real projects.

The Cost of Complexity

Traditional ML projects fail not because of technology limitations but because of complexity.
The learning curve is steep, the iteration cycles are long, and the resource requirements are high. By abstracting away this complexity, AutoML for Images changes the economics of computer vision projects. You can now:

- Validate ideas quickly: Test whether image classification solves your problem before committing significant resources
- Iterate faster: Experiment with different approaches in hours rather than weeks
- Scale expertise: Enable more team members to work with computer vision, not just ML specialists

Conclusion

Image classification is a fundamental building block for countless AI applications. Azure AutoML for Images makes it accessible, practical, and production-ready. Whether you're a seasoned data scientist looking to accelerate your workflow or a developer taking your first steps into computer vision, this approach offers a compelling path forward. The future of ML isn't about writing more complex code; it's about writing smarter code that leverages powerful platforms to deliver business value faster. This repository shows you exactly how to do that.

Practical Tips from the Code

After reviewing the notebooks, here are some key takeaways for your own projects:

- Start with a Single Model: The basic configuration with `model_name="resnet34"` is perfect for initial experiments. Only move to hyperparameter sweeps once you've validated your data and use case.
- Use Tags Strategically: The code demonstrates adding tags to jobs and endpoints (e.g., `"usecase": "metal defect"`). This becomes invaluable when managing multiple experiments and models in production.
- Leverage Auto-Scaling: The compute configuration with `min_instances=0` means you're not paying for idle resources. The cluster scales up when needed and scales down to zero when idle.
- Monitor Training Live: The `ml_client.jobs.stream()` method is your best friend during development. You see exactly what's happening and can catch issues early.
- Version Your Data: Creating named data assets (`name="metaldefectimagesds"`) means your experiments are reproducible. You can always trace back which data version produced which model.
- Think Cloud-to-Edge: Even if you're deploying to the cloud initially, the ONNX export capability gives you flexibility for future edge scenarios without retraining.

Resources

- Azure ML: https://azure.microsoft.com/en-us/products/machine-learning
- Demo notebooks: https://github.com/retkowsky/image-classification-azure-automl-for-images
- AutoML for Images documentation: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-auto-train-image-models
- Available models: Set up AutoML for computer vision — Azure Machine Learning | Microsoft Learn
- Connect with the author: https://www.linkedin.com/in/serger/

Bringing GigaTIME to Microsoft Foundry: Unlocking Tumor Microenvironment Insights with Multimodal AI
Expanding Microsoft Foundry for Scientific AI Workloads

AI is increasingly being applied to model complex real-world systems, from climate and industrial processes to human biology. In healthcare, one of the biggest challenges is translating routinely available data into deeper biological insight at scale. GigaTIME is now available in Microsoft Foundry, bringing advanced multimodal capabilities to healthcare and life sciences, with Foundry Labs enabling early exploration and the Foundry platform supporting scalable deployment aligned to the model's intended use. Read on to understand how GigaTIME works and how you can start exploring it in Foundry.

From Routine Slides to Deep Biological Insight

Understanding how tumors interact with the immune system is central to modern cancer research. While techniques like multiplex immunofluorescence provide these insights, they are expensive and difficult to scale across large patient populations. GigaTIME addresses this challenge by translating widely available hematoxylin and eosin (H&E) pathology slides into spatially resolved protein activation maps. This allows researchers to infer biological signals such as immune activity, tumor growth, and cellular interactions at a much deeper level. Developed in collaboration with Providence and the University of Washington, GigaTIME enables analysis of tumor microenvironments across diverse cancer types, helping accelerate discovery and improve understanding of disease biology.

Use cases for GigaTIME

GigaTIME is designed to support research and evaluation workflows across a range of real-world scientific scenarios:

- Population-Scale Tumor Microenvironment Analysis: Enable large-scale analysis of tumor–immune interactions by generating virtual multiplex immunofluorescence outputs from routine pathology slides.
Biomarker Association Discovery: Identify relationships between protein activation patterns and clinical attributes such as mutations, biomarkers, and disease characteristics.
Patient Stratification and Cohort Analysis: Segment patient populations across cancer types and subtypes using spatial and combinatorial signals for research and hypothesis generation.
Clinical Trial Retrospective Analysis: Apply GigaTIME to H&E archives from completed clinical trials to retrospectively characterize tumor microenvironment features associated with treatment outcomes, enabling new insights from existing trial data without additional tissue processing.
Tumor–Immune Interaction Analysis: Assess whether immune cells are infiltrating tumor regions or being excluded by analyzing spatial relationships between tumor and immune signals.
Immune System Structure Characterization: Understand how immune cell populations are organized within tissue to evaluate coordination or fragmentation of immune response.
Immune Checkpoint Context Interpretation: Examine how immune activity may be locally regulated by analyzing overlap between immune markers and checkpoint signals.
Tumor Proliferation Analysis: Identify actively growing tumor regions by combining proliferation signals with tumor localization.
Stromal and Vascular Context Understanding: Analyze how tissue architecture, such as vascular density and desmoplastic stroma, shapes immune cell access to tumor regions, helping characterize mechanisms of immune exclusion or infiltration.
Start with Exploration, Then Go Deeper
Explore in Foundry Labs
Foundry Labs provides a lightweight environment for early exploration of emerging AI capabilities. It allows developers and researchers to quickly understand how models like GigaTIME behave before integrating them into production workflows.
With GigaTIME in Foundry Labs, you can engage with real-world healthcare scenarios and explore how multimodal models translate pathology data into meaningful biological insight. Through curated experiences, you can:
Run inference on pathology images
Visualize spatial protein activation patterns
Explore tumor and immune interactions in context
This helps you build intuition and evaluate how the model can be applied to your specific use cases.
Go Deeper with GitHub Examples
For advanced scenarios, you can access the underlying notebooks and workflows via GitHub. These examples provide flexibility to customize pipelines, extend workflows, and integrate GigaTIME into broader research and application environments. Together, Foundry Labs and GitHub provide a path from guided exploration to deeper customization.
Discover and Deploy GigaTIME in Microsoft Foundry
Discover in the Foundry Catalog
GigaTIME is available in the Foundry model catalog alongside a growing set of domain-specific models across healthcare, geospatial intelligence, physical systems, and more.
Deploy for Research Workflows
For advanced usage, GigaTIME can be deployed as an endpoint within Foundry to support research and evaluation workflows such as:
Biomarker discovery
Patient stratification
Clinical research pipelines
You can start with early exploration in Foundry Labs and transition to scalable deployment on the Foundry platform using the tools and workflows designed for each stage, in line with the intended use of the model.
A New Class of AI for Scientific Discovery
GigaTIME reflects a broader shift toward AI systems designed to model real-world phenomena. These systems are multimodal, deeply tied to domain-specific data, and designed to produce spatial and contextual outputs. They rely on workflows that combine data processing, model inference, and interpretation, which requires platforms that support the full lifecycle from exploration to production.
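Several of the use cases above, such as tumor–immune interaction analysis and checkpoint context interpretation, ultimately reduce to spatial overlap computations between marker activation maps. Here is a deliberately simplified sketch of that idea on tiny binary grids; the masks, grid size, and overlap measure are invented for illustration and are not GigaTIME's actual output format:

```python
# Toy illustration: overlap between two marker "activation maps" on a grid.
# Real virtual-mIF outputs are continuous, high-resolution images; these
# 4x4 binary masks are stand-ins for illustration only.

def overlap_fraction(mask_a, mask_b):
    """Fraction of mask_a's positive cells that also light up in mask_b."""
    a_positive = [(r, c) for r, row in enumerate(mask_a)
                  for c, v in enumerate(row) if v]
    if not a_positive:
        return 0.0
    shared = sum(1 for (r, c) in a_positive if mask_b[r][c])
    return shared / len(a_positive)

tumor = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]

immune = [[1, 0, 0, 0],
          [0, 1, 0, 0],
          [0, 0, 1, 1],
          [0, 0, 1, 1]]

# 2 of the 4 tumor-positive cells overlap immune signal -> 0.5,
# a crude "infiltrated vs. excluded" indicator.
print(overlap_fraction(tumor, immune))  # 0.5
```

Real analyses operate on continuous, high-resolution virtual mIF outputs, but the intuition, quantifying where signals co-occur in space, carries over.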
Microsoft Foundry is built to support this evolution.
Learn More and Get Started
To explore GigaTIME in more detail:
Read the Microsoft Research blog on the underlying research and population-scale findings
Try hands-on scenarios in Foundry Labs
Access GitHub examples for advanced workflows
Explore and deploy the model through the Foundry catalog
Looking Ahead
As AI continues to expand into domains like healthcare, climate science, and industrial systems, the ability to connect models, data, and workflows becomes increasingly important. GigaTIME highlights what is possible when these elements come together, transforming routinely available data into actionable scientific insight. We are excited to see what you build next.

Now in Foundry: Microsoft Harrier and NVIDIA EGM-8B
This week's Model Mondays edition highlights two models that share a common thread: each achieves results comparable to larger leading models through targeted training strategies rather than scale. Microsoft Research's harrier-oss-v1-0.6b achieves state-of-the-art results on the Multilingual MTEB v2 embedding benchmark at 0.6B parameters through contrastive learning and knowledge distillation. NVIDIA's EGM-8B scores 91.4 average IoU on the RefCOCO visual grounding benchmark by training a small Vision Language Model (VLM) with reinforcement learning to match the output quality of much larger models. Together they represent a practical argument for efficiency-first model development: the gap between small and large models continues to narrow when training methodology is the focus rather than parameter count alone.
Models of the week
Microsoft Research: harrier-oss-v1-0.6b
Model Specs
Parameters / size: 0.6B
Context length: 32,768 tokens
Primary task: Text embeddings (retrieval, semantic similarity, classification, clustering, reranking)
Why it's interesting
State-of-the-art on Multilingual MTEB v2: harrier-oss-v1-0.6b is a new embedding model released by Microsoft Research, achieving a 69.0 score on the Multilingual MTEB v2 (Massive Text Embedding Benchmark) leaderboard, placing it at the top of its size class at release. It is part of the harrier-oss family spanning harrier-oss-v1-270m (66.5 MTEB v2), harrier-oss-v1-0.6b (69.0), and harrier-oss-v1-27b (74.3), with the 0.6B variant further trained with knowledge distillation from the larger family members. Benchmarks: Multilingual MTEB v2 Leaderboard.
Decoder-only architecture with task-instruction queries: Unlike most embedding models that use encoder-only transformers, harrier-oss-v1-0.6b uses a decoder-only architecture with last-token pooling and L2 normalization.
Queries are prefixed with a one-sentence task instruction (e.g., "Instruct: Retrieve relevant passages that answer the query\nQuery: ...") while documents are encoded without instructions, allowing the same deployed model to be specialized for retrieval, classification, or similarity tasks through the prompt alone.
Broad task coverage across six embedding scenarios: The model is trained and evaluated on retrieval, clustering, semantic similarity, classification, bitext mining, and reranking, making it suitable as a general embedding backbone for multi-task pipelines rather than a single-use retrieval model. One endpoint, consistent embeddings across the stack.
100+ language support: Trained on a large-scale mixture of multilingual data covering Arabic, Chinese, Japanese, Korean, and 100+ additional languages, with strong cross-lingual transfer for tasks that span language boundaries.
Try it

| Use Case | Prompt Pattern |
| --- | --- |
| Multilingual semantic search | Prepend task instruction to query; encode documents without instruction; rank by cosine similarity |
| Cross-lingual document clustering | Embed documents across languages; apply clustering to group semantically related content |
| Text classification with embeddings | Encode labeled examples + new text; classify by nearest-neighbor similarity in embedding space |
| Bitext mining | Encode parallel corpora in source and target languages; align segments by embedding similarity |

Sample prompt for a global enterprise knowledge base deployment:
You are building a multilingual internal knowledge base for a global professional services firm. Using the harrier-oss-v1-0.6b endpoint deployed in Microsoft Foundry, encode all internal documents—policy guides, project case studies, and technical documentation—across English, French, German, and Japanese. At query time, prepend the task instruction to each employee query: "Instruct: Retrieve relevant internal documents that answer the employee's question\nQuery: {question}".
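The encode-documents-without-instruction, prefix-the-query pattern can be sketched end to end with a stand-in embedder. The bag-of-words `embed` below is a placeholder for calls to the deployed model endpoint; only the instruction prefixing and cosine ranking mirror the documented usage:

```python
import math

INSTRUCTION = ("Instruct: Retrieve relevant passages that answer the query"
               "\nQuery: ")

def embed(text):
    """Stand-in embedder (word counts). In production this call would go
    to the deployed harrier-oss-v1-0.6b endpoint instead."""
    vec = {}
    for w in text.lower().replace(".", " ").replace(",", " ").split():
        vec[w] = vec.get(w, 0) + 1
    return vec

def cosine(a, b):
    dot = sum(a[k] * b.get(k, 0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Documents are encoded WITHOUT the instruction...
docs = ["To reset your password, open account settings and choose reset.",
        "The holiday schedule is published every January."]
doc_vecs = [embed(d) for d in docs]

# ...while the query is prefixed with the one-sentence task instruction.
query_vec = embed(INSTRUCTION + "how do I reset my password")

ranked = sorted(range(len(docs)),
                key=lambda i: cosine(query_vec, doc_vecs[i]), reverse=True)
print(docs[ranked[0]])  # the password-reset document ranks first
```

The ranking step shown here is the same one that selects the top-k documents handed to a language model in a retrieval pipeline.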
Retrieve the top-5 most similar documents by cosine similarity and pass them to a language model with the instruction: "Using only the provided documents, answer the question and cite the source document title for each claim. If no document addresses the question, say so."
NVIDIA: EGM-8B
Model Specs
Parameters / size: ~8.8B
Context length: 262,144 tokens
Primary task: Image-text-to-text (visual grounding)
Why it's interesting
Performs well on visual grounding compared to larger models even at its small size: EGM-8B achieves 91.4 average Intersection over Union (IoU) on the RefCOCO benchmark—the standard measure of how accurately a model localizes a described region within an image. Compared to its base model Qwen3-VL-8B-Thinking (87.8 IoU), EGM-8B achieves a +3.6 IoU gain through targeted Reinforcement Learning (RL) fine-tuning. Benchmarks: EGM Project Page.
5.9x faster than larger models at inference: EGM-8B achieves 737ms average latency. The research demonstrates that test-time compute can be scaled horizontally across small models—generating many medium-quality responses and selecting the best—rather than relying on a single expensive forward pass through a large model.
Two-stage training: EGM-8B is trained first with Supervised Fine-Tuning (SFT) on detailed chain-of-thought reasoning traces generated by a proprietary VLM, then refined with Group Relative Policy Optimization (GRPO) using a reward function combining IoU accuracy and task success. The intermediate SFT checkpoint is available as nvidia/EGM-8B-SFT for developers who want to experiment with the intermediate stage.
Addresses a root cause of small model grounding errors: The EGM research identifies that 62.8% of small model errors on visual grounding stem from complex multi-relational descriptions—where a model must reason about spatial relationships, attributes, and context simultaneously.
By focusing test-time compute on reasoning through these complex prompts, EGM-8B closes the gap without increasing the underlying model size.
Try it

| Use Case | Prompt Pattern |
| --- | --- |
| Object localization | Submit image + natural language description; receive bounding box coordinates |
| Document region extraction | Provide scanned document image + field description; extract specific regions |
| Visual quality control | Submit product image + defect description; localize defect region for downstream classification |
| Retail shelf analysis | Provide shelf image + product description; return location of specified SKU |

Sample prompt for a retail and logistics deployment:
You are building a visual inspection system for a logistics warehouse. Using the EGM-8B endpoint deployed in Microsoft Foundry, submit each incoming package scan image along with a natural language grounding query describing the region of interest: "Please provide the bounding box coordinate of the region this sentence describes: {description}". For example: "the label on the upper-left side of the box", "the barcode on the bottom face", or "the damaged corner on the right side". Use the returned bounding box coordinates to route each package to the appropriate inspection station based on the identified region.
Getting started
You can deploy open-source Hugging Face models directly in Microsoft Foundry by browsing the Hugging Face collection in the Foundry model catalog and deploying to managed endpoints in just a few clicks. You can also start from the Hugging Face Hub. First, select any supported model and then choose "Deploy on Microsoft Foundry", which brings you straight into Azure with secure, scalable inference already configured.
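Whichever route you take, evaluating a grounding deployment comes down to comparing predicted and ground-truth boxes with IoU, the metric behind EGM-8B's 91.4 RefCOCO score. A minimal implementation; the corner-coordinate (x1, y1, x2, y2) box format is an assumption for this sketch:

```python
def iou(box_a, box_b):
    """Intersection over Union for (x1, y1, x2, y2) corner boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle; width/height clamp to 0 when boxes are disjoint.
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1)
             - inter)
    return inter / union if union else 0.0

# A predicted box shifted 5px from a 10x10 ground-truth box:
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25/175 ≈ 0.143
```

Averaging this score over a labeled evaluation set gives the same "average IoU" figure the benchmark reports.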
Learn how to discover models and deploy them using Microsoft Foundry documentation:
Follow along with the Model Mondays series and access the GitHub repo to stay up to date on the latest
Read the Hugging Face on Azure docs
Learn about one-click deployments from the Hugging Face Hub on Microsoft Foundry
Explore models in Microsoft Foundry

Power Up Your Open WebUI with Azure AI Speech: Quick STT & TTS Integration
Introduction
Ever found yourself wishing your web interface could really talk and listen back to you? With a few clicks (and a bit of code), you can turn your plain Open WebUI into a full-on voice assistant. In this post, you'll see how to spin up an Azure Speech resource, hook it into your frontend, and watch as user speech transforms into text and your app's responses leap off the screen in a human-like voice. By the end of this guide, you'll have a voice-enabled web UI that actually converses with users, opening the door to hands-free controls, better accessibility, and a genuinely richer user experience. Ready to make your web app speak? Let's dive in.
Why Azure AI Speech?
We use the Azure AI Speech service in Open WebUI to enable voice interactions directly within web applications. This allows users to:
Speak commands or input instead of typing, making the interface more accessible and user-friendly.
Hear responses or information read aloud, which improves usability for people with visual impairments or those who prefer audio.
Enjoy a more natural and hands-free experience, especially on devices like smartphones or tablets.
In short, integrating the Azure AI Speech service into Open WebUI helps make web apps smarter, more interactive, and easier to use by adding speech recognition and voice output features. If you haven't hosted Open WebUI already, follow my other step-by-step guide to host Ollama WebUI on Azure. Proceed to the next step if you have Open WebUI deployed already. Learn more about Open WebUI here.
Deploy the Azure AI Speech service in Azure
Navigate to the Azure portal and search for Azure AI Speech in the portal search bar. Create a new Speech resource by filling in the fields on the resource creation page. Click "Create" to finalize the setup. After the resource has been deployed, click the "View resource" button and you should be redirected to the Azure AI Speech service page.
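Before wiring the resource into Open WebUI, it can be worth sanity-checking the key and region pair from outside the UI. The Speech service issues short-lived access tokens from a region-scoped endpoint, and a successful response confirms the pair is valid. A minimal sketch; the region and key values are placeholders, and the request is only constructed here, not sent:

```python
REGION = "eastus"              # placeholder: your Speech resource's region
SPEECH_KEY = "<your-api-key>"  # placeholder: key from the Azure portal

# The Speech service issues short-lived access tokens from this endpoint;
# a 200 response confirms the key/region pair is wired correctly.
token_url = f"https://{REGION}.api.cognitive.microsoft.com/sts/v1.0/issueToken"
headers = {
    "Ocp-Apim-Subscription-Key": SPEECH_KEY,
    "Content-Length": "0",
}

print(token_url)
# To actually send it:
#   urllib.request.urlopen(
#       urllib.request.Request(token_url, data=b"", headers=headers))
```

If the token request fails, fix the key or region before touching the Open WebUI settings; it saves a round of guesswork later.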
The page should display the API keys and endpoints for the Azure AI Speech service, which you can use in Open WebUI.
Setting things up in Open WebUI
Speech-to-Text settings (STT)
Head to the Open WebUI Admin page > Settings > Audio. Paste the API key obtained from the Azure AI Speech service page into the API key field. Unless you use a different Azure region or want to change the default STT configuration, leave all other settings blank.
Text-to-Speech settings (TTS)
Now configure the TTS settings in Open WebUI by toggling the TTS Engine to the Azure AI Speech option. Again, paste the API key obtained from the Azure AI Speech service page and leave all other settings blank. You can change the TTS voice from the dropdown selection in the TTS settings as depicted in the image below. Click Save to apply the change.
Expected Result
Now, let's test if everything works. Open a new chat or temporary chat in Open WebUI and click the Call / Record button. The STT engine (Azure AI Speech) should identify your voice and provide a response based on the voice input. To test the TTS feature, click Read Aloud (the speaker icon) under any response from Open WebUI. The TTS engine should reflect the Azure AI Speech service!
Conclusion
And that's a wrap! You've just given your Open WebUI the gift of capturing user speech, turning it into text, and then talking right back with Azure's neural voices. Along the way you saw how easy it is to spin up a Speech resource in the Azure portal, wire up real-time transcription in the browser, and pipe responses through the TTS engine. From here, it's all about experimentation. Try swapping in different neural voices or dialing in new languages. Tweak how you start and stop listening, play with silence detection, or add custom pronunciation tweaks for those tricky product names.
Before you know it, your interface will feel less like a web page and more like a conversation partner.

Azure IoT Operations 2603 is now available: Powering the next era of Physical AI
Industrial AI is entering a new phase. For years, AI innovation has largely lived in dashboards, analytics, and digital decision support. Today, that intelligence is moving into the real world, onto factory floors, oil fields, and production lines, where AI systems don't just analyze data, but sense, reason, and act in physical environments. This shift is increasingly described as Physical AI: intelligence that operates reliably where safety, latency, and real-world constraints matter most. With the Azure IoT Operations 2603 (v1.3.38) release, Microsoft is delivering one of its most significant updates to date, strengthening the platform foundation required to build, deploy, and operate Physical AI systems at industrial scale.
Why Physical AI needs a new kind of platform
Physical AI systems are fundamentally different from digital-only AI. They require:
Real-time, low-latency decision-making at the edge
Tight integration across devices, assets, and OT systems
End-to-end observability, health, and lifecycle management
Secure cloud-to-edge control planes with governance built in
Industry leaders and researchers increasingly agree that success in Physical AI depends less on isolated models, and more on software platforms that orchestrate data, assets, actions, and AI workloads across the physical world. Azure IoT Operations was built for exactly this challenge.
What's new in Azure IoT Operations 2603
The 2603 release delivers major advancements across data pipelines, connectivity, reliability, and operational control, enabling customers to move faster from experimentation to production-grade Physical AI.
Cloud-to-edge management actions
Cloud-to-edge management actions enable teams to securely execute control and configuration operations on on-premises assets, such as invoking methods, writing values, or adjusting settings, using Azure Resource Manager and Event Grid–based MQTT messaging.
This capability extends the Azure control plane beyond the cloud, allowing intent, policy, and actions to be delivered reliably to physical systems while remaining decoupled from protocol and device specifics. For Physical AI, this closes the loop between perception and action: insights and decisions derived from models can be translated into governed, auditable changes in the physical world, even when assets operate in distributed or intermittently connected environments. Built-in RBAC, managed identity, and activity logs ensure every action is authorized, traceable, and compliant, preserving safety, accountability, and human oversight as intelligence increasingly moves from observation to autonomous execution at the edge.
No-code dataflow graphs
Azure IoT Operations makes it easier to build real-time data pipelines at the edge without writing custom code. No-code dataflow graphs let teams design visual processing pipelines using built-in transforms, with improved reliability, validation, and observability.
Visual Editor – Build multi-stage data processing systems in the Operations Experience canvas. Drag and connect sources, transforms, and destinations visually. Configure map rules, filter conditions, and window durations inline. Deploy directly from the browser or define in Bicep/YAML for GitOps.
Composable Transforms, Any Order – Chain map, filter, branch, concatenate, and window transforms in any sequence. Branch splits messages down parallel paths based on conditions. Concatenate merges them back. Route messages to different MQTT topics based on content. No fixed pipeline shape.
Expressions, Enrichment, and Aggregation – Unit conversions, math, string operations, regex, conditionals, and last-known-value lookups, all built into the expression language. Enrich messages with external data from a state store. Aggregate high-frequency sensor data over tumbling time windows to compute averages, min/max, and counts.
Open and Extensible – Connect to MQTT, Kafka, and OpenTelemetry (OTel) endpoints with built-in security through Azure Key Vault and managed identities. Need logic beyond what no-code covers? Drop a custom Wasm module (even embed and run ONNX AI/ML models) into the middle of any graph alongside built-in transforms. You're never locked into declarative configuration.
Together, these capabilities allow teams to move from raw telemetry to actionable signals directly at the edge without custom code or fragile glue logic.
Expanded, production-ready connectivity
The MQTT connector enables customers to onboard MQTT devices as assets and route data to downstream workloads using familiar MQTT topics, with the flexibility to support unified namespace (UNS) patterns when desired. By leveraging MQTT's lightweight publish/subscribe model, teams can simplify connectivity and share data across consumers without tight coupling between producers and applications. This is especially important for Physical AI, where intelligent systems must continuously sense state changes in the physical world and react quickly based on a consistent, authoritative operational context rather than fragmented data pipelines. Alongside MQTT, Azure IoT Operations continues to deliver broad, industrial-grade connectivity across OPC UA, ONVIF, Media, REST/HTTP, and other connectors, with improved asset discovery, payload transformation, and lifecycle stability, providing the dependable connectivity layer Physical AI systems rely on to understand and respond to real-world conditions.
Unified health and observability
Physical AI systems must be trustworthy. Azure IoT Operations 2603 introduces unified health status reporting across brokers, dataflows, assets, connectors, and endpoints, using consistent states and surfaced through both Kubernetes and Azure Resource Manager. This enables operators to see—not guess—when systems are ready to act in the physical world.
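The value of consistent health states is that they compose. A sketch of the kind of worst-of rollup an operator dashboard might apply across components; the state names and severity ordering here are assumptions for illustration, not the exact states Azure IoT Operations reports:

```python
# Hypothetical severity ordering: healthy < degraded < unhealthy < unknown.
SEVERITY = {"healthy": 0, "degraded": 1, "unhealthy": 2, "unknown": 3}

def rollup(component_states):
    """Overall instance health = worst state reported by any component."""
    if not component_states:
        return "unknown"
    return max(component_states.values(), key=lambda s: SEVERITY[s])

instance = {
    "broker": "healthy",
    "dataflow/plant-a": "degraded",
    "asset/press-42": "healthy",
    "connector/opcua": "healthy",
}
print(rollup(instance))  # "degraded": one slow dataflow taints the instance
```

Because every component reports from the same small state vocabulary, the same rollup works whether the states are read from Kubernetes or from Azure Resource Manager.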
Optional OPC UA connector deployment
Azure IoT Operations 2603 introduces optional OPC UA connector deployment, reinforcing a design goal to keep deployments as streamlined as possible for scenarios that don't require OPC UA from day one. The OPC UA connector is a discrete, native component of Azure IoT Operations that can be included during initial instance creation or added later as needs evolve, allowing teams to avoid unnecessary footprint and complexity in MQTT-only or non-OPC deployments. This reflects the broader architectural principle behind Azure IoT Operations: a platform built for composability and decomposability, where capabilities are assembled based on scenario requirements rather than assumed defaults, supporting faster onboarding, lower resource consumption, and cleaner production rollouts without limiting future expansion.
Broker reliability and platform hardening
The 2603 release significantly improves broker reliability through graceful upgrades, idempotent replication, persistence correctness, and backpressure isolation—capabilities essential for always-on Physical AI systems operating in production environments.
Physical AI in action: What customers are achieving today
Azure IoT Operations is already powering real-world Physical AI across industries, helping customers move beyond pilots to repeatable, scalable execution.
Procter & Gamble
Consumer goods leader P&G continually looks for ways to drive manufacturing efficiency and improve overall equipment effectiveness—a KPI encompassing availability, performance, and quality that's tracked in P&G facilities around the world. P&G deployed Azure IoT Operations, enabled by Azure Arc, to capture real-time data from equipment at the edge, analyze it in the cloud, and deploy predictive models that enhance manufacturing efficiency and reduce unplanned downtime.
Using Azure IoT Operations and Azure Arc, P&G is extrapolating insights and correlating them across plants to improve efficiency, reduce loss, and continue to drive global manufacturing technology forward. More info.
Husqvarna
Husqvarna Group faced increasing pressure to modernize its fragmented global infrastructure, gain real-time operational insights, and improve efficiency across its supply chain to stay competitive in a rapidly evolving digital and manufacturing landscape. Husqvarna Group implemented a suite of Microsoft Azure solutions—including Azure Arc, Azure IoT Operations, and Azure OpenAI—to unify cloud and on-premises systems, enable real-time data insights, and drive innovation across global manufacturing operations. With Azure, Husqvarna Group achieved 98% faster data deployment and 50% lower infrastructure imaging costs, while improving productivity, reducing downtime, and enabling real-time insights across a growing network of smart, connected factories. More info.
Chevron
With its Facilities and Operations of the Future initiative, Chevron is reimagining the monitoring of its physical operations to support remote and autonomous operations through enhanced capabilities and real-time access to data.
Azure IoT Operations 2603 strengthens Microsoft’s commitment to that future—providing the secure, observable, cloud‑connected edge platform required to build Physical AI systems that are not only intelligent, but dependable. Get started To explore the full Azure IoT Operations 2603 release, review the public documentation and release notes, and start building Physical AI solutions that operate and scale confidently in the real world.333Views2likes0CommentsGrok 4.20 is now available in Microsoft Foundry
Grok 4.20 from xAI is now available in Microsoft Foundry. Microsoft Foundry is designed to help teams move from model exploration to production-grade AI systems with consistency and control. With the addition of Grok 4.20, customers can evaluate advanced reasoning models within a governed environment, apply safety and content policies, and operationalize them with confidence.
About Grok 4.20
Grok 4.20 is a general-purpose large language model in the Grok 4.x family, designed for reasoning-intensive and real-world problem-solving tasks. A key architectural concept is its agentic "swarm" approach, where multiple specialized agents collaborate across workflows combining reasoning, coding, retrieval, and coordination to improve accuracy on complex tasks. This release also reflects a rapid iteration model, with frequent updates that teams can continuously evaluate in Foundry using consistent benchmarks and datasets.
Capabilities
According to xAI, Grok 4.20 introduces a set of capabilities that teams can evaluate directly in Foundry for their specific workloads:
High reliability and reduced hallucination: Grok 4.20 is designed to prioritize truthfulness, favoring grounded responses and explicitly acknowledging uncertainty when needed. Multi-agent verification patterns help reduce errors and improve reliability for high-stakes applications.
Strong instruction following and consistency: The model demonstrates high adherence to prompts, system instructions, and structured workflows, enabling more predictable and controllable outputs across agentic and multi-step tasks.
Multi-agent reasoning (agentic swarms): Specialized agents working in parallel enable stronger reasoning across complex workflows, improving performance in areas like coding, analysis, and decision-making.
Fast performance and cost efficiency: Grok 4.20 is optimized for high-throughput, low-latency workloads, making it suitable for interactive applications and large-scale deployments while maintaining strong cost efficiency.
Tool use and real-time retrieval: The model is designed to integrate with tools and external data sources, enabling real-time search, retrieval, and evidence-grounded responses for dynamic and time-sensitive scenarios.
Coding and technical reasoning: Grok 4.20 performs well on code generation, debugging, and iterative development tasks, making it a strong fit for developer workflows and agent-driven engineering scenarios.
Long-form and creative workflows: With strong instruction adherence and long context, the model supports consistent long-form generation, including document creation, storytelling, and structured content pipelines.
Pricing

| Model | Deployment | Input/1M Tokens | Output/1M Tokens | Availability |
| --- | --- | --- | --- | --- |
| Grok 4.20 | Global Standard | $2.00 | $6.00 | Public Preview |

Using Grok 4.20 in Foundry
Grok 4.20 is available in the Foundry model catalog for teams that want to evaluate and operationalize the model within a standardized pipeline. Foundry provides:
Tools to run repeatable evaluations on your own datasets
The ability to apply scenario-specific safety and content-filtering policies
Managed endpoints with monitoring and governance for production deployments
To get started, select Grok 4.20 in the Foundry model catalog and run an initial evaluation with a small prompt set. You can expand to broader and more complex scenarios, compare results across models, and deploy the configuration that best meets your requirements.
Conclusion
Microsoft Foundry enables teams to explore models like Grok 4.20 within a governed environment designed for evaluation and production deployment. By combining model choice with consistent evaluation, safety controls, and operational tooling, teams can move from experimentation to reliable AI systems faster.
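As a quick sanity check on the pricing table, estimated per-request cost is simple arithmetic over token counts. The token counts below are invented for illustration; the rates are the Global Standard list prices shown above:

```python
INPUT_PER_M = 2.00   # USD per 1M input tokens (Global Standard list price)
OUTPUT_PER_M = 6.00  # USD per 1M output tokens

def cost_usd(input_tokens, output_tokens):
    """Estimated cost of one request at the published list rates."""
    return (input_tokens * INPUT_PER_M
            + output_tokens * OUTPUT_PER_M) / 1_000_000

# e.g. a long-context request: 50k prompt tokens, 2k completion tokens
print(f"${cost_usd(50_000, 2_000):.4f}")  # $0.1120
```

Multiplying the per-request estimate by expected daily volume gives a first-order budget figure to compare against evaluation results from other catalog models.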
Try Grok 4.20 in Foundry Models today.

Advancing to Agentic AI with Azure NetApp Files VS Code Extension v1.2.0
The Azure NetApp Files VS Code Extension v1.2.0 introduces a major leap toward agentic, AI-informed cloud operations with the debut of the autonomous Volume Scanner. Moving beyond traditional assistive AI, this release enables intelligent infrastructure analysis that can detect configuration risks, recommend remediations, and execute approved changes under user governance. Complemented by an expanded natural language interface, developers can now manage, optimize, and troubleshoot Azure NetApp Files resources through conversational commands, from performance monitoring to cross-region replication, backup orchestration, and ARM template generation. Version 1.2.0 establishes the foundation for a multi-agent system built to reduce operational toil and accelerate a shift toward self-managing enterprise storage in the cloud.
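The scan, recommend, approve, execute pattern described for the Volume Scanner can be sketched generically. The checks, fields, and proposed fix below are invented for illustration and do not reflect the extension's actual rules or data model:

```python
# Generic sketch of an approval-gated remediation loop, in the spirit of
# the Volume Scanner described above. Checks and fixes are hypothetical.

def scan(volumes):
    """Return (volume, finding, proposed_fix) for each detected risk."""
    findings = []
    for v in volumes:
        if v["protocol"] == "NFSv3" and v["export_policy"] == "0.0.0.0/0":
            findings.append((v, "export open to all hosts",
                             {"export_policy": "10.0.0.0/24"}))
    return findings

def remediate(findings, approve):
    """Apply a proposed fix only when the approval callback says yes."""
    applied = []
    for volume, finding, fix in findings:
        if approve(volume, finding):
            volume.update(fix)  # execute the approved change
            applied.append(volume["name"])
    return applied

vols = [{"name": "vol1", "protocol": "NFSv3", "export_policy": "0.0.0.0/0"},
        {"name": "vol2", "protocol": "NFSv3", "export_policy": "10.0.0.0/24"}]

# Governance stays with the user: nothing changes without explicit approval.
print(remediate(scan(vols), approve=lambda v, f: True))  # ['vol1']
```

The essential design choice is that detection and execution are separated by an approval gate, so the same scanner can run continuously while changes remain under human control.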