updates
58 TopicsBuilding AI Apps with the Foundry Local C# SDK
What Is Foundry Local? Foundry Local is a lightweight runtime designed to run AI models directly on user devices. It supports a wide range of hardware (CPU, GPU, NPU) and provides a consistent developer experience across platforms. The SDKs are available in multiple languages, including Python, JavaScript, Rust, and now C#. Why a C# SDK? The C# SDK brings Foundry Local into the heart of the .NET ecosystem. It allows developers to: Download and manage models locally. Run inference using OpenAI-compatible APIs. Integrate seamlessly with existing .NET applications. This means you can build intelligent apps that run offline, reduce latency, and maintain data privacy—all without sacrificing developer productivity. Bootstrap Process: How the SDK Gets You Started One of the most developer-friendly aspects of the C# SDK is its automatic bootstrap process. Here's what happens under the hood when you initialise the SDK: Service Discovery and Startup The SDK automatically locates the Foundry Local installation on the device and starts the inference service if it's not already running. Model Download and Caching If the specified model isn't already cached locally, the SDK will download the most performant model variant (e.g. GPU, CPU, NPU) for the end user's hardware from the Foundry model catalog. This ensures you're always working with the latest optimised version. Model Loading into Inference Service Once downloaded (or retrieved from cache), the model is loaded into the Foundry Local inference engine, ready to serve requests. This streamlined process means developers can go from zero to inference with just a few lines of code—no manual setup or configuration required. Leverage Your Existing AI Stack One of the most exciting aspects of the Foundry Local C# SDK is its compatibility with popular AI tools such as: OpenAI SDK - Foundry local provides an OpenAI compliant chat completions (and embedding) API meaning. If you’re already using `OpenAI` chat completions API, you can reuse your existing code with minimal changes. Semantic Kernel - Foundry Local also integrates well with Semantic Kernel, Microsoft’s open-source framework for building AI agents. You can use Foundry Local models as plugins or endpoints within Semantic Kernel workflows—enabling advanced capabilities like memory, planning, and tool calling. Quick Start Example Follow these three steps: 1. Create a new project Create a new C# project and navigate to it: dotnet new console -n hello-foundry-local cd hello-foundry-local 2. Install NuGet packages Install the following NuGet packages into your project: dotnet add package Microsoft.AI.Foundry.Local --version 0.1.0 dotnet add package OpenAI --version 2.2.0-beta.4 3. Use the OpenAI SDK with Foundry Local The following example demonstrates how to use the OpenAI SDK with Foundry Local. The code initializes the Foundry Local service, loads a model, and generates a response using the OpenAI SDK. Copy-and-paste the following code into a C# file named Program.cs: using Microsoft.AI.Foundry.Local; using OpenAI; using OpenAI.Chat; using System.ClientModel; using System.Diagnostics.Metrics; var alias = "phi-3.5-mini"; var manager = await FoundryLocalManager.StartModelAsync(aliasOrModelId: alias); var model = await manager.GetModelInfoAsync(aliasOrModelId: alias); ApiKeyCredential key = new ApiKeyCredential(manager.ApiKey); OpenAIClient client = new OpenAIClient(key, new OpenAIClientOptions { Endpoint = manager.Endpoint }); var chatClient = client.GetChatClient(model?.ModelId); var completionUpdates = chatClient.CompleteChatStreaming("Why is the sky blue'"); Console.Write($"[ASSISTANT]: "); foreach (var completionUpdate in completionUpdates) { if (completionUpdate.ContentUpdate.Count > 0) { Console.Write(completionUpdate.ContentUpdate[0].Text); } } Run the code using the following command: dotnet run Final thoughts The Foundry Local C# SDK empowers developers to build intelligent, privacy-preserving applications that run anywhere. Whether you're working on desktop, mobile, or embedded systems, this SDK offers a robust and flexible way to bring AI closer to your users. Ready to get started? Dive into the official documentation: Getting started guide C# Reference documentation You can also make contributions to the C# SDK by creating a PR on GitHub: Foundry Local on GitHub302Views0likes0CommentsAnnouncing a new Azure AI Translator API (Public Preview)
Microsoft has launched the Azure AI Translator API (Public Preview), offering flexible translation options using either neural machine translation (NMT) or generative AI models like GPT-4o. The API supports tone, gender, and adaptive custom translation, allowing enterprises to tailor output for real-time or human-reviewed workflows. Customers can mix models in a single request and authenticate via resource key or Entra ID. LLM features require deployment in Azure AI Foundry. Pricing is based on characters (NMT) or tokens (LLMs).679Views0likes0CommentsThe Future of AI: Vibe Code with Adaptive Custom Translation
This blog explores how vibe coding—a conversational, flow-based development approach—was used to build the AdaptCT playground in Azure AI Foundry. It walks through setting up a productive coding environment with GitHub Copilot in Visual Studio Code, configuring the Copilot agent, and building a translation playground using Adaptive Custom Translation (AdaptCT). The post includes real-world code examples, architectural insights, and advanced UI patterns. It also highlights how AdaptCT fine-tunes LLM outputs using domain-specific reference sentence pairs, enabling more accurate and context-aware translations. The blog concludes with best practices for vibe coding teams and a forward-looking view of AI-augmented development paradigms.427Views0likes0CommentsAnnouncing gpt-realtime on Azure AI Foundry:
We are thrilled to announce that we are releasing today the general availability of our latest advancement in speech-to-speech technology: gpt-realtime. This new model represents a significant leap forward in our commitment to providing advanced and reliable speech-to-speech solutions. gpt-realtime is a new S2S (speech-to-speech) model with improved instruction following, designed to merge all of our speech-to-speech improvements into a single, cohesive model. This model is now available in the Real-time API, offering enhanced voice naturalness, higher audio quality, and improved function calling capabilities. Key Features New, natural, expressive voices: New voice options (Marin and Cedar) that bring a new level of naturalness and clarity to speech synthesis. Improved Instruction Following: Enhanced capabilities to follow instructions more accurately and reliably. Enhanced Voice Naturalness: More lifelike and expressive voice output. Higher Audio Quality: Superior audio quality for a better user experience. Improved Function Calling: Enhanced ability to call custom code defined by developers. Image Input Support: Add images to context and discuss them via voice—no video required. Check out the model card here: gpt-realtime Pricing Pricing for gpt-realtime is 20% lower compared to the previous gpt-4o-realtime preview: Pricing is based on usage per 1 million tokens. Below is the breakdown: Getting Started gpt-realtime is available on Azure AI Foundry via Azure Models direct from Azure today. We are excited to see how developers and users will leverage these new capabilities to create innovative and impactful solutions. Check out the model on Azure AI Foundry and see detailed documentation in Microsoft Learn docs.3.5KViews1like0CommentsAnnouncing the Text PII August preview model release in Azure AI language
Azure AI Language is excited to announce a new preview model release for the PII (Personally Identifiable Information) redaction service, which includes support for more entities and languages, addressing customer-sourced scenarios and international use cases. What’s New | Updated Model 2025-08-01-preview Tier 1 language support for DateOfBirth entity: expanding upon the original English-only support earlier this year, we’ve added support for all Tier 1 languages: French, German, Italian, Spanish, Portuguese, Brazilian Portuguese, and Dutch New entity support: SortCode - a financial code used in the UK and Ireland to identify the specific bank and branch where an account is held. Currently we support this in only English. LicensePlateNumber - the standard alphanumeric code for vehicle identification. Note that our current scope does not support a license plate that contains only letters. Currently we support this in only English. AI quality improvements for financial entities, reducing false positives/negatives These updates respond directly to customer feedback and address gaps in entity coverage and language support. The broader language support enables global deployments and the new entity types allow for more comprehensive data extraction for our customers. This ensures an improved service quality for financial, criminal justice, and many other regulatory use cases, enabling more accurate and reliable service for our customers. Get started A more detailed tutorial and overview of the service feature can be found in our public docs. Learn more about these releases and several others enhancing our Azure AI Language offerings on our What’s new page. Explore Azure AI Language and its various capabilities Access full pricing details on the Language Pricing page Find the list of sensitive PII entities supported Try out Azure AI Foundry for a code-free experience We are looking forward to continuously improving our product offerings and features to meet customer needs and are keen to hear any comments and feedback.298Views1like0CommentsHubs and Workspaces on Azure Machine Learning – General Availability
We are pleased to announce that hubs and workspaces is now generally available on Azure machine learning allowing users to use hub for team’s collaboration environment for machine learning applications. Azure Hubs and Workspaces provide a centralized platform capability for Azure Machine Learning. This feature enables developers to innovate faster by creating project workspaces and accessing shared company resources without needing repeated assistance from IT administrators. Quick Model Building and Experimentation without IT bottleneck Hubs and Workspaces in Azure Machine Learning provide a centralized solution for managing machine learning resources. Hubs act as a central resource management construct that oversees security, connectivity, computing resources, and team quotas. Once created, they allow developers to create individual workspaces to manage their tasks while adhering to IT setup guidelines. Key Benefits Centralized Management: Hubs allow for centralized settings such as connectivity, compute resources, and security, making it easier for IT admins to manage resources and monitor costs. Cost Efficiency: Utilizing a hub workspace for sharing and reusing configurations enhances cost efficiency when deploying Azure Machine Learning on a large scale. There is a cost associated with setting separate firewall per workspace which scales up as the number of workspaces go up. With hubs, only one firewall is needed which extends across workspaces saving cost. Resource Management: Hubs provide a single pool of compute across workspaces on a user level, eliminating repetitive compute setup and duplicate management steps. This ensures higher utilization of available capacity and fair share of compute resources. Improved Security and Compliance: Hubs act as security boundaries, ensuring that different teams can work in isolated environments without compromising security. Simplified Workspace Creation: Hubs allow for the creation of "light-weight" workspaces in a single step by an ML professional. Enhanced Collaboration: Hubs enable better collaboration among data scientists by providing a centralized platform for managing projects and resource How to get started with Hubs and Projects There are different ways to create hubs for users. You can create hubs via Azure portal, with Azure Resource Manager templates, or via Azure Machine Learning SDK/CLI. Hub properties like networking, monitoring, encryption, identity can be customized while creating a hub and can be set depending on org’s requirements. Workspaces associated with a hub will share hub’s security, connectivity and compute resources. While creating hubs via ML Studio is not supported currently, once hub is created users can create workspaces which get shared access to the company resources made available by the administrator including compute, security and connections. Besides ML Studio, workspaces can be created via Using Azure SDK, Using automation templates, Using Azure CLI. Secure access for Azure resources For accessing data sources outside hubs, connections can help make data available to Azure machine learning. External sources like Snowflake DB, Amazon S3 and Azure SQL DB can be connected to AML resources. Users can also set access permissions to the azure resources with Role based access controls. Besides default built-in roles, users can also create custom roles for more granular access. To conclude, the General Availability of Azure Machine Learning Hubs and Workspaces marks a significant milestone in our commitment to providing scalable, secure, and efficient machine learning solutions. We look forward to seeing how our customers leverage this new feature to drive innovation and achieve their business goals. For more information on hubs and workspaces in Azure machine learning, please refer the following links. What are Azure hubs and workspaces - AML Manage AML hub workspaces in the portal Create a hub using AML SDK and CLI626Views0likes0CommentsThe Future of AI: Computer Use Agents Have Arrived
Discover the groundbreaking advancements in AI with Computer Use Agents (CUAs). In this blog, Marco Casalaina shares how to use the Responses API from Azure OpenAI Service, showcasing how CUAs can launch apps, navigate websites, and reason through tasks. Learn how CUAs utilize multimodal models for computer vision and AI frameworks to enhance automation. Explore the differences between CUAs and traditional Robotic Process Automation (RPA), and understand how CUAs can complement RPA systems. Dive into the future of automation and see how CUAs are set to revolutionize the way we interact with technology.10KViews6likes0CommentsAnnouncing DeepSeek-V3 on Azure AI Foundry and GitHub
We are pleased to announce the availability of DeepSeek-V3 on Azure AI Foundry model catalog with token-based billing. This latest iteration is part of our commitment to enable powerful, efficient, and accessible AI solutions through the breadth and diversity of choice in the model catalog.8.7KViews3likes0CommentsNew controls for model governance and secure access to on-premises or custom VNET resources
Learn how to create an allowed model list for the Azure AI model catalog, plus a new way to access on-premises and custom VNET resources from your managed VNET for your training, fine-tuning, and inferencing scenarios.3.3KViews3likes1CommentDiscover the Azure AI Training Profiler: Transforming Large-Scale AI Jobs
Meet the AI Training Profiler Large-scale AI training can be complicated, especially in distributed environments like healthcare, finance, and e-commerce, where the need for accuracy, speed, and massive data processing is crucial. Efficiently managing hardware resources, ensuring smooth parallelism, and minimizing bottlenecks are crucial for optimal performance. The AI Training Profiler powered by PyTorch Profiler inAzure Machine Learning is here to help! By giving you detailed visibility into hardware and software metrics, this tool helps you spot inefficiencies, make the best use of resources, and scale your training workflows like a pro. Why Choose the AI Training Profiler? Running large AI training jobs on distributed infrastructure is inherently complex, and inefficiencies can quickly escalate into increased costs and delays in deploying models. The AI Training Profiler addresses these issues by providing a comprehensive breakdown of compute resource usage throughout the training lifecycle. This enables users to fine-tune and streamline their AI workflows, yielding several key benefits: Improved Performance: Identify bottlenecks and inefficiencies, such as slow data loading or underutilized GPUs, to enhance training throughput. Reduced Costs: Detect idle or underused resources, thereby minimizing compute time and hardware expenses. Faster Debugging: Leverage real-time monitoring and intuitive visualizations to troubleshoot performance issues swiftly. Key Features of the AI Training Profiler GPU Core and Tensor Core Utilization The profiler meticulously tracks GPU kernel execution, reporting utilization metrics such as time spent on forward and backward passes, tensor core operations, and other computation-heavy tasks. This detailed breakdown enables users to pinpoint under-utilized resources and optimize kernel execution patterns. Memory Profiling Memory Allocation and Peak Usage: Monitors GPU memory usage throughout the training process, offering insights into underutilized or over-allocated memory. CUDA Memory Footprint: Visualizes memory consumption during forward/backward propagation and optimizer steps to identify bottlenecks or fragmentation. Page Fault and Out-of-Memory Events: Detects critical events that could slow training or cause job failures due to insufficient memory allocation. Kernel Execution Metrics Kernel Execution Time: Provides per-kernel timing, breaking down execution into compute-bound and memory-bound operations, allowing users to discern whether performance bottlenecks stem from inefficient kernel launches or memory access patterns. Instruction-level Performance: Measures IPC (Instructions Per Cycle) to understand kernel-level performance and identify inefficient operations. Distributed Training Communication Primitives: Captures inter-GPU and inter-node communication patterns, focusing on the performance of primitives like AllReduce, AllGather, and Broadcast in multi-GPU training. This helps users identify communication bottlenecks such as imbalanced data distribution or excessive communication overhead. Synchronization Events: Measures the time spent on synchronization barriers between GPUs, highlighting where parallel execution is slowed by synchronization. Getting Started with the Profiling Process Using the AI Training Profiler is a breeze! Activate it when you launch a job, either through the CLI or our platform’s user-friendly interface. Here are the three environment variables you need to set: Enable/Disable the Profiler: ENABLE_AZUREML_TRAINING_PROFILER: 'true' Configure Trace Capture Duration: AZUREML_PROFILER_RUN_DURATION_MILLISECOND: '50000' Delay the Start of Trace Capturing: AZUREML_PROFILER_WAIT_DURATION_SECOND: '1200' Once your training job is running, the profiler collects metrics and stores them centrally. After the run, this data is analyzed to give you visual insights into critical metrics like kernel execution times. Use Cases The AI Training Profiler is a game-changer for fine-tuning large language models and other extensive architectures. By ensuring efficient GPU utilization and minimizing distributed training costs, this tool helps organizations get the most out of their infrastructure, whether they're working on cutting-edge models or refining existing workflows. In conclusion, the AI Training Profiler is a must-have for teams running large-scale AI training jobs. It offers the visibility and control needed to optimize resource utilization, reduce costs, and accelerate time to results. Embrace the future of AI training optimization with the AI Training Profiler and unlock the full potential of your AI endeavors. How to Get Started? The feature is available as a preview, you can just set up the environment variables and start using the profiler! Stay tuned for future repository with many samples that you can use as well!814Views2likes0Comments