Shaping tomorrow: Developing and deploying generative AI apps responsibly with Azure AI Studio
Published May 21 2024 08:30 AM

In the last twelve months, from casual conversations at home to in-depth debates on tech forums, the buzz around generative AI has been pervasive. It is changing how companies think about their products, how they develop software, and how they use the technology themselves to improve productivity.


In 2023, companies across the globe took time to understand the technology’s capabilities and applicability. However, a parallel realization surfaced: unconstrained, generative AI makes errors. It might generate non-existent URLs, fabricate data, issue unwarranted apologies, or even revise the Seahawks' Super Bowl 49 outcome (we are good with the last one!).  These quirks are inherent to the 'generative' aspect of GenAI.


With this, companies embarked on a second journey to drive quality. They tackled the challenge of mitigating unsuitable AI responses and embraced new paradigms in continuous integration/continuous delivery (CI/CD) and regression testing. They also intensified the monitoring of their AI solutions in production environments. It's a journey akin to navigating the complexities of traditional software development but compounded by the unpredictable nature of generative AI.


In 2024, bugs include inconsistencies in groundedness, fluency, and output length, as well as unpredictable latency, all of which require their own regression test suites before code is merged into the main branch. Moreover, ensuring that generated text and media adhere to responsible AI requirements is critical, with rigorous testing needed to prevent hate speech, self-harm content, inappropriate sexual content, and factual inaccuracies. Additional measures are also required to repel attempts to manipulate or "jailbreak" the system. Once these solutions are deployed, continuous monitoring is essential, both for individual requests and overall system performance, to guard against any drift over time.
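As a rough illustration of what a regression gate for these new kinds of bugs might look like, here is a minimal sketch that checks output length and mean latency for a batch of test prompts. The function names, the word-count measure, and the thresholds are all hypothetical choices for this example, and `generate` stands in for whatever function calls your model.

```python
import time
from statistics import mean


def run_case(generate, prompt):
    """Call the app once and record its output, latency, and word count."""
    start = time.perf_counter()
    output = generate(prompt)
    latency_s = time.perf_counter() - start
    return {
        "prompt": prompt,
        "output": output,
        "latency_s": latency_s,
        "word_count": len(output.split()),
    }


def check_regressions(results, max_words, max_mean_latency_s):
    """Return a list of human-readable failures; an empty list means pass."""
    failures = []
    for result in results:
        if result["word_count"] > max_words:
            failures.append(
                f"output too long for prompt {result['prompt']!r}: "
                f"{result['word_count']} words > {max_words}"
            )
    if results and mean(r["latency_s"] for r in results) > max_mean_latency_s:
        failures.append("mean latency exceeds the configured budget")
    return failures
```

A CI job could run this against a fixed prompt set and fail the build whenever the returned list is non-empty.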


Creating responsible, production-ready custom copilots that support advanced use cases means combining multiple interoperating APIs with models, prompts, and grounding data, then fine-tuning, testing, and deploying the result at scale. To accomplish this, developers need the right tools.





Announcing Azure AI Studio 

At Microsoft, we're thrilled to announce Azure AI Studio, now generally available, as your go-to platform for developing and deploying generative AI applications securely and safely. No matter your generative AI use case, Azure AI Studio accelerates the entire generative AI development lifecycle, empowering developers to build and shape the future with AI. 




Azure AI Studio is a key component of Microsoft's copilot platform. It is a pro-code platform offering capabilities to fully customize and configure generative AI applications with Azure-grade security, privacy, and compliance. Flexible and integrated visual and code-first tooling and pre-built quick-start templates streamline and accelerate copilot creation using Azure AI services and tools, with full control over infrastructure. 




It simplifies the transition from concept to production with easy setup, management, and API support, while also helping developers address safety and quality issues. The platform includes Azure AI services like Azure OpenAI Service and Azure AI Search, along with familiar tooling from Azure Machine Learning, such as prompt flow, which offers guided experiences for quick prototyping. It supports code-first SDKs and CLIs, integrated with the Azure Developer CLI (azd) and AI Toolkit for Visual Studio Code, to provide the needed scalability as demand grows.



API and Model Choice 

Discover the best AI services and models for your use case  

Whatever the use case, developers can build intelligent multimodal, multi-lingual copilots with out-of-the-box and customizable models and APIs, like language, speech, content safety, and more.  


With the model catalog, you will find over 1,600 models from providers like Meta, Mistral, Microsoft, and OpenAI, including GPT-4 Turbo with Vision and Microsoft’s small language model (SLM) Phi-3, and new models from Core42 and Nixtla. Models from Bria AI, Gretel, NTT DATA, Stability AI, AI21, and Cohere Rerank are coming soon. Models curated by Azure AI are the most widely deployed models, packaged and optimized to work on the Azure AI platform. At the same time, the Hugging Face collection provides the breadth of many hundreds of models, which lets users consume the exact model that is best for them. And there are so many more to choose from!



Azure AI Studio's model benchmark dashboard allows developers to compare the performance of models across various industry-standard datasets to understand where specific models perform best. Benchmarks provide model evaluations using metrics such as accuracy, coherence, fluency, and GPT similarity. Users can view benchmark results in dashboard graph and list formats, enabling side-by-side model comparisons. 




The model catalog offers two ways to deploy models: Models as a Service (MaaS) and Models as a Platform (MaaP). MaaS provides pay-as-you-go per-token pricing, while MaaP offers models deployed on dedicated virtual machines (VMs), billed as VMs per-hour. 
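To make the trade-off between the two billing models concrete, here is a minimal sketch of a break-even calculation. The function names are hypothetical, and any prices you pass in are your own inputs, not Azure rates; the 730-hour default is simply an approximate month.

```python
def maas_monthly_cost(tokens_per_month, price_per_1k_tokens):
    """Pay-as-you-go: cost scales with the number of tokens processed."""
    return tokens_per_month / 1000 * price_per_1k_tokens


def maap_monthly_cost(vm_count, price_per_vm_hour, hours=730.0):
    """Dedicated VMs: cost scales with provisioned VM-hours, not usage."""
    return vm_count * price_per_vm_hour * hours


def break_even_tokens(vm_count, price_per_vm_hour, price_per_1k_tokens, hours=730.0):
    """Monthly token volume at which both options cost the same."""
    vm_cost = maap_monthly_cost(vm_count, price_per_vm_hour, hours)
    return vm_cost / price_per_1k_tokens * 1000
```

Below the break-even volume, per-token billing is cheaper under these assumptions; above it, dedicated VMs are. Real decisions would also weigh latency, throughput limits, and which models each option supports.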


Azure AI Studio also scans open models for security threats and vulnerabilities before onboarding them to the Azure AI collection, providing validations within model cards so developers can deploy models with confidence. 




Complete AI Toolchain 

Azure AI Studio offers collaborative and comprehensive tooling to support the development lifecycle and differentiate your apps. 


Getting set up with your hub and project 

Azure AI Studio accelerates team-based AI development with a central hub for sharing resources across projects, helping remove IT bottlenecks. Developers can also kick off their projects through starter scripts or by using the studio UI. Once executed, the scripts generate an .env file that includes references to the connected resources, as well as the needed access keys. 
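For readers who have not worked with .env files, here is a minimal, standard-library-only sketch of reading one into the process environment. It assumes the file uses simple KEY=VALUE lines; the exact variable names the starter scripts emit are not shown in this post, so none are assumed here.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def apply_env(values, overwrite=False):
    """Copy parsed values into os.environ without clobbering existing ones."""
    for key, value in values.items():
        if overwrite or key not in os.environ:
            os.environ[key] = value
```

Because the file holds access keys, it is usually kept out of source control.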




Each hub can be connected to any number of projects, which inherit the hub's security configurations. Hubs and projects are security-aware entities.  Administrators can be assigned within the hub to manage AI resources and control access for project members. Azure AI Studio's connection framework is designed to authenticate and integrate a diverse range of resources from Microsoft's ecosystem and external providers.




Experiment with prompts in the playground  

Developers are equipped with a suite of dev-light playgrounds within AI Studio, encompassing areas like chatbots, assistants, image generation, and text completion. This flexible sandbox allows developers to experiment with various models, refine system prompts through iterative testing, and customize models – securely using their own datasets for tailored results. Developers can also experiment with safety system messages. 




Data retrieval with Azure AI Search 

Azure AI Search is natively supported in Azure AI Studio for retrieval augmented generation (RAG) scenarios, enabling developers to utilize data retrieval methods to ground responses based on secure, customer-specific data. The platform allows for easy integration with numerous data sources, including OneLake in Microsoft Fabric, Azure Blob Storage, and Azure Files. This integration of connections allows users to develop more intelligent and context-aware copilots because data assets can be integrated within the model workflow. 
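To show the shape of the RAG pattern, here is a toy sketch that retrieves passages with simple word overlap and assembles a grounded prompt. This is not the Azure AI Search API: the in-memory list, the scoring function, and the prompt wording are all stand-ins chosen for illustration, and a real application would swap `retrieve` for a call to a search index.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query, document):
    """Count the tokens shared between the query and the document."""
    query_counts = Counter(tokenize(query))
    doc_counts = Counter(tokenize(document))
    return sum(min(query_counts[t], doc_counts[t]) for t in query_counts)


def retrieve(query, documents, top_k=2):
    """Return up to top_k documents with a non-zero overlap, best first."""
    ranked = sorted(documents, key=lambda d: overlap_score(query, d), reverse=True)
    return [d for d in ranked[:top_k] if overlap_score(query, d) > 0]


def build_grounded_prompt(question, passages):
    """Assemble a prompt that restricts the model to the retrieved sources."""
    sources = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the sources below. "
        "If they do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )
```

The prompt returned by `build_grounded_prompt` is what would be sent to the model, so its answers stay tied to the retrieved, customer-specific data.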





When developing generative AI applications, RAG should be used for tasks that require external knowledge, while fine-tuning is appropriate for adapting pre-trained models to tasks with specific labeled data. Supervised fine-tuning is crucial for customizing models, because specialized tasks often need the reasoning of a broad model applied to a relatively narrow scope. Within Azure AI Studio, users can fine-tune models such as Babbage, Davinci, GPT-35-Turbo, and GPT-4, along with the Llama 3 and Phi-3 families.




Agent-based orchestration 

Developers are increasingly driving sophisticated real-world application development as they recognize the potential of LLMs and SLMs. They're leveraging agent systems such as the Azure OpenAI Service Assistants API, function-based applications, and the AutoGen framework to solve more complex, open-ended problem statements. As one might expect, this shift brings new challenges, particularly due to the open-ended nature of the orchestration applied.




Tracing and debugging  

Tracing is essential for understanding how your copilot works, especially in complex workflows where traditional IDE (Integrated Development Environment) breakpoints might not be effective. Many operations happen asynchronously or involve streaming data, causing the same line of code to execute multiple times for a single user query. Azure AI Studio’s tracing feature helps developers debug these scenarios through the prompt flow SDK with simple source code instrumentation. Tracing helps track latency issues, LLM errors, token usage, function calls, and dependency misalignments. 


For a code-focused experience, users can initiate a local playground using the prompt flow SDK. This allows for comprehensive unit testing while logging traces seamlessly to Azure AI Studio in the cloud or to a local repository. The service can be started from the command line or will automatically start when a trace begins. 




Tracing can then be done with a simple decorator. Model calls are captured automatically.  
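To illustrate the idea of decorator-based tracing, here is a standard-library-only sketch. It is not the prompt flow SDK's implementation: the `trace` decorator, the in-memory `TRACE_LOG` sink, and the two sample functions are hypothetical, and a real tracer would export spans rather than append to a list.

```python
import functools
import time

TRACE_LOG = []  # hypothetical in-memory sink; a real tracer would export spans


def trace(func):
    """Record name, inputs, output or error, and duration for each call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        record = {"name": func.__name__, "args": args, "kwargs": kwargs}
        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
            record["output"] = result
            return result
        except Exception as exc:
            record["error"] = repr(exc)
            raise
        finally:
            # Runs on success and on failure, so every call leaves a record.
            record["duration_s"] = time.perf_counter() - start
            TRACE_LOG.append(record)
    return wrapper


@trace
def lookup_order(order_id):
    return f"Order {order_id} has shipped"


@trace
def answer(question, order_id):
    return f"{lookup_order(order_id)}. (asked: {question})"
```

Calling `answer(...)` leaves one record per decorated function, with the inner call logged first because it finishes first. That is the kind of step-by-step view that makes asynchronous or repeated execution easier to follow than breakpoints.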




Users can initiate a local testing environment through their IDE by executing the command 'pf flow test --flow'. This command leverages the prompt flow SDK to create an interactive playground with tracing enabled for each interaction, facilitating interactive testing of their application.  




Tracing captures and details each step of a copilot's request journey, thereby enhancing system health visibility and simplifying the debugging of complex or non-deterministic issues. Leveraging OpenTelemetry, prompt flow tracing integrates with Azure Monitor, allowing streamlined monitoring setup using connection strings for seamless configuration. 






Along with tools for observability while in development and production, Azure AI Studio provides tools to systematically assess the accuracy, quality, and safety of generated outputs. Manual evaluation, that is, manually reviewing and grading an application’s generated outputs, is especially useful for tracking progress on a targeted set of priorities. For example, developers or domain experts might look at how grounded responses are for different app variants and compare the results to inform the next iteration. 


Automated evaluation is useful for measuring an app’s quality and safety at scale, to provide more comprehensive evaluation results. Developers can run automated evaluations using pre-built metrics or customize and build their own metrics for their unique concerns using the studio UI or the prompt flow SDK.


While customers can bring their own test datasets, AI Studio helps address a key blocker for many customers, which is a lack of high-quality adversarial test data to evaluate an application’s outputs for content risks or susceptibility to jailbreak attacks. To test the safety of an application at scale, Azure AI Studio will automatically generate adversarial inputs and role-play attacks on an app to generate a test dataset of prompts and responses for evaluation. Developers can use the final scores and explanations to understand if their application is ready to ship or needs more work to mitigate risks.


Evaluators help developers take customization and scale even further. Users can define an evaluator to assess their own defined attributes, such as a mix of pre-built and custom metrics and corresponding parameters, to be assessed by a GPT model. For example, a retailer that is concerned about a customer service bot exemplifying its brand attributes may design an evaluator to evaluate outputs for groundedness (a pre-built metric) and politeness (a custom metric). Evaluators can be versioned and shared across an organization, so the retailer could opt to run their custom brand evaluator with every automated evaluation for improved consistency across projects. Developers can run evaluators locally and log results in the cloud using the prompt flow SDK or run them as part of an automated evaluation within the Azure AI Studio UI. 
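As a sketch of how an evaluator might bundle several metrics behind one call, consider the retailer example above. Everything here is hypothetical: the class name, the call signature, and especially the two scoring functions, which are crude keyword heuristics standing in for the GPT-graded metrics the post describes.

```python
def politeness(response, context):
    """Toy stand-in for a custom metric: share of courtesy markers present."""
    markers = ("please", "thank", "sorry", "happy to help")
    text = response.lower()
    return sum(marker in text for marker in markers) / len(markers)


def context_overlap(response, context):
    """Toy proxy for groundedness: share of response words found in context."""
    words = response.lower().split()
    context_words = set(context.lower().split())
    if not words:
        return 0.0
    return sum(word in context_words for word in words) / len(words)


class BrandEvaluator:
    """Runs a named set of metrics and returns one score per metric."""

    def __init__(self, metrics, version="1"):
        self.metrics = dict(metrics)
        self.version = version  # lets teams pin and share a known definition

    def __call__(self, *, response, context):
        return {name: metric(response, context) for name, metric in self.metrics.items()}
```

An instance such as `BrandEvaluator({"politeness": politeness, "groundedness_proxy": context_overlap})` can then be applied to every row of a test dataset, which is what makes results comparable across projects.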





Responsible AI Tools & Practices 

Safeguard AI apps with configurable filters and controls 

Once customers deploy their solutions, Azure AI Content Safety protects the application endpoint by running input prompts and output completions through a variety of classification models. Built-in safety metrics are designed to help identify and prevent harmful, biased, ungrounded, and inappropriate content, as well as prompt injection attacks, all of which is critical for maintaining user trust. At Build, we are announcing custom categories so users can create and use custom content filters in addition to the provided default filters.
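The flow described above, classifying both the prompt and the completion and blocking on configurable thresholds, can be sketched as follows. This is a conceptual illustration, not the Azure AI Content Safety API: the category names, the integer severity scale, the default thresholds, and the `classify` and `generate` callables are all assumptions made for the example.

```python
from dataclasses import dataclass, field

REFUSAL = "Sorry, this request can't be completed."


@dataclass
class FilterConfig:
    """Per-category severity at or above which content is blocked."""
    thresholds: dict = field(
        default_factory=lambda: {"hate": 2, "self_harm": 2, "sexual": 2, "violence": 2}
    )


def blocked_categories(severities, config):
    """Return the categories whose severity meets or exceeds the threshold."""
    return sorted(
        category
        for category, limit in config.thresholds.items()
        if severities.get(category, 0) >= limit
    )


def guarded_completion(prompt, generate, classify, config):
    """Screen the input, call the model, then screen the output."""
    if blocked_categories(classify(prompt), config):
        return REFUSAL
    completion = generate(prompt)
    if blocked_categories(classify(completion), config):
        return REFUSAL
    return completion
```

Adding a custom category under this sketch would amount to adding a key to `thresholds` and having the classifier report a severity for it.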



Enterprise-grade production at scale 

Developers can deploy and scale their AI innovations to Azure web apps for use in websites and applications or as containerized models for local deployment – with tools to manage and continuously monitor their solutions for safety, quality, and token consumption. They can also automate workflows and alerts for timely issue resolution. 


Developers maintain agility with resource management across the organization. They can secure managed online endpoints with Microsoft Entra ID, and with Azure, enterprise-grade security, privacy, and compliance are included for governance at scale. 


Custom copilots in production  

Azure AI is driving innovation for over 53,000 customers to date and growing. Customers are delivering multi-modal knowledge mining with enterprise chat and are improving customer interactions and service with advanced data and speech analytics. They are also generating content more efficiently, while supporting enhanced sales and marketing strategies with hyper-personalization. 


Sweco’s GPT  

Sweco, a European architecture and engineering firm, developed SwecoGPT to help its consultants find critical project information, create and analyze documents, and use the time they save to deliver more personalized services to their customers. With Azure AI Studio, the team was able to rapidly develop a proof of concept, highlighting Azure AI's scalability and power.


“With Azure AI Studio, [we were] able to rapidly develop a proof of concept (POC) to show how a SwecoGPT could look, operate, and benefit our consultants and our business as a whole. This just showcases the power and scalability of Azure AI.” - David Hunter, Sweco Head of AI and Automation


“The potential of Azure AI Studio for us—and what we can do with it for our customers—is infinite.” - Shah Muhammad, Sweco AB Head of AI Innovation


Parloa’s Conversational AI platform 

Parloa used Azure AI Studio to create a multilingual AI copilot that streamlines customer service across communication channels.


“We see Azure AI Studio as a powerful new developer platform that helps us develop AI agents for the intelligent contact center platform of the future.” - Ciaran O'Reilly, Parloa Conversational AI Engineering Lead


Vodafone’s SuperTOBI 

Earlier this year, Vodafone announced a 10-year generative AI strategic partnership with Microsoft. The telecommunications provider used Azure AI Studio to develop its SuperTOBI Chatbot, a real-time, hyper-personalized call center experience. The AI agent helps customers pay their bills, solve network issues, and order a new phone if needed. If TOBI is unable to answer a customer’s question, it automatically transfers the customer to a human customer support agent.  


“Part of using new technologies is experimentation and the ability to easily collaborate. With Azure AI Studio, you can interact with other people and with projects through a code-first approach to seamlessly explore, build, test, and deploy, using cutting-edge AI tools and machine learning models.” - Ahmed Elsayed, Vodafone CIO UK and Digital Engineering Director 


H&R Block's AI Tax Assist   

H&R Block is a long-time Azure AI customer. Its newest innovation, AI Tax Assist, is an AI agent that streamlines tax filing.  


“With Azure AI Studio, our devs can code faster, so they had time to ‘experiment’ to fine-tune features like enabling individuals to ask as many questions as needed conversationally and the ability to revisit previous conversation threads. It’s an approach we’re continuing—to push innovation and deliver the best experiences.” - Aditya Thadani, H&R Block Vice President Artificial Intelligence Platforms 


Customer innovation with Azure AI Studio not only highlights the platform’s robust capabilities, but also demonstrates its role in driving significant time savings and efficiency improvements across industries. There are many additional customer stories featured on 


Your journey into the next generation of AI starts now 

Azure AI Studio is at the forefront of reshaping how we approach AI application development, presenting a thoughtful and powerful platform that aligns innovation with responsibility. With the support of Microsoft's technology, development teams have the tools to confidently explore the possibilities of generative AI and deploy production-ready copilots. So why wait? Dive in and experience the cutting-edge capabilities of Azure AI Studio for yourself – start building, testing, and deploying with confidence and ease today!  


Get started with Azure AI Studio 

Last update: May 29 2024 04:04 PM