GenAI for Scam & Fraud Detection
Chief Technology Officer and Microsoft Regional Director Dr David Goad recently highlighted the transformative potential of generative AI in combating financial scams during a live stream on the Microsoft Reactor YouTube channel. With a wealth of experience in artificial intelligence and machine learning, David shared his expertise and insights, providing practical examples of how generative AI and Azure AI Foundry can be applied to scam and fraud detection in the banking sector.

The rise of digital banking and fraud
David Goad began by setting the scene, discussing the significant shift towards digital banking over the past decade. This transition has brought numerous benefits, including convenience and access to information, but it has also led to a surge in financial scams and fraud. The banking industry, particularly in Australia, has seen billions of dollars lost annually to these fraudulent activities.

Key industry trends
David highlighted several key trends and challenges in managing fraud and scams within the banking sector. He pointed out that phishing remains one of the most prevalent methods used by fraudsters, with targeted spear phishing and whale phishing aimed at senior individuals. This demographic is often targeted, leading to significant financial losses and stress. Phone calls, text messages, and emails are common delivery methods for scams.

Opportunities for improvement with generative AI
David Goad emphasized the opportunities to improve the scam detection process through the use of generative AI. He explained that generative AI can enhance various aspects of fraud detection, including identifying fraudulent emails and texts, summarizing customer complaints, categorizing complaints for efficient routing, and evaluating customer sentiment. By leveraging generative AI, banks can improve the accuracy and efficiency of their fraud detection processes, ultimately reducing customer frustration and enhancing service levels.

Demonstrating generative AI for phishing detection
To illustrate the practical application of generative AI, David demonstrated Azure AI Foundry and Azure OpenAI Studio. He showcased how generative AI can be trained to identify phishing emails by fine-tuning a model with a dataset of classified emails. The model David presented was able to classify emails as phishing or non-phishing and provide explanations for its decisions, demonstrating the potential for generative AI to streamline the fraud detection process.

Learn more
David Goad's presentation emphasized the transformative potential of generative AI in the banking sector and the opportunities it offers to improve customer experiences. For those interested in learning more about this topic, David has written a LinkedIn article that delves deeper into the use of generative AI in fraud detection. To watch the full recording of David Goad's insightful presentation and technical demonstrations, visit the Microsoft Reactor YouTube channel.
UBS unlocks advanced AI techniques with PostgreSQL on Azure

This blog was authored by Jay Yang, Executive Director, and Orhun Oezbek, GenAI Architect, UBS RiskLab.

UBS Group AG is a multinational investment bank and world-leading asset manager that manages $5.7 trillion in assets across 15 different markets. We continue to evolve our tools to suit the needs of data scientists and to integrate the use of AI. Our UBS RiskLab data science platform helps over 1,200 UBS data scientists expedite the development and deployment of their analytics and AI solutions, which support functions such as risk, compliance, and finance, as well as front-office divisions such as investment banking and wealth management.

RiskLab and UBS GOTO (Group Operations and Technology Office) have a long-term AI strategy to provide a scalable and easy-to-use AI platform. This strategy aims to remove friction and pain points for users such as developers and data scientists by introducing DevOps automation, centralized governance, and AI service simplification. These efforts have significantly democratized AI development for our business users.

This blog walks through how we created two RiskLab products using Azure services. We also explain how we're using Azure Database for PostgreSQL to power advanced Retrieval-Augmented Generation (RAG) techniques—such as new vector search algorithms, parameter tuning, hybrid search, semantic ranking, and a GraphRAG approach—to further the work of our financial generative AI use cases.

The RiskLab AI Common Ecosystem (AICE) provides fully governed and simplified generative AI platform services, including:
- Governed production data access for AI development
- Managed large language model (LLM) endpoint access control
- Tenanted RAG environments
- Enhanced document insight AI processing
- Streamlined AI agent standardization, development, registration, and deployment solutions
- End-to-end machine learning (ML) model continuous integration, training, deployment, and monitoring processes

The AICE Vector Embedding Governance Application (VEGA) is a fully governed, multi-tenant vector store built on top of Azure Database for PostgreSQL that provides self-service vector store lifecycle management and advanced indexing and retrieval techniques for financial RAG use cases.

A focus on best practices like AIOps and MLOps
As generative AI gained traction in 2023, we noticed the need for a platform that simplified the process for our data scientists to build, test, and deploy generative AI applications. In this age of AI, the focus should be on data science best practices—GenAIOps and MLOps. Most of our data scientists aren't fully trained on MLOps, GenAIOps, and setting up complex pipelines, so AICE was designed to provide automated, self-serve DevOps provisioning of the Azure resources they need, as well as simplified MLOps and AIOps pipeline libraries. This removes operational complexities from their workflows.

The second reason for AICE was to make sure our data scientists were working in fully governed environments that comply with data privacy regulations from the multiple countries in which UBS operates. To meet that need, AICE provides a set of generative AI libraries that fully manage data governance and reduce complexity.

Overall, AICE greatly simplifies the work for our data scientists. For instance, the platform provides managed Azure LLM endpoints, MLflow for generative AI evaluation, and AI agent deployment pipelines along with their corresponding Python libraries.
Without going into the nitty gritty of setting up a new Azure subscription, managing MLflow instances, and navigating Azure Kubernetes Service (AKS) deployments, data scientists can write just three lines of code to obtain a fully governed and secure generative AI ecosystem to manage their entire application lifecycle. And, because it is a governed, secure lab environment, they can also develop and prototype ML models and generative AI applications in the production tier. We found that providing production read-only datasets to build these models significantly expedites our AI development. In fact, the process of developing an ML model, building a pipeline for model training, and putting it into production has dropped from six months to just one month.

Azure Database for PostgreSQL and pgvector: The best of both worlds for relational and vector databases
Once AICE adoption ramped up, our next step was to develop a comprehensive, flexible vector store that would simplify vector store resource provisioning while supporting hundreds of RAG use cases and tenants across both lab and production environments. Essentially, we needed to create RAG as a Service (RaaS) so our data scientists could build custom AI solutions in a self-service manner.

When we started building VEGA and this vector store, we anticipated that effective RAG would require a diverse range of search capabilities, covering not only vector searches but also more traditional document searches and even relational queries. Therefore, we needed a database that could pivot easily. We were looking for a really flexible relational database and decided on Azure Database for PostgreSQL.

For a while now, Azure Database for PostgreSQL has been our go-to database at RiskLab for structured data use cases because it's like the Swiss Army Knife of databases: it's very compact and flexible, and we have all the tools we need in a single package. Azure Database for PostgreSQL offers excellent relational queries and JSONB document search. When used in conjunction with the pgvector extension for vector search, we created some very powerful hybrid search and hierarchical search RAG functionalities for our end users.

The relational nature of Azure Database for PostgreSQL also allowed us to build a highly regulated authorization and authentication mechanism that makes it easy and secure for data scientists to share their embeddings. This involved meeting very stringent access control policies so that users' access to vector stores is on a need-to-know basis. Integrations with the Microsoft Graph API help us manage those identities and ensure that the environment is fully secure. Using VEGA, data scientists can just click a button to add a user or group and provide access to all their embeddings and documents. It's very easy, but it's also governed and highly regulated.

Speeding vector store initialization from days to seconds
With VEGA, the time it takes to provision a vector store has dropped from days to less than 30 seconds. Instead of waiting days on a request for new instances of Azure Database for PostgreSQL, pgvector, and Azure AI Search, data scientists can now simply write five lines of code to stand up virtual, fully governed, and secure collections. The same is true for agentic deployment frameworks. This speed is critical for lab work that involves fast iterations and experiments. And because we built on Azure Database for PostgreSQL, a single instance of VEGA can support thousands of vector stores. It's cost-effective and it scales seamlessly.
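To make the JSONB-plus-pgvector combination described above concrete, here is a minimal sketch of the kind of hybrid lookup it enables. It is illustrative only: the table, columns, metadata keys, and connection string are hypothetical, and get_embedding() stands in for whichever embeddings endpoint is actually used.

```python
# Minimal hybrid-search sketch: JSONB metadata pre-filter + pgvector similarity ranking.
# Table, columns, metadata keys, and get_embedding() are hypothetical placeholders.
import psycopg  # psycopg 3


def get_embedding(text: str) -> list[float]:
    """Placeholder: call your embeddings deployment (e.g. an Azure OpenAI embeddings model)."""
    raise NotImplementedError


question = "What drives counterparty credit risk for this portfolio?"
query_vector = get_embedding(question)

sql = """
    SELECT chunk_text,
           embedding <=> %(q)s::vector AS cosine_distance   -- pgvector cosine distance
    FROM   document_chunks
    WHERE  metadata @> %(section_filter)s::jsonb            -- relational/JSONB pre-filter
    ORDER  BY embedding <=> %(q)s::vector
    LIMIT  5;
"""

with psycopg.connect("postgresql://user:password@myserver.postgres.database.azure.com/riskdb") as conn:
    rows = conn.execute(
        sql,
        {
            "q": str(query_vector),                       # '[0.01, -0.2, ...]' literal cast to vector
            "section_filter": '{"section": "credit_risk"}',
        },
    ).fetchall()

for chunk_text, distance in rows:
    print(f"{distance:.4f}  {chunk_text[:80]}")
```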
Creating a hybrid search to analyze thousands of documents
Since launching VEGA, one of the top hybrid search use cases has been Augmented Indexing Search (AIR Search), which allows data scientists to comb through financial documents and pinpoint the correct sections and text. This search uses LLMs as agents that first filter based on metadata stored in JSONB columns in Azure Database for PostgreSQL, then apply vector similarity retrieval. Our thousands of well-structured financial documents are built with hierarchical headers that act as metadata, providing a filtering mechanism for agents and allowing them to retrieve sections in our documents to find precisely what they're looking for. Because these agents are autonomous, they can decide on the best tool for the situation: either metadata filtering or vector similarity search. As a hybrid search, this approach also minimizes AI hallucinations because it gives the agents more context to work with.

To enable this search, we used ChatGPT and Azure OpenAI. But because most of our financial documents are saved as PDFs, the challenge was retaining the hierarchical information from headers, which is lost when simply dumping in text from PDFs. We also had to determine how to make sure ChatGPT understood the meaning behind elements like tables and figures. As a solution, we created PNG images of PDF pages and told ChatGPT to semantically chunk documents by titles and headers. If it came across a table, we asked it to provide a YAML or JSON representation of it. We also asked ChatGPT to interpret figures to extract information, which is an important step because many of our documents contain financial graphs and charts. We're now using Azure AI Document Intelligence for layout detection and section detection as the first step, which has simplified our document ingestion pipelines significantly.

Forecasting economic implications with the PostgreSQL graph extension
Since creating AICE and VEGA using Azure services, we've significantly enhanced our data science workflows. We've made it faster and easier to develop generative AI applications thanks to the speed and flexibility of Azure Database for PostgreSQL. Making advanced AI features accessible to our data scientists has accelerated innovation in RiskLab and ultimately allowed UBS to deliver exceptional value to our customers.

Looking ahead, we plan to use the Apache AGE graph extension in Azure Database for PostgreSQL for macroeconomic knowledge retention capabilities. Specifically, we're considering Azure tooling such as GraphRAG to equip UBS economists and portfolio managers with advanced RAG capabilities. This will allow them to retrieve more coherent RAG search results for use cases such as economic scenario generation and impact analysis, as well as investment forecasting and decision-making. For instance, a UBS business user will be able to ask an AI agent: if a country's interest rate increases by a certain percentage, what are the implications for my client's investment portfolio? The agent can perform a graph search to obtain all the other connected economic entity nodes that might be affected by the interest-rate entity node in the graph. We anticipate that AI-assisted graph knowledge will gain significant traction in the financial industry.

Learn more
For a deeper dive into how we created AICE and VEGA, check out this on-demand session from Ignite. We talk through our use of Azure Database for PostgreSQL and pgvector, plus we show a demo of our GraphRAG capabilities.
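For a flavor of what the graph lookup described in the forecasting section above could look like once Apache AGE is enabled, here is a hypothetical sketch. The graph name, node labels, relationship types, and connection details are all invented for illustration; only the cypher()-over-SQL pattern comes from AGE itself.

```python
# Hypothetical sketch of an Apache AGE graph query for the interest-rate scenario above.
# Graph name, labels, relationship types, and connection details are invented.
import psycopg

CYPHER_SQL = """
    SELECT * FROM ag_catalog.cypher('macro_graph', $$
        MATCH (rate:EconomicIndicator {name: 'interest_rate_AU'})-[:INFLUENCES*1..2]->(impacted)
        RETURN impacted.name
    $$) AS (impacted_name ag_catalog.agtype);
"""

with psycopg.connect("postgresql://user:password@myserver.postgres.database.azure.com/riskdb") as conn:
    conn.execute("LOAD 'age';")                                    # load the AGE extension for this session
    conn.execute('SET search_path = ag_catalog, "$user", public;')
    for (impacted_name,) in conn.execute(CYPHER_SQL):
        print(impacted_name)                                       # entities connected to the interest-rate node
```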
About Azure Database for PostgreSQL
Azure Database for PostgreSQL is a fully managed, scalable, and secure relational database service that supports open-source PostgreSQL. It enables organizations to build and manage mission-critical applications with high availability, built-in security, and automated maintenance.

Feedback Loops in GenAI with Azure Functions, Azure OpenAI and Neon serverless Postgres
Generative Feedback Loops (GFL) focus on optimizing and improving the AI's outputs over time through a cycle of feedback and learning based on production data. Learn how to build a GenAI solution with feedback loops using Azure OpenAI, Azure Functions, and Neon Serverless Postgres.

Build AI Agents with MCP Tool Use in Minutes with AI Toolkit for VSCode
We're excited to announce Agent Builder, the newest evolution of what was formerly known as Prompt Builder, now reimagined and supercharged for intelligent app development. This powerful tool in AI Toolkit enables you to create, iterate, and optimize agents—from prompt engineering to tool integration—all in one seamless workflow. Whether you're designing simple chat interactions or complex task-performing agents with tool access, Agent Builder simplifies the journey from idea to integration.

Why Agent Builder?
Agent Builder is designed to empower developers and prompt engineers to:
🚀 Generate starter prompts with natural language
🔁 Iterate and refine prompts based on model responses
🧩 Break down tasks with prompt chaining and structured outputs
🧪 Test integrations with real-time runs and tool use such as MCP servers
💻 Generate production-ready code for rapid app development

And a lot of features are coming soon; stay tuned for:
📝 Use variables in prompts
Run your agent with test cases to test it easily
📊 Evaluate the accuracy and performance of your agent with built-in or custom metrics
☁️ Deploy your agent to the cloud

Build Smart Agents with Tool Use (MCP Servers)
Agents can now connect to external tools through MCP (Model Context Protocol) servers, enabling them to perform real-world actions like querying a database, accessing APIs, or executing custom logic.

Connect to an Existing MCP Server
To use an existing MCP server in Agent Builder:
1. In the Tools section, select + MCP Server.
2. Choose a connection type:
   - Command (stdio) – run a local command that implements the MCP protocol
   - HTTP (server-sent events) – connect to a remote server implementing the MCP protocol
3. If the MCP server supports multiple tools, select the specific tool you want to use.
4. Enter your prompts and click Run to test the agent's interaction with the tool.
This integration allows your agents to fetch live data or trigger custom backend services as part of the conversation flow.

Build and Scaffold a New MCP Server
Want to create your own tool? Agent Builder helps you scaffold a new MCP server project:
1. In the Tools section, select + MCP Server.
2. Choose MCP server project.
3. Select your preferred programming language: Python or TypeScript.
4. Pick a folder to create your server project.
5. Name your project and click Create.
Agent Builder generates a scaffolded implementation of the MCP protocol that you can extend. Use the built-in VS Code debugger: press F5 or click Debug in Agent Builder, then test with prompts like:
System: You are a weather forecast professional that can tell weather information based on a given location.
User: What is the weather in Shanghai?
Agent Builder will automatically connect to your running server and show the response, making it easy to test and refine the tool-agent interaction. A minimal, hypothetical sketch of such a Python server appears further below.

AI Sparks: From Prototype to Production with AI Toolkit
Building AI-powered applications from scratch or infusing intelligence into existing systems? AI Sparks is your go-to webinar series for mastering the AI Toolkit (AITK), from foundational concepts to cutting-edge techniques. In this bi-weekly, hands-on series, we'll cover:
🚀 SLMs & Local Models – Test and deploy AI models and applications efficiently, on your own terms: locally, to edge devices, or to the cloud
🔍 Embedding Models & RAG – Supercharge retrieval for smarter applications using existing data.
🎨 Multi-Modal AI – Work with images, text, and beyond.
🤖 Agentic Frameworks – Build autonomous, decision-making AI systems.
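Here is the promised sketch of what a scaffolded Python weather server along these lines could look like. It assumes the official MCP Python SDK's FastMCP helper; the server name, tool name, and canned forecast are made up for illustration, and a real implementation would call an actual weather API.

```python
# Minimal sketch of a Python MCP server exposing one weather tool.
# Assumes the official MCP Python SDK (pip install mcp); the tool logic is a placeholder.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("weather")


@mcp.tool()
def get_weather(location: str) -> str:
    """Return a short weather summary for the given location."""
    # Replace this canned answer with a call to a real weather API.
    return f"It is 24°C and partly cloudy in {location}."


if __name__ == "__main__":
    # stdio matches the "Command (stdio)" connection type described above.
    mcp.run(transport="stdio")
```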
Watch on Demand

Share your feedback
Get started with the latest version, share your feedback, and let us know how these new features help you in your AI development journey. As always, we're here to listen, collaborate, and grow alongside our amazing user community. Thank you for being a part of this journey—let's build the future of AI together! Join our Microsoft Azure AI Foundry Discord channel to continue the discussion 🚀

Build, innovate, and #Hacktogether!
🛠️ Build, innovate, and #Hacktogether! 🛠️

2025 is the year of AI agents! But what exactly is an agent? And how can you build one? Whether you're an experienced developer or just getting started, this FREE three-week virtual hackathon is your chance to dive into AI agent development.

🔥 Learn from more than 20 expert-led sessions, streamed live on YouTube, covering the top frameworks such as Semantic Kernel, Autogen, the new Azure AI Agents SDK, and the Microsoft 365 Agents SDK.
💡 Get hands-on, explore your creativity, and build powerful AI agents! Then submit your project and compete for amazing prizes! 💸

Key dates:
- Expert sessions: April 8, 2025 – April 30, 2025
- Hack submission deadline: April 30, 2025, 11:59 PM PST
Don't miss this opportunity: join us and start building the future of AI! 🔥

Registration 🎟️
Secure your spot now! Fill out the form to confirm your participation in the hackathon. Then check the live-stream schedule and register for the sessions that interest you most. After registering, introduce yourself and look for teammates!

Project Submission 🚀
Read the official rules carefully and make sure you understand the requirements. When your project is ready, follow the submission process. 📝

Prizes and Categories 🏅
Projects will be judged by a panel including Microsoft engineers, product managers, and developer advocates. Judging criteria will include innovation, impact, technical usability, and alignment with the corresponding hackathon category. Each winning team in the categories below will receive a prize. 💸
- Best Overall Agent - $20,000
- Best Agent in Python - $5,000
- Best Agent in C# - $5,000
- Best Agent in Java - $5,000
- Best Agent in JavaScript/TypeScript - $5,000
- Best Copilot Agent (using Microsoft Copilot Studio or Microsoft 365 Agents SDK) - $5,000
- Best Use of Azure AI Agent Service - $5,000
Each team can win in only one category. All participants who submit a project will receive a digital badge.

Live Streams 📅
Portuguese
Register for all the sessions in Portuguese.
Day/Time | Topic | Resources
4/8 12:00 PM PT | Welcome to the AI Agents Hackathon | -
4/10 12:00 PM PT | Build an app with the Azure AI Agent Service | -
4/17 06:00 AM PT | Your first AI agent in JavaScript with the Azure AI Agent Service | -

Other Languages
We will have more than 30 streams in English, plus streams in Spanish and Chinese. See the main page for more details.

🕒 Office Hours
Need help with your project? Join Office Hours on the AI Discord channel and get guidance from experts! 🚀 Here are the office hours already scheduled:
Day/Time | Topic/Hosts
Every Thursday, 12:30 PM PT | Python + AI (English)
Every Monday, 03:00 PM PT | Python + AI (Spanish)

Learning Resources 📚
Access the resources here! Join TheSource EHub to explore key resources, including training, live streams, repositories, technical guides, blogs, downloads, certifications, and much more, updated monthly. The AI Agents section offers essential resources for building AI agents, while other sections provide insights into AI, development tools, and programming languages.
You can also post questions in our discussion forum or chat with other participants on the Discord channel.

Cut Costs and Speed Up AI API Responses with Semantic Caching in Azure API Management
This article is part of a series on API Management and Generative AI. We believe that adding Azure API Management to your AI projects can help you scale your AI models and make them more secure and easier to manage. We previously covered the hidden risks of AI APIs in today's AI-driven technological landscape. In this article, we dive deeper into one of the supported Gen AI policies in API Management, which lets you minimize Azure OpenAI costs and make your applications more performant by reducing the number of calls sent to your LLM service.

How does it currently work without the semantic caching policy?
For simplicity, let's look at a scenario where we only have a single client app, a single user, and a single model deployment. This of course does not represent most real-world use cases, as you often have multiple users talking to different services. Take the following cases into consideration:
- A user lands on your application and sends in a query (query 1).
- They then send the same query again, with similar verbiage, in the same session (query 2).
- The user changes the wording of the query, but it is still relevant and related to the original query (query 3).
- The last query (query 4) is completely different and unrelated to the previous queries.
In a typical implementation, all of these queries will cost you tokens (TPM), driving up your bill. Your users are also likely to experience some latency as they wait for the LLM to build a response with each call. As your user base grows, you can expect these expenses to grow exponentially, eventually making the system more expensive to run.

How does semantic caching in Azure API Management fix this?
Let's look at the same scenario as described above (at a high level first), with a flow representing how you can cut costs and boost your app's performance with the semantic cache policy. When the user sends in the first query, the LLM will be used to generate a response, which will then be stored in the cache. Queries 2 and 3 are somewhat related to query 1, whether through semantic similarity, an exact match, or a specified keyword, such as price. In all these cases, a lookup is performed and the appropriate response is retrieved from the cache, without waiting on the LLM to regenerate a response. Query 4, which is different from the previous prompts, requires the call to be passed through to the LLM; the generated response is then stored in the cache for future searches.

Okay. Tell me more: how does this work and how do I set it up?
Think about this: how likely are your users to ask related or exactly comparable questions in your app? I'd argue that the odds are quite high.

Semantic caching for Azure OpenAI API requests
To start, you will need to add Azure OpenAI Service APIs to your Azure API Management instance with semantic caching enabled. Luckily, this step has been reduced to a single click. I'll link a tutorial on this in the 'Resources' section. Before you configure the policies, you first need to set up a backend for the embeddings API. Oh yes, as part of your deployments, you will need an embedding model to convert your input to the corresponding vector representation, allowing Azure Redis cache to perform the vector similarity search. This step also allows you to set a score_threshold, a parameter used to determine how similar user queries need to be in order to retrieve responses from the cache.
Next is to add the two policies that you need: azure-openai-semantic-cache-store / llm-semantic-cache-store and azure-openai-semantic-cache-lookup / llm-semantic-cache-lookup.

The azure-openai-semantic-cache-store policy caches completions and requests in the configured cache service. You can use the internal Azure Redis Enterprise cache or any other external cache in Azure API Management, as long as it is Redis-compatible. The second policy, azure-openai-semantic-cache-lookup, performs a cache lookup against the cached requests and completions based on the proximity result of the similarity search and the score_threshold. In addition to the score_threshold attribute, you also specify the id of the embeddings backend created in the earlier step, and you can choose to omit the system messages from the prompt at this step. These two policies enhance your system's efficiency and performance by reusing completions, increasing response speed, and making your API calls much cheaper. A minimal sketch of how the two policies fit together appears at the end of this article.

Alright, so what should be my next steps?
This article introduced you to just one of the many generative AI capabilities supported in Azure API Management. We have more policies that you can use to better manage your AI APIs, covered in other articles in this series. Do check them out.

Do you have any resources I can look at in the meantime to learn more?
Absolutely! Check out:
- Using external Redis-compatible cache in Azure API Management documentation
- Use Azure Cache for Redis as a semantic cache tutorial
- Enable semantic caching for Azure OpenAI APIs in Azure API Management article
- Improve the performance of an API by adding a caching policy in Azure API Management Learn module
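As promised, here is a minimal sketch of how the two policies might sit in an API's policy definition. The attribute values (the 0.05 threshold, the backend id, the cache duration in seconds) are illustrative placeholders rather than recommendations; check the policy reference for the full set of supported attributes.

```xml
<policies>
    <inbound>
        <base />
        <!-- Look up a semantically similar cached completion before calling the model -->
        <azure-openai-semantic-cache-lookup
            score-threshold="0.05"
            embeddings-backend-id="embeddings-backend"
            embeddings-backend-auth="system-assigned"
            ignore-system-messages="true" />
    </inbound>
    <outbound>
        <base />
        <!-- Cache the completion returned by Azure OpenAI (duration in seconds) -->
        <azure-openai-semantic-cache-store duration="120" />
    </outbound>
</policies>
```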
Improve LLM backend resiliency with load balancer and circuit breaker rules in Azure API Management

This article is part of a series on Azure API Management and Generative AI. We believe that adding Azure API Management to your AI projects can help you scale your AI models and make them more secure and easier to manage. We previously covered the hidden risks of AI APIs in today's AI-driven technological landscape. In this article, we dive deeper into one of the supported Gen AI capabilities in API Management, which allows your applications to switch to a different Gen AI backend when specified or unexpected events occur.

In Azure API Management, you can set up your different LLMs as backends, define structures to route requests to prioritized backends, and add automatic circuit breaker rules to protect backends from too many requests. Under normal conditions, if your Azure OpenAI service fails, users of your application will keep receiving error messages until the backend issue is resolved and the service is ready to serve requests again. Similarly, managing multiple Azure OpenAI resources can be cumbersome, as manual URL changes are required in your API settings to switch between backend entities. This approach lacks efficiency and does not account for dynamic user conditions, preventing seamless switching to the optimal backend services for enhanced performance and reliability.

How load balancing will work
First, configure your Azure OpenAI resources as referenceable backends, defining the base-url and assigning a backend-id for each. As an example, let's assume we have three different Azure OpenAI resources: openai1, openai2, and openai3. To set up load balancing across the backends, you can use one of the supported approaches/strategies, or a combination of two, to ensure optimal use of your Azure OpenAI resources.

1. Round robin
As the name suggests, API Management will evenly distribute requests to the available backends in the pool.

2. Priority-based
For this approach, you organize multiple backends into priority groups, and API Management will assign requests to these backends in order of priority. Back to our example: we assign openai1 the top priority (priority 1), openai2 priority 2, and openai3 priority 3. This means that requests will be forwarded to openai1 (priority 1), but if the service is unreachable, the calls will be rerouted to openai2 in the next priority group, and so on.

3. Weighted
Here, you assign weights to your backends, and requests will be distributed based on these relative weights. For our example above, we want to be even more specific: while all requests default to openai1, in the event of its failure we now want requests to be distributed equally across our priority 2 backends (specified by a 50/50 weight allocation).

Now, configure your circuit breaker rules
The next step is to define rules that listen to events in your API and trip when specified conditions are met. Let's look at the example below to learn more about how this works. Inside your circuitBreaker property configuration, you define an array that can hold multiple rules. Each rule's failure condition defines what must happen for the circuit breaker to trip:
a. The circuit breaker will trip if there is at least one failure.
b. The number of failures specified in count will be monitored within 5-minute intervals.
c. We are looking out for errors that return a status code of 429 (Too Many Requests); you can also define a range of status codes here.
The circuit will remain tripped for 1 minute, after which it will reset and route traffic to the endpoint again. A minimal sketch of this backend and circuit breaker configuration appears at the end of this article.

Alright, so what should be my next steps?
This article introduced you to just one of the many generative AI capabilities supported in Azure API Management. We have more policies that you can use to better manage your AI APIs, covered in other articles in this series. Do check them out.

Do you have any resources I can look at in the meantime to learn more?
Absolutely! Check out:
- https://learn.microsoft.com/en-us/azure/api-management/set-backend-service-policy
- https://learn.microsoft.com/en-us/azure/api-management/backends?tabs=bicep
- https://github.com/Azure-Samples/AI-Gateway/tree/main/labs/backend-pool-load-balancing
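As promised above, here is a rough sketch of what this could look like in Bicep, loosely following the Microsoft.ApiManagement/service/backends resource type covered in the second link. The resource names, API version, endpoint URLs, and the assumption of an existing apimService resource (with openai2 and openai3 declared the same way as openai1) are all illustrative.

```bicep
// Illustrative sketch only: one circuit-breaker-protected backend plus a priority/weight pool.
// Assumes an existing `apimService` resource and that openai2/openai3 are declared like openai1.
resource openai1 'Microsoft.ApiManagement/service/backends@2023-09-01-preview' = {
  parent: apimService
  name: 'openai1'
  properties: {
    url: 'https://openai1.openai.azure.com/openai'
    protocol: 'http'
    circuitBreaker: {
      rules: [
        {
          name: 'tooManyRequestsBreaker'
          failureCondition: {
            count: 1                                     // trip on at least one failure...
            interval: 'PT5M'                             // ...observed within a 5-minute window
            statusCodeRanges: [ { min: 429, max: 429 } ] // watch for 429 Too Many Requests
          }
          tripDuration: 'PT1M'                           // stay tripped for 1 minute, then reset
        }
      ]
    }
  }
}

resource openaiPool 'Microsoft.ApiManagement/service/backends@2023-09-01-preview' = {
  parent: apimService
  name: 'openai-pool'
  properties: {
    type: 'Pool'
    pool: {
      services: [
        { id: openai1.id, priority: 1 }               // all traffic while openai1 is healthy
        { id: openai2.id, priority: 2, weight: 50 }   // on failure, split 50/50 across priority 2
        { id: openai3.id, priority: 2, weight: 50 }
      ]
    }
  }
}
```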
Managing Token Consumption with GitHub Copilot for Azure

Introduction
AI engineers often face challenges that require creative solutions. One such challenge is managing token consumption when using large language models. For example, you may observe heavy token consumption from a single client app or user and determine that, with that kind of usage pattern, the shared quota for other client applications relying on the same OpenAI backend will be depleted quickly. To prevent this, we need a solution that doesn't involve spending hours reading documentation or watching tutorials. Enter GitHub Copilot for Azure.

GitHub Copilot for Azure
Instead of diving into extensive documentation, we can leverage GitHub Copilot for Azure directly within VS Code. By invoking Copilot with @azure, we can describe our issue in natural language. For our example, we might say: "Some users of my app are consuming too many tokens, which will affect tokens left for my other services. I need to limit the number of tokens a user can consume." Refer to the video above for more context.

GitHub Copilot in Action
GitHub Copilot pulls relevant ideas from https://learn.microsoft.com/ and suggests Azure services that can help. We can engage in a chat conversation, with follow-up questions like, "What happens if a user exceeds their token limit?" and so on. Copilot's response accurately describes the specific feature we need, along with the expected outcome: user requests that exceed the limit are blocked from reaching the backend, and users receive a "too many requests" warning—exactly what we need. At this point, it felt like I was having a 1:1 chat with the docs 🙃

Implementation
To implement this, we ask GitHub Copilot for an example of enforcing the Azure token limit policy. It references the docs on Learn and provides a policy statement (an illustrative sketch of such a policy appears at the end of this post). Since we're not fully conversant with the product, we continue using Copilot to help with the implementation. Although GitHub Copilot Chat cannot directly update our code, we can switch to GitHub Copilot Edits, provide some custom instructions in natural language, and watch as GitHub Copilot makes the necessary changes, which we then review and accept or decline.

Testing and Deployment
After implementing the policy, we redeploy our application using the Azure Developer CLI (azd) and restart our application and API to test. We now see that if a user sends another prompt after hitting the applied token limit, their request is terminated with a warning that the allocated limit has been exceeded, along with instructions on what to do next.

Conclusion
Managing token consumption effectively is just one of the many ways GitHub Copilot for Azure can assist developers. Download and install the extension today to try it out yourself. If you have any scenarios you'd like to see us cover, drop them in the comments, and we'll feature them. See you in the next blog!
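For reference, the policy statement Copilot points you toward is the Azure OpenAI token limit policy. A minimal illustrative sketch is below; the counter key, the 500 tokens-per-minute figure, and the header name are placeholders chosen for this example, so check the policy reference before reusing them.

```xml
<inbound>
    <base />
    <!-- Illustrative azure-openai-token-limit policy: per-caller TPM cap with prompt-token estimation -->
    <azure-openai-token-limit
        counter-key="@(context.Subscription.Id)"
        tokens-per-minute="500"
        estimate-prompt-tokens="true"
        remaining-tokens-header-name="x-remaining-tokens" />
</inbound>
```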
Discord Community Call #1 - RAG Resources

Blog resource with Towards Data Science: https://medium.com/towards-data-science/add-your-own-data-to-an-llm-using-retrieval-augmented-generation-rag-b1958bf56a5a

Samples:
- Azure-Samples/aistudio-python-quickstart-sample: Quickstart Python sample for getting started using the Azure AI Studio with the SDK or CLI options (github.com)
- RAG w/ LangChain sample: Azure-Samples/aistudio-python-langchain-sample: Quickstart sample for using the Azure AI Studio with the SDK or CLI options - and the LangChain framework (github.com)

Hugging Face Leaderboard: