Chaining and Streaming with Responses API in Azure
The Responses API is an enhancement of the existing Chat Completions API. It is stateful and supports agentic capabilities, and as a superset of Chat Completions it continues to support the existing chat completion functionality. Reasoning models such as GPT-5 also deliver better model intelligence through the Responses API than through Chat Completions. The API accepts a flexible range of input types, is currently available in select Azure regions, and can be used with all the models available in a given region. It supports response streaming, chaining, and function calling.

In the examples below, we use the gpt-5-nano model for a simple response, a chained response, and streaming responses. To get started, update the installed openai library:

pip install --upgrade openai

Simple Message

1) Build the client with the following code:

from openai import OpenAI

client = OpenAI(
    base_url=endpoint,
    api_key=api_key,
)

2) Create a response. The call returns a response object whose id can be used later to retrieve the message:

# Non-streaming request
resp = client.responses.create(
    model=deployment,
    input=messages,
)

3) Retrieve the message using the response id from the previous step:

response = client.responses.retrieve(resp.id)

Chaining

For a chained message, the extra step is sharing the context. This is done by sending the previous response id in the subsequent request:

resp = client.responses.create(
    model=deployment,
    previous_response_id=resp.id,
    input=[{"role": "user", "content": "Explain this at a level that could be understood by a college freshman"}],
)

Streaming

A different function call is used for streaming queries, and the streamed events have to be handled appropriately until the end of the event stream:

text_out = []
with client.responses.stream(
    model=deployment,
    input=messages,  # structured messages
) as s:
    for event in s:
        # Accumulate only text deltas for clean output
        if event.type == "response.output_text.delta":
            delta = event.delta or ""
            text_out.append(delta)
            # Echo streaming output to console as it arrives
            print(delta, end="", flush=True)

The code is available in the following GitHub link - https://github.com/arunacarunac/ResponsesAPI

Additional details are available in the following links - Azure OpenAI Responses API - Azure OpenAI | Microsoft Learn
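As a wrap-up, here is a minimal end-to-end sketch that puts the simple, chained, and streaming calls above together. The endpoint, API key, and deployment name are placeholders to substitute with your own Azure values, and the snippet assumes a recent openai package version in which response objects expose the output_text convenience property:

from openai import OpenAI

# Placeholder values - replace with your Azure endpoint, key, and deployment name.
client = OpenAI(base_url="<your-azure-endpoint>", api_key="<your-api-key>")
deployment = "gpt-5-nano"

# Simple message
first = client.responses.create(
    model=deployment,
    input=[{"role": "user", "content": "Explain how a neural network learns."}],
)
print(first.output_text)

# Chained follow-up that reuses the stored context of the first response
follow_up = client.responses.create(
    model=deployment,
    previous_response_id=first.id,
    input=[{"role": "user", "content": "Explain this at a level a college freshman could understand."}],
)
print(follow_up.output_text)

# Streaming variant: print text deltas as they arrive
with client.responses.stream(
    model=deployment,
    input=[{"role": "user", "content": "Summarize the explanation in two sentences."}],
) as stream:
    for event in stream:
        if event.type == "response.output_text.delta":
            print(event.delta or "", end="", flush=True)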
UBS unlocks advanced AI techniques with PostgreSQL on Azure

This blog was authored by Jay Yang, Executive Director, and Orhun Oezbek, GenAI Architect, UBS RiskLab.

UBS Group AG is a multinational investment bank and world-leading asset manager that manages $5.7 trillion in assets across 15 different markets. We continue to evolve our tools to suit the needs of data scientists and to integrate the use of AI. Our UBS RiskLab data science platform helps over 1,200 UBS data scientists expedite development and deployment of their analytics and AI solutions, which support functions such as risk, compliance, and finance, as well as front-office divisions such as investment banking and wealth management.

RiskLab and UBS GOTO (Group Operations and Technology Office) have a long-term AI strategy to provide a scalable and easy-to-use AI platform. This strategy aims to remove friction and pain points for users, such as developers and data scientists, by introducing DevOps automation, centralized governance, and AI service simplification. These efforts have significantly democratized AI development for our business users.

This blog walks through how we created two RiskLab products using Azure services. We also explain how we’re using Azure Database for PostgreSQL to power advanced Retrieval-Augmented Generation (RAG) techniques—such as new vector search algorithms, parameter tuning, hybrid search, semantic ranking, and a GraphRAG approach—to further the work of our financial generative AI use cases.

The RiskLab AI Common Ecosystem (AICE) provides fully governed and simplified generative AI platform services, including:

- Governed production data access for AI development
- Managed large language model (LLM) endpoint access control
- Tenanted RAG environments
- Enhanced document insight AI processing
- Streamlined AI agent standardization, development, registration, and deployment solutions
- End-to-end machine learning (ML) model continuous integration, training, deployment, and monitoring processes

The AICE Vector Embedding Governance Application (VEGA) is a fully governed and multi-tenant vector store built on top of Azure Database for PostgreSQL that provides self-service vector store lifecycle management and advanced indexing and retrieval techniques for financial RAG use cases.

A focus on best practices like AIOps and MLOps

As generative AI gained traction in 2023, we noticed the need for a platform that simplified the process for our data scientists to build, test, and deploy generative AI applications. In this age of AI, the focus should be on data science best practices—GenAIOps and MLOps. Most of our data scientists aren’t fully trained on MLOps, GenAIOps, and setting up complex pipelines, so AICE was designed to provide automated, self-serve DevOps provisioning of the Azure resources they need, as well as simplified MLOps and AIOps pipeline libraries. This removes operational complexities from their workflows.

The second reason for AICE was to make sure our data scientists were working in fully governed environments that comply with data privacy regulations from the multiple countries in which UBS operates. To meet that need, AICE provides a set of generative AI libraries that fully manages data governance and reduces complexity.

Overall, AICE greatly simplifies the work for our data scientists. For instance, the platform provides managed Azure LLM endpoints, MLflow for generative AI evaluation, and AI agent deployment pipelines along with their corresponding Python libraries.
Without going into the nitty gritty of setting up a new Azure subscription, managing MLflow instances, and navigating Azure Kubernetes Service (AKS) deployments, data scientists can just write three lines of code to obtain a fully governed and secure generative AI ecosystem to manage their entire application lifecycle. And, as a governed, secure lab environment, they can also develop and prototype ML models and generative AI applications in the production tier. We found that providing production read-only datasets to build these models significantly expedites our AI development. In fact, the process for developing an ML model, building a pipeline for model training, and putting it into production has dropped from six months to just one month.

Azure Database for PostgreSQL and pgvector: The best of both worlds for relational and vector databases

Once AICE adoption ramped up, our next step was to develop a comprehensive, flexible vector store that would simplify vector store resource provisioning while supporting hundreds of RAG use cases and tenants across both lab and production environments. Essentially, we needed to create RAG as a Service (RaaS) so our data scientists could build custom AI solutions in a self-service manner.

When we started building VEGA and this vector store, we anticipated that effective RAG would require a diverse range of search capabilities covering not only vector searches but also more traditional document searches or even relational queries. Therefore, we needed a database that could pivot easily. We were looking for a really flexible relational database and decided on Azure Database for PostgreSQL.

For a while, Azure Database for PostgreSQL has been our go-to database at RiskLab for our structured data use cases because it’s like the Swiss Army Knife of databases. It’s very compact and flexible, and we have all the tools we need in a single package. Azure Database for PostgreSQL offers excellent relational queries and JSONB document search. When used in conjunction with the pgvector extension for vector search, we created some very powerful hybrid search and hierarchical search RAG functionalities for our end users.

The relational nature of Azure Database for PostgreSQL also allowed us to build a highly regulated authorization and authentication mechanism that makes it easy and secure for data scientists to share their embeddings. This involved meeting very stringent access control policies so that users’ access to vector stores is on a need-to-know basis. Integrations with the Azure Graph API help us manage those identities and ensure that the environment is fully secure. Using VEGA, data scientists can just click a button to add a user or group and provide access to all their embeddings/documents. It’s very easy, but it’s also governed and highly regulated.

Speeding vector store initialization from days to seconds

With VEGA, the time it takes to provision a vector store has dropped from days to less than 30 seconds. Instead of waiting days on a request for new instances of Azure Database for PostgreSQL, pgvector, and Azure AI Search, data scientists can now simply write five lines of code to stand up virtual, fully governed, and secure collections. And the same is true for agentic deployment frameworks. This speed is critical for lab work that involves fast iterations and experiments. And because we built on Azure Database for PostgreSQL, a single instance of VEGA can support thousands of vector stores. It’s cost-effective and seamlessly scales.
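To make the JSONB-plus-pgvector combination described above concrete, here is a minimal sketch of a hybrid query using Python and psycopg. The table, columns, and connection string are hypothetical placeholders rather than UBS's actual VEGA schema, and the query assumes the pgvector extension is enabled on the server:

import json
import psycopg

# Hypothetical schema for illustration:
#   CREATE EXTENSION IF NOT EXISTS vector;
#   CREATE TABLE document_chunks (
#       id bigserial PRIMARY KEY,
#       content text,
#       metadata jsonb,
#       embedding vector(1536)
#   );

def hybrid_search(conn_str, query_embedding, section, k=5):
    """Filter on JSONB metadata first, then rank the remaining rows by vector similarity."""
    embedding_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"
    metadata_filter = json.dumps({"section": section})
    sql = """
        SELECT id, content, embedding <=> %s::vector AS distance
        FROM document_chunks
        WHERE metadata @> %s::jsonb          -- relational/JSONB metadata filter
        ORDER BY embedding <=> %s::vector    -- pgvector cosine distance
        LIMIT %s;
    """
    with psycopg.connect(conn_str) as conn, conn.cursor() as cur:
        cur.execute(sql, (embedding_literal, metadata_filter, embedding_literal, k))
        return cur.fetchall()

Because the metadata filter narrows the candidate set before the similarity ranking runs, an agent can target the right document sections first, which mirrors the AIR Search flow described in the next section.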
Creating a hybrid search to analyze thousands of documents

Since launching VEGA, one of the top hybrid search use cases has been Augmented Indexing Search (AIR Search), allowing data scientists to comb through financial documents and pinpoint the correct sections and text. This search uses LLMs as agents that first filter based on metadata stored in JSONB columns of Azure Database for PostgreSQL, then apply vector similarity retrieval. Our thousands of well-structured financial documents are built with hierarchical headers that act as metadata, providing a filtering mechanism for agents and allowing them to retrieve sections in our documents to find precisely what they’re looking for. Because these agents are autonomous, they can decide on the best tools to use for the situation—either metadata filtering or vector similarity search. As a hybrid search, this approach also minimizes AI hallucinations because it gives the agents more context to work with.

To enable this search, we used ChatGPT and Azure OpenAI. But because most of our financial documents are saved as PDFs, the challenge was retaining hierarchical information from headers that was lost when simply dumping in text from PDFs. We also had to determine how to make sure ChatGPT understood the meaning behind aspects like tables and figures.

As a solution, we created PNG images of PDF pages and told ChatGPT to semantically chunk documents by titles and headers. And if it came across a table, we asked it to provide a YAML or JSON representation of it. We also asked ChatGPT to interpret figures to extract information, which is an important step because many of our documents contain financial graphs and charts. We’re now using Azure AI Document Intelligence for layout detection and section detection as the first step, which has simplified our document ingestion pipelines significantly.

Forecasting economic implications with PostgreSQL Graph Extension

Since creating AICE and VEGA using Azure services, we’ve significantly enhanced our data science workflows. We’ve made it faster and easier to develop generative AI applications thanks to the speed and flexibility of Azure Database for PostgreSQL. Making advanced AI features accessible to our data scientists has accelerated innovation in RiskLab and ultimately allowed UBS to deliver exceptional value to our customers.

Looking ahead, we plan to use the Apache AGE graph extension in Azure Database for PostgreSQL for macroeconomics knowledge retention capabilities. Specifically, we’re considering Azure tooling such as GraphRAG to equip UBS economists and portfolio managers with advanced RAG capabilities. This will allow them to retrieve more coherent RAG search results for use cases such as economic scenario generation and impact analysis, as well as investment forecasting and decision-making.

For instance, a UBS business user will be able to ask an AI agent: if a country’s interest rate increases by a certain percentage, what are the implications for my client’s investment portfolio? The agent can perform a graph search to obtain all the other connected economic entity nodes that might be affected by the interest-rate entity node in the graph. We anticipate that AI-assisted graph knowledge will gain significant traction in the financial industry.

Learn more

For a deeper dive on how we created AICE and VEGA, check out this on-demand session from Ignite. We talk through our use of Azure Database for PostgreSQL and pgvector, plus we show a demo of our GraphRAG capabilities.
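For readers curious what the planned graph queries could look like, here is a rough sketch using the Apache AGE extension from Python. The graph name, labels, relationship types, and connection string are hypothetical, and the exact setup depends on the AGE version available on Azure Database for PostgreSQL:

import psycopg

# Hypothetical graph of macroeconomic entities; assumes the AGE extension is installed.
AGE_QUERY = """
    SELECT * FROM ag_catalog.cypher('macro_graph', $$
        MATCH (rate:Indicator {name: 'interest_rate'})-[:AFFECTS*1..2]->(entity)
        RETURN entity.name
    $$) AS (entity_name agtype);
"""

def connected_entities(conn_str):
    """Return entities reachable within two hops of the interest-rate node."""
    with psycopg.connect(conn_str) as conn, conn.cursor() as cur:
        cur.execute("LOAD 'age';")
        cur.execute('SET search_path = ag_catalog, "$user", public;')
        cur.execute(AGE_QUERY)
        return [row[0] for row in cur.fetchall()]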
About Azure Database for PostgreSQL

Azure Database for PostgreSQL is a fully managed, scalable, and secure relational database service that supports open-source PostgreSQL. It enables organizations to build and manage mission-critical applications with high availability, built-in security, and automated maintenance.

Leveraging the power of NPU to run Gen AI tasks on Copilot+ PCs
Thanks to their massive scale and impressive technical evolution, large language models (LLMs) have become the public face of Generative AI innovation. However, bigger isn’t always better. While LLMs like the ones behind Microsoft Copilot are incredibly capable at a wide range of tasks, less-discussed small language models (SLMs) expand the utility of Gen AI for real-time and edge applications. SLMs can run efficiently on a local device with low power consumption and fast performance, enabling new scenarios and cost models.

SLMs can run on universally available chips like CPUs and GPUs, but their potential really comes alive running on Neural Processing Units (NPUs), such as the ones found in Microsoft Surface Copilot+ PCs. NPUs are specifically designed for processing machine learning workloads, leading to high performance per watt and thermal efficiency compared to CPUs or GPUs [1]. Together, SLMs and NPUs can run quite powerful Gen AI workloads efficiently on a laptop, even when running on battery power or multitasking.

In this blog, we focus on running SLMs on Snapdragon® X Plus processors on the recently launched Surface Laptop 13-inch, using the Qualcomm® AI Hub, which gives us efficient local inference, increased hardware utilization, and minimal setup complexity. This is only one of many methods available, so before diving into this specific use case, let’s first provide an overview of the possibilities for deploying small language models on Copilot+ PC NPUs.

Qualcomm AI Engine Direct (QNN) SDK: This approach requires converting SLMs into QNN binaries that can be executed on the NPU. The Qualcomm AI Hub provides a convenient way to compile any PyTorch, TensorFlow, or ONNX-converted model into QNN binaries executable by the Qualcomm AI Engine Direct SDK. Various precompiled models are directly available in the Qualcomm AI Hub, a collection of over 175 pre-optimized models ready for download and integration into your application.

ONNX Runtime: ONNX Runtime is an open-source inference engine from Microsoft designed to run models in the ONNX format. The QNN Execution Provider (EP) by Qualcomm Technologies optimizes inference on Snapdragon processors using AI acceleration hardware, mainly for mobile and embedded use. ONNX Runtime Gen AI is a specialized version optimized for generative AI tasks, including transformer-based models, aiming for high-performance inference in applications like large language models. Although ONNX Runtime with the QNN EP can run models on Copilot+ PCs, some operator support needed for Gen AI workloads is still missing. ONNX Runtime Gen AI is not yet publicly available for NPU; at the time of writing, a private beta is under way with no public release date announced. For more information on upcoming releases, see the GitHub repo: microsoft/onnxruntime-genai: Generative AI extensions for onnxruntime

Windows AI Foundry: Windows AI Foundry provides AI-supported features and APIs for Copilot+ PCs. It includes pre-built models such as Phi Silica that can be inferred using Windows AI APIs. Additionally, it offers the capability to download models from the cloud for local inference on the device using Foundry Local. This feature is still in preview.
You can learn more about Windows AI Foundry here: Windows AI Foundry | Microsoft Developer

AI Toolkit for VS Code: The AI Toolkit for Visual Studio Code (VS Code) is a VS Code extension that simplifies generative AI app development by bringing together cutting-edge AI development tools and models from the Azure AI Foundry catalog and other catalogs like Hugging Face. This platform allows users to download multiple models either from the cloud or locally. It currently houses several models optimized to run on CPU, with support for NPU-based models forthcoming, starting with DeepSeek R1.

Comparison between different approaches

Availability of models
- Qualcomm AI Hub: Wide set of AI models (vision, Gen AI, object detection, and audio). Any model can be integrated.
- ONNX Runtime (ORT): NPU support for Gen AI tasks and ONNX Runtime Gen AI are not yet generally available.
- Windows AI Foundry: The Phi Silica model is available through Windows AI APIs; additional AI models can be downloaded from the cloud for local inference using Foundry Local.
- AI Toolkit for VS Code: Access to models from sources such as Azure AI Foundry and Hugging Face. Currently only supports DeepSeek R1 and Phi 4 Mini models for NPU inference.

Ease of development
- Qualcomm AI Hub: The API is user-friendly once the initial setup and end-to-end replication are complete.
- ONNX Runtime (ORT): Simple setup and developer-friendly; however, limited support for custom operators means not all models deploy through ORT.
- Windows AI Foundry: Easiest framework to adopt; developers familiar with the Windows App SDK face no learning curve.
- AI Toolkit for VS Code: Intuitive interface for testing models via prompt-response, enabling quick experimentation and performance validation.

Is processor or SoC independent
- Qualcomm AI Hub: No. Supports Qualcomm Technologies processors only. Models must be compiled and optimized for the specific SoC on the device. A list of supported chipsets is provided, and the resulting .bin files are SoC-specific.
- ONNX Runtime (ORT): Limitations exist with the QNN EP’s HTP backend: only quantized models and those with static shapes are currently supported.
- Windows AI Foundry: Yes. The tool can operate independently of the SoC. It is part of the broader Windows Copilot Runtime framework, now rebranded as the Windows AI Foundry.
- AI Toolkit for VS Code: Model-dependent. Easily deployable on-device; model download and inference are straightforward.

As of writing this article and based on our team's research, we found Qualcomm AI Hub to be the most user-friendly and well-supported solution available at this time. In contrast, most other frameworks are still under development and not yet generally available. Before we dive into how to use Qualcomm AI Hub to run Small Language Models (SLMs), let’s first understand what Qualcomm AI Hub is.

What is Qualcomm AI Hub?

Qualcomm AI Hub is a platform designed to simplify the deployment of AI models for vision, audio, speech, and text applications on edge devices. It allows users to upload, optimize, and validate their models for specific target hardware—such as CPU, GPU, or NPU—within minutes. Models developed in PyTorch or ONNX are automatically converted for efficient on-device execution using frameworks like TensorFlow Lite, ONNX Runtime, or Qualcomm AI Engine Direct. The Qualcomm AI Hub offers access to a collection of over 100 pre-optimized models, with open-source deployment recipes available on GitHub and Hugging Face. Users can also test and profile these models on real devices with Snapdragon and Qualcomm platforms hosted in the cloud.
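To give a feel for the workflow Qualcomm AI Hub automates, here is a minimal sketch using the qai-hub Python client. The model file, device name, and compile option shown are assumptions for illustration only; consult the Qualcomm AI Hub documentation for the exact devices and options available to your account:

import qai_hub as hub

# Hypothetical inputs: a locally exported ONNX model and a cloud-hosted Snapdragon device name.
compile_job = hub.submit_compile_job(
    model="my_slm_model.onnx",                          # local ONNX (or TorchScript) model
    device=hub.Device("Snapdragon X Plus 8-Core CRD"),  # hosted target device (assumed name)
    options="--target_runtime qnn_context_binary",      # request an NPU context binary (assumed flag)
)

# Wait for compilation to finish, then download the SoC-specific binary.
target_model = compile_job.get_target_model()
target_model.download("model.bin")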
In this blog we will show how you can use Qualcomm AI Hub to get a QNN context binary for a model and use the Qualcomm AI Engine to run that context binary. The context binary is a SoC-specific deployment mechanism: when a model is compiled for a device, it is expected to be deployed to that same device. The format is operating system agnostic, so the same model can be deployed on Android, Linux, or Windows. The context binary is designed only for the NPU. For more details on how to compile models in other formats, please visit the documentation here: Overview of Qualcomm AI Hub — qai-hub documentation.

The following case study details the efficient execution of the Phi-3.5 model using optimized, hardware-specific binaries on a Surface Laptop 13-inch powered by the Qualcomm Snapdragon X Plus processor, Hexagon™ NPU, and Qualcomm AI Hub.

Microsoft Surface Engineering Case Study: Running Phi-3.5 Model Locally on Snapdragon X Plus on Surface Laptop 13-inch

This case study details how the Phi-3.5 model was deployed on a Surface Laptop 13-inch powered by the Snapdragon X Plus processor. The study was developed and documented by the Surface DASH team, which specializes in delivering AI/ML solutions to Surface devices and generating data-driven insights through advanced telemetry. Using Qualcomm AI Hub, we obtained precompiled QNN context binaries tailored to the target SoC, enabling efficient local inference. This method maximizes hardware utilization and minimizes setup complexity.

We used a Surface Laptop 13-inch with the Snapdragon X Plus processor as our test device. The steps below apply to the Snapdragon X Plus processor; however, the process remains similar for other Snapdragon X Series processors and devices as well. For other processors, you may need to download different variants of the desired models from Qualcomm AI Hub. Before you begin to follow along, please check the make and model of your NPU by navigating to Device Manager --> Neural Processors.

We also used Visual Studio Code and Python (3.10, 3.11, or 3.12). We used version 3.11 to run the steps below and recommend using the same, although there should be no difference in using a higher Python version. Before starting, create a new virtual environment in Python as a best practice. Follow the steps to create a new virtual environment here: https://code.visualstudio.com/docs/python/environments?from=20423#_creating-environments

At a high level, you will: create a folder named ‘genie_bundle’ to store config and bin files; download the QNN context binaries specific to your NPU and place the config files into the genie_bundle folder; copy the .dll files from the QNN SDK into the genie_bundle folder; and finally execute the test prompt through the Genie SDK in the required format for Phi-3.5.

Setup steps in detail

Step 1: Set up the local development environment

Download QNN SDK: Go to the Qualcomm Software Center (Qualcomm Neural Processing SDK | Qualcomm Developer) and download the QNN SDK by clicking on Get Software (by default, the latest version of the SDK gets downloaded). For the purpose of this demo, we used the latest version available (2.34). You may need to create an account on the Qualcomm website to access it.

Step 2: Download QNN Context Binaries from Qualcomm AI Hub Models

Download Binaries: Download the context binaries (.bin files) for the Phi-3.5-mini-instruct model from (Link to Download Phi-3.5 context binaries).
Clone the AI Hub Apps repo: Use the Genie SDK (a generative runtime built on top of the Qualcomm AI Engine Direct SDK), and leverage the sample provided in https://github.com/quic/ai-hub-apps

Set up the folder structure to follow along with the code: Create a folder named "genie_bundle" outside of the folder where the AI Hub Apps repo was cloned. Selectively copy configuration files from the AI Hub sample repo to 'genie_bundle'.

Step 3: Copy config files and edit files

Copy the config files to the genie_bundle folder from ai-hub-apps. You will need two config files: the HTP backend config file and the genie config file. You can use the PowerShell script below to copy them from the repo to the local genie_bundle folder created in the previous steps.

# Define the source paths
$sourceFile1 = "ai-hub-apps/tutorials/llm_on_genie/configs/htp/htp_backend_ext_config.json.template"
$sourceFile2 = "ai-hub-apps/tutorials/llm_on_genie/configs/genie/phi_3_5_mini_instruct.json"

# Define the local folder path
$localFolder = "genie_bundle"

# Define the destination file paths using the local folder
$destinationFile1 = Join-Path -Path $localFolder -ChildPath "htp_backend_ext_config.json"
$destinationFile2 = Join-Path -Path $localFolder -ChildPath "genie_config.json"

# Create the local folder if it doesn't exist
if (-not (Test-Path -Path $localFolder)) {
    New-Item -ItemType Directory -Path $localFolder
}

# Copy the files to the local folder
Copy-Item -Path $sourceFile1 -Destination $destinationFile1 -Force
Copy-Item -Path $sourceFile2 -Destination $destinationFile2 -Force

Write-Host "Files have been successfully copied to the genie_bundle folder with updated names."

After copying the files, make sure to change the default values of the parameters provided in the copied template files:

- Edit the HTP backend config file in its new location: change dsp_arch and soc_model to match your configuration.
- Edit the genie_config file to reference the binaries for the Phi-3.5 model downloaded in the previous steps.

Step 4: Download the tokenizer file from Hugging Face

Visit the Hugging Face website: Open your web browser and go to https://huggingface.co/microsoft/Phi-3.5-mini-instruct/tree/main
Locate the tokenizer file: On the Hugging Face page, find the tokenizer file for the Phi-3.5-mini-instruct model.
Download the file: Click on the download button to save the tokenizer file to your computer.
Save the file: Navigate to your genie_bundle folder and save the downloaded tokenizer file there.

Note: There is an issue with the tokenizer.json file for the Phi-3.5-mini-instruct model, where the output does not break words using spaces. To resolve this, delete lines #192-197 in the tokenizer.json file.

(Image: downloading the tokenizer files from the Hugging Face repo. Image source: Hugging Face)

Step 5: Copy files from the QNN SDK

Locate the QNN SDK folder: Open the folder where you installed the QNN SDK in Step 1 and identify the required files. You need to copy the files from the folders below (exact folder naming may change based on the SDK version):

<QNN-SDK ROOT FOLDER>/qairt/2.34.0.250424/lib/hexagon-v75/unsigned
<QNN-SDK ROOT FOLDER>/qairt/2.34.0.250424/lib/aarch64-windows-msvc
<QNN-SDK ROOT FOLDER>/qairt/2.34.0.250424/bin/aarch64-windows-msvc

Navigate to your genie_bundle folder and paste the copied files there.

Step 6: Execute the Test Prompt

Open your terminal: Navigate to your genie_bundle folder using your terminal or command prompt.
Run the command: Copy and paste the following command into your terminal:

./genie-t2t-run.exe -c genie_config.json -p "<|system|>\nYou are an assistant. Provide helpful and brief responses.\n<|user|>What is an NPU? \n<|end|>\n<|assistant|>\n"

Check the output: After running the command, you should see the response from the assistant in your terminal.

This case study demonstrates the process of deploying a small language model (SLM) like Phi-3.5 on a Copilot+ PC using the Hexagon NPU and Qualcomm AI Hub. It outlines the setup steps, tooling, and configuration required for local inference using hardware-specific binaries. As deployment methods mature, this approach highlights a viable path toward efficient, scalable Gen AI execution directly on edge devices.

Snapdragon® and Qualcomm® branded products are products of Qualcomm Technologies, Inc. and/or its subsidiaries. Qualcomm, Snapdragon and Hexagon™ are trademarks or registered trademarks of Qualcomm Incorporated.

JS AI Build-a-thon: Wrapping Up an Epic June 2025!
After weeks of building, testing, and learning — we’re officially wrapping up the first-ever JS AI Build-a-thon 🎉. This wasn't your average coding challenge. This was a hands-on journey where JavaScript and TypeScript developers dove deep into real-world AI concepts — from local GenAI prototyping to building intelligent agents and deploying production-ready apps. Whether you joined from the start or hopped on midway, you built something that matters — and that’s worth celebrating.

Replay the Journey

No worries if you joined late or want to revisit any part of the journey. The JS AI Build-a-thon was designed to let you learn at your own pace, so whether you're starting now or polishing up your final project, here’s your complete quest map:

Build-a-thon set up guide: https://aka.ms/JSAIBuildathonSetup
Quest 1: 🔧 Build your first GenAI app locally with GitHub Models 👉🏽 https://aka.ms/JSAIBuildathonQuest1
Quest 2: ☁️ Move your AI prototype to Azure AI Foundry 👉🏽 https://aka.ms/JSAIBuildathonQuest
Quest 3: 🎨 Add a chat UI using Vite + Lit 👉🏽 https://aka.ms/JSAIBuildathonQuest3
Quest 4: 📄 Enhance your app with RAG (Chat with Your Data) 👉🏽 https://aka.ms/JSAIBuildathonQuest4
Quest 5: 🧠 Add memory and context to your AI app 👉🏽 https://aka.ms/JSAIBuildathonQuest5
Quest 6: ⚙️ Build your first AI Agent using AI Foundry 👉🏽 https://aka.ms/JSAIBuildathonQuest6
Quest 7: 🧩 Equip your agent with tools from an MCP server 👉🏽 https://aka.ms/JSAIBuildathonQuest7
Quest 8: 💬 Ground your agent with real-time search using Bing 👉🏽 https://aka.ms/JSAIBuildathonQuest8
Quest 9: 🚀 Build a real-world AI project with full-stack templates 👉🏽 https://aka.ms/JSAIBuildathonQuest9

Link to our space in the AI Discord Community: https://aka.ms/JSAIonDiscord

Project Submission Guidelines

📌 Quest 9 is where it all comes together. Participants chose a problem, picked a template, customized it, submitted it, and rallied their community for support!

🏅 Claim Your Badge!

Whether you completed select quests or went all the way, we celebrate your learning. If you participated in the June 2025 JS AI Build-a-thon, make sure to submit the Participation Form to receive your participation badge recognizing your commitment to upskilling in AI with JavaScript/TypeScript.

What’s Next?

We’re not done. In fact, we’re just getting started. We’re already cooking up JS AI Build-a-thon v2, which will introduce:

- Running everything locally with Foundry Local
- Real-world RAG with vector databases
- Advanced agent patterns with remote MCPs
- And much more based on your feedback

Want to shape what comes next? Drop your ideas in the participation form and in our Discord. In the meantime, add these resources to your JavaScript + AI Dev Pack:

🔗 Microsoft for JavaScript developers
📚 Generative AI for Beginners with JavaScript

Wrap-Up

This build-a-thon showed what’s possible when developers are empowered to learn by doing. You didn’t just follow tutorials — you shipped features, connected services, and created working AI experiences. We can’t wait to see what you build next.

👉 Bookmark the repo
👉 Join the community on the Azure AI Foundry Discord Server!
👉 Stay building

Until next time — keep coding, keep shipping!

Quest 5 - I want to add conversation memory to my app
In this quest, you’ll explore how to build GenAI apps using a modern JavaScript AI framework, LangChain.js. LangChain.js helps you orchestrate prompts, manage memory, and build multi-step AI workflows, all while staying in your favorite language. Using LangChain.js you will make your GenAI chat app feel truly personal by teaching it to remember. In this quest, you’ll upgrade your AI prototype with conversation memory, allowing it to recall previous interactions and making the conversation flow more naturally and human-like.

👉 Want to catch up on the full program or grab more quests? https://aka.ms/JSAIBuildathon
💬 Got questions or want to hang with other builders? Join us on Discord — head to the #js-ai-build-a-thon channel.

🔧 What You’ll Build

A smarter, context-aware chat backend that:
- Remembers user conversations across multiple exchanges (e.g., knowing "Terry" after you introduced yourself as Terry)
- Maintains session-specific memory so each chat thread feels consistent and coherent
- Uses LangChain.js abstractions to streamline state management

🚀 What You’ll Need

✅ A GitHub account
✅ Visual Studio Code
✅ Node.js
✅ A working chat app from previous quests (UI + Azure-based chat endpoint)

🛠️ Concepts You’ll Explore

Integrating LangChain.js
Learn how LangChain.js simplifies building AI-powered web applications by providing a standard interface to connect your backend with Azure’s language models. You’ll see how using this framework decouples your code and unlocks advanced features.

Adding Conversation Memory
Understand why memory matters in chatbots. Explore how conversation memory lets your app remember previous user messages within each session, enabling more context-aware and coherent conversations.

Session-based Message History
Implement session-specific chat histories using LangChain’s memory modules (ChatMessageHistory and BufferMemory). Each user or session gets its own history, so previous questions and answers inform future responses without manual log management.

Seamless State Management
Experience how LangChain handles chat logs and memory behind the scenes, freeing you from manually stitching together chat history or juggling context with every prompt.

📖 Bonus Resources to Go Deeper

- Exploring Generative AI in App Development: LangChain.js and Azure: a video introduction to LangChain.js and how you can build a project with LangChain.js and Azure
- 🦜️🔗 Langchain: the official LangChain.js documentation
- GitHub - Azure-Samples/serverless-chat-langchainjs: Build your own serverless AI Chat with Retrieval-Augmented-Generation using LangChain.js, TypeScript and Azure: a GitHub sample that helps you build your own serverless AI chat with Retrieval-Augmented Generation using LangChain.js, TypeScript, and Azure
- GitHub - Azure-Samples/langchainjs-quickstart-demo: Build a generative AI application using LangChain.js, from local to Azure: a GitHub sample that helps you build a generative AI application using LangChain.js, from local to Azure
- Microsoft | 🦜️🔗 Langchain: official LangChain documentation on all functionalities related to Microsoft and Microsoft Azure
- Quest 4 - I want to connect my AI prototype to external data using RAG | Microsoft Community Hub: a link to the previous quest's instructions

Quest 4 - I want to connect my AI prototype to external data using RAG
In Quest 4 of the JS AI Build-a-thon, you’ll integrate Retrieval-Augmented Generation (RAG) to give your AI apps access to external data like PDFs. You’ll explore embeddings, vector stores, and how to use the pdf-parse library in JavaScript to build more context-aware apps — with challenges to push you even further.

Quest 6 - I want to build an AI Agent
Quest 6 of the JS AI Build-a-thon marks a major milestone — building your first intelligent AI agent using the Azure AI Foundry VS Code extension. In this quest, you’ll design, test, and integrate an agent that can use tools like Bing Search, respond to goals, and adapt in real time. With updated instructions, real-world workflows, and powerful tooling, this is where your AI app gets truly smart.

Quest 7: Create an AI Agent with Tools from an MCP Server
In Quest 7 of the JS AI Build-a-thon, developers explore how to create AI agents that use real tools through the Model Context Protocol (MCP). With the MCP TypeScript SDK and the AI Toolkit in VS Code, you’ll connect your agent to a custom MCP server and give it real capabilities, like accessing your system's OS info. This builds on agentic development and introduces tooling practices that reflect how modern AI apps are built.

Quest 9: I want to use a ready-made template
Building robust, scalable AI apps is tough, especially when you want to move fast, follow best practices, and avoid being bogged down by endless setup and configuration. In this quest, you’ll discover how to accelerate your journey from prototype to production by leveraging ready-made templates and modern cloud tools. Say goodbye to decision fatigue and hello to streamlined, industry-approved workflows you can make your own.

👉 Want to catch up on the full program or grab more quests? https://aka.ms/JSAIBuildathon
💬 Got questions or want to hang with other builders? Join us on Discord — head to the #js-ai-build-a-thon channel.

🚀 What You’ll Build

- A fully functional AI application deployed on Azure, customized to solve a real problem that matters to you.
- A codebase powered by a production-grade template, complete with all the necessary infrastructure-as-code, deployment scripts, and best practices already baked in.
- Your own proof-of-concept or MVP, ready to scale or show off to the world.

🛠️ What You Need

✅ GitHub account
✅ Visual Studio Code
✅ Node.js
✅ Azure subscription (free trials and student credits available)
✅ Azure Developer CLI (azd)
✅ The curiosity to solve a meaningful problem!

🧩 Concepts You’ll Explore

Azure Developer CLI (azd)
Learn how azd, the developer-first command-line tool, simplifies authentication, setup, deployment, and teardown for Azure apps. With intuitive commands like azd up and azd deploy, you can go from zero to running in the cloud, with no deep cloud expertise required.

Production-Ready Templates
Explore a gallery of customizable templates designed to get your app up and running fast. These templates aren’t just “hello world”: they feature scalable architectures, sample code, and reusable infrastructure assets to launch everything from chatbots to RAG apps to full-stack solutions.

Infrastructure as Code (IaC)
See how every template bundles configuration files and scripts to automatically provision the cloud resources you need. You’ll get a taste of how top teams ship secure, repeatable, and maintainable systems without manually clicking through Azure dashboards.

Best Practices by Default
Templates incorporate industry best practices for code structure, deployment, and scalability. You’ll spend less time researching how to “do it right” and more time customizing your application to fit your unique use case.

Customization for Real-World Problems
Pick a template and make it yours! Whether you’re building a copilot, a chat-enabled app, or a serverless API, you’ll learn how to tweak the frontend, swap out backend logic, connect your own data sources, and shape the solution to solve a real-world problem you care about.

🌟 Bonus Resources

Here are some additional resources to help you learn more about the Azure Developer CLI (azd) and the templates available:
- Kickstart JS/TS projects with azd Templates
- Kickstart your JavaScript projects with azd on YouTube

⏭️ What next?

With production-ready templates and the Azure Developer CLI at your side, you’re ready to move from “just an idea” to a deployable, scalable solution without reinventing the wheel. Start with the right foundation, customize with confidence, and ship your next AI app like a pro! Once you have your project done, ensure you submit it to GitHub - Azure-Samples/JS-AI-Build-a-thon

AI Repo of the Week: Generative AI for Beginners with JavaScript
Introduction

Ready to explore the fascinating world of Generative AI using your JavaScript skills? This week’s featured repository, Generative AI for Beginners with JavaScript, is your launchpad into the future of application development. Whether you're just starting out or looking to expand your AI toolbox, this open-source GitHub resource offers a rich, hands-on journey. It includes interactive lessons, quizzes, and even time-travel storytelling featuring historical legends like Leonardo da Vinci and Ada Lovelace. Each chapter combines narrative-driven learning with practical exercises, helping you understand foundational AI concepts and apply them directly in code. It’s immersive, educational, and genuinely fun.

What You'll Learn

1. 🧠 Foundations of Generative AI and LLMs
Start with the basics: What is generative AI? How do large language models (LLMs) work? This chapter lays the groundwork for how these technologies are transforming JavaScript development.

2. 🚀 Build Your First AI-Powered App
Walk through setting up your environment and creating your first AI app. Learn how to configure prompts and unlock the potential of AI in your own projects.

3. 🎯 Prompt Engineering Essentials
Get hands-on with prompt engineering techniques that shape how AI models respond. Explore strategies for crafting prompts that are clear, targeted, and effective.

4. 📦 Structured Output with JSON
Learn how to guide the model to return structured data formats like JSON—critical for integrating AI into real-world applications.

5. 🔍 Retrieval-Augmented Generation (RAG)
Go beyond static prompts by combining LLMs with external data sources. Discover how RAG lets your app pull in live, contextual information for more intelligent results.

6. 🛠️ Function Calling and Tool Use
Give your LLM new powers! Learn how to connect your own functions and tools to your app, enabling more dynamic and actionable AI interactions.

7. 📚 Model Context Protocol (MCP)
Dive into MCP, a new standard for organizing prompts, tools, and resources. Learn how it simplifies AI app development and fosters consistency across projects.

8. ⚙️ Enhancing MCP Clients with LLMs
Build on what you’ve learned by integrating LLMs directly into your MCP clients. See how to make them smarter, faster, and more helpful.

✨ More chapters coming soon—watch the repo for updates!

Companion App: Interact with History

Experience the power of generative AI in action through the companion web app—where you can chat with historical figures and witness how JavaScript brings AI to life in real time.

Conclusion

Generative AI for Beginners with JavaScript is more than a course—it’s an adventure into how storytelling, coding, and AI can come together to create something fun and educational. Whether you're here to upskill, experiment, or build the next big thing, this repository is your all-in-one resource to get started with confidence.

🔗 Jump into the future of development—check out the repo and start building with AI today!