- Introducing langchain-azure-storage: Azure Storage integrations for LangChain

We're excited to introduce langchain-azure-storage, the first official Azure Storage integration package built by Microsoft for LangChain 1.0. As part of its launch, we've built a new Azure Blob Storage document loader (currently in public preview) that improves upon prior LangChain community implementations. This new loader unifies both blob-level and container-level access, simplifying loader integration. More importantly, it offers enhanced security through default OAuth 2.0 authentication, supports reliably loading millions to billions of documents through efficient memory utilization, and allows pluggable parsing, so you can leverage other document loaders to parse specific file formats.

What are LangChain document loaders?

A typical Retrieval-Augmented Generation (RAG) pipeline follows these main steps:

1. Collect source content (PDFs, DOCX, Markdown, CSVs), often stored in Azure Blob Storage.
2. Parse it into text and associated metadata (i.e., represented as LangChain Document objects).
3. Chunk and embed those documents and store them in a vector store (e.g., Azure AI Search, Postgres pgvector).
4. At query time, retrieve the most relevant chunks and feed them to an LLM as grounded context.

LangChain document loaders make steps 1–2 turnkey and consistent so the rest of the stack (splitters, vector stores, retrievers) "just works". See this LangChain RAG tutorial for a full example of these steps when building a RAG application in LangChain.

How can the Azure Blob Storage document loader help?

The langchain-azure-storage package offers the AzureBlobStorageLoader, a document loader that simplifies retrieving documents stored in Azure Blob Storage for use in a LangChain RAG application. Key benefits of the AzureBlobStorageLoader include:

- Flexible loading of Azure Storage blobs to LangChain Document objects.
  You can load blobs as documents from an entire container, a specific prefix within a container, or by blob names. By default, each loaded document corresponds 1:1 to a blob in the container.
- Lazy loading support for improved memory efficiency when dealing with large document sets. Documents can now be loaded one at a time as you iterate over them instead of all at once.
- Automatic use of DefaultAzureCredential to enable seamless OAuth 2.0 authentication across various environments, from local development to Azure-hosted services. You can also explicitly pass your own credential (e.g., ManagedIdentityCredential, SAS token).
- Pluggable parsing. Easily customize how documents are parsed by providing your own LangChain document loader to parse downloaded blob content.

Using the Azure Blob Storage document loader

Installation

To install the langchain-azure-storage package, run:

pip install langchain-azure-storage

Loading documents from a container

To load all blobs from an Azure Blob Storage container as LangChain Document objects, instantiate the AzureBlobStorageLoader with the Azure Storage account URL and container name:

from langchain_azure_storage.document_loaders import AzureBlobStorageLoader

loader = AzureBlobStorageLoader(
    "https://<your-storage-account>.blob.core.windows.net/",
    "<your-container-name>",
)

# lazy_load() yields one Document per blob for all blobs in the container
for doc in loader.lazy_load():
    print(doc.metadata["source"])  # The "source" metadata contains the full URL of the blob
    print(doc.page_content)  # The page_content contains the blob's content decoded as UTF-8 text

Loading documents by blob names

To load only specific blobs as LangChain Document objects, you can additionally provide a list of blob names:

from langchain_azure_storage.document_loaders import AzureBlobStorageLoader

loader = AzureBlobStorageLoader(
    "https://<your-storage-account>.blob.core.windows.net/",
    "<your-container-name>",
    ["<blob-name-1>", "<blob-name-2>"],
)

# lazy_load() yields one Document per blob for only the specified blobs
for doc in loader.lazy_load():
    print(doc.metadata["source"])  # The "source" metadata contains the full URL of the blob
    print(doc.page_content)  # The page_content contains the blob's content decoded as UTF-8 text

Pluggable parsing

By default, loaded Document objects contain the blob's UTF-8 decoded content. To parse non-UTF-8 content (e.g., PDFs, DOCX) or chunk blob content into smaller documents, provide a LangChain document loader via the loader_factory parameter. When loader_factory is provided, the AzureBlobStorageLoader processes each blob with the following steps:

1. Downloads the blob to a new temporary file.
2. Passes the temporary file path to the loader_factory callable to instantiate a document loader.
3. Uses that loader to parse the file and yield Document objects.
4. Cleans up the temporary file.

For example, the following code parses PDF documents with the PyPDFLoader from the langchain-community package:

from langchain_azure_storage.document_loaders import AzureBlobStorageLoader
from langchain_community.document_loaders import PyPDFLoader  # Requires langchain-community and pypdf packages

loader = AzureBlobStorageLoader(
    "https://<your-storage-account>.blob.core.windows.net/",
    "<your-container-name>",
    prefix="pdfs/",  # Only load blobs that start with "pdfs/"
    loader_factory=PyPDFLoader,  # PyPDFLoader will parse each blob as a PDF
)

# Each blob is downloaded to a temporary file and parsed by a PyPDFLoader instance
for doc in loader.lazy_load():
    print(doc.page_content)  # Content parsed by PyPDFLoader (yields one Document per page in the PDF)

This file path-based interface allows you to use any LangChain document loader that accepts a local file path as input, giving you access to a wide range of parsers for different file formats.
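To make the four loader_factory steps concrete, here is a minimal, dependency-free sketch of the same download-to-temporary-file pattern, written as a generator so documents are produced lazily, one blob at a time. Everything in it is hypothetical and for illustration only: the in-memory dict stands in for an Azure Blob Storage container, Doc stands in for LangChain's Document, and LineLoader is a toy path-based loader. It is not the langchain-azure-storage implementation.

```python
import os
import tempfile
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""

    page_content: str
    metadata: dict = field(default_factory=dict)


class LineLoader:
    """Hypothetical path-based loader: yields one Doc per non-empty line of a text file."""

    def __init__(self, file_path: str) -> None:
        self.file_path = file_path

    def lazy_load(self) -> Iterator[Doc]:
        with open(self.file_path, encoding="utf-8") as f:
            for line_no, line in enumerate(f):
                if line.strip():
                    yield Doc(line.strip(), {"line": line_no})


def load_blobs(
    blobs: Dict[str, bytes], loader_factory: Callable[[str], LineLoader]
) -> Iterator[Doc]:
    """Lazily parse each (simulated) blob through a temporary file."""
    for blob_name, data in blobs.items():
        # Step 1: "download" the blob content to a new temporary file
        fd, path = tempfile.mkstemp()
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(data)
            # Step 2: pass the temporary file path to the factory to build a loader
            loader = loader_factory(path)
            # Step 3: use that loader to parse the file, yielding documents one at a time
            for doc in loader.lazy_load():
                doc.metadata["source"] = blob_name
                yield doc
        finally:
            # Step 4: clean up the temporary file
            os.remove(path)


if __name__ == "__main__":
    fake_container = {"notes/a.txt": b"first\n\nsecond\n", "notes/b.txt": b"third\n"}
    for doc in load_blobs(fake_container, loader_factory=LineLoader):
        print(doc.metadata["source"], doc.page_content)
```

Passing the class itself as loader_factory works because calling LineLoader(path) returns a loader, which is the same reason loader_factory=PyPDFLoader works in the PDF example. Because load_blobs is a generator, only one temporary file exists at a time, no matter how many blobs the container holds.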
Migrating from community document loaders to langchain-azure-storage

If you're currently using AzureBlobStorageContainerLoader or AzureBlobStorageFileLoader from the langchain-community package, the new AzureBlobStorageLoader provides an improved alternative. This section provides step-by-step guidance for migrating to the new loader.

Steps to migrate

To migrate to the new Azure Storage document loader, make the following changes:

1. Depend on the langchain-azure-storage package.
2. Update import statements from langchain_community.document_loaders to langchain_azure_storage.document_loaders.
3. Change class names from AzureBlobStorageFileLoader and AzureBlobStorageContainerLoader to AzureBlobStorageLoader.
4. Update document loader constructor calls to:
   - Use an account URL instead of a connection string.
   - Specify UnstructuredLoader as the loader_factory to continue to use Unstructured for parsing documents.
5. Enable Microsoft Entra ID authentication in your environment (e.g., run az login or configure managed identity) instead of using connection string authentication.
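Switching constructor calls from a connection string to an account URL is mechanical. If you have many connection strings to convert, a small helper such as the hypothetical blob_account_url below can derive the Blob endpoint from a connection string's fields. It's a sketch that assumes the common DefaultEndpointsProtocol/AccountName/AccountKey/EndpointSuffix form, honors an explicit BlobEndpoint if present, and deliberately ignores AccountKey, since the new loader authenticates with Microsoft Entra ID instead.

```python
def blob_account_url(connection_string: str) -> str:
    """Derive a Blob service account URL from an Azure Storage connection string.

    Hypothetical helper: returns BlobEndpoint if present, otherwise builds
    <protocol>://<AccountName>.blob.<EndpointSuffix>. AccountKey is ignored.
    """
    parts = {}
    for segment in connection_string.strip().strip(";").split(";"):
        if not segment:
            continue
        # Split on the first "=" only: base64 account keys can end with "=="
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"malformed segment: {segment!r}")
        parts[key.strip()] = value.strip()

    if "BlobEndpoint" in parts:
        return parts["BlobEndpoint"].rstrip("/")
    if "AccountName" not in parts:
        raise ValueError("connection string has no AccountName")
    protocol = parts.get("DefaultEndpointsProtocol", "https")
    suffix = parts.get("EndpointSuffix", "core.windows.net")
    return f"{protocol}://{parts['AccountName']}.blob.{suffix}"


if __name__ == "__main__":
    conn = (
        "DefaultEndpointsProtocol=https;AccountName=myaccount;"
        "AccountKey=abc123==;EndpointSuffix=core.windows.net"
    )
    print(blob_account_url(conn))  # https://myaccount.blob.core.windows.net
```

The resulting URL has the same shape as the account URL passed to AzureBlobStorageLoader in the "after migration" samples.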
Migration samples

The following code snippets show what usage patterns look like before and after migrating from langchain-community to langchain-azure-storage:

Before migration

from langchain_community.document_loaders import AzureBlobStorageContainerLoader, AzureBlobStorageFileLoader

container_loader = AzureBlobStorageContainerLoader(
    "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<account-key>;EndpointSuffix=core.windows.net",
    "<container>",
)

file_loader = AzureBlobStorageFileLoader(
    "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<account-key>;EndpointSuffix=core.windows.net",
    "<container>",
    "<blob>",
)

After migration

from langchain_azure_storage.document_loaders import AzureBlobStorageLoader
from langchain_unstructured import UnstructuredLoader  # Requires langchain-unstructured and unstructured packages

container_loader = AzureBlobStorageLoader(
    "https://<account>.blob.core.windows.net",
    "<container>",
    loader_factory=UnstructuredLoader,  # Only needed if continuing to use Unstructured for parsing
)

file_loader = AzureBlobStorageLoader(
    "https://<account>.blob.core.windows.net",
    "<container>",
    "<blob>",
    loader_factory=UnstructuredLoader,  # Only needed if continuing to use Unstructured for parsing
)

What's next?

We're excited for you to try the new Azure Blob Storage document loader and would love to hear your feedback! Here are some ways you can help shape the future of langchain-azure-storage:

- Show support for interface stabilization: The document loader is currently in public preview, and the interface may change in future versions based on feedback. If you'd like to see the current interface marked as stable, upvote the proposal PR to show your support.
- Report issues or suggest improvements: Found a bug or have an idea to make the document loaders better? File an issue on our GitHub repository.
- Propose new LangChain integrations: Interested in other ways to use Azure Storage with LangChain (e.g., checkpointing for agents, persistent memory stores, retriever implementations)? Create a feature request or write to us to let us know.

Your input is invaluable in making langchain-azure-storage better for the entire community!

Resources

- langchain-azure GitHub repository
- langchain-azure-storage PyPI package
- AzureBlobStorageLoader usage guide
- AzureBlobStorageLoader documentation reference
- Level up your Python + AI skills with our complete series

We've just wrapped up our live series on Python + AI, a comprehensive nine-part journey diving deep into how to use generative AI models from Python. The series introduced multiple types of models, including LLMs, embedding models, and vision models. We dug into popular techniques like RAG, tool calling, and structured outputs. We assessed AI quality and safety using automated evaluations and red-teaming. Finally, we developed AI agents using popular Python agent frameworks and explored the new Model Context Protocol (MCP).

To help you apply what you've learned, all of our code examples work with GitHub Models, a service that provides free models to every GitHub account holder for experimentation and education. Even if you missed the live series, you can still access all the material using the links below! If you're an instructor, feel free to use the slides and code examples in your own classes. If you're a Spanish speaker, check out the Spanish version of the series.

Python + AI: Large Language Models

📺 Watch recording

In this session, we explore Large Language Models (LLMs), the models that power ChatGPT and GitHub Copilot. We use Python to interact with LLMs using popular packages like the OpenAI SDK and LangChain. We experiment with prompt engineering and few-shot examples to improve outputs. We also demonstrate how to build a full-stack app powered by LLMs and explain the importance of concurrency and streaming for user-facing AI apps.

Slides for this session
Code repository with examples: python-openai-demos

Python + AI: Vector embeddings

📺 Watch recording

In our second session, we dive into a different type of model: the vector embedding model. A vector embedding is a way to encode text or images as an array of floating-point numbers. Vector embeddings enable similarity search across many types of content.
In this session, we explore different vector embedding models, such as the OpenAI text-embedding-3 series, through both visualizations and Python code. We compare distance metrics, use quantization to reduce vector size, and experiment with multimodal embedding models.

Slides for this session
Code repository with examples: vector-embedding-demos

Python + AI: Retrieval Augmented Generation

📺 Watch recording

In our third session, we explore one of the most popular techniques used with LLMs: Retrieval Augmented Generation. RAG is an approach that provides context to the LLM, enabling it to deliver well-grounded answers for a particular domain. The RAG approach works with many types of data sources, including CSVs, webpages, documents, and databases. In this session, we walk through RAG flows in Python, starting with a simple flow and culminating in a full-stack RAG application based on Azure AI Search.

Slides for this session
Code repository with examples: python-openai-demos

Python + AI: Vision models

📺 Watch recording

Our fourth session is all about vision models! Vision models are LLMs that can accept both text and images, such as GPT-4o and GPT-4o mini. You can use these models for image captioning, data extraction, question answering, classification, and more! We use Python to send images to vision models, build a basic chat-with-images app, and create a multimodal search engine.

Slides for this session
Code repository with examples: openai-chat-vision-quickstart

Python + AI: Structured outputs

📺 Watch recording

In our fifth session, we discover how to get LLMs to output structured responses that adhere to a schema. In Python, all you need to do is define a Pydantic BaseModel to get validated output that perfectly meets your needs. We focus on the structured outputs mode available in OpenAI models, but you can use similar techniques with other model providers.
Our examples demonstrate the many ways you can use structured responses, such as entity extraction, classification, and agentic workflows.

Slides for this session
Code repository with examples: python-openai-demos

Python + AI: Quality and safety

📺 Watch recording

This session covers a crucial topic: how to use AI safely and how to evaluate the quality of AI outputs. There are multiple mitigation layers when working with LLMs: the model itself, a safety system on top, the prompting and context, and the application user experience. We focus on Azure tools that make it easier to deploy safe AI systems into production. We demonstrate how to configure the Azure AI Content Safety system when working with Azure AI models and how to handle errors in Python code. Then we use the Azure AI Evaluation SDK to evaluate the safety and quality of output from your LLM.

Slides for this session
Code repository with examples: ai-quality-safety-demos

Python + AI: Tool calling

📺 Watch recording

In the seventh session, we focus on the technologies needed to build AI agents, starting with the foundation: tool calling (also known as function calling). We define tool call specifications using both JSON schema and Python function definitions, then send these definitions to the LLM. We demonstrate how to properly handle tool call responses from LLMs, enable parallel tool calling, and iterate over multiple tool calls. Understanding tool calling is absolutely essential before diving into agents, so don't skip over this foundational session.

Slides for this session
Code repository with examples: python-openai-demos

Python + AI: Agents

📺 Watch recording

In the penultimate session, we build AI agents! We use Python AI agent frameworks such as the new agent-framework from Microsoft and the popular LangGraph framework.
Our agents start simple and then increase in complexity, demonstrating different architectures such as multiple tools, supervisor patterns, graphs, and human-in-the-loop workflows.

Slides for this session
Code repository with examples: python-ai-agent-frameworks-demos

Python + AI: Model Context Protocol

📺 Watch recording

In the final session, we dive into the hottest technology of 2025: MCP (Model Context Protocol). This open protocol makes it easy to extend AI agents and chatbots with custom functionality, making them more powerful and flexible. We demonstrate how to use the Python FastMCP SDK to build an MCP server running locally and consume that server from chatbots like GitHub Copilot. Then we build our own MCP client to consume the server. Finally, we discover how easy it is to connect AI agent frameworks like LangGraph and Microsoft agent-framework to MCP servers. With great power comes great responsibility, so we briefly discuss the security risks that come with MCP, both as a user and as a developer.

Slides for this session
Code repository with examples: python-mcp-demo
- Using Scikit-learn on Azure Web App

TOC

1. Introduction to Scikit-learn
2. System Architecture
   - Architecture
   - Focus of This Tutorial
3. Setup Azure Resources
   - Web App
   - Storage
4. Running Locally
   - File and Directory Structure
   - Training Models and Training Data
   - Predicting with the Model
5. Publishing the Project to Azure
   - Deployment
   - Configuration
6. Running on Azure Web App
   - Training the Model
   - Using the Model for Prediction
7. Troubleshooting
   - Missing Environment Variables After Deployment
   - Virtual Environment Resource Lock Issues
   - Package Version Dependency Issues
   - Default Binding
   - Missing System Commands in Restricted Environments
8. Conclusion
9. References

1. Introduction to Scikit-learn

Scikit-learn is a popular open-source Python library for machine learning, built on NumPy, SciPy, and matplotlib. It offers an efficient and easy-to-use toolkit for data analysis, data mining, and predictive modeling. Scikit-learn supports a variety of machine learning algorithms, including classification, regression, clustering, and dimensionality reduction (e.g., SVM, Random Forest, K-means). Its preprocessing utilities handle tasks like scaling, encoding, and missing data imputation. It also provides tools for model evaluation (e.g., accuracy, precision, recall) and pipeline creation, enabling users to chain preprocessing and model training into seamless workflows.

2. System Architecture

Architecture

Development Environment
- OS: Windows 11
- Version: 24H2
- Python Version: 3.7.3

Azure Resources
- App Service Plan: SKU - Premium Plan 0 V3
- App Service: Platform - Linux (Python 3.9, Version 3.9.19)
- Storage Account: SKU - General Purpose V2
- File Share: No backup plan

Focus of This Tutorial

This tutorial walks you through the following stages:
- Setting up Azure resources
- Running the project locally
- Publishing the project to Azure
- Running the application on Azure
- Troubleshooting common issues

Each of the mentioned aspects has numerous corresponding tools and solutions.
The relevant information for this session is listed in the table below.

Aspect | Options | Used in this tutorial
Local OS | Windows, Linux, Mac | Windows
How to set up Azure resources | Portal (i.e., REST API), ARM, Bicep, Terraform | Portal
How to deploy project to Azure | VSCode, CLI, Azure DevOps, GitHub Action | VSCode

3. Setup Azure Resources

Web App

We need to create the following resources or services:

Resource/Service | Type | Manual Creation Required
App Service Plan | Resource | No
App Service | Resource | Yes
Storage Account | Resource | Yes
File Share | Service | Yes

Go to the Azure Portal and create an App Service. Important configuration:
- OS: Select Linux (default if Python stack is chosen).
- Stack: Select Python 3.9 to avoid dependency issues.
- SKU: Choose at least a Premium plan to ensure enough memory for your AI workloads.

Storage

1. Create a Storage Account in the Azure Portal.
2. Create a file share named data-and-model in the Storage Account.
3. Mount the File Share to the App Service: Use the name data-and-model for consistency with tutorial paths.

At this point, all Azure resources and services have been successfully created.

Let's take a slight detour and mount the newly created File Share to your Windows development environment. Navigate to the File Share you just created, and refer to the diagram below to copy the required command. Before copying, ensure that the drive letter remains set to the default "Z", as the sample code in this tutorial relies on it.

Return to your development environment. Open a PowerShell terminal (do not run it as Administrator) and enter the command copied in the previous step, as shown in the diagram. After the command runs, the network drive will be mounted. You can open File Explorer to verify, as illustrated in the diagram.

4. Running Locally

File and Directory Structure

Please use VSCode to open a PowerShell terminal and enter the following commands:

git clone https://github.com/theringe/azure-appservice-ai.git
cd azure-appservice-ai
.\scikit-learn\tools\add-venv.cmd

If you are using a Linux or Mac platform, use the following alternative commands instead:

git clone https://github.com/theringe/azure-appservice-ai.git
cd azure-appservice-ai
bash ./scikit-learn/tools/add-venv.sh

After completing the execution, you should see the following directory structure (file or path: purpose):

- scikit-learn/tools/add-venv.*: The script executed in the previous step (cmd for Windows, sh for Linux/Mac) to create all Python virtual environments required for this tutorial.
- .venv/scikit-learn-webjob/: A virtual environment specifically used for training models.
- scikit-learn/webjob/requirements.txt: The list of packages (with exact versions) required for the scikit-learn-webjob virtual environment.
- .venv/scikit-learn/: A virtual environment specifically used for the Flask application, enabling API endpoint access for querying predictions.
- scikit-learn/requirements.txt: The list of packages (with exact versions) required for the scikit-learn virtual environment.
- scikit-learn/: The main folder for this tutorial.
- scikit-learn/tools/create-folder.*: A script to create all directories required for this tutorial in the File Share, including train, model, and test.
- scikit-learn/tools/download-sample-training-set.*: A script to download a sample training set from the UCI Machine Learning Repository, containing heart disease data, into the train directory of the File Share.
- scikit-learn/webjob/train_heart_disease_model.py: A script for training the model. It loads the training set, applies a machine learning algorithm (Logistic Regression), and saves the trained model in the model directory of the File Share.
- scikit-learn/webjob/train_heart_disease_model.sh: A shell script for Azure App Service web jobs.
  It activates the scikit-learn-webjob virtual environment and starts the train_heart_disease_model.py script.
- scikit-learn/webjob/train_heart_disease_model.zip: A ZIP file containing the shell script for Azure web jobs. It must be recreated manually whenever train_heart_disease_model.sh is modified. Ensure it does not include any directory structure.
- scikit-learn/api/app.py: Code for the Flask application, including routes, port configuration, input parsing, model loading, predictions, and output generation.
- scikit-learn/.deployment: A configuration file for deploying the project to Azure using VSCode. It disables the default Oryx build process in favor of custom scripts.
- scikit-learn/start.sh: A script executed after deployment (as specified in the Portal's startup command). It sets up the virtual environment and starts the Flask application to handle web requests.

Training Models and Training Data

Return to VSCode and execute the following commands (their purpose has been described earlier):

.\.venv\scikit-learn-webjob\Scripts\Activate.ps1
.\scikit-learn\tools\create-folder.cmd
.\scikit-learn\tools\download-sample-training-set.cmd
python .\scikit-learn\webjob\train_heart_disease_model.py

If you are using a Linux or Mac platform, use the following alternative commands instead:

source .venv/scikit-learn-webjob/bin/activate
bash ./scikit-learn/tools/create-folder.sh
bash ./scikit-learn/tools/download-sample-training-set.sh
python ./scikit-learn/webjob/train_heart_disease_model.py

After execution, the File Share will now include the following directories and files.

Let's take a brief detour to examine the structure of the training data downloaded from the public dataset website. The right side of the figure describes the meaning of each column in the dataset, while the left side shows the actual training data (after preprocessing). This is a predictive model that uses an individual's physiological characteristics to determine the likelihood of having heart disease.
Columns 1-13 represent various physiological features and background information of the patients, while Column 14 (originally Column 58) is the label indicating whether the individual has heart disease. The supervised learning process involves using a large dataset containing both features and labels. Machine learning algorithms (such as neural networks, SVMs, or in this case, logistic regression) identify the key features and their ranges that differentiate between labels. The trained model is then saved and can be used in services to predict outcomes in real time by simply providing the necessary features.

Predicting with the Model

Return to VSCode and execute the following commands. First, deactivate the virtual environment used for training the model, then activate the virtual environment for the Flask application, and finally, start the Flask app.

Commands for Windows:

deactivate
.\.venv\scikit-learn\Scripts\Activate.ps1
python .\scikit-learn\api\app.py

Commands for Linux or Mac:

deactivate
source .venv/scikit-learn/bin/activate
python ./scikit-learn/api/app.py

When you see a screen similar to the following, it means the server has started successfully. Press Ctrl+C to stop the server if needed.

Before conducting the actual test, let's construct some sample human feature data:

[63, 1, 3, 145, 233, 1, 0, 150, 0, 2.3, 0, 0, 1]
[63, 1, 3, 305, 233, 1, 0, 150, 0, 2.3, 0, 0, 1]

Referring to the feature description table from earlier, we can see that the only modified field is Column 4 ("Resting Blood Pressure"), with the second sample having an abnormally high value. (Note: Normal resting blood pressure ranges are typically 90–139 mmHg.)
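As a quick sanity check of the two samples, the stdlib-only sketch below confirms that they differ only in Column 4 and builds the same GET request with a JSON body that the curl commands send. The short feature names (e.g., trestbps for resting blood pressure) are the attribute names from the UCI heart disease dataset documentation, not names taken from this tutorial's app.py, and build_detect_request is a hypothetical helper; it only builds the request, so it runs without the Flask app.

```python
import json
import urllib.request

# UCI heart disease attribute names for the 13 input features, in column order.
# Column 14 ("num") is the label and is not sent to the API.
FEATURES = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal",
]

normal = [63, 1, 3, 145, 233, 1, 0, 150, 0, 2.3, 0, 0, 1]
high_bp = [63, 1, 3, 305, 233, 1, 0, 150, 0, 2.3, 0, 0, 1]


def changed_columns(a, b):
    """Return (1-based column number, feature name) for every position where a and b differ."""
    return [(i + 1, FEATURES[i]) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def build_detect_request(features, base_url="http://127.0.0.1:8000"):
    """Build (but do not send) a GET request with a JSON body, mirroring the curl commands."""
    if len(features) != len(FEATURES):
        raise ValueError(f"expected {len(FEATURES)} features, got {len(features)}")
    body = json.dumps({"info": features}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/detect",
        data=body,
        headers={"Content-Type": "application/json"},
        method="GET",
    )


if __name__ == "__main__":
    print(changed_columns(normal, high_bp))  # [(4, 'trestbps')]
    req = build_detect_request(high_bp)
    print(req.get_method(), req.full_url)
    # To actually send it while app.py is running locally:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode("utf-8"))
```

Passing method="GET" explicitly matters here: without it, urllib switches to POST whenever a request body is supplied, which would not match the curl -X GET commands.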
Next, open a PowerShell terminal and use the following curl commands to send requests to the app:

curl -X GET http://127.0.0.1:8000/api/detect -H "Content-Type: application/json" -d '{"info": [63, 1, 3, 145, 233, 1, 0, 150, 0, 2.3, 0, 0, 1]}'
curl -X GET http://127.0.0.1:8000/api/detect -H "Content-Type: application/json" -d '{"info": [63, 1, 3, 305, 233, 1, 0, 150, 0, 2.3, 0, 0, 1]}'

You should see the prediction results, confirming that the trained model is working as expected.

5. Publishing the Project to Azure

Deployment

In the VSCode interface, right-click on the target App Service where you plan to deploy your project. Manually select the local project folder named scikit-learn as the deployment source, as shown in the image below.

Configuration

After deployment, the App Service will not be functional yet and will still display the default welcome page. This is because the App Service has not been configured to build the virtual environment and start the Flask application.

To complete the setup, go to the Azure Portal and navigate to the App Service. The following steps are critical, and their execution order must be correct. To avoid delays, it's recommended to open two browser tabs beforehand, complete the settings in each, and apply them in sequence. Refer to the following two images for guidance.

You need to do the following:

1. Set the Startup Command: Specify the path to the script you deployed:
   bash /home/site/wwwroot/start.sh
2. Set two App Settings:
   - WEBSITES_CONTAINER_START_TIME_LIMIT=600: The value is in seconds, ensuring the Startup Command can continue execution beyond the default timeout of 230 seconds. This tutorial's Startup Command typically takes around 300 seconds, so setting it to 600 seconds provides a safety margin and accommodates future project expansion (e.g., adding more packages).
   - WEBSITES_ENABLE_APP_SERVICE_STORAGE=1: This setting is required to enable the App Service storage feature, which is necessary for using web jobs (e.g., for model training).

Step-by-step process:

1. Before clicking Continue in the first tab, switch to the second browser tab and set up all the app settings.
2. In the second tab, apply all app settings, then switch back to the first tab.
3. Click Continue in the first tab and wait several seconds for the operation to complete.
4. Once it completes, switch to the second tab and click Continue promptly, within 5 seconds, to finish applying all settings.

After completing the configuration, wait about 10 minutes for the settings to take effect. Then, navigate to the WebJobs section in the Azure Portal, upload the ZIP file mentioned in the earlier sections, and set its trigger type to Manual.

At this point, the entire deployment process is complete. For future code updates, you only need to redeploy from VSCode; there is no need to reconfigure settings in the Azure Portal.

6. Running on Azure Web App

Training the Model

Go to the Azure Portal, locate your App Service, and navigate to the WebJobs section. Click Start to initiate the job and wait for the results. During this process, you may need to manually refresh the page to check the status of the job execution. Refer to the image below for guidance.

Once you see the model report in the Logs, model training is complete and the Flask app is ready for predictions. You can also find the newly trained model in the File Share mounted in your local environment.

Using the Model for Prediction

Just like in local testing, open a PowerShell terminal and use the following curl commands to send requests to the app:

# Note: Replace both instances of scikit-learn-portal-app with the name of your web app.
curl -X GET https://scikit-learn-portal-app.azurewebsites.net/api/detect -H "Content-Type: application/json" -d '{"info": [63, 1, 3, 145, 233, 1, 0, 150, 0, 2.3, 0, 0, 1]}'
curl -X GET https://scikit-learn-portal-app.azurewebsites.net/api/detect -H "Content-Type: application/json" -d '{"info": [63, 1, 3, 305, 233, 1, 0, 150, 0, 2.3, 0, 0, 1]}'

As with the local environment, you should see the expected results.

7. Troubleshooting

Missing Environment Variables After Deployment

- Symptom: Even after setting values in App Settings (e.g., WEBSITES_CONTAINER_START_TIME_LIMIT), they do not take effect.
- Cause: App Settings (e.g., WEBSITES_CONTAINER_START_TIME_LIMIT, WEBSITES_ENABLE_APP_SERVICE_STORAGE) are reset after updating the startup command.
- Resolution: Use Azure CLI or the Azure Portal to reapply the App Settings after deployment. Alternatively, set the startup command first, and then apply app settings.

Virtual Environment Resource Lock Issues

- Symptom: The app fails to redeploy, even though no configuration or code changes were made.
- Cause: The virtual environment folder cannot be deleted because files or processes from the previous virtual environment session still hold locks on it.
- Resolution: Deactivate processes before deletion and use unique epoch-based folder names to avoid conflicts. Refer to scikit-learn/start.sh in this tutorial for the implementation.

Package Version Dependency Issues

- Symptom: Conflicts occur between package versions specified in requirements.txt and the versions required by the Python environment, resulting in errors during installation or runtime.
- Cause: Azure deployment environments enforce specific versions of Python and pre-installed packages, leading to mismatches when older or newer versions are explicitly defined. Additionally, the read-only file system in Azure App Service prevents modifying global packages like typing-extensions.
- Resolution: Pin compatible dependency versions.
  For example, follow the instructions for installing scikit-learn from the scikit-learn 1.5.2 documentation. Refer to scikit-learn/requirements.txt in this tutorial.

Default Binding

- Symptom: Despite setting the WEBSITES_PORT parameter in App Settings to match the port Flask listens on (e.g., Flask's default 5000), the deployment still fails.
- Cause: The Flask framework's default settings are not overridden to bind to 0.0.0.0 or the required port.
- Resolution: Explicitly bind Flask to 0.0.0.0:8000 in app.py. To avoid additional issues, it's recommended to use the Azure Python Linux Web App's default port (8000), as this minimizes the need for extra configuration.

Missing System Commands in Restricted Environments

- Symptom: The WebJobs log shows an error stating that the ls command is missing.
- Cause: This typically occurs in minimal environments, such as Azure App Services, containers, or highly restricted shells.
- Resolution: Use predefined paths or variables in the script instead of relying on system commands. Refer to scikit-learn/webjob/train_heart_disease_model.sh in this tutorial for an example of handling such cases.

8. Conclusion

Azure App Service, while being a PaaS product with less flexibility than a VM, still offers several powerful features that allow us to fully leverage the benefits of AI frameworks. For example, the resource-intensive model training phase can be offloaded to a high-performance local machine. This approach enables the App Service to focus solely on loading models and serving predictions. Additionally, if the training dataset is frequently updated, we can configure WebJobs with scheduled triggers to retrain the model periodically, ensuring the prediction service always uses the latest version. These capabilities make Azure App Service well-suited for most business scenarios.

9. References

- Scikit-learn Documentation
- UCI Machine Learning Repository
- Azure App Service Documentation
- Scaling Azure Functions Python with orjson Azure Functions now supports ORJSON in the Python worker, giving developers an easy way to boost performance by simply adding the library to their environment. Benchmarks show that ORJSON delivers measurable gains in throughput and latency, with the biggest improvements on small–medium payloads common in real-world workloads. In tests, ORJSON improved throughput by up to 6% on 35 KB payloads and significantly reduced response times under load, while also eliminating dropped requests in high-throughput scenarios. With its Rust-based speed, standards compliance, and drop-in adoption, ORJSON offers a straightforward path to faster, more scalable Python Functions without any code changes.
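The Functions Python worker uses orjson once it is installed (for example, by adding `orjson` to requirements.txt). The open question is whether your own payloads benefit. The sketch below is a rough local micro-benchmark that answers that under stated assumptions. The record shape is invented for illustration. This is not the methodology behind the numbers above, and encoder speed alone does not translate directly into end-to-end function throughput.

```python
import json
import timeit

try:
    import orjson  # optional: install it to compare against the stdlib encoder
except ImportError:
    orjson = None  # the sketch still runs, timing only the stdlib json module


def build_payload(target_bytes: int) -> dict:
    """Build a JSON-serializable dict whose encoded size is at least target_bytes."""
    items = []
    payload = {"items": items}
    i = 0
    while len(json.dumps(payload)) < target_bytes:
        # Add a batch of small synthetic records per size check to keep the loop cheap
        for _ in range(50):
            items.append({"id": i, "name": f"item-{i}", "tags": ["a", "b", "c"], "score": i / 2})
            i += 1
    return payload


def time_serializers(payload: dict, number: int = 200) -> dict:
    """Return total seconds spent encoding payload `number` times, per library."""
    # Encode to bytes in both cases, since orjson.dumps returns bytes
    timings = {"json": timeit.timeit(lambda: json.dumps(payload).encode("utf-8"), number=number)}
    if orjson is not None:
        timings["orjson"] = timeit.timeit(lambda: orjson.dumps(payload), number=number)
    return timings


if __name__ == "__main__":
    data = build_payload(35_000)  # roughly the 35 KB payload size mentioned above
    for name, seconds in time_serializers(data).items():
        print(f"{name}: {seconds * 1000:.1f} ms for 200 encodes")
```

For a more realistic comparison, replace the synthetic records with a sample of your function's real responses.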
- LangChain v1 is now generally available! Today LangChain v1 officially launches and marks a new era for the popular AI agent library. The new version ushers in a more streamlined and extensible foundation for building agentic LLM applications. In this post, we'll break down what’s new, what changed, and what “general availability” means in practice. Join Microsoft Developer Advocates, Marlene Mhangami and Yohan Lasorsa, to see live demos of the new API and find out more about what JavaScript and Python developers need to know about v1. Register for this event here. Why v1? The Motivation Behind the Redesign The number of abstractions in LangChain had grown over the years to include chains, agents, tools, wrappers, prompt helpers and more, which, while powerful, introduced complexity and fragmentation. As model APIs evolve (multimodal inputs, richer structured output, tool-calling semantics), LangChain needed a cleaner, more consistent core to ensure production-ready stability. In v1: All existing chains and agent abstractions in the old LangChain are deprecated; they are replaced by a single high-level agent abstraction built on LangGraph internals. LangGraph becomes the foundational runtime for durable, stateful, orchestrated execution. LangChain now emphasizes being the “fast path to agents” that doesn’t hide but builds upon LangGraph. The internal message format has been upgraded to support standard content blocks (e.g. text, reasoning, citations, tool calls) across model providers, decoupling “content” from raw strings. Namespace cleanup: the langchain package now focuses tightly on core abstractions (agents, models, messages, tools), while legacy patterns are moved into langchain-classic (or equivalents). What’s New & Noteworthy for Developers Here are key changes developers should pay attention to: 1. create_agent becomes the default API The create_agent function is now the idiomatic way to spin up agents in v1. It replaces older constructs (e.g.
create_react_agent) with a clearer, more modular API. You can also now compose middleware around model calls, tool calls, before/after hooks, error handling, etc. 2. Standard content blocks & normalized message model One of LangChain's greatest strengths is its model agnosticism. Content blocks standardize all outputs, so developers know exactly what to expect regardless of the model they are using. Responses from models are no longer opaque strings. Instead, they carry structured `content_blocks`, which classify parts of the output (e.g. “text”, “reasoning”, “citation”, “tool_call”). 3. Multimodal and richer model inputs / outputs LangChain continues to support more than just text-based interactions, but in a more comprehensive way in v1. Models can accept and return files, images, video, etc., and the message format reflects this flexibility. This upgrade prepares us well for the next generation of models with mixed modalities (vision, audio, etc.). 4. Middleware hooks Because create_agent is designed as a pluggable pipeline, developers can now inject logic before/after model calls, before tool calls and more. New middleware such as 'human in the loop' and 'summarization' middleware have been added. This is the feature of the new package that I am most excited about! Even with the simplified agents API, this option provides more room to customize workflows! Developers can try pre-built middleware or make their own. 5. Simplified, leaner namespace Many formerly top-level modules or helper classes have been removed or relocated to langchain-classic (or similarly stamped “legacy”) to declutter the main API surface. A migration guide is available to help projects transition from v0 to v1. While v1 is now the main line, older v0 is still documented and maintained for compatibility. What “General Availability” Means (and Doesn’t) v1 is production-ready, following testing of the alpha release.
The stable v0 release line remains supported for those unwilling or unable to migrate immediately. Breaking changes in public APIs will be accompanied by version bumps (i.e. minor version increments) and deprecation notices. The roadmap anticipates minor versions every 2–3 months (with patch releases more frequently). Because the field of LLM applications is evolving rapidly, the team expects continued iterations in v1—even in GA mode—with users encouraged to surface feedback, file issues, and adopt the migration path. (This is in line with the philosophy stated in docs.) Developer Callouts & Suggested Steps Some things we recommend for developers to do to get started with v1: Try the new API Now! LangChain Azure AI and Azure OpenAI have migrated to LangChain v1 and are ready to test! Learn more about using LangChain and Azure AI: Python: https://docs.langchain.com/oss/python/integrations/providers/azure_ai JavaScript: https://docs.langchain.com/oss/javascript/integrations/providers/microsoft Join us for a Live Stream on Wednesday 22 October 2025 Join Microsoft Developer Advocates Marlene Mhangami and Yohan Lasorsa for a livestream this Wednesday to see live demos and find out more about what JavaScript and Python developers need to know about v1. Register for this event here.
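The middleware idea in item 4 is the classic wrap-the-call pattern. The sketch below illustrates that pattern in plain Python under stated assumptions. It is not LangChain's middleware API:

- Requests and responses are plain dicts.
- `fake_model` stands in for a model call.
- `compose`, `logging_middleware`, and `trim_history` are hypothetical helpers. The trimming step is only a crude stand-in for real summarization middleware.

For the actual classes and hook names, see the LangChain v1 middleware documentation.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins: a "request" is a dict with a "messages" list, and a
# "response" is an assistant message dict. These are not LangChain types.
Request = Dict
Response = Dict
Handler = Callable[[Request], Response]
Middleware = Callable[[Request, Handler], Response]


def fake_model(request: Request) -> Response:
    """Stand-in for a model call: echoes the last message and the history length."""
    messages = request["messages"]
    return {
        "role": "assistant",
        "content": f"echo: {messages[-1]['content']} ({len(messages)} msgs)",
    }


def compose(handler: Handler, middleware: List[Middleware]) -> Handler:
    """Wrap handler so middleware[0] runs first (outermost) and the model runs last."""
    def wrap(mw: Middleware, call_next: Handler) -> Handler:
        return lambda request: mw(request, call_next)

    for mw in reversed(middleware):
        handler = wrap(mw, handler)
    return handler


def logging_middleware(log: List[str]) -> Middleware:
    """Record entries before and after the wrapped call."""
    def mw(request: Request, call_next: Handler) -> Response:
        log.append("before model")
        response = call_next(request)
        log.append("after model")
        return response
    return mw


def trim_history(max_messages: int) -> Middleware:
    """Crude stand-in for a summarization step: keep only the newest messages."""
    def mw(request: Request, call_next: Handler) -> Response:
        # Build a new request dict so the caller's request is left untouched
        trimmed = dict(request, messages=request["messages"][-max_messages:])
        return call_next(trimmed)
    return mw
```

Each layer receives `call_next`, so it can do three things: rewrite the request, skip the call entirely (for example, to wait for human approval), or post-process the response. That flexibility is what makes hooks like human-in-the-loop possible.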
- Transform Your AI Applications with Local LLM Deployment Introduction Are you tired of watching your AI application costs spiral out of control every time your user base grows? As AI Engineers and Developers, we've all felt the pain of cloud-dependent LLM deployments. Every API call adds up, latency becomes a bottleneck in real-time applications, and sensitive data must leave your infrastructure to get processed. Meanwhile, your users demand faster responses, better privacy, and more reliable service. What if there was a way to run powerful language models directly on your users' devices or your local infrastructure? Enter the world of Edge AI deployment with Microsoft's Foundry Local, a game-changing approach that brings enterprise-grade LLM capabilities to local hardware while maintaining full OpenAI API compatibility. The Edge AI for Beginners https://aka.ms/edgeai-for-beginners curriculum provides AI Engineers and Developers with comprehensive, hands-on training to master local LLM deployment. This isn't just another theoretical course; it's a practical guide that will transform how you think about AI infrastructure, combining cutting-edge local deployment techniques with production-ready implementation patterns. In this post, we'll explore why Edge AI deployment represents the future of AI applications, dive deep into Foundry Local's capabilities across multiple frameworks, and show you exactly how to implement local LLM solutions that deliver both technical excellence and significant business value. Why Edge AI Deployment Changes Everything for Developers The shift from cloud-dependent to edge-deployed AI represents more than just a technical evolution; it's a fundamental reimagining of how we build intelligent applications. As AI Engineers, we're witnessing a transformation that addresses the most pressing challenges in modern AI deployment while opening up entirely new possibilities for innovation. Consider the current state of cloud-based LLM deployment.
Every user interaction requires a round-trip to external servers, introducing latency that can kill user experience in real-time applications. Costs scale linearly (or worse) with usage, making successful applications expensive to operate. Sensitive data must traverse networks and live temporarily in external systems, creating compliance nightmares for enterprise applications. Edge AI deployment fundamentally changes this equation. By running models locally, we achieve several critical advantages: Data Sovereignty and Privacy Protection: Your sensitive data never leaves your infrastructure. For healthcare applications processing patient records, financial services handling transactions, or enterprise tools managing proprietary information, this represents a quantum leap in security posture. You maintain complete control over data flow, meeting even the strictest compliance requirements without architectural compromises. Real-Time Performance at Scale: Local inference eliminates network latency entirely. Instead of 200-500ms round-trips to cloud APIs, you get sub-10ms response times. This enables entirely new categories of applications—real-time code completion, interactive AI tutoring systems, voice assistants that respond instantly, and IoT devices that make intelligent decisions without connectivity. Predictable Cost Structure: Transform variable API costs into fixed infrastructure investments. Instead of paying per-token for potentially unlimited usage, you invest in local hardware that serves unlimited requests. This makes ROI calculations straightforward and removes the fear of viral success destroying your margins. Offline Capabilities and Resilience: Local deployment means your AI features work even when connectivity fails. Mobile applications can provide intelligent features in areas with poor network coverage. Critical systems maintain AI capabilities during network outages. Edge devices in remote locations operate autonomously. 
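The Predictable Cost Structure point comes down to simple arithmetic. The same applies to the cost figures in the Business Impact Analysis later in this post. The sketch below rests on these assumptions:

- The API has a flat per-million-token price.
- The hardware is a one-off purchase.
- `monthly_local_opex_usd` (electricity, maintenance) is a hypothetical input you supply.

It ignores depreciation, hardware capacity limits, and quality differences between local and hosted models.

```python
import math


def annual_api_cost(tokens_per_day: float, usd_per_million_tokens: float) -> float:
    """Yearly API spend for a steady daily token volume at a flat per-token price."""
    return tokens_per_day / 1_000_000 * usd_per_million_tokens * 365


def break_even_months(hardware_usd: float, monthly_api_usd: float,
                      monthly_local_opex_usd: float = 0.0) -> float:
    """Months until a one-off hardware purchase is paid back by avoided API spend.

    Returns math.inf if local running costs meet or exceed the API bill,
    since the hardware would never pay for itself in that case.
    """
    monthly_savings = monthly_api_usd - monthly_local_opex_usd
    if monthly_savings <= 0:
        return math.inf
    return hardware_usd / monthly_savings
```

Some example results:

- `annual_api_cost(1_000_000, 20)` returns 7300.0 and `annual_api_cost(1_000_000, 60)` returns 21900.0, which matches the $7,300-21,900 range quoted for 1 million tokens a day at $20-60 per million.
- `break_even_months(5_000, 1_000)` returns 5.0 for a $5,000 machine replacing a $1,000 monthly API bill, before any running costs.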
The technical implications extend beyond these obvious benefits. Local deployment enables new architectural patterns: AI-powered applications that work entirely client-side, edge computing nodes that make intelligent routing decisions, and distributed systems where intelligence lives close to data sources. Foundry Local: Multi-Framework Edge AI Deployment Made Simple Microsoft's Foundry Local https://www.foundrylocal.ai represents a breakthrough in local AI deployment, designed specifically for developers who need production-ready edge AI solutions. Unlike single-framework tools, Foundry Local provides a unified platform that works seamlessly across multiple programming languages and deployment scenarios while maintaining full compatibility with existing OpenAI-based workflows. The platform's approach to multi-framework support means you're not locked into a single technology stack. Whether you're building TypeScript applications, Python ML pipelines, Rust systems programming projects, or .NET enterprise applications, Foundry Local provides native SDKs and consistent APIs that integrate naturally with your existing codebase. Enterprise-Grade Model Catalog: Foundry Local comes with a curated selection of production-ready models optimized for edge deployment. The `phi-3.5-mini` model delivers strong performance in a compact footprint. For the most resource-constrained environments, the much smaller `qwen2.5-0.5b` (about 0.5 billion parameters) trades some capability for an even lighter footprint. When you need maximum capability and have sufficient hardware resources, `gpt-oss-20b` offers state-of-the-art performance with full local control. Intelligent Hardware Optimization: One of Foundry Local's most powerful features is its automatic hardware detection and optimization.
The platform automatically identifies your available compute resources (NVIDIA CUDA GPUs, AMD GPUs, Intel NPUs, Qualcomm Snapdragon NPUs, or CPU-only environments) and downloads the most appropriate model variant. This means the same application code delivers optimal performance across diverse hardware configurations without manual intervention. ONNX Runtime Acceleration: Under the hood, Foundry Local leverages Microsoft's ONNX Runtime for maximum performance. This provides significant advantages over generic inference engines, delivering optimized execution paths for different hardware architectures while maintaining model accuracy and compatibility. OpenAI SDK Compatibility: Perhaps most importantly for developers, Foundry Local exposes an OpenAI-compatible endpoint, so it works with the OpenAI SDK. Existing applications can migrate to local inference by changing the endpoint configuration and model identifier, with no rewriting of application logic, no new APIs to learn, and no disruption to existing workflows. The platform handles the complex aspects of local AI deployment automatically: model downloading, hardware-specific optimization, memory management, and inference scheduling. This allows developers to focus on building intelligent applications rather than managing AI infrastructure. Framework-Agnostic Benefits: Foundry Local's multi-framework approach delivers consistent benefits regardless of your technology choices. Whether you're working in a Node.js microservices architecture, a Python data science environment, a Rust embedded system, or a C# enterprise application, you get the same advantages: reduced latency, eliminated API costs, enhanced privacy, and offline capabilities. This universal compatibility means teams can adopt edge AI deployment incrementally, starting with pilot projects in their preferred language and expanding across their technology stack as they see results.
The learning curve is minimal because the API patterns remain familiar while the underlying infrastructure transforms to local deployment. Implementing Edge AI: From Code to Production Moving from cloud APIs to local AI deployment requires understanding the implementation patterns that make edge AI both powerful and practical. Let's explore how Foundry Local's SDKs enable seamless integration across different development environments, with real-world code examples that you can adapt for your production systems. Python Implementation for Data Science and ML Pipelines Python developers will find Foundry Local's integration particularly natural, especially in data science and machine learning contexts where local processing is often preferred for security and performance reasons.

```python
import openai
from foundry_local import FoundryLocalManager

# Initialize with automatic hardware optimization
alias = "phi-3.5-mini"
manager = FoundryLocalManager(alias)
```

This simple initialization handles a remarkable amount of complexity automatically. The `FoundryLocalManager` detects your hardware configuration, downloads the most appropriate model variant for your system, and starts the local inference service. Behind the scenes, it's making intelligent decisions about memory allocation, selecting optimal execution providers, and preparing the model for efficient inference.

```python
# Configure OpenAI client for local deployment
client = openai.OpenAI(
    base_url=manager.endpoint,
    api_key=manager.api_key,  # Not required for local, but maintains API compatibility
)


def analyze_document(content: str):
    """Stream an analysis of `content`, yielding text fragments as they arrive."""
    stream = client.chat.completions.create(
        model=manager.get_model_info(alias).id,
        messages=[
            {
                "role": "system",
                "content": "You are an expert document analyzer. Provide structured analysis.",
            },
            {"role": "user", "content": f"Analyze this document: {content}"},
        ],
        stream=True,
        temperature=0.7,
    )
    for chunk in stream:
        # Skip chunks that carry no text (e.g. role-only or final chunks)
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content  # Enable real-time UI updates


# A generator's return value is not visible to a for loop, so callers that need
# the full text should join the streamed pieces: "".join(analyze_document(text))
```

Key implementation benefits here:
• Automatic model management: The `FoundryLocalManager` handles model lifecycle, memory optimization, and hardware-specific acceleration without manual configuration.
• Streaming interface compatibility: Maintains the familiar OpenAI streaming API while processing locally, enabling real-time user interfaces without a network round-trip.
• Production error handling: The manager starts the local service and loads the model for you; for production deployment, wrap inference calls in your own retry and fallback logic.

JavaScript/TypeScript Implementation for Web Applications JavaScript and TypeScript developers can integrate local AI capabilities directly into web applications, enabling entirely new categories of client-side intelligent features.

```javascript
import { OpenAI } from "openai";
import { FoundryLocalManager } from "foundry-local-sdk";

class LocalAIService {
  constructor() {
    this.foundryManager = null;
    this.openaiClient = null;
    this.modelInfo = null;
    this.isInitialized = false;
  }

  async initialize(modelAlias = "phi-3.5-mini") {
    this.foundryManager = new FoundryLocalManager();
    // init() prepares the model for this alias and returns its model info
    this.modelInfo = await this.foundryManager.init(modelAlias);

    this.openaiClient = new OpenAI({
      baseURL: this.foundryManager.endpoint,
      apiKey: this.foundryManager.apiKey,
    });

    this.isInitialized = true;
    return this.modelInfo;
  }
```

The initialization step starts local AI capabilities, keeps the resolved model info, and points a standard OpenAI client at the local endpoint. This enables web applications to provide AI features without external API dependencies.

```javascript
  async generateCodeCompletion(codeContext, userPrompt) {
    if (!this.isInitialized) {
      throw new Error("LocalAI service not initialized");
    }

    try {
      const completion = await this.openaiClient.chat.completions.create({
        // Reuse the model ID resolved in initialize()
        model: this.modelInfo.id,
        messages: [
          {
            role: "system",
            content: "You are a code completion assistant. Provide accurate, efficient code suggestions.",
          },
          {
            role: "user",
            content: `Context: ${codeContext}\n\nComplete: ${userPrompt}`,
          },
        ],
        max_tokens: 150,
        temperature: 0.2,
      });

      return completion.choices[0].message.content;
    } catch (error) {
      console.error("Local AI completion failed:", error);
      throw new Error("Code completion unavailable");
    }
  }
}
```

Implementation advantages for web applications
• No cloud dependency: Applications work entirely offline once models are downloaded, enabling AI capabilities in disconnected environments.
• Instant response times: Eliminate network latency for real-time features like code completion, content generation, or intelligent search.
• Client-side privacy: Sensitive code or content never leaves the user's device, meeting strict security requirements for enterprise development tools.

Cross-Platform Production Deployment Patterns Both Python and JavaScript implementations share common production deployment patterns that make Foundry Local particularly suitable for enterprise applications: Automatic Hardware Optimization: The platform automatically detects and utilizes available acceleration hardware. On systems with NVIDIA GPUs, it leverages CUDA acceleration. On newer Intel systems, it uses NPU acceleration. On ARM-based systems like Apple Silicon or Qualcomm Snapdragon, it optimizes for those architectures. This means the same application code delivers optimal performance across diverse deployment environments. Graceful Resource Management: Foundry Local includes sophisticated memory management and resource allocation.
Models are loaded efficiently, memory is recycled properly, and concurrent requests are handled intelligently to maintain system stability under load. Production Monitoring Integration: The platform provides comprehensive metrics and logging that integrate naturally with existing monitoring systems, enabling production observability for AI workloads running at the edge. These implementation patterns demonstrate how Foundry Local transforms edge AI from an experimental concept into a practical, production-ready deployment strategy that works consistently across different technology stacks and hardware environments. Measuring Success: Technical Performance and Business Impact The transition to edge AI deployment delivers measurable improvements across both technical and business metrics. Understanding these impacts helps justify the architectural shift and demonstrates the concrete value of local LLM deployment in production environments. Technical Performance Gains Latency Elimination: The most immediately visible benefit is the dramatic reduction in response times. Cloud API calls typically require 200-800ms round-trips, depending on geographic location and network conditions. Local inference with Foundry Local reduces this to sub-10ms response times—a 95-99% improvement that fundamentally changes user experience possibilities. Consider a code completion feature: cloud-based completion feels sluggish and interrupts developer flow, while local completion provides instant suggestions that enhance productivity. The same applies to real-time chat applications, interactive AI tutoring systems, and any application where response latency directly impacts usability. Automatic Hardware Utilization: Foundry Local's intelligent hardware detection and optimization delivers significant performance improvements without manual configuration. On systems with NVIDIA RTX 4000 series GPUs, inference speeds can be 10-50x faster than CPU-only processing. 
On newer Intel systems with NPUs, the platform automatically leverages neural processing units for efficient AI workloads. Apple Silicon systems benefit from Metal Performance Shaders optimization, delivering excellent performance per watt. ONNX Runtime Optimization: Microsoft's ONNX Runtime provides substantial performance advantages over generic inference engines. In benchmark testing, ONNX Runtime consistently delivers 2-5x performance improvements compared to standard PyTorch or TensorFlow inference, while maintaining full model accuracy and compatibility. Scalability Characteristics: Local deployment transforms scaling economics entirely. Instead of linear cost scaling with usage, you get horizontal scaling through hardware deployment. A single modern GPU can handle hundreds of concurrent inference requests, making per-request costs approach zero for high-volume applications. Business Impact Analysis Cost Structure Transformation: The financial implications of local deployment are profound. Consider an application processing 1 million tokens daily through OpenAI's API: this represents $20-60 in daily costs depending on the model. Over a year, this becomes $7,300-21,900 in recurring expenses. A comparable local deployment might require a $2,000-5,000 hardware investment with no ongoing API costs. For high-volume applications, the savings become dramatic. Applications processing 100 million tokens monthly (1.2 billion a year) face $24,000-72,000 in annual API costs at the same per-token rates. Local deployment with appropriate hardware infrastructure could reduce this to electricity and maintenance costs, typically under $10,000 annually for equivalent processing capacity. Enhanced Privacy and Compliance: Local deployment eliminates data sovereignty concerns entirely. Healthcare applications processing patient records, financial services handling transaction data, and enterprise tools managing proprietary information can deploy AI capabilities without data leaving their infrastructure.
This simplifies compliance with GDPR, HIPAA, SOX, and other regulatory frameworks while reducing legal and security risks. Operational Resilience: Local deployment provides significant business continuity advantages. Applications continue functioning during network outages, API service disruptions, or third-party provider issues. For mission-critical systems, this resilience can prevent costly downtime and maintain user productivity during external service failures. Development Velocity: Local deployment accelerates development cycles by eliminating API rate limits, usage quotas, and external dependencies during development and testing. Developers can iterate freely, run comprehensive test suites, and experiment with AI features without cost concerns or rate limiting delays. Enterprise Adoption Metrics Real-world enterprise deployments demonstrate measurable business value: Internal Tooling: Teams using Foundry Local for internal AI-powered tools report a 60-80% reduction in AI-related operational costs, along with improved developer productivity from instant AI responses in development environments. Manufacturing Applications: Industrial IoT deployments using edge AI for predictive maintenance show 40-60% reduction in unplanned downtime while eliminating cloud connectivity requirements in remote facilities. Financial Services: Trading firms deploying local LLMs for market analysis report sub-millisecond decision latencies while maintaining complete data isolation for competitive advantage and regulatory compliance. ROI Calculation Framework For AI Engineers evaluating edge deployment, consider these quantifiable factors: Direct Cost Savings: Compare monthly API costs against hardware amortization over 24-36 months. Most applications with >$1,000 monthly API costs achieve positive ROI within 12-18 months. Performance Value: Quantify the business impact of reduced latency.
For customer-facing applications, each 100ms of latency reduction typically correlates with 1-3% conversion improvement. Risk Mitigation: Calculate the cost of downtime or compliance violations prevented by local deployment. For many enterprise applications, avoiding a single significant outage justifies the infrastructure investment. Development Efficiency: Measure developer productivity improvements from unlimited local AI access during development. Teams report 20-40% faster iteration cycles when AI features can be tested without external dependencies. These metrics demonstrate that edge AI deployment with Foundry Local delivers both immediate technical improvements and substantial long-term business value, making it a strategic investment in AI infrastructure that pays dividends across multiple dimensions. Your Edge AI Journey Starts Here The shift to edge AI represents more than just a technical evolution; it's an opportunity to fundamentally improve your applications while building valuable expertise in an emerging field. Whether you're looking to reduce costs, improve performance, or enhance privacy, the path forward involves both learning new concepts and connecting with a community of practitioners solving similar challenges. Master Edge AI with Comprehensive Training The Edge AI for Beginners https://aka.ms/edgeai-for-beginners curriculum provides the complete foundation you need to become proficient in local AI deployment. This isn't a superficial overview; it's a comprehensive, hands-on program designed specifically for developers who want to build production-ready edge AI applications. The curriculum takes you through 36-45 hours of structured learning, progressing from fundamental concepts to advanced deployment scenarios. You'll start by understanding the principles of edge AI and local inference, then dive deep into practical implementation with Foundry Local across multiple programming languages.
The program includes working examples and comprehensive sample applications that demonstrate real-world use cases. What sets this curriculum apart is its practical focus. Instead of theoretical discussions, you'll build actual applications: document analysis systems that work offline, real-time code completion tools, intelligent chatbots that protect user privacy, and IoT applications that make decisions locally. Each project teaches both the technical implementation and the architectural thinking needed for successful edge AI deployment. The curriculum covers multi-framework deployment patterns extensively, ensuring you can apply edge AI principles regardless of your preferred development stack. Whether you're working in Python data science environments, JavaScript web applications, C# enterprise systems, or Rust embedded projects, you'll learn the patterns and practices that make edge AI successful. Join a Community of AI Engineers Learning edge AI doesn't happen in isolation; it requires connection with other developers who are solving similar challenges and discovering new possibilities. The Foundry Local Discord community https://aka.ms/foundry-local-discord provides exactly this environment, connecting AI Engineers and Developers from around the world who are implementing local AI solutions. This community serves multiple crucial functions for your development as an edge AI practitioner. You'll find experienced developers sharing implementation patterns they've discovered, debugging complex deployment issues collaboratively, and discussing the architectural decisions that make edge AI successful in production environments. The Discord community includes dedicated channels for different programming languages, specific deployment scenarios, and technical discussions about optimization and performance.
Whether you're implementing your first local AI feature or optimizing a complex multi-model deployment, you'll find peers and experts ready to help problem-solve and share insights. Beyond technical support, the community provides valuable career and business insights. Members share their experiences with edge AI adoption in different industries, discuss the business cases that have proven most successful, and collaborate on open-source projects that advance the entire ecosystem. Share Your Experience and Build Expertise One of the most effective ways to solidify your edge AI expertise is by sharing your implementation experiences with the community. As you build applications with Foundry Local and deploy edge AI solutions, documenting your process and sharing your learnings provides value both to others and to your own professional development. Consider sharing your deployment stories, whether they're successes or challenges you've overcome. The community benefits from real-world case studies that show how edge AI performs in different environments and use cases. Your experience implementing local AI in a healthcare application, financial services system, or manufacturing environment provides valuable insights that others can build upon. Technical contributions are equally valuable, whether it's sharing configuration patterns you've discovered, performance optimizations you've implemented, or integration approaches you've developed for specific frameworks or libraries. The edge AI field is evolving rapidly, and practical contributions from working developers drive much of the innovation. Sharing your work also builds your professional reputation as an edge AI expert. As organizations increasingly adopt local AI deployment strategies, developers with proven experience in this area become valuable resources for their teams and the broader industry. 
The combination of structured learning through the Edge AI curriculum, active participation in the community, and sharing your practical experiences creates a comprehensive path to edge AI expertise that serves both your immediate project needs and your long-term career development as AI deployment patterns continue evolving. Key Takeaways Local LLM deployment transforms application economics: Replace variable API costs with fixed infrastructure investments that scale to unlimited usage, typically achieving ROI within 12-18 months for applications with significant AI workloads. Foundry Local enables multi-framework edge AI: Consistent deployment patterns across Python, JavaScript, C#, and Rust environments with automatic hardware optimization and OpenAI API compatibility. Performance improvements are dramatic and measurable: Sub-10ms response times replace 200-800ms cloud API latency, while automatic hardware acceleration delivers 2-50x performance improvements depending on available compute resources. Privacy and compliance become architectural advantages: Local deployment eliminates data sovereignty concerns, simplifies regulatory compliance, and provides complete control over sensitive information processing. Edge AI expertise is a strategic career investment: As organizations increasingly adopt local AI deployment, developers with hands-on edge AI experience become valuable technical resources with unique skills in an emerging field. Conclusion Edge AI deployment represents the next evolution in intelligent application development, transforming both the technical possibilities and economic models of AI-powered systems. With Foundry Local and the comprehensive Edge AI for Beginners curriculum, you have access to production-ready tools and expert guidance to make this transition successfully. 
The path forward is clear: start with the Edge AI for Beginners curriculum to build solid foundations, connect with the Foundry Local Discord community to learn from practicing developers, and begin implementing local AI solutions in your projects. Each step builds valuable expertise while delivering immediate improvements to your applications.
As cloud costs continue to rise and privacy requirements become more stringent, organizations will increasingly rely on developers who can implement local AI solutions effectively. Your early adoption of edge AI deployment patterns positions you at the forefront of this technological shift, with skills that will become increasingly valuable as the industry evolves.
The future of AI deployment is local, private, and performance-optimized. Start building that future today.
Resources
Edge AI for Beginners Curriculum: Comprehensive training with 36-45 hours of hands-on content, examples, and production-ready deployment patterns. https://aka.ms/edgeai-for-beginners
Foundry Local GitHub Repository: Official documentation, samples, and community contributions for local AI deployment. https://github.com/microsoft/foundry_local
Foundry Local Discord Community: Connect with AI engineers and developers implementing edge AI solutions worldwide. https://aka.ms/foundry/discord
Foundry Local Documentation: Complete technical documentation and API references (see "Foundry Local documentation" on Microsoft Learn).
Foundry Local Model Catalog: Browse available models and deployment options for different hardware configurations (see "Foundry Local Models - Browse AI Models").
- Deployment and Build from Azure Linux based Web App
TOC: 1. Introduction; 2. Deployment Sources (From Laptop, From CI/CD tools); 3. Build Source (From Oryx Build, From Runtime, From Deployment Sources); 4. Walkthrough (Laptop + Oryx, Laptop + Runtime, Laptop, CI/CD concept); 5. Conclusion
1. Introduction
Deployment on Azure Linux Web Apps can be done through several different methods. When a deployment issue occurs, the first step is usually to identify which method was used. These methods all revolve around the concept of Build: the process of preparing and loading the third-party dependencies required to run an application. For example, a Python app's build step installs packages with pip install, a Node.js app installs modules with npm install, and PHP or Java apps pull in their own libraries.
In this tutorial, I'll use a simple Python app to demonstrate four different Deployment/Build approaches. Each method has its own use cases and limitations. You can even combine them, for example, using your laptop as the deployment tool while still using Oryx as the build engine. The same concepts apply to other runtimes such as Node.js, PHP, and beyond.
2. Deployment Sources
From Laptop
Scenarios: setting up a proof of concept; developing in a local environment.
Advantages: fast development cycle; minimal configuration required.
Limitations: it is difficult for the local test environment to interact with cloud resources; OS differences between local and cloud environments may cause integration issues.
From CI/CD tools
Scenarios: projects with established development and deployment workflows; codebases requiring version control and automation.
Advantages: developers can focus purely on coding; deployment happens automatically on branch commits.
Limitations: build and runtime environments may still differ slightly at the OS level.
3. Build Source
From Oryx Build
Scenarios: offloading resource-intensive build tasks from your local or CI/CD environment directly to the Azure Web App platform, reducing local computing overhead.
Advantages: minimal extra configuration; multi-language support.
Limitations: build performance is limited by the App Service SKU and may hit bottlenecks; the build environment may differ from the runtime environment, so apps sensitive to minor package version differences should proceed with caution.
From Runtime
Scenarios: when you want the benefits and pricing of a PaaS solution but need control similar to an IaaS setup.
Advantages: the build occurs in the runtime environment itself; allows greater flexibility for low-level system operations.
Limitations: certain system-level settings (e.g., NTP time sync) remain inaccessible.
From Deployment Sources
Scenarios: pre-packaging all dependencies and deploying them together, eliminating the need for a separate build step.
Advantages: supports proprietary or closed-source company packages.
Limitations: incompatibility may arise if the development and runtime environments differ significantly in OS or package support.
| Type | Method | Scenario | Advantage | Limitation |
|---|---|---|---|---|
| Deployment | From Laptop | POC / Dev | Fast setup | Poor cloud link |
| Deployment | From CI/CD | Auto pipeline | Focus on code | OS mismatch |
| Build | From Oryx | Platform build | Simple, multi-lang | Performance cap |
| Build | From Runtime | High control | Flexible ops | Limited access |
| Build | From Deployment | Pre-built deploy | Use private pkg | Env mismatch |
4. Walkthrough
Laptop + Oryx
Add Environment Variables
SCM_DO_BUILD_DURING_DEPLOYMENT=false (Purpose: prevents the deployment environment from packaging during publish; this must also be set in the deployment environment itself.)
WEBSITE_RUN_FROM_PACKAGE=false (Purpose: tells Azure Web App not to run the app from a prepackaged file.)
ENABLE_ORYX_BUILD=true (Purpose: allows the Azure Web App platform to handle the build process automatically after a deployment event.)
Add Startup Command
bash /home/site/wwwroot/run.sh (The run.sh file corresponds to the script in your project code.)
Check Sample Code
requirements.txt: defines Python packages (similar to package.json in Node.js).
Flask==3.0.3
gunicorn==23.0.0
app.py: main Python application code.
from flask import Flask

app = Flask(__name__)

@app.route("/")
def home():
    return "Deploy from Laptop + Oryx"

if __name__ == "__main__":
    # Local development only; on App Service, run.sh starts the app with gunicorn instead
    app.run(host="0.0.0.0", port=8000)
run.sh: script used to start the application.
#!/bin/bash
gunicorn --bind=0.0.0.0:8000 app:app
.deployment: VS Code deployment configuration file.
[config]
SCM_DO_BUILD_DURING_DEPLOYMENT=false
Deployment
Once both the deployment and build processes complete successfully, you should see the expected result.
Laptop + Runtime
Add Environment Variables (screenshots omitted since the process is similar to previous steps)
SCM_DO_BUILD_DURING_DEPLOYMENT=false
Purpose: Prevents the deployment environment from packaging during the publishing process. This setting must also be added in the deployment environment itself.
WEBSITE_RUN_FROM_PACKAGE=false
Purpose: Instructs Azure Web App not to run the application from a prepackaged file.
ENABLE_ORYX_BUILD=false
Purpose: Ensures that Azure Web App does not perform any build after deployment; all build tasks are instead handled when the startup script runs.
Add Startup Command (screenshots omitted since the process is similar to previous steps)
bash /home/site/wwwroot/run.sh (The run.sh file corresponds to the script of the same name in your project code.)
Check Sample Code (screenshots omitted since the process is similar to previous steps)
requirements.txt: Defines Python packages (similar to package.json in Node.js).
Flask==3.0.3
gunicorn==23.0.0
app.py: The main Python application code.
from flask import Flask

app = Flask(__name__)

@app.route("/")
def home():
    return "Deploy from Laptop + Runtime"

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
run.sh: Startup script. In addition to launching the app, it creates a virtual environment and installs dependencies; all build-related tasks happen here.
#!/bin/bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
gunicorn --bind=0.0.0.0:8000 app:app
.deployment: VS Code deployment configuration file.
[config]
SCM_DO_BUILD_DURING_DEPLOYMENT=false
Deployment (screenshots omitted since the process is similar to previous steps)
Once both deployment and build are completed, you should see the expected output.
Laptop
Add Environment Variables (screenshots omitted as the process is similar to previous steps)
SCM_DO_BUILD_DURING_DEPLOYMENT=false
Purpose: Prevents the deployment environment from packaging during publish. This must also be set in the deployment environment itself.
WEBSITE_RUN_FROM_PACKAGE=false
Purpose: Instructs Azure Web App not to run the app from a prepackaged file.
ENABLE_ORYX_BUILD=false
Purpose: Prevents Azure Web App from building after deployment. All build tasks instead happen before deployment, on your laptop.
Add Startup Command (screenshots omitted as the process is similar to previous steps)
bash /home/site/wwwroot/run.sh (The run.sh file corresponds to the same-named file in your project code.)
Check Sample Code (screenshots omitted as the process is similar to previous steps)
requirements.txt: Defines Python packages (like package.json in Node.js).
Flask==3.0.3
gunicorn==23.0.0
app.py: The main Python application.
from flask import Flask

app = Flask(__name__)

@app.route("/")
def home():
    return "Deploy from Laptop"

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
run.sh: The startup script. In addition to launching the app, it activates an existing virtual environment. Creating that environment and installing dependencies happen in the Deployment step, before you deploy.
#!/bin/bash
source venv/bin/activate
gunicorn --bind=0.0.0.0:8000 app:app
.deployment: VS Code deployment configuration file.
[config]
SCM_DO_BUILD_DURING_DEPLOYMENT=false
Deployment
Before deployment, you must perform a local build.
Run the build commands locally. The exact commands depend on the language; they usually install dependencies:
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
After completing the local build, deploy your app. Once deployment finishes, you should see the expected result.
CI/CD concept
For example, if you use Azure DevOps (ADO) as your CI/CD tool, it conceptually mirrors deploying directly from a laptop, but with enhanced automation, governance, and reproducibility. ADO pipelines turn your manual local deployment steps into codified, repeatable workflows. These workflows are defined in a YAML pipeline file and executed by Microsoft-hosted or self-hosted agents.
A typical azure-pipelines.yml defines the stages (e.g., build, deploy) and their corresponding jobs and steps. Each stage runs on a specified VM image (e.g., ubuntu-latest). It executes the same commands, such as npm install or pip install, that you would normally run on your laptop. The ADO pipeline acts as your automated laptop: every build command, environment variable, and deployment step you'd normally execute locally is simply formalized in YAML.
Whether you build inline, use Oryx, or deploy pre-built artifacts, the underlying concept remains identical: compile, package, and deliver code to Azure. The distinction lies in who performs each step.
5. Conclusion
Different deployment and build methods lead to different debugging and troubleshooting approaches. Therefore, understanding the selected deployment method and its corresponding troubleshooting process is an essential skill for every developer and DevOps engineer.
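Each walkthrough approach above is defined by three app settings plus the startup command. Mixing them up is a common source of confusion, so a small helper can make the mapping explicit.

The following is a minimal sketch, assuming you manage the app with the Azure CLI; the walkthrough itself configures these values through the portal and VS Code. The helper builds, but does not run, the az commands that apply one approach's configuration. It also lists which walkthrough approaches a given set of app settings matches, which supports the "identify which method was used" step from the introduction. The approach labels ("laptop+oryx" and so on) and the resource names are hypothetical placeholders.

```python
from __future__ import annotations

import shlex

# App settings from the walkthrough, keyed by approach. The keys are hypothetical labels for this sketch.
WALKTHROUGH_SETTINGS = {
    "laptop+oryx": {
        "SCM_DO_BUILD_DURING_DEPLOYMENT": "false",
        "WEBSITE_RUN_FROM_PACKAGE": "false",
        "ENABLE_ORYX_BUILD": "true",   # the platform builds after each deployment
    },
    "laptop+runtime": {
        "SCM_DO_BUILD_DURING_DEPLOYMENT": "false",
        "WEBSITE_RUN_FROM_PACKAGE": "false",
        "ENABLE_ORYX_BUILD": "false",  # run.sh creates the venv and runs pip install at startup
    },
    "laptop": {
        "SCM_DO_BUILD_DURING_DEPLOYMENT": "false",
        "WEBSITE_RUN_FROM_PACKAGE": "false",
        "ENABLE_ORYX_BUILD": "false",  # the venv is built locally and deployed as-is
    },
}

# Startup command shared by every approach in the walkthrough.
STARTUP_COMMAND = "bash /home/site/wwwroot/run.sh"


def az_commands(approach: str, resource_group: str, app_name: str) -> list[list[str]]:
    """Build (but do not run) the Azure CLI calls that apply one approach's configuration."""
    settings = WALKTHROUGH_SETTINGS[approach]
    set_app_settings = [
        "az", "webapp", "config", "appsettings", "set",
        "--resource-group", resource_group,
        "--name", app_name,
        "--settings", *(f"{key}={value}" for key, value in settings.items()),
    ]
    set_startup = [
        "az", "webapp", "config", "set",
        "--resource-group", resource_group,
        "--name", app_name,
        "--startup-file", STARTUP_COMMAND,
    ]
    return [set_app_settings, set_startup]


def matching_approaches(app_settings: dict[str, str]) -> list[str]:
    """Return every walkthrough approach whose three settings all match the given app settings."""
    return [
        name
        for name, expected in WALKTHROUGH_SETTINGS.items()
        if all(app_settings.get(key, "").lower() == value for key, value in expected.items())
    ]


if __name__ == "__main__":
    # Placeholder resource names; the commands are printed, not executed.
    for command in az_commands("laptop+oryx", "my-resource-group", "my-web-app"):
        print(shlex.join(command))
    print(matching_approaches({
        "SCM_DO_BUILD_DURING_DEPLOYMENT": "false",
        "WEBSITE_RUN_FROM_PACKAGE": "false",
        "ENABLE_ORYX_BUILD": "false",
    }))  # prints: ['laptop+runtime', 'laptop']
```

Laptop + Runtime and Laptop share identical settings, so the matcher returns both. Only the contents of run.sh tell them apart. To actually apply the configuration, you could pass each argument list to subprocess.run(command, check=True).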
- The importance of streaming for LLM-powered chat applications
Thanks to the popularity of chat-based interfaces like ChatGPT and GitHub Copilot, users have grown accustomed to getting answers conversationally. As a result, thousands of developers are now deploying chat applications on Azure for their own specialized domains. To help developers understand how to build LLM-powered chat apps, we have open-sourced many chat app templates, like a super simple chat app and the very popular and sophisticated RAG chat app. All our templates support an important feature: streaming.
At first glance, streaming might not seem essential. But users have come to expect it from modern chat experiences. Beyond meeting expectations, streaming can dramatically improve the time to first token. Your frontend can display words as soon as they're generated, instead of making users wait seconds for a complete answer.
How to stream from the APIs
Most modern LLM APIs and wrapper libraries now support streaming responses, usually through a simple boolean flag or a dedicated streaming method. Let's look at an example using the official OpenAI Python SDK. The openai package makes it easy to stream responses by passing a stream=True argument:
from openai import OpenAI

openai_client = OpenAI()  # or AzureOpenAI for Azure OpenAI; configuration omitted here

completion_stream = openai_client.chat.completions.create(
    model="gpt-5-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What does a product manager do?"},
    ],
    stream=True,
)
When stream is true, the return value is an iterable, so we can use a for loop to process each ChatCompletionChunk object:
for chunk in completion_stream:
    # Some chunks (e.g., Azure OpenAI content-filter results) can arrive with an empty choices list
    if chunk.choices:
        content = chunk.choices[0].delta.content  # may be None, e.g., on the final chunk
With the async client (AsyncOpenAI), you await the create() call and iterate with async for instead.
Sending stream from backend to frontend
When building a web app, we need a way to stream data from the backend to the browser. A normal HTTP response won't work here: it sends all the data at once, then closes the connection. Instead, we need a protocol that allows data to arrive progressively.
The most common options are:
WebSockets: A bidirectional channel where both client and server can send data at any time.
Server-sent events: A one-way channel where the server continuously pushes events to the client over HTTP.
Readable streams: An HTTP response with a Transfer-Encoding header of "chunked", allowing the client to process chunks as they arrive.
All of these could potentially be used for a chat app, and I myself have experimented with both server-sent events and readable streams. Behind the scenes, the OpenAI API actually uses server-sent events, so you'll find code in the openai package for parsing that protocol.
However, I now prefer using readable streams for my frontend-to-backend communication. It's the simplest code setup on both the frontend and backend. It also supports the POST requests that our apps are already sending, whereas the browser's built-in EventSource API for server-sent events only issues GET requests. The key is to send chunks from the backend in NDJSON (newline-delimited JSON) format and parse them incrementally on the frontend. See my blog post on fetching JSON over streaming HTTP for Python and JavaScript example code.
Achieving a word-by-word effect
With all of that in place, we now have a frontend that reveals the model's answer gradually, almost like watching it type in real time. But something still feels off! Even though our frontend receives chunks of just a few tokens at a time, the UI tends to reveal entire sentences at once.
Why does that happen? It turns out the browser is batching repaints. Instead of immediately re-rendering after each DOM update, it waits until it's more efficient to repaint. That's a smart optimization in most cases, but not ideal for a streaming text effect.
My colleague Steve Steiner explored several techniques to make the browser repaint more frequently. The most effective approach uses window.setTimeout() with a delay of 33 milliseconds for each chunk. While this adds a small overall delay, it stays well within a natural reading pace and produces a smooth, word-by-word reveal.
See his PR for implementation details in a React codebase. With that change, our frontend now displays responses at the same granularity as the chat completions API itself: chunk by chunk.
Streaming more of the process
Many of our sample apps use RAG (Retrieval-Augmented Generation) pipelines that chain together multiple operations: querying data stores (like Azure AI Search), generating embeddings, and finally calling the chat completions API. Naturally, that chain takes longer than a single LLM call, so users may wait several seconds before seeing a response.
One way to improve the experience is to stream more of the process itself. Instead of holding back everything until the final answer, the backend can emit progress updates as each step completes, keeping users informed and engaged. For example, your app might display a sequence of messages like this:
Processing your question: "Can you suggest a pizza recipe that incorporates both mushroom and pineapples?"
Generated search query "pineapple mushroom pizza recipes"
Found three related results from our cookbooks: 1) Mushroom calzone 2) Pineapple ham pizza 3) Mushroom loaf
Generating answer to your question...
Sure! Here's a recipe for a mushroom pineapple pizza...
Adding streamed progress like this makes your app feel responsive and alive, even while the backend is doing complex work. Consider experimenting with progress events in your own chat apps; a few simple updates can greatly improve user trust and engagement.
Making it optional
After all this talk about streaming, here's one final recommendation: make streaming optional. Provide a setting in your frontend to disable streaming, and a corresponding non-streaming endpoint in your backend. This flexibility helps both your users and your developers:
For users: Some may prefer (or require) a non-streamed experience for accessibility reasons, or simply want to receive the full response at once.
For developers: Sometimes you'll want to interact with the app programmatically, for example using curl, requests, or automated tests, and a standard non-streaming HTTP endpoint makes that much easier.
Designing your app to gracefully support both modes keeps it inclusive, debuggable, and production-ready.
Sample applications
We've already linked to several of our sample apps that support streaming, but here's a complete list so you can explore the one that best fits your tech stack:
| Repository | App purpose | Backend | Frontend |
|---|---|---|---|
| azure-search-openai-demo | RAG with AI Search | Python + Quart | React |
| rag-postgres-openai-python | RAG with PostgreSQL | Python + FastAPI | React |
| openai-chat-app-quickstart | Simple chat with Azure OpenAI models | Python + Quart | plain JS |
| openai-chat-backend-fastapi | Simple chat with Azure OpenAI models | Python + FastAPI | plain JS |
| deepseek-python | Simple chat with Azure AI Foundry models | Python + Quart | plain JS |
Each of these repositories includes streaming support out of the box, so you can inspect real implementation details in both the frontend and backend. They're a great starting point for learning how to structure your own LLM chat application, or for extending one of the samples to match your specific use case.
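The NDJSON-over-readable-stream approach described earlier can be sketched in Python without any web framework. This is a minimal illustration, not the code from the linked blog post or the sample repositories. The backend serializes one JSON object per line. The client buffers incoming bytes and only parses complete lines, because network chunk boundaries do not line up with record boundaries. The {"delta": ...} event shape is a hypothetical example.

In a real backend you would hand the encode_ndjson generator to your framework's streaming response, for example FastAPI's StreamingResponse with the application/x-ndjson media type. The browser would do the same buffering in JavaScript with fetch and a ReadableStream reader.

```python
import json
from collections.abc import Iterable, Iterator


def encode_ndjson(events: Iterable[dict]) -> Iterator[bytes]:
    """Backend side: serialize each event as one JSON object per line (NDJSON)."""
    for event in events:
        # json.dumps escapes newlines inside string values, so b"\n" only ever marks the end of a record.
        yield (json.dumps(event) + "\n").encode("utf-8")


def iter_ndjson(byte_chunks: Iterable[bytes]) -> Iterator[dict]:
    """Client side: decode NDJSON incrementally, even when a record is split across network chunks."""
    buffer = b""
    for chunk in byte_chunks:
        buffer += chunk
        # Everything before the last newline is complete; the remainder waits for more bytes.
        *complete_lines, buffer = buffer.split(b"\n")
        for line in complete_lines:
            if line.strip():  # tolerate blank lines
                yield json.loads(line)
    if buffer.strip():
        # The stream ended without a trailing newline after the last record.
        yield json.loads(buffer)


if __name__ == "__main__":
    events = [{"delta": "Hello"}, {"delta": ", world"}, {"delta": "!"}]
    wire = b"".join(encode_ndjson(events))
    # Re-slice the bytes into 7-byte pieces to mimic arbitrary network chunk boundaries.
    chunks = [wire[i:i + 7] for i in range(0, len(wire), 7)]
    print("".join(event["delta"] for event in iter_ndjson(chunks)))  # prints: Hello, world!
```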
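Streaming progress events and offering a non-streaming mode fit together naturally if the backend produces one stream of typed events. Here is a minimal sketch of that idea, assuming a hypothetical event format with "progress" and "delta" types. The functions generate_search_query, search, and stream_answer are hypothetical stubs standing in for real calls to a search index and an LLM; they are not taken from the sample apps.

One generator yields the progress updates and the answer deltas. A streaming endpoint would forward each event as an NDJSON line, and a non-streaming endpoint drains the same generator into a single response.

```python
import json
from collections.abc import Iterator


# Hypothetical stand-ins for the RAG steps; a real app would query a search index and call an LLM here.
def generate_search_query(question: str) -> str:
    return question.lower().rstrip("?")


def search(query: str) -> list[str]:
    return ["Mushroom calzone", "Pineapple ham pizza", "Mushroom loaf"]


def stream_answer(question: str, sources: list[str]) -> Iterator[str]:
    yield from ["Sure! ", "Here's a recipe ", "for a mushroom pineapple pizza..."]


def answer_events(question: str) -> Iterator[dict]:
    """Yield a progress event as each RAG step completes, then the answer in small deltas."""
    yield {"type": "progress", "message": f'Processing your question: "{question}"'}
    query = generate_search_query(question)
    yield {"type": "progress", "message": f'Generated search query "{query}"'}
    sources = search(query)
    yield {"type": "progress", "message": f"Found {len(sources)} related results", "sources": sources}
    yield {"type": "progress", "message": "Generating answer to your question..."}
    for delta in stream_answer(question, sources):
        yield {"type": "delta", "content": delta}


def answer_full(question: str) -> dict:
    """Non-streaming mode: consume the same events and return one complete response."""
    content: list[str] = []
    sources: list[str] = []
    for event in answer_events(question):
        if event["type"] == "delta":
            content.append(event["content"])
        elif "sources" in event:
            sources = event["sources"]
    return {"answer": "".join(content), "sources": sources}


if __name__ == "__main__":
    question = "Can you suggest a pizza recipe?"
    for event in answer_events(question):  # streaming mode: one NDJSON line per event
        print(json.dumps(event))
    print(answer_full(question))           # non-streaming mode: a single JSON body
```

Deriving the non-streaming response from the same event generator keeps the two modes from drifting apart. A curl request or automated test then exercises the same pipeline that the streaming UI does.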
- Study Buddy: Learning Data Science and Machine Learning with an AI Sidekick
If you've ever wished for a friendly companion to guide you through the world of data science and machine learning, you're not alone. As part of the "For Beginners" curriculum, I recently built a Study Buddy Agent. It's an AI-powered assistant designed to help learners explore data science interactively, intuitively, and joyfully.
Why a Study Buddy?
Learning something new can be overwhelming, especially when you're navigating complex topics like machine learning, statistics, or Python programming. The Study Buddy Agent is here to change that. It brings the curriculum to life by answering questions, offering explanations, and nudging learners toward deeper understanding, all in a conversational format. Think of it as your AI-powered lab partner: always available, never judgmental, and endlessly curious.
Built with chatmodes, Powered by Purpose
The agent lives in a chat mode file, .github/chatmodes/study-mode.chatmode.md, in the Data-Science-For-Beginners repository: https://github.com/microsoft/Data-Science-For-Beginners/blob/main/.github/chatmodes/study-mode.chatmode.md. This file defines how the agent behaves, what tone it uses, and how it interacts with learners. I designed it to be friendly, encouraging, and beginner-first, just like the curriculum itself.
It's not just about answering questions. The Study Buddy is instructed to:
Reinforce key concepts from the curriculum
Offer hints and nudges when learners get stuck
Encourage exploration and experimentation
Celebrate progress and milestones
What's Under the Hood?
The agent uses GitHub Copilot's custom chat modes, which allow developers to define custom behaviors for AI agents. By aligning the agent's responses with the curriculum's learning objectives, we help learners stay on track while enjoying the flexibility of conversational learning.
How You Can Use It
YouTube video: Study Buddy - Data Science AI Sidekick
Clone the repo: Head to the Data-Science-For-Beginners repository (https://github.com/microsoft/Data-Science-For-Beginners) and clone it locally, or open it in Codespaces.
Open GitHub Copilot Chat and select Study Buddy: This activates the Study Buddy.
Start chatting: Ask questions, explore topics, and let the agent guide you.
What's Next?
This is just the beginning. I'm exploring ways to:
Expand the agent to other beginner curricula (Web Dev, AI, IoT)
Integrate feedback loops so learners can shape the agent's evolution
Final Thoughts
In my role, I believe learning should be inclusive, empowering, and fun. The Study Buddy Agent is a small step toward that vision: a way to make data science feel less like a mountain and more like a hike with a good friend. Try it out, and let's keep building tools that make learning magical. Join us on Discord to share your feedback.
- AI Career Navigator — Empowering Job Seekers with Azure OpenAI
AI Career Navigator is more than just a project: it's a mission to make career growth accessible, intelligent, and human. Powered by Azure OpenAI, it transforms uncertainty into direction and effort into achievement.
Author: Aryan Jaiswal, Gold Microsoft Learn Student Ambassador
Reviewer: Julia Muiruri (Microsoft)