ollama
From Cloud to Chip: Building Smarter AI at the Edge with Windows AI PCs
As AI engineers, we've spent years optimizing models for the cloud: scaling inference, wrangling latency, and chasing compute across clusters. But the frontier is shifting. With the rise of Windows AI PCs and powerful local accelerators, the edge is no longer a constraint; it's now a canvas. Whether you're deploying vision models to industrial cameras, optimizing speech interfaces for offline assistants, or building privacy-preserving apps for healthcare, Edge AI is where real-world intelligence meets real-time performance.

Why Edge AI, Why Now?
Edge AI isn't just about running models locally; it's about rethinking the entire lifecycle:
- Latency: Decisions in milliseconds, not round-trips to the cloud.
- Privacy: Sensitive data stays on-device, enabling HIPAA/GDPR compliance.
- Resilience: Offline-first apps that don't break when the network does.
- Cost: Reduced cloud compute and bandwidth overhead.
With Windows AI PCs powered by Intel and Qualcomm NPUs, and tools like ONNX Runtime, DirectML, and Olive, developers can now optimize and deploy models with unprecedented efficiency.

What You'll Learn in Edge AI for Beginners
The Edge AI for Beginners curriculum is a hands-on, open-source guide designed for engineers ready to move from theory to deployment.

Multi-Language Support
The content is available in over 48 languages, so you can read and study in your native language.

What You'll Master
This course takes you from fundamental concepts to production-ready implementations, covering:
- Small Language Models (SLMs) optimized for edge deployment
- Hardware-aware optimization across diverse platforms
- Real-time inference with privacy-preserving capabilities
- Production deployment strategies for enterprise applications

Why Edge AI Matters
Edge AI represents a paradigm shift that addresses critical modern challenges:
- Privacy & Security: Process sensitive data locally without cloud exposure
- Real-time Performance: Eliminate network latency for time-critical applications
- Cost Efficiency: Reduce bandwidth and cloud computing expenses
- Resilient Operations: Maintain functionality during network outages
- Regulatory Compliance: Meet data sovereignty requirements

Edge AI
Edge AI refers to running AI algorithms and language models locally on hardware, close to where data is generated, without relying on cloud resources for inference. It reduces latency, enhances privacy, and enables real-time decision-making.
Core principles:
- On-device inference: AI models run on edge devices (phones, routers, microcontrollers, industrial PCs)
- Offline capability: Functions without persistent internet connectivity
- Low latency: Immediate responses suited for real-time systems
- Data sovereignty: Keeps sensitive data local, improving security and compliance

Small Language Models (SLMs)
SLMs like Phi-4, Mistral-7B, Qwen, and Gemma are optimized versions of larger LLMs, trained or distilled for:
- Reduced memory footprint: Efficient use of limited edge device memory
- Lower compute demand: Optimized for CPU and edge GPU performance
- Faster startup times: Quick initialization for responsive applications
They unlock powerful NLP capabilities while meeting the constraints of:
- Embedded systems: IoT devices and industrial controllers
- Mobile devices: Smartphones and tablets with offline capabilities
- IoT devices: Sensors and smart devices with limited resources
- Edge servers: Local processing units with limited GPU resources
- Personal computers: Desktop and laptop deployment scenarios

Course Modules & Navigation
Course duration: 10 hours of content.

| Module | Topic | Focus Area | Key Content | Level | Duration |
|---|---|---|---|---|---|
| 📖 00 | Introduction to EdgeAI | Foundation & Context | EdgeAI Overview • Industry Applications • SLM Introduction • Learning Objectives | Beginner | 1-2 hrs |
| 📚 01 | EdgeAI Fundamentals | Cloud vs Edge AI comparison | EdgeAI Fundamentals • Real World Case Studies • Implementation Guide • Edge Deployment | Beginner | 3-4 hrs |
| 🧠 02 | SLM Model Foundations | Model families & architecture | Phi Family • Qwen Family • Gemma Family • BitNET • μModel • Phi-Silica | Beginner | 4-5 hrs |
| 🚀 03 | SLM Deployment Practice | Local & cloud deployment | Advanced Learning • Local Environment • Cloud Deployment | Intermediate | 4-5 hrs |
| ⚙️ 04 | Model Optimization Toolkit | Cross-platform optimization | Introduction • Llama.cpp • Microsoft Olive • OpenVINO • Apple MLX • Workflow Synthesis | Intermediate | 5-6 hrs |
| 🔧 05 | SLMOps Production | Production operations | SLMOps Introduction • Model Distillation • Fine-tuning • Production Deployment | Advanced | 5-6 hrs |
| 🤖 06 | AI Agents & Function Calling | Agent frameworks & MCP | Agent Introduction • Function Calling • Model Context Protocol | Advanced | 4-5 hrs |
| 💻 07 | Platform Implementation | Cross-platform samples | AI Toolkit • Foundry Local • Windows Development | Advanced | 3-4 hrs |
| 🏭 08 | Foundry Local Toolkit | Production-ready samples | Sample applications (see details below) | Expert | 8-10 hrs |

Each module includes Jupyter notebooks, code samples, and deployment walkthroughs, making it perfect for engineers who learn by doing.

Developer Highlights
- 🔧 Olive: Microsoft's optimization toolchain for quantization, pruning, and acceleration.
- 🧩 ONNX Runtime: Cross-platform inference engine with support for CPU, GPU, and NPU.
- 🎮 DirectML: GPU-accelerated ML API for Windows, ideal for gaming and real-time apps.
- 🖥️ Windows AI PCs: Devices with built-in NPUs for low-power, high-performance inference.

Local AI: Beyond the Edge
Local AI isn't just about inference; it's about autonomy. Imagine agents that:
- Learn from local context
- Adapt to user behavior
- Respect privacy by design
With tools like Agent Framework, Azure AI Foundry, Windows Copilot Studio, and Foundry Local, developers can orchestrate local agents that blend LLMs, sensors, and user preferences, all without cloud dependency.

Try It Yourself
Ready to get started? Clone the Edge AI for Beginners GitHub repo, run the notebooks, and deploy your first model to a Windows AI PC or IoT device. Whether you're building smart kiosks, offline assistants, or industrial monitors, this curriculum gives you the scaffolding to go from prototype to production.
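To make this concrete before you dive into the modules, here is a minimal sketch of on-device inference with ONNX Runtime in Python. It is an illustration rather than course material: the model file, input shape, and provider availability are assumptions, and the DirectML provider requires the onnxruntime-directml package on Windows.

```python
# Minimal on-device inference sketch with ONNX Runtime.
# Assumptions: an exported model.onnx exists in the working directory, it takes a
# 1x3x224x224 float32 input (typical for image models), and onnxruntime or
# onnxruntime-directml is installed. Names are illustrative only.
import numpy as np
import onnxruntime as ort

# Prefer the DirectML provider on Windows AI PCs when it is available, else CPU.
available = ort.get_available_providers()
providers = [p for p in ("DmlExecutionProvider", "CPUExecutionProvider") if p in available]

session = ort.InferenceSession("model.onnx", providers=providers)

# Inspect the model's expected input so we can feed correctly shaped data.
inp = session.get_inputs()[0]
print(f"Input: {inp.name}, shape: {inp.shape}, type: {inp.type}")

# Dummy data just to exercise the pipeline; replace with real preprocessed input.
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {inp.name: dummy})
print("Output shape:", outputs[0].shape)
```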
Official, Free GenAI and Python Course! 🚀
Want to learn how to use generative AI models in your Python applications? We're organizing a series of nine live streams, in English and Spanish, fully dedicated to generative AI. We'll cover language models (LLMs), embedding models, and vision models, as well as techniques such as RAG, function calling, and structured outputs. We'll also show you how to build agents and MCP servers, and we'll talk about AI safety and evaluations to make sure your models and applications produce safe results. 🔗 Register for the full series. In addition to the live streams, you can join our weekly office hours on the AI Foundry Discord to ask any questions that don't get answered during the chat. See you in the streams! 👋🏻

Language Models
📅 October 7, 2025 | 10:00 PM - 11:00 PM (UTC)
🔗 Register for the stream on Reactor
Join the first session of our Python + AI series! In this session we'll talk about Large Language Models (LLMs), the models that power ChatGPT and GitHub Copilot. We'll use Python to interact with LLMs using packages such as the OpenAI SDK and Langchain. We'll experiment with prompt engineering and few-shot examples to improve results. We'll also build a full-stack application powered by LLMs and explain the importance of concurrency and streaming in user-facing AI apps.
👉 If you want to follow the examples live, make sure you have a GitHub account.

Vector Embeddings
📅 October 8, 2025 | 10:00 PM - 11:00 PM (UTC)
🔗 Register for the stream on Reactor
In the second Python + AI session, we'll explore vector embeddings, a way to encode text or images as arrays of floating-point numbers. These models enable similarity search across different types of content. We'll use models such as OpenAI's text-embedding-3 series, visualize results in Python, and compare distance metrics. We'll also look at how to apply quantization and how to use multimodal embedding models.
👉 If you want to follow the examples live, make sure you have a GitHub account.

Retrieval-Augmented Generation (RAG)
📅 October 9, 2025 | 10:00 PM - 11:00 PM (UTC)
🔗 Register for the stream on Reactor
In the third session, we'll explore RAG, a technique that sends context to the LLM to get more accurate answers within a specific domain. We'll use different data sources (CSVs, web pages, documents, databases) and build a full-stack RAG app with Azure AI Search.

Vision Models
📅 October 14, 2025 | 10:00 PM - 11:00 PM (UTC)
🔗 Register for the stream on Reactor
The fourth session is about vision models such as GPT-4o and 4o-mini! These models can process both text and images, generating descriptions, extracting data, answering questions, or classifying content. We'll use Python to send images to the models, build an image chat app, and integrate them into RAG flows.
👉 If you want to follow the examples live, make sure you have a GitHub account.

Structured Outputs
📅 October 15, 2025 | 10:00 PM - 11:00 PM (UTC)
🔗 Register for the stream on Reactor
In the fifth session we'll learn how to get LLMs to generate structured responses that follow a schema.
We'll explore OpenAI's structured outputs mode and how to apply it to entity extraction, classification, and agent workflows.
👉 If you want to follow the examples live, make sure you have a GitHub account.

Quality and Safety
📅 October 16, 2025 | 10:00 PM - 11:00 PM (UTC)
🔗 Register for the stream on Reactor
In the sixth session we'll talk about using AI safely and evaluating the quality of its outputs. We'll show how to set up Azure AI Content Safety, handle errors in Python code, and evaluate results with the Azure AI Evaluation SDK.

Tool Calling
📅 October 21, 2025 | 10:00 PM - 11:00 PM (UTC)
🔗 Register for the stream on Reactor
In the final week of the series, we focus on tool calling (function calling), the foundation for building AI agents. We'll learn how to define tools in Python or JSON, handle the models' responses, and enable parallel calls and multiple iterations.
👉 If you want to follow the examples live, make sure you have a GitHub account.

AI Agents
📅 October 22, 2025 | 10:00 PM - 11:00 PM (UTC)
🔗 Register for the stream on Reactor
In the penultimate session we'll build AI agents! We'll use frameworks such as Langgraph, Semantic Kernel, Autogen, and Pydantic AI. We'll start with simple examples and work up to more complex architectures such as round-robin, supervisor, graphs, and ReAct.

Model Context Protocol (MCP)
📅 October 23, 2025 | 10:00 PM - 11:00 PM (UTC)
🔗 Register for the stream on Reactor
We close the series with the Model Context Protocol (MCP), the hottest open technology of 2025. You'll learn how to use FastMCP to create a local MCP server and connect it to chatbots like GitHub Copilot. We'll also look at how to integrate MCP with agent frameworks such as Langgraph, Semantic Kernel, and Pydantic AI. And, of course, we'll talk about security risks and best practices for developers.
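To give a flavor of the hands-on content, here is a minimal sketch in the spirit of the vector embeddings session. It is illustrative only and assumes the openai Python package with an OPENAI_API_KEY set in your environment; the model name comes from the text-embedding-3 series mentioned above.

```python
# Minimal vector-embeddings sketch: embed two texts and compare them.
# Assumes the `openai` package and an OPENAI_API_KEY environment variable.
import math
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(text: str) -> list[float]:
    """Return the embedding vector for a piece of text."""
    response = client.embeddings.create(model="text-embedding-3-small", input=text)
    return response.data[0].embedding

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """1.0 means the vectors point the same way, values near 0 mean unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

query = embed("How do I reset my password?")
doc = embed("Steps to recover account credentials")
print(f"similarity: {cosine_similarity(query, doc):.3f}")
```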
Using DeepSeek-R1 on Azure with JavaScript

The pace at which innovative AI models are being developed is outstanding! DeepSeek-R1 is one such model: it focuses on complex reasoning tasks and gives developers a powerful tool for building intelligent applications. This week, we announced its availability on GitHub Models as well as on Azure AI Foundry. In this article, we'll take a look at how you can deploy and use the DeepSeek-R1 models in your JavaScript applications.

TL;DR key takeaways
- DeepSeek-R1 models focus on complex reasoning tasks and are not designed for general conversation.
- You can quickly switch your configuration to use Azure AI, GitHub Models, or even local models with Ollama.
- You can use the OpenAI Node SDK or LangChain.js to interact with DeepSeek models.

What you'll learn here
- Deploying the DeepSeek-R1 model on Azure.
- Switching between Azure, GitHub Models, or local (Ollama) usage.
- Code patterns to start using DeepSeek-R1 with various libraries in TypeScript.

Reference links
- DeepSeek on Azure - JavaScript demos repository
- Azure AI Foundry
- OpenAI Node SDK
- LangChain.js
- Ollama

Requirements
- GitHub account. If you don't have one, you can create a free GitHub account. You can optionally use GitHub Copilot Free to help you write code and ship your application even faster.
- Azure account. If you're new to Azure, get an Azure account for free to receive Azure credits to get started. If you're a student, you can also get free credits with Azure for Students.

Getting started
We'll use GitHub Codespaces to get started quickly, as it provides a preconfigured Node.js environment for you. Alternatively, you can set up a local environment using the instructions found in the GitHub repository. Click on the button below to open our sample repository in a web-based VS Code, directly in your browser. Once the project is open, wait a bit to ensure everything has loaded correctly. Open a terminal and run the following command to install the dependencies:

npm install

Running the samples
The repository contains several TypeScript files under the samples directory that demonstrate how to interact with DeepSeek-R1 models. You can run a sample using the following command:

npx tsx samples/<sample>.ts

For example, let's start with the first one:

npx tsx samples/01-chat.ts

Wait a bit, and you should see the response from the model in your terminal. You'll notice that it may take longer than usual to respond, and that the response starts with a strange <think> tag. This is because DeepSeek-R1 is designed for tasks that need complex reasoning, like solving problems or answering math questions, not for your usual chat interactions.

Model configuration
By default, the repository is configured to use GitHub Models, so you can run any example in Codespaces without additional setup. While that's great for quick experimentation, GitHub Models limits the number of requests you can make in a day and the amount of data you can send in a single request. If you want to use the model more extensively, you can switch to Azure AI or even use a local model with Ollama. Take a look at samples/config.ts to see how the different configurations are set up. We won't cover using Ollama models in this article, but you can find more information in the repository documentation.

Deploying DeepSeek-R1 on Azure
To experiment with the full capabilities of DeepSeek-R1, you can deploy it on Azure AI Foundry. Azure AI Foundry is a platform that allows you to deploy, manage, and develop with AI models quickly.
To use Azure AI Foundry, you need an Azure account. Let's start by deploying the model on Azure AI Foundry. First, follow this tutorial to deploy a serverless endpoint with the model. When it's time to choose the model, make sure to select the DeepSeek-R1 model in the catalog. Once your endpoint is deployed, you should be able to see the endpoint details and retrieve the URL and API key:

Screenshot showing the endpoint details in Azure AI Foundry

Then create a .env file in the root of the project and add the following content:

AZURE_AI_BASE_URL="https://<your-deployment-name>.<region>.models.ai.azure.com/v1"
AZURE_AI_API_KEY="<your-api-key>"

Tip: if you're copying the endpoint from the Azure AI Foundry portal, make sure to add /v1 at the end of the URL.

Open the samples/config.ts file and update the default export to use Azure:

export default AZURE_AI_CONFIG;

Now all samples will use the Azure configuration.

Explore reasoning with DeepSeek-R1
Now that you have the model deployed, you can start experimenting with it. Open the samples/08-reasoning.ts file to see how the model handles more complex tasks, like helping us understand a well-known weird piece of code.

const prompt = `
float fast_inv_sqrt(float number) {
  long i;
  float x2, y;
  const float threehalfs = 1.5F;
  x2 = number * 0.5F;
  y = number;
  i = *(long*)&y;
  i = 0x5f3759df - ( i >> 1 );
  y = *(float*)&i;
  y = y * ( threehalfs - ( x2 * y * y ) );
  return y;
}
What is this code doing? Explain me the magic behind it.
`;

Now run this sample with the command:

npx tsx samples/08-reasoning.ts

You should see the model's response streaming piece by piece in the terminal, describing its thought process before providing the actual answer to our question.

Screenshot showing the model's response streaming in the terminal

Brace yourself, as it might take a while to get the full response! At the end of the process, you should see the model's detailed explanation of the code, along with some context around it.

Leveraging frameworks
Most examples in this repository are built with the OpenAI Node SDK, but you can also use LangChain.js to interact with the model. This might be especially interesting if you need to integrate other sources of data or want to build a more complex application. Open the file samples/07-langchain.ts to have a look at the setup, and see how you can reuse the same configuration we used with the OpenAI SDK.

Going further
Now it's your turn to experiment and discover the full potential of DeepSeek-R1! You can try more advanced prompts, integrate it into your larger application, or even build agents to make the most out of the model. To continue your learning journey, you can check out the following resources:
- Generative AI with JavaScript (GitHub): code samples and resources to learn Generative AI with JavaScript.
- Build a serverless AI chat with RAG using LangChain.js (GitHub): a next-step code example to build an AI chatbot using Retrieval-Augmented Generation and LangChain.js.
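Although the repository's samples are TypeScript, the serverless endpoint you just deployed speaks an OpenAI-compatible API, so you can smoke-test it from any environment. Below is a hedged Python sketch that is not part of the demo repository; the base URL format follows the .env example above, and the model name is an assumption that may differ for your deployment.

```python
# Quick smoke test of a DeepSeek-R1 serverless endpoint via its OpenAI-compatible API.
# Assumes AZURE_AI_BASE_URL / AZURE_AI_API_KEY are set as in the .env example above;
# the model name is a placeholder and may differ for your deployment.
import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ["AZURE_AI_BASE_URL"],  # e.g. https://<name>.<region>.models.ai.azure.com/v1
    api_key=os.environ["AZURE_AI_API_KEY"],
)

stream = client.chat.completions.create(
    model="DeepSeek-R1",  # assumption: adjust to whatever name your endpoint exposes
    messages=[{"role": "user", "content": "How many r's are in 'strawberry'? Reason it out."}],
    stream=True,
)

# DeepSeek-R1 emits its reasoning inside a <think> section before the final answer.
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```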
Step-by-step: Integrate Ollama Web UI to use Azure OpenAI API with LiteLLM Proxy

Introduction
Ollama WebUI is a streamlined interface for deploying and interacting with open-source large language models (LLMs) like Llama 3 and Mistral, enabling users to manage models, test them via a ChatGPT-like chat environment, and integrate them into applications through Ollama's local API. While it excels for self-hosted models on platforms like Azure VMs, it does not natively support Azure OpenAI API endpoints; OpenAI's proprietary models (e.g., GPT-4) remain accessible only through OpenAI's managed API. However, tools like LiteLLM bridge this gap, allowing developers to combine Ollama-hosted models with OpenAI's API in hybrid workflows while maintaining compliance and cost-efficiency. This setup empowers users to leverage both self-managed open-source models and cloud-based AI services.

Problem Statement
As of February 2025, Ollama WebUI still does not support the Azure OpenAI API. The Ollama Web UI only supports the self-hosted Ollama API and the managed OpenAI API service (PaaS). This is an issue for users who want to use OpenAI models they have already deployed on Azure AI Foundry.

Objective
Integrate the Azure OpenAI API into Ollama Web UI via a LiteLLM proxy. LiteLLM translates Azure AI API requests into OpenAI-style requests, allowing Ollama Web UI users to work with OpenAI models deployed on Azure AI Foundry. If you haven't hosted Ollama WebUI already, follow my other step-by-step guide to host Ollama WebUI on Azure. Proceed to the next step if you already have Ollama WebUI deployed.

Step 1: Deploy OpenAI models on Azure AI Foundry
If you haven't created an Azure AI Hub already, search for Azure AI Foundry on Azure and click the "+ Create" button > Hub. Fill out all the empty fields with the appropriate configuration and click "Create". After the Azure AI Hub is successfully deployed, click on the deployed resource and launch the Azure AI Foundry service. To deploy new models on Azure AI Foundry, find the "Models + Endpoints" section on the left-hand side and click the "+ Deploy Model" button > "Deploy base model". A popup will appear where you can choose which models to deploy on Azure AI Foundry. Please note that the o-series models are only available to select customers at the moment; you can request access by completing the request access form and waiting until Microsoft approves it. Click "Confirm" and another popup will appear. Name the deployment and click "Deploy" to deploy the model. Wait a few moments for the model to deploy. Once it has successfully deployed, save the "Target URI" and the API key.

Step 2: Deploy LiteLLM Proxy via Docker Container
Before pulling the LiteLLM image into the host environment, create a file named "litellm_config.yaml" and list the models you deployed on Azure AI Foundry, along with their API endpoints and keys. Replace "API_Endpoint" and "API_Key" with the "Target URI" and "Key" found in Azure AI Foundry, respectively. Template for the "litellm_config.yaml" file:
model_list:
  - model_name: [model_name]
    litellm_params:
      model: azure/[model_name_on_azure]
      api_base: "[API_ENDPOINT/Target_URI]"
      api_key: "[API_Key]"
      api_version: "[API_Version]"

Tip: You can find the API version at the end of the Target URI of the model's endpoint. Sample endpoint: https://example.openai.azure.com/openai/deployments/o1-mini/chat/completions?api-version=2024-08-01-preview

Run the docker command below to start LiteLLM Proxy with the correct settings:

docker run -d \
  -v $(pwd)/litellm_config.yaml:/app/config.yaml \
  -p 4000:4000 \
  --name litellm-proxy-v1 \
  --restart always \
  ghcr.io/berriai/litellm:main-latest \
  --config /app/config.yaml --detailed_debug

Make sure to run the docker command inside the directory where you created the "litellm_config.yaml" file. The port used to listen for LiteLLM Proxy traffic is port 4000.

Now that the LiteLLM proxy is running on port 4000, let's change the OpenAI API settings in Ollama WebUI. Navigate to Ollama WebUI's Admin Panel > Settings > Connections and, under the OpenAI API section, enter http://127.0.0.1:4000 as the API endpoint and set any key (you must enter something to make it work!). Click the "Save" button to apply the changes. Refresh the browser and you should see the AI models deployed on Azure AI Foundry listed in the Ollama WebUI. Now let's test the chat completion and Web Search capability using the "o1-mini" model in Ollama WebUI.

Conclusion
Hosting Ollama WebUI on an Azure VM and integrating it with OpenAI's API via LiteLLM offers a powerful, flexible approach to AI deployment, combining the cost-efficiency of open-source models with the advanced capabilities of managed cloud services. While Ollama itself doesn't support Azure OpenAI endpoints, this hybrid architecture empowers IT teams to balance data privacy (via self-hosted models on Azure AI Foundry) and cutting-edge performance (using the Azure OpenAI API), all within Azure's scalable ecosystem. This guide covers every step required to deploy your OpenAI models on Azure AI Foundry, set up the required resources, deploy LiteLLM Proxy on your host machine, and configure Ollama WebUI to support Azure AI endpoints. You can take your AI testing even further with Ollama WebUI features such as Web Search and Text-to-Image Generation, all in one place.
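If you want to sanity-check the proxy independently of Ollama WebUI, any OpenAI-compatible client pointed at port 4000 should work. Here is a minimal Python sketch; it assumes the openai package, that the proxy from the docker command above is running, and that "o1-mini" matches a model_name entry in your litellm_config.yaml. As with the WebUI setting, the API key can be any placeholder string unless you configured authentication on the proxy.

```python
# Minimal check that the LiteLLM proxy is serving your Azure-hosted model.
# Assumes the proxy started above is listening on port 4000 and that "o1-mini"
# matches a model_name entry in litellm_config.yaml.
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:4000",
    api_key="anything",  # placeholder key, mirroring the Ollama WebUI setting
)

response = client.chat.completions.create(
    model="o1-mini",
    messages=[{"role": "user", "content": "Say hello from Azure via LiteLLM."}],
)
print(response.choices[0].message.content)
```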
Build AI Agents with MCP Tool Use in Minutes with AI Toolkit for VS Code

We're excited to announce Agent Builder, the newest evolution of what was formerly known as Prompt Builder, now reimagined and supercharged for intelligent app development. This powerful tool in AI Toolkit enables you to create, iterate, and optimize agents, from prompt engineering to tool integration, all in one seamless workflow. Whether you're designing simple chat interactions or complex task-performing agents with tool access, Agent Builder simplifies the journey from idea to integration.

Why Agent Builder?
Agent Builder is designed to empower developers and prompt engineers to:
🚀 Generate starter prompts with natural language
🔁 Iterate and refine prompts based on model responses
🧩 Break down tasks with prompt chaining and structured outputs
🧪 Test integrations with real-time runs and tool use such as MCP servers
💻 Generate production-ready code for rapid app development
And many more features are coming soon; stay tuned for:
📝 Variables in prompts
✅ Running agents against test cases to test your agent easily
📊 Evaluating the accuracy and performance of your agent with built-in or custom metrics
☁️ Deploying your agent to the cloud

Build Smart Agents with Tool Use (MCP Servers)
Agents can now connect to external tools through MCP (Model Context Protocol) servers, enabling them to perform real-world actions like querying a database, accessing APIs, or executing custom logic.

Connect to an Existing MCP Server
To use an existing MCP server in Agent Builder:
1. In the Tools section, select + MCP Server.
2. Choose a connection type: Command (stdio), which runs a local command that implements the MCP protocol, or HTTP (server-sent events), which connects to a remote server implementing the MCP protocol.
3. If the MCP server supports multiple tools, select the specific tool you want to use.
4. Enter your prompts and click Run to test the agent's interaction with the tool.
This integration allows your agents to fetch live data or trigger custom backend services as part of the conversation flow.

Build and Scaffold a New MCP Server
Want to create your own tool? Agent Builder helps you scaffold a new MCP server project:
1. In the Tools section, select + MCP Server.
2. Choose MCP server project.
3. Select your preferred programming language: Python or TypeScript.
4. Pick a folder in which to create your server project.
5. Name your project and click Create.
Agent Builder generates a scaffolded implementation of the MCP protocol that you can extend. Use the built-in VS Code debugger: press F5 or click Debug in Agent Builder, then test with prompts like:
System: You are a weather forecast professional that can tell weather information based on a given location.
User: What is the weather in Shanghai?
Agent Builder will automatically connect to your running server and show the response, making it easy to test and refine the tool-agent interaction (a rough sketch of such a weather-tool server in Python appears at the end of this post).

AI Sparks: from Prototype to Production with AI Toolkit
Building AI-powered applications from scratch or infusing intelligence into existing systems? AI Sparks is your go-to webinar series for mastering the AI Toolkit (AITK), from foundational concepts to cutting-edge techniques. In this bi-weekly, hands-on series, we'll cover:
🚀 SLMs & Local Models – Test and deploy AI models and applications efficiently on your own terms: locally, to edge devices, or to the cloud
🔍 Embedding Models & RAG – Supercharge retrieval for smarter applications using existing data
🎨 Multi-Modal AI – Work with images, text, and beyond
🤖 Agentic Frameworks – Build autonomous, decision-making AI systems
Watch on demand.

Share your feedback
Get started with the latest version, share your feedback, and let us know how these new features help you in your AI development journey. As always, we're here to listen, collaborate, and grow alongside our amazing user community. Thank you for being a part of this journey; let's build the future of AI together! Join our Microsoft Azure AI Foundry Discord channel to continue the discussion 🚀
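As a companion to the scaffolding walkthrough above, here is a rough sketch of what a minimal Python MCP server with a single weather tool could look like. It uses FastMCP from the open-source mcp Python package; the project Agent Builder generates for you may be structured differently, and the weather data here is hard-coded purely for illustration.

```python
# Rough sketch of a minimal MCP server exposing one tool, served over stdio so a
# client such as Agent Builder can launch it as a command. Uses FastMCP from the
# `mcp` Python package; the scaffolded project from Agent Builder may differ.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("weather")

@mcp.tool()
def get_weather(location: str) -> str:
    """Return a short weather report for the given location."""
    # Hard-coded stand-in data; a real server would call a weather API here.
    fake_forecasts = {"Shanghai": "28°C, partly cloudy", "Seattle": "14°C, light rain"}
    return fake_forecasts.get(location, f"No forecast available for {location}")

if __name__ == "__main__":
    mcp.run(transport="stdio")
```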
The Startup Stage: Powered by Microsoft for Startups at European AI & Cloud Summit

🚀 Take center stage in the AI and Cloud Startup Program, designed to showcase groundbreaking solutions and foster collaboration between ambitious startups and influential industry leaders. Whether you're looking to engage with potential investors, connect with clients, or share your boldest ideas, this is the platform to shine.

Why Join the Startup Stage?
- Pitch to Top Investors: Present your ideas and products to key decision-makers in the tech world.
- Gain Visibility: Showcase your startup in a vibrant space dedicated to innovation, and prove that you are the next game-changer.
- Learn from the Best: Hear from visionary thought leaders and Microsoft AI experts about the latest trends and opportunities in AI and cloud.

AI Competition: Propel Your Startup
Stand out from the crowd by participating in the European AI & Cloud Startup Stage competition, exclusively designed for startups leveraging Microsoft AI and Azure Cloud services. Compete for prestigious awards, including:
- $25,000 in Microsoft Azure credits
- A mentoring session with Marco Casalaina, VP of Products at Azure AI
- Fast-track access to exclusive resources through the Microsoft for Startups program
Get ready to deliver a pitch in front of a live audience and an expert panel on 28 May 2025!

How to Apply:
- Ensure your startup solution runs on Microsoft AI and Azure Cloud.
- Register for the conference and submit your competition application form before the deadline of 14 April 2025 at the European Cloud and AI Summit.

Be Part of Something Bigger
This isn't just an exhibition; it's a thriving community where innovation meets opportunity. Don't miss out! With tickets already 70% sold out, now's the time to secure your spot. Join the European AI and Cloud Startup Area with a booth or launchpad, and accelerate your growth in the tech ecosystem. Visit the [European AI and Cloud Summit](https://ecs.events) website to learn more, purchase tickets, or apply for the AI competition. Download the sponsorship brochure for detailed insights into this once-in-a-lifetime event. Together, let's shape the future of cloud technology. See you in Düsseldorf! 🎉
Building a DeepSeek Extension for GitHub Copilot in VS Code

DeepSeek has been getting a lot of buzz lately, and with a little setup, you can start using it today in GitHub Copilot within VS Code. In this post, I'll walk you through how to install and run a VS Code extension I built, so you can take advantage of DeepSeek right on your machine. With this extension, you can use "@deepseek" to explore the deepseek-coder model. It's powered by Ollama, enabling seamless, fully offline interactions with DeepSeek models and giving you a local coding assistant that prioritizes privacy and performance. In a future post I'll walk you through the extension code and explain how to call models hosted locally using Ollama. Feel free to subscribe to get notified.

Features and Benefits

Open-Source and Extendable
As an open-source project, the DeepSeek for GitHub Copilot extension is fully customizable. Advanced users can modify and extend its functionality, build from source, tweak configurations, and even integrate additional AI capabilities.

Local AI Processing
With the DeepSeek for GitHub Copilot extension, all interactions are processed locally on your machine, ensuring complete data privacy and eliminating latency issues. This makes it an ideal solution for developers working on sensitive projects or in restricted environments.

Seamless Integration with GitHub Copilot Chat
The extension integrates natively with GitHub Copilot Chat, allowing you to invoke DeepSeek models effortlessly. If you're already familiar with GitHub Copilot, you'll find the workflow intuitive and easy to use: simply type "@deepseek" followed by your question to get started.

Powered by Ollama
Ollama, a lightweight AI model runtime, powers the execution of DeepSeek models. It simplifies model management by handling downloads and execution, so you can focus on coding.

Customizable Model Selection
You can configure the extension to use different DeepSeek models through a simple setting adjustment. This flexibility allows you to choose the right model size and capability for your hardware. Note that bigger models might not run on your local system; you can take advantage of Azure's infrastructure to run them instead.

Installation Guide

Installing and Running Ollama
DeepSeek for GitHub Copilot requires Ollama to function properly. Ollama is an AI model runtime that allows you to run and manage large language models efficiently on your local machine. Download the installer, install, and start Ollama from the Ollama website.

Install from the Visual Studio Code Marketplace
The simplest way to get started is by installing the extension directly from the Visual Studio Code Marketplace:
1. Open Visual Studio Code.
2. Navigate to the Extensions panel (Ctrl + Shift + X).
3. Search for DeepSeek for GitHub Copilot and click Install.

Using the Extension
Once installed, using the extension is straightforward:
1. Open the GitHub Copilot Chat panel.
2. Type @deepseek followed by your prompt to interact with the model.
Note: On the first run, the extension will automatically download the DeepSeek model. This may take a few minutes, depending on your internet connection.

Configuration and Customization
DeepSeek for GitHub Copilot lets you configure the AI model through Visual Studio Code settings. To change the DeepSeek model, update the settings.json file:

{ "deepseek.model.name": "deepseek-coder:1.3b" }

A list of available models can be found on the Ollama website.
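Under the hood, the extension talks to the Ollama server running on your machine. If you want to poke at the same model outside VS Code, here is a minimal Python sketch against Ollama's local REST API; it assumes Ollama is listening on its default port (11434) and that the model named in settings.json above has already been pulled. The extension's actual implementation may differ.

```python
# Talk to the locally running deepseek-coder model through Ollama's REST API.
# Assumes Ollama is listening on its default port (11434) and the model has
# already been downloaded (e.g. by the extension or `ollama pull`).
import json
import urllib.request

payload = {
    "model": "deepseek-coder:1.3b",
    "messages": [{"role": "user", "content": "Write a Python function that reverses a string."}],
    "stream": False,  # return a single JSON object instead of a stream of chunks
}

request = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request) as response:
    body = json.loads(response.read())
    print(body["message"]["content"])
```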
Limitations and Workarounds

Current Limitations
- The extension does not have access to your files in this version, meaning it cannot provide context-aware completions. This is because DeepSeek models don't support function calling.
- Performance is limited by your local machine: larger models may require more RAM and CPU power.

Workarounds
- To provide context for completions, manually copy and paste the relevant code into the chat.
- Optimize performance by selecting smaller DeepSeek models (such as deepseek-coder:1.3b) if you experience lag.

System Requirements
To run DeepSeek for GitHub Copilot Chat, ensure you have the following:
- Visual Studio Code (latest version recommended)
- The Ollama app installed and running (download from ollama.com)
- Sufficient system resources: minimum 8 GB RAM and a multi-core CPU; recommended 16 GB RAM and GPU acceleration (if available)

Conclusion
The DeepSeek for GitHub Copilot Chat extension provides an excellent way to get privacy, low-latency responses, and offline capabilities. 🔗 Get Started Today: Install the DeepSeek for GitHub Copilot Chat extension and supercharge your GitHub Copilot Chat experience with AI, entirely offline! 🚀

■ Co-authored with Copilot
Building AI Agents on edge devices using Ollama + Phi-4-mini Function Calling

The new Phi-4-mini and Phi-4-multimodal now support Function Calling. This feature enables the models to connect with external tools and APIs. By deploying Phi-4-mini and Phi-4-multimodal with Function Calling capabilities on edge devices, we can extend their knowledge locally and improve their task execution efficiency. This blog focuses on how to use Phi-4-mini's Function Calling capabilities to build efficient AI agents on edge devices.

What is Function Calling?

How it works
First, we need to understand how Function Calling works:
- Tool integration: Function Calling allows the LLM/SLM to interact with external tools and APIs, such as weather APIs, databases, or other services.
- Function definition: You define a function (tool) that the LLM/SLM can call, specifying its name, parameters, and expected output.
- LLM detection: The LLM/SLM analyzes the user's input and determines whether a function call is required and which function to use.
- JSON output: The LLM/SLM outputs a JSON object containing the name of the function to call and the parameters it requires.
- External execution: The application executes the function call using the parameters provided by the LLM/SLM.
- Response to the LLM: The output of the function call is returned to the LLM/SLM, which can use this information to generate a response to the user.

Application scenarios
- Data retrieval: convert natural language queries into API calls to fetch data (e.g., "show my recent orders" triggers a database query)
- Operation execution: convert user requests into specific function calls (e.g., "schedule a meeting" becomes a calendar API call)
- Computational tasks: handle mathematical or logical operations through dedicated functions (e.g., calculate compound interest or statistical analysis)
- Data processing: chain multiple function calls together (e.g., get data → parse → transform → store)
- UI/UX integration: trigger interface updates based on user interactions (e.g., update map markers or display charts)

Phi-4-mini / Phi-4-multimodal Function Calling
Phi-4-mini and Phi-4-multimodal support single and parallel Function Calling. Things to note when calling:
- You need to define the tools in the system message to enable single or parallel Function Calling.
- For parallel Function Calling, you also need to state in the system prompt that some tools are available (see the SYSTEM_PROMPT in the parallel example below).
The following are examples.

Single Function Calling

tools = [
    {
        "name": "get_match_result",
        "description": "get match result",
        "parameters": {
            "match": {
                "description": "The name of the match",
                "type": "str",
                "default": "Arsenal vs ManCity"
            }
        }
    },
]

messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant",
        "tools": json.dumps(tools),  # pass the tools into the system message using the tools argument
    },
    {
        "role": "user",
        "content": "What is the result of Arsenal vs ManCity today?"
    }
]

Full sample: click

Parallel Function Calling

AGENT_TOOLS = {
    "booking_fight": {
        "name": "booking_fight",
        "description": "booking fight",
        "parameters": {
            "departure": {
                "description": "The name of Departure airport code",
                "type": "str",
            },
            "destination": {
                "description": "The name of Destination airport code",
                "type": "str",
            },
            "outbound_date": {
                "description": "The date of outbound flight",
                "type": "str",
            },
            "return_date": {
                "description": "The date of return flight",
                "type": "str",
            }
        }
    },
    "booking_hotel": {
        "name": "booking_hotel",
        "description": "booking hotel",
        "parameters": {
            "query": {
                "description": "The name of the city",
                "type": "str",
            },
            "check_in_date": {
                "description": "The date of check in",
                "type": "str",
            },
            "check_out_date": {
                "description": "The date of check out",
                "type": "str",
            }
        }
    },
}

SYSTEM_PROMPT = """
You are my travel agent with some tools available.
"""

messages = [
    {
        "role": "system",
        "content": SYSTEM_PROMPT,
        "tools": json.dumps(AGENT_TOOLS),  # pass the tools into the system message using the tools argument
    },
    {
        "role": "user",
        "content": """I have a business trip from London to New York in March 21 2025 to March 27 2025, can you help me to book a hotel and flight tickets"""
    }
]

Full sample: click

Using Ollama and Phi-4-mini Function Calling to Create AI Agents on Edge Devices
Ollama is a popular free tool for deploying LLMs/SLMs locally and can be used in combination with AI Toolkit for VS Code. In addition to running on your PC or laptop, it can also be deployed on IoT devices, mobile phones, containers, and more. To use Phi-4-mini with Ollama, you need Ollama 0.5.13 or later. Different quantized versions are supported on Ollama, as shown in the figure below.

Using Ollama, we can deploy Phi-4-mini on the edge and implement an AI agent with Function Calling under limited computing power, so that generative AI can be applied more effectively at the edge.

Current Issues
A disappointing experience: if you try to call Ollama directly in the way shown above, you will find that Function Calling is not triggered. There are discussions in Ollama's GitHub issues; see https://github.com/ollama/ollama/issues/9437. By modifying the Phi-4-mini template in the Modelfile, single Function Calling can be made to work, but parallel Function Calling still fails.

Resolution
We implemented a fix by adjusting the template. We improved it according to Phi-4-mini's chat template and re-modified the Modelfile. Of course, the quantized model has a large impact on the results. The adjustments are as follows:

TEMPLATE """
{{- if .Messages }}
{{- if or .System .Tools }}<|system|>
{{ if .System }}{{ .System }}
{{- end }}
In addition to plain text responses, you can chose to call one or more of the provided functions.

Use the following rule to decide when to call a function:
  * if the response can be generated from your internal knowledge (e.g., as in the case of queries like "What is the capital of Poland?"), do so
  * if you need external information that can be obtained by calling one or more of the provided functions, generate a function calls

If you decide to call functions:
  * prefix function calls with functools marker (no closing marker required)
  * all function calls should be generated in a single JSON list formatted as functools[{"name": [function name], "arguments": [function arguments as JSON]}, ...]
  * follow the provided JSON schema. Do not hallucinate arguments or values. Do not blindly copy values from the provided samples
  * respect the argument type formatting. E.g., if the type is number and format is float, write value 7 as 7.0
  * make sure you pick the right functions that match the user intent

Available functions as JSON spec:
{{- if .Tools }}
{{ .Tools }}
{{- end }}<|end|>
{{- end }}
{{- range .Messages }}
{{- if ne .Role "system" }}<|{{ .Role }}|>
{{- if and .Content (eq .Role "tools") }}
{"result": {{ .Content }}}
{{- else if .Content }}
{{ .Content }}
{{- else if .ToolCalls }}
functools[
{{- range .ToolCalls }}{{ "{" }}"name": "{{ .Function.Name }}", "arguments": {{ .Function.Arguments }}{{ "}" }}
{{- end }}]
{{- end }}<|end|>
{{- end }}
{{- end }}<|assistant|>
{{ else }}
{{- if .System }}<|system|>
{{ .System }}<|end|>{{ end }}{{ if .Prompt }}<|user|>
{{ .Prompt }}<|end|>{{ end }}<|assistant|>
{{ end }}{{ .Response }}{{ if .Response }}<|user|>{{ end }}
"""

We have tested the solution using different quantized models. In a laptop environment, we recommend the following model to enable single/parallel Function Calling: phi4-mini:3.8b-fp16. Note: you need to bind the adjusted Modelfile to phi4-mini:3.8b-fp16 for this to work. Execute the following commands on the command line:

# If you haven't downloaded it yet, please execute this command first
ollama run phi4-mini:3.8b-fp16

# Bind with the adjusted Modelfile
ollama create phi4-mini:3.8b-fp16 -f {Your Modelfile Path}

Now you can test single and parallel Function Calling with Phi-4-mini.

Single Function Calling
Parallel Function Calling
Full sample in notebook

The above example is just a simple introduction. As development moves forward, we hope to find simpler ways to apply it on the edge, use Function Calling to expand the scenarios for Phi-4-mini / Phi-4-multimodal, and develop more use cases in vertical industries.

Resources
- Phi-4 models on Hugging Face: https://huggingface.co/collections/microsoft/phi-4-677e9380e514feb5577a40e4
- Phi-4-mini on Ollama: https://ollama.com/library/phi4-mini
- Learn Function Calling: https://huggingface.co/docs/hugs/en/guides/function-calling
- Phi Cookbook - Samples and Resources for Phi Models: https://aka.ms/phicookbook
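To connect the template back to application code, here is a rough sketch of the application side of Function Calling: take the raw text the model returns (which, per the adjusted template, is prefixed with functools when a tool call is requested), extract the call list, and dispatch to local Python functions. The example output string and the handler are illustrative assumptions; in a real app you would also return each result to the model as a tools-role message, which the template renders as {"result": ...}.

```python
# Parse a functools[...] response from Phi-4-mini and execute the requested calls.
# The raw string below is a hand-written example of the format the adjusted
# template specifies; the handler is a placeholder for a real API call.
import json

def get_match_result(match: str) -> str:
    # Placeholder implementation; a real app would call a sports API here.
    return f"{match}: 2-2"

HANDLERS = {"get_match_result": get_match_result}

def dispatch_function_calls(model_output: str):
    """Parse a functools[...] response and execute each requested function."""
    marker = "functools"
    text = model_output.strip()
    if not text.startswith(marker):
        return text  # plain text answer, no tool call requested
    calls = json.loads(text[len(marker):])
    results = []
    for call in calls:
        handler = HANDLERS[call["name"]]
        results.append(handler(**call["arguments"]))
    return results

raw = 'functools[{"name": "get_match_result", "arguments": {"match": "Arsenal vs ManCity"}}]'
print(dispatch_function_calls(raw))  # ['Arsenal vs ManCity: 2-2']
```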
AI Toolkit for VS Code January Update

AI Toolkit is a VS Code extension that aims to empower AI engineers to turn their curiosity into advanced generative AI applications. The toolkit, featuring both local-enabled and cloud-accelerated inner-loop capabilities, is designed to ease model exploration, prompt engineering, and the creation and evaluation of generative applications. We are pleased to announce the January update to the toolkit, with support for OpenAI's o1 model and enhancements to the Model Playground and Bulk Run features.

What's New?
January's update brings several exciting new features to boost your productivity in AI development. Here's a closer look at what's included:
- Support for OpenAI's new o1 model: We've added access to the GitHub-hosted OpenAI o1 model. This new model replaces o1-preview and offers even better performance on complex tasks. You can start interacting with the o1 model within VS Code for free by using the latest AI Toolkit update.
- Chat history support in Model Playground: We've heard your feedback that tracking past model interactions is crucial. The Model Playground now supports chat history, saved as individual files stored entirely on your local machine to ensure privacy and security.
- Bulk Run with prompt templating: The Bulk Run feature, introduced in the AI Toolkit December release, now supports prompt templating with variables. You can create prompt templates, insert variables, and run them in bulk, which simplifies testing multiple scenarios and models.

Stay tuned for more updates and enhancements as we continue to innovate and support your journey in AI development. Try out the AI Toolkit for Visual Studio Code, share your thoughts, and file issues and suggest features in our GitHub repo. Thank you for being a part of this journey with us!