ollama
Bring your own models on AI Toolkit - using Ollama and API keys
As we have seen in past blog posts, AI Toolkit supports a range of models through the GitHub Marketplace of models. However, you might need models hosted by Google, Anthropic, or OpenAI that are not available in the GitHub catalog, or you might want to use models served by Ollama. We will cover both of these scenarios in this blog post.

OpenAI, Anthropic, and Google hosted models

Open the Model Catalog window and filter for the models hosted by Google, Anthropic, and OpenAI; you should see them listed as shown below. You can add your API keys to these models in the following way. On the model card, click "Try in playground" just below the model name, and a dialog box appears in the top search bar of the VS Code window. Here I have clicked the "Try it in playground" link for the Anthropic Claude 3.5 Sonnet model. Enter your API key and you are good to go. As the dropdown text indicates, you can also edit or change the value later. You can perform the same steps for the Google and OpenAI hosted models. Once you have done this, you are free to use these models in the playground and with the other features of the AI Toolkit extension.

Using models served by Ollama

Many developers also use Ollama to experiment with models from the command line. Ollama is an open-source AI tool that lets users run large language models (LLMs) on their local systems. It is valuable for industries that require data privacy, such as healthcare, finance, and government, which may need locally hosted models. AI Toolkit already supports some locally downloadable models, such as those in Microsoft's Phi series or those by Mistral, but Ollama supports a wider variety of models, especially Meta's Llama series of LLMs and SLMs. The complete list of models currently supported by Ollama can be found in the Ollama library.

We will run Ollama on Windows; running the ollama help command lists the available commands. Once you have selected a model from the library, you can use ollama pull or ollama run to download it. The run command downloads the model (if it is not already downloaded) and then runs it, while the pull command only downloads the model from the repository. Since I want to show you a multimodal model that can be run locally, I will go to the command line and download and run the llama3.2-vision model. The commands for this are shown below.

We can also list the models already downloaded. As you can see, I have downloaded and tried a number of models. Downloading can take a while depending on the speed of your internet connection, and some of the models are quite large. You also need to make sure you have enough RAM on your laptop or desktop to run them.

Now that we have downloaded and run the models, let's see how to access them in AI Toolkit for VS Code. Go to the My Models window and click the '+' symbol, as seen in the screenshot. A dropdown appears in the search bar; click "Add an Ollama model". You will then see the choice to either select a model from the Ollama library or add a custom Ollama endpoint. For the purposes of this tutorial, we will select models from the library. Let's select the multimodal model we downloaded earlier; we should see the models we saw with the ollama list command, as shown below.
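For reference, the command-line steps described above, downloading the model and listing what is installed, might look like this in a terminal (the model IDs, sizes, and dates shown in the list output are illustrative placeholders, not taken from the original screenshots):

```shell
# Download the multimodal model without starting it
ollama pull llama3.2-vision

# Or download it (if needed) and start an interactive chat session
ollama run llama3.2-vision

# Show the models already downloaded locally
ollama list
# NAME                      ID            SIZE      MODIFIED
# llama3.2-vision:latest    <model-id>    ~8 GB     2 days ago
# phi3:latest               <model-id>    ~2 GB     3 weeks ago
```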
Now select the checkbox alongside llama3.2-vision:latest and click OK. The model should appear in the My Models window, as shown below. You can right-click it and load it in the playground to start using it. Since this is a multimodal model, you can use it to generate text as shown below; the screenshot shows the model loaded via Ollama, and you can see that the clip (attachment) symbol is activated in the window.

Because this is a multimodal model, we can give it an image and ask it questions, which is what we will do next. Let's attach an image and ask questions about it. This can take a bit more time, since the model needs to analyze the image before answering. Because it is a generative AI model, it can give slightly different outputs when given the same image. I can also ask the model questions about the image, and it will use the information from the image (the objects shown and the relationships between them) together with the world knowledge from its training to answer. For example, see the session below.

As you can see, AI Toolkit is a fantastic place to try out different models from various sources. Ollama is also a great tool for trying pre-built models locally and securely without sending your data to the cloud, which makes it suitable for air-gapped environments and data-privacy-sensitive industries such as fintech, healthcare, and government. You have greater control over the models, the environment, and the data they run on. It is also possible to customize models and then serve them via Ollama. All of this helps you choose the best model for your AI application.

Resources
AI Toolkit for VS Code - https://marketplace.visualstudio.com/items?itemName=ms-windows-ai-studio.windows-ai-studio
AI Toolkit for VS Code on GitHub - https://github.com/microsoft/vscode-ai-toolkit
Ollama - https://github.com/ollama/ollama
Ollama library - https://ollama.com/library
Azure AI Discord - https://aka.ms/AzureAI/Discord
AI Toolkit for Visual Studio Code: October 2024 Update Highlights

The AI Toolkit's October 2024 update brings major new features to Visual Studio Code for developers, researchers, and enthusiasts. Explore multi-model integration, including GitHub Models, ONNX, and Google Gemini, alongside custom model support. Dive into multi-modal capabilities for richer AI testing and seamless multi-platform compatibility across Windows, macOS, and Linux. Tailored for productivity, the enhanced Model Catalog simplifies choosing the best tools for your projects. Try it now and share feedback to shape the future of AI in VS Code!
Getting Started - Generative AI with Phi-3-mini: A Guide to Inference and Deployment

Getting started with Microsoft Phi-3-mini and running inference with Phi-3-mini models. Discover how Phi-3-mini, a new series of models from Microsoft, enables deployment of large language models (LLMs) on edge devices and IoT devices. Learn how to use Semantic Kernel, Ollama/LlamaEdge, and ONNX Runtime to access and run inference with Phi-3-mini models, and explore the possibilities of generative AI in various application scenarios.
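As a quick taste of the Ollama route mentioned above, you might try a Phi-3-mini model locally like this (a minimal sketch assuming the phi3 tag published in the Ollama library; check the library for the exact tag and variant you want):

```shell
# Download (if needed) and chat with Phi-3-mini locally via Ollama
ollama run phi3:mini
```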
Build AI Agents with MCP Tool Use in Minutes with AI Toolkit for VSCode

We're excited to announce Agent Builder, the newest evolution of what was formerly known as Prompt Builder, now reimagined and supercharged for intelligent app development. This powerful tool in AI Toolkit enables you to create, iterate, and optimize agents, from prompt engineering to tool integration, all in one seamless workflow. Whether you're designing simple chat interactions or complex task-performing agents with tool access, Agent Builder simplifies the journey from idea to integration.

Why Agent Builder?

Agent Builder is designed to empower developers and prompt engineers to:
🚀 Generate starter prompts with natural language
🔁 Iterate and refine prompts based on model responses
🧩 Break down tasks with prompt chaining and structured outputs
🧪 Test integrations with real-time runs and tool use such as MCP servers
💻 Generate production-ready code for rapid app development

And a lot more features are coming soon; stay tuned for:
📝 Use variables in prompts
Run your agent with test cases to test it easily
📊 Evaluate the accuracy and performance of your agent with built-in or your custom metrics
☁️ Deploy your agent to the cloud

Build Smart Agents with Tool Use (MCP Servers)

Agents can now connect to external tools through MCP (Model Context Protocol) servers, enabling them to perform real-world actions like querying a database, accessing APIs, or executing custom logic.

Connect to an Existing MCP Server

To use an existing MCP server in Agent Builder:
1. In the Tools section, select + MCP Server.
2. Choose a connection type:
   Command (stdio): run a local command that implements the MCP protocol (an example of such a command is shown after this section)
   HTTP (server-sent events): connect to a remote server implementing the MCP protocol
3. If the MCP server supports multiple tools, select the specific tool you want to use.
4. Enter your prompts and click Run to test the agent's interaction with the tool.
This integration allows your agents to fetch live data or trigger custom backend services as part of the conversation flow.

Build and Scaffold a New MCP Server

Want to create your own tool? Agent Builder helps you scaffold a new MCP server project:
1. In the Tools section, select + MCP Server.
2. Choose MCP server project.
3. Select your preferred programming language: Python or TypeScript.
4. Pick a folder to create your server project.
5. Name your project and click Create.
Agent Builder generates a scaffolded implementation of the MCP protocol that you can extend. Use the built-in VS Code debugger: press F5 or click Debug in Agent Builder, then test with prompts like:

System: You are a weather forecast professional that can tell weather information based on given location.
User: What is the weather in Shanghai?

Agent Builder will automatically connect to your running server and show the response, making it easy to test and refine the tool-agent interaction.

AI Sparks: From Prototype to Production with AI Toolkit

Building AI-powered applications from scratch or infusing intelligence into existing systems? AI Sparks is your go-to webinar series for mastering the AI Toolkit (AITK), from foundational concepts to cutting-edge techniques. In this bi-weekly, hands-on series, we'll cover:
🚀 SLMs & Local Models – Test and deploy AI models and applications efficiently on your own terms: locally, to edge devices, or to the cloud
🔍 Embedding Models & RAG – Supercharge retrieval for smarter applications using existing data
🎨 Multi-Modal AI – Work with images, text, and beyond
🤖 Agentic Frameworks – Build autonomous, decision-making AI systems
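As an example of the Command (stdio) option described earlier, the command you point Agent Builder at is simply any local program that speaks MCP over standard input/output. A minimal sketch using one of the open-source MCP reference servers (this assumes Node.js/npx is installed; the directory path is illustrative and not part of Agent Builder itself):

```shell
# Run the filesystem reference MCP server over stdio,
# exposing tools that list and read files under the given directory
npx -y @modelcontextprotocol/server-filesystem /path/to/your/project
```

Agent Builder, like any MCP client, launches this command and exchanges MCP messages with it over stdio.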
Watch on Demand

Share your feedback

Get started with the latest version, share your feedback, and let us know how these new features help you in your AI development journey. As always, we're here to listen, collaborate, and grow alongside our amazing user community. Thank you for being a part of this journey; let's build the future of AI together! Join our Microsoft Azure AI Foundry Discord channel to continue the discussion 🚀
Building a DeepSeek Extension for GitHub Copilot in VS Code

DeepSeek has been getting a lot of buzz lately, and with a little setup, you can start using it today in GitHub Copilot within VS Code. In this post, I'll walk you through how to install and run a VS Code extension I built, so you can take advantage of DeepSeek right on your machine. With this extension, you can use "@deepseek" to explore the deepseek-coder model. It's powered by Ollama, enabling seamless, fully offline interactions with DeepSeek models, giving you a local coding assistant that prioritizes privacy and performance. In a future post I'll walk you through the extension code and explain how to call models hosted locally using Ollama. Feel free to subscribe to get notified.

Features and Benefits

Open-Source and Extendable: As an open-source project, the DeepSeek for GitHub Copilot extension is fully customizable. Advanced users can modify and extend its functionality, build from source, tweak configurations, and even integrate additional AI capabilities.

Local AI Processing: With the DeepSeek for GitHub Copilot extension, all interactions are processed locally on your machine, ensuring complete data privacy and eliminating latency issues. This makes it an ideal solution for developers working on sensitive projects or in restricted environments.

Seamless Integration with GitHub Copilot Chat: The extension integrates natively with GitHub Copilot Chat, allowing you to invoke DeepSeek models effortlessly. If you're already familiar with GitHub Copilot, you'll find the workflow intuitive and easy to use. Simply type "@deepseek" followed by your question to get started.

Powered by Ollama: Ollama, a lightweight AI model runtime, powers the execution of DeepSeek models. It simplifies model management by handling downloads and execution, so you can focus on coding.

Customizable Model Selection: You can configure the extension to use different DeepSeek models through a simple setting adjustment. This flexibility allows you to choose the right model size and capability for your hardware. Please note that bigger models might not run on your local system; you can take advantage of Azure's infrastructure to run them instead.

Installation Guide

Installing and Running Ollama: DeepSeek for GitHub Copilot requires Ollama to function properly. Ollama is an AI model runtime that allows you to run and manage large language models efficiently on your local machine. Download the installer, then install and start Ollama from the Ollama website.

Install from the Visual Studio Code Marketplace: The simplest way to get started is by installing the extension directly from the Visual Studio Code Marketplace.
1. Open Visual Studio Code.
2. Navigate to the Extensions panel (Ctrl + Shift + X).
3. Search for DeepSeek for GitHub Copilot and click Install.

Using the Extension: Once installed, using the extension is straightforward:
1. Open the GitHub Copilot Chat panel.
2. Type @deepseek followed by your prompt to interact with the model.
Note: On the first run, the extension will automatically download the DeepSeek model. This may take a few minutes, depending on your internet connection.

Configuration and Customization

DeepSeek for GitHub Copilot allows users to configure the AI model through Visual Studio Code settings. To change the DeepSeek model, update the settings.json file:

{
  "deepseek.model.name": "deepseek-coder:1.3b"
}

A list of available models can be found on the Ollama website.
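If you prefer, you can also download the model ahead of time with the Ollama CLI rather than waiting for the extension's first-run download (a minimal sketch; the tag shown matches the default deepseek.model.name setting above):

```shell
# Optional: pre-download the default DeepSeek model used by the extension
ollama pull deepseek-coder:1.3b

# Confirm it is available locally
ollama list
```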
Limitations and Workarounds

Current Limitations
- The extension does not have access to your files in this version, so it cannot provide context-aware completions. This is because DeepSeek models don't support function calling.
- Performance is limited by your local machine; larger models may require more RAM and CPU power.

Workarounds
- To provide context for completions, manually copy and paste the relevant code into the chat.
- Optimize performance by selecting smaller DeepSeek models (such as deepseek-coder:1.3b) if you experience lag.

System Requirements

To run DeepSeek for GitHub Copilot Chat, ensure you have the following:
- Visual Studio Code (latest version recommended)
- Ollama app installed and running (download from ollama.com)
- Sufficient system resources
  - Minimum: 8 GB RAM, multi-core CPU
  - Recommended: 16 GB RAM, GPU acceleration (if available)

Conclusion

The DeepSeek for GitHub Copilot Chat extension provides an excellent way to get privacy, low-latency responses, and offline capabilities. 🔗 Get Started Today: Install the DeepSeek for GitHub Copilot Chat extension and supercharge your GitHub Copilot Chat experience with AI, entirely offline! 🚀

■ Co-authored with Copilot