Getting Started - Generative AI with Phi-3-mini: A Guide to Inference and Deployment
Getting started with Microsoft Phi-3-mini: inference with Phi-3-mini models. Discover how Phi-3-mini, part of Microsoft's new Phi-3 family of models, enables deployment of large language models (LLMs) on edge and IoT devices. Learn how to use Semantic Kernel, Ollama/LlamaEdge, and ONNX Runtime to access and run inference on Phi-3-mini models, and explore the possibilities of generative AI in various application scenarios.

Step-by-step: Integrate Ollama Web UI to use Azure OpenAI API with LiteLLM Proxy
Introduction
Ollama WebUI is a streamlined interface for deploying and interacting with open-source large language models (LLMs) like Llama 3 and Mistral. It lets users manage models, test them in a ChatGPT-like chat environment, and integrate them into applications through Ollama’s local API. It works well for self-hosted models on platforms like Azure VMs, but it does not natively support Azure OpenAI API endpoints. Within Ollama WebUI, OpenAI’s proprietary models (e.g., GPT-4) are reachable only through OpenAI’s own managed API. Tools like LiteLLM bridge this gap, allowing developers to combine Ollama-hosted models with OpenAI models in hybrid workflows while maintaining compliance and cost-efficiency. This setup lets users leverage both self-managed open-source models and cloud-based AI services.

Problem Statement
As of February 2025, Ollama WebUI still does not support the Azure OpenAI API. It supports only the self-hosted Ollama API and the managed OpenAI API service (PaaS). This is a problem for users who want to use OpenAI models they have already deployed on Azure AI Foundry.

Objective
Integrate the Azure OpenAI API with Ollama WebUI via the LiteLLM proxy. LiteLLM exposes an OpenAI-compatible endpoint to Ollama WebUI and translates its OpenAI-style requests into Azure OpenAI API calls, so users can work with OpenAI models deployed on Azure AI Foundry.

If you haven’t hosted Ollama WebUI yet, follow my other step-by-step guide to host Ollama WebUI on Azure. If you already have Ollama WebUI deployed, proceed to the next step.

Step 1: Deploy OpenAI models on Azure AI Foundry
If you haven’t created an Azure AI Hub yet, search for Azure AI Foundry in the Azure portal and click the “+ Create” button > Hub. Fill out all the fields with the appropriate configuration and click “Create”. After the Azure AI Hub is successfully deployed, click the deployed resource and launch the Azure AI Foundry service.
To deploy new models on Azure AI Foundry, find the “Models + Endpoints” section on the left-hand side and click the “+ Deploy Model” button > “Deploy base model”. A popup will appear where you can choose which models to deploy. Note that the o-series models are currently available only to select customers. You can request access by completing this request access form and waiting for Microsoft to approve the request. Click “Confirm” and another popup will appear. Name the deployment and click “Deploy”. Wait a few moments for the model to deploy. Once it is deployed, save the “Target URI” and the API key.

Step 2: Deploy LiteLLM Proxy via Docker Container
Before pulling the LiteLLM image into the host environment, create a file named “litellm_config.yaml” and list the models you deployed on Azure AI Foundry, along with their API endpoints and keys. Replace "API_Endpoint" and "API_Key" with the “Target URI” and “Key” from Azure AI Foundry, respectively.

Template for the “litellm_config.yaml” file:

model_list:
  - model_name: [model_name]
    litellm_params:
      model: azure/[model_name_on_azure]
      api_base: "[API_ENDPOINT/Target_URI]"
      api_key: "[API_Key]"
      api_version: "[API_Version]"

Tip: You can find the API version at the end of the Target URI of the model's endpoint. Sample endpoint: https://example.openai.azure.com/openai/deployments/o1-mini/chat/completions?api-version=2024-08-01-preview

Run the Docker command below to start LiteLLM Proxy with the correct settings:

docker run -d \
  -v $(pwd)/litellm_config.yaml:/app/config.yaml \
  -p 4000:4000 \
  --name litellm-proxy-v1 \
  --restart always \
  ghcr.io/berriai/litellm:main-latest \
  --config /app/config.yaml --detailed_debug

Make sure to run the Docker command inside the directory where you created the “litellm_config.yaml” file. LiteLLM Proxy listens for traffic on port 4000.
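The Target URI already contains the deployment name and the API version, so you can derive the config entry from it. Below is a minimal, stdlib-only Python sketch. It builds one model_list entry and a chat request you could send to the running proxy as a sanity check. The helper names are hypothetical and not part of LiteLLM. One assumption differs from the template above: the sketch sets api_base to the resource root (scheme and host), which is the form LiteLLM's Azure examples use. If the full Target URI works in your setup, keep that instead.

```python
import json
import urllib.request
from urllib.parse import parse_qs, urlparse


def parse_target_uri(target_uri: str) -> dict:
    """Extract the resource root, deployment name and api-version from an Azure Target URI."""
    parsed = urlparse(target_uri)
    # The path looks like /openai/deployments/<deployment>/chat/completions
    parts = parsed.path.strip("/").split("/")
    deployment = parts[parts.index("deployments") + 1]
    api_version = parse_qs(parsed.query)["api-version"][0]
    return {
        "api_base": f"{parsed.scheme}://{parsed.netloc}/",
        "deployment": deployment,
        "api_version": api_version,
    }


def render_litellm_config(model_name: str, target_uri: str, api_key: str) -> str:
    """Render a one-model litellm_config.yaml as plain text (no YAML library needed)."""
    info = parse_target_uri(target_uri)
    lines = [
        "model_list:",
        f"  - model_name: {model_name}",
        "    litellm_params:",
        f"      model: azure/{info['deployment']}",
        f"      api_base: \"{info['api_base']}\"",
        f"      api_key: \"{api_key}\"",
        f"      api_version: \"{info['api_version']}\"",
    ]
    return "\n".join(lines) + "\n"


def build_proxy_check(model_name: str, proxy_url: str = "http://127.0.0.1:4000") -> urllib.request.Request:
    """Build an OpenAI-style chat request to the LiteLLM proxy (send it with urllib.request.urlopen)."""
    body = {"model": model_name, "messages": [{"role": "user", "content": "Say hello"}]}
    return urllib.request.Request(
        f"{proxy_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        # The docker command above sets no master key, so any bearer value is assumed to be accepted
        headers={"Content-Type": "application/json", "Authorization": "Bearer anything"},
        method="POST",
    )
```

Write the output of render_litellm_config to litellm_config.yaml before running docker run. Once the container is up, urllib.request.urlopen(build_proxy_check("o1-mini")) should return an OpenAI-style JSON response if the proxy and the Azure deployment are configured correctly.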
Now that LiteLLM Proxy has been deployed on port 4000, let’s change the OpenAI API settings in Ollama WebUI. Navigate to Ollama WebUI’s Admin Panel > Settings > Connections. Under the OpenAI API section, enter http://127.0.0.1:4000 as the API endpoint and enter any value as the key (the key field must not be left empty). Click the “Save” button to apply the changes. Refresh the browser, and the AI models deployed on Azure AI Foundry should appear in Ollama WebUI. Now let’s test chat completion and the Web Search capability using the "o1-mini" model in Ollama WebUI.

Conclusion
Hosting Ollama WebUI on an Azure VM and integrating it with the Azure OpenAI API via LiteLLM is a flexible approach to AI deployment. It combines the cost-efficiency of open-source models with the advanced capabilities of managed cloud services. Ollama WebUI doesn’t natively support Azure OpenAI endpoints, but this hybrid architecture lets IT teams balance data privacy (via self-hosted open-source models) and cutting-edge performance (via the Azure OpenAI API), all within Azure’s scalable ecosystem. This guide covered every step required to deploy OpenAI models on Azure AI Foundry, set up the required resources, deploy LiteLLM Proxy on your host machine, and configure Ollama WebUI to use Azure OpenAI endpoints. You can then test and refine your AI setup further in the Ollama WebUI interface, with Web Search, text-to-image generation, and more, all in one place.

Extending Semantic Kernel using OllamaSharp for Chat and Text Completion
OllamaSharp is a .NET binding for the Ollama API. Combined with Semantic Kernel, it makes it straightforward for developers to add chat and text completion features to their applications.

SemanticKernel – 📎Chat Service demo running Llama2 LLM locally in Ubuntu
Learn how to run a Llama 2 model locally with Ollama, an open-source tool for running language models on your own machine. Interact with the model from a .NET console app through a Semantic Kernel chat service, and experiment with large language models without relying on external tools or services.

Bring your own models on AI Toolkit - using Ollama and API keys
As we have seen in past blog posts, AI Toolkit supports a range of models from the GitHub Marketplace. However, you might need external models hosted by Google, Anthropic, and OpenAI that are not available in the GitHub catalog, or you might want to use models served by Ollama. We will cover both scenarios in this blog post.

OpenAI, Anthropic and Google Hosted models
Open the Model catalog window and select the models hosted by Google, Anthropic, and OpenAI, and you should see the following models selected. To add your API key to one of these models, click "Try in playground" just below the model name on the model card. A dialog box will appear in the top search bar of the VS Code window. Here I have clicked the "Try in playground" link for the Anthropic Claude 3.5 Sonnet model. Enter your API key and you are good to go. As the dropdown text indicates, you can also edit or change the value later. You can repeat these steps for the Google and OpenAI hosted models. Once you have done this, you can use these models in the playground and with the other features of the AI Toolkit extension.

Using models served by Ollama
Many developers also use Ollama to experiment with models from the command line. Ollama is an open-source tool that lets users run large language models (LLMs) on their local systems. This makes it valuable for industries that require data privacy, such as healthcare, finance, and government, which may need locally hosted models. AI Toolkit already supports some locally downloadable models, such as Microsoft's Phi series and models from Mistral. Ollama supports a wider variety of models, especially Meta's Llama series of LLMs and SLMs. The complete list of models currently supported by Ollama can be found in the Ollama library.
We will run Ollama on Windows. Running ollama with the help command shows the following output. Once you have selected a model from the library, you can download it with ollama pull or ollama run. The run command downloads the model if it is not already present and then runs it. The pull command only downloads the model from the repository. I want to show you a multimodal model that can run locally, so I will go to the command line and download and run the llama3.2-vision model with ollama run llama3.2-vision. You can list the models already downloaded with ollama list. As you can see, I have downloaded and tried a bunch of models. Some models are quite large, so downloads can take a while depending on the speed of your internet connection. You also need enough RAM on your laptop or desktop to run the models.

Now that we have downloaded and run the models, let's see how to access them in AI Toolkit for VS Code. Go to the My Models window and click the '+' symbol. A dropdown will appear in the search bar; click "Add an Ollama model". You can now either select a model from the Ollama library or use a custom Ollama endpoint. For this tutorial, we will select models. Let's select the multimodal model we downloaded earlier. The list shows the same models that the ollama list command showed earlier. Select the checkbox next to llama3.2-vision:latest and click OK. The model should now appear in the My Models window. Right-click it and load it in the playground to start using it. Since this is a multimodal model, you can use it to generate text. In the playground, the model is loaded via Ollama, and the clip (attachment) icon is active in the chat window.
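The models that ollama list shows are also available from Ollama's local REST API (GET /api/tags on the default port 11434). Attaching an image through the clip icon corresponds to sending base64-encoded images alongside the prompt. The Python sketch below assumes a default local Ollama install and shows both tasks: checking that a model is present, and building an /api/generate body with an image attached. The helper names are hypothetical. This is not how AI Toolkit itself is implemented.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default local Ollama address (assumption)


def parse_model_names(tags_body) -> list:
    """Return model names from an /api/tags response body (the same models `ollama list` shows)."""
    return [m["name"] for m in json.loads(tags_body).get("models", [])]


def has_model(names: list, wanted: str) -> bool:
    """A name without an explicit tag refers to the ':latest' tag."""
    if ":" not in wanted:
        wanted += ":latest"
    return wanted in names


def list_local_models() -> list:
    """Query the running Ollama server for downloaded models."""
    with urllib.request.urlopen(f"{OLLAMA_URL}/api/tags") as resp:
        return parse_model_names(resp.read())


def build_vision_request(model: str, question: str, image_bytes: bytes) -> dict:
    """Build an /api/generate body that attaches one image for a multimodal model."""
    return {
        "model": model,
        "prompt": question,
        # Ollama expects images as a list of base64-encoded strings
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }
```

To send the image, POST json.dumps(body) to f"{OLLAMA_URL}/api/generate" and read the "response" field of the reply. Read the image with open(path, "rb").read().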
Since this is a multimodal model, we can give it an image and ask questions about it, which is what we will do next. Let's attach an image and ask the model about it. This may take a bit longer, because the model needs to analyze the image before answering. As a generative AI model, it can give slightly different outputs for the same image. The model answers questions about the image using what the image shows (the objects and the relationships between them) together with the world knowledge from its training. For example, see the session below.

As you can see, AI Toolkit is a fantastic place to try out models from various sources. Ollama is also a great tool for trying pre-built models locally and securely without sending your data to the cloud. This makes it suitable for air-gapped environments and privacy-sensitive industries such as fintech, healthcare, and government. You have greater control over the models, the environment, and the data they run on. You can also customize models and then serve them via Ollama. This helps you choose the best model for your AI application.

Resources
AI Toolkit for VS Code - https://marketplace.visualstudio.com/items?itemName=ms-windows-ai-studio.windows-ai-studio
AI Toolkit for VS Code on GitHub - https://github.com/microsoft/vscode-ai-toolkit
Ollama - https://github.com/ollama/ollama
Ollama library - https://ollama.com/library
Azure AI Discord - https://aka.ms/AzureAI/Discord

Building AI Agents on edge devices using Ollama + Phi-4-mini Function Calling
The new Phi-4-mini and Phi-4-multimodal models now support function calling, which lets them connect with external tools and APIs. Deploying Phi-4-mini and Phi-4-multimodal with function calling on edge devices extends their knowledge and capabilities locally and makes them more effective at executing tasks. This blog focuses on how to use Phi-4-mini's function calling capabilities to build efficient AI agents on edge devices.

What's Function Calling

How it works
First, we need to understand how function calling works:
- Tool Integration: Function calling allows an LLM/SLM to interact with external tools and APIs, such as weather APIs, databases, or other services.
- Function Definition: You define a function (tool) that the LLM/SLM can call, specifying its name, parameters, and expected output.
- LLM Detection: The LLM/SLM analyzes the user's input and determines whether a function call is required and which function to use.
- JSON Output: The LLM/SLM outputs a JSON object containing the name of the function to call and the parameters it requires.
- External Execution: The application executes the function call using the parameters provided by the LLM/SLM.
- Response to LLM: The application returns the function's output to the LLM/SLM, which uses this information to generate a response to the user.
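The JSON Output, External Execution and Response to LLM steps can be made concrete with a minimal application-side sketch in Python. It assumes the model's reply is a single JSON object of the form {"name": ..., "arguments": {...}}. The exact wire format depends on the model and runtime; the Ollama template in this post, for example, uses a functools[...] list. get_match_result is a hypothetical stand-in that returns placeholder data.

```python
import json


def get_match_result(match: str) -> dict:
    """Hypothetical tool; a real agent would call a sports API or database here."""
    return {"match": match, "result": "unknown"}  # placeholder, not real data


# Registry mapping tool names (as the model will emit them) to Python callables
TOOL_REGISTRY = {"get_match_result": get_match_result}


def execute_tool_call(model_output: str) -> dict:
    """External Execution: parse the model's JSON call and run the matching function."""
    call = json.loads(model_output)
    name = call["name"]
    if name not in TOOL_REGISTRY:
        raise ValueError(f"Model requested unknown tool: {name}")
    result = TOOL_REGISTRY[name](**call.get("arguments", {}))
    # Response to LLM: append this message to the conversation and query the model again
    return {"role": "tool", "content": json.dumps(result)}
```

Keeping tools in an explicit registry means the application only ever runs functions it has chosen to expose, even if the model hallucinates a tool name.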
Application scenarios
- Data retrieval: convert natural language queries into API calls to fetch data (e.g., "show my recent orders" triggers a database query)
- Operation execution: convert user requests into specific function calls (e.g., "schedule a meeting" becomes a calendar API call)
- Computational tasks: handle mathematical or logical operations through dedicated functions (e.g., calculating compound interest or running a statistical analysis)
- Data processing: chain multiple function calls together (e.g., get data → parse → transform → store)
- UI/UX integration: trigger interface updates based on user interactions (e.g., updating map markers or displaying charts)

Phi-4-mini / Phi-4-multimodal's Function Calling
Phi-4-mini / Phi-4-multimodal supports single and parallel function calling.

Things to note when calling:
- You need to define the tools in the system message to enable single or parallel function calling.
- To enable parallel function calling, you also need to mention "some tools" in the system prompt (e.g., "You are my travel agent with some tools available.").

Examples follow.

Single Function Calling

import json

tools = [
    {
        "name": "get_match_result",
        "description": "get match result",
        "parameters": {
            "match": {
                "description": "The name of the match",
                "type": "str",
                "default": "Arsenal vs ManCity"
            }
        }
    },
]

messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant",
        "tools": json.dumps(tools),  # pass the tools into the system message using the tools argument
    },
    {
        "role": "user",
        "content": "What is the result of Arsenal vs ManCity today?"
    }
]

Parallel Function Calling

import json

AGENT_TOOLS = {
    "booking_flight": {
        "name": "booking_flight",
        "description": "booking flight",
        "parameters": {
            "departure": {
                "description": "The departure airport code",
                "type": "str",
            },
            "destination": {
                "description": "The destination airport code",
                "type": "str",
            },
            "outbound_date": {
                "description": "The date of the outbound flight",
                "type": "str",
            },
            "return_date": {
                "description": "The date of the return flight",
                "type": "str",
            }
        }
    },
    "booking_hotel": {
        "name": "booking_hotel",
        "description": "booking hotel",
        "parameters": {
            "query": {
                "description": "The name of the city",
                "type": "str",
            },
            "check_in_date": {
                "description": "The date of check in",
                "type": "str",
            },
            "check_out_date": {
                "description": "The date of check out",
                "type": "str",
            }
        }
    },
}

SYSTEM_PROMPT = """
You are my travel agent with some tools available.
"""

messages = [
    {
        "role": "system",
        "content": SYSTEM_PROMPT,
        "tools": json.dumps(AGENT_TOOLS),  # pass the tools into the system message using the tools argument
    },
    {
        "role": "user",
        "content": """I have a business trip from London to New York from March 21 2025 to March 27 2025. Can you help me book a hotel and flight tickets?"""
    }
]

Using Ollama and Phi-4-mini Function Calling to Create AI Agents on Edge Devices
Ollama is a popular free tool for deploying LLMs/SLMs locally and can be used together with AI Toolkit for VS Code. Besides your PC or laptop, it can also be deployed on IoT devices, mobile phones, containers, and more. To use Phi-4-mini on Ollama, you need Ollama 0.5.13 or later. Different quantized versions are available on Ollama, as shown in the figure below. With Ollama, we can deploy Phi-4-mini on the edge and implement AI agents with function calling on limited computing power, so that generative AI can be applied more effectively on the edge.
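The samples above pass tools inside the system message using a simplified schema. Ollama's /api/chat endpoint also accepts a top-level tools list in the JSON Schema function format ({"type": "function", "function": {...}}). The Python sketch below converts one simplified entry into that shape. Two rules in it are assumptions: the type mapping, and treating parameters without a default as required. The conversion alone does not address the template issue discussed in this post.

```python
# Map the simplified type names used in the samples to JSON Schema types (assumed mapping)
TYPE_MAP = {"str": "string", "int": "integer", "float": "number", "bool": "boolean"}


def to_ollama_tool(spec: dict) -> dict:
    """Convert one simplified tool definition into a JSON Schema function tool."""
    properties, required = {}, []
    for name, param in spec.get("parameters", {}).items():
        properties[name] = {
            "type": TYPE_MAP.get(param.get("type", "str"), "string"),
            "description": param.get("description", ""),
        }
        # Assumption: parameters without a default value are required
        if "default" not in param:
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": spec["name"],
            "description": spec.get("description", ""),
            "parameters": {"type": "object", "properties": properties, "required": required},
        },
    }
```

For the parallel sample, [to_ollama_tool(s) for s in AGENT_TOOLS.values()] would produce the full list.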
Current Issues
A sad experience: if you call Ollama directly in the way shown above, function calling is not triggered. This is discussed in an Ollama GitHub issue: https://github.com/ollama/ollama/issues/9437. Modifying the Phi-4-mini template in the Modelfile enables single function calling, but parallel function calling still fails.

Resolution
We implemented a fix by adjusting the template. We improved it based on Phi-4-mini's chat template and re-modified the Modelfile. Note that the choice of quantized model has a large impact on the results. The adjusted template is as follows (in the Modelfile, it follows a FROM line naming the base model, e.g. FROM phi4-mini:3.8b-fp16):

TEMPLATE """
{{- if .Messages }}
{{- if or .System .Tools }}<|system|>

{{ if .System }}{{ .System }}
{{- end }}
In addition to plain text responses, you can choose to call one or more of the provided functions.

Use the following rule to decide when to call a function:
  * if the response can be generated from your internal knowledge (e.g., as in the case of queries like "What is the capital of Poland?"), do so
  * if you need external information that can be obtained by calling one or more of the provided functions, generate function calls

If you decide to call functions:
  * prefix function calls with functools marker (no closing marker required)
  * all function calls should be generated in a single JSON list formatted as functools[{"name": [function name], "arguments": [function arguments as JSON]}, ...]
  * follow the provided JSON schema. Do not hallucinate arguments or values. Do not blindly copy values from the provided samples
  * respect the argument type formatting. E.g., if the type is number and format is float, write value 7 as 7.0
  * make sure you pick the right functions that match the user intent

Available functions as JSON spec:
{{- if .Tools }}
{{ .Tools }}
{{- end }}<|end|>
{{- end }}
{{- range .Messages }}
{{- if ne .Role "system" }}<|{{ .Role }}|>
{{- if and .Content (eq .Role "tools") }}

{"result": {{ .Content }}}
{{- else if .Content }}

{{ .Content }}
{{- else if .ToolCalls }}

functools[
{{- range .ToolCalls }}{{ "{" }}"name": "{{ .Function.Name }}", "arguments": {{ .Function.Arguments }}{{ "}" }}
{{- end }}]
{{- end }}<|end|>
{{- end }}
{{- end }}<|assistant|>

{{ else }}
{{- if .System }}<|system|>

{{ .System }}<|end|>{{ end }}{{ if .Prompt }}<|user|>

{{ .Prompt }}<|end|>{{ end }}<|assistant|>

{{ end }}{{ .Response }}{{ if .Response }}<|user|>{{ end }}
"""

We tested the solution with different quantized models. In a laptop environment, we recommend phi4-mini:3.8b-fp16 for single and parallel function calling. Note: you need to bind the adjusted Modelfile to phi4-mini:3.8b-fp16 for this to work. Run the following commands on the command line:

# If you haven't downloaded it yet, run this command first
ollama run phi4-mini:3.8b-fp16

# Bind the model with the adjusted Modelfile
ollama create phi4-mini:3.8b-fp16 -f {Your Modelfile Path}

With this in place, you can test both single and parallel function calling with Phi-4-mini. A full sample is available as a notebook.

The above example is just a simple introduction. As development moves forward, we hope to find simpler ways to apply function calling on the edge, use it to expand the scenarios for Phi-4-mini / Phi-4-multimodal, and develop more use cases in vertical industries.
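With the adjusted template, the model signals function calls with the functools marker followed by a single JSON list. A minimal Python sketch for extracting those calls on the application side follows. It assumes the list appears immediately after the marker, with no space in between. For a plain-text answer it returns an empty list.

```python
import json

MARKER = "functools"


def parse_functools(reply: str) -> list:
    """Return [{"name": ..., "arguments": {...}}, ...] from a functools[...] reply, or [] if none."""
    start = reply.find(MARKER + "[")
    if start == -1:
        return []  # plain-text answer, no function call requested
    # raw_decode parses the JSON list and ignores any trailing text after it
    calls, _ = json.JSONDecoder().raw_decode(reply[start + len(MARKER):])
    return calls
```

Each parsed call can then be dispatched to the matching Python function. The results go back to the model as tool messages, one per call for parallel calling.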
Resources
Phi-4 model on Hugging Face - https://huggingface.co/collections/microsoft/phi-4-677e9380e514feb5577a40e4
Phi-4-mini on Ollama - https://ollama.com/library/phi4-mini
Learn Function Calling - https://huggingface.co/docs/hugs/en/guides/function-calling
Phi Cookbook - Samples and Resources for Phi Models - https://aka.ms/phicookbook

Building a DeepSeek Extension for GitHub Copilot in VS Code
DeepSeek has been getting a lot of buzz lately, and with a little setup, you can start using it today in GitHub Copilot within VS Code. In this post, I’ll walk you through how to install and run a VS Code extension I built, so you can take advantage of DeepSeek right on your machine. With this extension, you can use “@deepseek” to explore the deepseek-coder model. It’s powered by Ollama, which enables fully offline interactions with DeepSeek models and gives you a local coding assistant that prioritizes privacy and performance. In a future post I'll walk you through the extension code and explain how to call models hosted locally using Ollama. Feel free to subscribe to get notified.

Features and Benefits

Open-Source and Extendable
As an open-source project, the DeepSeek for GitHub Copilot extension is fully customizable. Advanced users can modify and extend its functionality, build from source, tweak configurations, and even integrate additional AI capabilities.

Local AI Processing
With the DeepSeek for GitHub Copilot extension, all interactions are processed locally on your machine. Your data stays private and there are no network round-trips to a cloud service. This makes it an ideal solution for developers working on sensitive projects or in restricted environments.

Seamless Integration with GitHub Copilot Chat
The extension integrates natively with GitHub Copilot Chat, so you can invoke DeepSeek models effortlessly. If you're already familiar with GitHub Copilot, you'll find the workflow intuitive and easy to use. Simply type "@deepseek" followed by your question to get started.

Powered by Ollama
Ollama, a lightweight AI model runtime, runs the DeepSeek models. It simplifies model management by handling downloads and execution, so you can focus on coding.

Customizable Model Selection
You can configure the extension to use different DeepSeek models through a simple setting adjustment.
This flexibility allows you to choose the right model size and capability for your hardware. Note that larger models might not run on your local system; you can take advantage of Azure's infrastructure to run them.

Installation Guide

Installing and Running Ollama
DeepSeek for GitHub Copilot requires Ollama. Ollama is an AI model runtime that lets you run and manage large language models efficiently on your local machine.
Install Ollama: Download the installer from the Ollama website, install it, and start Ollama.

Install from Visual Studio Code Marketplace
The simplest way to get started is to install the extension directly from the Visual Studio Code Marketplace:
1. Open Visual Studio Code.
2. Navigate to the Extensions panel (Ctrl + Shift + X).
3. Search for DeepSeek for GitHub Copilot and click Install.

Using the Extension
Once installed, using the extension is straightforward:
1. Open the GitHub Copilot Chat panel.
2. Type @deepseek followed by your prompt to interact with the model.
Note: On the first run, the extension automatically downloads the DeepSeek model. This may take a few minutes, depending on your internet connection.

Configuration and Customization
DeepSeek for GitHub Copilot lets you configure the AI model through Visual Studio Code settings. To change the DeepSeek model, update the settings.json file:

{
  "deepseek.model.name": "deepseek-coder:1.3b"
}

A list of available models can be found on the Ollama website.

Limitations and Workarounds

Current Limitations
In this version, the extension does not have access to your files, so it cannot provide context-aware completions. This is because DeepSeek models don't support function calling. Performance is also limited by your local machine; larger models may require more RAM and CPU power.

Workarounds
To provide context for completions, manually copy and paste the relevant code into the chat.
Optimize performance by selecting smaller DeepSeek models (such as deepseek-coder:1.3b) if you experience lag.

System Requirements
To run DeepSeek for GitHub Copilot Chat, ensure you have the following:
- Visual Studio Code (latest version recommended)
- Ollama app installed and running (download from ollama.com)
- Sufficient system resources:
  - Minimum: 8GB RAM, multi-core CPU
  - Recommended: 16GB RAM, GPU acceleration (if available)

Conclusion
The DeepSeek for GitHub Copilot Chat extension delivers privacy, low-latency responses, and offline capabilities.

🔗 Get Started Today: Install the DeepSeek for GitHub Copilot Chat extension and supercharge your GitHub Copilot Chat experience with AI, entirely offline! 🚀

Co-authored with Copilot

Build AI Agents with MCP Tool Use in Minutes with AI Toolkit for VSCode
We’re excited to announce Agent Builder, the newest evolution of what was formerly known as Prompt Builder, now reimagined and supercharged for intelligent app development. This tool in AI Toolkit lets you create, iterate on, and optimize agents, from prompt engineering to tool integration, all in one workflow. Whether you're designing simple chat interactions or complex task-performing agents with tool access, Agent Builder simplifies the journey from idea to integration.

Why Agent Builder?
Agent Builder is designed to empower developers and prompt engineers to:
🚀 Generate starter prompts with natural language
🔁 Iterate and refine prompts based on model responses
🧩 Break down tasks with prompt chaining and structured outputs
🧪 Test integrations with real-time runs and tool use such as MCP servers
💻 Generate production-ready code for rapid app development

More features are coming soon. Stay tuned for:
📝 Use variables in prompts
Run agent with test cases to test your agent easily
📊 Evaluate the accuracy and performance of your agent with built-in or custom metrics
☁️ Deploy your agent to the cloud

Build Smart Agents with Tool Use (MCP Servers)
Agents can now connect to external tools through MCP (Model Context Protocol) servers. This lets them perform real-world actions like querying a database, accessing APIs, or executing custom logic.

Connect to an Existing MCP Server
To use an existing MCP server in Agent Builder:
1. In the Tools section, select + MCP Server.
2. Choose a connection type:
   - Command (stdio): run a local command that implements the MCP protocol
   - HTTP (server-sent events): connect to a remote server implementing the MCP protocol
3. If the MCP server supports multiple tools, select the specific tool you want to use.
4. Enter your prompts and click Run to test the agent's interaction with the tool.
This integration allows your agents to fetch live data or trigger custom backend services as part of the conversation flow.
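MCP messages are JSON-RPC 2.0. With the Command (stdio) connection type, each message travels as one line of JSON over the server process's stdin and stdout. The Python sketch below shows a tools/call request from the client side, plus a toy server-side handler that answers tools/list and tools/call for a hypothetical get_weather tool with canned data. It is an illustration only. It skips the initialize handshake that real MCP sessions require. In practice you would use an MCP SDK rather than hand-rolling the protocol.

```python
import itertools
import json

_request_ids = itertools.count(1)


def jsonrpc_line(method: str, params: dict) -> str:
    """Client side: serialize one JSON-RPC 2.0 request as a newline-terminated line."""
    msg = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method, "params": params}
    return json.dumps(msg) + "\n"


# Toy tool registry; get_weather is hypothetical and returns canned text, not real weather
TOOLS = {
    "get_weather": {
        "description": "Get the weather for a location",
        "inputSchema": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
        "handler": lambda args: f"Sunny, 25 C in {args['location']} (canned data)",
    }
}


def handle_line(line: str) -> str:
    """Server side: answer tools/list and tools/call requests (no initialize handshake)."""
    req = json.loads(line)
    if req["method"] == "tools/list":
        result = {"tools": [
            {"name": name, "description": t["description"], "inputSchema": t["inputSchema"]}
            for name, t in TOOLS.items()
        ]}
    elif req["method"] == "tools/call":
        tool = TOOLS[req["params"]["name"]]
        text = tool["handler"](req["params"].get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}]}
    else:
        # -32601 is the standard JSON-RPC "method not found" error code
        error = {"code": -32601, "message": "Method not found"}
        return json.dumps({"jsonrpc": "2.0", "id": req["id"], "error": error}) + "\n"
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result}) + "\n"
```

A real stdio server would loop over sys.stdin, write handle_line(line) to sys.stdout, and flush after each reply.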
Build and Scaffold a New MCP Server
Want to create your own tool? Agent Builder helps you scaffold a new MCP server project:
1. In the Tools section, select + MCP Server.
2. Choose MCP server project.
3. Select your preferred programming language: Python or TypeScript.
4. Pick a folder to create your server project in.
5. Name your project and click Create.
Agent Builder generates a scaffolded implementation of the MCP protocol that you can extend. To use the built-in VS Code debugger, press F5 or click Debug in Agent Builder, then test with prompts like:
System: You are a weather forecast professional that can tell weather information based on given location.
User: What is the weather in Shanghai?
Agent Builder will automatically connect to your running server and show the response, making it easy to test and refine the tool-agent interaction.

AI Sparks from Prototype to Production with AI Toolkit
Building AI-powered applications from scratch or infusing intelligence into existing systems? AI Sparks is your go-to webinar series for mastering the AI Toolkit (AITK), from foundational concepts to cutting-edge techniques. In this bi-weekly, hands-on series, we’ll cover:
🚀 SLMs & Local Models: Test and deploy AI models and applications efficiently on your own terms, whether locally, on edge devices, or in the cloud
🔍 Embedding Models & RAG: Supercharge retrieval for smarter applications using existing data
🎨 Multi-Modal AI: Work with images, text, and beyond
🤖 Agentic Frameworks: Build autonomous, decision-making AI systems

Share your feedback
Get started with the latest version, share your feedback, and let us know how these new features help you in your AI development journey. As always, we’re here to listen, collaborate, and grow alongside our amazing user community. Thank you for being a part of this journey. Let’s build the future of AI together!
Join our Microsoft Azure AI Foundry Discord channel to continue the discussion 🚀

AI Toolkit for Visual Studio Code: October 2024 Update Highlights
The AI Toolkit’s October 2024 update brings new features to Visual Studio Code for developers, researchers, and enthusiasts. Explore multi-model integration, including GitHub Models, ONNX, and Google Gemini, alongside custom model support. Try the multi-modal capabilities for richer AI testing, with support across Windows, macOS, and Linux. An enhanced Model Catalog simplifies choosing the best tools for your projects. Try it now and share feedback to shape the future of AI in VS Code!