slm

20 Topics

Function Calling with Small Language Models
In our previous article on running Phi-4 locally, we built a web-enhanced assistant that could search the internet and provide informed answers. Here's what that implementation looked like: def web_enhanced_query(question): # 1. ALWAYS search (hardcoded decision) search_results = search_web(question) # 2. Inject results into prompt prompt = f"""Here are recent search results: {search_results} Question: {question} Using only the information above, give a clear answer.""" # 3. Model just summarizes what it reads return ask_phi4(endpoint, model_id, prompt) Today, we're upgrading to true function calling. With this, we have ability to transform small language models from passive text generators into intelligent agents that can: Decide when to use external tools Reason which tool bests fit each task Execute real-world actions thrugh apis Function calling represents a significant evolution in AI capabilities. Let's understand where this positions our small language models: Agent Classification Framework Simple Reflex Agents (Basic) React to immediate input with predefined rules Example: Thermostat, basic chatbot Without function calling, models operate here Model-Based Agents (Intermediate) Maintain internal state and context Example: Robot vacuum with room mapping Function calling enables this level Goal-Based Agents (Advanced) Plan multi-step sequences to achieve objectives Example: Route planner, task scheduler Function calling + reasoning enables this Learning Agents (Expert) Adapt and improve over time Example: Recommendation systems Future: Function calling + fine-tuning As usual with these articles, let's get ready to get our hands dirty! Project Setup Let's set up our environment for building function-calling assistants. Prerequisites First, ensure you have Foundry Local installed and a model running. We'll use Qwen 2.5-7B for this tutorial as it has excellent function calling support. Important: Not all small language models support function calling equally. Qwen 2.5 was specifically trained for this capability and provides a reliable experience through Foundry Local. # 1. Check Foundry Local is installed foundry --version # 2. Start the Foundry Local service foundry service start # 3. Download and run Qwen 2.5-7B foundry model run qwen2.5-7b Python Environment Setup # 1. Create Python virtual environment python -m venv venv source venv/bin/activate # Windows: venv\Scripts\activate # 2. Install dependencies pip install openai requests python-dotenv # 3. Get a free OpenWeatherMap API key # Sign up at: https://openweathermap.org/api ``` Create `.env` file: ``` OPENWEATHER_API_KEY=your_api_key_here ``` Building a Weather-Aware Assistant So in this scenario, a user wants to plan outdoor activities but needs weather context. Without function calling, You will get something like this: User: "Should I schedule my team lunch outside at 2pm in Birmingham?" Model: "That depends on weather conditions. Please check the forecast for rain and temperature." However, with fucntion-calling you get an answer that is able to look up the weather and reply with the needed context. We will do that now. Understanding Foundry Local's Function Calling Implementation Before we start coding, there's an important implementation detail to understand. Foundry Local uses a non-standard function calling format. Instead of returning function calls in the standard OpenAI tool_calls field, Qwen models return the function call as JSON text in the response content. For example, when you ask about weather, instead of: # Standard OpenAI format message.tool_calls = [ {"name": "get_weather", "arguments": {"location": "Birmingham"}} ] You get: # Foundry Local format message.content = '{"name": "get_weather", "arguments": {"location": "Birmingham"}}' This means we need to parse the JSON from the content ourselves. Don't worry—this is straightforward, and I'll show you exactly how to handle it! Step 1: Define the Weather Tool Create weather_assistant.py: import os from openai import OpenAI import requests import json import re from dotenv import load_dotenv load_dotenv() # Initialize Foundry Local client client = OpenAI( base_url="http://127.0.0.1:59752/v1/", api_key="not-needed" ) # Define weather tool tools = [ { "type": "function", "function": { "name": "get_weather", "description": "Get current weather information for a location", "parameters": { "type": "object", "properties": { "location": { "type": "string", "description": "The city or location name" }, "units": { "type": "string", "description": "Temperature units", "enum": ["celsius", "fahrenheit"], "default": "celsius" } }, "required": ["location"] } } } ] A tool is necessary because it provides the model with a structured specification of what external functions are available and how to use them. The tool definition contains the function name, description, parameters schema, and information returned. Step 2: Implement the Weather Function def get_weather(location: str, units: str = "celsius") -> dict: """Fetch weather data from OpenWeatherMap API""" api_key = os.getenv("OPENWEATHER_API_KEY") url = "http://api.openweathermap.org/data/2.5/weather" params = { "q": location, "appid": api_key, "units": "metric" if units == "celsius" else "imperial" } response = requests.get(url, params=params, timeout=5) response.raise_for_status() data = response.json() temp_unit = "°C" if units == "celsius" else "°F" return { "location": data["name"], "temperature": f"{round(data['main']['temp'])}{temp_unit}", "feels_like": f"{round(data['main']['feels_like'])}{temp_unit}", "conditions": data["weather"][0]["description"], "humidity": f"{data['main']['humidity']}%", "wind_speed": f"{round(data['wind']['speed'] * 3.6)} km/h" } The model calls this function to get the weather data. it contacts OpenWeatherMap API, gets real weather data and returns it as a python dictionary Step 3: Parse Function Calls from Content This is the crucial step where we handle Foundry Local's non-standard format: def parse_function_call(content: str): """Extract function call JSON from model response""" if not content: return None json_pattern = r'\{"name":\s*"get_weather",\s*"arguments":\s*\{[^}]+\}\}' match = re.search(json_pattern, content) if match: try: return json.loads(match.group()) except json.JSONDecodeError: pass try: parsed = json.loads(content.strip()) if isinstance(parsed, dict) and "name" in parsed: return parsed except json.JSONDecodeError: pass return None Step 4: Main Chat Function with Function Calling and lastly, calling the model. Notice the tools and tool_choice parameter. Tools tells the model it is allowed to output a tool_call requesting that the function be executed. While tool_choice instructs the model how to decide whether to call a tool. def chat(user_message: str) -> str: """Process user message with function calling support""" messages = [ {"role": "user", "content": user_message} ] response = client.chat.completions.create( model="qwen2.5-7b-instruct-generic-cpu:4", messages=messages, tools=tools, tool_choice="auto", temperature=0.3, max_tokens=500 ) message = response.choices[0].message if message.content: function_call = parse_function_call(message.content) if function_call and function_call.get("name") == "get_weather": print(f"\n[Function Call] {function_call.get('name')}({function_call.get('arguments')})") args = function_call.get("arguments", {}) weather_data = get_weather(**args) print(f"[Result] {weather_data}\n") final_prompt = f"""User asked: "{user_message}" Weather data: {json.dumps(weather_data, indent=2)} Provide a natural response based on this weather information.""" final_response = client.chat.completions.create( model="qwen2.5-7b-instruct-generic-cpu:4", messages=[{"role": "user", "content": final_prompt}], max_tokens=200, temperature=0.7 ) return final_response.choices[0].message.content return message.content Step 5: Run the script Now put all the above together and run the script def main(): """Interactive weather assistant""" print("\nWeather Assistant") print("=" * 50) print("Ask about weather or general questions.") print("Type 'exit' to quit\n") while True: user_input = input("You: ").strip() if user_input.lower() in ['exit', 'quit']: print("\nGoodbye!") break if user_input: response = chat(user_input) print(f"Assistant: {response}\n") if __name__ == "__main__": if not os.getenv("OPENWEATHER_API_KEY"): print("Error: OPENWEATHER_API_KEY not set") print("Set it with: export OPENWEATHER_API_KEY='your_key_here'") exit(1) main() Note: Make sure Qwen 2.5 is running in Foundry Local in a new terminal Now let's talk about Model Context Protocol! Our weather assistant works beautifully with a single function, but what happens when you need dozens of tools? Database queries, file operations, calendar integration, email—each would require similar setup code. This is where Model Context Protocol (MCP) comes in. MCP is an open standard that provides pre-built, standardized servers for common tools. Instead of writing custom integration code for every capability, you can connect to MCP servers that handle the complexity for you. With MCP, You only need one command to enable weather, database, and file access npx @modelcontextprotocol/server-weather npx @modelcontextprotocol/server-sqlite npx @modelcontextprotocol/server-filesystem Your model automatically discovers and uses these tools without custom integration code. Learn more: Model Context Protocol Documentation EdgeAI Course - Module 03: MCP Integration Key Takeaways Function calling transforms models into agents - From passive text generators to active problem-solvers Qwen 2.5 has excellent function calling support - Specifically trained for reliable tool use Foundry Local uses non-standard format - Parse JSON from content instead of tool_calls field Start simple, then scale with MCP - Build one tool to understand the pattern, then leverage standards Documentation Running Phi-4 Locally with Foundry Local Phi-4: Small Language Models That Pack a Punch Microsoft Foundry Local GitHub EdgeAI for Beginners Course OpenWeatherMap API Documentation Model Context Protocol Qwen 2.5 Documentation Thank you for reading! I hope this article helps you build more capable AI agents with small language models. Function calling opens up incredible possibilities—from simple weather assistants to complex multi-tool workflows. Start with one tool, understand the pattern, and scale from there.
Abdulhamid_Onawole
Nov 28, 2025 Place Educator Developer Blog
264Views
0likes
0Comments
Running Phi-4 Locally with Microsoft Foundry Local: A Step-by-Step Guide
In our previous post, we explored how Phi-4 represents a new frontier in AI efficiency that delivers performance comparable to models 5x its size while being small enough to run on your laptop. Today, we're taking the next step: getting Phi-4 up and running locally on your machine using Microsoft Foundry Local. Whether you're a developer building AI-powered applications, an educator exploring AI capabilities, or simply curious about running state-of-the-art models without relying on cloud APIs, this guide will walk you through the entire process. Microsoft Foundry Local brings the power of Azure AI Foundry to your local device without requiring an Azure subscription, making local AI development more accessible than ever. So why do you want to run Phi-4 Locally? Before we dive into the setup, let's quickly recap why running models locally matters: Privacy and Control: Your data never leaves your machine. This is crucial for sensitive applications in healthcare, finance, or education where data privacy is paramount. Cost Efficiency: No API costs, no rate limits. Once you have the model downloaded, inference is completely free. Speed and Reliability: No network latency or dependency on external services. Your AI applications work even when you're offline. Learning and Experimentation: Full control over model parameters, prompts, and fine-tuning opportunities without restrictions. With Phi-4's compact size, these benefits are now accessible to anyone with a modern laptop—no expensive GPU required. What You'll Need Before we begin, make sure you have: Operating System: Windows 10/11, macOS (Intel or Apple Silicon), or Linux RAM: Minimum 16GB (32GB recommended for optimal performance) Storage: At least 5 - 10GB of free disk space Processor: Any modern CPU (GPU optional but provides faster inference) Note: Phi-4 works remarkably well even on consumer hardware 😀. Step 1: Installing Microsoft Foundry Local Microsoft Foundry Local is designed to make running AI models locally as simple as possible. It handles model downloads, manages memory efficiently, provides OpenAI-compatible APIs, and automatically optimizes for your hardware. For Windows Users: Open PowerShell or Command Prompt and run: winget install Microsoft.FoundryLocal For macOS Users (Apple Silicon): Open Terminal and run: brew install microsoft/foundrylocal/foundrylocal Verify Installation: Open your terminal and type. This should return the Microsoft Foundry Local version, confirming installation: foundry --version Step 2: Downloading Phi-4-Mini For this tutorial, we'll use Phi-4-mini, the lightweight 3.8 billion parameter version that's perfect for learning and experimentation. Open your terminal and run: foundry model run phi-4-mini You should see your download begin and something similar to the image below Available Phi Models on Foundry Local While we're using phi-4-mini for this guide, Foundry Local offers several Phi model variants and other open-source models optimized for different hardware and use cases: Model Hardware Type Size Best For phi-4-mini GPU chat-completion 3.72 GB Learning, fast responses, resource-constrained environments with GPU phi-4-mini CPU chat-completion 4.80 GB Learning, fast responses, CPU-only systems phi-4-mini-reasoning GPU chat-completion 3.15 GB Reasoning tasks with GPU acceleration phi-4-mini-reasoning CPU chat-completion 4.52 GB Mathematical proofs, logic puzzles with lower resource requirements phi-4 GPU chat-completion 8.37 GB Maximum reasoning performance, complex tasks with GPU phi-4 CPU chat-completion 10.16 GB Maximum reasoning performance, CPU-only systems phi-3.5-mini GPU chat-completion 2.16 GB Most lightweight option with GPU support phi-3.5-mini CPU chat-completion 2.53 GB Most lightweight option, CPU-optimized phi-3-mini-128k GPU chat-completion 2.13 GB Extended context (128k tokens), GPU-optimized phi-3-mini-128k CPU chat-completion 2.54 GB Extended context (128k tokens), CPU-optimized phi-3-mini-4k GPU chat-completion 2.13 GB Standard context (4k tokens), GPU-optimized phi-3-mini-4k CPU chat-completion 2.53 GB Standard context (4k tokens), CPU-optimized Note: Foundry Local automatically selects the best variant for your hardware. If you have an NVIDIA GPU, it will use the GPU-optimized version. Otherwise, it will use the CPU-optimized version. run the command below to see full list of models foundry model list Step 3: Test It Out Once the download completes, an interactive session will begin. Let's test Phi-4-mini's capabilities with a few different prompts: Example 1: Explanation Phi-4-mini provides a thorough, well-structured explanation! It starts with the basic definition, explains the process in biological systems, gives real-world examples (plant cells, human blood cells). The response is detailed yet accessible. Example 2: Mathematical Problem Solving Excellent step-by-step solution! Phi-4-mini breaks down the problem methodically: 1. Distributes on the left side 2. Isolates the variable terms 3. Simplifies progressively 4. Arrives at the final answer: x = 11 The model shows its work clearly, making it easy to follow the logic and ideal for educational purposes Example 3: Code Generation The model provides a concise Python function using string slicing ([::-1]) - the most Pythonic approach to reversing a string. It includes clear documentation with a docstring explaining the function's purpose, provides example usage demonstrating the output, and even explains how the slicing notation works under the hood. The response shows that the model understands not just how to write the code, but why this approach is preferred - noting that the [::-1] slice notation means "start at the end of the string and end at position 0, move with the step -1, negative one, which means one step backwards." This showcases the model's ability to generate production-ready code with proper documentation while being educational about Python idioms. To exit the interactive session, type `/bye` Step 4: Extending Phi-4 with Real-Time Tools Understanding Phi-4's Knowledge Cutoff Like all language models, Phi-4 has a knowledge cutoff date from its training data (typically several months old). This means it won't know about very recent events, current prices, or breaking news. For example, if you ask "Who won the 2024 NBA championship?" it might not have the answer. The good thing is, there's a powerful work-around. While Phi-4 is incredibly capable, connecting it to external tools like web search, databases, or APIs transforms it from a static knowledge base into a dynamic reasoning engine. This is where Microsoft Foundry's REST API comes in. Microsoft Foundry provides a simple API that lets you integrate Phi-4 into Python applications and connect it to real-time data sources. Here's a practical example: building a web-enhanced AI assistant. Web-Enhanced AI Assistant This simple application combines Phi-4's reasoning with real-time web search, allowing it to answer current questions accurately. Prerequisites: pip install foundry-local-sdk requests ddgs Create phi4_web_assistant.py: import requests from foundry_local import FoundryLocalManager from ddgs import DDGS import json def search_web(query): """Search the web and return top results""" try: results = list(DDGS().text(query, max_results=3)) if not results: return "No search results found." search_summary = "\n\n".join([ f"[Source {i+1}] {r['title']}\n{r['body'][:500]}" for i, r in enumerate(results) ]) return search_summary except Exception as e: return f"Search failed: {e}" def ask_phi4(endpoint, model_id, prompt): """Send a prompt to Phi-4 and stream response""" response = requests.post( f"{endpoint}/chat/completions", json={ "model": model_id, "messages": [{"role": "user", "content": prompt}], "stream": True }, stream=True, timeout=180 ) full_response = "" for line in response.iter_lines(): if line: line_text = line.decode('utf-8') if line_text.startswith('data: '): line_text = line_text[6:] # Remove 'data: ' prefix if line_text.strip() == '[DONE]': break try: data = json.loads(line_text) if 'choices' in data and len(data['choices']) > 0: delta = data['choices'][0].get('delta', {}) if 'content' in delta: chunk = delta['content'] print(chunk, end="", flush=True) full_response += chunk except json.JSONDecodeError: continue print() return full_response def web_enhanced_query(question): """Combine web search with Phi-4 reasoning""" # By using an alias, the most suitable model will be downloaded # to your device automatically alias = "phi-4-mini" # Create a FoundryLocalManager instance. This will start the Foundry # Local service if it is not already running and load the specified model. manager = FoundryLocalManager(alias) model_info = manager.get_model_info(alias) print("🔍 Searching the web...\n") search_results = search_web(question) prompt = f"""Here are recent search results: {search_results} Question: {question} Using only the information above, give a clear answer with specific details.""" print("🤖 Phi-4 Answer:\n") return ask_phi4(manager.endpoint, model_info.id, prompt) if __name__ == "__main__": # Try different questions question = "Who won the 2024 NBA championship?" # question = "What is the latest iPhone model released in 2024?" # question = "What is the current price of Bitcoin?" print(f"Question: {question}\n") print("=" * 60 + "\n") web_enhanced_query(question) print("\n" + "=" * 60) Run It: python phi4_web_assistant.py What Makes This Powerful By connecting Phi-4 to external tools, you create an intelligent system that: Accesses Real-Time Information: Get news, weather, sports scores, and breaking developments Verifies Facts: Cross-reference information with multiple sources Extends Capabilities: Connect to databases, APIs, file systems, or any other tool Enables Complex Applications: Build research assistants, customer support bots, educational tutors, and personal assistants This same pattern can be applied to connect Phi-4 to: Databases: Query your company's internal data APIs: Weather services, stock prices, translation services File Systems: Analyze documents and spreadsheets IoT Devices: Control smart home systems The possibilities are endless when you combine local AI reasoning with real-world data access. Troubleshooting Common Issues Service not running: Make sure Foundry Local is properly installed and the service is running. Try restarting with foundry --version to verify installation. Model downloads slowly: Check your internet connection and ensure you have enough disk space (5-10GB per model). Out of memory: Close other applications or try using a smaller model variant like phi-3.5-mini instead of the full phi-4. Connection issues: Verify that no other services are using the same ports. Foundry Local typically runs on http://localhost:5272. Model not found: Run foundry model list to see available models, then use foundry model run <model-name> to download and run a specific model. Your Next Steps with Foundry Local Congratulations! You now have Phi-4 running locally through Microsoft Foundry Local and understand how to extend it with external tools like web search. This combination of local AI reasoning with real-time data access opens up countless possibilities for building intelligent applications. Coming in Future Posts In the coming weeks, we'll explore advanced topics using Hugging Face: Fine-tuning Phi models on your own data for domain-specific applications Phi-4-multimodal: Analyze images, process audio, and combine multiple data types Advanced deployment patterns: RAG systems and multi-agent orchestration Resources to Explore EdgeAI for Beginners Course: Comprehensive 36-45 hour course covering Edge AI fundamentals, optimization, and production deployment Phi-4 Technical Report: Deep dive into architecture and benchmarks Phi Cookbook on GitHub: Practical examples and recipes Foundry Local Documentation: Complete technical documentation and API reference Module 08: Foundry Local Toolkit: 10 comprehensive samples including RAG applications and multi-agent systems Keep experimenting with Foundry Local, and stay tuned as we unlock the full potential of Edge AI! What will you build with Phi-4? Share your ideas and projects in the comments below!
Abdulhamid_Onawole
Nov 06, 2025 Place Educator Developer Blog
933Views
1like
1Comment
Phi-4: Small Language Models That Pack a Punch
What Are Small Language Models, and Why Should You Care? If you've been following AI development, you can probably recall "bigger is better" being the mantra for years. GPT-3.5 was 175 billion parameters, GPT-4 is even larger, and everyone seemed to be in an arms race to build the biggest model possible. But here's the thing: bigger models are expensive to run, slow to respond, and often overkill for what you actually need. Small Language Models (SLMs) flip this script. These are models with fewer parameters (typically 1-15 billion) that are trained really thoughtfully on high-quality data. The outcome of this is models that can run on your laptop, respond instantly, and still handle complex reasoning tasks. You can extrapolate from this, increased speed, privacy, and cost-effectiveness. Microsoft's been exploring this space for a while. It started with Phi-1, which showed that small models trained on carefully curated "textbook-like" data could punch way above their weight class. Then came Phi-2 and Phi-3, each iteration getting better at reasoning and problem-solving. Now we have Phi-4, and it's honestly impressive. At 14 billion parameters, it outperforms models that are 5 times its size on math and reasoning tasks. Microsoft trained it on 9.8 trillion tokens over three weeks, using a mix of synthetic data (generated by larger models like GPT-4o) and high-quality web content. The key innovation isn't just throwing more data at it but they were incredibly selective about what to include, focusing on teaching reasoning patterns rather than memorizing facts. The Phi family has also expanded recently. There's Phi-4-mini at 3.8 billion parameters for even lighter deployments, and Phi-4-multimodal at 5.6 billion parameters that can handle text, images, and audio all at once. Pretty cool if you're building something that needs to understand screenshots or transcribe audio. How Well Does It Actually Perform? Let's talk numbers, because that's where Phi-4 really shines. On MMLU (a broad test of knowledge across 57 subjects), Phi-4 scores 84.8%. That's better than Phi-3's 77.9% and competitive with models like GPT-4o-mini. On MATH (competition-level math problems), it hits 56.1%, which is significantly higher than Phi-3's 42.5%. For code generation on HumanEval, it achieves 82.6%. Model Parameters MMLU MATH HumanEval Phi-3-medium 14B 77.9% 42.5% 62.5% Phi-4 14B 84.8% 56.1% 82.6% Llama 3.3 70B 86.0% ~51% ~73% GPT-4o-mini Unknown ~82% 52.2% 87.2% Microsoft tested Phi-4 on the November 2024 AMC-10 and AMC-12 math competitions. These are tests that over 150,000 high school students take each year, and the questions appeared after all of Phi-4's training data was collected. Phi-4 beat not just similar-sized models, but also much larger ones. That suggests it's actually learned to reason, not just memorize benchmark answers. The model also does well on GPQA (graduate-level science questions) and even outperforms its teacher model GPT-4o on certain reasoning tasks. That's pretty remarkable for a 14 billion parameter model. If you're wondering about practical performance, Phi-4 runs about 2-4x faster than comparable larger models and uses significantly less memory. You can run it on a single GPU or even on newer AI-capable laptops with NPUs. That makes it practical for real-time applications where latency matters. Try Phi-4 Yourself You can start experimenting with Phi-4 right now without any complicated setup. Azure AI Foundry Microsoft's Azure AI Foundry is probably the quickest way to get started. Once you're logged in: Go to the Model Catalog and search for "Phi-4" Click "Use this Model" Select an active subscription in the subsequent pop-up and confirm Deploy and start chatting or testing prompts The playground lets you adjust parameters like temperature and see how the model responds. You can test it on math problems, coding questions, or reasoning tasks without writing any code. There's also a code view that shows you how to integrate it into your own applications. Hugging Face (for open-source enthusiasts) If you prefer to work with open-source tools, the model weights are available on Hugging Face. You can run it locally or use their hosted inference API: # Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="microsoft/phi-4") messages = [ {"role": "user", "content": "What's the derivative of x²?"}, ] pipe(messages) Other Options The Phi Cookbook on GitHub has tons of examples for different use cases like RAG (retrieval-augmented generation), function calling, and multimodal inputs. If you want to run it locally with minimal setup, you can use Ollama (ollama pull phi-4) or LM Studio, which provides a nice GUI. The Azure AI Foundry Labs also has experimental features where you can test Phi-4-multimodal with audio and image inputs. What's Next? Phi-4 is surprisingly capable for its size, and it's practical enough to run almost anywhere. Whether you're building a chatbot, working on educational software, or just experimenting with AI, it's worth checking out. We might explore local deployment in more detail later, including how to build multi-agent systems where several SLMs work together, and maybe even look at fine-tuning Phi-4 for specific tasks. But for now, give it a try and see what you can build with it. The model weights are MIT licensed, so you're free to use them commercially. Microsoft's made it pretty easy to get started, so there's really no reason not to experiment. Resources: Azure AI Foundry Phi-4 on Hugging Face Phi Cookbook Phi-4 Technical Report
Abdulhamid_Onawole
Oct 27, 2025 Place Educator Developer Blog
509Views
1like
0Comments
Bringing AI to the edge: Hackathon Windows ML
AI Developer Hackathon Windows ML Hosted by Qualcomm on SnapDragonX We’re excited to announce our support and participation for the upcoming global series of Edge AI hackathons, hosted by Qualcomm Technologies. The first is on June 14-15 in Bangalore. We see a world of hybrid AI, developing rapidly as new generation of intelligent applications get built for diverse scenarios. These range from mobile, desktop, spatial computing and extending all the way to industrial and automotive. Mission critical workloads oscillate between decision-making in the moment, on device, to fine tuning models on the cloud. We believe we are in the early stages of development of agentic applications that efficiently run on the edge for scenarios needing local deployment and on-device inferencing. Microsoft Windows ML Windows ML – a cutting-edge runtime optimized for performant on-device model inference and simplified deployment, and the foundation of Windows AI Foundry. Windows ML is designed to support developers creating AI-infused applications with ease, harnessing the incredible strength of Windows’ diverse hardware ecosystem whether it’s for entry-level laptops, Copilot+ PCs or top-of-the-line AI workstations. It’s built to help developers leverage the client silicon best suited for their specific workload on any given device whether it’s an NPU for low-power and sustained inference, a GPU for raw horsepower or CPU for the broadest footprint and flexibility. Introducing Windows ML: The future of machine learning development on Windows - Windows Developer Blog Getting Started To get started, install AI Toolkit, leverage one of our conversion and optimization templates, or start building your own. Explore documentation and code samples available on Microsoft Learn, check out AI Dev Gallery (install, documentation) for demos and more samples to help you get started with Windows ML. Microsoft and Qualcomm Technologies: A strong collaboration Microsoft and Qualcomm Technologies’ collaboration bring new advanced AI features into Copilot+ PCs, leveraging the Snapdragon X Elite. Microsoft Research has played a pivotal role by optimizing new lightweight LLMs, such as Phi Silica, specifically for on-device execution with the Hexagon NPU. These models are designed to run efficiently on Hexagon NPUs, enabling multimodal AI experiences like vision-language tasks directly on Copilot+ PCs without relying on the cloud. Additionally, Microsoft has made DeepSeek R1 7B and 14B distilled models available via Azure AI Foundry, further expanding the AI ecosystem on the edge. This collaboration marks a significant step in democratizing AI by making powerful, efficient models accessible on everyday devices Windows AI Foundry expands AI capabilities by providing high-performance built-in models and supports developers' custom models with silicon performance. This developer platform plays a key role in this collaboration. Windows ML enables Windows 11 and Copilot+ PCs to use the Hexagon NPU for power efficient inference. Scaling optimization through Olive toolchain The Windows ML foundation of the Windows AI Foundry provides a unified platform for AI development across various hardware architectures and brings silicon performance using QNN Execution provider. This stack includes Windows ML and toolchains like Olive, easily accessible in AI Toolkit for VS Code, which streamlines model optimization and deployment. Qualcomm Technologies has contributed to Microsoft’s Olive, an open-source model optimization tool that enhances AI performance by optimizing models for efficient inference on client systems. This tool is particularly beneficial for running LLMs and GenAI workloads on Qualcomm Technologies’ platforms. Real-World Applications Through Qualcomm Technologies and Microsoft’s collaboration we have partnered with top developers to adopt Windows ML and have demonstrated impressive performance for their AI features. Independent Solution Vendors (ISVs) such as Powder, Topaz Labs, Camo and McAfee, Join us at the Hackathon With the recent launch of Qualcomm Snapdragon® X Elite-powered Windows laptops, developers can now take advantage of powerful NPUs (Neural Processing Units) to deploy AI applications that are both responsive and energy-efficient. These new devices open up a world of opportunities for developers to rethink how applications are built from productivity tools to creative assistants and intelligent agents all running directly on the device. Our mission has always been to enable high-quality AI experiences using compact, optimized models. These models are tailor-made for edge computing, offering faster inference, lower memory usage, and enhanced privacy without compromising performance. We encourage all application developers whether you’re building with open-source SLMs (small language models), working on smart assistants, or exploring new on-device AI use cases to join us at the event. You can register here: https://www.qualcomm.com/support/contact/forms/edge-ai-developer-hackathon-bengaluru-proposal-submission Dive deeper into these innovative developer solutions: Windows AI Foundry & Windows ML on Qualcomm NPU Microsoft and Qualcomm Technologies collaborate on Windows 11, Copilot+ PCs and Windows AI Foundry | Qualcomm Unlocking the power of Qualcomm QNN Execution Provider GPU backen Introducing Windows ML: The future of machine learning development on Windows - Windows Developer Blog
Lee_Stott
Jun 05, 2025 Place Educator Developer Blog
684Views
4likes
0Comments
Showcasing Phi-4-Reasoning: A Game-Changer for AI Developers
Showcasing Phi-4-Reasoning: A Game-Changer for AI Developers Introduction Phi-4-Reasoning is a state-of-the-art AI model developed by Microsoft Research, designed to excel in complex reasoning tasks. With its advanced capabilities, Phi-4-Reasoning is a powerful tool for AI developers, enabling them to tackle intricate problems with ease and precision. What is Phi-4-Reasoning? Phi-4-Reasoning is a 14-billion parameter open-weight reasoning model that has been fine-tuned from the Phi-4 model using supervised fine-tuning on a dataset of chain-of-thought traces.. We are also releasing Phi-4-reasoning-plus, a variant enhanced through a short phase of outcome-based reinforcement learning that offers higher performance by generating longer reasoning traces. This model is designed to handle complex reasoning tasks, including mathematical problems, scientific queries, and coding challenges Why Phi-4-reasoning and Phi-4-reasoning-plus Excels Advanced Reasoning Capabilities: Phi-4-Reasoning is trained to generate detailed reasoning chains, making it highly effective in solving complex problems. It leverages additional inference-time compute to enhance its reasoning abilities High-Quality Data Curation: The model's training involved meticulous data curation and the use of high-quality synthetic datasets, allowing it to compete with much larger models Performance: Despite its smaller size, Phi-4-Reasoning outperforms larger models like OpenAI's o1-mini and DeepSeek-R1-Distill-Llama-70B on various benchmarks, including mathematical reasoning and PhD-level science questions Industry Applications: Phi-4-Reasoning is versatile and can be deployed in various industry scenarios, including medicine, where it can be used for reasoning tasks on edge devices Real-World Examples Wordplay Riddle: Phi-4-Reasoning can solve wordplay riddles by recognizing patterns and applying logical reasoning. For example, it deduced that collecting three strawberries would yield 9 r's because the word "strawberry" contains three r's Example 1: How many strawberries for 9 r’s? User: How many strawberries do I need to collect 9 r’s? Phi-4-Reasoning: <think> TRUNCATED </think> It’s a wordplay riddle! Notice that the word “strawberry” has three r’s in it. So if you “collect” three strawberries, you get 3 Å~ 3 = 9 r’s. Thus, you need 3 strawberries. Solving a Riddle: The model can decrypt complex riddles by identifying patterns and performing arithmetic operations. It demonstrated this capability by solving a riddle involving foreign language number-words and reversed number-words. What This Problem Proves: Pattern Recognition: Phi-4-Reasoning demonstrates its ability to identify patterns in words and apply logical reasoning to solve the riddle. Wordplay Understanding: The model showcases its proficiency in understanding and solving wordplay riddles, which require a combination of linguistic knowledge and logical deduction. Efficiency: Phi-4-Reasoning quickly arrives at the correct solution by applying a straightforward multiplication operation, highlighting its efficiency in solving such problems. Example 2: Solving a Riddle User: I will give you a riddle to solve with a few examples, and something to complete at the end What is a likely completion for X that is consistent the with examples above? Phi-4-Reasoning: <think> TRUNCATED </think> We can “decrypt” the riddle by noticing that each line has five parts. In every example the parts are: A “number-word” in a foreign language (or with an extra minus sign), A “first name” (of any nationality), A “reversed number-word” (or two, joined by a hyphen), A “second name”, and A final number. When you “reverse” the third word you get a familiar numberin English: In line 1 “evif” is “five” (5) and the first number is “uno” = 1, and 5 – 1 = 4. In line 2 “neves” is “seven” (7) and “tres” = 3, and 7 – 3 = 4. In line 3 “evlewt” is “twelve” (12) and “drei” = 3, and 12 – 3 = 9. In lines 4 and 5 with two reversed “number-words”: – Line 4: “xis” →“six” (6) and “ytnewt” →“twelve” (12); 6 + 12 = 18, plus 3 = 21. – Line 5: “eerht” → “three” (3) and “ytnewt” → “twelve” (12); 3 + 12 = 15, plus 3 = 18. Now look at the final line: “minus one-point-two Satya eno Bill X” It follows the same five–part format. Notice that “eno” reversed is “one” (1). And the “first number” is “minus onepoint-two” (–1.2). Since for a one–number–word line the rule is: final number = (number from reversed word)−(first number) we compute: final = 1 − (−1.2) = 1 + 1.2 = 2.2 Thus, a likely completion for X is 2.2. Phi-4-Reasoning: Phi-4-Reasoning decrypts the riddle by identifying a pattern in the examples provided. Each line consists of five parts: a number-word in a foreign language, a first name, a reversed number-word, a second name, and a final number. By reversing the third word and performing arithmetic operations, Phi-4-Reasoning determines that the likely completion for X is 2.2 What This Problem Proves: Complex Pattern Recognition: Phi-4-Reasoning excels in recognizing complex patterns involving multiple elements, such as foreign language number-words and reversed number-words. Logical Deduction: The model demonstrates its ability to apply logical deduction to solve the riddle by performing arithmetic operations based on the identified patterns. Versatility: Phi-4-Reasoning showcases its versatility in handling diverse types of riddles and problems, making it a valuable tool for AI developers. Example 3. solve complex graph traversal tasks, such as finding a path through a maze. The maze is represented as an adjacency list, which is a common way to describe graphs in computer science. Each node in the graph (or maze) is connected to other nodes, and the task is to find a valid path from the starting node (node 0) to the destination node Example 3: Solving a maze User: Please solve the maze provided in adjacency list form below, by outputting a valid path from node 0 to node {destination}, in a space-separated list of integers. Adjacency list: {depicted as an image below} Phi-4-Reasoning: <think> TRUNCATED </think> {path depicted as an image below} What This Problem Proves: Graph Traversal Capability: Phi-4-Reasoning can effectively navigate through graphs using algorithms like Depth-First Search (DFS) or Breadth-First Search (BFS). This capability is crucial for solving problems that involve finding paths, cycles, or connectivity in graphs. Logical Reasoning: The model demonstrates its ability to apply logical reasoning to determine the correct sequence of nodes to traverse from the start to the destination. This involves understanding the structure of the graph and making decisions based on the connections between nodes. Pattern Recognition: Phi-4-Reasoning can recognize patterns in the adjacency list and use them to find a solution. This is important for tasks that require identifying and following specific paths or routes. Versatility: The ability to solve a maze using an adjacency list showcases the model's versatility in handling different types of data structures and problem-solving scenarios. This is beneficial for AI developers who need to work with various data representations and algorithms. Efficiency: The model's ability to quickly and accurately find a valid path through the maze highlights its efficiency in solving complex problems. This is valuable for applications that require fast and reliable solutions. Conclusion: Phi-4-Reasoning's ability to solve a maze using an adjacency list demonstrates its advanced reasoning capabilities, making it a powerful tool for AI developers. Its proficiency in graph traversal, logical reasoning, pattern recognition, versatility, and efficiency makes it well-suited for tackling a wide range of complex problems. Deployment and Integration Phi-4-Reasoning can be deployed on various platforms, including Azure AI Foundry and Hugging Face. It supports quantization using tools like Microsoft Olive, making it suitable for deployment on edge devices such as IoT, laptops, and mobile devices. Phi-4-Reasoning is a groundbreaking AI model that offers advanced reasoning capabilities, high performance, and versatility. Its ability to handle complex reasoning tasks makes it an invaluable tool for AI developers, enabling them to create innovative solutions across various industries. References Make Phi-4-mini-reasoning more powerful with industry reasoning on edge devices | Microsoft Community Hub Phi-4 Reasoning Technical Paper Phi-4-Mini-Reasoning Technical Paper One year of Phi: Small language models making big leaps in AI | Microsoft Azure Blog PhiCookBook Access Phi-4-reasoning models Phi Models at Azure AI Foundry Models Phi Models on Hugging Face Phi Models on GitHub Marketplace Models
Lee_Stott
May 01, 2025 Place Educator Developer Blog
3.7KViews
0likes
0Comments
AI Sparks: Unleashing Agents with the AI Toolkit
The final episode of our "AI Sparks" series delved deep into the exciting world of AI Agents and their practical implementation. We also covered a fair part of MCP with Microsoft AI Toolkit extension for VS Code. We kicked off by charting the evolutionary path of intelligent conversational systems. Starting with the rudimentary rule-based Basic Chatbots, we then explored the advancements brought by Basic Generative AI Chatbots, which offered contextually aware interactions. Then we explored the Retrieval-Augmented Generation (RAG), highlighting its ability to ground generative models in specific knowledge bases, significantly enhancing accuracy and relevance. The limitations were also discussed for the above mentioned techniques. The session was then centralized to the theme – Agents and Agentic Frameworks. We uncovered the fundamental shift from basic chatbots to autonomous agents capable of planning, decision-making, and executing tasks. We moved forward with detailed discussion on the distinction between Single Agentic systems, where one core agent orchestrates the process, and Multi-Agent Architectures, where multiple specialized agents collaborate to achieve complex goals. A key part of building robust and reliable AI Agents, as we discussed, revolves around carefully considering four critical factors. Firstly, Knowledge-Providing agents with the right context is paramount for them to operate effectively and make informed decisions. Secondly, equipping agents with the necessary Actions by granting them access to the appropriate tools allows them to execute tasks and achieve desired outcomes. Thirdly, Security is non-negotiable; ensuring agents have access only to the data and services they genuinely need is crucial for maintaining privacy and preventing unintended actions. Finally, establishing robust Evaluations mechanisms is essential to verify that agents are completing tasks correctly and meeting the required standards. These four pillars – Knowledge, Actions, Security, and Evaluation – form the bedrock of any successful agentic implementation. To illustrate the transformative power of AI Agents, we explored several interesting use cases and applications. These ranged from intelligent personal assistants capable of managing schedules and automating workflows to sophisticated problem-solving systems in domains like customer service. A significant portion of the session was dedicated to practical implementation through demonstrations. We highlighted key frameworks that are empowering developers to build agentic systems.: Semantic Kernel: We highlighted its modularity and rich set of features for integrating various AI services and tools. Autogen Studio: The focus here was on its capabilities for facilitating the creation and management of multi-agent conversations and workflows. Agent Service: We discussed its role in providing a more streamlined and managed environment for deploying and scaling AI agents. The major point of attraction was that these were demonstrated using the local LLMs which were hosted using AI Toolkit. This showcased the ease with which developers can utilize VS Code AI toolkit to build and experiment with agentic workflows directly within their familiar development environment. Finally, we demystified the concept of Model Context Protocol (MCP) and demonstrated how seamlessly it can be implemented using the Agent Builder within the VS Code AI Toolkit. We demonstrated this with a basic Website development using MCP. This practical demonstration underscored the toolkit's power in simplifying the development of complex solutions that can maintain context and engage in more natural, multi-step interactions. The "AI Sparks" series concluded with a discussion, where attendees had a clearer understanding of the evolution, potential and practicalities of AI Agents. The session underscored that we are on the cusp of a new era of intelligent systems that are not just reactive but actively work alongside us to achieve goals. The tools and frameworks are maturing, and the possibilities for agentic applications are sparking innovation across various industries. It was an exciting journey, and engagement during the final session on AI Sparks around Agents truly highlighted the transformative potential of this field. "AI Sparks" Series Roadmap: The "AI Sparks" series delved deeper into specific topics using AI Toolkit for Visual Studio Code, including: Introduction to AI toolkit and feature walkthrough: Introduction to the AI Toolkit extension for VS Code a powerful way to explore and integrate the latest AI models from OpenAI, Meta, Deepseek, Mistral, and more. Introduction to SLMs and local model with use cases: Explore Small Language Models (SLMs) and how they compare to larger models. Building RAG Applications: Create powerful applications that combine the strengths of LLMs with external knowledge sources. Multimodal Support and Image Analysis: Working with vision models and building multimodal applications. Evaluation and Model Selection: Evaluate model performance and choose the best model for your needs. Agents and Agentic Frameworks: Exploring the cutting edge of AI agents and how they can be used to build more complex and autonomous systems. The full playlist of the series with all the episodes of "AI Sparks" is available at AI Sparks Playlist. Continue the discussion and questions in Microsoft AI Discord Community where we have a dedicated AI-sparks channel. All the code samples can be found on AI_Toolkit_Samples .We look forward to continuing these insightful discussions in future series!
shreyanfern
Apr 30, 2025 Place Educator Developer Blog
350Views
2likes
0Comments
Building AI Agents on edge devices using Ollama + Phi-4-mini Function Calling
The new Phi-4-mini and Phi-4-multimodal now support Function Calling. This feature enables the models to connect with external tools and APIs. By deploying Phi-4-mini and Phi-4-multimodal with Function Calling capabilities on edge devices, we can achieve local expansion of knowledge capabilities and enhance their task execution efficiency. This blog will focus on how to use Phi-4-mini's Function Calling capabilities to build efficient AI Agents on edge devices. What‘s Function Calling How it works First we need to learn how Function Calling works Tool Integration: Function Calling allows LLM/SLM to interact with external tools and APIs, such as weather APIs, databases, or other services. Function Definition: Defines a function (tool) that LLM/SLM can call, specifying its name, parameters, and expected output. LLM Detection: LLM/SLM analyzes the user's input and determines if a function call is required and which function to use. JSON Output: LLM/SLM outputs a JSON object containing the name of the function to call and the parameters required by the function. External Execution: The application executes the function call using the parameters provided by LLM/SLM. Response to LLM: Returns the output of Function Calling to LLM/SLM, and LLM/SLM can use this information to generate a response to the user. Application scenarios Data retrieval: convert natural language queries into API calls to fetch data (e.g., "show my recent orders" triggers a database query) Operation execution: convert user requests into specific function calls (e.g., "schedule a meeting" becomes a calendar API call) Computational tasks: handle mathematical or logical operations through dedicated functions (e.g., calculate compound interest or statistical analysis) Data processing: chain multiple function calls together (e.g., get data → parse → transform → store) UI/UX integration: trigger interface updates based on user interactions (e.g., update map markers or display charts) Phi-4-mini / Phi-4-multimodal's Function Calling Phi-4-mini / Phi-4-multimodal supports single and parallel Function Calling. Things to note when calling You need to define Tools in System to start single or parallel Function Calling If you want to start parallel Function Calling, you also need to add 'some tools' to the System prompt The following is an example Single Function Calling tools = [ { "name": "get_match_result", "description": "get match result", "parameters": { "match": { "description": "The name of the match", "type": "str", "default": "Arsenal vs ManCity" } } }, ] messages = [ { "role": "system", "content": "You are a helpful assistant", "tools": json.dumps(tools), # pass the tools into system message using tools argument }, { "role": "user", "content": "What is the result of Arsenal vs ManCity today?" } ] Full Sample : Click Parallel Function Calling AGENT_TOOLS = { "booking_fight": { "name": "booking_fight", "description": "booking fight", "parameters": { "departure": { "description": "The name of Departure airport code", "type": "str", }, "destination": { "description": "The name of Destination airport code", "type": "str", }, "outbound_date": { "description": "The date of outbound flight", "type": "str", }, "return_date": { "description": "The date of return flight", "type": "str", } } }, "booking_hotel": { "name": "booking_hotel", "description": "booking hotel", "parameters": { "query": { "description": "The name of the city", "type": "str", }, "check_in_date": { "description": "The date of check in", "type": "str", }, "check_out_date": { "description": "The date of check out", "type": "str", } } }, } SYSTEM_PROMPT = """ You are my travel agent with some tools available. """ messages = [ { "role": "system", "content": SYSTEM_PROMPT, "tools": json.dumps(AGENT_TOOLS), # pass the tools into system message using tools argument }, { "role": "user", "content": """I have a business trip from London to New York in March 21 2025 to March 27 2025, can you help me to book a hotel and flight tickets""" } ] Full sample : click Using Ollama and Phi-4-mini Function Calling to Create AI Agents on Edge Devices Ollama is a popular free tool for deploying LLM/SLM locally and can be used in combination with AI Toolkit for VS Code. In addition to being deployed on your PC/Laptop, it can also be deployed on IoT, mobile phones, containers, etc. To use Phi-4-mini on Ollama, you need to use Ollama 0.5.13+. Different quantitative versions are supported on Ollama, as shown in the figure below: Using Ollama, we can deploy Phi-4-mini on the edge, and implement AI Agent with Function Calling under limited computing power, so that Generative AI can be applied more effectively on the edge. Current Issues A sad experience - If you directly use the interface to try to call Ollama in the above way, you will find that Function Calling will not be triggered. There are discussions on Ollama's GitHub Issue. You can enter the Issue https://github.com/ollama/ollama/issues/9437. By modifying the Phi-4-mini Template on the ModelFile to implement a single Function Calling, but the call to Parallel Function Calling still failed. Resolution We have implemented a fix by making a adjustments to the template. We have improved it according to Phi-4-mini's Chat Template and re-modified the Modelfile. Of course, the quantitative model has a huge impact on the results. The adjustments are as follows: TEMPLATE """ {{- if .Messages }} {{- if or .System .Tools }}<|system|> {{ if .System }}{{ .System }} {{- end }} In addition to plain text responses, you can chose to call one or more of the provided functions. Use the following rule to decide when to call a function: * if the response can be generated from your internal knowledge (e.g., as in the case of queries like "What is the capital of Poland?"), do so * if you need external information that can be obtained by calling one or more of the provided functions, generate a function calls If you decide to call functions: * prefix function calls with functools marker (no closing marker required) * all function calls should be generated in a single JSON list formatted as functools[{"name": [function name], "arguments": [function arguments as JSON]}, ...] * follow the provided JSON schema. Do not hallucinate arguments or values. Do to blindly copy values from the provided samples * respect the argument type formatting. E.g., if the type if number and format is float, write value 7 as 7.0 * make sure you pick the right functions that match the user intent Available functions as JSON spec: {{- if .Tools }} {{ .Tools }} {{- end }}<|end|> {{- end }} {{- range .Messages }} {{- if ne .Role "system" }}<|{{ .Role }}|> {{- if and .Content (eq .Role "tools") }} {"result": {{ .Content }}} {{- else if .Content }} {{ .Content }} {{- else if .ToolCalls }} functools[ {{- range .ToolCalls }}{{ "{" }}"name": "{{ .Function.Name }}", "arguments": {{ .Function.Arguments }}{{ "}" }} {{- end }}] {{- end }}<|end|> {{- end }} {{- end }}<|assistant|> {{ else }} {{- if .System }}<|system|> {{ .System }}<|end|>{{ end }}{{ if .Prompt }}<|user|> {{ .Prompt }}<|end|>{{ end }}<|assistant|> {{ end }}{{ .Response }}{{ if .Response }}<|user|>{{ end }} """ We have tested the solution using different quantitative models. In the laptop environment, we recommend that you use the following model to enable single/parallel Function Calling: phi4-mini:3.8b-fp16. Note: you need to bind the defined Modelfile and phi4-mini:3.8b-fp16 together to enable this to work. Please execute the following command in the command line: #If you haven't downloaded it yet, please execute this command firstr ollama run phi4-mini:3.8b-fp16 #Binding with the adjusted Modelfile ollama create phi4-mini:3.8b-fp16 -f {Your Modelfile Path} To test the single Function Calling and Parallel Function Calling of Phi-4-mini. Single Function Calling Parallel Function Calling Full Sample in notebook The above example is just a simple introduction. As we move forward with the development we hope to find simpler ways to apply it on the edge, use Function Calling to expand the scenarios of Phi-4-mini / Phi-4-multimodal, and also develop more usecases in vertical industries. Resources Phi-4 model on Hugging face https://huggingface.co/collections/microsoft/phi-4-677e9380e514feb5577a40e4 Phi-4-mini on Ollama https://ollama.com/library/phi4-mini Learn Function Calling https://huggingface.co/docs/hugs/en/guides/function-calling Phi Cookbook - Samples and Resources for Phi Models https://aka.ms/phicookbook
kinfey
Mar 11, 2025 Place Educator Developer Blog
6.4KViews
4likes
1Comment
Using Advanced Reasoning Model on EdgeAI Part 2 - Evaluate local models using AITK for VSCode
Quantizing a generative AI model assists in ease of deployment of edge devices. For development purposes we would like to have 1) quantization 2) optimization and 3) evaluation of models all in one tool. We recommend AI Toolkit for Visual Studio Code (AITK) for as an all-in-one tool for developers experimenting with GenAI models. This is a lightweight open-source tool that covers model selection, fine-tuning, deployment, application development, batch data for testing, and evaluation. In Using Advanced Reasoning Model on EdgeAI Part 1 - Quantization, Conversion, Performance, we performed quantization and format conversion for Phi-4 and DeepSeek-R1-Distill-Qwen-1.5B. In This blog will evaluate Phi-4-14B-ONNX-INT4-GPU and DeepSeek-R1-Distill-Qwen-14B-ONNX-INT4-GPU using AITK. About model quantization Quantization refers to the process of approximating the continuous value of a signal to a finite number of discrete values. It can be understood as a method of information compression. When considering this concept on a computer system, it is generally represented by "low bits". Some people also call quantization "fixed-point", but strictly speaking, the range represented is reduced. Fixed-point refers specifically to linear quantization with a scale of 2, which is a more practical quantization method. Our trained models generally have the characteristics of large number of parameters, large amount of calculation, high memory usage, and high precision. However, after model quantization, we can compress parameters, increase the calculation speed of the model, and reduce memory usage, but there is a problem that the quantization approximation algorithm reduces the precision. In the use of generative AI, we will face some errors and omissions in the results. We therefore need to evaluate the quantization model. Use AITK to evaluate quantitative models Model deployment AITK is an open-source tool with GenAIOps, based on Visual Studio Code. If you want to evaluate the model, you first need to deploy the model, which can be deployed locally or remotely. Currently AITK only supports the deployment of ONNX models in the Model Catalog (Phi-3/Phi-3.5 Family/Mistral 7B) To test models that are not currently supported within model catalog we propose API that is compatible with AITK. This uses the remote deployment method of the Open AI Chat completion. We use Python + Flask to create the Open AI Chat completion service, the code is as follows: from flask import Flask, request, jsonify, stream_with_context, Response import json import onnxruntime_genai as og import numpy as np import requests import uuid, time app = Flask(__name__) model_path = "Your Phi-4-14B-ONNX-INT4-GPU or DeepSeek-R1-Distill-Qwen-14B-ONNX-INT4-GPU path" model = og.Model(model_path) tokenizer = og.Tokenizer(model) tokenizer_stream = tokenizer.create_stream() # Your Phi-4 chat template chat_template = "<|user|>{input}<|assistant|>" # Your DeepSeek-R1-Distill-Qwen chat template # chat_template = "<|im_start|> user {input}<|im_end|><|im_start|> assistant" @app.route('/v1/chat/completions', methods=['POST']) def ort_chat(): data = request.get_json() messages = data.get("messages") if not messages or not isinstance(messages, list): return jsonify({ "error": { "message": "Invalid request. 'messages' is required and must be a list of strings.", "type": "invalid_request_error", "param": "messages", "code": None } }), 400 prompt = '' if messages: last_message = messages[-1] if last_message.get("role") == "user" and last_message.get("content"): prompt = last_message['content'] search_options = {} search_options['max_length'] = data.get("max_tokens") search_options['temperature'] = data.get("temperature") search_options['top_p'] = data.get("top_p") search_options['past_present_share_buffer'] = False prompt = f'{chat_template.format(input=messages)}' input_tokens = tokenizer.encode(prompt) params = og.GeneratorParams(model) params.set_search_options(**search_options) params.input_ids = input_tokens generator = og.Generator(model, params) reply = "" while not generator.is_done(): generator.compute_logits() generator.generate_next_token() new_token = generator.get_next_tokens()[0] reply += tokenizer.decode(new_token) if data.get("stream"): def generate(): tokens = reply.split() for token in tokens: chunk = { "id": "chatcmpl-"+ str(uuid.uuid4()), "object": "chat.completion.chunk", "created": int(time.time()), "model": data.get("model"), "choices": [ { "delta": {"content": token + " "}, "index": 0, "finish_reason": None } ] } yield "data: " + json.dumps(chunk) + "\n\n" time.sleep(0.5) yield "data: [DONE]\n\n" return Response(stream_with_context(generate()), mimetype="text/event-stream") response_data = { "id": "chatcmpl-"+ str(uuid.uuid4()), "object": "chat.completion", "created": int(time.time()), "model": data.get("model"), "choices": [ { "index": 0, "message": { "role": "assistant", "content": reply }, "finish_reason": "stop" } ], "usage": { "prompt_tokens": len(messages), "completion_tokens": len(reply.split()), "total_tokens": len(reply.split()) } } return jsonify(response_data) if __name__ == '__main__': app.run(debug=True, port=5000) After create service, through AITK's My MODELS Remote models select "Add a custom model" select the API (http://127.0.0.1:5000/v1/chat/completions) set the name such as DeepSeek-R1-Distill-Qwen-14B-ONNX-INT4-GPU or Phi-4-14B-ONNX-INT4-GPU. We can test it in playground DeepSeek-R1-Distill-Qwen-14B-ONNX-INT4-GPU Playground Phi-4-14B-ONNX-INT4-GPU Playground Batch Data We can use AITK's Bulk Data to set up the execution environment for batch data. I simulate the execution of 10 data here. We can execute in batches to see the relevant results. This is an important step in the evaluation because we need to know the results based on a specific problem before the evaluation. After the run, we can export the run results Evaluation You can evaluate the exported batch data results. AITK supports different evaluation methods. Create an evaluation by selecting Evaluation from Tools and importing the data exported from Bulk Data. You can choose different evaluation methods. We select F1 to complete the evaluation selection, Let's take a look at the evaluation results of DeepSeek-R1-Distill-Qwen-14B-ONNX-INT4-GPU Summary The AI Toolkit (AITK) enables us to assess various models, which is instrumental for our edge deployments. As Responsible AI gains importance, continuous testing and evaluation post-deployment become essential. In this context, AITK serves as a crucial tool for our daily operations. Resource Learn about AI Toolkit for Visual Studio Code https://marketplace.visualstudio.com/items?itemName=ms-windows-ai-studio.windows-ai-studio DeepSeek-R1 in GitHub Models https://github.com/marketplace/models/azureml-deepseek/DeepSeek-R1 DeepSeek-R1 in Azure AI Foundry https://ai.azure.com/explore/models/DeepSeek-R1/version/1/registry/azureml-deepseek Phi-4-14B in Hugging face https://huggingface.co/microsoft/phi-4 Learn about Microsoft Olive https://github.com/microsoft/olive Learn about ONNX Runtime GenAI https://github.com/microsoft/onnxruntime-genai
kinfey
Feb 13, 2025 Place Educator Developer Blog
464Views
0likes
0Comments
Fine-Tuning Language Models with Azure AI Foundry: A Detailed Guide
What is Azure AI Foundry? Azure AI Foundry is a comprehensive platform designed to simplify the development, deployment, and management of AI models. It provides a user-friendly interface and powerful tools that enable developers to create custom AI solutions without needing extensive machine learning expertise. Key Features of Azure AI Foundry One-Button Fine-Tuning: A streamlined process that allows users to fine-tune models with minimal configuration. Integration with Development Tools: Seamless integration with popular development environments, particularly Visual Studio Code. Support for Multiple Models: Access to a variety of pre-trained models, including the Phi family of models. Understanding Fine-Tuning Fine-tuning is the process of taking a pre-trained model and adapting it to a specific dataset or task. This is particularly useful when the base model has been trained on a large corpus of general data but needs to perform well on a narrower domain. Why Fine-Tune? Improved Performance: Fine-tuning can significantly enhance the model's accuracy and relevance for specific tasks. Reduced Training Time: Starting with a pre-trained model reduces the amount of data and time required for training. Customization: Tailor the model to meet the unique needs of your application or business. One-Button Fine-Tuning in Azure AI Foundry Step-by-Step Process Select the Model: Log in to Azure AI Foundry and navigate to the model selection interface. Choose Phi-3 or another small language model from the available options. Prepare Your Data: Ensure your dataset is formatted correctly. Typically, this involves having a set of input-output pairs that the model can learn from. Upload your dataset to Azure AI Foundry. The platform supports various data formats, making it easy to integrate your existing data. Initiate Fine-Tuning: Locate the one-button fine-tuning feature within the Azure AI Foundry interface. Click the button to start the fine-tuning process. The platform will handle the configuration and setup automatically. Monitor Progress: After initiating fine-tuning, you can monitor the process through the Azure portal. The portal provides real-time updates on training metrics, allowing you to track the model's performance as it learns. Evaluate the Model: Once fine-tuning is complete, evaluate the model's performance using a validation dataset. Azure AI Foundry provides tools for assessing accuracy, precision, recall, and other relevant metrics. Deploy the Model: After successful evaluation, you can deploy the fine-tuned model directly from Azure AI Foundry. The platform supports various deployment options, including REST APIs and integration with other Azure services. Using the AI Toolkit in Visual Studio Code Overview of the AI Toolkit The AI Toolkit for Visual Studio Code enhances the development experience by providing tools specifically designed for AI model management and fine-tuning. This integration allows developers to work within a familiar environment while leveraging powerful AI capabilities. Key Features of the AI Toolkit 1) Model Management: Easily manage and switch between different models, including Phi-3 and Ollama models. 2) Data Handling: Simplified data upload and preprocessing tools to prepare datasets for training. 3) Real-Time Collaboration: Collaborate with team members in real-time, sharing insights and progress on AI projects. How to Use the AI Toolkit 1) Install the AI Toolkit: Open Visual Studio Code and navigate to the Extensions Marketplace. Search for "AI Toolkit" and install the extension. 2) Connect to Azure AI Foundry: Once installed, configure the toolkit to connect to your Azure AI Foundry account. This will allow you to access your models and datasets directly from Visual Studio Code. 3) Fine-Tune Models: Use the toolkit to initiate fine-tuning processes directly from your development environment. Monitor training progress and view logs without leaving Visual Studio Code. 4) Consume Ollama Models: The AI Toolkit supports the consumption of Ollama models, providing additional flexibility in your AI projects. This feature allows you to integrate various models seamlessly, enhancing your application's capabilities. Microsoft ONNX Live for Fine-Tuning What is Microsoft ONNX Live? Microsoft ONNX Live is a platform that allows developers to deploy and optimize AI models using the Open Neural Network Exchange (ONNX) format. ONNX is an open-source format that enables interoperability between different AI frameworks, making it easier to deploy models across various environments. Key Features of Microsoft ONNX Live Model Optimization: ONNX Live provides tools to optimize models for performance, ensuring they run efficiently in production environments. Cross-Framework Compatibility: Models trained in different frameworks (like PyTorch or TensorFlow) can be converted to ONNX format, allowing for greater flexibility in deployment. Real-Time Inference: ONNX Live supports real-time inference, enabling applications to utilize AI models for immediate predictions. Fine-Tuning with ONNX Live Model Conversion: If you have a model trained in a different framework, you can convert it to ONNX format using tools provided by Microsoft. This conversion allows you to leverage the benefits of ONNX Live for deployment and optimization. Integration with Azure AI Foundry: Once your model is in ONNX format, you can integrate it with Azure AI Foundry for fine-tuning. The one-button fine-tuning feature can be used to adapt the ONNX model to your specific dataset. Optimization Techniques: After fine-tuning, you can apply various optimization techniques available in ONNX Live to enhance the model's performance. Techniques such as quantization and pruning can significantly reduce the model size and improve inference speed. Deployment: Once optimized, the model can be deployed directly from Azure AI Foundry or ONNX Live. This deployment can be done as a REST API, allowing easy integration with web applications and services. Additional Resources To further enhance your understanding and capabilities in fine-tuning language models, consider exploring the following resources: Phi-3 Cookbook: This comprehensive guide provides insights into getting started with Phi models, including best practices for fine-tuning and deployment. Explore the Phi-3 Cookbook. Ignite Fine-Tuning Workshop: This workshop offers a hands-on approach to learning about fine-tuning techniques and tools. It includes real-world scenarios to help you understand the practical applications of fine-tuning. Visit the GitHub Repository. Conclusion Fine-tuning language models like Phi-3 using Azure AI Foundry, combined with the AI Toolkit in Visual Studio Code and Microsoft ONNX Live, provides a powerful and efficient workflow for developers. The one-button fine-tuning feature simplifies the process, while the integration with ONNX Live allows for optimization and deployment flexibility. By leveraging these tools, you can enhance your AI applications, ensuring they are tailored to meet specific needs and perform optimally in production environments. Whether you are a seasoned AI developer or just starting, Azure AI Foundry and its associated tools offer a robust ecosystem for building and deploying advanced AI solutions. References Microsoft Docs Links Fine-Tuning Models in Azure OpenAI Azure AI Services Documentation Azure Machine Learning Documentation Microsoft Learn Links Develop Generative AI Apps in Azure Fine-Tune a Language Model Azure AI Foundry Overview Get started with AI Toolkit for Visual Studio Code
Sharda_Kaur
Jan 29, 2025 Place Educator Developer Blog
1.8KViews
0likes
0Comments
Recipe Generator Application with Phi-3 Vision on AI Toolkit Locally
In today's data-driven world, images have become a ubiquitous source of information. From social media feeds to medical imaging, we encounter and generate images constantly. Extracting meaningful insights from these visual data requires sophisticated analysis techniques. In this blog post let’s build an Image Analysis Application using the cutting-edge Phi-3 Vision model completely free of cost and on-premise environment using the VS Code AI Toolkit. We'll explore the exciting possibilities that this powerful combination offers. The AI Toolkit for Visual Studio Code (VS Code) is a VS Code extension that simplifies generative AI app development by bringing together cutting-edge AI development tools and models. I would recommend going through the following blogs for getting started with VS Code AI Toolkit. 1. Visual Studio Code AI Toolkit: How to Run LLMs locally 2. Visual Studio AI Toolkit : Building Phi-3 GenAI Applications 3. Building Retrieval Augmented Generation on VSCode & AI Toolkit 4. Bring your own models on AI Toolkit - using Ollama and API keys Setup VS Code AI Toolkit: Launch the VS Code application and Click on the VS Code AI Toolkit extension. Login to the GitHub account if not already done. Once ready, click on model catalog. In the model catalog there are a lot of models, broadly classified into two categories, Local Run (with CPU and with GPU) Remote Access (Hosted by GitHub and other providers) For this blog, we will be using a Local Run model. This will utilize the local machine’s hardware to run the Language model. Since it involves analyzing images, we will be using the language model which supports vision operations and hence Phi-3-Vision will be a good fit as its light and supports local run. Download the model and then further it will be loaded it in the playground to test. Once downloaded, Launch the “Playground” tab and load the Phi-3 Vision model from the dropdown. The Playground also shows that Phi-3 vision allows image attachments. We can try it out before we start developing the application. Let’s upload the image using the “Paperclip icon” on the UI. I have uploaded image of Microsoft logo and prompted the language model to Analyze and explain the image. Phi-3 vision running on local premise boasts an uncanny ability to not just detect but unerringly pinpoint the exact Company logo and decipher the name with astonishing precision. This is a simple use case, but it can be built upon with various applications to unlock a world of new possibilities. Port Forwarding: Port Forwarding, a valuable feature within the AI Toolkit, serves as a crucial gateway for seamless communication with the GenAI model. To do this, launch the terminal and navigate to the “Ports” section. There will be button “Forward a Port”, click on that and select any desired port, in this blog we will use 5272 as the port. The Model-as-a-server is now ready, where the model will be available on the port 5272 to respond to the API calls. It can be tested with any API testing application. To know more click here. Creating Application with Python using OpenAI SDK: To follow this section, Python must be installed on the local machine. Launch the new VS Code window and set the working directory. Create a new Python Virtual environment. Once the setup is ready, open the terminal on VS Code, and install the libraries using “pip”. pip install openai pip install streamlit Before we build the streamlit application, lets develop the basic program and check the responses in the VSCode terminal and then further develop a basic webapp using the streamlit framework. Basic Program Import libraries: import base64 from openai import OpenAI base64: The base64 module provides functions for encoding binary data to base64-encoded strings and decoding base64-encoded strings back to binary data. Base64 encoding is commonly used for encoding binary data in text-based formats such as JSON or XML. OpenAI: The OpenAI package is a Python client library for interacting with OpenAI's API. The OpenAI class provides methods for accessing various OpenAI services, such as generating text, performing natural language processing tasks, and more. Initialize Client: Initialize an instance of the OpenAI class from the openai package, client = OpenAI( base_url="http://127.0.0.1:5272/v1/", api_key="xyz" # required by API but not used ) OpenAI (): Initializes a OpenAI model with specific parameters, including a base URL for the API, an API key, a custom model name, and a temperature setting. This model is used to generate responses based on user queries. This instance will be used to interact with the OpenAI API. base_url = "http://127.0.0.1:5272/v1/": Specifies the base URL for the OpenAI API. In this case, it points to a local server running on 127.0.0.1 (localhost) at port 5272. api_key = "ai-toolkit": The API key used to authenticate requests to the OpenAI API. In case of AI Toolkit usage, we don’t have to specify any API key. The image analysis application will frequently deal with images uploaded by users. But to send these images to GenAI model, we need them in a format it understands. This is where the encode_image function comes in. # Function to encode the image def encode_image(image_path): with open(image_path, "rb") as image_file: return base64.b64encode(image_file.read()).decode("utf-8") Function Definition: def encode_image(image_path): defines a function named encode_image that takes a single argument, image_path. This argument represents the file path of the image we want to encode. Opening the Image: with open(image_path, "rb") as image_file: opens the image file specified by image_path in binary reading mode ("rb"). This is crucial because we're dealing with raw image data, not text. Reading Image Content: image_file.read() reads the entire content of the image file into a byte stream. Remember, images are stored as collections of bytes representing color values for each pixel. Base64 Encoding: base64.b64encode(image_file.read()) encodes the byte stream containing the image data into base64 format. Base64 encoding is a way to represent binary data using a combination of printable characters, which makes it easier to transmit or store the data. Decoding to UTF-8: .decode("utf-8") decodes the base64-encoded data into a UTF-8 string. This step is necessary because the OpenAI API typically expects text input, and the base64-encoded string can be treated as text containing special characters. Returning the Encoded Image: return returns the base64-encoded string representation of the image. This encoded string is what we'll send to the AI model for analysis. In essence, the encode_image function acts as a bridge, transforming an image file on your computer into a format that the AI model can understand and process. Path for the Image: We will use an image stored on our local machine for this section, while we develop the webapp, we will change this to accept it to what the user uploads. image_path = "C:/img.jpg" #path of the image here This line of code is crucial for any program that needs to interact with an image file. It provides the necessary information for the program to locate and access the image data. Base64 String: # Getting the base64 string base64_image = encode_image(image_path) This line of code is responsible for obtaining the base64-encoded representation of the image specified by the image_path. Let's break it down: encode_image(image_path): This part calls the encode_image function, which we've discussed earlier. This function takes the image_path as input and performs the following: Reads the image file from the specified path. Converts the image data into a base64-encoded string. Returns the resulting base64-encoded string. base64_image = ...: This part assigns the return value of the encode_image function to the variable base64_image. This section effectively fetches the image from the given location and transforms it into a special format (base64) that can be easily handled and transmitted by the computer system. This base64-encoded string will be used subsequently to send the image data to the AI model for analysis. Invoking the Language Model: This code tells the AI model what to do with the image. response = client.chat.completions.create( model="Phi-3-vision-128k-cpu-int4-rtn-block-32-acc-level-4-onnx", messages=[ { "role": "user", "content": [ { "type": "text", "text": "What's in the Image?", }, { "type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}, }, ], } ], ) response = client.chat.completions.create(...): This line sends instructions to the AI model we're using (represented by client). Here's a breakdown of what it's telling the model: chat.completions.create: We're using a specific part of the OpenAI API designed for having a conversation-like interaction with the model. The ... part: This represents additional details that define what we want the model to do, which we'll explore next. Let's break down the details (...) sent to the model: 1) model="Phi-3-vision-128k-cpu-int4-rtn-block-32-acc-level-4-onnx": This tells the model exactly which AI model to use for analysis. In our case, it's the "Phi-3-vision" model. 2) messages: This defines what information we're providing to the model. Here, we're sending two pieces of information: role": "user": This specifies that the first message comes from a user (us). The content: This includes two parts: "What's in the Image?": This is the prompt we're sending to the model about the image. "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}: This sends the actual image data encoded in base64 format (stored in base64_image). In a nutshell, this code snippet acts like giving instructions to the AI model. We specify the model to use, tell it we have a question about an image, and then provide the image data itself. Printing the response on the console: print(response.choices[0].message.content) We asked the AI model "What's in this image?" This line of code would then display the AI's answer. Console response: Finally, we can see the response on the terminal. Now to make things more interesting, let’s convert this into a webapp using the streamlit framework. Recipe Generator Application with Streamlit: Now we know how to interact with the Vision model offline using a basic console. Let’s make things even more exciting by applying all this to a use-case which probably will be most loved by all those who are cooking enthusiasts!! Yes, let’s create an application which will assist in cooking by looking what’s in the image of ingredients! Create a new file and name is as “app.py” select the same. venv that was used earlier. Make sure the Visual studio toolkit is running and serving the Phi-3 Vision model through the port 5272. First step is importing the libraries, import streamlit as st import base64 from openai import OpenAI base64 and OpenAI is the same as we had used in the earlier section. Streamlit: This part imports the entire Streamlit library, which provides a powerful set of tools for creating user interfaces (UIs) with Python. Streamlit simplifies the process of building web apps by allowing you to write Python scripts that directly translate into interactive web pages. client = OpenAI( base_url="http://127.0.0.1:5272/v1/", api_key="xyz" # required by API but not used ) As discussed in the earlier section, initializing the client and configuring the base_url and api_key. st.title('Recipe Generator 🍔') st.write('This is a simple recipe generator application.Upload images of the Ingridients and get the recipe by Chef GenAI! 🧑‍🍳') uploaded_file = st.file_uploader("Choose a file") if uploaded_file is not None: st.image(uploaded_file, width=300) st.title('Recipe Generator 🍔'): This line sets the title of the Streamlit application as "Recipe Generator" with a visually appealing burger emoji. st.write(...): This line displays a brief description of the application's functionality to the user. uploaded_file = st.file_uploader("Choose a file"): This creates a file uploader component within the Streamlit app. Users can select and upload an image file (likely an image of ingredients). if uploaded_file is not None: : This conditional block executes only when the user has actually selected and uploaded a file. st.image(uploaded_file, width=300): If an image is uploaded, this line displays the uploaded image within the Streamlit app with a width of 300 pixels. In essence, this code establishes the basic user interface for the Recipe Generator app. It allows users to upload an image, and if an image is uploaded, it displays the image within the app. preference = st.sidebar.selectbox( "Choose your preference", ("Vegetarian", "Non-Vegetarian") ) cuisine = st.sidebar.selectbox( "Select for Cuisine", ("Indian","Chinese","French","Thai","Italian","Mexican","Japanese","American","Greek","Spanish") ) We use Streamlit's sidebar and selectbox features to create interactive user input options within a web application: st.sidebar.selectbox(...): This line creates a dropdown menu (selectbox) within the sidebar of the Streamlit application.The first argument, "Choose your preference", sets the label or title for the dropdown.The second argument, ("Vegetarian", "Non-Vegetarian"), defines the list of options available for the user to select (in this case, dietary preferences). cuisine = st.sidebar.selectbox(...): This line creates another dropdown menu in the sidebar, this time for selecting the desired cuisine.The label is "Select for Cuisine".The options provided include "Indian", "Chinese", "French", and several other popular cuisines. In essence, this code allows users to interact with the application by selecting their preferred dietary restrictions (Vegetarian or Non-Vegetarian) and desired cuisine from the dropdown menus in the sidebar. def encode_image(uploaded_file): """Encodes a Streamlit uploaded file into base64 format""" if uploaded_file is not None: content = uploaded_file.read() return base64.b64encode(content).decode("utf-8") else: return None base64_image = encode_image(uploaded_file) The same function of encode_image as discussed in the earlier section is being used here. if st.button("Ask Chef GenAI!"): if base64_image: response = client.chat.completions.create( model="Phi-3-vision-128k-cpu-int4-rtn-block-32-acc-level-4-onnx", messages=[ { "role": "user", "content": [ { "type": "text", "text": f"STRICTLY use the ingredients in the image to generate a {preference} recipe and {cuisine} cuisine.", }, { "type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}, }, ], } ], ) print(response.choices[0].message.content) st.write(response.choices[0].message.content) else: st.write("Please upload an image with any number of ingridients and instantly get a recipe.") Above code block implements the core functionality of the Recipe Generator app, triggered when the user clicks a button labeled "Ask Chef GenAI!": if st.button("Ask Chef GenAI!"): This line checks if the user has clicked the button. If they have, the code within the if block executes. if base64_image: This inner if condition checks if a variable named base64_image has a value. This variable likely stores the base64 encoded representation of the uploaded image (containing ingredients). If base64_image has a value (meaning an image is uploaded), the code proceeds. client.chat.completions.create(...): Client that had been defined earlier interacts with the API . Here, it calls a to generate text completions, thereby invoking a small language model. The arguments provided specify the model to be used ("Phi-3-vision-128k-cpu-int4-rtn-block-32-acc-level-4-onnx") and the message to be completed. The message consists of two parts within a list: User Input: The first part defines the user's role ("user") and the content they provide. This content is an instruction with two key points: Dietary Preference: It specifies to "STRICTLY use the ingredients in the image" to generate a recipe that adheres to the user's preference (vegetarian or non-vegetarian, set using the preference dropdown). Cuisine Preference: It mentions the desired cuisine type (Indian, Chinese, etc., selected using the cuisine dropdown). Image Data: The second part provides the image data itself. It includes the type ("image_url") and the URL, which is constructed using the base64_image variable containing the base64 encoded image data. print(response.choices[0].message.content) & st.write(...): The response will contain a list of possible completions. Here, the code retrieves the first completion (response.choices[0]) and extracts its message content. This content is then printed to the console like before and displayed on the Streamlit app using st.write. else block: If no image is uploaded (i.e., base64_image is empty), the else block executes. It displays a message reminding the user to upload an image to get recipe recommendations. The above code block is the same as before except the we have now modified it to accept few inputs and also have made it compatible with streamlit. The coding is now completed for our streamlit application! It's time to test the application. Navigate to the terminal on Visual Studio Code and enter the following command, (if the file is named as app.py) streamlit run app.py Upon successful run, it will redirect to default browser and a screen with the Recipe generator will be launched, Upload an image with ingredients, select the recipe, cuisine and click on “Ask Chef GenAI”. It will take a few moments for delightful recipe generation. While generating we can see the logs on the terminal and finally the recipe will be shown on the screen! Enjoy your first recipe curated by Chef GenAI powered by Phi-3 vision model on local prem using Visual Studio AI Toolkit! The code is available on the following GitHub Repository. In the upcoming series we will explore more types of Gen AI implementations with AI toolkit. Resources: 1. Visual Studio Code AI Toolkit: Run LLMs locally 2. Visual Studio AI Toolkit : Building Phi-3 GenAI Applications 3. Building Retrieval Augmented Generation on VSCode & AI Toolkit 4. Bring your own models on AI Toolkit - using Ollama and API keys 5. Expanded model catalog for AI Toolkit 6. Azure Toolkit Samples GitHub Repository
shreyanfern
Jan 28, 2025 Place Educator Developer Blog
617Views
1like
1Comment