Educator Developer Blog

Function Calling with Small Language Models

Abdulhamid_Onawole
Nov 28, 2025

In our previous article on running Phi-4 locally, we built a web-enhanced assistant that could search the internet and provide informed answers. Here's what that implementation looked like:

def web_enhanced_query(question):
    # 1. ALWAYS search (hardcoded decision)
    search_results = search_web(question)
    
    # 2. Inject results into prompt
    prompt = f"""Here are recent search results:
    {search_results}
    
    Question: {question}
    
    Using only the information above, give a clear answer."""
    
    # 3. Model just summarizes what it reads
    return ask_phi4(endpoint, model_id, prompt)

Today, we're upgrading to true function calling. With it, we can transform small language models from passive text generators into intelligent agents that can:

  1. Decide when to use external tools
  2. Reason about which tool best fits each task
  3. Execute real-world actions through APIs

Function calling represents a significant evolution in AI capabilities. Let's understand where this positions our small language models: 

Agent Classification Framework

Simple Reflex Agents (Basic)

  • React to immediate input with predefined rules
  • Example: Thermostat, basic chatbot
  • Without function calling, models operate here

Model-Based Agents (Intermediate)

  • Maintain internal state and context
  • Example: Robot vacuum with room mapping
  • Function calling enables this level

Goal-Based Agents (Advanced)

  • Plan multi-step sequences to achieve objectives
  • Example: Route planner, task scheduler
  • Function calling + reasoning enables this

Learning Agents (Expert)

  • Adapt and improve over time
  • Example: Recommendation systems
  • Future: Function calling + fine-tuning

A staged hierarchy illustrating the progression from simple reflex agents to advanced learning agents, emphasizing increasing capability through function calling, reasoning, and fine-tuning.

 

As usual with these articles, let's get our hands dirty!

Project Setup

Let's set up our environment for building function-calling assistants.

Prerequisites

First, ensure you have Foundry Local installed and a model running. We'll use Qwen 2.5-7B for this tutorial as it has excellent function calling support.

Important: Not all small language models support function calling equally. Qwen 2.5 was specifically trained for this capability and provides a reliable experience through Foundry Local.

# 1. Check Foundry Local is installed
foundry --version

# 2. Start the Foundry Local service
foundry service start

# 3. Download and run Qwen 2.5-7B
foundry model run qwen2.5-7b

Python Environment Setup

# 1. Create Python virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# 2. Install dependencies
pip install openai requests python-dotenv

# 3. Get a free OpenWeatherMap API key
# Sign up at: https://openweathermap.org/api

Create a `.env` file:

OPENWEATHER_API_KEY=your_api_key_here

Building a Weather-Aware Assistant

So in this scenario, a user wants to plan outdoor activities but needs weather context. Without function calling, you'll get something like this:

User: "Should I schedule my team lunch outside at 2pm in Birmingham?" 
Model: "That depends on weather conditions. Please check the forecast for rain and temperature."

With function calling, however, the model can look up the weather itself and reply with the needed context. Let's build that now.

Understanding Foundry Local's Function Calling Implementation

Before we start coding, there's an important implementation detail to understand. Foundry Local uses a non-standard function calling format. Instead of returning function calls in the standard OpenAI tool_calls field, Qwen models return the function call as JSON text in the response content.

For example, when you ask about weather, instead of:

# Standard OpenAI format
message.tool_calls = [
    {"name": "get_weather", "arguments": {"location": "Birmingham"}}
]

You get:

# Foundry Local format
message.content = '{"name": "get_weather", "arguments": {"location": "Birmingham"}}'

This means we need to parse the JSON from the content ourselves. Don't worry—this is straightforward, and I'll show you exactly how to handle it!

Step 1: Define the Weather Tool

Create weather_assistant.py:

import os
from openai import OpenAI
import requests
import json
import re
from dotenv import load_dotenv

load_dotenv()

# Initialize Foundry Local client
# Note: Foundry Local assigns the service port dynamically;
# adjust this URL to match the endpoint shown when the service starts
client = OpenAI(
    base_url="http://127.0.0.1:59752/v1/",
    api_key="not-needed"
)

# Define weather tool
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather information for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city or location name"
                    },
                    "units": {
                        "type": "string",
                        "description": "Temperature units",
                        "enum": ["celsius", "fahrenheit"],
                        "default": "celsius"
                    }
                },
                "required": ["location"]
            }
        }
    }
]

A tool definition is necessary because it gives the model a structured specification of which external functions are available and how to call them.

The definition contains the function name, a description, and a JSON Schema for the parameters. The descriptions matter more than they look: the model relies on them to decide when a tool applies, so keep them specific rather than generic.

Step 2: Implement the Weather Function

def get_weather(location: str, units: str = "celsius") -> dict:
    """Fetch weather data from OpenWeatherMap API"""
    api_key = os.getenv("OPENWEATHER_API_KEY")
    
    url = "http://api.openweathermap.org/data/2.5/weather"
    params = {
        "q": location,
        "appid": api_key,
        "units": "metric" if units == "celsius" else "imperial"
    }
    
    response = requests.get(url, params=params, timeout=5)
    response.raise_for_status()
    data = response.json()
    
    temp_unit = "°C" if units == "celsius" else "°F"
    
    return {
        "location": data["name"],
        "temperature": f"{round(data['main']['temp'])}{temp_unit}",
        "feels_like": f"{round(data['main']['feels_like'])}{temp_unit}",
        "conditions": data["weather"][0]["description"],
        "humidity": f"{data['main']['humidity']}%",
        "wind_speed": f"{round(data['wind']['speed'] * 3.6)} km/h"
    }

When the model requests this function, our code executes it: it calls the OpenWeatherMap API, retrieves live weather data, and returns it as a Python dictionary.
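
You can sanity-check the function on its own before wiring it up to the model; for example, temporarily add a line like this at the bottom of the file (assumes your key is in `.env`):

print(get_weather("Birmingham"))
# e.g. {'location': 'Birmingham', 'temperature': '8°C', 'conditions': 'light rain', ...}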

Step 3: Parse Function Calls from Content

This is the crucial step where we handle Foundry Local's non-standard format:

def parse_function_call(content: str):
    """Extract function call JSON from model response"""
    if not content:
        return None
    
    json_pattern = r'\{"name":\s*"get_weather",\s*"arguments":\s*\{[^}]+\}\}'
    match = re.search(json_pattern, content)
    
    if match:
        try:
            return json.loads(match.group())
        except json.JSONDecodeError:
            pass
    
    try:
        parsed = json.loads(content.strip())
        if isinstance(parsed, dict) and "name" in parsed:
            return parsed
    except json.JSONDecodeError:
        pass
    
    return None
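
Before moving on, it's worth checking that the parser handles both shapes we expect: JSON embedded in surrounding text, and a plain-text reply with no function call. An illustrative check:

# JSON embedded in prose (as Qwen sometimes returns it)
sample = 'Let me check that. {"name": "get_weather", "arguments": {"location": "Birmingham"}}'
print(parse_function_call(sample))
# -> {'name': 'get_weather', 'arguments': {'location': 'Birmingham'}}

# Plain text with no function call
print(parse_function_call("The weather looks fine to me."))
# -> None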

Step 4: Main Chat Function with Function Calling

Lastly, we call the model. Notice the tools and tool_choice parameters: tools tells the model which functions it may request via a tool call, while tool_choice="auto" lets the model decide for itself whether to call a tool or answer directly.

def chat(user_message: str) -> str:
    """Process user message with function calling support"""
    
    messages = [
        {"role": "user", "content": user_message}
    ]
    
    response = client.chat.completions.create(
        model="qwen2.5-7b-instruct-generic-cpu:4",  # exact model ID may differ on your machine
        messages=messages,
        tools=tools,
        tool_choice="auto",
        temperature=0.3,
        max_tokens=500
    )
    
    message = response.choices[0].message
    
    if message.content:
        function_call = parse_function_call(message.content)
        
        if function_call and function_call.get("name") == "get_weather":
            print(f"\n[Function Call] {function_call.get('name')}({function_call.get('arguments')})")
            
            args = function_call.get("arguments", {})
            weather_data = get_weather(**args)
            
            print(f"[Result] {weather_data}\n")
            
            final_prompt = f"""User asked: "{user_message}"

Weather data: {json.dumps(weather_data, indent=2)}

Provide a natural response based on this weather information."""
            
            final_response = client.chat.completions.create(
                model="qwen2.5-7b-instruct-generic-cpu:4",
                messages=[{"role": "user", "content": final_prompt}],
                max_tokens=200,
                temperature=0.7
            )
            
            return final_response.choices[0].message.content
    
    return message.content
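
One design choice worth calling out: the follow-up request sends a plain prompt with the weather data and deliberately omits the tools parameter, so the model can't ask for another tool call; it simply writes the final answer from the data we hand it.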

 

Step 5: Run the script

Now put all of the above together and run the script:

def main():
    """Interactive weather assistant"""
    
    print("\nWeather Assistant")
    print("=" * 50)
    print("Ask about weather or general questions.")
    print("Type 'exit' to quit\n")
    
    while True:
        user_input = input("You: ").strip()
        
        if user_input.lower() in ['exit', 'quit']:
            print("\nGoodbye!")
            break
        
        if user_input:
            response = chat(user_input)
            print(f"Assistant: {response}\n")


if __name__ == "__main__":
    if not os.getenv("OPENWEATHER_API_KEY"):
        print("Error: OPENWEATHER_API_KEY not set")
        print("Set it with: export OPENWEATHER_API_KEY='your_key_here'")
        exit(1)
    
    main()
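
Then start the assistant from your terminal:

python weather_assistant.py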

 

Note: Make sure Qwen 2.5 is running in Foundry Local (in a separate terminal) before you start the script.

Now let's talk about Model Context Protocol!

Our weather assistant works beautifully with a single function, but what happens when you need dozens of tools? Database queries, file operations, calendar integration, email—each would require similar setup code.

This is where Model Context Protocol (MCP) comes in. MCP is an open standard that provides pre-built, standardized servers for common tools. Instead of writing custom integration code for every capability, you can connect to MCP servers that handle the complexity for you.

With MCP, enabling weather, database, and file access takes just one command each:

npx @modelcontextprotocol/server-weather
npx @modelcontextprotocol/server-sqlite
npx @modelcontextprotocol/server-filesystem

Your model automatically discovers and uses these tools without custom integration code.
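
To give a feel for what that discovery looks like from Python, here's a minimal sketch using the official MCP Python SDK (pip install mcp). This is illustrative rather than part of our weather assistant: the server package and the /tmp path are stand-ins for whichever server you run.

import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def list_mcp_tools():
    # Launch an MCP server as a subprocess and talk to it over stdio
    params = StdioServerParameters(
        command="npx",
        args=["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # The server advertises its own tools; no custom integration code needed
            tools = await session.list_tools()
            for tool in tools.tools:
                print(f"{tool.name}: {tool.description}")

asyncio.run(list_mcp_tools())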


Key Takeaways

  1. Function calling transforms models into agents - From passive text generators to active problem-solvers
  2. Qwen 2.5 has excellent function calling support - Specifically trained for reliable tool use
  3. Foundry Local uses non-standard format - Parse JSON from content instead of tool_calls field
  4. Start simple, then scale with MCP - Build one tool to understand the pattern, then leverage standards

Thank you for reading! I hope this article helps you build more capable AI agents with small language models. Function calling opens up incredible possibilities—from simple weather assistants to complex multi-tool workflows. Start with one tool, understand the pattern, and scale from there.

 
