This post explores the use of the AI Toolkit in conjunction with the Prompt Orchestration Markup Language (POML) for on-premises model usage. It demonstrates how this structured approach addresses the challenges of complex, multi-step tasks, moving beyond unstructured prompts to a more programmatic and reliable way of interacting with language models. The focus is on achieving predictable, accurate results in a controlled environment.
Prompts and system messages for large language models (LLMs) often become unstructured and difficult to manage, especially when they grow lengthy and contain few-shot examples. This frequently leads to incorrect responses, hallucinations, and similar problems. For those who have experienced this challenge, Prompt Orchestration Markup Language (POML) offers a structured solution.
POML can be thought of as the HTML of AI prompts: it uses familiar tag-based components to bring organization and reusability to a set of instructions. Its design addresses common frustrations in prompt engineering, allowing for more efficient and scalable interaction with LLMs.
Why use POML?
The adoption of POML provides several key benefits:
- Structure and Reusability: POML eliminates unstructured, hard-to-edit text prompts. Prompts can be constructed in a way that makes them easier to maintain and reuse, integrating them as a central part of a technical workflow.
- Diverse Data Handling: POML facilitates the seamless integration of various data types directly into a prompt. It is capable of pulling in text from Word documents, tables from CSVs, images, audio files, and even entire file folders, providing the LLM with comprehensive context.
- Enhanced Workflow: A dedicated VS Code extension offers a powerful environment for writing, testing, and managing prompts in a manner similar to code, complete with features such as live preview.
The Core Components of POML
The architecture of POML is built upon the following main types of components:
- Basic Components: These tags provide logical structure and formatting. Examples include <p> for paragraphs and <ul> for lists.
- Intention Components: These are used to clearly define the objectives for the LLM. Tags like <task>, <role>, and <example> help in specifying the AI's intended actions and behavior.
- Data Components: These are crucial for incorporating external information. Tags such as <document>, <table>, and <img> allow for the inclusion of various data sources.
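The three component types compose naturally in a single file. Here is a minimal sketch (the `changelog.txt` file name is hypothetical, used only for illustration):

```xml
<poml>
  <!-- Intention components: define the objective and behavior -->
  <role>You are a concise release-notes writer.</role>
  <task>Summarize the changes listed in the attached document.</task>
  <!-- Basic components: logical structure and formatting -->
  <p>Focus only on user-facing changes.</p>
  <!-- Data components: pull in external information -->
  <document src="changelog.txt" />
</poml>
```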
POML in Action:
Let’s now see POML in action. To begin, we need the POML extension installed in Visual Studio Code: navigate to Extensions, search for “POML”, and click Install.
Once the POML extension is installed, it’s time to configure the LLMs. Follow these steps to configure the LLM for testing POML files:
- Press Ctrl + , (comma) or go to File > Preferences > Settings.
- Navigate to POML Extension Settings
- In the search bar within the Settings tab, type "POML" to filter the settings.
- Locate the settings related to the POML extension.
- Configure Language Model Settings like API Key, API URL, Max Tokens, Provider.
- In this example, we are using GitHub Models to configure the LLMs for the POML toolkit. Obtain a personal access token from: Github PAT. This PAT will be used as the API Key.
Configure the provider with the following parameters:
- Provider: OpenAI (Select from dropdown)
- API_URL: https://models.github.ai/inference
- Max Tokens: 1500
- Model: gpt-4o
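The same settings can be sanity-checked outside VS Code. The sketch below (an assumption, not part of the extension) builds the OpenAI-style request body that would be sent to the GitHub Models endpoint; the `build_payload` helper name and the commented `requests.post` call are illustrative only:

```python
import json

# Base URL from the extension settings above; "/chat/completions" is the
# OpenAI-compatible path (an assumption for a direct test outside VS Code).
API_URL = "https://models.github.ai/inference/chat/completions"

def build_payload(prompt: str, model: str = "gpt-4o", max_tokens: int = 1500) -> str:
    """Serialize an OpenAI-style chat request body matching the settings above."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    })

# With a GitHub PAT in hand, the request could then be sent with, e.g.:
# requests.post(API_URL,
#               headers={"Authorization": f"Bearer {PAT}",
#                        "Content-Type": "application/json"},
#               data=build_payload("Hello"))
```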
To start, we'll create a new .poml file. This file type is specifically designed for defining prompts in POML. In our example, we'll use the POML toolkit to pass an image of the photosynthesis process to an LLM and ask it to explain the concept in a way a 10-year-old can understand.
Open a new directory and make sure the .poml and the image file are in the same directory. Create a new file and name it “example.poml” and paste the following content.
<poml>
  <role>You are a patient teacher explaining concepts to a 10-year-old.</role>
  <task>Explain the concept of photosynthesis using the provided image as a reference.</task>
  <img src="https://github.com/microsoft/poml/raw/HEAD/photosynthesis_diagram.png" alt="Diagram of photosynthesis" />
  <output-format>
    Keep the explanation simple, engaging, and under 100 words.
    Start with "Hey there, future scientist!".
  </output-format>
</poml>
Save the file and then click “Open POML Preview to the Side” in the top-right corner.
A dedicated window shows a preview of the POML. Display settings are available to view it in a rendered format, which illustrates how the prompt will appear when passed to the LLM.
It’s finally time to execute this prompt, and this is perhaps the most interesting part! Click the “Run” option in the example.poml window.
Note: If the Run option is not visible, close the preview tab and it will appear.
In the output window, we can now see the output from the LLM.
Great! POML has been successfully used to pass prompts and generate completions from the LLM.
It is also worth mentioning that POML can be integrated directly into Python code. We will demonstrate this using an on-premises model hosted via the AI Toolkit. If you are unfamiliar with the AI Toolkit, you are encouraged to go through the provided link.
Launch the AI Toolkit extension and choose an offline model, then copy the model’s name by right-clicking it in the list and choosing “Copy model name”. This step is crucial: a wrong model name leads to an invalid request to the language model.
Note: There are a variety of models to choose from in the AI Toolkit; models from Ollama, Azure AI Foundry, and Hugging Face can also be used through the toolkit.
Local models hosted via the AI Toolkit are by default available at http://127.0.0.1:5272/v1/chat/completions. This is the address we will use when sending a request with the requests library in Python.
Note: If you are using it directly from Python in an application, then use the URL http://127.0.0.1:5272/v1/
We will create a POML file to build a Data chatbot that queries a CSV. As this presents a lengthier prompting scenario, using POML is an effective way to provide the language model with clear, step-by-step instructions for successful task completion.
Create a file orders.poml and paste the following content,
<poml>
  <role> You are a helpful chatbot agent answering customer's question in a chat. </role>
  <task> Your task is to answer the customer's question using the data provided in the data section.
    <!-- Use listStyle property to change the style of a list. -->
    <list listStyle="decimal">
      <item> You can access order history in the orders section, including email id and order total with payment summary. </item>
      <item> Refer to orderlines for item-level details within each order in orders. </item>
    </list>
  </task>
  <!-- cp means CaptionedParagraph, which is a paragraph with a customised heading. -->
  <cp caption="Data">
    <cp caption="Orders">
      <!-- Use table to read a csv file. By default, it follows its parents' style (markdown in this case). -->
      <table src="orders.csv" key="order_id" />
    </cp>
    <cp caption="OrderLines">
      <!-- Use syntax to specify its output format. -->
      <table src="orderlines.csv" syntax="tsv" />
    </cp>
  </cp>
  <!-- This can also be stepwise-instructions, and it's case-insensitive. -->
  <StepwiseInstructions>
    <!-- Read a file and save it as instructions -->
    <let src="order_instructions.json" name="instructions" />
    <!-- Use a for loop to iterate over the instructions; use {{ }} to evaluate an expression -->
    <p for="inst in instructions">
      Instruction {{ loop.index + 1 }}: {{ inst }}
    </p>
  </StepwiseInstructions>
  <!-- Specify the speaker of a block -->
  <HumanMessage>
    <qa> How much did I pay for my last order? </qa>
  </HumanMessage>
  <!-- Use stylesheet (a CSS-like JSON) to modify the style in a batch. -->
  <stylesheet>
    {
      "cp": {
        "captionTextTransform": "upper"
      }
    }
  </stylesheet>
</poml>
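The `<let>` tag above loads order_instructions.json into the `instructions` variable, which the `for` loop then iterates over. The actual file is in the linked repository; a hypothetical sketch of its shape, assuming a plain JSON array of instruction strings, might look like:

```json
[
  "Identify the customer's most recent order in the Orders table.",
  "Sum the order total and payment details from the matching row.",
  "Answer using only the exact amounts found in the data."
]
```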
We can also preview it while we are creating the file using the POML preview feature of the POML Extension.
We have also created the CSV files that carry the order data and a JSON file used to specify instructions to the language model. Both of these files can be found here.
The next step is to create the Python file that will use the POML file to interact with the language model. Create a new directory and set up a Python virtual environment: press “Ctrl+Shift+P” and, in the search bar, type “Python: Create Environment”. Make sure Python is installed on your computer before proceeding with this step. Choose “.venv” from the dropdown and select the relevant interpreter path.
Let’s now install the libraries using pip. In the terminal, type the following command:
pip install poml requests
Create a new file app.py and paste the following code in the file,
from poml import poml
import requests
import json

# Load and render the POML file into chat messages
messages = poml("orders.poml", chat=True)

# Combine messages into a single prompt
full_prompt = "\n".join(
    ["\n".join(str(c).strip() for c in m["content"]) if isinstance(m.get("content"), list)
     else str(m["content"]).strip()
     for m in messages if m.get("content")]
)

print("\n---Full Prompt---\n")
print(full_prompt)

# Send the request to the AI Toolkit model
url = "http://127.0.0.1:5272/v1/chat/completions"
payload = json.dumps({
    "model": "Phi-4-cpu-int4-rtn-block-32-acc-level-4",
    "messages": [
        {
            "role": "user",
            "content": full_prompt
        }
    ],
    "temperature": 0.7,
    "top_p": 1,
    "top_k": 10,
    "max_tokens": 100,
    "stream": False
})
headers = {
    "Content-Type": "application/json"
}
response = requests.post(url, headers=headers, data=payload)

# View the completion
print("\n---AI Toolkit Response---\n")
print(response.text)
print(response.json()["choices"][0]["message"]["content"])
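The last line of the script indexes straight into `choices[0]`, which fails with an unhelpful `KeyError` if the server returns an error body. A small defensive helper, not part of the original script and purely a suggested addition, could guard that step:

```python
# Hedged sketch: extract the first completion from an OpenAI-style response
# body, raising a readable error instead of a bare KeyError/IndexError.
def extract_completion(response_json: dict) -> str:
    """Return the first completion's text, or raise with the server's error."""
    if "error" in response_json:
        raise RuntimeError(f"Model server error: {response_json['error']}")
    choices = response_json.get("choices", [])
    if not choices:
        raise RuntimeError("No completions returned by the model server")
    return choices[0]["message"]["content"]
```

In the script above, the final print would then become `print(extract_completion(response.json()))`.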
With the configurations complete, the setup can now be tested. Clicking "RUN" will display the POML-based prompt in the terminal before it is sent to the Phi-4 model.
The clarity of the prompt being sent to the language model is notable. This structured approach is why POML is effective at ensuring a better response, particularly when there is a need to impose order on a complex or lengthy prompting scenario like the one used for this Data chatbot.
POML: Completions
This is the model's response after being prompted with the POML file.
The output demonstrates how the structured prompt successfully instructed the model to extract and organize data from a CSV file. This shows POML's effectiveness in guiding the language model to perform specific, data-driven tasks accurately. Unlike XML or JSON, which require dedicated parsing code, POML handles the parsing automatically. This allows one to focus on the prompt's content rather than the underlying data structure.
This has demonstrated the shift from simple conversational prompts to a more programmatic approach to interacting with language models. POML provides the essential structure required to manage complex scenarios, ensuring that models like Phi-4, hosted on-premises via the AI Toolkit as shown here, can handle tasks like data extraction with remarkable clarity and precision. This method offers a robust path forward for developing more reliable and sophisticated LLM applications.
The full code for this project is available in the AI_Toolkit_Samples GitHub repository.
In our upcoming series of posts, these concepts will be expanded upon as we delve into creating agentic workflows. By leveraging POML, these agents can be given clear, hierarchical instructions, enabling them to make decisions, execute tasks, and interact with various tools in a reliable manner.