Forum Discussion
Title: Synthetic Dataset Format from AI Foundry Not Compatible with Evaluation Schema
Answers to your questions:
- Can the user role be mapped to a question?
- Yes. The Microsoft evaluation examples use:
- query/question --> user prompt
- response --> model output
- Can the assistant role be mapped to ground_truth?
- Yes, if the synthetic assistant response represents the ideal/expected answer, it is valid ground truth.
- Please use this link for reference:
https://azure.github.io/slm-innovator-lab/1_synthetic_data/
- Is there any built in transformation option within AI Foundry?
- No, there is no built-in transformation.
Your proposed mapping seems correct, supported, and aligned with Microsoft's own examples, but the bridge between the two formats is intentionally left to developers. It is just not automated.
Here are my suggestions:
Convert each synthetic conversation into a single evaluation record using the following mapping:
Synthetic dataset (messages)        Evaluation dataset
messages[].role == "user"           question
messages[].role == "assistant"      ground_truth
System prompt / source info         metadata
Retrieved chunks (if any)           reference_context
For example, an input record from the synthetic dataset:
{
"messages": [
{ "role": "system", "content": "You are a helpful assistant" },
{ "role": "user", "content": "What is the primary purpose?" },
{ "role": "assistant", "content": "The primary purpose is to explain the concept clearly." }
]
}
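Output JSONL record (pretty-printed here for readability; in the actual file, each record must sit on a single line):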
{
"question": "What is the primary purpose?",
"ground_truth": "The primary purpose is to explain the concept clearly.",
"reference_context": "",
"metadata": {
"source": "ai-foundry-synthetic",
"system_prompt": "You are a helpful assistant"
}
}
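The mapping above can be sketched in Python roughly as follows. This is only a minimal sketch, not an official Foundry utility: the function names to_eval_record and convert_lines are hypothetical, and it assumes each conversation has a single user/assistant exchange (for multi-turn conversations it keeps only the first user and first assistant message, which may not be what you want).

```python
import json


def to_eval_record(conversation, source="ai-foundry-synthetic"):
    """Map one {"messages": [...]} conversation to a flat evaluation record.

    Assumption: only the first message of each role is used.
    """
    messages = conversation.get("messages", [])

    def first(role):
        # Content of the first message with the given role, or "" if absent.
        return next(
            (m.get("content", "") for m in messages if m.get("role") == role),
            "",
        )

    return {
        "question": first("user"),
        "ground_truth": first("assistant"),
        "reference_context": "",  # fill in if your data carries source chunks
        "metadata": {"source": source, "system_prompt": first("system")},
    }


def convert_lines(lines):
    """Yield one single-line JSON string per non-empty input JSONL line."""
    for line in lines:
        if line.strip():
            record = to_eval_record(json.loads(line))
            yield json.dumps(record, ensure_ascii=False)
```

To convert a whole file, you could open the synthetic JSONL file, pass the file object to convert_lines, and write each yielded string followed by a newline to the output file.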
Please try uploading the resulting JSONL or CSV file and check your results.
Please note:
- Foundry does not automatically convert messages --> question/ground_truth.
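Since the conversion is manual, it may help to sanity-check the converted file before uploading it. The sketch below is an assumption-laden example rather than anything Foundry provides: find_problems is a hypothetical helper, and the required field names (question, ground_truth) are taken from the mapping in this post, so adjust them if your evaluator expects different column names.

```python
import json

# Assumed required fields, based on the mapping described in this post.
REQUIRED_FIELDS = ("question", "ground_truth")


def find_problems(lines):
    """Return a list of (line_number, message) for lines that look invalid."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "record is not a JSON object"))
            continue
        for field in REQUIRED_FIELDS:
            value = record.get(field)
            # Each required field must be a non-empty string.
            if not isinstance(value, str) or not value.strip():
                problems.append((number, f"missing or empty field: {field}"))
    return problems
```

An empty list means every non-blank line parsed as a JSON object with non-empty question and ground_truth strings. It does not guarantee that the evaluator will accept the file.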
Hopefully this resolves your issue, if I understood it correctly. Thanks.