Title: Synthetic Dataset Format from AI Foundry Not Compatible with Evaluation Schema

Question

Current Situation

The synthetic dataset created from AI Foundry Data Synthetic Data is generated in the following messages format:

{"messages": [
  { "role": "system", "content": "You are a helpful assistant" },
  { "role": "user", "content": "What is the primary purpose?" },
  { "role": "assistant", "content": "The primary purpose is..." }
]}

Challenge

When attempting evaluation, especially RAG evaluation, the documentation indicates that the dataset must contain structured fields such as:

question - The query being asked
ground_truth - The expected answer

Recommended additional fields:

reference_context
metadata

Example required format:

{"question": "", "ground_truth": "", "reference_context": "", "metadata": { "document": "" }}

Because the synthetic dataset is in messages format, I am unable to directly map it to the required evaluation schema.

Question

Is there a recommended or supported way to convert the synthetic dataset generated in AI Foundry messages format into the structured format required for evaluation?

Can the user role be mapped to question?
Can the assistant role be mapped to ground_truth?
Is there any built-in transformation option within AI Foundry?

anjalisadhukhan · Answer

Based on my understanding, AI Foundry currently doesn’t provide a built-in way to convert messages format into the structured evaluation schema.
A practical approach would be to preprocess the dataset by mapping:

user → question
assistant → ground_truth
system → optional (metadata or ignore)

For multi-turn conversations, this would need to be split into multiple Q&A pairs. For RAG evaluations, reference_context may need to be added separately.
Alternatively, generating synthetic data directly in the required structured format can help avoid this extra step.
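
The role mapping and multi-turn split described above could be sketched roughly as follows. This is only an illustration, not an AI Foundry feature: the function name split_into_qa_pairs is hypothetical, the second exchange in the sample is made up to show the split, and the sketch assumes each assistant message answers the most recent user message. System messages are simply ignored here.

```python
from typing import Iterator


def split_into_qa_pairs(conversation: dict) -> Iterator[dict]:
    """Yield one {question, ground_truth} record per user/assistant exchange.

    Hypothetical helper: assumes an assistant message always answers the
    most recent unanswered user message.
    """
    pending_question = None
    for message in conversation.get("messages", []):
        role = message.get("role")
        content = message.get("content", "")
        if role == "user":
            # Remember the question until its answer arrives.
            pending_question = content
        elif role == "assistant" and pending_question is not None:
            yield {"question": pending_question, "ground_truth": content}
            pending_question = None
        # Any other role (for example "system") is skipped in this sketch.


# Sample conversation; the second exchange is illustrative only.
conversation = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "What is the primary purpose?"},
        {"role": "assistant", "content": "The primary purpose is..."},
        {"role": "user", "content": "Who is it for?"},
        {"role": "assistant", "content": "It is for..."},
    ]
}

pairs = list(split_into_qa_pairs(conversation))
```

Note that splitting this way drops the earlier turns from later questions, so follow-up questions that rely on prior context may need to be rewritten to stand alone before evaluation.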

imran shakeel · Answer

Answers to your questions:

Can the user role be mapped to question?
All Microsoft evaluation examples use:
query/question --> user prompt
response --> model output

Can the assistant role be mapped to ground_truth?
Yes, if a synthetic assistant response represents the ideal / expected answer, then it is valid ground truth. Please use this link for reference: https://azure.github.io/slm-innovator-lab/1_synthetic_data/

Is there any built-in transformation option within AI Foundry?
No, there is no built-in transformation.

Your proposed mapping seems correct, supported, and aligned with Microsoft's own examples, but the bridge between the two formats is intentionally left to developers. It is simply not automated.

Suggestion: convert each synthetic conversation into a single evaluation record using the following mapping:

Synthetic dataset (messages) --> Evaluation dataset
messages[].role == "user" --> question
messages[].role == "assistant" --> ground_truth
system prompt / source info --> metadata
Chunks (if any) --> reference_context

Example input:

{
  "messages": [
    { "role": "system", "content": "You are a helpful assistant" },
    { "role": "user", "content": "What is the primary purpose?" },
    { "role": "assistant", "content": "The primary purpose is to explain the concept clearly." }
  ]
}

Output JSONL:

{
  "question": "What is the primary purpose?",
  "ground_truth": "The primary purpose is to explain the concept clearly.",
  "reference_context": "",
  "metadata": {
    "source": "ai-foundry-synthetic",
    "system_prompt": "You are a helpful assistant"
  }
}

Please try uploading this as JSONL or CSV and check your results accordingly.

Please note: Foundry does not automatically convert messages --> question/ground_truth.

Hopefully this resolves your issue, if I understood it correctly. Thanks
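
A minimal preprocessing sketch for the single-record mapping above, assuming the synthetic file is JSONL with one conversation per line. The function names to_eval_record and convert_jsonl, and any file names passed to them, are hypothetical; the output field names follow the example in this thread. reference_context is left empty because the messages format carries no retrieved chunks.

```python
import json


def to_eval_record(conversation: dict, source: str = "ai-foundry-synthetic") -> dict:
    """Map one messages-format conversation to one evaluation record.

    Hypothetical helper: takes the first message of each role and
    assumes a single-turn conversation.
    """
    messages = conversation.get("messages", [])

    def first(role: str) -> str:
        # Content of the first message with this role, or "" if absent.
        return next((m.get("content", "") for m in messages if m.get("role") == role), "")

    return {
        "question": first("user"),
        "ground_truth": first("assistant"),
        "reference_context": "",  # to be filled from source chunks, if available
        "metadata": {"source": source, "system_prompt": first("system")},
    }


def convert_jsonl(in_path: str, out_path: str) -> int:
    """Convert a messages-format JSONL file; returns the number of records written."""
    count = 0
    with open(in_path, encoding="utf-8") as src, open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            record = to_eval_record(json.loads(line))
            dst.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


# The example conversation from this answer.
sample = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "What is the primary purpose?"},
        {"role": "assistant", "content": "The primary purpose is to explain the concept clearly."},
    ]
}
record = to_eval_record(sample)
```

A call such as convert_jsonl("synthetic.jsonl", "eval.jsonl") (placeholder file names) would then produce a file with one evaluation record per line, ready to upload.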
