Forum Discussion

parulpaul01
Copper Contributor
Feb 13, 2026

Title: Synthetic Dataset Format from AI Foundry Not Compatible with Evaluation Schema

Current Situation

The synthetic dataset created from AI Foundry Data Synthetic Data is generated in the following messages format:

{
  "messages": [
    { "role": "system", "content": "You are a helpful assistant" },
    { "role": "user", "content": "What is the primary purpose?" },
    { "role": "assistant", "content": "The primary purpose is..." }
  ]
}

 

Challenge

When attempting evaluation, especially RAG evaluation, the documentation indicates that the dataset must contain structured fields such as:

question - The query being asked

ground_truth - The expected answer

 

Recommended additional fields

reference_context

metadata

Example required format

{
  "question": "",
  "ground_truth": "",
  "reference_context": "",
  "metadata": { "document": "" }
}

 

Because the synthetic dataset is in messages format, I am unable to directly map it to the required evaluation schema.

Question

Is there a recommended or supported way to convert the synthetic dataset generated in AI Foundry messages format into the structured format required for evaluation?

Can the user role be mapped to question?

Can the assistant role be mapped to ground_truth?

Is there any built-in transformation option within AI Foundry?

 

2 Replies

  • Imran Shakeel
    Copper Contributor

    Answers to your questions:

    • Can the user role be mapped to question?
      • Yes. Microsoft's evaluation examples all use this convention: query/question --> user prompt, response --> model output.
    • Can the assistant role be mapped to ground_truth?
      • Yes. If the synthetic assistant response represents the ideal (expected) answer, it is valid ground truth.
      • For reference, see: https://azure.github.io/slm-innovator-lab/1_synthetic_data/
    • Is there any built-in transformation option within AI Foundry?
      • No, there is no built-in transformation.

    Your proposed mapping looks correct, supported, and aligned with Microsoft's own example, but the bridge between the two formats is intentionally left to developers. It is simply not automated.

    Here are my suggestions:

    Convert each synthetic conversation into a single evaluation record using the following mapping:

    Synthetic dataset (messages)        Evaluation dataset
    messages[].role == "user"           question
    messages[].role == "assistant"      ground_truth
    system prompt / source info         metadata
    retrieved chunks (if any)           reference_context

    For example, given this input record:

     

    {
      "messages": [
        { "role": "system", "content": "You are a helpful assistant" },
        { "role": "user", "content": "What is the primary purpose?" },
        { "role": "assistant", "content": "The primary purpose is to explain the concept clearly." }
      ]
    }

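    the output JSONL record would be (pretty-printed here for readability; in an actual JSONL file each record sits on a single line):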

     

    {
      "question": "What is the primary purpose?",
      "ground_truth": "The primary purpose is to explain the concept clearly.",
      "reference_context": "",
      "metadata": {
        "source": "ai-foundry-synthetic",
        "system_prompt": "You are a helpful assistant"
      }
    }
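
A minimal Python sketch of this mapping, assuming each conversation holds a single user/assistant exchange. The function name `messages_to_eval_record` and the `source` default are only illustrative; they are not part of any AI Foundry API.

```python
import json


def messages_to_eval_record(conversation, source="ai-foundry-synthetic"):
    """Convert one single-turn messages conversation into one evaluation record.

    Hypothetical helper for illustration; not an AI Foundry function.
    """
    messages = conversation["messages"]

    def first(role):
        # Content of the first message with the given role, or None if absent.
        return next((m["content"] for m in messages if m["role"] == role), None)

    question = first("user")
    answer = first("assistant")
    if question is None or answer is None:
        raise ValueError("conversation needs a user and an assistant message")

    metadata = {"source": source}
    system_prompt = first("system")
    if system_prompt is not None:
        metadata["system_prompt"] = system_prompt

    return {
        "question": question,
        "ground_truth": answer,
        "reference_context": "",  # fill in retrieved chunks if you have them
        "metadata": metadata,
    }


conversation = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "What is the primary purpose?"},
        {"role": "assistant", "content": "The primary purpose is to explain the concept clearly."},
    ]
}

record = messages_to_eval_record(conversation)
print(json.dumps(record))  # one JSONL line
```

Raising on a missing user or assistant message is a design choice in this sketch: it surfaces malformed synthetic records early instead of writing empty questions into the evaluation set.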

    Please try uploading the converted records as JSONL or CSV and check your results.
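
If it helps, here is a rough sketch of serializing already-converted records to JSONL and CSV text with the Python standard library. Flattening `metadata` into `metadata.<key>` columns for CSV is my own assumption; I have not confirmed how Foundry expects nested fields in a CSV upload, so adjust the column naming to whatever your evaluation accepts.

```python
import csv
import io
import json


def to_jsonl(records):
    """Serialize records as JSONL: one JSON object per line."""
    return "".join(json.dumps(r) + "\n" for r in records)


def to_csv(records):
    """Serialize records as CSV, flattening metadata into metadata.<key> columns.

    The "metadata.<key>" naming is an assumption made for this sketch.
    """
    rows = []
    for r in records:
        row = {k: v for k, v in r.items() if k != "metadata"}
        for key, value in r.get("metadata", {}).items():
            row[f"metadata.{key}"] = value
        rows.append(row)

    # Collect column names in first-seen order across all rows.
    fieldnames = []
    for row in rows:
        for k in row:
            if k not in fieldnames:
                fieldnames.append(k)

    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


records = [
    {
        "question": "What is the primary purpose?",
        "ground_truth": "The primary purpose is to explain the concept clearly.",
        "reference_context": "",
        "metadata": {"source": "ai-foundry-synthetic"},
    }
]

jsonl_text = to_jsonl(records)
csv_text = to_csv(records)
```

Write `jsonl_text` or `csv_text` to a file with `encoding="utf-8"` before uploading.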

     

    Please note:

    • Foundry does not automatically convert messages ---> question/ground_truth.

    Hopefully this resolves your issue, if I have understood it correctly. Thanks.

     

  • Based on my understanding, AI Foundry currently doesn’t provide a built-in way to convert messages format into the structured evaluation schema.

    A practical approach would be to preprocess the dataset by mapping:

    • user → question
    • assistant → ground_truth
    • system → optional (metadata or ignore)

    For multi-turn conversations, this would need to be split into multiple Q&A pairs. For RAG evaluations, reference_context may need to be added separately.
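
The multi-turn split could be sketched roughly like this in Python, pairing each user message with the next assistant message. The function name `split_multi_turn` and the `turn` metadata key are hypothetical, and `reference_context` is left empty to be filled in separately.

```python
def split_multi_turn(conversation):
    """Return one evaluation record per user -> assistant exchange.

    Hypothetical helper for illustration; not an AI Foundry function.
    """
    system_prompt = None
    pending_question = None
    records = []

    for message in conversation["messages"]:
        role, content = message["role"], message["content"]
        if role == "system":
            system_prompt = content
        elif role == "user":
            # Remember the latest user message until an assistant reply arrives.
            pending_question = content
        elif role == "assistant" and pending_question is not None:
            metadata = {"turn": len(records) + 1}  # "turn" is an illustrative key
            if system_prompt is not None:
                metadata["system_prompt"] = system_prompt
            records.append({
                "question": pending_question,
                "ground_truth": content,
                "reference_context": "",  # add RAG context separately
                "metadata": metadata,
            })
            pending_question = None

    return records


conversation = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "What is the primary purpose?"},
        {"role": "assistant", "content": "The primary purpose is..."},
        {"role": "user", "content": "Who is it for?"},
        {"role": "assistant", "content": "It is for..."},
    ]
}

pairs = split_multi_turn(conversation)
```

One caveat with this approach: later questions in a conversation may rely on earlier turns, so an isolated pair can be ambiguous on its own and may need the earlier context folded into the question.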

    Alternatively, generating synthetic data directly in the required structured format can help avoid this extra step.