Bringing automated quality scoring and governance to your AI skills catalog
As AI skills become central to enterprise automation and intelligent workflows, ensuring their quality, safety, and discoverability at scale is a growing challenge. Today, we're excited to announce skills assessment in Azure API Center — a built-in, automated quality scoring system powered by the LLM-as-a-judge technique.
This capability enables organizations to continuously evaluate AI skills against defined quality standards, giving platform administrators governance controls and giving developers the confidence to adopt skills that are ready for production.
What is LLM-as-a-Judge?
LLM-as-a-judge is a technique that uses a large language model to evaluate AI outputs against defined quality criteria — scoring responses across dimensions like accuracy, coherence, and helpfulness. The judge model can be prompted with rubrics, reference answers, or pairwise comparisons, enabling scalable feedback at a fraction of the cost of human annotation.
By embedding this technique directly into Azure API Center, teams can now benefit from continuous, automated quality evaluation — without manual review overhead.
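To make the pattern concrete, here is a minimal sketch of an LLM-as-a-judge loop. It is not API Center's implementation — the rubric text, the `judge_skill` function, and the `call_llm` callable are all illustrative assumptions; in practice `call_llm` would wrap a real chat-completions endpoint.

```python
import json

# Illustrative rubric; a real deployment would tune this per dimension.
RUBRIC = """Rate the following skill documentation on a 1-5 scale for each
dimension: accuracy, coherence, helpfulness. Reply with JSON only,
e.g. {"accuracy": 4, "coherence": 5, "helpfulness": 3}."""

def judge_skill(doc: str, call_llm) -> dict:
    """Score a skill document with an LLM judge.

    `call_llm` is any callable taking a prompt string and returning the
    model's text reply; here it is a stand-in for a chat-completions API.
    """
    prompt = f"{RUBRIC}\n\n---\n{doc}"
    reply = call_llm(prompt)
    scores = json.loads(reply)
    # Guard against missing or out-of-range dimensions.
    for dim in ("accuracy", "coherence", "helpfulness"):
        if not 1 <= scores.get(dim, 0) <= 5:
            raise ValueError(f"invalid score for {dim}: {scores.get(dim)}")
    return scores

# Stubbed model reply so the sketch runs deterministically.
fake_llm = lambda prompt: '{"accuracy": 4, "coherence": 5, "helpfulness": 3}'
print(judge_skill("## My skill\nDoes useful things.", fake_llm))
```

The essential idea is the same at any scale: prompt a judge model with a rubric plus the artifact under review, then parse and validate its structured scores.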
Default Assessment Criteria: Four Dimensions of Quality
API Center comes with default skill assessment criteria out of the box, evaluating skills across four key dimensions — each scored on a 1–5 scale with a default threshold of 3:
- Documentation Clarity — Evaluates how clearly a skill's purpose and behavior are communicated.
- Help Completeness — Assesses whether the output serves as a comprehensive standalone reference.
- Discoverability — Measures how easily functionality can be navigated and found.
- Safe Usage — Evaluates whether sufficient guidance is provided for safe operation.
Enterprise platform administrators can further extend these defaults by defining custom assessment criteria tailored to their organization's specific standards, compliance requirements, and governance policies.
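The pass/fail logic described above — each dimension scored 1–5 against a default threshold of 3 — can be sketched in a few lines. This is an illustration of the scoring model, not API Center's internal code; the `assess` function and dimension names are taken from the defaults described above.

```python
def assess(scores: dict[str, int], threshold: int = 3) -> tuple[str, list[str]]:
    """Return overall Pass/Fail plus any dimensions scoring below threshold."""
    failing = [dim for dim, s in scores.items() if s < threshold]
    return ("Pass" if not failing else "Fail", failing)

scores = {
    "Documentation Clarity": 4,
    "Help Completeness": 2,   # below the default threshold of 3
    "Discoverability": 5,
    "Safe Usage": 3,
}
status, gaps = assess(scores)
print(status, gaps)  # Fail ['Help Completeness']
```

Raising the threshold per organization is then just a parameter change — the kind of customization administrators apply when defining their own criteria.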
Figure 1: Platform administrators configure skill assessment criteria and thresholds in the Azure API Center portal
Detailed Quality Reports for Developers
Once skills assessment is enabled, developers can view a detailed AI Quality Score report for each skill directly within the API Center portal. This report provides an at-a-glance Pass or Fail status along with per-dimension scores and actionable feedback.
Alongside the LLM-based scores, the report includes:
- Structural Checks — Verifying foundational elements like valid frontmatter, skill name, and body content.
- Schema Validation — Flagging missing sections such as examples or error handling.
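As a rough illustration of what structural checks like these look like, the sketch below validates a markdown skill file for frontmatter, a name field, and body content. The file layout and field names here are assumptions for the example, not API Center's actual skill schema.

```python
import re

def structural_checks(skill_md: str) -> dict[str, bool]:
    """Illustrative structural checks on a markdown skill file."""
    # Expect YAML-style frontmatter delimited by --- lines, then a body.
    fm = re.match(r"^---\n(.*?)\n---\n(.*)$", skill_md, re.DOTALL)
    frontmatter = fm.group(1) if fm else ""
    body = fm.group(2) if fm else ""
    return {
        "valid_frontmatter": fm is not None,
        "has_name": bool(re.search(r"^name:\s*\S+", frontmatter, re.MULTILINE)),
        "has_body": bool(body.strip()),
    }

skill = """---
name: summarize-report
description: Summarizes quarterly reports.
---
## Usage
Call with a report URL.
"""
print(structural_checks(skill))
```

Schema validation would extend the same idea, checking the body for required sections (examples, error handling) rather than just their presence.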
Figure 2: Developers see per-dimension quality scores, structural checks, and schema validation results for each skill in the API Center portal.
What This Means for Developers
As a developer, this means you can quickly understand the quality and reliability of a skill before adopting it — making informed decisions about which skills are ready to use and which may need further refinement.
No more guessing whether a skill is production ready. With skills assessment, every skill in your catalog comes with a transparent quality score, clear actionable feedback, and the confidence that comes from automated, consistent evaluation.
Get Started
Skills assessment is available now in Azure API Center. Platform administrators can enable assessment, configure criteria, and start evaluating skills from the API Center portal today.
To learn more, visit the skills assessment in Azure API Center documentation or try it out in the Azure Portal.