# Azure AI Content Safety
## Azure AI announces Prompt Shields for Jailbreak and Indirect prompt injection attacks
Our Azure OpenAI Service and Azure AI Content Safety teams are excited to launch a new Responsible AI capability called Prompt Shields. Prompt Shields protects applications powered by foundation models from two types of attacks: direct (jailbreak) attacks and indirect prompt injection attacks. Detection for both attack types is now available in Public Preview.

## Azure OpenAI Best Practices: Insights from Customer Journeys
When integrating Azure OpenAI’s powerful models into your production environment, it’s essential to follow best practices to ensure security, reliability, and scalability. Azure provides a robust platform with enterprise capabilities that, when leveraged with OpenAI models like GPT-4, DALL-E 3, and various embedding models, can revolutionize how businesses interact with AI. This guidance document contains best practices for scaling OpenAI applications within Azure, detailing resource organization, quota management, rate limiting, and the strategic use of Provisioned Throughput Units (PTUs) and Azure API Management (APIM) for efficient load balancing.

## Explore Azure AI Services: Curated list of prebuilt models and demos
Unlock the potential of AI with Azure's comprehensive suite of prebuilt models and demos. Whether you're looking to enhance speech recognition, analyze text, or process images and documents, Azure AI services offer ready-to-use solutions that make implementation effortless. Explore the diverse range of use cases and discover how these powerful tools can seamlessly integrate into your projects. Dive into the full catalogue of demos and start building smarter, AI-driven applications today.

## Intelligent Load Balancing with APIM for OpenAI: Weight-Based Routing
Weighting: APIM has no built-in capability for weight-based routing, but the same result can be achieved with custom logic in APIM policies. Selection process: the backend logic in this policy uses a weighted selection method to choose an endpoint route for each request and retry. Endpoints with higher weights are more likely to be chosen, yet every route retains some chance of being selected, because the selection compares a random number against the cumulative weights of the configured routes.
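The cumulative-weight selection described above is easiest to see outside of policy syntax. Below is a minimal Python sketch of the same idea, assuming a hypothetical list of backend endpoints and weights; it illustrates only the selection math, not the actual APIM policy expressions.

```python
import random

# Hypothetical backend routes with weights (higher weight = more traffic).
routes = [
    {"url": "https://aoai-eastus.openai.azure.com", "weight": 70},
    {"url": "https://aoai-westus.openai.azure.com", "weight": 20},
    {"url": "https://aoai-swedencentral.openai.azure.com", "weight": 10},
]

def pick_route(routes):
    """Weighted random selection using cumulative weights."""
    total = sum(r["weight"] for r in routes)
    threshold = random.uniform(0, total)  # random number in [0, total]
    cumulative = 0
    for route in routes:
        cumulative += route["weight"]
        if threshold <= cumulative:
            return route
    return routes[-1]  # fallback; not normally reached

selected = pick_route(routes)
print(f"Routing request to {selected['url']}")
```

In the actual policy, the equivalent comparison would typically live in C#-style policy expressions (for example inside `set-variable` and retry sections), with the chosen URL applied to the backend; the sketch only shows why higher-weight routes win more often while lower-weight routes still get picked occasionally.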
## Correction capability helps revise ungrounded content and hallucinations
Today, we are excited to announce a preview of "correction," a new capability within Azure AI Content Safety's groundedness detection feature. With this enhancement, groundedness detection not only identifies inaccuracies in AI outputs but also corrects them, fostering greater trust in generative AI technologies.
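As a rough sketch of how this might be called, groundedness detection (with the correction option) is exposed as a REST operation on a Content Safety resource. The endpoint path, API version, payload fields, and resource name below are assumptions based on the preview API and may differ in your environment; treat this as illustrative only.

```python
import requests

# Assumed values; replace with your own Content Safety resource details.
endpoint = "https://<your-content-safety-resource>.cognitiveservices.azure.com"
api_version = "2024-09-15-preview"  # assumed preview version supporting correction
key = "<your-key>"

payload = {
    "domain": "Generic",
    "task": "Summarization",
    # The LLM output to check, plus the source material it should be grounded in.
    "text": "The report was written in 2021 by Contoso's finance team.",
    "groundingSources": [
        "The annual report was prepared in 2023 by Contoso's audit committee."
    ],
    "correction": True,  # ask the service to propose a grounded rewrite (preview)
}

response = requests.post(
    f"{endpoint}/contentsafety/text:detectGroundedness",
    params={"api-version": api_version},
    headers={"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/json"},
    json=payload,
    timeout=30,
)
response.raise_for_status()
# Response fields (e.g., ungrounded segments and the suggested correction) are
# described in the preview API reference; print the raw result here.
print(response.json())
```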
## Best Practices for Mitigating Hallucinations in Large Language Models (LLMs)
*Real-world AI Solutions: Lessons from the Field*

### Overview
This document provides practical guidance for minimizing hallucinations (instances where models produce inaccurate or fabricated content) when building applications with Azure AI services. It targets developers, architects, and MLOps teams working with LLMs in enterprise settings.

### Key Outcomes
- ✅ Reduce hallucinations through retrieval-augmented strategies and prompt engineering
- ✅ Improve model output reliability, grounding, and explainability
- ✅ Enable robust enterprise deployment through layered safety, monitoring, and security

### Understanding Hallucinations
Hallucinations come in different forms. Here are some realistic examples for each category to help clarify them:

| Type | Description | Example |
|---|---|---|
| Factual | Outputs are incorrect or made up | "Albert Einstein won the Nobel Prize in Physics in 1950." (It was 1921) |
| Temporal | Stale or outdated knowledge presented as current | "The latest iPhone model is the iPhone 12." (when the iPhone 15 is current) |
| Contextual | Adds concepts that weren't mentioned or implied | Summarizing a document and adding "AI is dangerous" when the document never said it |
| Linguistic | Grammatically correct but incoherent sentences | "The quantum sandwich negates bicycle logic through elegant syntax." |
| Extrinsic | Unsupported by source documents | Citing nonexistent facts in a RAG-backed chatbot |
| Intrinsic | Contradictory or self-conflicting answers | Saying both "Azure OpenAI supports fine-tuning" and "Azure OpenAI does not." |

### Mitigation Strategies

#### 1. Retrieval-Augmented Generation (RAG)
Ground model outputs in enterprise knowledge sources such as PDFs, SharePoint documents, or images.

Key practices:

**Data Preparation and Organization**
- Clean and curate your data.
- Organize data into topics to improve search accuracy and prevent noise.
- Regularly audit and update grounding data to avoid outdated or biased content.

**Search and Retrieval Techniques**
- Explore different methods (keyword, vector, hybrid, semantic search) to find the best fit for your use case.
- Use metadata filtering (e.g., tagging by recency or source reliability) to prioritize high-quality information.
- Apply data chunking to improve retrieval efficiency and clarity.

**Query Engineering and Post-Processing**
- Use prompt engineering to specify which data source or section to pull from.
- Apply query transformation methods (e.g., sub-queries) for complex queries.
- Employ reranking methods to boost output quality.

#### 2. Prompt Engineering
High-quality prompts guide LLMs to produce factual and relevant responses. Use the ICE method:
- **Instructions**: Start with direct, specific asks.
- **Constraints**: Add boundaries such as "only from retrieved docs".
- **Escalation**: Include fallback behaviors (e.g., "Say 'I don't know' if unsure").

Example prompt improvement:
- ❌: Summarize this document.
- ✅: Using only the retrieved documentation, summarize this paper in 3–5 bullet points. If any information is missing, reply with 'Insufficient data.'

Prompt patterns that work:

**Clarity and Specificity**
- Write clear, unambiguous instructions to minimize misinterpretation.
- Use detailed prompts, e.g., "Provide only factual, verified information. If unsure, respond with 'I don't know.'"

**Structure**
- Break down complex tasks into smaller logical subtasks for accuracy, as in the example and sketch below.

Example: Research Paper Analysis
- ❌ Bad prompt (too broad, prone to hallucination): "Summarize this research paper and explain its implications."
- ✅ Better prompt (broken into subtasks):
  1. Extract core information: "Summarize the key findings of the research paper in 3-5 bullet points."
  2. Assess reliability: "Identify the sources of data used and assess their credibility."
  3. Determine implications: "Based on the findings, explain potential real-world applications."
  4. Limit speculation: "If any conclusions are uncertain, indicate that explicitly rather than making assumptions."
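A minimal sketch of the subtask pattern above, using the Azure OpenAI Python SDK. The deployment name, API version, and environment variable names are placeholder assumptions, not prescribed values; each subtask runs as its own low-temperature completion with a grounding system message.

```python
import os
from openai import AzureOpenAI

# Assumed configuration; replace with your own resource, key, and deployment name.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",  # assumed API version
)
DEPLOYMENT = "gpt-4o"  # hypothetical deployment name

SYSTEM = (
    "Provide only factual, verified information drawn from the supplied paper. "
    "If unsure, respond with 'I don't know.'"
)

SUBTASKS = [
    "Summarize the key findings of the research paper in 3-5 bullet points.",
    "Identify the sources of data used and assess their credibility.",
    "Based on the findings, explain potential real-world applications.",
    "If any conclusions are uncertain, state that explicitly rather than making assumptions.",
]

def analyze_paper(paper_text: str) -> list[str]:
    """Run each subtask as its own focused, low-temperature completion."""
    answers = []
    for task in SUBTASKS:
        response = client.chat.completions.create(
            model=DEPLOYMENT,
            temperature=0.2,  # low temperature for deterministic, focused output
            messages=[
                {"role": "system", "content": SYSTEM},
                {"role": "user", "content": f"{task}\n\nPaper:\n{paper_text}"},
            ],
        )
        answers.append(response.choices[0].message.content)
    return answers
```

Keeping each subtask in its own call makes it easier to evaluate and debug individual steps than a single broad "summarize and explain" prompt.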
**Repetition**
Repeating key instructions in a prompt can help reduce hallucinations, and the way you structure the repetition matters. Best practices:
- Beginning (highly recommended): The start of the prompt has the most impact on how the LLM interprets the task. Place essential guidelines here, such as "Provide only factual, verified information."
- End (for final confirmation or safety checks): Use the end to reinforce key rules. Instead of repeating the initial instruction verbatim, word it differently and keep it concise, for example: "If unsure, clearly state 'I don't know.'"

**Temperature Control**
- Adjust temperature settings (0.1–0.4) for deterministic, focused responses.

**Chain-of-Thought**
- Incorporate Chain-of-Thought instructions to encourage logical, stepwise responses. For example, to solve a math problem: "Solve this problem step-by-step. First, break it into smaller parts. Explain each step before moving to the next."

Tip: Use Azure AI Prompt Flow's playground to test prompt variations with parameter sweeps.

#### 3. System-Level Defenses
Mitigation isn't just prompt-side; it requires end-to-end design.

Key recommendations:
- Content filtering: Use Azure AI Content Safety to detect sexual, hate, violence, or self-harm content.
- Metaprompts: Define system boundaries ("You can only answer from documents retrieved").
- RBAC and networking: Use Azure Private Link, VNETs, and Microsoft Entra ID for secure access.

#### 4. Evaluation & Feedback Loops
Continuously evaluate outputs using both automated and human-in-the-loop feedback.

Real-world setup:
- Labeling teams: Review hallucination-prone cases with human-in-the-loop integrations.
- Automated test generation: Use LLMs to generate diverse test cases covering multiple inputs and difficulty levels, and simulate real-world queries to evaluate model accuracy.
- Evaluations using multiple LLMs: Cross-evaluate outputs from multiple LLMs, using ranking and comparison to refine model performance. Be cautious: automated evaluations may miss subtle errors that require human oversight.

Common evaluation metrics:

| Metric | What It Measures | How to Use It |
|---|---|---|
| Relevance Score | How closely the model's response aligns with the user query and intent (0–1 scale). | Use automated LLM-based grading or semantic similarity to flag off-topic or loosely related answers. |
| Groundedness Score | Whether the output is supported by retrieved documents or source context. | Use manual review or Azure AI evaluation tools (such as RAG evaluation) to identify unsupported claims. |
| User Trust Score | Real-time feedback from users, typically collected via thumbs up/down or star ratings. | Track trends to identify low-confidence flows and prioritize them for prompt tuning or data curation. |

Tip: Use evaluation scores in combination. For example, high relevance but low groundedness often signals hallucination risk, especially in chat apps with fallback answers.
Tip: Flag any outputs where "source_confidence" falls below a threshold and route them to a human review queue (see the sketch below).
Tip: Include "accuracy audits" in your CI/CD pipeline, using Prompt Flow or other evaluation tools to test components.
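One way to act on the confidence-threshold tip above is a small routing gate in the response pipeline. This is an illustrative sketch only: `source_confidence` is a hypothetical score produced by your evaluation step, the threshold is an assumed value, and the review queue is stubbed with an in-memory list.

```python
from dataclasses import dataclass, field

CONFIDENCE_THRESHOLD = 0.7  # assumed threshold; tune per application

@dataclass
class ModelOutput:
    answer: str
    source_confidence: float  # hypothetical groundedness/confidence score in [0, 1]

@dataclass
class ReviewQueue:
    """Stand-in for a real queue (Service Bus, database table, ticketing system, etc.)."""
    items: list = field(default_factory=list)

    def enqueue(self, output: ModelOutput, reason: str) -> None:
        self.items.append({"answer": output.answer, "reason": reason})

def route_output(output: ModelOutput, queue: ReviewQueue) -> str:
    """Return the answer when confident; otherwise escalate to human review."""
    if output.source_confidence < CONFIDENCE_THRESHOLD:
        queue.enqueue(output, reason=f"source_confidence={output.source_confidence:.2f}")
        return "This answer is pending human review."
    return output.answer

queue = ReviewQueue()
print(route_output(ModelOutput("Azure OpenAI supports PTUs.", 0.55), queue))
print(len(queue.items))  # 1 item escalated for review
```

The same gate can feed the labeling-team workflow described above, so hallucination-prone cases accumulate in one place for prompt tuning or data curation.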
### Summary & Deployment Checklist

| Task | Tools/Methods |
|---|---|
| Curate and chunk enterprise data | Azure AI Search, data chunkers |
| Use clear, scoped, role-based prompts | Prompt engineering, prompt templates |
| Ground all outputs using RAG | Azure AI Search + Azure OpenAI |
| Automate evaluation flows | Prompt Flow + custom evaluators |
| Add safety filters and monitoring | Azure Content Safety, Monitor, Insights |
| Secure deployments with RBAC/VNET | Azure Key Vault, Entra ID, Private Link |

Additional AI best practices blog posts:
- Best Practices for Requesting Quota Increase for Azure OpenAI Models
- Best Practices for Leveraging Azure OpenAI in Constrained Optimization Scenarios
- Best Practices for Structured Extraction from Documents Using Azure OpenAI
- Best Practices for Using Generative AI in Automated Response Generation for Complex Decision Making
- Best Practices for Leveraging Azure OpenAI in Code Conversion Scenarios
- Kickstarting AI Agent Development with Synthetic Data: A GenAI Approach on Azure | Microsoft Community Hub