azure resource management
140 TopicsAzure Key Vault RBAC (Role Based Access Control) versus Access Policies!
Dear Microsoft Azure Friends, With an Azure Key Vault, RBAC (Role Based Access Control) and Access Policies always leads to confusion. Let me take this opportunity to explain this with a small example. First of all, let me show you with which account I logged into the Azure Portal. You can see this in the graphic on the top right. Now let's examine the subscription named "MSDN Platforms" by navigating to (Access Control IAM). In "Check Access" we are looking for a specific person. It is the Jane Ford, we see that Jane has the Contributor right on this subscription. So she can do (almost) everything except change or assign permissions. This is in short the Contributor right. Now we search for the Azure Kay Vault in "All resources", for this it is good to work with a filter. As you can see, Azure Key Vault (twkv77) is part of the "MSDN Platforms" subscription. We check again that Jane Ford has the Contributor Role (Inherited) by navigating to "Access Control IAM) in the Azure Kay Vault and clicking on "Role assignment". Now we navigate to "Access Policies" in the Azure Key Vault. As you can see there is a policy for the user "Tom" but none for Jane Ford. With an Access Policy you determine who has access to the key, passwords and certificates. This means that if there is no access policy for Jane, she will not have access to keys, passwords, etc. That's exactly what we're about to check. As you can see in the upper right corner I registered as "Jane Ford" (she gave me the authorization ;-)). If I now navigate to the keys we see immediately that the Jane has no right to look at the keys. There is no access policy for Jane where for example the right "List" is included, so she can't access the keys. With RBAC you control the so-called Management Plane and with the Access Policies the Data Plane. Now you know the difference between RBAC and an Access Policy in an Azure Key Vault! Sure this wasn't super exciting, but I still wanted to share this information with you. I hope this article was helpful for you? Thank you for taking the time to read this article. Best regards, Tom Wechsler16KViews6likes0CommentsBuilding Multi-Agent Orchestration Using Microsoft Semantic Kernel: A Complete Step-by-Step Guide
What You Will Build By the end of this guide, you will have a working multi-agent system where 4 specialist AI agents collaborate to diagnose production issues: ClientAnalyst — Analyzes browser, JavaScript, CORS, uploads, and UI symptoms NetworkAnalyst — Analyzes DNS, TCP/IP, TLS, load balancers, and firewalls ServerAnalyst — Analyzes backend logs, database, deployments, and resource limits Coordinator — Synthesizes all findings into a root cause report with a prioritized action plan These agents don't just run in sequence — they debate, cross-examine, and challenge each other's findings through a shared conversation, producing a diagnosis that's better than any single agent could achieve alone. Table of Contents Why Multi-Agent? The Problem with Single Agents Architecture Overview Understanding the Key SK Components The Actor Model — How InProcessRuntime Works Setting Up Your Development Environment Step-by-Step: Building the Multi-Agent Analyzer The Agent Interaction Flow — Round by Round Bugs I Found & Fixed — Lessons Learned Running with Different AI Providers What to Build Next 1. Why Multi-Agent? The Problem with Single Agents A single AI agent analyzing a production issue is like having one doctor diagnose everything — they'll catch issues in their specialty but miss cross-domain connections. Consider this problem: "Users report 504 Gateway Timeout errors when uploading files larger than 10MB. Started after Friday's deployment. Worse during peak hours." A single agent might say "it's a server timeout" and stop. But the real root cause often spans multiple layers: The client is sending chunked uploads with an incorrect Content-Length header (client-side bug) The load balancer has a 30-second timeout that's too short for large uploads (network config) The server recently deployed a new request body parser that's 3x slower (server-side regression) The combination only fails during peak hours because connection pool saturation amplifies the latency No single perspective catches this. You need specialists who analyze independently, then debate to find the cross-layer causal chain. That's what multi-agent orchestration gives you. The 5 Orchestration Patterns in SK Semantic Kernel provides 5 built-in patterns for agent collaboration: SEQUENTIAL: A → B → C → Done (pipeline — each builds on previous) CONCURRENT: ↗ A ↘ Task → B → Aggregate ↘ C ↗ (parallel — results merged) GROUP CHAT: A ↔ B ↔ C ↔ D ← We use this one (rounds, shared history, debate) HANDOFF: A → (stuck?) → B → (complex?) → Human (escalation with human-in-the-loop) MAGENTIC: LLM picks who speaks next dynamically (AI-driven speaker selection) We use GroupChatOrchestration with RoundRobinGroupChatManager because our problem requires agents to see each other's work, challenge assumptions, and build on each other's analysis across two rounds. 2. Architecture Overview Here's the complete architecture of what we're building: 3. Understanding the Key SK Components Before we write code, let's understand the 5 components we'll use and the design pattern each implements: ChatCompletionAgent — Strategy Pattern The agent definition. Each agent is a combination of: name — unique identifier (used in round-robin ordering) instructions — the persona and rules (this is the prompt engineering) service — which AI provider to call (Strategy Pattern — swap providers without changing agent logic) description — what other agents/tools understand about this agent agent = ChatCompletionAgent( name="ClientAnalyst", instructions="You are ONLY ClientAnalyst...", service=gemini_service, # ← Strategy: swap to OpenAI with zero changes description="Analyzes client-side issues", ) GroupChatOrchestration — Mediator Pattern The orchestration defines HOW agents interact. It's the Mediator — agents don't talk to each other directly. Instead, the orchestration manages a shared ChatHistory and routes messages through the Manager. RoundRobinGroupChatManager — Strategy Pattern The Manager decides WHO speaks next. RoundRobinGroupChatManager cycles through agents in a fixed order. SK also provides AutomaticGroupChatManager where the LLM decides who speaks next. max_rounds is the total number of messages per agent or cycle. With 4 agents and max_rounds=8, each agent speaks exactly twice. InProcessRuntime — Actor Model Abstraction The execution engine. Every agent becomes an "actor" with its own kind of mailbox (message queue). The runtime delivers messages between actors. Key properties: No shared state — agents communicate only through messages Sequential processing — each agent processes one message at a time Location transparency — same code works in-process today, distributed tomorrow agent_response_callback — Observer Pattern A function that fires after EVERY agent response. We use it to display each agent's output in real-time with emoji labels and round numbers. 4. The Actor Model — How InProcessRuntime Works The Actor Model is a concurrency pattern where each entity is an isolated "actor" with a private mailbox. Here's what happens inside InProcessRuntime when we run our demo: runtime.start() │ ├── Creates internal message loop (asyncio event loop) │ orchestration.invoke(task="504 timeout...", runtime=runtime) │ ├── Creates Actor[Orchestrator] → manages overall flow ├── Creates Actor[Manager] → RoundRobinGroupChatManager ├── Creates Actor[ClientAnalyst] → mailbox created, waiting ├── Creates Actor[NetworkAnalyst] → mailbox created, waiting ├── Creates Actor[ServerAnalyst] → mailbox created, waiting └── Creates Actor[Coordinator] → mailbox created, waiting Manager receives "start" message │ ├── Checks turn order: [Client, Network, Server, Coordinator] ├── Sends task to ClientAnalyst mailbox │ → ClientAnalyst processes: calls LLM → response │ → Response added to shared ChatHistory │ → callback fires (displayed in Notebook UI) │ → Sends "done" back to Manager │ ├── Manager updates: turn_index=1 ├── Sends to NetworkAnalyst mailbox │ → Same flow... │ ├── ... (ServerAnalyst, Coordinator for Round 1) │ ├── Manager checks: messages=4, max_rounds=8 → continue │ ├── Round 2: same cycle with cross-examination │ └── After message 8: Manager sends "complete" → OrchestrationResult resolves → result.get() returns final answer runtime.stop_when_idle() → All mailboxes empty → clean shutdown The Actor Model guarantees: No race conditions (each actor processes one message at a time) No deadlocks (no shared locks to contend for) No shared mutable state (agents communicate only via messages) 5. Setting Up Your Development Environment Prerequisites Python 3.11 or 3.12 (3.13+ may have compatibility issues with some SK connectors) Visual Studio Code with the Python and Jupyter extensions An API key from one of: Google AI Studio (free), OpenAI Step 1: Install Python Download from python.org. During installation, check "Add Python to PATH". Verify: python --version # Python 3.12.x Step 2: Install VS Code Extensions Open VS Code, go to Extensions (Ctrl+Shift+X), and install: Python (by Microsoft) — Python language support Jupyter (by Microsoft) — Notebook support Pylance (by Microsoft) — IntelliSense and type checking Step 3: Create Project Folder mkdir sk-multiagent-demo cd sk-multiagent-demo Open in VS Code: code . Step 4: Create Virtual Environment Open the VS Code terminal (Ctrl+`) and run: # Create virtual environment python -m venv sk-env # Activate it # Windows: sk-env\Scripts\activate # macOS/Linux: source sk-env/bin/activate You should see (sk-env) in your terminal prompt. Step 5: Install Semantic Kernel For Google Gemini (free tier — recommended for getting started): pip install semantic-kernel[google] python-dotenv ipykernel For OpenAI (paid API key): pip install semantic-kernel openai python-dotenv ipykernel For Azure AI Foundry (enterprise, Entra ID auth): pip install semantic-kernel azure-identity python-dotenv ipykernel Step 6: Register the Jupyter Kernel python -m ipykernel install --user --name=sk-env --display-name="Semantic Kernel (Python 3.12)" You can also select if this is already available from your environment from VSCode as below: Step 7: Get Your API Key Option A — Google Gemini (FREE, recommended for demo): Go to https://aistudio.google.com/apikey Click "Create API Key" Copy the key Free tier limits: 15 requests/minute, 1 million tokens/minute — more than enough for this demo. Option B — OpenAI: Go to https://platform.openai.com/api-keys Create a new key Copy the key Option C — Azure AI Foundry: Deploy a model in Azure AI Foundry portal Note the endpoint URL and deployment name If key-based auth is disabled, you'll need Entra ID with permissions Step 8: Create the .env File In your project root, create a file named .env: For Gemini: GOOGLE_AI_API_KEY=AIzaSy...your-key-here GOOGLE_AI_GEMINI_MODEL_ID=gemini-2.5-flash For OpenAI: OPENAI_API_KEY=sk-...your-key-here OPENAI_CHAT_MODEL_ID=gpt-4o For Azure AI Foundry: AZURE_OPENAI_ENDPOINT=https://your-resource.cognitiveservices.azure.com AZURE_OPENAI_CHAT_DEPLOYMENT_NAME=gpt-4o AZURE_OPENAI_API_KEY=your-key Step 9: Create the Notebook In VS Code: Click File > New File Save as multi_agent_analyzer.ipynb In the top-right of the notebook, click Select Kernel Choose Semantic Kernel (Python 3.12) (or your sk-env) Your environment is ready. Let's build. 6. Step-by-Step: Building the Multi-Agent Analyzer Cell 1: Verify Setup import semantic_kernel print(f"Semantic Kernel version: {semantic_kernel.__version__}") from semantic_kernel.agents import ( ChatCompletionAgent, GroupChatOrchestration, RoundRobinGroupChatManager, ) from semantic_kernel.agents.runtime import InProcessRuntime from semantic_kernel.contents import ChatMessageContent print("All imports successful") Cell 2: Load API Key and Create Service For Gemini: import os from dotenv import load_dotenv load_dotenv() from semantic_kernel.connectors.ai.google.google_ai import ( GoogleAIChatCompletion, GoogleAIChatPromptExecutionSettings, ) from semantic_kernel.contents import ChatHistory GEMINI_API_KEY = os.getenv("GOOGLE_AI_API_KEY") GEMINI_MODEL = os.getenv("GOOGLE_AI_GEMINI_MODEL_ID", "gemini-2.5-flash") service = GoogleAIChatCompletion( gemini_model_id=GEMINI_MODEL, api_key=GEMINI_API_KEY, ) print(f"Service created: Gemini {GEMINI_MODEL}") # Smoke test settings = GoogleAIChatPromptExecutionSettings() test_history = ChatHistory(system_message="You are a helpful assistant.") test_history.add_user_message("Say 'Connected!' and nothing else.") response = await service.get_chat_message_content( chat_history=test_history, settings=settings ) print(f"Model says: {response.content}") For OpenAI: import os from dotenv import load_dotenv load_dotenv() from semantic_kernel.connectors.ai.open_ai import ( OpenAIChatCompletion, OpenAIChatPromptExecutionSettings, ) from semantic_kernel.contents import ChatHistory service = OpenAIChatCompletion( ai_model_id=os.getenv("OPENAI_CHAT_MODEL_ID", "gpt-4o"), ) print(f"Service created: OpenAI {os.getenv('OPENAI_CHAT_MODEL_ID', 'gpt-4o')}") # Smoke test settings = OpenAIChatPromptExecutionSettings() test_history = ChatHistory(system_message="You are a helpful assistant.") test_history.add_user_message("Say 'Connected!' and nothing else.") response = await service.get_chat_message_content( chat_history=test_history, settings=settings ) print(f"Model says: {response.content}") Cell 3: Define All 4 Agents This is the most important cell — the prompt engineering that makes the demo work: from semantic_kernel.agents import ChatCompletionAgent # ═══════════════════════════════════════════════════ # AGENT 1: Client-Side Analyst # ═══════════════════════════════════════════════════ client_agent = ChatCompletionAgent( name="ClientAnalyst", description="Analyzes problems from the client-side: browser, JS, CORS, caching, UI symptoms", instructions="""You are ONLY **ClientAnalyst**. You must NEVER speak as NetworkAnalyst, ServerAnalyst, or Coordinator. Every word you write is from ClientAnalyst's perspective only. You are a senior front-end and client-side diagnostics expert. When given a problem statement, analyze it EXCLUSIVELY from the client side: 1. **Browser & Rendering**: DOM issues, JavaScript errors, CSS rendering, browser compatibility, memory leaks, console errors. 2. **Client-Side Caching**: Stale cache, service worker issues, local storage corruption. 3. **Network from Client View**: CORS errors, preflight failures, request timeouts, client-side retry storms, fetch/XHR configuration. 4. **Upload Handling**: File API usage, chunk upload implementation, progress tracking, FormData construction, content-type headers. 5. **UI/UX Symptoms**: What the user sees, error messages displayed, loading states. ROUND 1: Provide your independent analysis. Do NOT reference other agents. List your top 3 most likely causes with evidence. Every response MUST be at least 200 words. ROUND 2: You MUST: - Reference NetworkAnalyst and ServerAnalyst BY NAME - State specifically where you AGREE or DISAGREE with their findings - Answer the Coordinator's questions from your perspective - Add NEW cross-layer insights you see from the client perspective - Do NOT just say 'I agree' — provide substantive technical reasoning Be specific, evidence-based, and prioritize findings by likelihood.""", service=service, ) # ═══════════════════════════════════════════════════ # AGENT 2: Network Analyst # ═══════════════════════════════════════════════════ network_agent = ChatCompletionAgent( name="NetworkAnalyst", description="Analyzes problems from the network side: DNS, TCP, TLS, firewalls, load balancers, latency", instructions="""You are ONLY **NetworkAnalyst**. You must NEVER speak as ClientAnalyst, ServerAnalyst, or Coordinator. Every word you write is from NetworkAnalyst's perspective only. You are a senior network infrastructure diagnostics expert. When given a problem statement, analyze it EXCLUSIVELY from the network layer: 1. **DNS & Resolution**: DNS TTL, propagation delays, record misconfigurations. 2. **TCP/IP & Connections**: Connection pooling, keep-alive, TCP window scaling, connection resets, SYN floods. 3. **TLS/SSL**: Certificate issues, handshake failures, protocol version mismatches. 4. **Load Balancers & Proxies**: Sticky sessions, health checks, timeout configs, request body size limits, proxy buffering. 5. **Firewall & WAF**: Rule blocks, rate limiting, request inspection delays, geo-blocking, DDoS protection interference. ROUND 1: Provide your independent analysis. Do NOT reference other agents. List your top 3 most likely causes with evidence. Every response MUST be at least 200 words. ROUND 2: You MUST: - Reference ClientAnalyst and ServerAnalyst BY NAME - State specifically where you AGREE or DISAGREE with their findings - Answer the Coordinator's questions from your perspective - Add NEW cross-layer insights you see from the network perspective - Do NOT just say 'I am ready to proceed' — provide substantive technical analysis Be specific, evidence-based, and prioritize findings by likelihood.""", service=service, ) # ═══════════════════════════════════════════════════ # AGENT 3: Server-Side Analyst # ═══════════════════════════════════════════════════ server_agent = ChatCompletionAgent( name="ServerAnalyst", description="Analyzes problems from the server side: backend app, database, logs, resources, deployments", instructions="""You are ONLY **ServerAnalyst**. You must NEVER speak as ClientAnalyst, NetworkAnalyst, or Coordinator. Every word you write is from ServerAnalyst's perspective only. You are a senior backend and infrastructure diagnostics expert. When given a problem statement, analyze it EXCLUSIVELY from the server side: 1. **Application Server**: Error logs, exception traces, thread pool exhaustion, memory leaks, CPU spikes, garbage collection pauses. 2. **Database**: Slow queries, connection pool saturation, lock contention, deadlocks, replication lag, query plan changes. 3. **Deployment & Config**: Recent deployments, configuration changes, feature flags, environment variable mismatches, rollback candidates. 4. **Resource Limits**: File upload size limits, request body limits, disk space, temporary file cleanup, storage quotas. 5. **External Dependencies**: Upstream API timeouts, third-party service degradation, queue backlogs, cache (Redis/Memcached) issues. ROUND 1: Provide your independent analysis. Do NOT reference other agents. List your top 3 most likely causes with evidence. Every response MUST be at least 200 words. ROUND 2: You MUST: - Reference ClientAnalyst and NetworkAnalyst BY NAME - State specifically where you AGREE or DISAGREE with their findings - Answer the Coordinator's questions from your perspective - Add NEW cross-layer insights you see from the server perspective - Do NOT just say 'I agree' — provide substantive technical reasoning Be specific, evidence-based, and prioritize findings by likelihood.""", service=service, ) # ═══════════════════════════════════════════════════ # AGENT 4: Coordinator # ═══════════════════════════════════════════════════ coordinator_agent = ChatCompletionAgent( name="Coordinator", description="Synthesizes all specialist analyses into a final root cause report with prioritized action plan", instructions="""You are ONLY **Coordinator**. You must NEVER speak as ClientAnalyst, NetworkAnalyst, or ServerAnalyst. You synthesize — you do NOT do domain-specific analysis. You are the lead engineer who synthesizes the team's findings. ═══ ROUND 1 BEHAVIOR (your first turn, message 4) ═══ Keep this SHORT — maximum 300 words. - Note 2-3 KEY PATTERNS across the three analyses - Identify where specialists AGREE (high-confidence) - Identify where they CONTRADICT (needs resolution) - Ask 2-3 SPECIFIC QUESTIONS for Round 2 Round 1 MUST NOT: assign tasks, create action plans, write reports, or tell agents what to take lead on. Observation + questions ONLY. ═══ ROUND 2 BEHAVIOR (your final turn, message 8) ═══ Keep this FOCUSED — maximum 800 words. Produce a structured report: 1. **Root Cause** (1 paragraph): The #1 most likely cause with causal chain across layers. Reference specific findings from each specialist. 2. **Confidence** (short list): - HIGH: Areas where all 3 agreed - MEDIUM: Areas where 2 of 3 agreed - LOW: Disagreements needing investigation 3. **Action Plan** (numbered, max 6 items): For each: - What to do (specific) - Owner (Client/Network/Server team) - Time estimate 4. **Quick Wins vs Long-term** (2 short lists) Do NOT repeat what specialists already said verbatim. Synthesize, don't echo.""", service=service, ) # ═══════════════════════════════════════════════════ # All 4 agents — order = RoundRobin order # ═══════════════════════════════════════════════════ agents = [client_agent, network_agent, server_agent, coordinator_agent] print(f"{len(agents)} agents created:") for i, a in enumerate(agents, 1): print(f" {i}. {a.name}: {a.description[:60]}...") print(f"\nRoundRobin order: {' → '.join(a.name for a in agents)}") Cell 4: Run the Analysis from semantic_kernel.agents import GroupChatOrchestration, RoundRobinGroupChatManager from semantic_kernel.agents.runtime import InProcessRuntime from semantic_kernel.contents import ChatMessageContent from IPython.display import display, Markdown # ╔══════════════════════════════════════════════════════════╗ # ║ EDIT YOUR PROBLEM STATEMENT HERE ║ # ╚══════════════════════════════════════════════════════════╝ PROBLEM = """ Users are reporting intermittent 504 Gateway Timeout errors when trying to upload files larger than 10MB through our web application. The issue started after last Friday's deployment and seems worse during peak hours (2-5 PM EST). Some users also report that smaller file uploads work fine but the progress bar freezes at 85% for large files before timing out. """ # ════════════════════════════════════════════════════════════ agent_responses = [] def agent_response_callback(message: ChatMessageContent) -> None: name = message.name or "Unknown" content = message.content or "" agent_responses.append({"agent": name, "content": content}) emoji = { "ClientAnalyst": "🖥️", "NetworkAnalyst": "🌐", "ServerAnalyst": "⚙️", "Coordinator": "🎯" }.get(name, "🔹") round_num = (len(agent_responses) - 1) // len(agents) + 1 display(Markdown( f"---\n### {emoji} {name} (Message {len(agent_responses)}, Round {round_num})\n\n{content}" )) MAX_ROUNDS = 8 # 4 agents × 2 rounds = 8 messages exactly task = f"""## Problem Statement {PROBLEM.strip()} ## Discussion Rules You are in a GROUP DISCUSSION with 4 members. You can see ALL previous messages. There are exactly 2 rounds. ### ROUND 1 (Messages 1-4): Independent Analysis - ClientAnalyst, NetworkAnalyst, ServerAnalyst: Analyze from YOUR domain only. Give your top 3 most likely causes with evidence and reasoning. - Coordinator: Note patterns across the 3 analyses. Ask 2-3 specific questions. Do NOT assign tasks yet. ### ROUND 2 (Messages 5-8): Cross-Examination & Final Report - ClientAnalyst, NetworkAnalyst, ServerAnalyst: You MUST reference the OTHER specialists BY NAME. State where you agree, disagree, or have new insights. Answer the Coordinator's questions. Provide SUBSTANTIVE analysis. - Coordinator: Produce the FINAL structured report: root cause, confidence levels, prioritized action plan with owners and time estimates. IMPORTANT: Each agent speaks as THEMSELVES only. Never impersonate another agent.""" display(Markdown(f"## Problem Statement\n\n{PROBLEM.strip()}")) display(Markdown(f"---\n## Discussion Starting — {len(agents)} agents, {MAX_ROUNDS} rounds\n")) # Build and run orchestration = GroupChatOrchestration( members=agents, manager=RoundRobinGroupChatManager(max_rounds=MAX_ROUNDS), agent_response_callback=agent_response_callback, ) runtime = InProcessRuntime() runtime.start() result = await orchestration.invoke(task=task, runtime=runtime) final_result = await result.get(timeout=300) await runtime.stop_when_idle() display(Markdown(f"---\n## FINAL CONCLUSION\n\n{final_result}")) Cell 5: Statistics and Validation print("═" * 55) print(" ANALYSIS STATISTICS") print("═" * 55) emojis = {"ClientAnalyst": "🖥️", "NetworkAnalyst": "🌐", "ServerAnalyst": "⚙️", "Coordinator": "🎯"} agent_counts = {} agent_chars = {} for r in agent_responses: agent_counts[r["agent"]] = agent_counts.get(r["agent"], 0) + 1 agent_chars[r["agent"]] = agent_chars.get(r["agent"], 0) + len(r["content"]) for agent, count in agent_counts.items(): em = emojis.get(agent, "🔹") chars = agent_chars.get(agent, 0) avg = chars // count if count else 0 print(f" {em} {agent}: {count} msg(s), ~{chars:,} chars (avg {avg:,}/msg)") print(f"\n Total messages: {len(agent_responses)}") total_chars = sum(len(r['content']) for r in agent_responses) print(f" Total analysis: ~{total_chars:,} characters") # Validation print(f"\n Validation:") import re identity_issues = [] for r in agent_responses: other_agents = [a.name for a in agents if a.name != r["agent"]] for other in other_agents: pattern = rf'(?i)as {re.escape(other)}[,:]?\s+I\b' if re.search(pattern, r["content"][:300]): identity_issues.append(f"{r['agent']} impersonated {other}") if identity_issues: print(f" Identity confusion: {identity_issues}") else: print(f" No identity confusion detected") thin = [r for r in agent_responses if len(r["content"].strip()) < 100] if thin: for t in thin: print(f" Thin response from {t['agent']}") else: print(f" All responses are substantive") Cell 6: Save Report from datetime import datetime timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") filename = f"analysis_report_{timestamp}.md" with open(filename, "w", encoding="utf-8") as f: f.write(f"# Problem Analysis Report\n\n") f.write(f"**Generated:** {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n") f.write(f"**Agents:** {', '.join(a.name for a in agents)}\n") f.write(f"**Rounds:** {MAX_ROUNDS}\n\n---\n\n") f.write(f"## Problem Statement\n\n{PROBLEM.strip()}\n\n---\n\n") for i, r in enumerate(agent_responses, 1): em = emojis.get(r['agent'], '🔹') round_num = (i - 1) // len(agents) + 1 f.write(f"### {em} {r['agent']} (Message {i}, Round {round_num})\n\n") f.write(f"{r['content']}\n\n---\n\n") f.write(f"## Final Conclusion\n\n{final_result}\n") print(f"Report saved to: {filename}") 7. The Agent Interaction Flow — Round by Round Here's what actually happens during the 8-message orchestration: Round 1: Independent Analysis (Messages 1-4) Msg Agent What They See What They Do 1 ClientAnalyst Problem statement only Analyzes from client perspective: upload chunking, progress bar freezing at 85%, CORS, content-type headers 2 NetworkAnalyst Problem + ClientAnalyst's analysis Gives INDEPENDENT analysis despite seeing msg 1: load balancer timeouts, proxy body size limits, TCP window scaling 3 ServerAnalyst Problem + msgs 1-2 Gives INDEPENDENT analysis: recent deployment regression, request body parser, thread pool exhaustion, disk space 4 Coordinator Problem + msgs 1-3 Observes patterns: "All three mention timeout configuration. ClientAnalyst and NetworkAnalyst both point to body size. Question: Was the deployment a backend-only change or did it include infra?" Round 2: Cross-Examination (Messages 5-8) Msg Agent What They Do 5 ClientAnalyst "I agree with NetworkAnalyst that the load balancer timeout is likely a factor — the 85% freeze point matches the 30-second LB timeout for a 10MB upload on our average upload speed. However, I disagree with ServerAnalyst about thread pool exhaustion because the UI shows a clean 504, not a connection reset." 6 NetworkAnalyst "ServerAnalyst's point about the recent deployment is critical — if the new request parser is 3x slower, that would push uploads past the LB timeout. I can confirm the LB has a 30s idle timeout. The fix is both: increase LB timeout AND optimize the parser." 7 ServerAnalyst "Responding to Coordinator's question: The deployment was backend-only — a new multipart parser using streaming instead of buffered reads. ClientAnalyst is correct that the 504 is from the LB, not the app. The app itself returns 200 after 45 seconds, but the LB kills the connection at 30." 8 Coordinator Produces final structured report with root cause: "The backend deployment introduced a slower multipart parser (45s vs 15s for 10MB). The load balancer's 30s timeout kills the connection at ~85% progress. Fix: immediate — increase LB timeout to 120s. Short-term — optimize parser. Long-term — implement chunked uploads with progress resumption." Notice: The Round 2 analysis is dramatically better than Round 1. Agents reference each other by name, build on each other's findings, and the Coordinator can synthesize a cross-layer causal chain that no single agent could have produced. I made a small adjustment to the issue with Azure Web Apps. Please find the details below from testing carried out using Google Gemini: 8. Bugs I Found & Fixed — Lessons Learned Building this demo taught me several important lessons about multi-agent systems: Bug 1: Agents Speaking Only Once Symptom: Only 4 messages instead of 8. Root cause: The agents list was missing the Coordinator. It was defined in a separate cell and wasn't included in the members list. Fix: All 4 agents must be in the same list passed to GroupChatOrchestration. Bug 2: NetworkAnalyst Says "I'm Ready to Proceed" Symptom: NetworkAnalyst's Round 2 response was just "I'm ready to proceed with the analysis" — no actual content. Root cause: The Coordinator's Round 1 message was assigning tasks ("NetworkAnalyst, please check the load balancer config"), and the agent was acknowledging the assignment instead of analyzing. Fix: Added explicit constraint to Coordinator: "Round 1 MUST NOT assign tasks — observation + questions ONLY." Bug 3: ServerAnalyst Says "As NetworkAnalyst, I..." Symptom: ServerAnalyst's response started with "As NetworkAnalyst, I believe..." Root cause: LLM identity bleeding. When agents share ChatHistory, the LLM sometimes loses track of which agent it's currently playing. This is especially common with Gemini. Fix: Identity anchoring at the very top of every agent's instructions: "You are ONLY ServerAnalyst. You must NEVER speak as ClientAnalyst, NetworkAnalyst, or Coordinator." Bug 4: Gemini Gives Thin/Empty Responses Symptom: Some agents responded with just one sentence or "I concur." Root cause: Gemini 2.5 Flash is more concise than GPT-4o by default. Without explicit length requirements, it takes shortcuts. Fix: Added "Every response MUST be at least 200 words" and "Answer the Coordinator's questions" to every specialist's instructions. Bug 5: Coordinator's Report is 18K Characters Symptom: The Coordinator's Round 2 response was absurdly long — repeating everything every specialist said. Fix: Added word limits: "Round 1 max 300 words, Round 2 max 800 words" and "Synthesize, don't echo." Bug 6: MAX_ROUNDS Math Symptom: With MAX_ROUNDS=9, ClientAnalyst spoke a 3rd time after the Coordinator's final report — breaking the clean 2-round structure. Fix: MAX_ROUNDS must equal (number of agents × number of rounds). For 4 agents × 2 rounds = 8. 9. Running with Different AI Providers The beauty of SK's Strategy Pattern is that you change ONE LINE to switch providers. Everything else — agents, orchestration, callbacks, validation — stays identical. Gemini setup: from semantic_kernel.connectors.ai.google.google_ai import GoogleAIChatCompletion service = GoogleAIChatCompletion( gemini_model_id="gemini-2.5-flash", api_key=os.getenv("GOOGLE_AI_API_KEY"), ) OpenAI Setup from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion service = OpenAIChatCompletion( ai_model_id="gpt-4o", api_key=os.getenv("OPEN_AI_API_KEY"), ) 10. What to Build Next Add Plugins to Agents Give agents real tools — not just LLM reasoning - looks exciting right ;) class NetworkDiagnosticPlugin: (description="Pings a host and returns latency") def ping(self, host: str) -> str: result = subprocess.run(["ping", "-c", "3", host], capture_output=True, text=True) return result.stdout class LogSearchPlugin: (description="Searches server logs for error patterns") def search_logs(self, pattern: str, hours: int = 1) -> str: # Query your log aggregator (Splunk, ELK, Azure Monitor) return query_logs(pattern, hours) Add Filters for Governance Intercept every agent call for PII redaction and audit logging: .filter(filter_type=FilterTypes.FUNCTION_INVOCATION) async def audit_filter(context, next): print(f"[AUDIT] {context.function.name} called by agent") await next(context) print(f"[AUDIT] {context.function.name} returned") Try Different Orchestration Patterns Replace GroupChat with Sequential for a pipeline approach: # Instead of debate, each agent builds on the previous orchestration = SequentialOrchestration( members=[client_agent, network_agent, server_agent, coordinator_agent] ) Or Concurrent for parallel analysis: # All specialists analyze simultaneously, Coordinator aggregates orchestration = ConcurrentOrchestration( members=[client_agent, network_agent, server_agent] ) Deploy to Azure Move from InProcessRuntime to Azure Container Apps for production scaling. The agent code doesn't change — only the runtime. Summary The key insight from building this demo: multi-agent systems produce better results than single agents not because each agent is smarter, but because the debate structure forces cross-domain thinking that a single prompt can never achieve. The Coordinator's final report consistently identifies causal chains that span client, network, and server layers — exactly the kind of insight that production incident response teams need. Semantic Kernel makes this possible with clean separation of concerns: agents define WHAT to analyze, orchestration defines HOW they interact, the manager defines WHO speaks when, the runtime handles WHERE it executes, and callbacks let you OBSERVE everything. Each piece is independently swappable — that's the power of SK from Microsoft. Resources: GitHub: github.com/microsoft/semantic-kernel Docs: learn.microsoft.com/semantic-kernel Orchestration Patterns: learn.microsoft.com/semantic-kernel/frameworks/agent/agent-orchestration Discord: aka.ms/sk/discord Disclaimer: The sample scripts provided in this article are provided AS IS without warranty of any kind. The author is not responsible for any issues, damages, or problems that may arise from using these scripts. Users should thoroughly test any implementation in their environment before deploying to production. Azure services and APIs may change over time, which could affect the functionality of the provided scripts. Always refer to the latest Azure documentation for the most up-to-date information. Thanks for reading this blog! I hope you found it helpful and informative for building AI agents with SK (Semantic Kernel) 😀419Views3likes0CommentsWhat is the Best Way to Achieve Azure Certification 70-532
Hi Admin / All, We is the best way to Achieve Azure Certification 70-532. I already taken the course of MVA on associate exam preparation. Need further suggestion to all of you. Thanks in advanced!!! Regards, SudipSolved14KViews2likes9Comments🚀 Azure Control, Data, & MGMT Planes: The Backbone of Cloud Efficiency 🌐
Azure operations can be divided into Three categories (Control Plane - Data Plane - Management Plane) This post describes the differences between those three types of operations. Tip : Suppose that the word "plane" means "function" understand this definition like this !! # Control Plane (Function) # @ The Control Plane is responsible for managing and configuring Azure resources. @ It handles administrative tasks such as creating, updating, and deleting resources. @ All requests for control plane operations are sent to the Azure Resource Manager URL For Azure global, the URL is " https://management.azure.comm. " @ Azure Resource Manager handles all control plane requests. It automatically applies the Azure features you implemented to manage your resources, such as: Azure role-based access control (Azure RBAC) - Azure Policy - Management Locks - Activity Logs @ After Azure Resource Manager authenticates the request, it sends the request to the resource provider, which completes the operation. @ The control plane includes two scenarios for handling requests - "green field" and "brown field". @ Green field refers to ---> new resources. Brown field refers to ---> existing resources. # Data Plane (Function) # @ The Data Plane is responsible for interacting with the actual data within Azure resources. @ Once a resource is created, operations like reading, writing, and processing data occur in the Data Plane. @ Requests for data plane operations are sent to an endpoint that's specific to your instance. Ex : "myaccount.blob.core.windows.nett " ---> for storage account @ Operates independently of the Control Plane, meaning even if the Control Plane is unavailable, the Data Plane remains accessible. # Management Plane (Function) # @ The Management Plane oversees monitoring, security, and configuration of Azure services. @ It ensures that resources are operating efficiently and securely. Ex : Azure Monitor: Collecting logs and metrics from resources Ex : Azure Security Center: Managing security policies and compliance. Ex : Azure Automation: Running scheduled tasks for resource management.258Views2likes0CommentsManaging and Working with Azure Network Security Groups (NSG)
When you are implementing your Microsoft Azure Design like a HUB-Spoke model you have to deal with security of your Azure environment (Virtual Datacenter). One of them are Network Security Groups to protect your Virtual networks and make communication between Azure subnets possible in a Secure Azure Virtual Datacenter. You really have to plan your Azure Virtual networks and implement it by Architectural Design. Now I’m writing about Azure Network Security Groups which is important, but there are more items to deal with like : Naming Conventions in your Azure Virtual Datacenter Azure Subscriptions ( who is Owner, Contributor, or Reader? ) Azure Regions ( Where is my Datacenter in the world? ) Azure VNET and Sub-Nets ( IP-addresses ) Security of your Virtual Networks ( Traffic filtering, Routing ) Azure Connectivity ( VNET Peering between Azure Subscriptions, VPN Gateway ) Permissions (RBAC) Azure Policy ( Working with Blue prints ) How to Manage Microsoft Azure Network Security Groups (NSG) ? Read more on my blog about Infrastructure as Code (IaC) here with Azure DevOps and Visual Studio6.6KViews2likes0CommentsUse tags to organize your Azure resources (with the Azure CLI)!
Hi Azure friends, Start the CloudShell in the Azure portal or go to the following URL: https://shell.azure.com/ Please start with the following steps to begin the deployment (the Hashtags are comments): #Here you can find out which subscription you are working with az account show #View all subscriptions az account list --all --output table #change the subscription (if necessary) az account set --subscription "Name of the Subscription" # To overwrite the tags on a resource az resource tag --tags 'Dept=IT' 'Environment=Test' -g tw-rg01 -n vnet-base --resource-type "Microsoft.Network/virtualNetworks" # To append a tag to the existing tags on a resource az resource update --set tags.'Status'='Approved' -g tw-rg01 -n vnet-base --resource-type "Microsoft.Network/virtualNetworks" # To overwrite the existing tags on a resource group az group update -n tw-rg01 --tags 'Environment=Test' 'Dept=IT' # To append a tag to the existing tags on a resource group az group update -n tw-rg01 --set tags.'Status'='Approved' # To see the existing tags for a resource az resource show -n vnet-base -g tw-rg01 --resource-type "Microsoft.Network/virtualNetworks" --query tags # To see the existing tags for a resource group az group show -n tw-rg01 --query tags # To get all the resources that have a particular tag and value az resource list --tag Dept=Finance # To get resource groups that have a specific tag az group list --tag Dept=IT Now you have organized Azure Tags with the Azure CLI! Congratulations! I hope this article was useful. Best regards, Tom Wechsler P.S. All scripts (#PowerShell, @azure CLI, #Terraform, #ARM) that I use can be found on github! https://github.com/tomwechsler4.1KViews2likes1CommentA Beginner's Guide To Role-Based Access Control on Azure.
When creating access to systems, applications and environments it's important to keep security top of mind. Even working at a rapid pace it's important to consider what credentials and access we give to a resource. Examples of this kind of administration of roles could be access to a Windows Server or providing pull access to a Docker image from an Azure Kubernetes Cluster. These types of actions require some form of authentication and authorization in order to provide access. This guide provides you some information on getting started on understanding Azure RBAC with many of the articles you can find on Microsoft Docs and Microsoft Learn. Defining the difference Authorization and Authentication are the cornerstones of security for computing. Before we dig into examples, let's just define the words from Webster's dictionary. Authentication: an act, process, or method of showing something (such as an identity, a piece of art, or a financial transaction) to be real, true, or genuine : the act or process of authenticating something Authorization: The granting of power to perform various acts or duties To think about this in a practical sense, consider the hierarcy of a Wordpress CMS set of user roles. From Wordpress Docs, Summary of Roles Super Admin – somebody with access to the site network administration features and all other features. See the Create a Network article. Administrator – somebody who has access to all the administration features within a single site. Editor – somebody who can publish and manage posts including the posts of other users. Author – somebody who can publish and manage their own posts. Contributor – somebody who can write and manage their own posts but cannot publish them. Subscriber – somebody who can only manage their profile. When a user authenticates into Wordpress, the SQL database where user roles are stored then determines what rights the user will have when logged in. The Administrator user may be responsible for maintenance of plug-ins for the website. The admin would like to avoid users who are not part of the website maintenance plan to be able to install, delete or modify any of the plug-ins. By ensuring all of these users have a role that does not permit these rights, our website remains more reliable due to unplanned maintenance. The contributor role appears to be what's right: Contributor #Contributor delete_posts edit_posts read read Reusable Blocks In this case, the contributor role for someone who may be just posting new content update may make sense due to the specific set of roles the user is authorized to do. RBAC for Azure Role-Based Authentication (RBAC) is an authorization system built on Azure Resource Manager that provides fine-grained access management of Azure resources. Access management via RBAC on Azure allows you to better control the scope of what your users and applications can access along with what they authorized to do. What can I do with RBAC? Here are some examples of what you can do with RBAC: Allow one user to manage virtual machines in a subscription and another user to manage virtual networks Allow a DBA group to manage SQL databases in a subscription Allow a user to manage all resources in a resource group, such as virtual machines, websites, and subnets Allow an application to access all resources in a resource group Fundamentals The way you control access to resources using RBAC is to create role assignments. This is a key concept to understand – it's how permissions are enforced. A role assignment consists of three elements: security principal, role definition, and scope. User - An individual who has a profile in Azure Active Directory. You can also assign roles to users in other tenants. For information about users in other organizations, see Azure Active Directory B2B. Group - A set of users created in Azure Active Directory. When you assign a role to a group, all users within that group have that role. Service principal - A security identity used by applications or services to access specific Azure resources. You can think of it as a user identity (username and password or certificate) for an application. Managed identity - An identity in Azure Active Directory that is automatically managed by Azure. You typically use managed identities when developing cloud applications to manage the credentials for authenticating to Azure services. Azure RBAC roles Azure includes several built-in roles that you can use. The following lists four fundamental built-in roles. The first three apply to all resource types. Owner - Has full access to all resources including the right to delegate access to others. Contributor - Can create and manage all types of Azure resources but can't grant access to others. Reader - Can view existing Azure resources. User Access Administrator - Lets you manage user access to Azure resources. Different Azure resources also have built in roles to ensure secure access. By using RBAC we can ensure our DBA can log just into the development and UAT of our Azure SQL Database managed instances. We can assign them them with a the built in SQL Managed Instance Contributor role. This role permits users to manage SQL servers and databases, but not access to them, and not their security-related policies. How RBAC determines if a user has access to a resource The following are the high-level steps that RBAC uses to determine if you have access to a resource on the management plane. This is helpful to understand if you are trying to troubleshoot an access issue. A user (or service principal) acquires a token for Azure Resource Manager. The token includes the user's group memberships (including transitive group memberships). The user makes a REST API call to Azure Resource Manager with the token attached. Azure Resource Manager retrieves all the role assignments and deny assignments that apply to the resource upon which the action is being taken. Azure Resource Manager narrows the role assignments that apply to this user or their group and determines what roles the user has for this resource. Azure Resource Manager determines if the action in the API call is included in the roles the user has for this resource. If the user doesn't have a role with the action at the requested scope, access is not granted. Otherwise, Azure Resource Manager checks if a deny assignment applies. If a deny assignment applies, access is blocked. Otherwise access is granted. Next Steps You may want to learn more and get started you've got so many resources. Check out these links: What is role-based access control (RBAC) for Azure resources? Create custom roles for Azure resources with role-based access control (RBAC) Get $200 in Azure Credit
25KViews2likes1CommentAzure REST API - $filter param for time delta throws ProviderError
Hello All, We are facing issues with the Azure API endpoint for fetching security alerts based on given time filter. https://apc01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdocs.microsoft.com%2Fen-us%2Frest%2Fapi%2Fsecuritycenter%2Falerts%2Flistbyresourcegroup%23code-try-0&data=02%7C01%7Cannishprashan.stevi%40hcl.com%7Ce89ea1593b4247b8b13f08d70530766b%7C189de737c93a4f5a8b686f4ca9941912%7C0%7C0%7C636983578009379205&sdata=JavMX5P2vJPhQ5ERafc4kC5gxELv%2FEjtulWo%2B84xhIQ%3D&reserved=0 and screenshot for the API section, We encountered the following error while hitting the endpoint with the required params and Bearer access token. Error Details: { "error": { "code": "ProviderError", "message": "Resource provider 'Microsoft.Security' failed to return collection response for type 'alerts'." } } Endpoint URL: https://management.azure.com/subscriptions/{subscription-id}/resourceGroups/{ResourceGroup}/providers/Microsoft.Security/alerts?api-version={version}}&$filter=properties.reportedTimeUtc eq '2019-07-06T08:00:51.8801218Z' NOTE: The URL gives response without specifying the time “filter” , but when using filter as one of the params, we get the above mentioned error. The param value used: $filter = properties.reportedTimeUtc eq '2019-07-06T08:00:51.8801218Z' Could anyone help in the resolution of this issue? Let me know for any additional details/clarifications. Thank You.5.7KViews1like3CommentsAzure Managed Application
Hi all! During //Build a few weeks ago, we announced "Azure Managed Application". Azure Managed Applications provides an ecosystem that enables vendors to make a PaaS or SaaS services available as self-contained applications. Customers deploy managed applications in their subscriptions, but vendors can manage them. It enables vendors to bill customers using Azure's billing system, use templates to manage the lifecycle of deployed applications. On the other side, it allows customers to automatically acquire updates, and pay for support and maintenance. They do not have to maintain or update the application themselves or diagnose and fix issues when the application fails. The building blocks for an Azure Managed Application consist of: applianceMainTemplate.json This is the Resource Manager template, which will provision the apps and underlying Azure resources into a managed resource group mainTemplate.json This is the template the consumer will deploy (to deploy the managed app), which will be mapped towards the applianceMainTemplate.json applianceCreateUiDefinition.json This is the UI definition the consumer will interact with in the Azure portal, when deploying the managed app. We have a few samples published to github already - and we would love your feedback :-) -kn3.4KViews1like1Comment