hands-on-labs

47 Topics

Building Multi-Agent Orchestration Using Microsoft Semantic Kernel: A Complete Step-by-Step Guide
What You Will Build By the end of this guide, you will have a working multi-agent system where 4 specialist AI agents collaborate to diagnose production issues: ClientAnalyst — Analyzes browser, JavaScript, CORS, uploads, and UI symptoms NetworkAnalyst — Analyzes DNS, TCP/IP, TLS, load balancers, and firewalls ServerAnalyst — Analyzes backend logs, database, deployments, and resource limits Coordinator — Synthesizes all findings into a root cause report with a prioritized action plan These agents don't just run in sequence — they debate, cross-examine, and challenge each other's findings through a shared conversation, producing a diagnosis that's better than any single agent could achieve alone. Table of Contents Why Multi-Agent? The Problem with Single Agents Architecture Overview Understanding the Key SK Components The Actor Model — How InProcessRuntime Works Setting Up Your Development Environment Step-by-Step: Building the Multi-Agent Analyzer The Agent Interaction Flow — Round by Round Bugs I Found & Fixed — Lessons Learned Running with Different AI Providers What to Build Next 1. Why Multi-Agent? The Problem with Single Agents A single AI agent analyzing a production issue is like having one doctor diagnose everything — they'll catch issues in their specialty but miss cross-domain connections. Consider this problem: "Users report 504 Gateway Timeout errors when uploading files larger than 10MB. Started after Friday's deployment. Worse during peak hours." A single agent might say "it's a server timeout" and stop. But the real root cause often spans multiple layers: The client is sending chunked uploads with an incorrect Content-Length header (client-side bug) The load balancer has a 30-second timeout that's too short for large uploads (network config) The server recently deployed a new request body parser that's 3x slower (server-side regression) The combination only fails during peak hours because connection pool saturation amplifies the latency No single perspective catches this. You need specialists who analyze independently, then debate to find the cross-layer causal chain. That's what multi-agent orchestration gives you. The 5 Orchestration Patterns in SK Semantic Kernel provides 5 built-in patterns for agent collaboration: SEQUENTIAL: A → B → C → Done (pipeline — each builds on previous) CONCURRENT: ↗ A ↘ Task → B → Aggregate ↘ C ↗ (parallel — results merged) GROUP CHAT: A ↔ B ↔ C ↔ D ← We use this one (rounds, shared history, debate) HANDOFF: A → (stuck?) → B → (complex?) → Human (escalation with human-in-the-loop) MAGENTIC: LLM picks who speaks next dynamically (AI-driven speaker selection) We use GroupChatOrchestration with RoundRobinGroupChatManager because our problem requires agents to see each other's work, challenge assumptions, and build on each other's analysis across two rounds. 2. Architecture Overview Here's the complete architecture of what we're building: 3. Understanding the Key SK Components Before we write code, let's understand the 5 components we'll use and the design pattern each implements: ChatCompletionAgent — Strategy Pattern The agent definition. Each agent is a combination of: name — unique identifier (used in round-robin ordering) instructions — the persona and rules (this is the prompt engineering) service — which AI provider to call (Strategy Pattern — swap providers without changing agent logic) description — what other agents/tools understand about this agent agent = ChatCompletionAgent( name="ClientAnalyst", instructions="You are ONLY ClientAnalyst...", service=gemini_service, # ← Strategy: swap to OpenAI with zero changes description="Analyzes client-side issues", ) GroupChatOrchestration — Mediator Pattern The orchestration defines HOW agents interact. It's the Mediator — agents don't talk to each other directly. Instead, the orchestration manages a shared ChatHistory and routes messages through the Manager. RoundRobinGroupChatManager — Strategy Pattern The Manager decides WHO speaks next. RoundRobinGroupChatManager cycles through agents in a fixed order. SK also provides AutomaticGroupChatManager where the LLM decides who speaks next. max_rounds is the total number of messages per agent or cycle. With 4 agents and max_rounds=8, each agent speaks exactly twice. InProcessRuntime — Actor Model Abstraction The execution engine. Every agent becomes an "actor" with its own kind of mailbox (message queue). The runtime delivers messages between actors. Key properties: No shared state — agents communicate only through messages Sequential processing — each agent processes one message at a time Location transparency — same code works in-process today, distributed tomorrow agent_response_callback — Observer Pattern A function that fires after EVERY agent response. We use it to display each agent's output in real-time with emoji labels and round numbers. 4. The Actor Model — How InProcessRuntime Works The Actor Model is a concurrency pattern where each entity is an isolated "actor" with a private mailbox. Here's what happens inside InProcessRuntime when we run our demo: runtime.start() │ ├── Creates internal message loop (asyncio event loop) │ orchestration.invoke(task="504 timeout...", runtime=runtime) │ ├── Creates Actor[Orchestrator] → manages overall flow ├── Creates Actor[Manager] → RoundRobinGroupChatManager ├── Creates Actor[ClientAnalyst] → mailbox created, waiting ├── Creates Actor[NetworkAnalyst] → mailbox created, waiting ├── Creates Actor[ServerAnalyst] → mailbox created, waiting └── Creates Actor[Coordinator] → mailbox created, waiting Manager receives "start" message │ ├── Checks turn order: [Client, Network, Server, Coordinator] ├── Sends task to ClientAnalyst mailbox │ → ClientAnalyst processes: calls LLM → response │ → Response added to shared ChatHistory │ → callback fires (displayed in Notebook UI) │ → Sends "done" back to Manager │ ├── Manager updates: turn_index=1 ├── Sends to NetworkAnalyst mailbox │ → Same flow... │ ├── ... (ServerAnalyst, Coordinator for Round 1) │ ├── Manager checks: messages=4, max_rounds=8 → continue │ ├── Round 2: same cycle with cross-examination │ └── After message 8: Manager sends "complete" → OrchestrationResult resolves → result.get() returns final answer runtime.stop_when_idle() → All mailboxes empty → clean shutdown The Actor Model guarantees: No race conditions (each actor processes one message at a time) No deadlocks (no shared locks to contend for) No shared mutable state (agents communicate only via messages) 5. Setting Up Your Development Environment Prerequisites Python 3.11 or 3.12 (3.13+ may have compatibility issues with some SK connectors) Visual Studio Code with the Python and Jupyter extensions An API key from one of: Google AI Studio (free), OpenAI Step 1: Install Python Download from python.org. During installation, check "Add Python to PATH". Verify: python --version # Python 3.12.x Step 2: Install VS Code Extensions Open VS Code, go to Extensions (Ctrl+Shift+X), and install: Python (by Microsoft) — Python language support Jupyter (by Microsoft) — Notebook support Pylance (by Microsoft) — IntelliSense and type checking Step 3: Create Project Folder mkdir sk-multiagent-demo cd sk-multiagent-demo Open in VS Code: code . Step 4: Create Virtual Environment Open the VS Code terminal (Ctrl+`) and run: # Create virtual environment python -m venv sk-env # Activate it # Windows: sk-env\Scripts\activate # macOS/Linux: source sk-env/bin/activate You should see (sk-env) in your terminal prompt. Step 5: Install Semantic Kernel For Google Gemini (free tier — recommended for getting started): pip install semantic-kernel[google] python-dotenv ipykernel For OpenAI (paid API key): pip install semantic-kernel openai python-dotenv ipykernel For Azure AI Foundry (enterprise, Entra ID auth): pip install semantic-kernel azure-identity python-dotenv ipykernel Step 6: Register the Jupyter Kernel python -m ipykernel install --user --name=sk-env --display-name="Semantic Kernel (Python 3.12)" You can also select if this is already available from your environment from VSCode as below: Step 7: Get Your API Key Option A — Google Gemini (FREE, recommended for demo): Go to https://aistudio.google.com/apikey Click "Create API Key" Copy the key Free tier limits: 15 requests/minute, 1 million tokens/minute — more than enough for this demo. Option B — OpenAI: Go to https://platform.openai.com/api-keys Create a new key Copy the key Option C — Azure AI Foundry: Deploy a model in Azure AI Foundry portal Note the endpoint URL and deployment name If key-based auth is disabled, you'll need Entra ID with permissions Step 8: Create the .env File In your project root, create a file named .env: For Gemini: GOOGLE_AI_API_KEY=AIzaSy...your-key-here GOOGLE_AI_GEMINI_MODEL_ID=gemini-2.5-flash For OpenAI: OPENAI_API_KEY=sk-...your-key-here OPENAI_CHAT_MODEL_ID=gpt-4o For Azure AI Foundry: AZURE_OPENAI_ENDPOINT=https://your-resource.cognitiveservices.azure.com AZURE_OPENAI_CHAT_DEPLOYMENT_NAME=gpt-4o AZURE_OPENAI_API_KEY=your-key Step 9: Create the Notebook In VS Code: Click File > New File Save as multi_agent_analyzer.ipynb In the top-right of the notebook, click Select Kernel Choose Semantic Kernel (Python 3.12) (or your sk-env) Your environment is ready. Let's build. 6. Step-by-Step: Building the Multi-Agent Analyzer Cell 1: Verify Setup import semantic_kernel print(f"Semantic Kernel version: {semantic_kernel.__version__}") from semantic_kernel.agents import ( ChatCompletionAgent, GroupChatOrchestration, RoundRobinGroupChatManager, ) from semantic_kernel.agents.runtime import InProcessRuntime from semantic_kernel.contents import ChatMessageContent print("All imports successful") Cell 2: Load API Key and Create Service For Gemini: import os from dotenv import load_dotenv load_dotenv() from semantic_kernel.connectors.ai.google.google_ai import ( GoogleAIChatCompletion, GoogleAIChatPromptExecutionSettings, ) from semantic_kernel.contents import ChatHistory GEMINI_API_KEY = os.getenv("GOOGLE_AI_API_KEY") GEMINI_MODEL = os.getenv("GOOGLE_AI_GEMINI_MODEL_ID", "gemini-2.5-flash") service = GoogleAIChatCompletion( gemini_model_id=GEMINI_MODEL, api_key=GEMINI_API_KEY, ) print(f"Service created: Gemini {GEMINI_MODEL}") # Smoke test settings = GoogleAIChatPromptExecutionSettings() test_history = ChatHistory(system_message="You are a helpful assistant.") test_history.add_user_message("Say 'Connected!' and nothing else.") response = await service.get_chat_message_content( chat_history=test_history, settings=settings ) print(f"Model says: {response.content}") For OpenAI: import os from dotenv import load_dotenv load_dotenv() from semantic_kernel.connectors.ai.open_ai import ( OpenAIChatCompletion, OpenAIChatPromptExecutionSettings, ) from semantic_kernel.contents import ChatHistory service = OpenAIChatCompletion( ai_model_id=os.getenv("OPENAI_CHAT_MODEL_ID", "gpt-4o"), ) print(f"Service created: OpenAI {os.getenv('OPENAI_CHAT_MODEL_ID', 'gpt-4o')}") # Smoke test settings = OpenAIChatPromptExecutionSettings() test_history = ChatHistory(system_message="You are a helpful assistant.") test_history.add_user_message("Say 'Connected!' and nothing else.") response = await service.get_chat_message_content( chat_history=test_history, settings=settings ) print(f"Model says: {response.content}") Cell 3: Define All 4 Agents This is the most important cell — the prompt engineering that makes the demo work: from semantic_kernel.agents import ChatCompletionAgent # ═══════════════════════════════════════════════════ # AGENT 1: Client-Side Analyst # ═══════════════════════════════════════════════════ client_agent = ChatCompletionAgent( name="ClientAnalyst", description="Analyzes problems from the client-side: browser, JS, CORS, caching, UI symptoms", instructions="""You are ONLY **ClientAnalyst**. You must NEVER speak as NetworkAnalyst, ServerAnalyst, or Coordinator. Every word you write is from ClientAnalyst's perspective only. You are a senior front-end and client-side diagnostics expert. When given a problem statement, analyze it EXCLUSIVELY from the client side: 1. **Browser & Rendering**: DOM issues, JavaScript errors, CSS rendering, browser compatibility, memory leaks, console errors. 2. **Client-Side Caching**: Stale cache, service worker issues, local storage corruption. 3. **Network from Client View**: CORS errors, preflight failures, request timeouts, client-side retry storms, fetch/XHR configuration. 4. **Upload Handling**: File API usage, chunk upload implementation, progress tracking, FormData construction, content-type headers. 5. **UI/UX Symptoms**: What the user sees, error messages displayed, loading states. ROUND 1: Provide your independent analysis. Do NOT reference other agents. List your top 3 most likely causes with evidence. Every response MUST be at least 200 words. ROUND 2: You MUST: - Reference NetworkAnalyst and ServerAnalyst BY NAME - State specifically where you AGREE or DISAGREE with their findings - Answer the Coordinator's questions from your perspective - Add NEW cross-layer insights you see from the client perspective - Do NOT just say 'I agree' — provide substantive technical reasoning Be specific, evidence-based, and prioritize findings by likelihood.""", service=service, ) # ═══════════════════════════════════════════════════ # AGENT 2: Network Analyst # ═══════════════════════════════════════════════════ network_agent = ChatCompletionAgent( name="NetworkAnalyst", description="Analyzes problems from the network side: DNS, TCP, TLS, firewalls, load balancers, latency", instructions="""You are ONLY **NetworkAnalyst**. You must NEVER speak as ClientAnalyst, ServerAnalyst, or Coordinator. Every word you write is from NetworkAnalyst's perspective only. You are a senior network infrastructure diagnostics expert. When given a problem statement, analyze it EXCLUSIVELY from the network layer: 1. **DNS & Resolution**: DNS TTL, propagation delays, record misconfigurations. 2. **TCP/IP & Connections**: Connection pooling, keep-alive, TCP window scaling, connection resets, SYN floods. 3. **TLS/SSL**: Certificate issues, handshake failures, protocol version mismatches. 4. **Load Balancers & Proxies**: Sticky sessions, health checks, timeout configs, request body size limits, proxy buffering. 5. **Firewall & WAF**: Rule blocks, rate limiting, request inspection delays, geo-blocking, DDoS protection interference. ROUND 1: Provide your independent analysis. Do NOT reference other agents. List your top 3 most likely causes with evidence. Every response MUST be at least 200 words. ROUND 2: You MUST: - Reference ClientAnalyst and ServerAnalyst BY NAME - State specifically where you AGREE or DISAGREE with their findings - Answer the Coordinator's questions from your perspective - Add NEW cross-layer insights you see from the network perspective - Do NOT just say 'I am ready to proceed' — provide substantive technical analysis Be specific, evidence-based, and prioritize findings by likelihood.""", service=service, ) # ═══════════════════════════════════════════════════ # AGENT 3: Server-Side Analyst # ═══════════════════════════════════════════════════ server_agent = ChatCompletionAgent( name="ServerAnalyst", description="Analyzes problems from the server side: backend app, database, logs, resources, deployments", instructions="""You are ONLY **ServerAnalyst**. You must NEVER speak as ClientAnalyst, NetworkAnalyst, or Coordinator. Every word you write is from ServerAnalyst's perspective only. You are a senior backend and infrastructure diagnostics expert. When given a problem statement, analyze it EXCLUSIVELY from the server side: 1. **Application Server**: Error logs, exception traces, thread pool exhaustion, memory leaks, CPU spikes, garbage collection pauses. 2. **Database**: Slow queries, connection pool saturation, lock contention, deadlocks, replication lag, query plan changes. 3. **Deployment & Config**: Recent deployments, configuration changes, feature flags, environment variable mismatches, rollback candidates. 4. **Resource Limits**: File upload size limits, request body limits, disk space, temporary file cleanup, storage quotas. 5. **External Dependencies**: Upstream API timeouts, third-party service degradation, queue backlogs, cache (Redis/Memcached) issues. ROUND 1: Provide your independent analysis. Do NOT reference other agents. List your top 3 most likely causes with evidence. Every response MUST be at least 200 words. ROUND 2: You MUST: - Reference ClientAnalyst and NetworkAnalyst BY NAME - State specifically where you AGREE or DISAGREE with their findings - Answer the Coordinator's questions from your perspective - Add NEW cross-layer insights you see from the server perspective - Do NOT just say 'I agree' — provide substantive technical reasoning Be specific, evidence-based, and prioritize findings by likelihood.""", service=service, ) # ═══════════════════════════════════════════════════ # AGENT 4: Coordinator # ═══════════════════════════════════════════════════ coordinator_agent = ChatCompletionAgent( name="Coordinator", description="Synthesizes all specialist analyses into a final root cause report with prioritized action plan", instructions="""You are ONLY **Coordinator**. You must NEVER speak as ClientAnalyst, NetworkAnalyst, or ServerAnalyst. You synthesize — you do NOT do domain-specific analysis. You are the lead engineer who synthesizes the team's findings. ═══ ROUND 1 BEHAVIOR (your first turn, message 4) ═══ Keep this SHORT — maximum 300 words. - Note 2-3 KEY PATTERNS across the three analyses - Identify where specialists AGREE (high-confidence) - Identify where they CONTRADICT (needs resolution) - Ask 2-3 SPECIFIC QUESTIONS for Round 2 Round 1 MUST NOT: assign tasks, create action plans, write reports, or tell agents what to take lead on. Observation + questions ONLY. ═══ ROUND 2 BEHAVIOR (your final turn, message 8) ═══ Keep this FOCUSED — maximum 800 words. Produce a structured report: 1. **Root Cause** (1 paragraph): The #1 most likely cause with causal chain across layers. Reference specific findings from each specialist. 2. **Confidence** (short list): - HIGH: Areas where all 3 agreed - MEDIUM: Areas where 2 of 3 agreed - LOW: Disagreements needing investigation 3. **Action Plan** (numbered, max 6 items): For each: - What to do (specific) - Owner (Client/Network/Server team) - Time estimate 4. **Quick Wins vs Long-term** (2 short lists) Do NOT repeat what specialists already said verbatim. Synthesize, don't echo.""", service=service, ) # ═══════════════════════════════════════════════════ # All 4 agents — order = RoundRobin order # ═══════════════════════════════════════════════════ agents = [client_agent, network_agent, server_agent, coordinator_agent] print(f"{len(agents)} agents created:") for i, a in enumerate(agents, 1): print(f" {i}. {a.name}: {a.description[:60]}...") print(f"\nRoundRobin order: {' → '.join(a.name for a in agents)}") Cell 4: Run the Analysis from semantic_kernel.agents import GroupChatOrchestration, RoundRobinGroupChatManager from semantic_kernel.agents.runtime import InProcessRuntime from semantic_kernel.contents import ChatMessageContent from IPython.display import display, Markdown # ╔══════════════════════════════════════════════════════════╗ # ║ EDIT YOUR PROBLEM STATEMENT HERE ║ # ╚══════════════════════════════════════════════════════════╝ PROBLEM = """ Users are reporting intermittent 504 Gateway Timeout errors when trying to upload files larger than 10MB through our web application. The issue started after last Friday's deployment and seems worse during peak hours (2-5 PM EST). Some users also report that smaller file uploads work fine but the progress bar freezes at 85% for large files before timing out. """ # ════════════════════════════════════════════════════════════ agent_responses = [] def agent_response_callback(message: ChatMessageContent) -> None: name = message.name or "Unknown" content = message.content or "" agent_responses.append({"agent": name, "content": content}) emoji = { "ClientAnalyst": "🖥️", "NetworkAnalyst": "🌐", "ServerAnalyst": "⚙️", "Coordinator": "🎯" }.get(name, "🔹") round_num = (len(agent_responses) - 1) // len(agents) + 1 display(Markdown( f"---\n### {emoji} {name} (Message {len(agent_responses)}, Round {round_num})\n\n{content}" )) MAX_ROUNDS = 8 # 4 agents × 2 rounds = 8 messages exactly task = f"""## Problem Statement {PROBLEM.strip()} ## Discussion Rules You are in a GROUP DISCUSSION with 4 members. You can see ALL previous messages. There are exactly 2 rounds. ### ROUND 1 (Messages 1-4): Independent Analysis - ClientAnalyst, NetworkAnalyst, ServerAnalyst: Analyze from YOUR domain only. Give your top 3 most likely causes with evidence and reasoning. - Coordinator: Note patterns across the 3 analyses. Ask 2-3 specific questions. Do NOT assign tasks yet. ### ROUND 2 (Messages 5-8): Cross-Examination & Final Report - ClientAnalyst, NetworkAnalyst, ServerAnalyst: You MUST reference the OTHER specialists BY NAME. State where you agree, disagree, or have new insights. Answer the Coordinator's questions. Provide SUBSTANTIVE analysis. - Coordinator: Produce the FINAL structured report: root cause, confidence levels, prioritized action plan with owners and time estimates. IMPORTANT: Each agent speaks as THEMSELVES only. Never impersonate another agent.""" display(Markdown(f"## Problem Statement\n\n{PROBLEM.strip()}")) display(Markdown(f"---\n## Discussion Starting — {len(agents)} agents, {MAX_ROUNDS} rounds\n")) # Build and run orchestration = GroupChatOrchestration( members=agents, manager=RoundRobinGroupChatManager(max_rounds=MAX_ROUNDS), agent_response_callback=agent_response_callback, ) runtime = InProcessRuntime() runtime.start() result = await orchestration.invoke(task=task, runtime=runtime) final_result = await result.get(timeout=300) await runtime.stop_when_idle() display(Markdown(f"---\n## FINAL CONCLUSION\n\n{final_result}")) Cell 5: Statistics and Validation print("═" * 55) print(" ANALYSIS STATISTICS") print("═" * 55) emojis = {"ClientAnalyst": "🖥️", "NetworkAnalyst": "🌐", "ServerAnalyst": "⚙️", "Coordinator": "🎯"} agent_counts = {} agent_chars = {} for r in agent_responses: agent_counts[r["agent"]] = agent_counts.get(r["agent"], 0) + 1 agent_chars[r["agent"]] = agent_chars.get(r["agent"], 0) + len(r["content"]) for agent, count in agent_counts.items(): em = emojis.get(agent, "🔹") chars = agent_chars.get(agent, 0) avg = chars // count if count else 0 print(f" {em} {agent}: {count} msg(s), ~{chars:,} chars (avg {avg:,}/msg)") print(f"\n Total messages: {len(agent_responses)}") total_chars = sum(len(r['content']) for r in agent_responses) print(f" Total analysis: ~{total_chars:,} characters") # Validation print(f"\n Validation:") import re identity_issues = [] for r in agent_responses: other_agents = [a.name for a in agents if a.name != r["agent"]] for other in other_agents: pattern = rf'(?i)as {re.escape(other)}[,:]?\s+I\b' if re.search(pattern, r["content"][:300]): identity_issues.append(f"{r['agent']} impersonated {other}") if identity_issues: print(f" Identity confusion: {identity_issues}") else: print(f" No identity confusion detected") thin = [r for r in agent_responses if len(r["content"].strip()) < 100] if thin: for t in thin: print(f" Thin response from {t['agent']}") else: print(f" All responses are substantive") Cell 6: Save Report from datetime import datetime timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") filename = f"analysis_report_{timestamp}.md" with open(filename, "w", encoding="utf-8") as f: f.write(f"# Problem Analysis Report\n\n") f.write(f"**Generated:** {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n") f.write(f"**Agents:** {', '.join(a.name for a in agents)}\n") f.write(f"**Rounds:** {MAX_ROUNDS}\n\n---\n\n") f.write(f"## Problem Statement\n\n{PROBLEM.strip()}\n\n---\n\n") for i, r in enumerate(agent_responses, 1): em = emojis.get(r['agent'], '🔹') round_num = (i - 1) // len(agents) + 1 f.write(f"### {em} {r['agent']} (Message {i}, Round {round_num})\n\n") f.write(f"{r['content']}\n\n---\n\n") f.write(f"## Final Conclusion\n\n{final_result}\n") print(f"Report saved to: {filename}") 7. The Agent Interaction Flow — Round by Round Here's what actually happens during the 8-message orchestration: Round 1: Independent Analysis (Messages 1-4) Msg Agent What They See What They Do 1 ClientAnalyst Problem statement only Analyzes from client perspective: upload chunking, progress bar freezing at 85%, CORS, content-type headers 2 NetworkAnalyst Problem + ClientAnalyst's analysis Gives INDEPENDENT analysis despite seeing msg 1: load balancer timeouts, proxy body size limits, TCP window scaling 3 ServerAnalyst Problem + msgs 1-2 Gives INDEPENDENT analysis: recent deployment regression, request body parser, thread pool exhaustion, disk space 4 Coordinator Problem + msgs 1-3 Observes patterns: "All three mention timeout configuration. ClientAnalyst and NetworkAnalyst both point to body size. Question: Was the deployment a backend-only change or did it include infra?" Round 2: Cross-Examination (Messages 5-8) Msg Agent What They Do 5 ClientAnalyst "I agree with NetworkAnalyst that the load balancer timeout is likely a factor — the 85% freeze point matches the 30-second LB timeout for a 10MB upload on our average upload speed. However, I disagree with ServerAnalyst about thread pool exhaustion because the UI shows a clean 504, not a connection reset." 6 NetworkAnalyst "ServerAnalyst's point about the recent deployment is critical — if the new request parser is 3x slower, that would push uploads past the LB timeout. I can confirm the LB has a 30s idle timeout. The fix is both: increase LB timeout AND optimize the parser." 7 ServerAnalyst "Responding to Coordinator's question: The deployment was backend-only — a new multipart parser using streaming instead of buffered reads. ClientAnalyst is correct that the 504 is from the LB, not the app. The app itself returns 200 after 45 seconds, but the LB kills the connection at 30." 8 Coordinator Produces final structured report with root cause: "The backend deployment introduced a slower multipart parser (45s vs 15s for 10MB). The load balancer's 30s timeout kills the connection at ~85% progress. Fix: immediate — increase LB timeout to 120s. Short-term — optimize parser. Long-term — implement chunked uploads with progress resumption." Notice: The Round 2 analysis is dramatically better than Round 1. Agents reference each other by name, build on each other's findings, and the Coordinator can synthesize a cross-layer causal chain that no single agent could have produced. I made a small adjustment to the issue with Azure Web Apps. Please find the details below from testing carried out using Google Gemini: 8. Bugs I Found & Fixed — Lessons Learned Building this demo taught me several important lessons about multi-agent systems: Bug 1: Agents Speaking Only Once Symptom: Only 4 messages instead of 8. Root cause: The agents list was missing the Coordinator. It was defined in a separate cell and wasn't included in the members list. Fix: All 4 agents must be in the same list passed to GroupChatOrchestration. Bug 2: NetworkAnalyst Says "I'm Ready to Proceed" Symptom: NetworkAnalyst's Round 2 response was just "I'm ready to proceed with the analysis" — no actual content. Root cause: The Coordinator's Round 1 message was assigning tasks ("NetworkAnalyst, please check the load balancer config"), and the agent was acknowledging the assignment instead of analyzing. Fix: Added explicit constraint to Coordinator: "Round 1 MUST NOT assign tasks — observation + questions ONLY." Bug 3: ServerAnalyst Says "As NetworkAnalyst, I..." Symptom: ServerAnalyst's response started with "As NetworkAnalyst, I believe..." Root cause: LLM identity bleeding. When agents share ChatHistory, the LLM sometimes loses track of which agent it's currently playing. This is especially common with Gemini. Fix: Identity anchoring at the very top of every agent's instructions: "You are ONLY ServerAnalyst. You must NEVER speak as ClientAnalyst, NetworkAnalyst, or Coordinator." Bug 4: Gemini Gives Thin/Empty Responses Symptom: Some agents responded with just one sentence or "I concur." Root cause: Gemini 2.5 Flash is more concise than GPT-4o by default. Without explicit length requirements, it takes shortcuts. Fix: Added "Every response MUST be at least 200 words" and "Answer the Coordinator's questions" to every specialist's instructions. Bug 5: Coordinator's Report is 18K Characters Symptom: The Coordinator's Round 2 response was absurdly long — repeating everything every specialist said. Fix: Added word limits: "Round 1 max 300 words, Round 2 max 800 words" and "Synthesize, don't echo." Bug 6: MAX_ROUNDS Math Symptom: With MAX_ROUNDS=9, ClientAnalyst spoke a 3rd time after the Coordinator's final report — breaking the clean 2-round structure. Fix: MAX_ROUNDS must equal (number of agents × number of rounds). For 4 agents × 2 rounds = 8. 9. Running with Different AI Providers The beauty of SK's Strategy Pattern is that you change ONE LINE to switch providers. Everything else — agents, orchestration, callbacks, validation — stays identical. Gemini setup: from semantic_kernel.connectors.ai.google.google_ai import GoogleAIChatCompletion service = GoogleAIChatCompletion( gemini_model_id="gemini-2.5-flash", api_key=os.getenv("GOOGLE_AI_API_KEY"), ) OpenAI Setup from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion service = OpenAIChatCompletion( ai_model_id="gpt-4o", api_key=os.getenv("OPEN_AI_API_KEY"), ) 10. What to Build Next Add Plugins to Agents Give agents real tools — not just LLM reasoning - looks exciting right ;) class NetworkDiagnosticPlugin: (description="Pings a host and returns latency") def ping(self, host: str) -> str: result = subprocess.run(["ping", "-c", "3", host], capture_output=True, text=True) return result.stdout class LogSearchPlugin: (description="Searches server logs for error patterns") def search_logs(self, pattern: str, hours: int = 1) -> str: # Query your log aggregator (Splunk, ELK, Azure Monitor) return query_logs(pattern, hours) Add Filters for Governance Intercept every agent call for PII redaction and audit logging: .filter(filter_type=FilterTypes.FUNCTION_INVOCATION) async def audit_filter(context, next): print(f"[AUDIT] {context.function.name} called by agent") await next(context) print(f"[AUDIT] {context.function.name} returned") Try Different Orchestration Patterns Replace GroupChat with Sequential for a pipeline approach: # Instead of debate, each agent builds on the previous orchestration = SequentialOrchestration( members=[client_agent, network_agent, server_agent, coordinator_agent] ) Or Concurrent for parallel analysis: # All specialists analyze simultaneously, Coordinator aggregates orchestration = ConcurrentOrchestration( members=[client_agent, network_agent, server_agent] ) Deploy to Azure Move from InProcessRuntime to Azure Container Apps for production scaling. The agent code doesn't change — only the runtime. Summary The key insight from building this demo: multi-agent systems produce better results than single agents not because each agent is smarter, but because the debate structure forces cross-domain thinking that a single prompt can never achieve. The Coordinator's final report consistently identifies causal chains that span client, network, and server layers — exactly the kind of insight that production incident response teams need. Semantic Kernel makes this possible with clean separation of concerns: agents define WHAT to analyze, orchestration defines HOW they interact, the manager defines WHO speaks when, the runtime handles WHERE it executes, and callbacks let you OBSERVE everything. Each piece is independently swappable — that's the power of SK from Microsoft. Resources: GitHub: github.com/microsoft/semantic-kernel Docs: learn.microsoft.com/semantic-kernel Orchestration Patterns: learn.microsoft.com/semantic-kernel/frameworks/agent/agent-orchestration Discord: aka.ms/sk/discord Disclaimer: The sample scripts provided in this article are provided AS IS without warranty of any kind. The author is not responsible for any issues, damages, or problems that may arise from using these scripts. Users should thoroughly test any implementation in their environment before deploying to production. Azure services and APIs may change over time, which could affect the functionality of the provided scripts. Always refer to the latest Azure documentation for the most up-to-date information. Thanks for reading this blog! I hope you found it helpful and informative for building AI agents with SK (Semantic Kernel) 😀
ani_ms_emea
Apr 01, 2026 Place Azure
73Views
3likes
0Comments
Applying DevOps Principles on Lean Infrastructure. Lessons From Scaling to 102K Users.
Hi Azure Community, I'm a Microsoft Certified DevOps Engineer, and I want to share an unusual journey. I have been applying DevOps principles on traditional VPS infrastructure to scale to 102,000 users with 99.2% uptime. Why am I posting this in an Azure community? Because I'm planning migration to Azure in 2026, and I want to understand: What mistakes am I already making that will bite me during migration? THE CURRENT SETUP Platform: Social commerce (West Africa) Users: 102,000 active Monthly events: 2 million Uptime: 99.2% Infrastructure: Single VPS Stack: PHP/Laravel, MySQL, Redis Yes - one VPS. No cloud. No Kubernetes. No microservices. WHY I HAVEN'T USED AZURE YET Honest answer: Budget constraints in emerging market startup ecosystem. At our current scale, fully managed Azure services would significantly increase monthly burn before product-market expansion. The funding we raised needs to last through growth milestones. The trade: I manually optimize what Azure would auto-scale. I debug what Application Insights would catch. I do by hand what Azure Functions would automate. DEVOPS PRACTICES THAT KEPT US RUNNING Even on single-server infrastructure, core DevOps principles still apply: CI/CD Pipeline (GitHub Actions) • 3-5 deployments weekly • Zero-downtime deploys • Automated rollback on health check failures • Feature flags for gradual rollouts Monitoring & Observability • Custom monitoring (would love Application Insights) • Real-time alerting • Performance tracking and slow query detection • Resource usage monitoring Automation • Automated backups • Automated database optimization • Automated image compression • Automated security updates Infrastructure as Code • Configs in Git • Deployment scripts • Environment variables • Documented procedures Testing & Quality • Automated test suite • Pre-deployment health checks • Staging environment • Post-deployment verification KEY OPTIMIZATIONS Async Job Processing • Upload endpoint: 8 seconds → 340ms • 4x capacity increase Database Optimization • Feed loading: 6.4 seconds → 280ms • Strategic caching • Batch processing Image Compression • 3-8MB → 180KB (94% reduction) • Critical for mobile users Caching Strategy • Redis for hot data • Query result caching • Smart invalidation Progressive Enhancement • Server-rendered pages • 2-3 second loads on 4G WHAT I'M WORRIED ABOUT FOR AZURE MIGRATION This is where I need your help: Architecture Decisions • App Service vs Functions + managed services? • MySQL vs Azure SQL? • When does cost/benefit flip for managed services? Cost Management • How do startups manage Azure costs during growth? • Reserved instances vs pay-as-you-go? • Which Azure services are worth the premium? Migration Strategy • Lift-and-shift first, or re-architect immediately? • Zero-downtime migration with 102K active users? • Validation approach before full cutover? Monitoring & DevOps • Application Insights - worth it from day one? • Azure DevOps vs GitHub Actions for Azure deployments? • Operational burden reduction with managed services? Development Workflow • Local development against Azure services? • Cost-effective staging environments? • Testing Azure features without constant bills? MY PLANNED MIGRATION PATH Phase 1: Hybrid (Q1 2026) • Azure CDN for static assets • Azure Blob Storage for images • Application Insights trial • Keep compute on VPS Phase 2: Compute Migration (Q2 2026) • App Service for API • Azure Database for MySQL • Azure Cache for Redis • VPS for background jobs Phase 3: Full Azure (Q3 2026) • Azure Functions for processing • Full managed services • Retire VPS QUESTIONS FOR THIS COMMUNITY Question 1: Am I making migration harder by waiting? Should I have started with Azure at higher cost to avoid technical debt? Question 2: What will break when I migrate? What works on VPS but fails in cloud? What assumptions won't hold? Question 3: How do I validate before cutting over? Parallel infrastructure? Gradual traffic shift? Safe patterns? Question 4: Cost optimization from day one? What to optimize immediately vs later? Common cost mistakes? Question 5: DevOps practices that transfer? What stays the same? What needs rethinking for cloud-native? THE BIGGER QUESTION Have you migrated from self-hosted to Azure? What surprised you? I know my setup isn't best practice by Azure standards. But it's working, and I've learned optimization, monitoring, and DevOps fundamentals in practice. Will those lessons transfer? Or am I building habits that cloud will expose as problematic? Looking forward to insights from folks who've made similar migrations. --- About the Author: Microsoft Certified DevOps Engineer and Azure Developer. CTO at social commerce platform scaling in West Africa. Preparing for phased Azure migration in 2026. P.S. I got the Azure certifications to prepare for this migration. Now I need real-world wisdom from people who've actually done it!
Michael_Okpotu_Onoja
Dec 08, 2025 Place Azure
112Views
0likes
0Comments
Boosting Performance with the Latest Generations of Virtual Machines in Azure
Microsoft Azure recently announced the availability of the new generation of VMs (v6)—including the Dl/Dv6 (general purpose) and El/Ev6 (memory-optimized) series. These VMs are powered by the latest Intel Xeon processors and are engineered to deliver: Up to 30% higher per-core performance compared to previous generations. Greater scalability, with options of up to 128 vCPUs (Dv6) and 192 vCPUs (Ev6). Significant enhancements in CPU cache (up to 5× larger), memory bandwidth, and NVMe-enabled storage. Improved security with features like Intel® Total Memory Encryption (TME) and enhanced networking via the new Microsoft Azure Network Adaptor (MANA). By Microsoft Evaluated Virtual Machines and Geekbench Results The table below summarizes the configuration and Geekbench results for the two VMs we tested. VM1 represents a previous-generation machine with more vCPUs and memory, while VM2 is from the new Dld e6 series, showing superior performance despite having fewer vCPUs. VM1 features VM1 - D16S V5 (16 Vcpus - 64GB RAM) VM1 - D16S V5 (16 Vcpus - 64GB RAM) VM2 features VM2 - D16ls v6 (16 Vcpus - 32GB RAM) VM2 - D16ls v6 (16 Vcpus - 32GB RAM) Key Observations: Single-Core Performance: VM2 scores 2013 compared to VM1’s 1570, a 28.2% improvement. This demonstrates that even with half the vCPUs, the new Dld e6 series provides significantly better performance per core. Multi-Core Performance: Despite having fewer cores, VM2 achieves a multi-core score of 12,566 versus 9,454 for VM1, showing a 32.9% increase in performance. VM 1 VM 2 Enhanced Throughput in Specific Workloads: File Compression: 1909 MB/s (VM2) vs. 1654 MB/s (VM1) – a 15.4% improvement. Object Detection: 2851 images/s (VM2) vs. 1592 images/s (VM1) – a remarkable 79.2% improvement. Ray Tracing: 1798 Kpixels/s (VM2) vs. 1512 Kpixels/s (VM1) – an 18.9% boost. These results reflect the significant advancements enabled by the new generation of Intel processors. Score VM 1 VM 1 VM 1 Score VM 2 VM 2 VM 2 Evolution of Hardware in Azure: From Ice Lake-SP to Emerald Rapids Technical Specifications of the Processors Evaluated Understanding the dramatic performance improvements begins with a look at the processor specifications: Intel Xeon Platinum 8370C (Ice Lake-SP) Architecture: Ice Lake-SP Base Frequency: 2.79 GHz Max Frequency: 3.5 GHz L3 Cache: 48 MB Supported Instructions: AVX-512, VNNI, DL Boost VM 1 Intel Xeon Platinum 8573C (Emerald Rapids) Architecture: Emerald Rapids Base Frequency: 2.3 GHz Max Frequency: 4.2 GHz L3 Cache: 260 MB Supported Instructions: AVX-512, AMX, VNNI, DL Boost VM 2 Impact on Performance Cache Size Increase: The jump from 48 MB to 260 MB of L3 cache is a key factor. A larger cache reduces dependency on RAM accesses, thereby lowering latency and significantly boosting performance in memory-intensive workloads such as AI, big data, and scientific simulations. Enhanced Frequency Dynamics: While the base frequency of the Emerald Rapids processor is slightly lower, its higher maximum frequency (4.2 GHz vs. 3.5 GHz) means that under load, performance-critical tasks can benefit from this burst capability. Advanced Instruction Support: The introduction of AMX (Advanced Matrix Extensions) in Emerald Rapids, along with the robust AVX-512 support, optimizes the execution of complex mathematical and AI workloads. Efficiency Gains: These processors also offer improved energy efficiency, reducing the energy consumed per compute unit. This efficiency translates into lower operational costs and a more sustainable cloud environment. Beyond Our Tests: Overview of the New v6 Series While our tests focused on the Dld e6 series, Azure’s new v6 generation includes several families designed for different workloads: 1. Dlsv6 and Dldsv6-series Segment: General purpose with NVMe local storage (where applicable) vCPUs Range: 2 – 128 Memory: 4 – 256 GiB Local Disk: Up to 7,040 GiB (Dldsv6) Highlights: 5× increased CPU cache (up to 300 MB) and higher network bandwidth (up to 54 Gbps) 2. Dsv6 and Ddsv6-series Segment: General purpose vCPUs Range: 2 – 128 Memory: Up to 512 GiB Local Disk: Up to 7,040 GiB in Ddsv6 Highlights: Up to 30% improved performance over the previous Dv5 generation and Azure Boost for enhanced IOPS and network performance 3. Esv6 and Edsv6-series Segment: Memory-optimized vCPUs Range: 2 – 192* (with larger sizes available in Q2) Memory: Up to 1.8 TiB (1832 GiB) Local Disk: Up to 10,560 GiB in Edsv6 Highlights: Ideal for in-memory analytics, relational databases, and enterprise applications requiring vast amounts of RAM Note: Sizes with higher vCPUs and memory (e.g., E128/E192) will be generally available in Q2 of this year. Key Innovations in the v6 Generation Increased CPU Cache: Up to 5× more cache (from 60 MB to 300 MB) dramatically improves data access speeds. NVMe for Storage: Enhanced local and remote storage performance, with up to 3× more IOPS locally and the capability to reach 400k IOPS remotely via Azure Boost. Azure Boost: Delivers higher throughput (up to 12 GB/s remote disk throughput) and improved network bandwidth (up to 200 Gbps for larger sizes). Microsoft Azure Network Adaptor (MANA): Provides improved network stability and performance for both Windows and Linux environments. Intel® Total Memory Encryption (TME): Enhances data security by encrypting the system memory. Scalability: Options ranging from 128 vCPUs/512 GiB RAM in the Dv6 family to 192 vCPUs/1.8 TiB RAM in the Ev6 family. Performance Gains: Benchmarks and internal tests (such as SPEC CPU Integer) indicate improvements of 15%–30% across various workloads including web applications, databases, analytics, and generative AI tasks. My personal perspective and point of view The new Azure v6 VMs mark a significant advancement in cloud computing performance, scalability, and security. Our Geekbench tests clearly show that the Dld e6 series—powered by the latest Intel Xeon Platinum 8573C (Emerald Rapids)—delivers up to 30% better performance than previous-generation machines with more resources. Coupled with the hardware evolution from Ice Lake-SP to Emerald Rapids—which brings a dramatic increase in cache size, improved frequency dynamics, and advanced instruction support—the new v6 generation sets a new standard for high-performance workloads. Whether you’re running critical enterprise applications, data-intensive analytics, or next-generation AI models, the enhanced capabilities of these VMs offer significant benefits in performance, efficiency, and cost-effectiveness. References and Further Reading: Microsoft’s official announcement: Azure Dld e6 VMs Internal tests performed with Geekbench 6.4.0 (AVX2) in the Germany West Central Azure region.
Gabriel_Naranjo
Nov 06, 2025 Place Azure Compute
1.1KViews
0likes
3Comments
Kickstart Conditional Access in Microsoft Entra: Free Starter Pack with Policies & Automation
Introduction Conditional Access (CA) is the backbone of Zero Trust in Microsoft Entra ID. It helps you enforce security without compromising productivity. But rolling out CA can feel risky what if you lock out admins or break apps? To make this easier, I’ve created a free starter pack with: Ready-to-use policy templates (JSON) PowerShell scripts for deployment via Microsoft Graph GitHub Actions workflow for automation Safe rollout strategy using report-only mode Why This Matters Block legacy authentication to reduce attack surface. Require MFA for admins to protect privileged accounts. Handle high-risk sign-ins with compliant device + MFA. Validate impact before enforcing using report-only mode. What’s Inside the Starter Pack ✔ Policies Block legacy authentication Require MFA for admin roles High-risk sign-ins → compliant device + MFA Safety-net report-only baseline ✔ Scripts Deploy policies (deploy-conditional-access.ps1) Export existing policies Toggle report-only mode ✔ Automation GitHub Actions workflow for CI/CD deployment ✔ Docs Usage guide Safe rollout checklist How to Use It Download the repo: GitHub Repo: https://github.com/soaeb7007/entra-ca-starter-pack Install Microsoft Graph PowerShell SDK: Install-Module Microsoft.Graph -Scope CurrentUser Connect-MgGraph -Scopes 'Policy.ReadWrite.ConditionalAccess','Directory.Read.All' Select-MgProfile -Name beta Deploy policies in report-only mode: ./scripts/deploy-conditional-access.ps1 -PolicyPath ./policies -ReportOnly Validate impact in Sign-in logs before enforcing. Safe Rollout Checklist Exclude break-glass accounts, Start with report-only, Validate for 48–72 hours, Roll out to pilot group before org-wide Next Steps Enable report-only mode for new policies. Explore Conditional Access templates in Entra portal. Watch for my next post: “Optimizing Conditional Access for Performance and Security.” What’s your biggest challenge with Conditional Access? Drop it in the comments, I’ll cover the top 3 in my next post.
SoaebRathod
Aug 23, 2025 Place Azure
107Views
0likes
0Comments
Azure Entra Security Copilot: How It’s Changing Identity Protection
Overview Azure Entra Security Copilot is revolutionizing how organizations approach identity protection. By combining the power of generative AI with Microsoft’s deep security insights, it enables faster threat detection, smarter policy recommendations, and simplified incident response. Hands-On Experience After integrating Security Copilot into our Azure Entra environment, here’s what stood out: Natural Language Queries: You can ask things like “Show me risky sign-ins from last week” and get instant, actionable insights. Automated Investigations: It correlates signals across Entra ID, Defender, and Sentinel to surface threats. Policy Recommendations: Based on your environment, it suggests Conditional Access policies to reduce risk. Use Cases 1. Breach Detection Detects anomalies like impossible travel, unfamiliar sign-in patterns, and token theft. Automatically flags high-risk users and suggests remediation steps. 2. Policy Optimization Recommends Conditional Access policies tailored to your org’s risk profile. Helps reduce over-permissive access and enforce least privilege. 3. Incident Response Generates incident summaries and timelines. Suggests next steps and integrates with Microsoft Sentinel for deeper investigation. Comparison with Traditional SIEM Workflows Discussion Starter Have you tried Security Copilot in your environment yet? What use cases have you explored? How does it compare with your existing SIEM or XDR tools? Let’s share insights and build a stronger identity protection strategy together!
SoaebRathod
Aug 23, 2025 Place Azure
80Views
0likes
0Comments
Scaling Smart with Azure: Architecture That Works
Hi Tech Community! I’m Zainab, currently based in Abu Dhabi and serving as Vice President of Finance & HR at Hoddz Trends LLC a global tech solutions company headquartered in Arkansas, USA. While I lead on strategy, people, and financials, I also roll up my sleeves when it comes to tech innovation. In this discussion, I want to explore the real-world challenges of scaling systems with Microsoft Azure. From choosing the right architecture to optimizing performance and cost, I’ll be sharing insights drawn from experience and I’d love to hear yours too. Whether you're building from scratch, migrating legacy systems, or refining deployments, let’s talk about what actually works.
zainab_al_zawjal_faisal
Jul 23, 2025 Place Azure
209Views
0likes
1Comment
Boosting Performance with the Latest Generations of Virtual Machines in Azure
Microsoft Azure recently announced the availability of the new generation of VMs (v6)—including the Dl/Dv6 (general purpose) and El/Ev6 (memory-optimized) series. These VMs are powered by the latest Intel Xeon processors and are engineered to deliver: Up to 30% higher per-core performance compared to previous generations. Greater scalability, with options of up to 128 vCPUs (Dv6) and 192 vCPUs (Ev6). Significant enhancements in CPU cache (up to 5× larger), memory bandwidth, and NVMe-enabled storage. Improved security with features like Intel® Total Memory Encryption (TME) and enhanced networking via the new Microsoft Azure Network Adaptor (MANA). By Microsoft By Microsoft Evaluated Virtual Machines and Geekbench Results The table below summarizes the configuration and Geekbench results for the two VMs we tested. VM1 represents a previous-generation machine with more vCPUs and memory, while VM2 is from the new Dld e6 series, showing superior performance despite having fewer vCPUs. VM1 features VM1 - D16S V5 (16 Vcpus - 64GB RAM) VM1 - D16S V5 (16 Vcpus - 64GB RAM) VM2 features VM2 - D16ls v6 (16 Vcpus - 32GB RAM) VM2 - D16ls v6 (16 Vcpus - 32GB RAM) Key Observations: Single-Core Performance: VM2 scores 2013 compared to VM1’s 1570, a 28.2% improvement. This demonstrates that even with half the vCPUs, the new Dld e6 series provides significantly better performance per core. Multi-Core Performance: Despite having fewer cores, VM2 achieves a multi-core score of 12,566 versus 9,454 for VM1, showing a 32.9% increase in performance. VM 1 VM 2 Enhanced Throughput in Specific Workloads: File Compression: 1909 MB/s (VM2) vs. 1654 MB/s (VM1) – a 15.4% improvement. Object Detection: 2851 images/s (VM2) vs. 1592 images/s (VM1) – a remarkable 79.2% improvement. Ray Tracing: 1798 Kpixels/s (VM2) vs. 1512 Kpixels/s (VM1) – an 18.9% boost. These results reflect the significant advancements enabled by the new generation of Intel processors. Score VM 1 VM 1 VM 1 Score VM 2 VM 2 VM 2 Evolution of Hardware in Azure: From Ice Lake-SP to Emerald Rapids Technical Specifications of the Processors Evaluated Understanding the dramatic performance improvements begins with a look at the processor specifications: Intel Xeon Platinum 8370C (Ice Lake-SP) Architecture: Ice Lake-SP Base Frequency: 2.79 GHz Max Frequency: 3.5 GHz L3 Cache: 48 MB Supported Instructions: AVX-512, VNNI, DL Boost VM 1 Intel Xeon Platinum 8573C (Emerald Rapids) Architecture: Emerald Rapids Base Frequency: 2.3 GHz Max Frequency: 4.2 GHz L3 Cache: 260 MB Supported Instructions: AVX-512, AMX, VNNI, DL Boost VM 2 Impact on Performance Cache Size Increase: The jump from 48 MB to 260 MB of L3 cache is a key factor. A larger cache reduces dependency on RAM accesses, thereby lowering latency and significantly boosting performance in memory-intensive workloads such as AI, big data, and scientific simulations. Enhanced Frequency Dynamics: While the base frequency of the Emerald Rapids processor is slightly lower, its higher maximum frequency (4.2 GHz vs. 3.5 GHz) means that under load, performance-critical tasks can benefit from this burst capability. Advanced Instruction Support: The introduction of AMX (Advanced Matrix Extensions) in Emerald Rapids, along with the robust AVX-512 support, optimizes the execution of complex mathematical and AI workloads. Efficiency Gains: These processors also offer improved energy efficiency, reducing the energy consumed per compute unit. This efficiency translates into lower operational costs and a more sustainable cloud environment. Beyond Our Tests: Overview of the New v6 Series While our tests focused on the Dld e6 series, Azure’s new v6 generation includes several families designed for different workloads: 1. Dlsv6 and Dldsv6-series Segment: General purpose with NVMe local storage (where applicable) vCPUs Range: 2 – 128 Memory: 4 – 256 GiB Local Disk: Up to 7,040 GiB (Dldsv6) Highlights: 5× increased CPU cache (up to 300 MB) and higher network bandwidth (up to 54 Gbps) 2. Dsv6 and Ddsv6-series Segment: General purpose vCPUs Range: 2 – 128 Memory: Up to 512 GiB Local Disk: Up to 7,040 GiB in Ddsv6 Highlights: Up to 30% improved performance over the previous Dv5 generation and Azure Boost for enhanced IOPS and network performance 3. Esv6 and Edsv6-series Segment: Memory-optimized vCPUs Range: 2 – 192* (with larger sizes available in Q2) Memory: Up to 1.8 TiB (1832 GiB) Local Disk: Up to 10,560 GiB in Edsv6 Highlights: Ideal for in-memory analytics, relational databases, and enterprise applications requiring vast amounts of RAM Note: Sizes with higher vCPUs and memory (e.g., E128/E192) will be generally available in Q2 of this year. Key Innovations in the v6 Generation Increased CPU Cache: Up to 5× more cache (from 60 MB to 300 MB) dramatically improves data access speeds. NVMe for Storage: Enhanced local and remote storage performance, with up to 3× more IOPS locally and the capability to reach 400k IOPS remotely via Azure Boost. Azure Boost: Delivers higher throughput (up to 12 GB/s remote disk throughput) and improved network bandwidth (up to 200 Gbps for larger sizes). Microsoft Azure Network Adaptor (MANA): Provides improved network stability and performance for both Windows and Linux environments. Intel® Total Memory Encryption (TME): Enhances data security by encrypting the system memory. Scalability: Options ranging from 128 vCPUs/512 GiB RAM in the Dv6 family to 192 vCPUs/1.8 TiB RAM in the Ev6 family. Performance Gains: Benchmarks and internal tests (such as SPEC CPU Integer) indicate improvements of 15%–30% across various workloads including web applications, databases, analytics, and generative AI tasks. My personal perspective and point of view The new Azure v6 VMs mark a significant advancement in cloud computing performance, scalability, and security. Our Geekbench tests clearly show that the Dld e6 series—powered by the latest Intel Xeon Platinum 8573C (Emerald Rapids)—delivers up to 30% better performance than previous-generation machines with more resources. Coupled with the hardware evolution from Ice Lake-SP to Emerald Rapids—which brings a dramatic increase in cache size, improved frequency dynamics, and advanced instruction support—the new v6 generation sets a new standard for high-performance workloads. Whether you’re running critical enterprise applications, data-intensive analytics, or next-generation AI models, the enhanced capabilities of these VMs offer significant benefits in performance, efficiency, and cost-effectiveness. References and Further Reading: Microsoft’s official announcement: Azure Dld e6 VMs Internal tests performed with Geekbench 6.4.0 (AVX2) in the Germany West Central Azure region.
Gabriel_Naranjo
Feb 17, 2025 Place Azure Infrastructure
1.6KViews
0likes
0Comments
Former Employer Abuse
My former employer, Albert Williams, president of American Security Force Inc., keeps adding my outlook accounts, computers and mobile devices to the company's azure cloud even though I left the company more than a year ago. What can I do to remove myself from his grip? Does Microsoft have a solution against abusive employers?
varle-vs-asf
Nov 05, 2024 Place Azure
101Views
0likes
0Comments
ASR Replication is stuck at "Waiting for First Recovery Point"
Hi All, I am trying to add ASR for one of my setup[VM's] present in Azure environment. But, all the VM's are stuck in "Waiting for First Recovery Point". Please find more details below. Configuration: 1. All the VM's re located in "East US2" 2. All the VM's are installed with Linux[Cent OS] Status of ASR: Created a new Recovery Service Vault “SoakASR-Vault” Enabled Replication for 3 servers for 3 performance servers. You can find the replicated servers in “SoakASR-Vault | Replicated items” Issue: All the 3 servers are stuck at “Waiting for First Recovery Point" Observations: I have created Recovery Services Vault in “Central US”. But, I see Network Mapping as WEST US in "Site Recovery infrastructure | Network mapping" Extension update is failing at "Site Recovery infrastructure | Extension update settings" I see 'Installing Mobility Service and preparing target' with status as “Completed with Information” message. Error ID: 151083 Error Message: Site recovery mobility service update completed with warnings Please help if you have any idea where I am going wrong. Thanks in advance.
psamudrala
Jul 13, 2024 Place Azure Compute
15KViews
0likes
2Comments
Creating Logic App to Identify Low Storage Devices from Intune
Hello everyone, I’m seeking some assistance with creating a Logic App. I need to identify devices in Intune that have 5GB or less of available space and receive an email with the details of these devices, including their names. Is this achievable?
AB21805
Jul 09, 2024 Place Azure
794Views
0likes
3Comments