machine learning
RSAC 2026: What the Sentinel Playbook Generator actually means for SOC automation
RSAC 2026 brought a wave of Sentinel announcements, but the one I keep coming back to is the playbook generator. Not because it's the flashiest, but because it touches something that's been a real operational pain point for years: the gap between what SOC teams need to automate and what they can realistically build and maintain. I want to unpack what this actually changes from an operational perspective, because I think the implications go further than "you can now vibe-code a playbook."

The problem it solves

If you've built and maintained Logic Apps playbooks in Sentinel at any scale, you know the friction. You need a connector for every integration. If there isn't one, you're writing custom HTTP actions with authentication handling, pagination, and error handling, all inside a visual designer that wasn't built for complex branching logic. Debugging is painful. Version control is an afterthought. And when something breaks at 2am, the person on call needs to understand both the Logic Apps runtime AND the security workflow to fix it.

The result in most environments I've seen: teams build a handful of playbooks for the obvious use cases (isolate host, disable account, post to Teams) and then stop. The long tail of automation (the enrichment workflows, the cross-tool correlation, the conditional response chains) stays manual because building it is too expensive relative to the time saved.

What's actually different now

The playbook generator produces Python. Not Logic Apps JSON, not ARM templates: actual Python code with documentation and a visual flowchart. You describe the workflow in natural language, the system proposes a plan, asks clarifying questions, and then generates the code once you approve.

The Integration Profile concept is where this gets interesting. Instead of relying on predefined connectors, you define a base URL, auth method, and credentials for any service, and the generator creates dynamic API calls against it. This means you can automate against ServiceNow, Jira, Slack, your internal CMDB, or any REST API without waiting for Microsoft or a partner to ship a connector.

The embedded VS Code experience with plan mode and act mode is a deliberate design choice. Plan mode lets you iterate on the workflow before any code is generated. Act mode produces the implementation. You can then validate against real alerts and refine through conversation or direct code edits. This is a meaningful improvement over the "deploy and pray" cycle most of us have with Logic Apps.

Where I see the real impact

For environments running Sentinel at scale, the playbook generator could unlock the automation long tail I mentioned above. The workflows that were never worth the Logic Apps development effort might now be worth a 15-minute conversation with the generator. Think: enrichment chains that pull context from three different tools before deciding on a response path, or conditional escalation workflows that factor in asset criticality, time of day, and analyst availability.

There's also an interesting angle for teams that operate across Microsoft and non-Microsoft tooling. If your SOC uses Sentinel for SIEM but has Palo Alto, CrowdStrike, or other vendors in the stack, the Integration Profile approach means you can build cross-vendor response playbooks without middleware.
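To make the Integration Profile idea concrete, here's a rough sketch of the shape a generated enrichment playbook might take. This is my illustration, not documented generator output: the profile wiring, credential injection, and ServiceNow query parameters are all assumptions.

```python
import os
import requests

# Hypothetical Integration Profile: the base URL and credential are illustrative;
# the real feature stores these in the profile, not in your playbook code.
SERVICENOW_PROFILE = {
    "base_url": "https://example.service-now.com/api/now",
    "token": os.environ["SERVICENOW_TOKEN"],  # assumed credential injection
}

def enrich_and_decide(alert: dict) -> dict:
    """Pull ticket context for the alert's host, then pick a response path."""
    resp = requests.get(
        f"{SERVICENOW_PROFILE['base_url']}/table/incident",
        params={"cmdb_ci.name": alert["hostname"], "sysparm_limit": 5},
        headers={"Authorization": f"Bearer {SERVICENOW_PROFILE['token']}"},
        timeout=30,
    )
    resp.raise_for_status()
    open_tickets = resp.json().get("result", [])

    # Conditional response chain: escalate only for critical assets that have
    # no open ticket; otherwise just annotate the incident with context.
    if not open_tickets and alert.get("asset_criticality") == "high":
        return {"action": "isolate_host", "reason": "critical asset, no open ticket"}
    return {"action": "annotate_incident", "related_tickets": len(open_tickets)}
```

The point is less the specific calls than the form: plain Python you can read, diff, and test, instead of a Logic Apps JSON export.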
The questions I'd genuinely like to hear about

A few things that aren't clear from the documentation and that I think matter for production use:

- Security Copilot dependency: The prerequisites require a Security Copilot workspace with EU or US capacity. Someone in the blog comments already flagged this as a potential blocker for organizations that have Sentinel but not Security Copilot. Is this a hard requirement going forward, or will there be a path for Sentinel-only customers?
- Code lifecycle management: The generated Python runs... where exactly? What's the execution runtime? How do you version control, test, and promote these playbooks across dev/staging/prod? Logic Apps had ARM templates and CI/CD patterns. What's the equivalent here?
- Integration Profile security: You're storing credentials for potentially every tool in your security stack inside these profiles. What's the credential storage model? Is this backed by Key Vault? How do you rotate credentials without breaking running playbooks?
- Debugging in production: When a generated playbook fails at 2am, what does the troubleshooting experience look like? Do you get structured logs, execution traces, retry telemetry? Or are you reading Python stack traces?
- Coexistence with Logic Apps: Most environments won't rip and replace overnight. What's the intended coexistence model between generated Python playbooks and existing Logic Apps automation rules?

I'm genuinely optimistic about this direction. Moving from a low-code visual designer to an AI-assisted coding model with transparent, editable output feels like the right architectural bet for where SOC automation needs to go. But the operational details around lifecycle, security, and debugging will determine whether this becomes a production staple or stays a demo-only feature. Would be interested to hear from anyone who's been in the preview: what's the reality like compared to the pitch?

Agentic AI in IT: Self-Healing Systems and Smart Incident Response (Microsoft Ecosystem Perspective)
Modern IT infrastructures are evolving rapidly. Organizations now run workloads across hybrid cloud environments, microservices architectures, Kubernetes clusters, and distributed applications. Managing this complexity with traditional monitoring tools is becoming increasingly difficult. https://dellenny.com/agentic-ai-in-it-self-healing-systems-and-smart-incident-response-microsoft-ecosystem-perspective/

Building with Azure OpenAI Sora: A Complete Guide to AI Video Generation
In this comprehensive guide, we'll explore how to integrate both Sora 1 and Sora 2 models from Azure OpenAI Service into a production web application. We'll cover API integration, request body parameters, cost analysis, limitations, and the key differences between using Azure AI Foundry endpoints versus OpenAI's native API.

Table of Contents
- Introduction to Sora Models
- Azure AI Foundry vs. OpenAI API Structure
- API Integration: Request Body Parameters
- Video Generation Modes
- Cost Analysis per Generation
- Technical Limitations & Constraints
- Resolution & Duration Support
- Implementation Best Practices

Introduction to Sora Models

Sora is OpenAI's groundbreaking text-to-video model that generates realistic videos from natural language descriptions. Azure AI Foundry provides access to two versions:

- Sora 1: The original model, focused primarily on text-to-video generation with extensive resolution options (480p to 1080p) and flexible duration (1-20 seconds)
- Sora 2: The enhanced version with native audio generation and multiple generation modes (text-to-video, image-to-video, video-to-video remix), but more constrained resolution options (720p only in public preview)

Azure AI Foundry vs. OpenAI API Structure

Key Architectural Differences

Sora 1 uses Azure's traditional deployment-based API structure:
- Endpoint pattern: https://{resource-name}.openai.azure.com/openai/deployments/{deployment-name}/...
- Parameters: Azure-specific naming like n_seconds and n_variants, with separate width/height fields
- Job management: uses /jobs/{id} for status polling
- Content download: uses /video/generations/{generation_id}/content/video

Sora 2 adapts OpenAI's v1 API format while still being hosted on Azure:
- Endpoint pattern: https://{resource-name}.openai.azure.com/openai/deployments/{deployment-name}/videos
- Parameters: OpenAI-style naming like seconds (string) and size (a combined dimension string like "1280x720")
- Job management: uses /videos/{video_id} for status polling
- Content download: uses /videos/{video_id}/content

Why This Matters
This architectural difference requires conditional request formatting in your code:

```javascript
const isSora2 = deployment.toLowerCase().includes('sora-2');

if (isSora2) {
  requestBody = {
    model: deployment,
    prompt,
    size: `${width}x${height}`,   // Combined format
    seconds: duration.toString(), // String type
  };
} else {
  requestBody = {
    model: deployment,
    prompt,
    height,                         // Separate dimensions
    width,
    n_seconds: duration.toString(), // Azure naming
    n_variants: variants,
  };
}
```

API Integration: Request Body Parameters

Sora 1 API Parameters

Standard Text-to-Video Request:

```json
{
  "model": "sora-1",
  "prompt": "Wide shot of a child flying a red kite in a grassy park, golden hour sunlight, camera slowly pans upward.",
  "height": "720",
  "width": "1280",
  "n_seconds": "12",
  "n_variants": "2"
}
```

Parameter Details:
- model (String, Required): Your Azure deployment name
- prompt (String, Required): Natural language description of the video (max 32,000 chars)
- height (String, Required): Video height in pixels
- width (String, Required): Video width in pixels
- n_seconds (String, Required): Duration (1-20 seconds)
- n_variants (String, Optional): Number of variations to generate (1-4, constrained by resolution)

Sora 2 API Parameters

Text-to-Video Request:

```json
{
  "model": "sora-2",
  "prompt": "A serene mountain landscape with cascading waterfalls, cinematic drone shot",
  "size": "1280x720",
  "seconds": "12"
}
```

Image-to-Video Request (uses FormData):

```javascript
const formData = new FormData();
formData.append('model', 'sora-2');
formData.append('prompt', 'Animate this image with gentle wind movement');
formData.append('size', '1280x720');
formData.append('seconds', '8');
formData.append('input_reference', imageFile); // JPEG/PNG/WebP
```

Video-to-Video Remix Request:
- Endpoint: POST .../videos/{video_id}/remix
- Body: only { "prompt": "your new description" }
- The original video's structure, motion, and framing are reused while applying the new prompt.

Parameter Details:
- model (String, Optional): Your deployment name
- prompt (String, Required): Video description
- size (String, Optional): Either "720x1280" or "1280x720" (defaults to "720x1280")
- seconds (String, Optional): "4", "8", or "12" (defaults to "4")
- input_reference (File, Optional): Reference image for image-to-video mode
- remix_video_id (String, URL parameter): ID of the video to remix

Video Generation Modes

1. Text-to-Video (Both Models)

The foundational mode, where you provide a text prompt describing the desired video.

Implementation:

```javascript
const response = await fetch(endpoint, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'api-key': apiKey,
  },
  body: JSON.stringify({
    model: deployment,
    prompt: "A train journey through mountains with dramatic lighting",
    size: "1280x720",
    seconds: "12",
  }),
});
```

Best Practices:
- Include shot type (wide, close-up, aerial)
- Describe subject, action, and environment
- Specify lighting conditions (golden hour, dramatic, soft)
- Add camera movement if desired (pans, tilts, tracking shots)

2. Image-to-Video (Sora 2 Only)

Generate a video anchored to or starting from a reference image.

Key Requirements:
- Supported formats: JPEG, PNG, WebP
- Image dimensions must exactly match the selected video resolution
- Our implementation automatically resizes uploaded images to match

Implementation Detail:

```javascript
// Resize image to match video dimensions
const targetWidth = parseInt(width);
const targetHeight = parseInt(height);
const resizedImage = await resizeImage(inputReference, targetWidth, targetHeight);

// Send as multipart/form-data
formData.append('input_reference', resizedImage);
```
3. Video-to-Video Remix (Sora 2 Only)

Create variations of existing videos while preserving their structure and motion.

Use Cases:
- Change weather conditions in the same scene
- Modify time of day while keeping camera movement
- Swap subjects while maintaining composition
- Adjust artistic style or color grading

Endpoint Structure: POST {base_url}/videos/{original_video_id}/remix?api-version=2024-08-01-preview

Implementation:

```javascript
let requestEndpoint = endpoint;
if (isSora2 && remixVideoId) {
  const [baseUrl, queryParams] = endpoint.split('?');
  const root = baseUrl.replace(/\/videos$/, '');
  requestEndpoint = `${root}/videos/${remixVideoId}/remix${queryParams ? '?' + queryParams : ''}`;
}
```

Cost Analysis per Generation

Sora 1 Pricing Model

Base rate: ~$0.05 per second per variant at 720p. Cost scales linearly with pixel count.

Formula:

```javascript
const basePrice = 0.05;
const basePixels = 1280 * 720; // Reference resolution
const currentPixels = width * height;
const resolutionMultiplier = currentPixels / basePixels;
const totalCost = basePrice * duration * variants * resolutionMultiplier;
```

Examples:
- 720p (1280×720), 12 seconds, 1 variant: $0.60
- 1080p (1920×1080), 12 seconds, 1 variant: $1.35
- 720p, 12 seconds, 2 variants: $1.20

Sora 2 Pricing Model

Flat rate: $0.10 per second per variant (no resolution scaling in public preview).

Formula:

```javascript
const totalCost = 0.10 * duration * variants;
```

Examples:
- 720p (1280×720), 4 seconds: $0.40
- 720p (1280×720), 12 seconds: $1.20
- 720p (720×1280), 8 seconds: $0.80

Note: since Sora 2 currently only supports 720p in public preview, resolution doesn't affect cost; only duration matters.

Cost Comparison

| Scenario | Sora 1 (720p) | Sora 2 (720p) | Winner |
|---|---|---|---|
| 4s video | $0.20 | $0.40 | Sora 1 |
| 12s video | $0.60 | $1.20 | Sora 1 |
| 12s + audio | N/A (no audio) | $1.20 | Sora 2 (unique) |
| Image-to-video | N/A | $0.40-$1.20 | Sora 2 (unique) |

Recommendation: use Sora 1 for cost-effective silent videos at various resolutions. Use Sora 2 when you need audio, image/video inputs, or remix capabilities.

Technical Limitations & Constraints

Sora 1 Limitations
- Resolution options: 9 supported resolutions from 480×480 to 1920×1080, including square, portrait, and landscape formats. Full list: 480×480, 480×854, 854×480, 720×720, 720×1280, 1280×720, 1080×1080, 1080×1920, 1920×1080
- Duration: flexible, 1 to 20 seconds; any integer value within range
- Variants: depend on resolution. 1080p: variants disabled (n_variants must be 1); 720p: max 2 variants; other resolutions: max 4 variants
- Concurrent jobs: maximum 2 jobs running simultaneously
- Job expiration: videos expire 24 hours after generation
- Audio: no audio generation (silent videos only)

Sora 2 Limitations
- Resolution options (public preview): only 2 options, 720×1280 (portrait) or 1280×720 (landscape); no square formats; no 1080p support in the current preview
- Duration: fixed options only (4, 8, or 12 seconds); no custom durations; defaults to 4 seconds if not specified
- Variants: not prominently supported in the current API documentation; the focus is on single high-quality generations with audio
- Concurrent jobs: maximum 2 jobs (same as Sora 1)
- Job expiration: 24 hours (same as Sora 1)
- Audio: native audio generation included (dialogue, sound effects, ambience)

Shared Constraints
- Concurrent processing: both models enforce a limit of 2 concurrent video jobs per Azure resource. You must wait for one job to complete before starting a third.
- Job lifecycle: queued → preprocessing → processing/running → completed
- Download window: videos are available for 24 hours after completion.
After expiration, you must regenerate the video.
- Generation time: typically 1-5 minutes depending on resolution, duration, and API load; can occasionally take longer during high demand.

Resolution & Duration Support Matrix

Sora 1 Support Matrix

| Resolution | Aspect Ratio | Max Variants | Duration Range | Use Case |
|---|---|---|---|---|
| 480×480 | Square | 4 | 1-20s | Social thumbnails |
| 480×854 | Portrait | 4 | 1-20s | Mobile stories |
| 854×480 | Landscape | 4 | 1-20s | Quick previews |
| 720×720 | Square | 4 | 1-20s | Instagram posts |
| 720×1280 | Portrait | 2 | 1-20s | TikTok/Reels |
| 1280×720 | Landscape | 2 | 1-20s | YouTube shorts |
| 1080×1080 | Square | 1 | 1-20s | Premium social |
| 1080×1920 | Portrait | 1 | 1-20s | Premium vertical |
| 1920×1080 | Landscape | 1 | 1-20s | Full HD content |

Sora 2 Support Matrix

| Resolution | Aspect Ratio | Duration Options | Audio | Generation Modes |
|---|---|---|---|---|
| 720×1280 | Portrait | 4s, 8s, 12s | Yes | Text, Image, Video Remix |
| 1280×720 | Landscape | 4s, 8s, 12s | Yes | Text, Image, Video Remix |

Note: Sora 2's limited resolution options in public preview are expected to expand in future releases.

Implementation Best Practices

1. Job Status Polling Strategy

Implement adaptive backoff to avoid overwhelming the API:

```javascript
const maxAttempts = 180; // 15 minutes max
let attempts = 0;
const baseDelayMs = 3000; // Start with 3 seconds

while (attempts < maxAttempts) {
  const response = await fetch(statusUrl, {
    headers: { 'api-key': apiKey },
  });

  if (response.status === 404) {
    // Job not ready yet, wait longer
    const delayMs = Math.min(15000, baseDelayMs + attempts * 1000);
    await new Promise(r => setTimeout(r, delayMs));
    attempts++;
    continue;
  }

  const job = await response.json();

  // Check completion (different status values for Sora 1 vs 2)
  const isCompleted = isSora2
    ? job.status === 'completed'
    : job.status === 'succeeded';
  if (isCompleted) break;

  // Adaptive backoff
  const delayMs = Math.min(15000, baseDelayMs + attempts * 1000);
  await new Promise(r => setTimeout(r, delayMs));
  attempts++;
}
```

2. Handling Different Response Structures

Sora 1 video download:

```javascript
const generations = Array.isArray(job.generations) ? job.generations : [];
const genId = generations[0]?.id;
const videoUrl = `${root}/${genId}/content/video`;
```

Sora 2 video download:

```javascript
const videoUrl = `${root}/videos/${jobId}/content`;
```

3. Error Handling

```javascript
try {
  const response = await fetch(endpoint, fetchOptions);
  if (!response.ok) {
    const error = await response.text();
    throw new Error(`Video generation failed: ${error}`);
  }
  // ... handle successful response
} catch (error) {
  console.error('[VideoGen] Error:', error);
  // Implement retry logic or user notification
}
```

4. Image Preprocessing for Image-to-Video

Always resize images to match the target video resolution:

```typescript
async function resizeImage(file: File, targetWidth: number, targetHeight: number): Promise<File> {
  return new Promise((resolve, reject) => {
    const img = new Image();
    const canvas = document.createElement('canvas');
    const ctx = canvas.getContext('2d');

    img.onload = () => {
      canvas.width = targetWidth;
      canvas.height = targetHeight;
      ctx.drawImage(img, 0, 0, targetWidth, targetHeight);
      canvas.toBlob((blob) => {
        if (blob) {
          const resizedFile = new File([blob], file.name, { type: file.type });
          resolve(resizedFile);
        } else {
          reject(new Error('Failed to create resized image blob'));
        }
      }, file.type);
    };
    img.onerror = () => reject(new Error('Failed to load image'));
    img.src = URL.createObjectURL(file);
  });
}
```
5. Cost Tracking

Implement cost estimation before generation and tracking after:

```javascript
// Pre-generation estimate
const estimatedCost = calculateCost(width, height, duration, variants, soraVersion);

// Save generation record
await saveGenerationRecord({
  prompt,
  soraModel: soraVersion,
  duration: parseInt(duration),
  resolution: `${width}x${height}`,
  variants: parseInt(variants),
  generationMode: mode,
  estimatedCost,
  status: 'queued',
  jobId: job.id,
});

// Update after completion
await updateGenerationStatus(jobId, 'completed', { videoId: finalVideoId });
```

6. Progressive User Feedback

Provide detailed status updates during the generation process:

```typescript
const statusMessages: Record<string, string> = {
  'preprocessing': 'Preprocessing your request...',
  'running': 'Generating video...',
  'processing': 'Processing video...',
  'queued': 'Job queued...',
  'in_progress': 'Generating video...',
};

onProgress?.(statusMessages[job.status] || `Status: ${job.status}`);
```

Conclusion

Building with Azure OpenAI's Sora models requires understanding the nuanced differences between Sora 1 and Sora 2, both in API structure and capabilities. Key takeaways:

- Choose the right model: Sora 1 for resolution flexibility and cost-effectiveness; Sora 2 for audio, image inputs, and remix capabilities
- Handle API differences: implement conditional logic for parameter formatting and status polling based on model version
- Respect limitations: plan around concurrent job limits, resolution constraints, and 24-hour expiration windows
- Optimize costs: calculate estimates upfront and track actual usage for better budget management
- Provide great UX: implement adaptive polling, progressive status updates, and clear error messages

The future of AI video generation is exciting, and Azure AI Foundry provides production-ready access to these powerful models. As Sora 2 matures and limitations are lifted (especially resolution options), we'll see even more creative applications emerge.

Resources:
- Azure AI Foundry Sora Documentation
- OpenAI Sora API Reference
- Azure OpenAI Service Pricing

This blog post is based on real-world implementation experience building LemonGrab, my AI video generation platform that integrates both Sora 1 and Sora 2 through Azure AI Foundry. The code examples are extracted from production usage.

What is trending in Hugging Face on Microsoft Foundry? Feb 2, 2026
Open-source AI is moving fast, with important breakthroughs in reasoning, agentic systems, multimodality, and efficiency emerging every day. Hugging Face has been a leading platform where researchers, startups, and developers share and discover new models. Microsoft Foundry brings these trending Hugging Face models into a production-ready experience, where developers can explore, evaluate, and deploy them within their Azure environment.

Our weekly Model Mondays series highlights Hugging Face models available in Foundry, focusing on what matters most to developers: why a model is interesting, where it fits, and how to put it to work quickly. This week's Model Mondays edition highlights three Hugging Face models: a powerful Mixture-of-Experts model from Z.AI designed for lightweight deployment, Meta's unified foundation model for image and video segmentation, and MiniMax's latest open-source agentic model optimized for complex workflows.

Models of the week

Z.AI's GLM-4.7-Flash

Model Basics
- Model name: zai-org/GLM-4.7-Flash
- Parameters / size: 30B total / 3B active
- Default settings: 131,072 max new tokens
- Primary task: Agentic, reasoning, and coding

Why this model matters
- Why it's interesting: It utilizes a Mixture-of-Experts (MoE) architecture (30B total parameters, 3B active) to offer a new option for lightweight deployment. It demonstrates strong performance on logic and reasoning benchmarks, outperforming similarly sized models like gpt-oss-20b on the AIME 25 and GPQA benchmarks. It supports advanced inference features like a "Preserved Thinking" mode for multi-turn agentic tasks.
- Best-fit use cases: Lightweight local deployment, multi-turn agentic tasks, and logical reasoning applications.
- What's notable: From the Foundry catalog, users can deploy on an A100 instance, or run unsloth/GLM-4.7-Flash-GGUF on a CPU. GLM-4.7-Flash achieves open-source SOTA scores among models of comparable size, and compared to similarly sized models it demonstrates superior frontend and backend development capabilities. Click to see more: https://docs.z.ai

Try it

| Use case | Best-practice prompt pattern |
|---|---|
| Agentic coding (multi-step repo work, debugging, refactoring) | Treat the model as an autonomous coding agent, not a snippet generator. Explicitly require task decomposition and step-by-step execution, then a single consolidated result. |
| Long-context agent workflows (local or low-cost autonomous agents) | Call out long-horizon consistency and context preservation. Instruct the model to retain earlier assumptions and decisions across turns. |

Now that you know GLM-4.7-Flash works best when you give it a clear goal and let it reason through a bounded task, here's an example prompt that a product or engineering team might use to identify risks and propose mitigations:

"You are a software reliability analyst for a mid-scale SaaS platform. Review recent incident reports, production logs, and customer issues to uncover edge-case failures outside normal usage (e.g., rare inputs, boundary conditions, timing/concurrency issues, config drift, or unexpected feature interactions). Prioritize low-frequency, high-impact risks that standard testing misses. Recommend minimal, low-cost fixes (validation, guardrails, fallback logic, or documentation). Deliver a concise executive summary with sections: Observed Edge Cases, Root Causes, User Impact, Recommended Lightweight Fixes, and Validation Steps."
Meta's Segment Anything 3 (SAM3)

Model Basics
- Model name: facebook/sam3
- Parameters / size: 0.9B
- Primary task: Mask generation, Promptable Concept Segmentation (PCS)

Why this model matters
- Why it's interesting: It handles a vastly larger set of open-vocabulary prompts than SAM 2 and unifies image and video segmentation capabilities. It includes a "SAM 3 Tracker" mode that acts as a drop-in replacement for SAM 2 workflows with improved performance.
- Best-fit use cases: Open-vocabulary object detection, video object tracking, and automatic mask generation.
- What's notable: Introduces Promptable Concept Segmentation (PCS), allowing users to find all matching objects (e.g., "dial") via a text prompt rather than just single instances.

Try it

This model enables users to identify specific objects within video footage and isolate them over extended periods. With just one line of code, it is possible to detect multiple similar objects simultaneously. The accompanying GIF demonstrates how SAM3 efficiently highlights players wearing white on the field as they appear and disappear from view. Additional examples are available in the following repository: https://github.com/facebookresearch/sam3/blob/main/assets/player.gif

| Use case | Best-practice prompt pattern |
|---|---|
| Concept segmentation in images | Treat SAM 3 as a concept detector, not an interactive click tool. Use short, concrete noun-phrase concept prompts instead of describing the scene or asking questions. Example prompt: "yellow school bus" or "shipping containers". Avoid verbs or full sentences. |
| Video segmentation + object tracking | Specify the same concept prompt once, then apply it across the video sequence. Do not restate the prompt per frame; let the model maintain identity continuity. Example: "person wearing a red jersey". |
| Hard-to-name or visually subtle objects | Use exemplar-based prompts (an image region or box) when text alone is ambiguous. Optionally combine positive and negative exemplars to refine the concept. Avoid over-constraining with long descriptions. |

Using the GIF above as a leading example, here is a prompt that shows how SAM 3 turns raw sports footage into structured, reusable data. By identifying and tracking players based on visual concepts like jersey color, sports leagues can turn tracked data into interactive experiences where automated player identification relays stats, fun facts, and more when built into a larger application. Here is a prompt that will let you start identifying specific players across video:

"Act as a sports analytics operator analyzing football match footage. Segment and track all football players wearing blue jerseys across the video. Generate pixel-accurate segmentation masks for each player and assign persistent instance IDs that remain stable during camera movement, zoom, and player occlusion. Exclude referees, opposing team jerseys, sidelines, and crowd. Output frame-level masks and tracking metadata suitable for overlays, player statistics, and downstream analytics pipelines."

MiniMax AI's MiniMax-M2.1

Model Basics
- Model name: MiniMaxAI/MiniMax-M2.1
- Parameters / size: 229B total / 10B active
- Default settings: 200,000 max new tokens
- Primary task: Agentic and coding

Why this model matters
- Why it's interesting: It is optimized for robustness in coding, tool use, and long-horizon planning, outperforming Claude Sonnet 4.5 in multilingual scenarios. It excels in full-stack application development, capable of architecting apps "from zero to one."
Previous coding models focused on Python optimization; M2.1 brings enhanced capabilities in Rust, Java, Golang, C++, Kotlin, Objective-C, TypeScript, JavaScript, and other languages. The model delivers exceptional stability across various coding agent frameworks.
- Best-fit use cases: End-to-end agentic coding, long-horizon tool-using agents, and long-context reasoning and analysis.
- What's notable: The release of open-source weights for M2.1 delivers a massive leap over M2 on software engineering leaderboards. https://www.minimax.io/

Try it

| Use case | Best-practice prompt pattern |
|---|---|
| End-to-end agentic coding (multi-file edits, run-fix loops) | Treat the model as an autonomous coding agent, not a snippet generator. Explicitly require task decomposition and step-by-step execution, then a single consolidated result. |
| Long-horizon tool-using agents (shell, browser, Python) | Explicitly request stepwise planning and sequential tool use. M2.1's interleaved thinking and improved instruction-constraint handling are designed for complex, multi-step analytical tasks that require evidence tracking and coherent synthesis, not conversational back-and-forth. |
| Long-context reasoning & analysis (large documents / logs) | Declare the scope and desired output structure up front. MiniMax-M2.1 performs best when the objective and final artifact are clear, allowing it to manage long context and maintain coherence. |

Because MiniMax-M2.1 is designed to act as a long-horizon analytical agent, it shines when you give it a clear end goal and let it work through large volumes of information. Here's a prompt a risk or compliance team could use in practice:

"You are a financial risk analysis agent. Analyze the following transaction logs and compliance policy documents to identify potential regulatory violations and systemic risk patterns. Plan your approach before executing. Work through the data step by step, referencing evidence where relevant. Deliver a final report with the following sections: Key Risk Patterns Identified, Supporting Evidence, Potential Regulatory Impact, Recommended Mitigations. Your response should be a complete, executive-ready report, not a conversational draft."

Getting started

You can deploy open-source Hugging Face models directly in Microsoft Foundry by browsing the Hugging Face collection in the Foundry model catalog and deploying to managed endpoints in just a few clicks. You can also start from the Hugging Face Hub: select any supported model and choose "Deploy on Microsoft Foundry", which brings you straight into Azure with secure, scalable inference already configured. A minimal call against a deployed endpoint is sketched after the links below. Learn how to discover models and deploy them using the Microsoft Foundry documentation.

- Follow along the Model Mondays series and access the GitHub to stay up to date on the latest
- Read Hugging Face on Azure docs
- Learn about one-click deployments from the Hugging Face Hub on Microsoft Foundry
- Explore models in Microsoft Foundry
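As referenced above, here is a minimal sketch of calling a Hugging Face model once it is deployed to a managed endpoint. The URL, key variable, and payload shape (the common Hugging Face inputs/parameters convention) are assumptions; check your deployment's consumption details for the real contract.

```python
import os
import requests

# Hypothetical values; copy the real scoring URL and key from your deployment.
ENDPOINT = "https://my-foundry-endpoint.eastus2.inference.ml.azure.com/score"
API_KEY = os.environ["FOUNDRY_ENDPOINT_KEY"]

payload = {
    "inputs": "Summarize the key risks in this incident report: ...",
    "parameters": {"max_new_tokens": 256, "temperature": 0.2},
}

resp = requests.post(
    ENDPOINT,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```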
Beyond the Model: Empower your AI with Data Grounding and Model Training

Discover how Microsoft Foundry goes beyond foundational models to deliver enterprise-grade AI solutions. Learn how data grounding, model tuning, and agentic orchestration unlock faster time-to-value, improved accuracy, and scalable workflows across industries.

Unlocking Efficient and Secure AI for Android with Foundry Local
The ability to run advanced AI models directly on smartphones is transforming the mobile landscape. Foundry Local for Android simplifies the integration of generative AI models, allowing teams to deliver sophisticated, secure, and low-latency AI experiences natively on mobile devices. This post highlights Foundry Local for Android as a compelling solution for Android developers, helping them efficiently build and deploy powerful on-device AI capabilities within their applications.

The Challenges of Deploying AI on Mobile Devices

On-device AI offers the promise of offline capabilities, enhanced privacy, and low-latency processing. However, implementing these capabilities on mobile devices introduces several technical obstacles:

- Limited computing and storage: Mobile devices operate with constrained processing power and storage compared to traditional PCs. Even the most compact language models can occupy significant space and demand substantial computational resources. Efficient solutions for model and runtime optimization are critical for successful deployment.
- Concerns about app size: Integrating large AI models and libraries can dramatically increase application size, reducing install rates and degrading other app features. Providing advanced AI capabilities while keeping the application compact and efficient remains a challenge.
- Complexity of development and integration: Most mobile development teams are not specialized in machine learning, and the process of adapting, optimizing, and deploying models for mobile inference can be resource intensive. Streamlined APIs and pre-optimized models simplify integration and accelerate time to market.

Introducing Foundry Local for Android

Foundry Local is designed as a comprehensive on-device AI solution, featuring pre-optimized models, a cross-platform inference engine, and intuitive APIs for seamless integration. Initially announced at //Build 2025 with support for Windows and macOS desktops, Foundry Local now extends its capabilities to Android in private preview. You can sign up for the private preview at https://aka.ms/foundrylocal-androidprp for early evaluation and feedback.

To meet the demands of production deployments, Foundry Local for Android is architected as a dedicated Android app paired with an SDK. The app manages model distribution, hosts the AI runtime, and operates as a specialized background service. Client applications interface with this service using a lightweight Foundry Local Android SDK, ensuring minimal overhead and streamlined connectivity.

- One Model, Multiple Apps: Foundry Local centralizes model management, ensuring that if multiple applications use the same model in Foundry Local, it is downloaded and stored only once. This approach optimizes storage and streamlines resource usage.
- Minimal App Footprint: Client applications are freed from embedding bulky machine learning libraries and models, which avoids ballooning app size and memory usage.
- Runs Separately from Client Apps: Foundry Local operates independently of client applications, so developers benefit from continuous enhancements without the need for frequent app releases.

Customer Story: PhonePe

PhonePe is one of India's largest consumer payments platforms, enabling access to payments and financial services for hundreds of millions of people across the country. With Foundry Local, PhonePe is enabling AI that allows their users to gain deeper insights into their transactions and payments behavior directly on their mobile device.
And because inferencing happens locally, all data stays private and secure. This collaboration addresses PhonePe's key priority of delivering an AI experience that upholds privacy. Foundry Local enables PhonePe to differentiate their app experience in a competitive market using AI while ensuring compliance with privacy commitments. Explore their journey here: PhonePe Product Showcase at Microsoft Ignite 2025

Call to Action

Foundry Local equips Android apps with on-device AI, supporting the development of smarter applications for the future. Developers are able to build efficient and secure AI capabilities into their apps, even without extensive expertise in artificial intelligence. See more about Foundry Local in action in this episode of Microsoft Mechanics: https://aka.ms/FL_IGNITE_MSMechanics

We look forward to seeing you light up AI capabilities in your Android app with Foundry Local. Don't miss our private preview: https://aka.ms/foundrylocal-androidprp. We appreciate your feedback, as it will help us make our product better.

Thanks to the contribution from NimbleEdge, which delivers real-time, on-device personalization for millions of mobile devices. NimbleEdge's mobile technology expertise helps Foundry Local deliver a better experience for Android users.

Get to know the core Foundry solutions
Foundry includes specialized services for vision, language, documents, and search, plus Microsoft Foundry for orchestration and governance. Here's what each does and why it matters:

Azure Vision
With Azure Vision, you can detect common objects in images; generate captions, descriptions, and tags based on image contents; and read text in images. Example: Automate visual inspections or extract text from scanned documents.

Azure Language
Azure Language helps organizations understand and work with text at scale. It can identify key information, gauge sentiment, and create summaries from large volumes of content. It also supports building conversational experiences and question-answering tools, making it easier to deliver fast, accurate responses to customers and employees. Example: Understand customer feedback or translate text into multiple languages.

Azure Document Intelligence
With Azure Document Intelligence, you can use pre-built or custom models to extract fields from complex documents such as invoices, receipts, and forms. Example: Automate invoice processing or contract review.

Azure Search
Azure Search helps you find the right information quickly by turning your content into a searchable index. It uses AI to understand and organize data, making it easier to retrieve relevant insights. This capability is often used to connect enterprise data with generative AI, ensuring responses are accurate and grounded in trusted information. Example: Help employees retrieve policies or product details without digging through files.

Microsoft Foundry
Microsoft Foundry acts as the orchestration and governance layer for generative AI and AI agents. It provides tools for model selection, safety, observability, and lifecycle management. Example: Coordinate workflows that combine multiple AI capabilities with compliance and monitoring.

Business leaders often ask: which Foundry tool should I use? The answer depends on your workflow. For example:
- Are you trying to automate document-heavy processes like invoice handling or contract review?
- Do you need to improve customer engagement with multilingual support or sentiment analysis?
- Or are you looking to orchestrate generative AI across multiple processes for marketing or operations?

Connecting these needs to the right Foundry solution ensures you invest in technology that delivers measurable results.

Architecting the Next-Generation Customer Tiering System
Authors: Sailing Ni*, Joy Yu*, Peng Yang*, Richard Sie*, Yifei Wang*
*These authors contributed equally.

Affiliation: Master of Science in Business Analytics (MSBA), UCLA Anderson School of Management, Los Angeles, California 90095, United States (conducted December 2025)

Acknowledgment: This research was conducted as part of a Microsoft-sponsored Capstone Project, led by Juhi Singh and Bonnie Ao from the Microsoft MCAPS AI Transformation Office.

Microsoft's global B2B software business classifies customers into four tiers to guide coverage, investment, and sales strategy. However, the legacy tiering framework mixes historical rules with manual heuristics, causing several issues:

- Tiers do not consistently reflect customer potential or revenue importance.
- Statistical coherence and business KPIs (TPA, TCI, SFI) are not optimized or enforced.
- Tier distributions are imbalanced due to legacy ±1 movement and capacity rules.
- Sales coverage planning depends on a tier structure not grounded in data.

To address these limitations, we, the UCLA Anderson MSBA class of Dec '25, designed a next-generation KPI-driven tiering architecture. Our objective was to move from a heuristic, static system toward a scalable, transparent, and business-aligned framework. Our redesigned tiering system follows five complementary analytical layers, each addressing a specific gap in the legacy process:

1. Natural Segmentation (Unsupervised Baseline): Identify the intrinsic structure of the customer base using clustering to understand how customers naturally group.
2. Pure KPI-Based Tiering (Upper-Bound Benchmark): Show what tiers would look like if aligned only to business KPIs, quantifying the maximum potential lift and exposing trade-offs.
3. Hybrid KPI-Aware Segmentation (Our Contribution): Integrate clustering geometry with KPI optimization and business constraints to produce a realistic, interpretable, and deployable tiering system.
4. Dynamic Tiering (Longitudinal Diagnostics): Analyze historical patterns to understand how companies evolve over time, separating structural tier drift from noise.
5. Optimization & Resource Allocation (Proof of Concept): Demonstrate how the new tiers could feed into downstream coverage and whitespace prioritization through MIP- and heuristic-based approaches.

Together, these components answer a core strategic question: "How should Microsoft tier its global customer base so that investment, coverage, and growth strategy directly reflect business value?"

Our final architecture transforms tiering from a static classification exercise into a KPI-driven, interpretable, and operationally grounded decision framework suitable for Microsoft's future AI and data strategy.

Solution Architecture Diagram

1. Success Metrics Definition

Before designing any segmentation system, the first step is to establish success metrics that define what "good" looks like. Without these metrics, models can easily produce clusters that are statistically neat but misaligned with business needs. A clear KPI framework ensures that every model, regardless of method or complexity, is evaluated consistently on both analytical quality and real business impact. We define success across two complementary dimensions.

1.1 Alignment & Segmentation Quality

These metrics evaluate whether the segmentation meaningfully separates customers based on business potential.

1.1.1 Tier Potential Alignment (TPA)

Measures how well assigned tiers follow the rank order of PI_acct, our composite indicator of future growth potential.
Implemented as a Spearman rank correlation, TPA tests whether higher-potential accounts systematically land in higher tiers.

Step 1 - Formula for PI_acct (Potential Index per Account): a composite index of account growth-potential signals (the exact weighting was given as an image in the original).

Step 2 - Formula for TPA (Tier Potential Alignment):

TPA = ρ_s(Tier Rank, PI_acct)

where ρ_s is the Spearman rank correlation and Tier Rank is the ordinal tier number (Tier A = highest → Tier D = lowest).

Interpretation:
- TPA = 1 → perfect alignment (higher potential → higher tier)
- TPA = 0 → no statistical relationship
- TPA < 0 → misalignment (tiers contradict potential)

1.1.2 Tier Compactness Index (TCI)

Measures how homogeneous each tier is. Low within-tier variance on PI_acct or revenue indicates that customers grouped together truly share similar characteristics, improving interpretability and resource planning.

(1) Potential-based compactness: TCI_PI = 1 - Var_within(PI_acct) / Var_total(PI_acct)
(2) Revenue-based compactness: TCI_REV = 1 - Var_within(Revenue) / Var_total(Revenue)

Interpretation:
- TCI = 1 → tiers are tight and well-separated
- TCI = 0 → tiers are random or overlapping
- TCI < 0 → within-tier variance exceeds total variance (poor grouping)

1.2 Business Impact

These metrics test whether the segmentation supports strategic goals, not just statistical structure.

1.2.1 Strategic Focus Index (SFI)

Quantifies how much revenue comes from the company's most strategically important tiers: SFI = revenue of the top strategic tiers / total revenue. High SFI means the segmentation helps focus investments (sales coverage, specialist time, programs) on the customers that matter most. Under the Tier Policy framework, the definition of "strategic" automatically adapts to the number of tiers K, for example taking the top L tiers (e.g., top 2) or the top x% of tiers ranked by mean potential or revenue.

- High SFI: strong emphasis on top strategic segments (potentially efficient, but watch concentration risk).
- Moderate SFI: balanced focus across tiers.
- Low SFI: diffuse portfolio, limited emphasis on priority segments.
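Concretely, the three metric families reduce to a few lines of pandas/SciPy. The sketch below assumes one row per account with columns tier_rank (4 = Tier A ... 1 = Tier D), pi_acct, and revenue; the unweighted mean of within-tier variances and the top-two-tier strategic set are my assumptions where the original formulas were images.

```python
import pandas as pd
from scipy.stats import spearmanr

def tpa(df: pd.DataFrame) -> float:
    """Tier Potential Alignment: Spearman rank correlation between tier rank
    and account potential, so +1 means higher potential -> higher tier."""
    rho, _ = spearmanr(df["tier_rank"], df["pi_acct"])
    return float(rho)

def tci(df: pd.DataFrame, col: str) -> float:
    """Tier Compactness Index on `col`: 1 - within-tier variance / total
    variance (negative when within-tier variance exceeds total variance)."""
    within = df.groupby("tier_rank")[col].var().mean()  # unweighted mean: an assumption
    return float(1.0 - within / df[col].var())

def sfi(df: pd.DataFrame, strategic_ranks=(4, 3)) -> float:
    """Strategic Focus Index: revenue share of the top strategic tiers
    (top two of four here, per the adaptive Tier Policy)."""
    top = df.loc[df["tier_rank"].isin(strategic_ranks), "revenue"].sum()
    return float(top / df["revenue"].sum())
```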
2. Static Segmentation

2.1 Pure Unsupervised Clustering

2.1.1 Model Conclusions at a Glance

Across all unsupervised models evaluated (Ward, Weighted Ward, K-Medoids, K-Means / K-Means++, and HDBSCAN), only the Ward model (K=4, Policy v2) provides a segmentation that is simultaneously statistically coherent, business-aligned (high SFI), geometrically stable (clean Silhouette), and operationally interpretable. All alternative models either distort cluster geometry, collapse SFI, or produce unstable or illogical tier structures.

Final recommendation: Use Ward (K=4, Policy v2) as the natural segmentation baseline.

2.1.2 High-Level Algorithm Comparison

Table 1. Algorithm Comparison

| Model | Algorithm Summary | Strengths | Weaknesses | Business Use |
|---|---|---|---|---|
| Ward | Variance-minimizing hierarchical merges | Best balance of TPA/TCI/SFI; stable geometry | Sensitive to correlated features | Primary model for segmentation |
| Weighted Ward | Distance reweighted by PI + revenue | Higher TPA | Silhouette collapse; unstable | Not recommended |
| K-Medoids | Medoid-based dissimilarity minimization | Robust to outliers | Cluster compression; weak SFI | Diagnostic only |
| K-Means / K-Means++ | Squared-distance minimization | Fast baseline | SFI collapse; over-tight clusters | Numeric benchmark only |
| HDBSCAN | Density-based clustering with noise | Good for anomaly detection | TPA collapse; noisy tiers; broken PI ordering | Not suitable for tiering |

2.1.3 Modeling Results

Table 2. Unsupervised Clustering Model Results

| Metric | FY26 Baseline (Legacy A+B) | Ward K=4 (Policy v2) | Weighted Ward 2-B (α=4, β=0.8, s=0.7, K=5) | Unweighted Ward (Policy v2, K=4) | Unweighted Ward (Policy v2, K=3) | K-Medoids B4 Behavior-only (K=3) | K-means K=4 (Policy v2) | K-means++ K=4 (Policy v2) | HDBSCAN (baseline settings) |
|---|---|---|---|---|---|---|---|---|---|
| TPA | 0.260 | 0.260 | 0.860 | 0.260 | 0.300 | 0.520 | 0.310 | 0.310 | 0.040 |
| TCI_PI | 0.222 | 0.461 | 0.772 | 0.461 | 0.405 | 0.173 | 0.476 | 0.476 | 0.004 |
| TCI_REV | 0.469 | 0.801 | 0.640 | 0.801 | 0.672 | 0.002 | 0.831 | 0.831 | 0.062 |
| SFI | 0.807 | 0.868 | 0.817 | 0.868 | 0.960 | 0.656 | 0.332 | 0.332 | 0.719 |
| Silhouette | n/a | 0.560 | 0.145 | 0.560 | 0.604 | 0.466 | 0.523 | 0.523 | 0.186 |

- Ward (K=4, Policy v2) remains the strongest performer: SFI ≈ 0.87, Silhouette ≈ 0.56, stable geometric structure.
- Weighted Ward raises TPA and TCI_PI sharply, but Silhouette collapses (~0.145), signaling structural instability: not viable.
- K-Medoids consistently compresses clusters; the TPA/TCI_PI gain is offset by a TCI_REV collapse and low SFI.
- K-Means / K-Means++ tighten numeric clusters, but SFI drops to ~0.33, so tiers lose strategic meaning.
- HDBSCAN generates large noisy segments (TPA = 0.044, TCI_PI = 0.004, Silhouette = 0.186) and Tiers A/B contain negative PI: fundamentally unsuitable.

Conclusion: Only Ward (K=4) produces segmentation with both statistical integrity and business relevance.

2.1.4 Implications, Limitations, Next Steps

Implications: Our current unsupervised segmentation delivers statistically coherent and operationally usable tiers, but several structural findings emerged:

- Unsupervised methods reveal the data's natural shape, not business priorities. Ward, K-means, and HDBSCAN can discover separations in the feature space but cannot move clusters toward preferred PI or revenue patterns.
- Cluster outcomes cannot guarantee business-desired constraints. For example: if Tier A's PI mean is too low, the model cannot raise it; if Tier C becomes too large, clustering cannot rebalance it; if the business wants stronger SFI, clustering alone cannot optimize that objective.
- Some business-critical metrics are only evaluated after clustering, not optimized within it. Tier size distributions, average PI per tier, and revenue share are structurally important but not part of the unsupervised objective.

Hence, unsupervised clustering provides a statistically coherent view of the data's natural structure, but it cannot guarantee business-preferred tier outcomes. The models cannot enforce hard constraints (e.g., a desired A/B/C distribution, monotonic PI means, revenue share targets), nor can they adjust tiers when PI is too low or clusters become imbalanced. Additionally, key tier-level KPIs, such as average PI per tier, tier size stability, and revenue distribution, are only evaluated after clustering rather than optimized during it, limiting their influence on the final tier design.

To overcome these structural limitations, the next stage of the system must incorporate semi-supervised guidance and policy-based optimization, where business KPIs directly shape tier boundaries and ranking. Future iterations will expand the policy beyond PI and revenue to include behavioral and market signals, and bring tier-level metrics into the objective function to better align the segmentation with real-world operational priorities.
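For reproducibility, the Ward baseline itself is a few lines of scikit-learn. This is a hedged sketch: the feature names below are illustrative stand-ins, since the actual Policy v2 feature set is not published.

```python
import pandas as pd
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Illustrative feature names; the project's real feature policy is not public.
FEATURES = ["pi_acct", "revenue", "growth_rate", "engagement_score"]

def ward_segments(df: pd.DataFrame, k: int = 4) -> pd.Series:
    """Variance-minimizing (Ward) hierarchical clustering baseline at K
    clusters, scored with the same Silhouette metric reported in Table 2."""
    X = StandardScaler().fit_transform(df[FEATURES])
    labels = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(X)
    print("silhouette:", round(silhouette_score(X, labels), 3))
    return pd.Series(labels, index=df.index, name="cluster")
```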
2.2 Semi-supervised KPI-Driven Learning

Composite Score: a KPI-Driven Objective for Tiering

To guide our semi-supervised and hybrid methods, we define a Composite Score that unifies Microsoft's key business KPIs into a single optimization target. It ensures that all modeling layers (Pure KPI-Based Tiering and Hybrid KPI-Aware Segmentation) optimize toward the same business priorities.

Unsupervised clustering cannot optimize business outcomes. A composite objective is needed to consistently evaluate and improve tiering performance across:
- Potential uplift (TPA)
- Stability of tier structure (SFI)
- Within-tier improvement (TCI_PI)
- Revenue scale (TCI_REV)

To align tiering with business priorities, we summarize the four key KPIs (TPA, SFI, TCI_PI, and TCI_REV) into one normalized measure:

Composite Score = 0.35×TPA + 0.35×SFI + 0.30×(TCI_PI + TCI_REV)

This score provides a single benchmark for comparing methods and serves as the optimization target in our semi-supervised and hybrid approaches.

How It Is Used
- Benchmarking: Compare all methods on a unified scale.
- Optimization: Serves as the objective in constrained local search (Method 3).
- Rule Learning: Guides the decision-tree logic extracted after optimization.

Why It Matters: The Composite Score centers the analysis around a single question: "Which tiering structure creates the strongest balance of growth potential, stability, and revenue impact?"

2.3 Pure KPI-Based Tiering

2.3.1 Model Conclusions at a Glance

Pure KPI-based tiering shows what the tiers would look like if Microsoft prioritized business KPIs above all else. It achieves the largest KPI improvements, but causes major distribution shifts and violates movement rules, making it operationally unrealistic.

Final takeaway: Pure KPI tiering is a valuable benchmark for understanding KPI potential, but cannot be operationalized.

2.3.2 High-Level Algorithm Summary

Table 3. Methods of KPI-Based Tiering

| Method | Algorithm Summary | Strengths | Weaknesses | Business Use |
|---|---|---|---|---|
| New_Tier_Direct (PI ranking only) | Rank accounts by PI/KPI score and assign tiers directly | Highest KPI gains; preserves overall tier distribution | Moves ~20-40% of companies; violates ±1 rule; disrupts continuity | KPI upper-bound benchmark |
| Tier_PI_Constrained (PI ranking + ±1 rule) | Same as above, but restricts movement to adjacent tiers | KPI lift + respects movement constraint | Still moves ~20-40%; breaks tier distribution (Tier C inflation) | Diagnostic only |

2.3.3 Modeling Results

Table 4. Modeling Results for KPI-Based Tiering

| KPI | FY26 Baseline | New_Tier_Direct | Tier_PI_Constrained |
|---|---|---|---|
| Composite Score | 0.5804 | 0.8105 | 0.763 |
| TPA | 0.2590 | 0.8300 | 0.721 |
| TCI_PI | 0.2220 | 0.5360 | 0.492 |
| TCI_REV | 0.4690 | 0.3970 | 0.452 |
| SFI | 0.8070 | 0.6860 | 0.650 |

New_Tier_Direct:
- Composite Score: 0.5804 → 0.8105
- TPA increases sharply (0.259 → 0.830)
- Violates the ±1 rule; major reassignments (~20-40%)

Tier_PI_Constrained:
- Respects ±1 movement
- KPI still improves (Composite 0.763)
- But the tier distribution collapses (Tier C over-expands)
- Still ~20-40% movement: not feasible

Hence: no PI-only method balances KPI lift with operational feasibility.

2.3.4 Limitations & Next Steps

Pure KPI tiering cannot simultaneously preserve the tier distribution, respect the ±1 movement rule, and deliver consistent KPI improvements. This creates the need for a hybrid model that combines clustering structure with KPI-aligned tier ordering.
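Before presenting the hybrid method, here is a compact sketch of the machinery it combines: the composite objective above plus a movement-constrained local search. It reuses the metric helpers from the section 1 sketch, and the greedy accept/revert loop is a simplification of the actual implementation, not the paper's exact algorithm.

```python
import random

def composite_score(df) -> float:
    """Composite = 0.35*TPA + 0.35*SFI + 0.30*(TCI_PI + TCI_REV)."""
    return (0.35 * tpa(df) + 0.35 * sfi(df)
            + 0.30 * (tci(df, "pi_acct") + tci(df, "revenue")))

def constrained_local_search(df, n_iters=20_000, move_budget_frac=0.05, seed=0):
    """Greedy hill climbing over single-account moves: propose a +/-1 tier
    move, keep it only if the composite score improves, and stop once the
    movement budget (~5% of accounts) is exhausted."""
    rng = random.Random(seed)
    idx = list(df.index)
    best = composite_score(df)
    moved = set()
    budget = int(move_budget_frac * len(df))
    for _ in range(n_iters):
        if len(moved) >= budget:
            break
        i = rng.choice(idx)
        old_rank = df.at[i, "tier_rank"]
        new_rank = old_rank + rng.choice([-1, 1])  # +/-1 movement rule
        if not 1 <= new_rank <= 4:  # stay within Tier D (1) .. Tier A (4)
            continue
        df.at[i, "tier_rank"] = new_rank
        score = composite_score(df)
        if score > best:
            best = score
            moved.add(i)
        else:
            df.at[i, "tier_rank"] = old_rank  # revert non-improving move
    return df, best
```

Starting from the FY26 assignments rather than a blank slate is what keeps the churn near 5%: the search can only spend its budget on moves that actually raise the objective.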
2.4 Hybrid KPI-Aware Segmentation (Our Contribution)

2.4.1 Model Conclusions at a Glance

Our hybrid method blends clustering geometry with KPI-driven optimization, achieving a practical balance between statistical structure, business constraints, and KPI improvement.

Final recommendation: This is the segmentation framework we recommend Microsoft adopt.
- It produces the most deployable segmentation by balancing KPI lift with stability and interpretability.
- It delivers meaningful KPI improvement while changing only ~5% of accounts, compared to Model B's 20-40%.

2.4.2 High-Level Algorithm Summary

Table 5. Algorithm Comparison

| Component | Purpose | Strengths | Notes |
|---|---|---|---|
| Constrained Local Search | Optimize composite KPI score starting from FY26 tiers | KPI uplift with strict constraints | Only small movements allowed (~5%) |
| Tier Movement Constraint (+1/-1) | Ensure realistic transitions | Guarantees business rules; keeps structure stable | Limits improvement ceiling |
| Decision Tree | Learn interpretable rules from optimized tiers | Deployable, explainable, reusable | Accuracy ~80%; tunable with weighting |
| Closed Loop Optimization | Improve both rules and allocation iteratively | Stable + interpretable | Future extension |

2.4.3 Modeling Results

Table 6. Modeling Results for Hybrid Segmentation

| KPI | FY26 Baseline | New_Tier_Direct | Tier_PI_Constrained | ImprovedTier |
|---|---|---|---|---|
| Composite Score | 0.5804 | 0.8105 | 0.763 | 0.6512 |
| TPA | 0.2590 | 0.8300 | 0.721 | 0.2990 |
| TCI_PI | 0.2220 | 0.5360 | 0.492 | 0.3450 |
| TCI_REV | 0.4690 | 0.3970 | 0.452 | 0.5250 |
| SFI | 0.8070 | 0.6860 | 0.650 | 0.8160 |

Interpretation of the hybrid model (ImprovedTier):
- Composite Score: 0.5804 → 0.6512
- TPA improves (0.259 → 0.299)
- TCI_PI and TCI_REV both rise
- SFI improves compared to the constrained PI method
- Only ~5% of companies move tiers, versus Method 2's 20-40%

This makes Method 3 the only method that simultaneously satisfies:
- KPI improvement
- the original tier distribution
- the ±1 movement rule
- low operational disruption
- interpretability (via a decision tree, sketched below)

2.4.4 Conclusion

Model C offers a pragmatic middle ground: KPI lift close to pure PI tiering, operational impact close to clustering, and full interpretability. For Microsoft, this hybrid framework is the most realistic and sustainable segmentation approach.
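The decision-tree step referenced above can be sketched directly with scikit-learn: fit a shallow tree on the optimized tier labels and export the rules. Feature names and depth are illustrative choices; the ~80% accuracy figure comes from the paper, not from this sketch.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

def learn_tier_rules(df, features=("pi_acct", "revenue")):
    """Distill the optimized tier assignments into human-readable threshold
    rules. A shallow tree keeps the resulting policy explainable."""
    X, y = df[list(features)], df["tier_rank"]
    tree = DecisionTreeClassifier(max_depth=3, class_weight="balanced").fit(X, y)
    print(export_text(tree, feature_names=list(features)))
    print("training accuracy:", round(tree.score(X, y), 3))
    return tree
```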
3. Dynamic Tier Progression

3.1 Model Conclusions at a Glance

Our benchmarking shows that CatBoost and XGBoost consistently deliver the strongest overall performance, achieving the highest macro-F1 (~0.76) across all tested methods. Despite these gains, however, the underlying business pattern remains dominant: tier changes are extremely rare (≈5.4%), and Microsoft's one-step movement rule severely limits model learnability. Dynamic tiering is far more valuable as a diagnostic signal generator than as a strict forecasting engine. While models cannot reliably predict future tier transitions, they can surface atypical account patterns, signals of risk, and emerging opportunities that support earlier sales intervention and more proactive account planning.

3.2 Models

To predict future tier upgrades and downgrades, we tested the following models:

Table 7. Models Used for Dynamic Prediction

| Model | Strengths | Weaknesses | When to Use |
|---|---|---|---|
| MLR | Simple; interpretable; fast baseline | Weak on imbalanced data | When transparency and explainability are needed |
| Neural Network | Captures nonlinear patterns; stronger recall than MLR | Requires tuning; sensitive to imbalanced data | When exploring richer behavioral signals |
| CatBoost (baseline, weighted, oversampled) | Strongest overall balance; robust with categorical data; best macro-F1 | Still limited by rarity of tier changes; weighted/oversampled versions risk overfitting | Default diagnostic model for surfacing atypical account patterns |
| XGBoost (baseline, weighted) | High performance; scalable; production-ready | Limited by structural imbalance; weighted versions increase false positives | When deploying a stable scoring layer to sales teams |

Performance was measured using accuracy but, more importantly, macro recall, precision, and F1, since upgrades and downgrades are much rarer and require balanced evaluation.

3.3 Model Results

Across all models, overall accuracy appears high (0.95-0.97), but this metric is dominated by the fact that tier transitions are extremely rare: only 808 of 15,000 cases (5.4%) moved tiers, while 95% stayed unchanged. On macro metrics such as recall, precision, and F1, every model struggles to reliably detect upgrades and downgrades. CatBoost and XGBoost deliver the strongest balanced results, achieving the highest macro F1 scores (~0.76). However, even these advanced methods capture only half or fewer of the true upgrade and downgrade events. This reinforces that the challenge is not algorithmic performance but the underlying business pattern: tier movements are infrequent, policy-driven, and weakly connected to observable account features.

Table 8. Results for Dynamic Prediction

| Model | Accuracy | Macro Recall | Macro Precision | Macro F1 |
|---|---|---|---|---|
| MLR | 0.95 | 0.36 | 0.70 | 0.37 |
| Neural Network | 0.95 | 0.58 | 0.71 | 0.63 |
| CatBoost | 0.97 | 0.94 | 0.67 | 0.76 |
| CatBoost (Weighted) | 0.82 | 0.49 | 0.82 | 0.54 |
| CatBoost (Oversampling) | 0.69 | 0.42 | 0.75 | 0.42 |
| XGBoost | 0.97 | 0.93 | 0.67 | 0.76 |
| XGBoost (Weighted) | 0.97 | 0.85 | 0.70 | 0.76 |

3.4 Dynamic Tiering Implications

Based on these results, dynamic tiering has the following implications for Microsoft:

- Tier changes are not reliably forecastable under current rules. Year-over-year stability is so dominant that even strong ML models cannot surface consistent upgrade or downgrade signals. This suggests that transitions are driven more by sales judgment and tier policy than by measurable account behavior.
- The dynamic model is still valuable, just not as a predictor of future tiers. Rather than serving as a forecasting engine, this pipeline should be viewed as a diagnostic tool that helps identify accounts with unusual patterns, emerging risks, or outlier behavior worth reviewing.
- Dynamic progression complements, rather than replaces, the core segmentation. It provides an additional layer of insight alongside clustering and KPI-optimized segmentation, helping Microsoft maintain both structural clarity (static segmentation) and forward-looking awareness (dynamic progression).
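For reference, the strongest diagnostic setup (CatBoost with class balancing, judged on macro metrics) can be sketched as follows. The column names and the -1/0/+1 label encoding are my assumptions, not the paper's exact pipeline.

```python
from catboost import CatBoostClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

def fit_transition_model(df, features):
    """Classify year-over-year movement: -1 = downgrade, 0 = stay, +1 = upgrade.
    auto_class_weights='Balanced' counters the ~95% 'no change' majority;
    judge the result on macro metrics, not raw accuracy."""
    X, y = df[features], df["tier_change"]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)
    model = CatBoostClassifier(
        loss_function="MultiClass",
        auto_class_weights="Balanced",
        iterations=500,
        verbose=False,
    )
    model.fit(X_tr, y_tr)
    print(classification_report(y_te, model.predict(X_te).ravel()))
    return model
```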
4. Optimization in Practice

To understand how segmentation could support downstream coverage planning, we developed a small optimization proof-of-concept using Microsoft's seller-tier capacity guidelines (e.g., max accounts per role × tier, geo-entity restrictions, in-person vs. remote coverage rules).

4.1 What We Explored

Using our final hybrid segmentation (Method 3), we tested a simplified workflow:

1. Formulate a coverage optimization problem
   - Assign sellers to accounts under constraints such as role × tier capacity limits, single-geo assignment, ±1 tier movement rules, and domain restrictions for Tier C/D.
   - This naturally forms a mixed-integer optimization problem (MIP); a toy formulation is sketched after Section 4.2.
2. Prototype with standard optimization tools
   - Linear and integer programming formulations using Gurobi, OR-Tools, and Pyomo.
   - Heuristic solvers (e.g., local search, greedy reallocation, hill climbing) as faster alternatives.
3. Simulate coverage scenarios
   - Estimate changes in workload balance and whitespace prioritization under different seller-tier mixes.
   - Validate feasibility of the optimization with respect to Microsoft's operational rules.

4.2 What We Learned

Due to limited operational metrics (detailed whitespace values, upgrade probabilities, territory boundaries) and time constraints, we did not build a fully deployable engine. However, the PoC confirmed that:

- The segmentation integrates cleanly into a prescriptive segmentation → optimization → coverage pipeline.
- A full solver could feasibly allocate sellers under realistic business constraints.
- Gurobi-style MIP formulations and simulation-based heuristics are both valid paths for future development.

In short: the optimization layer is technically viable and aligns naturally with our segmentation design, but its full implementation exceeds the scope of this capstone.
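As an illustration of the MIP sketched in Section 4.1, here is a toy feasibility model written with OR-Tools CP-SAT, one of the solvers we prototyped with. All inputs (sellers, tier capacities, geos) are invented for the example; a production model would add the remaining constraints and an objective such as whitespace-weighted coverage.

```python
from ortools.sat.python import cp_model

# Toy coverage assignment: every account gets exactly one seller in its
# geo, subject to per-seller, per-tier capacity limits. Data is invented.
accounts = [  # (account_id, tier, geo)
    ("a1", "A", "west"), ("a2", "B", "west"),
    ("a3", "B", "east"), ("a4", "C", "east"), ("a5", "D", "east"),
]
sellers = {  # seller_id -> (geo, {tier: max accounts})
    "s1": ("west", {"A": 1, "B": 2, "C": 0, "D": 0}),
    "s2": ("east", {"A": 0, "B": 1, "C": 2, "D": 2}),
}

model = cp_model.CpModel()
x = {}  # x[s, a] == 1 iff seller s covers account a
for s, (seller_geo, _) in sellers.items():
    for a, _, account_geo in accounts:
        if seller_geo == account_geo:        # single-geo assignment rule
            x[s, a] = model.NewBoolVar(f"{s}_{a}")

for a, _, _ in accounts:                     # each account covered exactly once
    model.AddExactlyOne(x[s, a] for s in sellers if (s, a) in x)

for s, (_, caps) in sellers.items():         # role x tier capacity limits
    for tier, cap in caps.items():
        tier_vars = [x[s, a] for a, t, _ in accounts
                     if t == tier and (s, a) in x]
        if tier_vars:
            model.Add(sum(tier_vars) <= cap)

solver = cp_model.CpSolver()
if solver.Solve(model) in (cp_model.OPTIMAL, cp_model.FEASIBLE):
    print([pair for pair, var in x.items() if solver.Value(var)])
```

Swapping CP-SAT for a Gurobi or Pyomo formulation is largely mechanical, which is why both remained viable paths in the PoC.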
5. AI & LLM Integration

To make segmentation accessible to a broad set of stakeholders such as sales leaders, strategists, and business analysts, we built a conversational tiering assistant powered by LLM-based interpretation of strategic priorities. The assistant allows users to describe their intended segmentation direction in natural language, which the system translates into numerical weights and a refreshed set of tier assignments.

5.1 LLM Workflow Architecture

The workflow, originally shown as a flowchart, proceeds as follows:

1. Users communicate their goals in intuitive, high-level language (e.g., "prioritize runway growth", "reward high-potential emerging accounts").
2. The front end collects the user's tiering preference through a chat interface.
3. The front end sends this prompt to our cloud FastAPI service on Render.
4. The LLM interprets the prompt and infers the relative strategic weights and which clustering method to use (KPI-based or the Hybrid approach).
5. The server applies these weights in the tiering code to generate updated tiers based on the selected approach.
6. The server returns a refreshed CSV with new tier assignments, which can be exported through the chat interface.

5.2 Why LLMs Matter

LLMs enhanced the project in three ways:

- Interpretation Layer: helps business users articulate strategy in plain English and converts it into quantifiable modeling inputs.
- Explainability Layer: surfaces cluster drivers, feature differences, and trade-offs across segments in natural language.
- Acceleration Layer: enables real-time exploration of "what-if" tiering scenarios without engineering support.

This integration transforms segmentation from a static analytical artifact into a dynamic, interactive decision-support tool, aligned with how Microsoft teams actually work.

5.3 Backend Architecture and LLM Integration Pipeline

The conversational tiering system is supported by a cloud-based backend designed to translate natural-language instructions into structured model parameters. The service is deployed on Render and implemented with FastAPI, providing a lightweight, high-performance gateway for managing requests, validating inputs, and coordinating LLM interactions.

FastAPI as the Orchestration Layer - User instructions are submitted through the chat interface and delivered to a FastAPI endpoint as JSON. FastAPI validates this payload using Pydantic, ensuring the request is well-formed before any processing occurs. The framework manages routing, serialization, and error handling, isolating request management from the downstream LLM and computation layers.

LLM Invocation Through the OpenAI API - Once a validated prompt is received, the backend invokes the OpenAI API with a structured system prompt engineered to enforce strict JSON output. The LLM returns four normalized weights reflecting the user's strategic intent, along with metadata used to determine whether the user explicitly prefers the KPI-based method or the default Hybrid approach. If no method is specified, the system automatically defaults to Hybrid. Low-temperature decoding is used to minimize stochastic variation and ensure repeatability across identical user prompts. All OpenAI keys are stored securely as Render environment variables.

Schema Enforcement and Robust Parsing - To maintain reliability, the backend enforces strict schema validation on LLM responses. The service checks both JSON structure and numeric constraints, ensuring values fall within valid ranges and sum to one. If parsing fails or a constraint is violated, the backend automatically reissues a constrained correction prompt. This design prevents malformed outputs and guards against conversational drift.

Render Hosting and Operational Considerations - The backend runs in a stateless containerized environment on Render, which handles service orchestration, HTTPS termination, and environment-variable management. Data required for computation is loaded into memory at startup to reduce latency, and the lightweight tiering pipeline keeps the system responsive even on shared compute resources.

Response Assembly and Delivery - After LLM interpretation and schema validation, the backend applies the resulting weights and streams the recalculated results back to the user as a downloadable CSV. FastAPI's StreamingResponse enables direct transmission from memory without temporary filesystem storage, supporting rapid interactive workflows.

Together, these components form a tightly integrated, cloud-native pipeline: FastAPI handles orchestration, the LLM provides semantic interpretation, Render ensures secure and reliable hosting, and the default Hybrid method guarantees consistent behavior unless the user explicitly requests the KPI approach.
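For readers who want to see the shape of this pipeline in code, here is a compressed sketch of the backend described above: Pydantic request validation, a low-temperature OpenAI call behind a strict-JSON system prompt, schema and weight-sum checks with a constrained correction prompt on failure, the Hybrid default, and a StreamingResponse CSV. The endpoint path, model name, and `retier` stub are illustrative assumptions, not the deployed service.

```python
import io
import json

from fastapi import FastAPI, HTTPException
from fastapi.responses import StreamingResponse
from openai import OpenAI
from pydantic import BaseModel

app = FastAPI()
client = OpenAI()  # key comes from the environment, as with the Render setup

SYSTEM_PROMPT = (
    "Convert the user's tiering strategy into JSON only: "
    '{"weights": [w_tpa, w_tci_pi, w_tci_rev, w_sfi], "method": "hybrid"|"kpi"}. '
    "Weights must be in [0, 1] and sum to 1."
)

class TieringRequest(BaseModel):
    prompt: str  # e.g. "prioritize runway growth"

def valid(parsed: dict) -> bool:
    """Schema and numeric-constraint checks on the LLM output."""
    w = parsed.get("weights")
    return (isinstance(w, list) and len(w) == 4
            and all(isinstance(v, (int, float)) and 0 <= v <= 1 for v in w)
            and abs(sum(w) - 1) < 1e-6)

def interpret(prompt: str, retries: int = 2) -> dict:
    messages = [{"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": prompt}]
    for _ in range(retries + 1):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",                 # assumed model choice
            temperature=0.1,                     # low temperature for repeatability
            response_format={"type": "json_object"},
            messages=messages,
        )
        raw = resp.choices[0].message.content or ""
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            parsed = {}
        if valid(parsed):
            parsed.setdefault("method", "hybrid")  # Hybrid is the default
            return parsed
        # Reissue a constrained correction prompt, as described above.
        messages += [{"role": "assistant", "content": raw},
                     {"role": "user",
                      "content": "Invalid. Return only the JSON schema given."}]
    raise HTTPException(status_code=422, detail="LLM produced no valid weights")

def retier(weights: list[float], method: str) -> str:
    """Stand-in for the real tiering pipeline; returns a tiny CSV."""
    return "account_id,new_tier\nacct-001,A\nacct-002,B\n"

@app.post("/tier")
def tier(req: TieringRequest) -> StreamingResponse:
    parsed = interpret(req.prompt)
    csv_text = retier(parsed["weights"], parsed.get("method", "hybrid"))
    # Stream straight from memory; no temporary file, as noted above.
    return StreamingResponse(io.StringIO(csv_text), media_type="text/csv")
```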
DEMO: Microsoft x UCLA Anderson MSBA - AI-Driven KPI Segmentation Project (LLM demo)

6. Conclusion

Our work delivers a strategic, KPI-driven tiering architecture that resolves the limitations of Microsoft's legacy system and sets a scalable foundation for future segmentation and coverage strategy. Across all analyses, five differentiators stand out:

- Clear separation of natural structure vs. business intent: we diagnose where the legacy system diverges from true customer potential and revenue, establishing the analytical ground truth Microsoft never previously had.
- A precise map of strategic trade-offs: by comparing unsupervised, KPI-only, and hybrid approaches, we reveal the operational and business implications behind every tiering philosophy, making the framework decision-ready for leadership.
- A business-aligned segmentation ready for deployment: our hybrid KPI-aware model uniquely satisfies KPI lift, distribution stability, ±1 movement rules, and interpretability, providing a reliable go-forward tiering backbone.
- A future-proof architecture that extends beyond static tiers: dynamic progression modeling and the optimization PoC show how tiering can evolve into forecasting, prioritization, whitespace planning, and resource optimization.
- A blueprint for Microsoft's next-generation tiering ecosystem: the system integrates data science, business KPIs, optimization, and LLM interpretability into one cohesive workflow, positioning Microsoft for an AI-enabled tiering strategy.

In essence, this work transforms customer tiering into a strategic, explainable, and scalable system, ready to support Microsoft's growth ambitions and future AI initiatives.

Azure Machine Learning compute cluster - avoid using docker?
Hello,

I would like to use an Azure Machine Learning Compute Cluster as a compute target but do not want it to containerize my project. Is there a way to deactivate this "feature"?

The main reasons behind this request are:

- I have already set up a docker-compose file that specifies 3 containers for Apache Airflow, and I want to avoid a Docker-in-Docker situation (I have already tried this and failed so far).
- I prefer not to use a Compute Instance, as it is tied to an Azure account, which is not ideal for automation purposes.

Thanks in advance.
This post explores the real-world impact of GPT-5 beyond benchmark scores, focusing on how application design shapes user experience. It highlights early developer feedback, common integration challenges, and practical strategies for adapting apps to leverage the advanced capabilities of GPT-5 in Foundry Models. From prompt refinement to fine-tuning to new API controls, learn how to make the most of this powerful model.717Views3likes0Comments