Unlocking Efficient and Secure AI for Android with Foundry Local
The ability to run advanced AI models directly on smartphones is transforming the mobile landscape. Foundry Local for Android simplifies the integration of generative AI models, allowing teams to deliver sophisticated, secure, and low-latency AI experiences natively on mobile devices. This post highlights Foundry Local for Android as a compelling solution for Android developers, helping them efficiently build and deploy powerful on-device AI capabilities within their applications.

The Challenges of Deploying AI on Mobile Devices

On-device AI offers the promise of offline capabilities, enhanced privacy, and low-latency processing. However, implementing these capabilities on mobile devices introduces several technical obstacles:

- Limited computing and storage: Mobile devices operate with constrained processing power and storage compared to traditional PCs. Even the most compact language models can occupy significant space and demand substantial computational resources. Efficient model and runtime optimization is critical for successful deployment.
- App size concerns: Integrating large AI models and libraries can dramatically increase application size, reducing install rates and degrading other app features. Providing advanced AI capabilities while keeping the application compact and efficient remains a challenge.
- Complexity of development and integration: Most mobile development teams do not specialize in machine learning. Adapting, optimizing, and deploying models for mobile inference can be resource intensive. Streamlined APIs and pre-optimized models simplify integration and accelerate time to market.

Introducing Foundry Local for Android

Foundry Local is designed as a comprehensive on-device AI solution, featuring pre-optimized models, a cross-platform inference engine, and intuitive APIs for seamless integration.
Initially announced at //Build 2025 with support for Windows and macOS desktops, Foundry Local now extends its capabilities to Android in private preview. You can sign up for the private preview at https://aka.ms/foundrylocal-androidprp for early evaluation and feedback.

To meet the demands of production deployments, Foundry Local for Android is architected as a dedicated Android app paired with an SDK. The app manages model distribution, hosts the AI runtime, and operates as a specialized background service. Client applications interface with this service using a lightweight Foundry Local Android SDK, ensuring minimal overhead and streamlined connectivity.

- One Model, Multiple Apps: Foundry Local centralizes model management, ensuring that if multiple applications use the same model, it is downloaded and stored only once. This approach optimizes storage and streamlines resource usage.
- Minimal App Footprint: Client applications are freed from embedding bulky machine learning libraries and models, avoiding ballooning app size and memory usage.
- Runs Separately from Client Apps: The Foundry Local service operates independently of client applications. Developers benefit from continuous enhancements without the need for frequent app releases.

Customer Story: PhonePe

PhonePe is one of India's largest consumer payments platforms, enabling access to payments and financial services for hundreds of millions of people across the country. With Foundry Local, PhonePe is enabling AI that allows its users to gain deeper insights into their transactions and payment behavior directly on their mobile devices. And because inferencing happens locally, all data stays private and secure. This collaboration addresses PhonePe's key priority of delivering an AI experience that upholds privacy. Foundry Local enables PhonePe to differentiate its app experience in a competitive market using AI while ensuring compliance with privacy commitments.
Explore their journey here: PhonePe Product Showcase at Microsoft Ignite 2025

Call to Action

Foundry Local equips Android apps with on-device AI, supporting the development of smarter applications for the future. Developers can build efficient and secure AI capabilities into their apps, even without extensive expertise in artificial intelligence. See more of Foundry Local in action in this episode of Microsoft Mechanics: https://aka.ms/FL_IGNITE_MSMechanics

We look forward to seeing you light up AI capabilities in your Android app with Foundry Local. Don’t miss our private preview: https://aka.ms/foundrylocal-androidprp. We appreciate your feedback, as it will help us make our product better.

Thanks to the contribution from NimbleEdge, which delivers real-time, on-device personalization for millions of mobile devices. NimbleEdge's mobile technology expertise helps Foundry Local deliver a better experience for Android users.

Get to know the core Foundry solutions
Foundry includes specialized services for vision, language, documents, and search, plus Microsoft Foundry for orchestration and governance. Here’s what each does and why it matters:

Azure Vision - With Azure Vision, you can detect common objects in images, generate captions, descriptions, and tags based on image contents, and read text in images. Example: Automate visual inspections or extract text from scanned documents.

Azure Language - Azure Language helps organizations understand and work with text at scale. It can identify key information, gauge sentiment, and create summaries from large volumes of content. It also supports building conversational experiences and question-answering tools, making it easier to deliver fast, accurate responses to customers and employees. Example: Understand customer feedback or translate text into multiple languages.

Azure Document Intelligence - With Azure Document Intelligence, you can use pre-built or custom models to extract fields from complex documents such as invoices, receipts, and forms. Example: Automate invoice processing or contract review.

Azure Search - Azure Search helps you find the right information quickly by turning your content into a searchable index. It uses AI to understand and organize data, making it easier to retrieve relevant insights. This capability is often used to connect enterprise data with generative AI, ensuring responses are accurate and grounded in trusted information. Example: Help employees retrieve policies or product details without digging through files.

Microsoft Foundry - Acts as the orchestration and governance layer for generative AI and AI agents. It provides tools for model selection, safety, observability, and lifecycle management. Example: Coordinate workflows that combine multiple AI capabilities with compliance and monitoring.

Business leaders often ask: Which Foundry tool should I use? The answer depends on your workflow.
For example:

- Are you trying to automate document-heavy processes like invoice handling or contract review?
- Do you need to improve customer engagement with multilingual support or sentiment analysis?
- Or are you looking to orchestrate generative AI across multiple processes for marketing or operations?

Connecting these needs to the right Foundry solution ensures you invest in technology that delivers measurable results.

Architecting the Next-Generation Customer Tiering System
Authors: Sailing Ni*, Joy Yu*, Peng Yang*, Richard Sie*, Yifei Wang* (*these authors contributed equally)

Affiliation: Master of Science in Business Analytics (MSBA), UCLA Anderson School of Management, Los Angeles, California 90095, United States (conducted December 2025)

Acknowledgment: This research was conducted as part of a Microsoft-sponsored Capstone Project, led by Juhi Singh and Bonnie Ao from the Microsoft MCAPS AI Transformation Office.

Microsoft’s global B2B software business classifies customers into four tiers to guide coverage, investment, and sales strategy. However, the legacy tiering framework mixes historical rules with manual heuristics, causing several issues:

- Tiers do not consistently reflect customer potential or revenue importance.
- Statistical coherence and business KPIs (TPA, TCI, SFI) are not optimized or enforced.
- Tier distributions are imbalanced due to legacy ±1 movement and capacity rules.
- Sales coverage planning depends on a tier structure not grounded in data.

To address these limitations, we, the UCLA Anderson MSBA class of Dec '25, designed a next-generation KPI-driven tiering architecture. Our objective was to move from a heuristic, static system toward a scalable, transparent, and business-aligned framework. Our redesigned tiering system follows five complementary analytical layers, each addressing a specific gap in the legacy process:

- Natural Segmentation (Unsupervised Baseline): Identify the intrinsic structure of the customer base using clustering to understand how customers naturally group.
- Pure KPI-Based Tiering (Upper-Bound Benchmark): Show what tiers would look like if aligned only to business KPIs, quantifying the maximum potential lift and exposing trade-offs.
- Hybrid KPI-Aware Segmentation (Our Contribution): Integrate clustering geometry with KPI optimization and business constraints to produce a realistic, interpretable, and deployable tiering system.
- Dynamic Tiering (Longitudinal Diagnostics): Analyze historical patterns to understand how companies evolve over time, separating structural tier drift from noise.
- Optimization & Resource Allocation (Proof of Concept): Demonstrate how the new tiers could feed into downstream coverage and whitespace prioritization through MIP- and heuristic-based approaches.

Together, these components answer a core strategic question: “How should Microsoft tier its global customer base so that investment, coverage, and growth strategy directly reflect business value?” Our final architecture transforms tiering from a static classification exercise into a KPI-driven, interpretable, and operationally grounded decision framework suitable for Microsoft’s future AI and data strategy.

Solution Architecture Diagram

1. Success Metrics Definition

Before designing any segmentation system, the first step is to establish success metrics that define what “good” looks like. Without these metrics, models can easily produce clusters that are statistically neat but misaligned with business needs. A clear KPI framework ensures that every model—regardless of method or complexity—is evaluated consistently on both analytical quality and real business impact. We define success across two complementary dimensions:

1.1 Alignment & Segmentation Quality

These metrics evaluate whether the segmentation meaningfully separates customers based on business potential.

1.1.1 Tier Potential Alignment (TPA)

Measures how well assigned tiers follow the rank order of PI_acct, our composite indicator of future growth potential. Implemented as a Spearman rank correlation, TPA tests whether higher-potential accounts systematically land in higher tiers.
Step 1 - Formula for PI_acct (Potential Index per Account)

Step 2 - Formula for TPA (Tier Potential Alignment):

TPA = ρ_s(PI_acct, Tier Rank)

where ρ_s is the Spearman rank correlation and Tier Rank is the ordinal tier number (Tier A = highest → Tier D = lowest).

Interpretation:
- TPA = 1 ⇒ perfect alignment (higher potential → higher tier)
- TPA = 0 ⇒ no statistical relationship
- TPA < 0 ⇒ misalignment (tiers contradict potential)

1.1.2 Tier Compactness Index (TCI)

Measures how homogeneous each tier is. Low within-tier variance on PI_acct or Revenue indicates that customers grouped together truly share similar characteristics, improving interpretability and resource planning. TCI compares within-tier variance to total variance, i.e., TCI = 1 − (within-tier variance / total variance), computed in two variants:

(1) Potential-based compactness: TCI_PI, computed on PI_acct
(2) Revenue-based compactness: TCI_REV, computed on revenue

- TCI = 1 ⇒ tiers are tight and well-separated
- TCI = 0 ⇒ tiers are random or overlapping
- TCI < 0 ⇒ within-tier variance exceeds total variance (poor grouping)

1.2 Business Impact

These metrics test whether the segmentation supports strategic goals, not just statistical structure.

1.2.1 Strategic Focus Index (SFI)

Quantifies how much revenue comes from the company’s most strategically important tiers. High SFI means segmentation helps focus investments—sales coverage, specialist time, programs—on the customers that matter most. Under the Tier Policy framework, the definition of “strategic” automatically adapts to the number of tiers K — for example, taking the top L tiers (e.g., top 2) or top x% of tiers ranked by mean potential or revenue.

- High SFI: strong emphasis on top strategic segments (potentially efficient, but watch concentration risk).
- Moderate SFI: balanced focus across tiers.
- Low SFI: diffuse portfolio, limited emphasis on priority segments.
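As an illustration, the three success metrics above (and the Composite Score defined later in Section 2.2) can be computed as in the following pure-Python sketch. This is not the project's actual code: PI_acct itself is assumed to be precomputed, the tier labels and "strategic" tier set are illustrative, and all function names are ours.

```python
from statistics import mean, pvariance

def ranks(xs):
    # Average ranks (ties share the mean of their positions), 1-based
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    # Spearman rho = Pearson correlation of the rank vectors
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

def tpa(pi_acct, tier_rank):
    # TPA: Spearman rank correlation between potential and ordinal tier rank
    return spearman(pi_acct, tier_rank)

def tci(values, tiers):
    # TCI: 1 - (mean within-tier variance / total variance)
    total = pvariance(values)
    groups = {}
    for v, t in zip(values, tiers):
        groups.setdefault(t, []).append(v)
    within = mean(pvariance(g) if len(g) > 1 else 0.0 for g in groups.values())
    return 1 - within / total

def sfi(revenue, tiers, strategic=("A", "B")):
    # SFI: share of total revenue held by the strategic (top) tiers
    top = sum(r for r, t in zip(revenue, tiers) if t in strategic)
    return top / sum(revenue)

def composite_score(tpa_v, sfi_v, tci_pi, tci_rev):
    # Weights per the Composite Score defined in Section 2.2
    return 0.35 * tpa_v + 0.35 * sfi_v + 0.30 * (tci_pi + tci_rev)
```

Plugging the FY26 baseline values from Table 4 (TPA 0.259, SFI 0.807, TCI_PI 0.222, TCI_REV 0.469) into `composite_score` reproduces the reported 0.5804.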
2. Static Segmentation

2.1 Pure Unsupervised Clustering

2.1.1 Model Conclusions at a Glance

Across all unsupervised models evaluated—Ward, Weighted Ward, K-Medoids, K-Means / K-Means++, and HDBSCAN—only the Ward model (K=4, Policy v2) provides a segmentation that is simultaneously: statistically coherent, business-aligned (high SFI), geometrically stable (clean Silhouette), and operationally interpretable. All alternative models either distort cluster geometry, collapse SFI, or produce unstable or illogical tier structures.

Final Recommendation: Use Ward (K=4, Policy v2) as the natural segmentation baseline.

2.1.2 High-Level Algorithm Comparison

Table 1. Algorithm Comparison

| Model | Algorithm Summary | Strengths | Weaknesses | Business Use |
| --- | --- | --- | --- | --- |
| Ward | Variance-minimizing hierarchical merges | Best balance of TPA/TCI/SFI; stable geometry | Sensitive to correlated features | Primary model for segmentation |
| Weighted Ward | Distance reweighted by PI + revenue | Higher TPA | Silhouette collapse; unstable | Not recommended |
| K-Medoids | Medoid-based dissimilarity minimization | Robust to outliers | Cluster compression; weak SFI | Diagnostic only |
| K-Means / K-Means++ | Squared-distance minimization | Fast baseline | SFI collapse; over-tight clusters | Numeric benchmark only |
| HDBSCAN | Density-based clustering with noise | Good for anomaly detection | TPA collapse; noisy tiers; broken PI ordering | Not suitable for tiering |

2.1.3 Modeling Results
Table 2. Unsupervised Clustering Model Results

| Metric | FY26 Baseline (Legacy A+B) | Ward K=4 (Policy v2) | Weighted Ward 2-B (α=4, β=0.8, s=0.7, K=5) | Unweighted Ward (Policy v2, K=4) | Unweighted Ward (Policy v2, K=3) | K-Medoids B4 Behavior-only (K=3) | K-means K=4 (Policy v2) | K-means++ K=4 (Policy v2) | HDBSCAN (baseline settings) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TPA | 0.260 | 0.260 | 0.860 | 0.260 | 0.300 | 0.520 | 0.310 | 0.310 | 0.040 |
| TCI_PI | 0.222 | 0.461 | 0.772 | 0.461 | 0.405 | 0.173 | 0.476 | 0.476 | 0.004 |
| TCI_REV | 0.469 | 0.801 | 0.640 | 0.801 | 0.672 | 0.002 | 0.831 | 0.831 | 0.062 |
| SFI | 0.807 | 0.868 | 0.817 | 0.868 | 0.960 | 0.656 | 0.332 | 0.332 | 0.719 |
| Silhouette | n/a | 0.560 | 0.145 | 0.560 | 0.604 | 0.466 | 0.523 | 0.523 | 0.186 |

- Ward (K=4, Policy v2) remains the strongest performer: SFI ≈ 0.87, Silhouette ≈ 0.56, stable geometric structure.
- Weighted Ward raises TPA/PI slightly, but Silhouette collapses (~0.15) → structural instability → not viable.
- K-Medoids consistently compresses clusters; the TPA/TCI gain is offset by TCI_REV collapse and low SFI.
- K-Means / K-Means++ tighten numeric clusters, but SFI drops to ~0.33 → tiers lose strategic meaning.
- HDBSCAN generates large noisy segments: TPA = 0.044, TCI_PI = 0.004, Silhouette = 0.186, and Tiers A/B contain negative PI → fundamentally unsuitable.

Conclusion: Only Ward (K=4) produces segmentation with both statistical integrity and business relevance.

2.1.4 Implications, Limitations, Next Steps

Implications: Our current unsupervised segmentation delivers statistically coherent and operationally usable tiers, but several structural findings emerged:

- Unsupervised methods reveal the data’s natural shape, not business priorities: Ward/K-means/HDBSCAN can discover separations in the feature space but cannot move clusters toward preferred PI or revenue patterns.
- Cluster outcomes cannot guarantee business-desired constraints. For example: if Tier A’s PI mean is too low, the model cannot raise it; if Tier C becomes too large, clustering cannot rebalance it.
If the business wants stronger SFI, clustering alone cannot optimize that objective. Some business-critical metrics are only evaluated after clustering, not optimized within it: tier size distributions, average PI per tier, and revenue share are structurally important but not part of the unsupervised objective.

Hence, unsupervised clustering provides a statistically coherent view of the data’s natural structure, but it cannot guarantee business-preferred tier outcomes. The models cannot enforce hard constraints (e.g., a desired A/B/C distribution, monotonic PI means, revenue share targets), nor can they adjust tiers when PI is too low or clusters become imbalanced. Additionally, key tier-level KPIs—such as average PI per tier, tier size stability, and revenue distribution—are only evaluated after clustering rather than optimized during it, limiting their influence on the final tier design.

To overcome these structural limitations, the next stage of the system must incorporate semi-supervised guidance and policy-based optimization, where business KPIs directly shape tier boundaries and ranking. Future iterations will expand the policy beyond PI and revenue to include behavioral and market signals and bring tier-level metrics into the objective function to better align the segmentation with real-world operational priorities.

2.2 Semi-supervised KPI-Driven Learning

Composite Score — KPI-Driven Objective for Tiering

To guide our semi-supervised and hybrid methods, we define a Composite Score that unifies Microsoft’s key business KPIs into a single optimization target. It ensures that all modeling layers—Pure KPI-Based Tiering and Hybrid KPI-Aware Segmentation—optimize toward the same business priorities. Unsupervised clustering cannot optimize business outcomes.
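For intuition on the variance-minimizing merges behind the recommended Ward baseline (Section 2.1), here is a hedged pure-Python sketch of greedy Ward agglomeration on toy 2-D points. The real pipeline presumably used a library implementation; this illustrates only the merge criterion (merge the pair whose union least increases within-cluster sum of squares).

```python
def ward_cluster(points, k):
    """Greedy Ward agglomeration: repeatedly merge the pair of clusters
    whose merge yields the smallest increase in within-cluster variance."""
    clusters = [[p] for p in points]

    def centroid(c):
        dim = len(c[0])
        return [sum(p[d] for p in c) / len(c) for d in range(dim)]

    def ward_cost(a, b):
        # Increase in total within-cluster sum of squares if a and b merge:
        # |a||b| / (|a| + |b|) * squared distance between centroids
        ca, cb = centroid(a), centroid(b)
        d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * d2

    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost(clusters[i], clusters[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

On two well-separated pairs of points, e.g. `ward_cluster([(0, 0), (0, 1), (10, 10), (10, 11)], 2)`, the cheap merges happen first and the two natural groups are recovered.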
A composite objective is needed to consistently evaluate and improve tiering performance across:

- Potential uplift (TPA)
- Stability of tier structure (SFI)
- Within-tier improvement (TCI_PI)
- Revenue scale (TCI_REV)

To align tiering with business priorities, we summarize these four KPIs—TPA, SFI, TCI_PI, and TCI_REV—into one normalized measure:

Composite Score = 0.35 × TPA + 0.35 × SFI + 0.30 × (TCI_PI + TCI_REV)

This score provides a single benchmark for comparing methods and serves as the optimization target in our semi-supervised and hybrid approaches.

How It Is Used

- Benchmarking: compare all methods on a unified scale.
- Optimization: serves as the objective in constrained local search (Method 3).
- Rule Learning: guides the decision-tree logic extracted after optimization.

Why It Matters

The Composite Score centers the analysis around a single question: “Which tiering structure creates the strongest balance of growth potential, stability, and revenue impact?”

2.3 Pure KPI-Based Tiering

2.3.1 Model Conclusions at a Glance

Pure KPI-based tiering shows what the tiers would look like if Microsoft prioritized business KPIs above all else. It achieves the largest KPI improvements, but causes major distribution shifts and violates movement rules, making it operationally unrealistic.

Final takeaway: Pure KPI tiering is a valuable benchmark for understanding KPI potential, but cannot be operationalized.

2.3.2 High-Level Algorithm Summary
Table 3. Methods of KPI-Based Tiering

| Method | Algorithm Summary | Strengths | Weaknesses | Business Use |
| --- | --- | --- | --- | --- |
| New_Tier_Direct (PI ranking only) | Rank accounts by PI/KPI score and assign tiers directly | Highest KPI gains; preserves overall tier distribution | Moves ~20–40% of companies; violates ±1 rule; disrupts continuity | KPI upper-bound benchmark |
| Tier_PI_Constrained (PI ranking + ±1 rule) | Same as above but restricts movement to adjacent tiers | KPI lift + respects movement constraint | Still moves ~20–40%; breaks tier distribution (Tier C inflation) | Diagnostic only |

2.3.3 Modeling Results

Table 4. Modeling Results for KPI-Based Tiering

| KPI | FY26 Baseline | New_Tier_Direct | Tier_PI_Constrained |
| --- | --- | --- | --- |
| Composite Score | 0.5804 | 0.8105 | 0.763 |
| TPA | 0.2590 | 0.8300 | 0.721 |
| TCI_PI | 0.2220 | 0.5360 | 0.492 |
| TCI_REV | 0.4690 | 0.3970 | 0.452 |
| SFI | 0.8070 | 0.6860 | 0.650 |

New_Tier_Direct:
- Composite Score: 0.5804 → 0.8105
- TPA increases sharply (0.259 → 0.830)
- Violates the ±1 rule; major reassignments (~20–40%)

Tier_PI_Constrained:
- Respects ±1 movement
- KPI still improves (Composite 0.763)
- But the tier distribution collapses (Tier C over-expands)
- Still ~20–40% movement → not feasible

Hence: no PI-only method balances KPI lift with operational feasibility.

2.3.4 Limitations & Next Steps

Pure KPI tiering cannot simultaneously preserve the tier distribution, respect the ±1 movement rule, and deliver consistent KPI improvements. This creates the need for a hybrid model that combines clustering structure with KPI-aligned tier ordering.

2.4 Hybrid KPI-Aware Segmentation (Our Contribution)

2.4.1 Model Conclusions at a Glance

Our hybrid method blends clustering geometry with KPI-driven optimization, achieving a practical balance between statistical structure, business constraints, and KPI improvement.

Final Recommendation: This is the segmentation framework we recommend Microsoft adopt.

➔ It produces the most deployable segmentation by balancing KPI lift with stability and interpretability.
➔ Delivers meaningful KPI improvement while changing only ~5% of accounts, compared to Model B’s 20–40%.

2.4.2 High-Level Algorithm Summary

Table 5. Algorithm Comparison

| Component | Purpose | Strengths | Notes |
| --- | --- | --- | --- |
| Constrained Local Search | Optimize composite KPI score starting from FY26 tiers | KPI uplift with strict constraints | Only small movements allowed (~5%) |
| Tier Movement Constraint (+1/–1) | Ensure realistic transitions | Guarantees business rules; keeps structure stable | Limits improvement ceiling |
| Decision Tree | Learn interpretable rules from optimized tiers | Deployable, explainable, reusable | Accuracy ~80%; tunable with weighting |
| Closed-Loop Optimization | Improve both rules and allocation iteratively | Stable + interpretable | Future extension |

2.4.3 Modeling Results

Table 6. Modeling Results for Hybrid Segmentation

| KPI | FY26 Baseline | New_Tier_Direct | Tier_PI_Constrained | ImprovedTier |
| --- | --- | --- | --- | --- |
| Composite Score | 0.5804 | 0.8105 | 0.763 | 0.6512 |
| TPA | 0.2590 | 0.8300 | 0.721 | 0.2990 |
| TCI_PI | 0.2220 | 0.5360 | 0.492 | 0.3450 |
| TCI_REV | 0.4690 | 0.3970 | 0.452 | 0.5250 |
| SFI | 0.8070 | 0.6860 | 0.650 | 0.8160 |

Interpretation of the Hybrid Model (Improved Tier):
- Composite Score: 0.5804 → 0.6512
- TPA improvement (0.259 → 0.299)
- TCI_PI and TCI_REV both rise
- SFI improves compared to the constrained PI method
- Only ~5% of companies move tiers, versus Method 2’s 20–40%

This makes Method 3 the only method that simultaneously satisfies: KPI improvement, the original tier distribution, the ±1 movement rule, low operational disruption, and interpretability (via decision tree).

2.4.4 Conclusion

Model C offers a pragmatic middle ground: KPI lift close to pure PI tiering, operational impact close to clustering, and full interpretability. For Microsoft, this hybrid framework is the most realistic and sustainable segmentation approach.

3. Dynamic Tier Progression

3.1 Model Conclusions at a Glance

Our benchmarking shows that CatBoost and XGBoost consistently deliver the strongest overall performance, achieving the highest macro-F1 (~0.76) across all tested methods.
However, despite these gains, the underlying business pattern remains dominant: tier changes are extremely rare (≈5.4%), and Microsoft’s one-step movement rule severely limits model learnability. Dynamic tiering is far more valuable as a diagnostic signal generator than as a strict forecasting engine. While models cannot reliably predict future tier transitions, they can surface atypical account patterns, signals of risk, and emerging opportunities that support earlier sales intervention and more proactive account planning.

3.2 Models

To predict future tier upgrades and downgrades, we tested the following models:

Table 7. Models Used for Dynamic Prediction

| Model | Strengths | Weaknesses | When to Use |
| --- | --- | --- | --- |
| MLR | Simple; interpretable; fast baseline | Weak on imbalanced data | When transparency and explainability are needed |
| Neural Network | Captures nonlinear patterns; stronger recall than MLR | Requires tuning; sensitive to imbalanced data | When exploring richer behavioral signals |
| CatBoost (baseline, weighted, oversampled) | Strongest overall balance; robust with categorical data; best macro-F1 | Still limited by rarity of tier changes; weighted/oversampled versions risk overfitting | Default diagnostic model for surfacing atypical account patterns |
| XGBoost (baseline, weighted) | High performance; scalable; production-ready | Limited by structural imbalance; weighted versions increase false positives | When deploying a stable scoring layer to sales teams |

Performance was measured using accuracy but, more importantly, macro recall, precision, and F1, since upgrades and downgrades are much rarer and require balanced evaluation.

3.3 Model Results

Across all models, overall accuracy appears high (0.95–0.97), but this metric is dominated by the fact that tier transitions are extremely rare — only 808 of 15,000 cases (5.4%) moved tiers, while 95% stayed unchanged. According to macro metrics such as recall, precision, and F1, every model struggles to reliably detect upgrades and downgrades.
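To see why macro-averaged metrics matter here, the sketch below (illustrative pure Python, not the project's evaluation code) scores a degenerate majority-class predictor on a 95%-"stay" dataset: accuracy looks excellent while macro F1 collapses, because the rare up/down classes contribute an F1 of zero.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    # Average per-class F1, so rare classes weigh as much as the majority class
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# A degenerate "always predict stay" model on imbalanced transition labels
y_true = ["stay"] * 19 + ["up"]
y_pred = ["stay"] * 20
```

On this toy data, accuracy is 0.95 while macro F1 falls below 0.5, mirroring the gap between the accuracy and macro columns of Table 8.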
CatBoost and XGBoost deliver the strongest balanced results, achieving the highest macro F1 scores (~0.76). However, even these advanced methods capture only half or fewer of the true upgrade and downgrade events. This reinforces that the challenge is not algorithmic performance but the underlying business pattern: tier movements are infrequent, policy-driven, and weakly connected to observable account features.

Table 8. Results for Dynamic Prediction

| Model | Accuracy | Macro Recall | Precision | F1 Score |
| --- | --- | --- | --- | --- |
| MLR | 0.95 | 0.36 | 0.70 | 0.37 |
| Neural Network | 0.95 | 0.58 | 0.71 | 0.63 |
| CatBoost | 0.97 | 0.94 | 0.67 | 0.76 |
| CatBoost (Weighted) | 0.82 | 0.49 | 0.82 | 0.54 |
| CatBoost (Oversampling) | 0.69 | 0.42 | 0.75 | 0.42 |
| XGBoost | 0.97 | 0.93 | 0.67 | 0.76 |
| XGBoost (Weighted) | 0.97 | 0.85 | 0.70 | 0.76 |

3.4 Dynamic Tiering Implications

Based on these results, dynamic tiering has the following implications for Microsoft:

- Tier changes are not reliably forecastable under current rules. Year-over-year stability is so dominant that even strong ML models cannot surface consistent upgrade or downgrade signals. This suggests that transitions are driven more by sales judgment and tier policy than by measurable account behavior.
- The dynamic model is still valuable: just not as a predictor of future tiers. Rather than serving as a forecasting engine, this pipeline should be viewed as a diagnostic tool that helps identify accounts with unusual patterns, emerging risks, or outlier behavior worth reviewing.
- Dynamic progression complements, rather than replaces, the core segmentation. It provides an additional layer of insight alongside clustering and KPI-optimized segmentation, helping Microsoft maintain both structural clarity (static segmentation) and forward-looking awareness (dynamic progression).
4. Optimization in Practice

To understand how segmentation could support downstream coverage planning, we developed a small optimization proof of concept using Microsoft’s seller–tier capacity guidelines (e.g., max accounts per role × tier, geo-entity restrictions, in-person vs. remote coverage rules).

4.1 What We Explored

Using our final hybrid segmentation (Method 3), we tested a simplified workflow:

- Formulate a coverage optimization problem: assign sellers to accounts under constraints such as role × tier capacity limits, single-geo assignment, ±1 tier movement rules, and domain restrictions for Tiers C/D. This naturally forms a mixed-integer program (MIP).
- Prototype with standard optimization tools: linear and integer programming formulations using Gurobi, OR-Tools, and Pyomo, plus heuristic solvers (e.g., local search, greedy reallocation, hill climbing) as faster alternatives.
- Simulate coverage scenarios: estimate changes in workload balance and whitespace prioritization under different seller–tier mixes, and validate the feasibility of the optimization with respect to Microsoft’s operational rules.

4.2 What We Learned

Due to limited operational metrics (detailed whitespace values, upgrade probabilities, territory boundaries) and time constraints, we did not build a fully deployable engine. However, the PoC confirmed that:

- The segmentation integrates cleanly into a prescriptive segmentation → optimization → coverage pipeline.
- A full solver could feasibly allocate sellers under realistic business constraints.
- Gurobi-style MIP formulations and simulation-based heuristics are both valid paths for future development.

In short: the optimization layer is technically viable and aligns naturally with our segmentation design, but its full implementation exceeds the scope of this capstone.
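As a hedged sketch of the heuristic path mentioned above (greedy reallocation rather than a full MIP), the toy function below assigns the highest-value accounts first to any seller with remaining capacity for that account's tier. The data shapes, names, and capacity rules here are invented for illustration and are far simpler than the real role × tier, geo, and domain constraints.

```python
def greedy_coverage(accounts, sellers, capacity):
    """Heuristic stand-in for the coverage MIP: assign highest-value accounts
    first, to the first seller with remaining capacity for that tier.

    accounts: list of (account_id, tier, value) tuples
    capacity: {seller: {tier: max_accounts}} per-seller, per-tier caps
    """
    load = {s: {t: 0 for t in capacity[s]} for s in sellers}
    assignment = {}
    for acct, tier, value in sorted(accounts, key=lambda a: -a[2]):
        for s in sellers:
            if load[s].get(tier, 0) < capacity[s].get(tier, 0):
                assignment[acct] = s
                load[s][tier] += 1
                break  # account covered; leftover accounts stay unassigned
    return assignment
```

A solver-based formulation would instead maximize total covered value subject to the same capacity constraints; the greedy version trades optimality for speed and transparency.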
5. AI & LLM Integration

To make segmentation accessible to a broad set of stakeholders, such as sales leaders, strategists, and business analysts, we built a conversational tiering assistant powered by LLM-based interpretation of strategic priorities. The assistant allows users to describe their intended segmentation direction in natural language, which the system translates into numerical weights and a refreshed set of tier assignments.

5.1 LLM Workflow Architecture

The following steps, originally shown as a flowchart, demonstrate how the LLM workflow operates:

1. Users communicate their goals using intuitive, high-level language (e.g., “prioritize runway growth”, “reward high-potential emerging accounts”).
2. The front end collects the user’s tiering preference through a chat interface.
3. The front end sends this prompt to our cloud FastAPI service on Render.
4. The LLM interprets the prompt and infers the relative strategic weights and which clustering method to use (KPI-based or the Hybrid approach).
5. The server applies these weights in the tiering code to generate updated tiers based on the selected approach.
6. The server returns a refreshed CSV with new tier assignments, which can be exported through the chat interface.

5.2 Why LLMs Matter

LLMs enhanced the project in three ways:

- Interpretation Layer: helps business users articulate strategy in plain English and convert it to quantifiable modeling inputs.
- Explainability Layer: surfaces cluster drivers, feature differences, and trade-offs across segments in natural language.
- Acceleration Layer: enables real-time exploration of “what-if” tiering scenarios without engineering support.

This integration transforms segmentation from a static analytical artifact into a dynamic, interactive decision-support tool, aligned with how Microsoft teams actually work.

5.3 Backend Architecture and LLM Integration Pipeline

The conversational tiering system is supported by a cloud-based backend designed to translate natural-language instructions into structured model parameters.
The service is deployed on Render and implemented with FastAPI, providing a lightweight, high-performance gateway for managing requests, validating inputs, and coordinating LLM interactions.

FastAPI as the Orchestration Layer - User instructions are submitted through the chat interface and delivered to a FastAPI endpoint as JSON. FastAPI validates this payload using Pydantic, ensuring the request is well formed before any processing occurs. The framework manages routing, serialization, and error handling, isolating request management from the downstream LLM and computation layers.

LLM Invocation Through the OpenAI API - Once a validated prompt is received, the backend invokes the OpenAI API using a structured system prompt engineered to enforce strict JSON output. The LLM returns four normalized weights reflecting the user’s strategic intent, along with metadata used to determine whether the user explicitly prefers a KPI-based method or the default Hybrid approach. If no method is specified, the system automatically defaults to Hybrid. Low-temperature decoding is used to minimize stochastic variation and ensure repeatability across identical user prompts. All OpenAI keys are securely stored as Render environment variables.

Schema Enforcement and Robust Parsing - To maintain reliability, the backend enforces strict schema validation on LLM responses. The service checks both JSON structure and numeric constraints, ensuring values fall within valid ranges and sum to one. If parsing fails or constraints are violated, the backend automatically reissues a constrained correction prompt. This design prevents malformed outputs and guards against conversational drift.

Render Hosting and Operational Considerations - The backend runs in a stateless containerized environment on Render, which handles service orchestration, HTTPS termination, and environment-variable management.
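The schema-enforcement step described above can be sketched as follows. This is an illustrative stand-in, not the team's actual code: the key names (`weights`, `method`) and the four weight fields are assumptions about the response schema.

```python
import json

WEIGHT_KEYS = ("tpa", "sfi", "tci_pi", "tci_rev")  # assumed field names

def parse_llm_weights(raw, tol=1e-6):
    """Validate the LLM's JSON reply: required keys, values in [0, 1],
    weights summing to one. Raises ValueError so the caller can reissue
    a constrained correction prompt."""
    data = json.loads(raw)
    weights = data.get("weights", {})
    if set(weights) != set(WEIGHT_KEYS):
        raise ValueError("unexpected or missing weight keys")
    vals = [weights[k] for k in WEIGHT_KEYS]
    if not all(isinstance(v, (int, float)) and 0 <= v <= 1 for v in vals):
        raise ValueError("weights must be numbers in [0, 1]")
    if abs(sum(vals) - 1.0) > tol:
        raise ValueError("weights must sum to 1")
    # Default to the Hybrid method unless the user explicitly chose KPI-based
    method = data.get("method", "hybrid")
    return {k: float(weights[k]) for k in WEIGHT_KEYS}, method
```

Rejecting and re-prompting on a ValueError, rather than silently renormalizing, keeps the LLM's output contract explicit and guards against conversational drift.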
Data required for computation is loaded into memory at startup to reduce latency, and the lightweight tiering pipeline ensures that the system remains responsive even under shared compute resources.

Response Assembly and Delivery - After LLM interpretation and schema validation, the backend applies the resulting weights and streams the recalculated results back to the user as a downloadable CSV. FastAPI's StreamingResponse enables direct transmission from memory without temporary filesystem storage, supporting rapid interactive workflows.

Together, these components form a tightly integrated, cloud-native pipeline: FastAPI handles orchestration, the LLM provides semantic interpretation, Render ensures secure and reliable hosting, and the default Hybrid method ensures consistent behavior unless the user explicitly requests the KPI approach.

DEMO: Microsoft x UCLA Anderson MSBA - AI-Driven KPI Segmentation Project (LLM demo)

6. Conclusion

Our work delivers a strategic, KPI-driven tiering architecture that resolves the limitations of Microsoft's legacy system and sets a scalable foundation for future segmentation and coverage strategy. Across all analyses, five differentiators stand out:

1. Clear separation of natural structure vs. business intent: We diagnose where the legacy system diverges from true customer potential and revenue, establishing the analytical ground truth Microsoft never previously had.
2. A precise map of strategic trade-offs: By comparing unsupervised, KPI-only, and hybrid approaches, we reveal the operational and business implications behind every tiering philosophy, making the framework decision-ready for leadership.
3. A business-aligned segmentation ready for deployment: Our hybrid KPI-aware model uniquely satisfies KPI lift, distribution stability, ±1 movement rules, and interpretability, providing a reliable go-forward tiering backbone.
4. A future-proof architecture that extends beyond static tiers: Dynamic progression modeling and an optimization PoC show how tiering can evolve into forecasting, prioritization, whitespace planning, and resource optimization.
5. A blueprint for Microsoft's next-generation tiering ecosystem: The system integrates data science, business KPIs, optimization, and LLM interpretability into one cohesive workflow, positioning Microsoft for an AI-enabled tiering strategy.

In essence, this work transforms customer tiering into a strategic, explainable, and scalable system, ready to support Microsoft's growth ambitions and future AI initiatives.

Azure Machine Learning compute cluster - avoid using docker?
Hello,

I would like to use an Azure Machine Learning compute cluster as a compute target but do not want it to containerize my project. Is there a way to deactivate this "feature"?

The main reasons behind this request are:

- I have already set up a docker-compose file that specifies 3 containers for Apache Airflow and want to avoid a Docker-in-Docker situation, especially since I have already tried this and failed so far.
- I prefer not to use a Compute Instance, as it is tied to an Azure account, which is not ideal for automation purposes.

Thanks in advance.

The Future of AI: The Model is Key, but the App is the Doorway
This post explores the real-world impact of GPT-5 beyond benchmark scores, focusing on how application design shapes user experience. It highlights early developer feedback, common integration challenges, and practical strategies for adapting apps to leverage the advanced capabilities of GPT-5 in Foundry Models. From prompt refinement to fine-tuning to new API controls, learn how to make the most of this powerful model.

The Future of AI: Building Weird, Warm, and Wildly Effective AI Agents
Discover how humor and heart can transform AI experiences. From the playful Emotional Support Goose to the productivity-driven Penultimate Penguin, this post explores why designing with personality matters, and how Azure AI Foundry empowers creators to build tools that are not just efficient, but engaging.

Unlocking Smarter Search: How to Use Azure AI Search & Azure OpenAI Service Together
In the era of large language models and AI-powered experiences, simply running a keyword search isn't enough. Users expect conversational, context-aware responses, grounded in real data. That's where combining Azure's search infrastructure with generative AI becomes a game-changer. By using Azure AI Search as the retrieval layer and Azure OpenAI Service as the generation layer, you can build applications that understand natural language, fetch relevant documents, and respond with rich, accurate, and contextual answers. In this blog post, we'll walk through how to achieve that end-to-end, highlight best practices, and give you a blueprint to apply in your own environment. https://dellenny.com/unlocking-smarter-search-how-to-use-azure-ai-search-azure-openai-service-together/

The Future of AI: An Intern's Adventure Turning Hours of Video into Minutes of Meaning
This blog post, part of The Future of AI series by Microsoft's AI Futures team, follows an intern's journey in developing AutoHighlight, a tool that transforms long-form video into concise, narrative-driven highlight reels. By combining Azure AI Content Understanding with OpenAI reasoning models, AutoHighlight bridges the gap between machine-detected moments and human storytelling. The post explores the challenges of video summarization, the technical architecture of the solution, and the lessons learned along the way.

The Future of AI: Structured Vibe Coding - An Improved Approach to AI Software Development
In this post from The Future of AI series, the author introduces structured vibe coding, a method for managing AI agents like a software team using specs, GitHub issues, and pull requests. By applying this approach with GitHub Copilot, they automated a repetitive task, answering Microsoft Excel-based questionnaires, while demonstrating how AI can enhance developer workflows without replacing human oversight. The result is a scalable, collaborative model for AI-assisted software development.