Staying in the flow: SleekFlow and Azure turn customer conversations into conversions
A customer adds three items to their cart but never checks out. Another asks about shipping, waits eight minutes on hold, and drops the call. A lead responds to an offer but never gets a follow-up in time. Each of these moments represents lost revenue, and they happen to businesses every day. SleekFlow was founded in 2019 to help companies turn those almost-lost-customer moments into connection, retention, and growth. Today we serve more than 2,000 mid-market and enterprise organizations across industries including retail and e-commerce, financial services, healthcare, travel and hospitality, telecommunications, real estate, and professional services. In total, those customers rely on SleekFlow to orchestrate more than 600,000 daily customer interactions across WhatsApp, Instagram, web chat, email, and more.

Our name reflects what makes us different. Sleek is about unified, polished experiences—consolidating conversations into one intelligent, enterprise-ready platform. Flow is about orchestration—AI and human agents working together to move each conversation forward, from first inquiry to purchase to renewal.

The drive for enterprise-ready agentic AI

Enterprises today expect always-on, intelligent conversations—but delivering that at scale is daunting. When we set out to build AgentFlow, our agentic AI platform, we quickly ran into familiar roadblocks: downtime that disrupted peak-hour interactions, vector search delays that hurt accuracy, and costs that ballooned under multi-tenant workloads. Development slowed because of limited compatibility with other technologies, while customer onboarding stalled without clear compliance assurances. To move past these barriers, we needed a foundation that could deliver the performance, trust, and global scale enterprises demand.

The platform behind the flow: How Azure powers AgentFlow

We chose Azure because building AgentFlow required more than raw compute power. Chatbots built on a single-agent model often stall out. They struggle to retrieve the right context, they miss critical handoffs, and they return answers too slowly to keep a customer engaged. To fix that, we needed an ecosystem capable of supporting a team of specialized AI agents working together at enterprise scale.

Azure Cosmos DB provides the backbone for memory and context, managing short-term interactions, long-term histories, and vector embeddings in containers that respond in 15–20 milliseconds. Through Azure AI Foundry, our agents use Azure OpenAI models to understand and generate responses natively in multiple languages. Whether in English, Chinese, or Portuguese, the responses feel natural and aligned with the brand. Semantic Kernel acts as the conductor, orchestrating multiple agents, each of which retrieves the necessary knowledge and context, including chat histories, transactional data, and vector embeddings, directly from Azure Cosmos DB. For example, one agent could be retrieving pricing data, another summarizing it, and a third preparing it for a human handoff.

The result is not just responsiveness but accuracy. A telecom provider can resolve a billing question while surfacing an upsell opportunity in the same dialogue. A financial advisor can walk into a call with a complete dossier prepared in seconds rather than hours. A retailer can save a purchase by offering an in-stock substitute before the shopper abandons the cart. Each of these conversations is different, yet they all run on the same AgentFlow foundation.
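SleekFlow's exact container design is not public, but the general pattern of tenant-partitioned conversation memory in Azure Cosmos DB is easy to sketch. The snippet below is a minimal illustration using the azure-cosmos Python SDK; the account URL, database and container names, and the item schema are assumptions for illustration, not SleekFlow's actual implementation.

```python
# Minimal sketch: per-tenant conversation memory in Azure Cosmos DB.
# Endpoint, key, database/container names, and the item schema are
# illustrative assumptions, not SleekFlow's actual configuration.
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient(url="https://<your-account>.documents.azure.com:443/", credential="<your-key>")
db = client.create_database_if_not_exists("agentflow")

# Partitioning by tenant keeps each customer's chat history logically isolated.
memory = db.create_container_if_not_exists(
    id="short_term_memory",
    partition_key=PartitionKey(path="/tenantId"),
)

# Append one turn of a conversation.
memory.upsert_item({
    "id": "conv-123-turn-7",
    "tenantId": "tenant-42",
    "conversationId": "conv-123",
    "role": "user",
    "text": "Is the blue jacket still in stock in size M?",
})

# Retrieve recent turns for the same conversation, scoped to the tenant's partition.
turns = memory.query_items(
    query="SELECT * FROM c WHERE c.conversationId = @cid ORDER BY c._ts DESC",
    parameters=[{"name": "@cid", "value": "conv-123"}],
    partition_key="tenant-42",
)
for turn in turns:
    print(turn["role"], turn["text"])
```

Keying the partition on a tenant identifier is also what makes per-tenant isolation straightforward to enforce later, since every read and write is naturally scoped to one tenant's data.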
Fast, fluent, and focused: Azure keeps conversations moving

Speed is the heartbeat of a good conversation. A delayed answer feels like a dropped call, and an irrelevant one breaks trust. For AgentFlow to keep customers engaged, every operation behind the scenes has to happen in milliseconds. A single interaction can involve dozens of steps. One agent pulls product information from embeddings, another checks it against structured policy data, and a third generates a concise, brand-aligned response. If any of these steps lag, the dialogue falters. On Azure, they don't.

Azure Cosmos DB manages conversational memory and agent state across dedicated containers for short-term exchanges, long-term history, and vector search. Sharded DiskANN indexing powers semantic lookups that resolve in the 15–20 millisecond range—fast enough that the customer never feels a pause. Microsoft's Phi-4 model, together with Azure OpenAI models in Azure AI Foundry such as o3-mini and o4-mini, provides the reasoning, and Azure Container Apps scales elastically, so performance holds steady during event-driven bursts, such as campaign broadcasts that can push the platform from a few to thousands of conversations per minute, and during daily peak-hour surges. To support that level of responsiveness, we run Azure Container Apps on the pay-as-you-go Consumption plan, using KEDA-based autoscaling to expand from five idle containers to more than 160 within seconds. Meanwhile, Microsoft Orleans coordinates lightweight in-memory clustering to keep conversations sleek and flowing.

The results are tangible. Retrieval-augmented generation recall improved from 50 to 70 percent. Execution speed is about 50 percent faster. For SleekFlow's customers, that means carts are recovered before they're abandoned, leads are qualified in real time, and support inquiries move forward instead of stalling out. With Azure handling the complexity under the hood, conversations flow naturally on the surface—and that's what keeps customers engaged.

Secure enough for enterprises, human enough for customers

AgentFlow was built with security by design as a first principle, giving businesses confidence that every interaction is private, compliant, and reliable. On Azure, every AI agent operates inside guardrails enterprises can depend on. Azure Cosmos DB enforces strict per-tenant isolation through logical partitioning, encryption, and role-based access control, ensuring chat histories, knowledge bases, and embeddings remain auditable and contained. Models deployed through Azure AI Foundry, including Azure OpenAI and Microsoft Phi, process data entirely within SleekFlow's Azure environment, with the guarantee that it is never used to train public models and with activity logged for transparency. And Azure's certifications and regulatory coverage—including ISO 27001, SOC 2, and GDPR—are backed by continuous monitoring and regional data residency options, proving compliance at a global scale.

But trust is more than a checklist of certifications. AgentFlow brings human-like fluency and empathy to every interaction, powered by Azure OpenAI running with high token-per-second throughput so responses feel natural in real time. Quality control isn't left to chance. Human override workflows are orchestrated through Azure Container Apps and Azure App Service, ensuring AI agents can carry conversations confidently until they are ready to hand off to a human agent. Enterprises gain the confidence to let AI handle revenue-critical moments, knowing Azure provides the foundation and SleekFlow provides the human-centered design.
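Returning to the vector-search layer described above, the sketch below shows the kind of similarity lookup a DiskANN-indexed Azure Cosmos DB container supports. Ordering by VectorDistance is how similarity queries are expressed in Cosmos DB for NoSQL; the container name, embedding property, vector dimensionality, and query text are assumptions for illustration, not SleekFlow's actual schema.

```python
# Minimal sketch: a vector similarity lookup against a Cosmos DB container
# with a vector index (for example DiskANN). Container name, embedding field,
# and the placeholder query vector are illustrative assumptions.
from azure.cosmos import CosmosClient

client = CosmosClient(url="https://<your-account>.documents.azure.com:443/", credential="<your-key>")
container = client.get_database_client("agentflow").get_container_client("knowledge_embeddings")

# Placeholder vector; in practice this comes from your embedding model and
# must match the dimensionality declared in the container's vector policy.
query_embedding = [0.0] * 1536

# VectorDistance ranks items by similarity to the query embedding; with a
# DiskANN index this resolves as an approximate nearest-neighbor search.
results = container.query_items(
    query=(
        "SELECT TOP 5 c.content, VectorDistance(c.embedding, @q) AS score "
        "FROM c ORDER BY VectorDistance(c.embedding, @q)"
    ),
    parameters=[{"name": "@q", "value": query_embedding}],
    enable_cross_partition_query=True,
)
for item in results:
    print(item["score"], item["content"][:80])
```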
Shaping the next era of conversational AI on Azure

The benefits of Azure show up not only in customer conversations but also in the way our own teams work. Faster processing speeds and high token-per-second throughput reduce latency, so we spend less time debugging and more time building. Stable infrastructure minimizes downtime and troubleshooting, lowering operational costs.

That same reliability and scalability have transformed the way we engineer AgentFlow. AgentFlow started as part of our monolithic system. Shipping new features used to take about a month of development and another week of heavy testing to make sure everything held together. After moving AgentFlow to a microservices architecture on Azure Container Apps, we can now deploy updates almost daily with no downtime or customer impact, thanks to native support for rolling updates and blue-green deployments.

This agility is what excites us most about what's ahead. With Azure as our foundation, SleekFlow is not simply keeping pace with the evolution of conversational AI—we are shaping what comes next. Every interaction we refine, every second we save, and every workflow we streamline brings us closer to our mission: keeping conversations sleek, flowing, and valuable for enterprises everywhere.

Edge AI for Beginners: Getting Started with Foundry Local
In Module 08 of the EdgeAI for Beginners course, Microsoft introduces Foundry Local, a toolkit that helps you deploy and test Small Language Models (SLMs) completely offline. In this blog, I'll share how I installed Foundry Local, ran the Phi-3.5-mini model on my Windows laptop, and what I learned through the process.

What Is Foundry Local?

Foundry Local allows developers to run AI models locally on their own hardware. It supports text generation, summarization, and code completion — all without sending data to the cloud. Unlike cloud-based systems, everything happens on your computer, so your data never leaves your device.

Prerequisites

Before starting, make sure you have:

- Windows 10 or 11
- Python 3.10 or newer
- Git
- Internet connection (for the first-time model download)
- Foundry Local installed

Step 1 — Verify Installation

After installing Foundry Local, open Command Prompt and type:

foundry --version

If you see a version number, Foundry Local is installed correctly.

Step 2 — Start the Service

Start the Foundry Local service using:

foundry service start

You should see a confirmation message that the service is running.

Step 3 — List Available Models

To view the models supported by your system, run:

foundry model list

You'll get a list of locally available SLMs. Note: model availability depends on your device's hardware. For most laptops, phi-3.5-mini works smoothly on CPU.

Step 4 — Run the Phi-3.5 Model

Now let's start chatting with the model:

foundry model run phi-3.5-mini-instruct-generic-cpu:1

Once it loads, you'll enter an interactive chat mode. Try a simple prompt:

Hello! What can you do?

The model replies instantly — right from your laptop, no cloud needed. To exit, type:

/exit

How It Works

Foundry Local loads the model weights from your device and performs inference locally. This means text generation happens on your CPU (or GPU, if available). The result: complete privacy, no internet dependency, and instant responses.

Benefits for Students

For students beginning their journey in AI, Foundry Local offers several key advantages:

- No need for high-end GPUs or expensive cloud subscriptions.
- Easy setup for experimenting with multiple models.
- Perfect for class assignments, AI workshops, and offline learning sessions.
- Promotes a deeper understanding of model behavior by allowing step-by-step local interaction.

These factors make Foundry Local a practical choice for learning environments, especially in universities and research institutions where accessibility and affordability are important.

Why Use Foundry Local

Running models locally offers several practical benefits compared to using Azure AI Foundry in the cloud. With Foundry Local, you do not need an internet connection, and all computation happens on your personal machine. This makes it faster for small models and more private, since your data never leaves your device. In contrast, Azure AI Foundry runs entirely in the cloud, requires internet access, and charges based on usage. For students and developers, Foundry Local is ideal for quick experiments, offline testing, and understanding how models behave in real time. Azure AI Foundry, on the other hand, is better suited for large-scale or production-level scenarios where models need to be deployed at scale.

In summary, Foundry Local provides a flexible and affordable environment for hands-on learning, especially when working with smaller models such as Phi-3, Qwen2.5, or TinyLlama.
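The foundry CLI is the quickest way to chat with a model, but the local service also exposes an OpenAI-compatible endpoint, which makes it easy to script against from Python. The sketch below assumes the service is already running; the port in the base URL is an assumption (use the address printed when the service starts), and the model ID should match what foundry model list shows on your machine.

```python
# Minimal sketch: calling the locally running Foundry Local service through
# its OpenAI-compatible endpoint. The port below is an assumption; replace it
# with the URL your local service reports, and use the model ID exactly as it
# appears in `foundry model list`.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5273/v1",  # assumed port; replace with your service URL
    api_key="not-needed-for-local",       # the local service does not require a real key
)

response = client.chat.completions.create(
    model="phi-3.5-mini-instruct-generic-cpu",  # model ID as listed by `foundry model list`
    messages=[{"role": "user", "content": "Summarize what Edge AI means in two sentences."}],
)
print(response.choices[0].message.content)
```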
Foundry Local lets you experiment freely, learn efficiently, and better understand the fundamentals of Edge AI development.

Optional: Restart Later

Next time you open your laptop, you don't have to reinstall anything. Just run these two commands again:

foundry service start
foundry model run phi-3.5-mini-instruct-generic-cpu:1

What I Learned

Following the EdgeAI for Beginners Study Guide helped me understand:

- How edge AI applications work
- How small models like Phi-3.5 can run on a local machine
- How to test prompts and build chat apps with zero cloud usage

Conclusion

Running the Phi-3.5-mini model locally with Foundry Local gave me hands-on insight into edge AI. It's an easy, private, and cost-free way to explore generative AI development. If you're new to Edge AI, start with the EdgeAI for Beginners course and follow its Study Guide to get comfortable with local inference and small language models.

Resources:

- EdgeAI for Beginners GitHub Repo
- Foundry Local Official Site
- Phi Model Link

Transforming Android Development: Unveiling MediaTek's latest chipset with Microsoft's Phi models
Imagine running advanced AI applications—like intelligent copilots and Retrieval-Augmented Generation (RAG)—directly on Android devices, completely offline. With the rapid evolution of Neural Processing Units (NPUs), this is no longer a future vision—it's happening now.

Optimized AI at the Edge: Phi-4-mini on MediaTek

Thanks to MediaTek's conversion and quantization tools, Microsoft's Phi-4-mini and Phi-4-mini-reasoning models are now optimized for MediaTek NPUs. This collaboration empowers developers to build fast, responsive, and privacy-preserving AI experiences on Android—without needing cloud connectivity. MediaTek's flagship Dimensity 9400 and 9400+ platforms with the Dimensity GenAI Toolkit 2.0 deliver excellent performance with the Phi-4-mini (3.8B) model, with prefill speed above 800 tokens/sec and decode speed above 21 tokens/sec.

Unlock Enhanced Performance: Introducing MediaTek's NeuroPilot SDK

The MediaTek NeuroPilot SDK is a robust software development toolkit designed to accelerate AI application development and deployment across MediaTek's hardware ecosystem. It provides developers with advanced optimization tools and cross-platform compatibility, enabling efficient implementation of neural networks while balancing performance, power efficiency, and resource utilization.

Comprehensive toolchain and documentation support

The NeuroPilot platform offers a complete toolchain, including SDKs, APIs, and documentation, for model quantization/conversion, compilation, and integration. Developers can leverage these tools to optimize neural networks, significantly improving on-device performance while reducing power consumption and memory usage. MediaTek's Dimensity GenAI Toolkit 2.0 now supports the Phi-4 series and provides best practices. Users can convert and quantize Phi-4-mini models in just a few steps, enabling seamless deployment on Dimensity series platforms. A key advantage is that developers do not require specialized hardware expertise to rapidly prototype and deploy customized AI solutions.

One-time coding, cross-platform deployment

The MediaTek NeuroPilot SDK supports all AI-capable MediaTek hardware, empowering developers to adopt a "code once, deploy everywhere" strategy across smartphones, tablets, automotive, smart home devices, IoT products, and future platforms. This aligns with MediaTek's corporate philosophy of bringing AI to everyone. This unified approach streamlines development, reduces costs, and accelerates time-to-market. The SDK integrates with Android and Linux ecosystems, providing complete compiler suites, analyzers, and application libraries to ensure compatibility and optimize performance.

Demo 1: Deploying Phi-4-mini-reasoning with NeuroPilot SDK

In this demo, developers are shown how to use the NeuroPilot SDK to deploy the Phi-4-mini-reasoning model on edge devices. The SDK enables efficient conversion and optimization, making it possible to bring advanced reasoning capabilities to smartphones and other local hardware. The Phi-4-mini-reasoning model brings logical and problem-solving capabilities to the edge. With MediaTek's advanced conversion tools, this new model can be transformed for MediaTek's DLA, enabling a new class of intelligent applications on mobile devices. Bringing reasoning capabilities to the edge allows developers to build faster, more responsive AI experiences—without relying on cloud access.

Demo 2: Deploying Phi-4-mini with NeuroPilot SDK

This video demonstrates how to convert and run the Phi-4-mini model using the NeuroPilot SDK.
With a focus on instruction-following tasks, this deployment empowers developers to build responsive, embedded AI assistants that understand and execute user commands locally. Whether it's productivity tools or context-sensitive automation, Phi-4-mini brings natural interaction and reliability directly to the device.

Imagine the possibilities: Real-world scenarios

Intelligent information access with on-device RAG

Picture this: your application intelligently accesses and reasons over on-device documents, like PDFs or internal knowledge bases, using an advanced embedding model paired with the MediaTek-optimized Phi-4-mini. This enables developers to create:

- Personalized Assistants: Apps that understand user context from their own documents.
- Offline Knowledge Hubs: Providing instant access to relevant information without needing cloud connectivity.
- Enhanced Productivity Tools: Smart summarization, Q&A, and content generation based on local data.

Demo 3: Private RAG chatbot on device

People are on their mobile devices every day—saving new documents, sending messages, taking notes, and more. With how much we're able to store on our phones and laptops, it can get hard to find specific files or pieces of information when we need them most. What if you could implement a personal assistant that understands your question and fetches exactly what you're looking for, without you needing to dig through your device? This demo showcases a Retrieval-Augmented Generation (RAG) implementation of the Phi model embedded directly on a smartphone. The chatbot allows users to ask natural language questions and instantly retrieve relevant information from local files. Because the model runs on-device, there's no need for a cloud connection—your data stays private while you still get intelligent, context-aware results. The RAG-based Phi-4-mini solution parses through your documents when you search your device, helping you find exactly the one you are looking for. (A generic sketch of this retrieval pattern appears after the resource list at the end of this post.)

Stay ahead of the curve

If you're eager to explore the Phi-4 family of models on edge devices and master building next-gen apps with MediaTek's powerful NPU, don't miss the key sessions at Microsoft Build and Computex Taipei happening this week. This is your chance to get direct insights from the experts.

Microsoft Build 2025:

- Uncover the latest on Azure AI Foundry on May 20 during "Unveiling Latest Innovations in Azure AI Foundry Model Catalog"
- If you are attending in person on May 20, catch the second lab, "Fine-Tune End-to-End Distillation Models with Azure AI Foundry Models"
- Learn about Phi on Windows devices on May 20 in "Enable seamless deployment across Intel Copilot+ AI PCs and Azure"

Computex 2025:

- MediaTek Booth (M0806) on May 20-23. See MediaTek's AI vision and hardware innovations firsthand.

Resources

- Explore the Phi-4 Model Family on Azure AI Foundry and Hugging Face
- Get access to the Phi Cookbook: Your practical guide and code repository for building with Phi models.
- Learn more about MediaTek NeuroPilot
- Connect with the MediaTek Developer Application
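The demo above relies on MediaTek's device-specific tooling, but the underlying RAG pattern itself is simple. As referenced in the Demo 3 section, here is a minimal, generic sketch in Python: the sample documents, the toy bag-of-words retrieval, the local endpoint URL, and the model name are all placeholder assumptions, not the demo's actual implementation. A real deployment would use a proper embedding model and an on-device runtime.

```python
# Minimal, generic RAG sketch (not the MediaTek on-device implementation).
# The sample documents, the toy bag-of-words "embedding", the local endpoint
# URL, and the model name are placeholder assumptions for illustration only.
import re
from collections import Counter
from math import sqrt

from openai import OpenAI

documents = {
    "warranty.txt": "Devices are covered by a 12-month limited warranty from the purchase date.",
    "returns.txt": "Items can be returned within 30 days if unused and in original packaging.",
    "shipping.txt": "Standard shipping takes 3-5 business days; express shipping takes 1-2 days.",
}

def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real pipeline would use a proper embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

question = "Can I return an unused item, and within how many days?"
q_vec = embed(question)

# Retrieval step: pick the local document most similar to the question.
best_doc = max(documents, key=lambda name: cosine(q_vec, embed(documents[name])))

# Generation step: ground the answer in the retrieved text via a local
# OpenAI-compatible endpoint (placeholder URL and model identifier).
client = OpenAI(base_url="http://localhost:5273/v1", api_key="not-needed-for-local")
answer = client.chat.completions.create(
    model="phi-4-mini",  # placeholder model name; use whatever your local runtime exposes
    messages=[
        {"role": "system", "content": f"Answer using only this context:\n{documents[best_doc]}"},
        {"role": "user", "content": question},
    ],
)
print(f"Retrieved: {best_doc}\n{answer.choices[0].message.content}")
```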