
Azure Managed Redis
5 MIN READ

Orchestrate multi-LLM workflows with Azure Managed Redis

Shruti_Pathak
Aug 29, 2025

Using Azure Managed Redis to cut latency, simplify logic, and direct LLMs to the right model, prompt, or agent 

Authors: Roberto Perez, George von Bülow & Roy de Milde

 

Key challenge for building effective LLM applications

In the age of generative AI, large language models (LLMs) are reshaping how we build applications — from chatbots to intelligent agents and beyond. But as these systems become more dynamic and multi-modal, one key challenge stands out: how do we route requests efficiently to the right model, prompt, or action at the right time? Traditional architectures struggle with the speed and precision required to orchestrate LLM calls in real-time, especially at scale. This is where Azure Managed Redis steps in — acting as a fast, in-memory data layer to power smart, context-aware routing for LLMs. In this blog, we explore how Redis and Azure are enabling developers to build AI systems that respond faster, think smarter, and scale effortlessly. 

Across industries, customers are hitting real limitations. AI workloads often need to track context across multiple interactions, store intermediate decisions, and switch between different prompts or models based on user intent — all while staying responsive. But stitching this logic together using traditional databases or microservice queues introduces latency, complexity, and cost. Teams face challenges like keeping routing logic fast and adaptive, storing transient LLM state without bloating backend services, and coordinating agent-like behaviors across multiple components. These are exactly the pain points Azure Managed Redis (AMR) was built to address — giving developers a low-latency, highly available foundation for real-time AI orchestration and more.

 

How to use Azure Managed Redis as a Semantic Router 

 Semantic routing uses AI to route user queries to the right service, model or endpoint, based on their intent and context. Unlike rule-based systems, it leverages Generative AI to understand the meaning behind requests, enabling more accurate and efficient decisions. Importantly, the semantic router itself does not forward the query—it only selects the appropriate route. Your application is responsible for taking that routing decision and sending the query to the correct agent, model, or human. 

 

  1. The user sends a query, which is passed to the system for processing
  2. The query is analyzed by an embedding model to understand its semantic intent and context
  3. The semantic router evaluates the user’s intent and context to choose the optimal route:
     - A specific model for further processing
     - An agent to handle the query
     - A default response, if applicable
     - Escalation to a human for manual handling, if needed
  4. Valid queries go through the RAG pipeline to generate a response
  5. The final response is sent back to the user

 

 

 

Code examples + Architecture 

Example: Jupyter Notebook with Semantic Router 

Let’s look at a Jupyter Notebook example that implements a simple Semantic Router with Azure Managed Redis and the Redis Vector Library. 

First, we install the required Python packages and define a connection to an AMR instance: 

 

pip install -q "redisvl>=0.6.0" sentence-transformers python-dotenv

Define the Azure Managed Redis connection:

 

import os
import warnings

warnings.filterwarnings("ignore")

from dotenv import load_dotenv
load_dotenv()
REDIS_HOST = os.getenv("REDIS_HOST") # ex: "gvb-sm.uksouth.redis.azure.net"
REDIS_PORT = os.getenv("REDIS_PORT") # for AMR this is always 10000
REDIS_PASSWORD = os.getenv("REDIS_PASSWORD")  # ex: "giMzOzIP4YmjNBGCfmqpgA7e749d6GyIHAzCaF5XXXXX"

# If SSL is enabled on the endpoint, use rediss:// as the URL prefix
REDIS_URL = f"redis://:{REDIS_PASSWORD}@{REDIS_HOST}:{REDIS_PORT}"
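
Before creating any routes, it is worth verifying the connection. A quick sanity check (not part of the original notebook) using redis-py, which redisvl installs as a dependency:

import redis

# Expect True if the endpoint and credentials above are correct.
# Switch REDIS_URL to the rediss:// prefix if TLS is enabled on the endpoint.
client = redis.from_url(REDIS_URL)
print(client.ping())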

Next, we create our first Semantic Router with an allow/block list: 

 

from redisvl.extensions.router import Route, SemanticRouter
from redisvl.utils.vectorize import HFTextVectorizer

# Hugging Face embedding model used to vectorize queries and route references
vectorizer = HFTextVectorizer()

# Example references for topics we want to intercept before they reach an LLM
blocked_references = [
    "things about aliens",
    "corporate questions about agile",
    "anything about the S&P 500",
]

blocked_route = Route(name="block_list", references=blocked_references)

block_router = SemanticRouter(
    name="bouncer",
    vectorizer=vectorizer,
    routes=[blocked_route],
    redis_url=REDIS_URL,
    overwrite=False,  # reuse the router index if it already exists in Redis
)

To prevent users from asking certain categories of questions, we define example references for a blocked route and register them with the Redis Vector Library's SemanticRouter(). While it is also possible to implement blocking at the LLM level through prompt engineering (e.g., instructing the model to refuse answering certain queries), that approach still requires an LLM call, adding unnecessary cost and latency. By handling blocking earlier with semantic routing in Azure Managed Redis, unwanted queries can be intercepted before ever reaching the model, saving LLM tokens, reducing expenses, and improving overall efficiency.
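
As a rough sketch of that pattern, the application can consult the router before spending any tokens; call_llm below is a placeholder for your own model or RAG call:

def answer(user_query: str) -> str:
    # One vector lookup in Redis - no LLM tokens spent yet.
    match = block_router(user_query)
    if match.name == "block_list":
        return "Sorry, that topic isn't something I can help with."
    # Only unblocked queries reach the (more expensive) LLM or RAG pipeline.
    return call_llm(user_query)  # placeholder for your model / RAG call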

 

Let’s try it out: 

 

user_query = "Why is agile so important?"

route_match = block_router(user_query)

route_match

The router first vectorizes the user query using the specified Hugging Face text vectorizer. It finds a semantic similarity with the route reference “corporate questions about agile” and returns the matching route ‘block_list’. Note the returned distance value – it indicates the degree of semantic similarity between the user query and the blocked reference. You can fine-tune the Semantic Router by specifying a distance threshold that a query must fall within to count as a match.
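
For example, a stricter threshold can be set when the route is defined; the value 0.3 below is illustrative and should be tuned against your own reference data:

# Recreate the route with a tighter distance threshold: only queries whose
# embedding lies within 0.3 of a blocked reference will match.
strict_route = Route(
    name="block_list",
    references=blocked_references,
    distance_threshold=0.3,
)

strict_router = SemanticRouter(
    name="bouncer",
    vectorizer=vectorizer,
    routes=[strict_route],
    redis_url=REDIS_URL,
    overwrite=True,  # replace the existing "bouncer" router definition in Redis
)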

 

For full details and more complex examples, you can explore the Jupyter Notebooks in this GitHub repository. 

 

How do customers benefit? 

For customers, this technology delivers clear and immediate value. By using Azure Managed Redis as the high-performance backbone for semantic routing and agent coordination, organizations can significantly reduce latency, simplify infrastructure, and accelerate time-to-value for AI-driven experiences. Instead of building custom logic spread across multiple services, teams get a centralized, scalable, and fully managed in-memory layer that handles vector search, routing logic, and real-time state management — all with enterprise-grade SLAs, security, and Azure-native integration. The result? Smarter and faster LLM interactions, reduced operational complexity, and the flexibility to scale AI use cases from prototypes to production without re-architecting. 

Whether you're building an intelligent chatbot, orchestrating multi-agent workflows, or powering internal copilots, this Redis-backed technology gives you the agility to adapt in real time. You can dynamically route based on user intent, past interactions, or even business rules — all while maintaining low-latency responses that users expect from modern AI applications. And because it’s fully managed on Azure, teams can focus on innovation rather than infrastructure, with built-in support for high availability, monitoring, and enterprise governance. It’s a future-proof foundation for AI systems that need to be not just powerful, but precise. 

 

Try Azure Managed Redis today 

If you want to explore how to route large language models efficiently, Azure Managed Redis provides a reliable and low-latency solution. You can learn more about the service on the Azure Managed Redis page and find detailed documentation in the Azure Redis overview. For hands-on experience, check out the routing optimization notebook and other examples in the Redis AI resources repository and the loriotpiroloriol/amr-semantic-router repository on GitHub. Give it a try to see how it fits your LLM routing needs.

Updated Aug 22, 2025
Version 1.0