Analytics on Azure Blog
6 MIN READ

Azure Managed Redis & Azure Databricks: Real-time Feature Serving for Low-Latency Decisions

Jason_Pereira
Microsoft
Mar 25, 2026
A practical guide for building a speed layer on top of the lakehouse for real-time scoring and decisioning.

This blog post is a collaboration between the Azure Databricks and Azure Managed Redis product and product marketing teams.
Executive summary 

Modern decisioning systems (fraud scoring, payments authorization, personalization, and step-up authentication) must return answers in tens of milliseconds while still reflecting the most recent behavior. That creates a classic tension: lakehouse platforms excel at large-scale ingestion, feature engineering, governance, training, and replayable history, but they are not designed to sit directly on the synchronous request path for high-QPS, ultra-low-latency lookups. This guide shows a pattern that keeps Azure Databricks as the primary system for building and maintaining features, while using Azure Managed Redis as the online speed layer that serves those features at memory speed for real-time scoring. 

The result is a shorter and more predictable critical path for your application: the Payment API (or any online service) reads features from Azure Managed Redis and calls a model endpoint; Azure Databricks continuously refreshes features from streaming and batch sources; and your authoritative systems of record (for example, account/card data) remain durable and governed. You get real-time responsiveness without giving up data correctness, lineage, or operational discipline. 

What each service does 

Azure Databricks is a first-party analytics and AI platform on Azure built on Apache Spark and the lakehouse architecture. It is commonly used for batch and streaming pipelines, feature engineering, model training, governance, and operationalization of ML workflows. In this architecture, Azure Databricks is the primary data and AI platform: the environment where features are defined, computed, validated, and published, and where governed history is retained. 

Azure Managed Redis is a Microsoft‑managed, in‑memory data store based on Redis Enterprise, designed for low‑latency, high‑throughput access patterns. It is commonly used for traditional and real‑time caching, counters, and session state, and increasingly as a fast state layer for AI‑driven applications. In this architecture, Azure Managed Redis serves as the online feature store and speed layer: it holds the most recent feature values and signals required for real‑time scoring and can also support modern agentic patterns such as short‑ and long‑term memory, vector lookups, and fast state access alongside model inference. 

Business story: real-time fraud scoring as a running example 

Consider a payment system that must decide whether to approve, decline, or step up authentication in tens of milliseconds, faster than the blink of an eye. The decision depends on recent behavioral signals, velocity counters, device changes, geo anomalies, and merchant patterns, combined with a fraud model. If the online service tries to compute or retrieve those features from heavy analytics systems on demand, the request path becomes slower and more variable, especially at peak load. Instead, Azure Databricks pipelines continuously compute and refresh those features, and Azure Managed Redis serves them instantly to the scoring service. Behavioral history, profiles, and outcomes are still written to durable Azure datastores such as Delta tables and Azure Cosmos DB, so fraud models can be retrained with governed, reproducible data. 
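
The velocity counters mentioned above can be maintained directly in Redis with INCR and EXPIRE. A minimal Python sketch, assuming a redis-py client; the key layout (`v3:vel:card:...`) and the window size are illustrative, not a prescribed schema:

```python
# Sketch: per-card velocity counter using fixed time-window buckets.
# The redis client `r` is passed in; only the key logic is fixed here.
import time

def bucket_key(schema_v: str, card_id: str, window_s: int, now: float) -> str:
    """Key for the current fixed-window bucket, e.g. v3:vel:card:abc:60:2."""
    bucket = int(now // window_s)
    return f"{schema_v}:vel:card:{card_id}:{window_s}:{bucket}"

def record_txn(r, schema_v: str, card_id: str, window_s: int = 60) -> int:
    """INCR the current bucket and set a TTL a little longer than the window,
    so a just-closed bucket is still readable while the next one fills."""
    key = bucket_key(schema_v, card_id, window_s, time.time())
    pipe = r.pipeline()
    pipe.incr(key)
    pipe.expire(key, window_s * 2)  # TTL > window to cover the read side
    count, _ = pipe.execute()
    return count
```

Reading the current and previous bucket together gives a simple sliding-window approximation without any heavy recomputation on the request path.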

 

 

The pattern: online feature serving with a speed layer 

The core idea is to separate responsibilities. Azure Databricks owns building features: ingest, join, aggregate, compute windows, and publish validated, governed results. Azure Managed Redis owns serving features: fast, repeated key-based access on the hot path. The model endpoint then consumes a feature payload that is already pre-shaped for inference. This division prevents the lakehouse from becoming an online dependency and lets you scale online decisioning independently from offline compute. 

Pseudocode: end-to-end flow (online scoring + feature refresh) 

The pseudocode below intentionally reads like application logic rather than a single SDK. It highlights what matters: key design, pipelined feature reads, conservative fallbacks, and continuous refresh from Azure Databricks. 

# ----------------------------
# Online scoring (critical path)
# ----------------------------
function handleAuthorization(req):
  schemaV = "v3"
  keys = buildFeatureKeys(schemaV, req)            # card/device/merchant + windows

  feats = redis.MGET(keys)                         # single round trip (pipelined)
  feats = fillDefaults(feats)                      # conservative, no blocking

  payload = toModelPayload(req, feats)
  score = modelEndpoint.predict(payload)           # Databricks Model Serving or an Azure-hosted model endpoint

  decision = policy(score, req)                    # approve/decline/step-up

  emitEventHub("txn_events", summarize(req, score, decision))   # async
  emitMetrics(redisLatencyMs, modelLatencyMs, missCount(feats))

  return decision


# -----------------------------------------
# Feature pipeline (async): build + publish
# -----------------------------------------
function streamingFeaturePipeline():
  events = readEventHubs("txn_events")
  ref    = readCosmos("account_card_reference")    # system-of-record lookups

  feats = computeFeatures(events, ref)             # windows, counters, signals
  writeDelta("fraud_feature_history", feats)       # ADLS Delta tables (lakehouse)

  publishLatestToRedis(feats, schemaV="v3")        # SET/HSET + TTL (+ jitter)


# -----------------------------------
# Training + deploy (async lifecycle)
# -----------------------------------
function trainAndDeploy():
  hist   = readDelta("fraud_feature_history")
  labels = readCosmos("fraud_outcomes")            # delayed ground truth

  model = train(joinPointInTime(hist, labels))
  register(model)
  deployToDatabricksModelServing(model)
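
The online-scoring half of the pseudocode can be made concrete in Python. A minimal, testable sketch, assuming redis-py's mget for the single round trip; the feature names, defaults, and the 0.5/0.8 policy thresholds are illustrative assumptions, not production values:

```python
# Sketch of the critical path: one MGET, conservative defaults, simple policy.
from typing import Optional

FEATURES = ["txn_count_60s", "avg_amount_24h", "new_device_flag"]
# Conservative defaults: a missing new-device flag is treated as risky (1.0).
DEFAULTS = {"txn_count_60s": 0.0, "avg_amount_24h": 0.0, "new_device_flag": 1.0}

def build_feature_keys(schema_v: str, card_id: str) -> list:
    return [f"{schema_v}:feat:card:{card_id}:{name}" for name in FEATURES]

def fill_defaults(raw: list) -> dict:
    """Missing features get safe defaults; never a blocking recompute."""
    out = {}
    for name, val in zip(FEATURES, raw):
        out[name] = float(val) if val is not None else DEFAULTS[name]
    return out

def policy(score: float) -> str:
    if score >= 0.8:
        return "decline"
    if score >= 0.5:
        return "step-up"
    return "approve"

def handle_authorization(redis_client, model, schema_v: str, card_id: str) -> str:
    raw = redis_client.mget(build_feature_keys(schema_v, card_id))  # one round trip
    feats = fill_defaults(raw)
    return policy(model.predict(feats))
```

The only synchronous dependencies on the hot path are the single Redis round trip and the model call; everything else (event emission, metrics) stays asynchronous, as in the pseudocode.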

Why it works 

This architecture works because each layer does the job it is best at. The lakehouse and feature pipelines handle heavy computation, validation, lineage, and re-playable history. The online speed layer handles locality and frequency: it keeps the “hot” feature state close to the online compute so requests do not pay the cost of re-computation or large fan-out reads. You explicitly control freshness with TTLs and refresh cadence, and you keep clear correctness boundaries by treating Azure Managed Redis as a serving layer rather than the authoritative system of record, with durable, governed feature history and labels stored in Delta tables and Azure data stores such as Azure Cosmos DB. 

Design choices that matter 

Cost efficiency and availability start with clear separation of concerns. Serving hot features from Azure Managed Redis avoids sizing analytics infrastructure for high‑QPS, low‑latency SLAs, and enables predictable capacity planning with regional isolation for online services. Azure Databricks remains optimized for correctness, freshness, and re-playable history while the online tier scales independently by request rate and working set size. 

  • Freshness and TTLs should reflect business tolerance for staleness and the meaning of each feature. Short velocity windows need TTLs slightly longer than ingestion gaps, while profiles and reference features can live longer. Adding jitter (for example ±10%) prevents synchronized expirations that create load spikes. 
  • Key design is the control plane for safe evolution and availability. Include explicit schema version prefixes and keep keys stable by entity and window. Publish new versions alongside existing ones, switch readers, and retire old versions to enable zero‑downtime rollouts. 
  • Protect the online path from stampedes and unnecessary cost. If a hot key is missing, avoid triggering widespread re-computation in downstream systems. Use a short single‑flight mechanism and conservative defaults, especially for risk‑sensitive decisions. 
  • Keep payloads compact so performance and cost remain predictable. Online feature reads are fastest when values are small and fetched in one or two round trips. Favor numeric encodings and small blobs, and use atomic writes to avoid partial or inconsistent reads during scoring. 
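
The TTL-jitter, versioned-key, and atomic-write points above can be sketched together, assuming a redis-py client; the key layout, the 300-second base TTL, and the ±10% jitter are illustrative assumptions:

```python
# Sketch: publish one feature row as a compact JSON blob under a versioned key,
# with a jittered TTL so synchronized expirations don't create load spikes.
import json
import random

def jittered_ttl(base_s: int, jitter_frac: float = 0.10) -> int:
    """TTL in [base - 10%, base + 10%] to spread out expirations."""
    delta = int(base_s * jitter_frac)
    return base_s + random.randint(-delta, delta)

def publish_features(r, schema_v: str, card_id: str, feats: dict, ttl_s: int = 300):
    key = f"{schema_v}:feat:card:{card_id}"
    # A single SET of one compact blob keeps the write atomic, so readers
    # never observe a half-updated feature row mid-scoring.
    r.set(key, json.dumps(feats, separators=(",", ":")), ex=jittered_ttl(ttl_s))
```

Rolling out a new schema then means publishing under a new prefix (for example "v4:") alongside "v3:", switching readers, and letting the old keys expire.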

Reference architecture notes (regional first, then global) 

Start with a single-region deployment to validate end-to-end freshness and latency. Co-locate the Payment API compute, Azure Managed Redis, the model endpoint, and the primary data sources for feature pipelines to minimize round trips. Once the pattern is proven, extend to multi-region by deploying the online tier and its local speed layer per region, while keeping a clear strategy for how features are published and reconciled across regions (often via regional pipelines that consume the same event stream or replicated event hubs). 

Operations and SRE considerations 

  • Online service (API / scoring): monitor end-to-end request latency, error rate, and fallback rate to confirm the critical path meets application SLAs even under partial degradation. Typical signals: p50/p95/p99 latency, error %, step-up or conservative decision rate. 
  • Azure Managed Redis (speed layer): monitor feature fetch latency, hit/miss ratio, and memory pressure to see whether the working set fits and TTLs align with access patterns. Typical signals: GET/MGET latency, miss %, evictions, memory usage. 
  • Model serving: monitor inference latency, throughput, and saturation to separate model execution cost from feature access cost. Typical signals: inference p95 latency, QPS, concurrency utilization. 
  • Azure Databricks feature pipelines: monitor streaming lag, job health, and data freshness to ensure features are refreshed on time and correctness is preserved. Typical signals: event lag, job failures, watermark delay. 
  • Cross-layer boundaries: correlate misses, latency spikes, and pipeline lag to identify whether regressions originate in serving, pipelines, or models. Typical signals: Redis miss spikes vs pipeline delays vs API latency. 

 

Monitor each layer independently, then correlate at the boundaries. This makes it clear whether an SLA issue is caused by online serving pressure, model inference, or delayed feature publication, without turning the lakehouse into a synchronous dependency. 

Putting it all together 

Adopt the pattern incrementally. First, publish a small, high-value feature set from Azure Databricks into Azure Managed Redis and wire the online service to fetch those features during scoring. Measure end-to-end impact on latency, model quality, and operational stability. Next, extend to streaming refresh for near-real-time behavioral features, and add controlled fallbacks for partial misses. Finally, scale out to multi-region if needed, keeping each region’s online service close to its local speed layer and ensuring the feature pipelines provide consistent semantics across regions. 

Sources and further reading 

Azure Databricks documentation: https://learn.microsoft.com/en-us/azure/databricks/

Azure Managed Redis documentation (overview and architecture): https://learn.microsoft.com/azure/redis/ 

Azure Architecture Center: Stream processing with Azure Databricks: https://learn.microsoft.com/azure/architecture/reference-architectures/data/stream-processing-databricks 

Databricks Feature Store / feature engineering docs (Azure Databricks): https://learn.microsoft.com/azure/databricks/ 

Version 1.0