A practical guide for building a speed layer on top of the lakehouse for real-time scoring and decisioning.
This blog content has been a collective collaboration between the Azure Databricks and Azure Managed Redis Product and Product Marketing teams.
Executive summary
Modern decisioning systems (fraud scoring, payments authorization, personalization, and step-up authentication) must return answers in tens of milliseconds while still reflecting the most recent behavior. That creates a classic tension: lakehouse platforms excel at large-scale ingestion, feature engineering, governance, training, and replayable history, but they are not designed to sit directly on the synchronous request path for high-QPS, ultra-low-latency lookups. This guide shows a pattern that keeps Azure Databricks as the primary system for building and maintaining features, while using Azure Managed Redis as the online speed layer that serves those features at memory speed for real-time scoring.
The result is a shorter and more predictable critical path for your application: the Payment API (or any online service) reads features from Azure Managed Redis and calls a model endpoint; Azure Databricks continuously refreshes features from streaming and batch sources; and your authoritative systems of record (for example, account/card data) remain durable and governed. You get real-time responsiveness without giving up data correctness, lineage, or operational discipline.
What each service does
Azure Databricks is a first-party analytics and AI platform on Azure built on Apache Spark and the lakehouse architecture. It is commonly used for batch and streaming pipelines, feature engineering, model training, governance, and operationalization of ML workflows. In this architecture, Azure Databricks is the primary data and AI platform: the environment where features are defined, computed, validated, and published, and where governed history is retained.
Azure Managed Redis is a Microsoft‑managed, in‑memory data store based on Redis Enterprise, designed for low‑latency, high‑throughput access patterns. It is commonly used for traditional and real‑time caching, counters, and session state, and increasingly as a fast state layer for AI‑driven applications. In this architecture, Azure Managed Redis serves as the online feature store and speed layer: it holds the most recent feature values and signals required for real‑time scoring and can also support modern agentic patterns such as short‑ and long‑term memory, vector lookups, and fast state access alongside model inference.
Business story: real-time fraud scoring as a running example
Consider a payment system that must decide to approve, decline, or step up authentication in tens of milliseconds, faster than the blink of an eye. The decision depends on recent behavioral signals, velocity counters, device changes, geo anomalies, and merchant patterns, combined with a fraud model. If the online service tries to compute or retrieve those features from heavy analytics systems on demand, the request path becomes slower and more variable, especially at peak load. Instead, Azure Databricks pipelines continuously compute and refresh those features, and Azure Managed Redis serves them instantly to the scoring service. Behavioral history, profiles, and outcomes are still written to durable Azure data stores such as Delta tables and Azure Cosmos DB, so fraud models can be retrained with governed, reproducible data.
The pattern: online feature serving with a speed layer
The core idea is to separate responsibilities. Azure Databricks owns building features: ingest, join, aggregate, compute windows, and publish validated, governed results. Azure Managed Redis owns serving features: fast, repeated key-based access on the hot path. The model endpoint then consumes a feature payload that is already pre-shaped for inference. This division prevents the lakehouse from becoming an online dependency and lets you scale online decisioning independently from offline compute.
Pseudocode: end-to-end flow (online scoring + feature refresh)
The pseudocode below intentionally reads like application logic rather than a single SDK. It highlights what matters: key design, pipelined feature reads, conservative fallbacks, and continuous refresh from Azure Databricks.
# ----------------------------
# Online scoring (critical path)
# ----------------------------
function handleAuthorization(req):
    schemaV = "v3"
    keys = buildFeatureKeys(schemaV, req)      # card/device/merchant + windows
    feats = redis.MGET(keys)                   # single round trip (pipelined)
    feats = fillDefaults(feats)                # conservative, no blocking
    payload = toModelPayload(req, feats)
    score = modelEndpoint.predict(payload)     # Databricks Model Serving or an Azure-hosted model endpoint
    decision = policy(score, req)              # approve/decline/step-up
    emitEventHub("txn_events", summarize(req, score, decision))  # async
    emitMetrics(redisLatencyMs, modelLatencyMs, missCount(feats))
    return decision

# -----------------------------------------
# Feature pipeline (async): build + publish
# -----------------------------------------
function streamingFeaturePipeline():
    events = readEventHubs("txn_events")
    ref = readCosmos("account_card_reference")  # system-of-record lookups
    feats = computeFeatures(events, ref)        # windows, counters, signals
    writeDelta("fraud_feature_history", feats)  # ADLS Delta tables (lakehouse)
    publishLatestToRedis(feats, schemaV="v3")   # SET/HSET + TTL (+ jitter)

# -----------------------------------
# Training + deploy (async lifecycle)
# -----------------------------------
function trainAndDeploy():
    hist = readDelta("fraud_feature_history")
    labels = readCosmos("fraud_outcomes")       # delayed ground truth
    model = train(joinPointInTime(hist, labels))
    register(model)
    deployToDatabricksModelServing(model)
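To make the critical-path read above concrete, here is a minimal Python sketch of the key-building and default-filling steps. The key layout, feature windows, and default values are illustrative assumptions, not a prescribed schema; the actual Redis call (for example, redis-py's `mget`) is shown only as a comment.

```python
def build_feature_keys(schema_v, card_id, windows=("5m", "1h", "24h")):
    # One key per entity/window; the explicit schema version prefix lets
    # new feature versions be published alongside old ones and retired safely.
    return [f"feat:{schema_v}:card:{card_id}:{w}" for w in windows]

def fill_defaults(keys, raw_values, default=0.0):
    # Missing (None) values fall back to conservative defaults so the
    # critical path never blocks on recomputation.
    return {k: (float(v) if v is not None else default)
            for k, v in zip(keys, raw_values)}

# On the hot path (assuming a redis-py client `r`):
#   raw = r.mget(keys)               # single round trip
#   feats = fill_defaults(keys, raw)
keys = build_feature_keys("v3", "card123")
feats = fill_defaults(keys, ["7", None, "42.5"])  # simulated MGET result
```

Because all keys are fetched in one MGET, the feature read costs a single round trip regardless of how many windows the model consumes.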
Why it works
This architecture works because each layer does the job it is best at. The lakehouse and feature pipelines handle heavy computation, validation, lineage, and re-playable history. The online speed layer handles locality and frequency: it keeps the “hot” feature state close to the online compute so requests do not pay the cost of re-computation or large fan-out reads. You explicitly control freshness with TTLs and refresh cadence, and you keep clear correctness boundaries by treating Azure Managed Redis as a serving layer rather than the authoritative system of record, with durable, governed feature history and labels stored in Delta tables and Azure data stores such as Azure Cosmos DB.
Design choices that matter
Cost efficiency and availability start with clear separation of concerns. Serving hot features from Azure Managed Redis avoids sizing analytics infrastructure for high‑QPS, low‑latency SLAs, and enables predictable capacity planning with regional isolation for online services. Azure Databricks remains optimized for correctness, freshness, and re-playable history while the online tier scales independently by request rate and working set size.
- Freshness and TTLs should reflect the business tolerance for staleness and the meaning of each feature. Short velocity windows need TTLs slightly longer than the ingestion gap, while profiles and reference features can live longer. Adding jitter (for example ±10%) prevents synchronized expirations that create load spikes.
- Key design is the control plane for safe evolution and availability. Include explicit schema version prefixes and keep keys stable by entity and window. Publish new versions alongside existing ones, switch readers, and retire old versions to enable zero‑downtime rollouts.
- Protect the online path from stampedes and unnecessary cost. If a hot key is missing, avoid triggering widespread re-computation in downstream systems. Use a short single‑flight mechanism and conservative defaults, especially for risk‑sensitive decisions.
- Keep payloads compact so performance and cost remain predictable. Online feature reads are fastest when values are small and fetched in one or two round trips. Favor numeric encodings and small blobs, and use atomic writes to avoid partial or inconsistent reads during scoring.
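The TTL-jitter idea from the list above can be sketched as follows. The base TTL and key name are assumptions for illustration, and the Redis write (redis-py's `set` with `ex=`) appears only as a comment.

```python
import random

def ttl_with_jitter(base_ttl_s, jitter_frac=0.10, rng=random.random):
    # Spread expirations by +/- jitter_frac so a batch of keys published
    # together does not all expire (and get re-fetched) at the same instant.
    offset = (2.0 * rng() - 1.0) * jitter_frac
    return max(1, int(base_ttl_s * (1.0 + offset)))

# Publishing the latest value (assuming a redis-py client `r`):
#   r.set(f"feat:v3:card:{card_id}:5m", value, ex=ttl_with_jitter(600))
```

With a 600-second base TTL and 10% jitter, expirations land anywhere between roughly 540 and 660 seconds, which smooths the refresh load the pipeline sees.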
Reference architecture notes (regional first, then global)
Start with a single-region deployment to validate end-to-end freshness and latency. Co-locate the Payment API compute, Azure Managed Redis, the model endpoint, and the primary data sources for feature pipelines to minimize round trips. Once the pattern is proven, extend to multi-region by deploying the online tier and its local speed layer per region, while keeping a clear strategy for how features are published and reconciled across regions (often via regional pipelines that consume the same event stream or replicated event hubs).
Operations and SRE considerations
| Layer | What to Monitor | Why It Matters | Typical Signals / Metrics |
| --- | --- | --- | --- |
| Online service (API / scoring) | End-to-end request latency, error rate, fallback rate | Confirms the critical path meets application SLAs even under partial degradation | p50/p95/p99 latency, error %, step-up or conservative decision rate |
| Azure Managed Redis (speed layer) | Feature fetch latency, hit/miss ratio, memory pressure | Indicates whether the working set fits and whether TTLs align with access patterns | GET/MGET latency, miss %, evictions, memory usage |
| Model serving | Inference latency, throughput, saturation | Separates model execution cost from feature access cost | Inference p95 latency, QPS, concurrency utilization |
| Azure Databricks feature pipelines | Streaming lag, job health, data freshness | Ensures features are refreshed on time and correctness is preserved | Event lag, job failures, watermark delay |
| Cross-layer boundaries | Correlation between misses, latency spikes, and pipeline lag | Helps identify whether regressions originate in serving, pipelines, or models | Redis miss spikes vs pipeline delays vs API latency |
Monitor each layer independently, then correlate at the boundaries. This makes it clear whether an SLA issue is caused by online serving pressure, model inference, or delayed feature publication, without turning the lakehouse into a synchronous dependency.
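One lightweight way to correlate at the boundaries is to attribute each request's latency to a layer before emitting metrics. A minimal sketch, assuming hypothetical per-request timings and an illustrative 50 ms budget:

```python
def attribute_latency(redis_ms, model_ms, total_ms, budget_ms=50.0):
    # Whatever is not feature fetch or inference is "other" (network,
    # serialization, policy logic); the dominant part tells you where
    # an SLA regression most likely originates.
    other_ms = max(0.0, total_ms - redis_ms - model_ms)
    parts = {"feature_fetch": redis_ms, "model": model_ms, "other": other_ms}
    return {
        "within_sla": total_ms <= budget_ms,
        "dominant": max(parts, key=parts.get),
        "parts": parts,
    }
```

Tagging each request with its dominant cost lets dashboards separate "Redis misses slowed us down" from "the model endpoint saturated" without inspecting individual traces.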
Putting it all together
Adopt the pattern incrementally. First, publish a small, high-value feature set from Azure Databricks into Azure Managed Redis and wire the online service to fetch those features during scoring. Measure end-to-end impact on latency, model quality, and operational stability. Next, extend to streaming refresh for near-real-time behavioral features, and add controlled fallbacks for partial misses. Finally, scale out to multi-region if needed, keeping each region’s online service close to its local speed layer and ensuring the feature pipelines provide consistent semantics across regions.
Sources and further reading
Azure Databricks documentation: https://learn.microsoft.com/en-us/azure/databricks/
Azure Managed Redis documentation (overview and architecture): https://learn.microsoft.com/azure/redis/
Azure Architecture Center: Stream processing with Azure Databricks: https://learn.microsoft.com/azure/architecture/reference-architectures/data/stream-processing-databricks
Databricks Feature Store / feature engineering docs (Azure Databricks): https://learn.microsoft.com/azure/databricks/