Forum Discussion
[Architecture Pattern] Scaling Sync-over-Async Edge Gateways by Bypassing Service Bus Sessions
Hi everyone,
I wanted to share an architectural pattern and an open-source implementation we recently built to solve a major scaling bottleneck at the edge: bridging legacy synchronous HTTP clients to long-running asynchronous AI workers.
The Problem: Stateful Bottlenecks at the Edge
When dealing with slow AI generation tasks (e.g., 45+ seconds), standard REST APIs will drop the connection resulting in 504 Gateway Timeouts.
The standard integration pattern here is Sync-over-Async. The Gateway accepts the HTTP request, drops a message onto Azure Service Bus, waits for the worker to reply, and maps the reply back to the open HTTP connection.
However, the default approach is to use Service Bus Sessions for request-reply correlation. At scale, this introduces severe limitations:
1. Stateful Gateways: The Gateway pod must request an exclusive lock on the session. It becomes tightly coupled to that specific request.
2. Horizontal Elasticity is Broken: If a reply arrives, it must go to the specific pod holding the lock. Other idle pods cannot assist.
3. Hard Limits: A traffic spike easily exhausts the namespace concurrent session limits (especially on the Standard tier).
The Solution: Stateless Filtered Topics
To achieve true horizontal scale, the API Gateway layer must be 100% stateless. We bypassed Sessions entirely by pushing the routing logic down to the broker using a Filtered Topic Pattern.
How it works:
1. The Gateway injects a CorrelationId property (e.g., Instance-A-Req-1) into the outbound request.
2. Instead of locking a session, the Gateway spins up a lightweight, dynamic subscription on a shared Reply Topic with a SQL Filter: CorrelationId = 'Instance-A-Req-1'.
3. The AI worker processes the task and drops the reply onto the shared topic with the same property.
4. The Azure Service Bus broker evaluates the SQL filter and pushes the message directly to the correct Gateway pod.
No session locks. No implicit instance affinity. Complete horizontal scalability. If a pod crashes, its temporary subscription simply drops—preventing locked poison messages.
Open Source Implementation
Implementing dynamic Service Bus Administration clients and receiver lifecycles is complex, so I abstracted this pattern into a Spring Boot starter for the community. It handles all the dynamic subscription and routing logic under the hood, allowing developers to execute highly scalable Sync-over-Async flows with a single line of code returning a CompletableFuture.
GitHub Repository: https://github.com/ShivamSaluja/sentinel-servicebus-starter
Full Technical Write-up: https://dev.to/shivamsaluja/sync-over-async-bypassing-azure-service-bus-session-limits-for-ai-workloads-269d
I would love to hear from other architects in this hub. Have you run into similar session exhaustion limits when building Edge API Gateways? Have you adopted similar stateless broker-side routing, or do you rely on sticky sessions at your load balancers?