Deliver ChatGPT‑style “typing” experiences in your own apps - securely and at cloud scale.
TL;DR
We’ll build a real-time AI app where Azure OpenAI streams responses and SignalR broadcasts them live to an Angular client. Users see answers appear incrementally, just as in ChatGPT, while Azure SignalR Service handles scale. You’ll learn the architecture, the streaming code, the Angular integration, and optional enhancements like typing indicators and multi-agent scenarios.
Why This Matters
Modern users expect instant feedback. Waiting for a full AI response feels slow and breaks engagement. Streaming the response:
- Reduces perceived latency: Users see content as it’s generated.
- Improves UX: Mimics ChatGPT’s typing effect.
- Keeps users engaged: Especially for long-form answers.
- Scales for enterprise: Azure SignalR Service handles thousands of concurrent connections.
What you’ll build
- A SignalR Hub that calls Azure OpenAI with streaming enabled and forwards partial output to clients as it arrives.
- An Angular client that connects to the hub (WebSockets, with SSE/long-polling fallback) and renders partial content with a typing indicator.
- An optional Azure SignalR Service layer for scalable connection management (thousands to millions of long‑lived connections).
References: SignalR hosting & scale; Azure SignalR Service concepts.
Architecture
- The hub calls Azure OpenAI with streaming enabled (await foreach over updates) and broadcasts partials to clients.
- Azure SignalR Service (optional) offloads connection scale and removes sticky‑session complexity in multi‑node deployments.
References: Streaming code pattern; scale/ARR affinity; Azure SignalR integration.
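Before the implementation, it helps to pin down the hub-to-client contract. A small TypeScript sketch of the three events used throughout this article (the event names match the hub's SendAsync calls later; the interface names themselves are illustrative):

```typescript
// Client-bound events emitted by the hub. The event names ("typing",
// "partial", "completed") match the hub's SendAsync calls; the interface
// names are illustrative only.
interface TypingEvent { messageId: string; isTyping: boolean; }
interface PartialEvent { messageId: string; text: string; }
interface CompletedEvent { messageId: string; }

// A discriminated union makes client-side handling exhaustive.
type StreamEvent =
  | ({ kind: 'typing' } & TypingEvent)
  | ({ kind: 'partial' } & PartialEvent)
  | ({ kind: 'completed' } & CompletedEvent);

// Fold a sequence of events into the rendered answer: only "partial"
// events carry text; the others drive UI state.
function reduceEvents(events: StreamEvent[]): string {
  return events
    .filter((e): e is { kind: 'partial' } & PartialEvent => e.kind === 'partial')
    .map(e => e.text)
    .join('');
}
```

Thinking of the stream this way also makes the client logic easy to unit-test without a live connection.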
Prerequisites
- Azure OpenAI resource with a deployed model (e.g., gpt-4o or gpt-4o-mini)
- .NET 8 API + ASP.NET Core SignalR backend
- Angular 16+ frontend (using @microsoft/signalr)
Step‑by‑Step Implementation
1) Backend: ASP.NET Core + SignalR
Install packages
# Server-side SignalR ships with the ASP.NET Core shared framework; no separate package is needed
dotnet add package Azure.AI.OpenAI --prerelease
dotnet add package Azure.Identity
dotnet add package Microsoft.Extensions.AI
dotnet add package Microsoft.Extensions.AI.OpenAI --prerelease
# Optional (managed scale): Azure SignalR Service
dotnet add package Microsoft.Azure.SignalR
Using DefaultAzureCredential (Entra ID) avoids storing raw keys in code and is the recommended auth model for Azure services.
Program.cs
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddSignalR();
// To offload connection management to Azure SignalR Service, uncomment:
// builder.Services.AddSignalR().AddAzureSignalR();
builder.Services.AddSingleton<AiStreamingService>();
var app = builder.Build();
app.MapHub<ChatHub>("/chat");
app.Run();
AiStreamingService.cs - streams content from Azure OpenAI
using Azure.AI.OpenAI;
using Azure.Identity;
using Microsoft.Extensions.AI;

public class AiStreamingService
{
    private readonly IChatClient _chatClient;

    public AiStreamingService(IConfiguration config)
    {
        var endpoint = new Uri(config["AZURE_OPENAI_ENDPOINT"]!);
        var deployment = config["AZURE_OPENAI_DEPLOYMENT"]!; // e.g., "gpt-4o-mini"

        var azureClient = new AzureOpenAIClient(endpoint, new DefaultAzureCredential());
        _chatClient = azureClient.GetChatClient(deployment).AsIChatClient();
    }

    public async IAsyncEnumerable<string> StreamReplyAsync(string userMessage)
    {
        var messages = new List<ChatMessage>
        {
            new(ChatRole.System, "You are a helpful assistant."),
            new(ChatRole.User, userMessage)
        };

        await foreach (var update in _chatClient.GetStreamingResponseAsync(messages))
        {
            // update.Text concatenates the text parts of this update;
            // tool calls and other content types are ignored here.
            if (!string.IsNullOrEmpty(update.Text))
                yield return update.Text;
        }
    }
}
Modern .NET AI extensions (Microsoft.Extensions.AI) expose a unified streaming pattern via GetStreamingResponseAsync on IChatClient, regardless of the underlying provider.
ChatHub.cs - pushes partials to the caller
using Microsoft.AspNetCore.SignalR;

public class ChatHub : Hub
{
    private readonly AiStreamingService _ai;

    public ChatHub(AiStreamingService ai) => _ai = ai;

    // Client calls: connection.invoke("AskAi", prompt)
    public async Task AskAi(string prompt)
    {
        var messageId = Guid.NewGuid().ToString("N");
        await Clients.Caller.SendAsync("typing", messageId, true);
        try
        {
            await foreach (var partial in _ai.StreamReplyAsync(prompt))
            {
                await Clients.Caller.SendAsync("partial", messageId, partial);
            }
        }
        finally
        {
            // Always clear the typing indicator, even if streaming fails mid-way.
            await Clients.Caller.SendAsync("typing", messageId, false);
        }
        await Clients.Caller.SendAsync("completed", messageId);
    }
}
2) Frontend: Angular client with @microsoft/signalr
Install the SignalR client
npm i @microsoft/signalr
Create a SignalR service (Angular)
// src/app/services/ai-stream.service.ts
import { Injectable } from '@angular/core';
import * as signalR from '@microsoft/signalr';
import { BehaviorSubject, Observable } from 'rxjs';

@Injectable({ providedIn: 'root' })
export class AiStreamService {
  private connection?: signalR.HubConnection;
  private typing$ = new BehaviorSubject<boolean>(false);
  private partial$ = new BehaviorSubject<string>('');
  private completed$ = new BehaviorSubject<boolean>(false);

  get typing(): Observable<boolean> { return this.typing$.asObservable(); }
  get partial(): Observable<string> { return this.partial$.asObservable(); }
  get completed(): Observable<boolean> { return this.completed$.asObservable(); }

  async start(): Promise<void> {
    this.connection = new signalR.HubConnectionBuilder()
      .withUrl('/chat') // same origin; use absolute URL if CORS
      .withAutomaticReconnect()
      .configureLogging(signalR.LogLevel.Information)
      .build();

    this.connection.on('typing', (_id: string, on: boolean) => this.typing$.next(on));
    this.connection.on('partial', (_id: string, text: string) => {
      // Append incremental content
      this.partial$.next(this.partial$.value + text);
    });
    this.connection.on('completed', (_id: string) => this.completed$.next(true));

    await this.connection.start();
  }

  async ask(prompt: string): Promise<void> {
    // Reset state per request
    this.partial$.next('');
    this.completed$.next(false);
    await this.connection?.invoke('AskAi', prompt);
  }
}
Angular component
// src/app/components/ai-chat/ai-chat.component.ts
import { Component, OnInit } from '@angular/core';
import { AiStreamService } from '../../services/ai-stream.service';

@Component({
  selector: 'app-ai-chat',
  templateUrl: './ai-chat.component.html',
  styleUrls: ['./ai-chat.component.css']
})
export class AiChatComponent implements OnInit {
  prompt = '';
  output = '';
  typing = false;
  done = false;

  constructor(private ai: AiStreamService) {}

  async ngOnInit() {
    await this.ai.start();
    this.ai.typing.subscribe(on => this.typing = on);
    this.ai.partial.subscribe(text => this.output = text);
    this.ai.completed.subscribe(done => this.done = done);
  }

  async send() {
    this.output = '';
    this.done = false;
    await this.ai.ask(this.prompt);
  }
}
HTML Template
<!-- src/app/components/ai-chat/ai-chat.component.html -->
<!-- Note: [(ngModel)] requires FormsModule to be imported by the module or standalone component -->
<div class="chat">
  <div class="prompt">
    <input [(ngModel)]="prompt" placeholder="Ask me anything…" />
    <button (click)="send()">Send</button>
  </div>
  <div class="response">
    <pre>{{ output }}</pre>
    <div class="typing" *ngIf="typing">Assistant is typing…</div>
    <div class="done" *ngIf="done">✓ Completed</div>
  </div>
</div>
Streaming modes, content filters, and UX
Azure OpenAI streaming interacts with content filtering in two ways:
- Default streaming: The service buffers output into content chunks and runs content filters on each chunk before emitting it; you still stream, but in filter-sized chunks rather than token by token.
- Asynchronous Filter (optional): The service returns token‑level updates immediately and runs filters asynchronously. You get ultra‑smooth streaming but must handle delayed moderation signals (e.g., redaction or halting the stream).
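With the Asynchronous Filter, moderation results can arrive after the offending tokens have already been rendered, so the client needs a policy for late signals. A minimal sketch (the verdict type and function names are assumptions for illustration, not the service's wire format):

```typescript
// Late-moderation policy: keep streaming text as it arrives, and if a
// delayed content-filter signal flags the message, replace what was
// already shown with a placeholder. 'ModerationVerdict' and
// 'applyModeration' are illustrative names, not an Azure OpenAI API.
type ModerationVerdict = 'pass' | 'flagged';

function applyModeration(accumulated: string, verdict: ModerationVerdict): string {
  return verdict === 'flagged'
    ? '[Response removed by content filter]'
    : accumulated;
}
```

In practice you would invoke this when the final (or per-chunk) filter result arrives, and also stop appending further partials for a flagged message.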
Best practices
- Append partials in small batches client‑side to avoid DOM thrash; finalize formatting on "completed".
- Log full messages server‑side only after completion so stored chat histories contain whole messages rather than fragments (the same pattern agent frameworks use).
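The client-side batching advice above can be sketched as a small buffer that coalesces chunks and flushes them to a single render callback at most once per interval (the class and parameter names are assumptions, not part of @microsoft/signalr):

```typescript
// Coalesces streamed chunks so each SignalR "partial" event does not
// force its own DOM update; flushes at most once per intervalMs.
class ChunkBatcher {
  private buffer: string[] = [];
  private timer: ReturnType<typeof setTimeout> | null = null;

  constructor(
    private onFlush: (batch: string) => void,
    private intervalMs = 50,
  ) {}

  push(chunk: string): void {
    this.buffer.push(chunk);
    if (this.timer === null) {
      // First chunk since the last flush: schedule the next flush.
      this.timer = setTimeout(() => this.flush(), this.intervalMs);
    }
  }

  flush(): void {
    if (this.timer !== null) { clearTimeout(this.timer); this.timer = null; }
    if (this.buffer.length === 0) return;
    this.onFlush(this.buffer.join(''));
    this.buffer = [];
  }
}
```

Wire `push` to the "partial" handler and call `flush` once more on "completed" so the tail of the answer is never lost.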
Security & compliance
- Auth: Prefer Microsoft Entra ID (DefaultAzureCredential) to avoid key sprawl; use RBAC and Managed Identities where possible.
- Secrets: Store Azure SignalR connection strings in Key Vault and rotate periodically; never hardcode.
- CORS & cross‑domain: When hosting frontend and hub on different origins, configure CORS and use absolute URLs in withUrl(...).
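A small helper can keep the same code working across same-origin and cross-origin deployments by resolving the URL passed to withUrl(...) (the helper name and apiBase parameter are assumptions for illustration):

```typescript
// Resolve the SignalR hub URL: relative for same-origin deployments
// (no CORS needed), absolute when the hub lives on another origin.
// The server must still allow the frontend's origin via CORS.
function resolveHubUrl(apiBase: string | undefined, hubPath = '/chat'): string {
  if (!apiBase) return hubPath; // same origin
  // Cross-origin: strip trailing slashes, then append the hub path.
  return apiBase.replace(/\/+$/, '') + hubPath;
}
```

The result would be fed to `new signalR.HubConnectionBuilder().withUrl(resolveHubUrl(environment.apiBase))`, assuming an `environment.apiBase` setting.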
Connection management & scaling tips
- Persistent connection load: each client holds a long‑lived connection that consumes server sockets and memory; isolate heavy real‑time workloads, or use Azure SignalR Service so they don’t starve your other apps.
- Sticky sessions (self‑hosted): Required in most multi‑server scenarios unless WebSockets‑only + SkipNegotiation applies; Azure SignalR removes this requirement.
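The WebSockets‑only option mentioned above looks like this on the client (a configuration sketch; the hub URL is a placeholder):

```typescript
import * as signalR from '@microsoft/signalr';

// WebSockets-only connection: skipping the negotiate handshake removes
// the sticky-session (ARR affinity) requirement when self-hosting behind
// a load balancer, at the cost of losing HTTP fallback transports.
// Note: skipNegotiation is not compatible with Azure SignalR Service's
// default mode, which relies on negotiate to redirect clients.
const connection = new signalR.HubConnectionBuilder()
  .withUrl('https://your-api.example.com/chat', {
    skipNegotiation: true,
    transport: signalR.HttpTransportType.WebSockets,
  })
  .build();
```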