Azure Web PubSub announces the GA status of the geo-replica feature. Having a failover strategy in place is a must for enterprise-grade applications. What used to take much effort and time can now be configured quickly with a few clicks. Plus, end users in geographically distant locations can benefit from the low latency they expect from a real-time system.
While it is perfectly normal for a company to have one Azure Web PubSub resource serving all its real-time message delivery, for enterprise-grade applications relying on just one resource does not cut it for two important reasons.
The single resource could experience downtime which disrupts the availability of the application.
For most applications using Web PubSub, such downtime can be disastrous. As a stateful communication channel, Web PubSub usually powers critical data exchanges between the server and the web clients. Although Web PubSub already has an outstanding uptime guarantee (99.9% for the standard tier and 99.95% for the premium tier), architects are not comfortable with the fact that there’s a glaring single point of failure for an important real-time communication channel.
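To put those percentages in perspective, the short calculation below converts an availability figure into the downtime it still permits. It is plain arithmetic on the percentages quoted above, assuming a 365-day year. The function name `allowed_downtime_hours` is ours for illustration and is not part of any Azure tooling.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year


def allowed_downtime_hours(availability_percent: float) -> float:
    """Hours per year a service may be down while still meeting the given availability."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


for tier, sla in [("standard", 99.9), ("premium", 99.95)]:
    print(f"{tier}: up to {allowed_downtime_hours(sla):.2f} hours of downtime per year")
```

Under these assumptions the standard tier works out to roughly 8.76 hours per year and the premium tier to roughly 4.38 hours, which is why a single resource can still worry an architect despite a strong guarantee.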
Latency issues for a globally distributed user base
These days it is not uncommon for applications to have a user base across the globe. Having only one Web PubSub resource makes the latency for some users significantly higher than for others. Each Web PubSub resource is bound to the Azure region it is created in. For our explanation, let’s say the resource is created in East US. End users on the East Coast of the US will enjoy fast connections to your application, but users in the UK or Australia will not. The round-trip time of a message for users outside the US will suffer due to the greater physical distance between continents.
Typical approach: multi-resource setup
Developers using Web PubSub have long recognized the need for a more resilient and low-latency solution for all users. The preferred approach has been to create a Web PubSub resource in each of a few carefully selected Azure regions and to use Azure Traffic Manager to route clients to the geographically nearest resource. Following our example above, they would set up one resource in the US, one in the UK and one in Australia. Developers would also need a way to notify Azure Traffic Manager when, say, the Web PubSub resource in the US becomes unavailable, so that Azure Traffic Manager stops routing clients to that region. The outcome is that latency for users in the US increases while the affected resource recovers, but the application stays up and running.
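The routing decision described above can be modeled in a few lines. This is a minimal sketch of the idea, not Azure Traffic Manager's actual algorithm: the `Endpoint` type, the `pick_endpoint` function, and the latency figures are all hypothetical and chosen only to mirror the US, UK and Australia example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Endpoint:
    region: str        # illustrative region label
    latency_ms: float  # hypothetical latency as seen from one client
    healthy: bool      # result of the most recent health check


def pick_endpoint(endpoints: list[Endpoint]) -> Optional[Endpoint]:
    """Return the lowest-latency healthy endpoint, or None if all are down."""
    candidates = [e for e in endpoints if e.healthy]
    return min(candidates, key=lambda e: e.latency_ms, default=None)


# A client near the US East Coast while the US resource is unavailable.
endpoints = [
    Endpoint("US", 20.0, healthy=False),
    Endpoint("UK", 90.0, healthy=True),
    Endpoint("Australia", 210.0, healthy=True),
]
chosen = pick_endpoint(endpoints)
print(chosen.region if chosen else "no healthy endpoint")
```

With the US endpoint marked unhealthy, the client is sent to the UK: slower than usual, but still connected.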
If all that’s needed is to proactively push a message from the server to web clients, in other words to broadcast a message, the application server simply invokes an API on each Web PubSub resource. However, the business requirements developers need to work with are seldom this simple. Often the communication is not just from server to web clients but also from web clients to server; in other words, it’s bi-directional. When a client sends a message to your application and your application needs to forward it to a subset of users, the crude and simple way is to invoke an API on all Web PubSub resources. This approach is, however, not without problems. You can refer to this documentation to learn more about the considerations when setting up a multi-instance failover strategy.
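As a rough illustration of the "invoke an API on each resource" approach, the sketch below fans a message out to one client object per region and records which regions failed. The `Broadcaster` protocol, its `send_to_all` method, and `FakeClient` are assumptions made for this example; in a real application each entry would be a service client from the Web PubSub SDK, whose exact call signature should be checked against its documentation.

```python
from typing import Mapping, Protocol


class Broadcaster(Protocol):
    """Anything that can broadcast a message to its own connected clients."""

    def send_to_all(self, message: str) -> None: ...


def broadcast_everywhere(clients: Mapping[str, Broadcaster], message: str) -> dict[str, Exception]:
    """Send the message through every resource and return failures keyed by region."""
    failures: dict[str, Exception] = {}
    for region, client in clients.items():
        try:
            client.send_to_all(message)
        except Exception as exc:  # keep going so one bad region does not block the rest
            failures[region] = exc
    return failures


class FakeClient:
    """In-memory stand-in for a per-region service client (illustration only)."""

    def __init__(self, fail: bool = False) -> None:
        self.fail = fail
        self.sent: list[str] = []

    def send_to_all(self, message: str) -> None:
        if self.fail:
            raise ConnectionError("resource unavailable")
        self.sent.append(message)


clients = {"US": FakeClient(fail=True), "UK": FakeClient(), "Australia": FakeClient()}
failures = broadcast_everywhere(clients, "price update")
print(sorted(failures))
```

Even in this toy form the costs are visible: the server makes one call per region for every message, and it has to decide for itself what to do about the regions that failed.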
With the new geo-replica feature, what used to be weeks if not months of development work can be saved, giving the development team more time to focus on unique business requirements rather than worrying about the infrastructure.
Fully managed geo-replication
Geo-replication is now generally available, and setting up replicas takes only a few clicks in the Azure portal.
After you enable geo-replication, a DNS router (Azure Traffic Manager) is put in front of your Web PubSub resource. Both your clients and your server will be routed to the closest Web PubSub replica. The traffic manager performs health checks periodically, and an automatic DNS failover happens within 3 minutes if a replica becomes unavailable. Disruption to your application is therefore limited to at most 3 minutes.
Cross-region communication between replicas
Sending messages to clients in the same region will be extremely fast (usually under 10 ms). When you send messages to a group of clients that are connected to different replicas, traffic between replicas goes through the Azure network backbone, which offers improved speed and reliability.
Enable geo-replication without any code change
You can enable geo-replication either through the Azure portal or through a Bicep template. Refer to this article for a step-by-step guide.