Building an Azure architecture that’s ready for every signature
At Exclaimer, we help organizations manage email signatures at scale, so every message can carry a consistent, compliant, on-brand signature without IT teams manually updating thousands of mailboxes. This is more difficult than it may seem, especially when you're doing it for more than 80,000 customers, around 9.6 million seats, and more than 21 billion emails a year. Every signature must show up in the right place, with the right details, for the right sender, recipient, device, and business rule. Behind that are constantly changing employee records, customer-specific policies, email chains, recipient lists, regional disclaimers, and brand requirements.
Because our platform sits directly in the email flow, availability is critical. And because many of our customers operate in regulated industries, they also need confidence that data stays in-region and configured signatures are applied consistently.
To support that level of scale and reliability, we’ve spent the last several years evolving our architecture on Microsoft Azure. Today, Azure Kubernetes Service (AKS), Azure SQL Database, Azure Database for PostgreSQL, Azure Cosmos DB, Azure Data Explorer, and Azure Databricks help us run a global platform that’s more responsive, more resilient, and more cost-efficient.
Reading the signs that our architecture needed to change
In the beginning, our cloud product ran more like a multi-server, on-premises product hosted on Azure Virtual Machines (VMs). The platform was split into a smaller number of core services, and the team relied heavily on VM-based infrastructure to keep those services running.
As Exclaimer grew, our architecture had to keep pace with higher volumes, more regions, and more complex customer requirements. Regional demand shifted throughout the day, but scaling infrastructure up and down still relied on scripts, pre-baked VMs, and operational coordination. That created more risk during maintenance and failover.
We run parallel data centers in regional pairs so we can move traffic away from one site when needed. But when traffic moves, the receiving environment has to be ready to handle the full load. In the VM world, that meant someone or something had to remember to scale up standby resources at the right moment.
At the same time, our product was becoming more service-oriented. We were moving away from a smaller set of larger services toward well over 100 microservices. Every new service created more conversations about VM sizing, images, patching, and operational overhead. It was time for a model that could scale faster, run more efficiently, and reduce the amount of infrastructure work required to ship and operate the product.
Signing on to AKS for faster, more efficient scaling
By moving many workloads to Linux containers on AKS, we gained a smaller footprint, faster startup times, and a more consistent way to package and deploy services. AKS also gave us a managed Kubernetes foundation for running those containers at global scale, with autoscaling capabilities that better matched our traffic patterns.
With Horizontal Pod Autoscaler, services can react to load in seconds rather than minutes. With Cluster Autoscaler, we can add or remove node capacity based on what the platform actually needs. That means we can pack workloads onto nodes more efficiently, scale down during quiet periods, and scale up quickly when demand returns.
The operational difference is just as important. During an incident, maintenance event, or regional failover, our teams have fewer manual steps to think about. If traffic shifts, the platform can scale with it. That takes away one more thing for engineers to worry about when they should be focused on keeping the customer experience steady.
The move to containers and a more streamlined CI/CD workflow also improved our deployment cadence by making it easier to build, test, and deploy changes across the platform.
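For readers less familiar with how Horizontal Pod Autoscaler arrives at a replica count, the sketch below follows the scaling rule described in the Kubernetes documentation: the desired count is the current count scaled by the ratio of the observed metric to its target, rounded up, with no change when the ratio is within a small tolerance. The function name, the metric values, and the replica bounds are illustrative assumptions, not our production configuration.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the HPA rule: ceil(current_replicas * current / target).

    All parameter defaults are hypothetical values chosen for illustration.
    """
    ratio = current_metric / target_metric
    # Skip scaling when the observed metric is close enough to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds for the workload.
    return max(min_replicas, min(max_replicas, desired))


# Hypothetical failover scenario: CPU utilization rises from a 60% target to 90%.
print(desired_replicas(current_replicas=4, current_metric=90, target_metric=60))
```

In a scenario like the regional failover described earlier, a jump in the observed metric raises the desired replica count without anyone having to remember to scale up, and Cluster Autoscaler can then add nodes if the new pods don't fit.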
In 2021, we deployed 285 changes, features, and fixes to production over the course of the entire year. Today, we deploy that many every few days.
Cost has improved, too. Since 2024, when the bulk of our migration to containerized services took place, we’ve reduced our average cost per user by about 39 percent, even as the product has grown more complex and we’ve added more capabilities for customers. We achieved that through a combination of containerized architecture, AKS autoscaling, and expanded reservations across compute and storage technologies.
Choosing the right database for the right kind of data
We started with a strong Microsoft SQL Server foundation, and Azure SQL Database remains core to our platform today. It stores critical customer configuration data and continues to give us the reliability, replication, resizing flexibility, and regional scale we need.
But not every workload belongs in the same database. Customer configuration, relational service data, key-value storage, usage events, and business intelligence (BI) all have different access patterns.
That principle led us to Azure Database for PostgreSQL flexible server for one of our most important migrations. We had used Azure Table storage for a core service that needed to retrieve customer data quickly. It was cost-effective and stable for a long time, but as the product evolved, the data became more relational, and we found ourselves adding complexity in application code that a relational database could handle more naturally.
Azure Database for PostgreSQL gave us that relational model with low management overhead, fast read replicas, reserved instances for predictable workloads, and a path to future scale. After the migration, average request time for a critical service dropped from 18.6 milliseconds to 1.79 milliseconds. That’s a 90 percent improvement across a service that handles around 9 billion requests each month.
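The arithmetic behind that figure is easy to check. The sketch below is a rough back-of-the-envelope calculation using only the numbers quoted above; the function names are hypothetical, and the cumulative-time estimate assumes the averages apply evenly across all requests, which is a simplification.

```python
def latency_reduction(before_ms, after_ms):
    """Fractional reduction in average request time."""
    return (before_ms - after_ms) / before_ms


def cumulative_hours_saved(before_ms, after_ms, requests):
    """Total request time saved, in hours, assuming uniform averages."""
    saved_ms = (before_ms - after_ms) * requests
    return saved_ms / 1000 / 3600


BEFORE_MS = 18.6                       # average request time before the migration
AFTER_MS = 1.79                        # average request time after the migration
REQUESTS_PER_MONTH = 9_000_000_000     # "around 9 billion requests each month"

print(f"Reduction: {latency_reduction(BEFORE_MS, AFTER_MS):.1%}")
print(f"Speedup: {BEFORE_MS / AFTER_MS:.1f}x")
print(f"Hours saved per month: "
      f"{cumulative_hours_saved(BEFORE_MS, AFTER_MS, REQUESTS_PER_MONTH):,.0f}")
```

Put another way, the same change reads as a reduction of just over 90 percent or a speedup of roughly ten times, depending on which framing is more useful.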
Azure Cosmos DB plays a different role, supporting key-value and document storage where we need scale, availability, low latency, encryption at rest, and straightforward dev/test support. Optimized for unstructured data and high-performance reads and writes, it gives us a highly scalable foundation for workloads that don't fit a traditional relational model. We use it to store customer assets for signatures and video branding, high-volume metadata for internal message-processing operations, audit events that help customers track account changes, and tokens used to collect data from third-party systems on behalf of customers. It also gives us a clean way to keep data and services aligned.
Azure Data Explorer solved another scaling challenge: usage and billing data. We need to be able to audit the number of messages we process for our customers so we can bill accurately, and at more than 20 billion emails a year, our previous SQL-based usage pipeline became difficult to manage. With Azure Data Explorer, we can ingest massive volumes of event data at low storage cost, connect to Azure Event Hubs, and avoid maintaining custom plumbing. That move reduced the cost of the system by around 70 percent.
Azure Databricks rounds out the picture as our BI and data platform, giving our teams a shared foundation for transformations, analysis, and reporting across product and business data.
Keeping every region ready for business
Our customers are everywhere, so our platform has to be, too. Exclaimer runs in seven distinct geographic locations: Australia, Canada, Europe, Germany, the United Arab Emirates, the United Kingdom, and the United States. That global footprint helps us meet customer expectations around availability and data residency. Many organizations want their data to stay in-region, and Azure gives us the coverage we need to support that.
Availability is especially important because our platform is part of a live communication flow.
When someone sends an email, they expect it to keep moving. Our Azure architecture helps us support that expectation across the stack. AKS lets compute scale with regional demand. Azure SQL and Azure Database for PostgreSQL support critical relational workloads. Azure Cosmos DB gives us scalable, low-latency storage for document and key-value patterns. Azure Data Explorer handles very high-volume usage ingestion without the complexity of our former custom pipeline.
Across the board, these managed Azure services reduce the amount of operational work our engineers have to carry. We can spend less time maintaining the basics and more time tuning performance, improving stability, and building the capabilities our customers need next.
Building for the future on a stronger foundation
The biggest sign that our architecture is working may be how little we have to reinvent when we build something new. As we develop upcoming product capabilities, we already have many of the foundational pieces in place: AKS for compute, Azure Cosmos DB for state, and Azure Service Bus for messaging. We also have Azure SQL for core data, Azure Database for PostgreSQL where relational service data needs room to scale, Azure Data Explorer for high-volume event analysis, and Azure Databricks for BI tooling. Together, these services make our platform faster, more efficient, and more resilient.
Email signatures may look simple on the surface. Behind every one, there’s a set of decisions about performance, scale, data, availability, and trust. With Azure, we’ve built an architecture that helps us keep every signature moving, wherever our customers do business.
About the authors
Phil Vetter started in engineering at Exclaimer as a developer at the start of 2013, and now sits at the helm as VP of Engineering.
Lee Jones started at Exclaimer in 2013 in the IT department, and now serves as Director of Platform Engineering, managing the infrastructure and resilience of Exclaimer Cloud.