Azure Architecture Walkthrough: Building a multi-tenant Azure Architecture for a B2C scenario


Apr 05, 2020

Hi,

Recently, I gave a talk about a mobile app that I developed for personal needs. I initially planned to give this talk again but, given the lockdown measures, I thought it would be better to turn it into a blog post. The idea is to walk you through the architecture behind this multi-tenant mobile app and explain the rationale behind every single choice.

The Use Case

As a wine lover, it was about time for me to build something new to manage my cellar. There are plenty of apps available on Google Play, but I wanted to add my own bits (and wine skills) so that the app would serve me exactly as I wanted. I built the app for myself and then some friends asked to get it too but...it was not multi-tenant...so I decided to make it multi-tenant and, although it is a hobby project, I wanted to build it in a professional way. That said, should you be excited by this blog post, don't try to look for the app: I haven't published it on Google Play because I don't want to pay for the Azure costs should many folks install it, so right now I just let my friends download the app 🙂.

The features

I wanted an app that:

Shows a dashboard with metrics such as the top countries, top appellations, number of wines, average consumption over 1 month (important to see if you don't drink too much :)), average price, etc.
Sends a daily reminder about the next wine to drink. This is not a random selection: it is based on wine age, region, color, etc. to determine the best moment to drink it. Those are the personal bits I wanted to add.
Allows an easy capture of a new wine by using OCR to extract wine information from the label. Indeed, I don't want to spend 10 minutes whenever I add a new wine to the cellar.
Has a powerful search interface.

The Architecture

My goal is to make sure that the shared backend remains healthy and secure. Any attempt by a malicious user to hack my backend should result in that user only messing up his own tenant, without impacting others. With that in mind, I came up with the following architecture:

The rationale

At first sight, it may look simple but it is a little more complicated than it seems. Let me explain all the numbered bullet points; the implementation details will follow:

1. The choice of Azure AD B2C is a no-brainer in this context, since my target audience is ordinary consumers. I currently only use local B2C users but could add social identities as supported IDPs. Azure AD B2C will also help me to secure my APIs with OpenID Connect (OIDC).
2. The mobile app itself is written in Xamarin Android and leverages MSAL to authenticate users.
3. Front Door with WAF enabled. I chose Front Door because it is serverless and fully elastic and because it has a built-in WAF as well. Here I took the default ruleset protecting against the OWASP Top 10. Why not Azure Application Gateway? It would also have been a good choice, but Front Door is cheaper (given my traffic) and, since my backend pool is my APIM instance with a public VIP, Front Door does the job pretty well.
4. All my backend services are fronted by facade APIs and the API gateway acts as a Policy Enforcement Point (PEP). Different policies such as throttling, JWT token validation, ... are enforced by the gateway. All custom-built services have a network ACL restricted to the APIM gateway VIP and double-check the OIDC bits server-side. Network ACLs are the reason why I do not use the Serverless offering of APIM, since it does not come with a static VIP nor even a dedicated Service Tag for the time being. My APIM instance is restricted to my Front Door service IPs using a custom policy. By the way, it is quite tedious since Service Tags are not incorporated in APIM...
5. Sensitive information is entirely stored in Azure Key Vault.
6. The Wine backend service is the main service consumed by the mobile app through its corresponding facade API. It is protected using OIDC and built with .NET Core and EF Core. It is based on the App Service building block. Managed Service Identity (MSI) is enabled to let the service grab the Cosmos DB keys.
7. I am using Cosmos DB, not for fun but because, in this particular use case, it is the perfect choice. More on this in the next section. I restrict Cosmos DB to a subnet that is integrated with my App Service by leveraging Service Endpoints and Subnet Delegation. That way, I know that only my Wine backend service can talk to my Cosmos DB from a network perspective. I didn't want to use AKS or an ASE because this would have been overkill in this context, with a serious impact on costs, so that's why I just used this trick of having an empty VNET with subnet delegation enabled. Private Link is not available through VNET delegation.
8. I am using a Storage Account with a container per tenant. Each container stores the pictures of its corresponding tenant. The mobile app uploads blobs directly to the target Storage Account to make it scalable, as I don't want to introduce a man in the middle that could be a SPOF. The Blob Storage SDK also comes with awesome features that are not so easy to deal with when introducing a mediation API. Regarding the security aspects, whenever an upload takes place, the app requests a short-lived (5 min) SAS token from the Wine API, scoped to its corresponding container with Append permission only. That way, anyone monitoring the network traffic would only see his own SAS and could only mess with his own tenant. On top of that, the API operation returning the short-lived SAS token is throttled on a per-tenant basis to prevent any abuse.
9. The OCR service is Azure's Computer Vision cognitive service. I put it behind a facade API in order to let the APIM gateway inject the shared API key. Since it is a shared (cross-tenant) key, I want to make sure not to disclose it to the mobile device. This key is itself stored in Azure Key Vault and retrieved dynamically by the gateway through a policy that leverages MSI, set at APIM instance level with an Access Policy defined in Key Vault.
10. The subscription service serves a specific purpose which I'll detail in the next section but, in a nutshell, it allows the mobile app to retrieve its own tenant-specific API keys by subscribing it to the APIM instance on the fly. So, the only thing a mobile device ever sees is always tenant-specific.
11. Some basic health checks and monitoring are in place. I'm mostly using App Insights + Availability tests.
12. A Function App with two functions is used to send the daily reminder through push notifications. A more advanced health check is also performed through a scheduled function.
13. Push notifications are sent through App Center's Push service. Unfortunately, Microsoft will be retiring this service soon... The alternative is to use Azure Notification Hub.
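To make the per-tenant upload rule a bit more concrete, here is a minimal sketch of the decision the Wine API takes before minting a SAS token: which container, which permission and which expiry. It deliberately does not call the Azure Storage SDK. `UploadGrant`, `container_for_tenant` and the hashed naming scheme are assumptions made for this example, not the app's actual code.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

SAS_LIFETIME = timedelta(minutes=5)  # short-lived, as described above


@dataclass(frozen=True)
class UploadGrant:
    """What the API would pass to the storage SDK to mint the SAS (hypothetical type)."""
    container: str
    permission: str
    expires_at: datetime


def container_for_tenant(tenant_id: str) -> str:
    # Assumed naming scheme: hash the tenant id so the result always fits the
    # container naming rules (3-63 chars, lowercase letters, digits, hyphens).
    digest = hashlib.sha256(tenant_id.encode("utf-8")).hexdigest()[:32]
    return f"tenant-{digest}"


def build_upload_grant(tenant_id: str, now: datetime) -> UploadGrant:
    # The tenant id must come from the validated access token (subject claim),
    # never from a value the client is free to choose.
    return UploadGrant(
        container=container_for_tenant(tenant_id),
        permission="append-only",  # placeholder label, to be mapped to the SDK's permission type
        expires_at=now + SAS_LIFETIME,
    )


if __name__ == "__main__":
    print(build_upload_grant("some-subject-claim", datetime.now(timezone.utc)))
```

Because the container name is derived from the token's subject, a user who tampers with the request can at worst obtain a grant for his own container.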

So, the key aspects here are: I used the network here and there to maximize security, but I mostly rely on identity & MSI. I'm using Microsoft-managed keys for encryption since the data is certainly not "classified" information. The only sensitive information disclosed to the mobile device is tenant-specific (user-specific in this case). The design is not Disaster Recovery ready, but who cares for such an app.

The implementation details

Now that we have seen the global architecture, let's zoom into some implementation details. I will only highlight the most important parts. The first and most important step in terms of security is user registration, whose flow is as follows:

The important bits here are about the retrieval of the tenant-specific API subscription keys. In my particular case, each user is a tenant and is the wine cellar owner. Therefore, upon registration, the system creates an API subscription on the fly and returns one of the generated subscription keys to the device, which stores it locally as long as the key is valid. The backend subscription service (item 10 in the architecture diagram) is protected by a facade API enforcing a JWT validation against our B2C directory.
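The registration step can be pictured with the following sketch, an in-memory stand-in for the subscription service. It only illustrates the "one subscription per user, created on first request" idea. The class and method names are hypothetical, and a real implementation would call the API Management management API instead of generating a random key locally.

```python
import secrets
from typing import Dict


class SubscriptionService:
    """In-memory stand-in for the backend subscription service (hypothetical)."""

    def __init__(self) -> None:
        self._keys_by_subject: Dict[str, str] = {}

    def get_or_create_key(self, subject: str) -> str:
        # `subject` is the `sub` claim of an access token that the facade API
        # has already validated. The key is created once per user and the
        # same key is returned on later calls.
        if subject not in self._keys_by_subject:
            self._keys_by_subject[subject] = secrets.token_hex(16)
        return self._keys_by_subject[subject]


if __name__ == "__main__":
    service = SubscriptionService()
    print(service.get_or_create_key("user-a"))
```

Making the operation idempotent means a device that lost its key can simply ask again without multiplying subscriptions.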

Here is an example of such a JWT token, requested by the mobile app, to access any of my APIs on behalf of the logged in user:

The token must contain the managecellar scope and of course be issued by my B2C directory with the valid audience (aka client app). Any request to the subscription service containing such a valid access token is forwarded to the subscription service. A throttling limit of 10 requests/minute/user is set to avoid abuse of the subscription service:

<rate-limit-by-key calls="10" 
            renewal-period="60" 
            counter-key="@(context.Request.Headers.GetValueOrDefault("Authorization","").AsJwt()?.Subject)" />

The limit is keyed on the subject claim, highlighted in the above picture. So, here again, anyone monitoring the traffic would see only his own access token and could only throttle himself should he try to play dirty, while neither my backend service nor other users would be impacted.
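To illustrate why keying the counter on the subject isolates users from each other, here is a simplified fixed-window limiter. It is only a sketch of the idea and makes no claim about how API Management implements `rate-limit-by-key` internally. The class name is hypothetical, and the parameter names simply mirror the policy attributes.

```python
from collections import defaultdict
from typing import DefaultDict, Tuple


class FixedWindowLimiter:
    """Simplified per-key limiter: `calls` requests per `renewal_period` seconds."""

    def __init__(self, calls: int = 10, renewal_period: int = 60) -> None:
        self.calls = calls
        self.renewal_period = renewal_period
        # counter key -> (window index, calls seen in that window)
        self._counters: DefaultDict[str, Tuple[int, int]] = defaultdict(lambda: (-1, 0))

    def allow(self, counter_key: str, now_seconds: float) -> bool:
        window = int(now_seconds // self.renewal_period)
        seen_window, count = self._counters[counter_key]
        if seen_window != window:
            count = 0  # a new window started for this key
        if count >= self.calls:
            return False  # only this key is throttled
        self._counters[counter_key] = (window, count + 1)
        return True


if __name__ == "__main__":
    limiter = FixedWindowLimiter()
    results = [limiter.allow("subject-of-user-a", t) for t in range(12)]
    print(results)
```

Each subject has its own counter, so a user who exhausts his quota has no effect on the counters of the others.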

More globally, the facade APIs are doing a lot of the heavy lifting and are organised this way:

An API product applies multiple policies, such as IP filtering on the Front Door service and JWT token validation, since none of my APIs can be accessed using the subscription key alone. The product only contains the Wine & OCR APIs. Indeed, my subscription service's facade API is subscription-free, since its primary purpose is to return the tenant-specific subscription keys. Therefore, I cannot reuse my product policies there and have to replay them locally. The OCR facade API first extracts the shared Cognitive Service API key from Key Vault and then forwards the request to the Cognitive Service.
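The checks applied at product level can be approximated as follows. This is a sketch only: it decodes the token payload without verifying the signature or the expiry (the gateway does that in the real setup), and the issuer, audience and network values in the usage example are placeholders rather than my real settings. The function names are hypothetical.

```python
import base64
import ipaddress
import json
from typing import Iterable


def decode_claims(token: str) -> dict:
    # Decodes the JWT payload WITHOUT verifying the signature or expiry.
    # This helper only illustrates the claim checks.
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore base64url padding
    claims = json.loads(base64.urlsafe_b64decode(payload))
    if not isinstance(claims, dict):
        raise ValueError("JWT payload is not a JSON object")
    return claims


def is_request_allowed(token: str, caller_ip: str, allowed_networks: Iterable[str],
                       issuer: str, audience: str,
                       required_scope: str = "managecellar") -> bool:
    # 1. IP filtering: the call must come from an allowed network.
    ip = ipaddress.ip_address(caller_ip)
    if not any(ip in ipaddress.ip_network(net) for net in allowed_networks):
        return False
    # 2. Claim checks: issuer, audience and scope.
    try:
        claims = decode_claims(token)
    except (IndexError, ValueError):
        return False  # malformed token
    scopes = str(claims.get("scp", "")).split()
    return (claims.get("iss") == issuer
            and claims.get("aud") == audience
            and required_scope in scopes)


if __name__ == "__main__":
    body = base64.urlsafe_b64encode(json.dumps(
        {"iss": "https://issuer.example", "aud": "client-app", "scp": "managecellar"}
    ).encode()).rstrip(b"=").decode()
    print(is_request_allowed(f"header.{body}.signature", "203.0.113.7",
                             ["203.0.113.0/24"], "https://issuer.example", "client-app"))
```

The subscription key is then just one more check on top of these, never a replacement for them.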

Last but not least: Cosmos DB! Why did I choose Cosmos DB? I see many folks rushing to NoSQL and quickly facing issues because of poor up-front engineering. In this particular use case, Cosmos is perfect for the following reasons:

1. One rule of thumb with Cosmos DB is to distribute the load evenly. This means that each logical partition should be roughly the same size.
2. Another rule of thumb is to minimize cross-partition queries to avoid performance issues. So, the partition key should not only be a technical thing but should ideally also have a functional meaning.
3. A hard limit of 20GB (for now) applies to the size of a logical partition. Therefore, one needs to anticipate capacity at that level.
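To get a feel for what the 20GB logical partition limit means in practice, here is a back-of-the-envelope calculation. The 2 KB document size is purely an assumption for illustration (I have not measured my wine documents), 20GB is taken as 20 × 1024³ bytes, and index overhead is ignored.

```python
PARTITION_LIMIT_BYTES = 20 * 1024**3  # 20GB limit quoted above, taken as binary gigabytes
ASSUMED_DOC_BYTES = 2 * 1024          # assumption: about 2 KB of text metadata per wine


def max_documents(limit_bytes: int = PARTITION_LIMIT_BYTES,
                  doc_bytes: int = ASSUMED_DOC_BYTES) -> int:
    # Rough upper bound on documents per logical partition, ignoring index overhead.
    return limit_bytes // doc_bytes


if __name__ == "__main__":
    print(max_documents())  # 10485760
```

Under these assumptions, a single logical partition holds roughly ten million wine documents.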

My scenario is perfectly in line with those three rules of thumb. I am using the tenant (aka the subject in my access token) as the partition key. I'm okay with rule 1 because I can hardly imagine having one user with a cellar of 1 000 000 bottles and another one with only 1 bottle. So, globally, the load will be evenly distributed across logical partitions.

I'm okay with rule 2 because I'll include my partition key in every single query, since I want to make sure that users can only see their own wines. Finally, I'm okay with rule 3: since I'm storing the pictures in a Storage Account and my wine document is just text metadata, I can hardly imagine that a single cellar would need more than 20GB.
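The "partition key in every query" idea can be sketched with a small in-memory store. It does not use the Cosmos DB SDK. `WineStore` and the `tenantId` property name are assumptions for the example, with the tenant id standing in for the subject claim of the access token.

```python
from collections import defaultdict
from typing import Dict, List


class WineStore:
    """In-memory stand-in for a container partitioned by tenant (hypothetical)."""

    def __init__(self) -> None:
        self._partitions: Dict[str, List[dict]] = defaultdict(list)

    def add(self, tenant_id: str, wine: dict) -> None:
        # The partition key is stamped by the backend, never taken from the payload.
        self._partitions[tenant_id].append({**wine, "tenantId": tenant_id})

    def search(self, tenant_id: str, color: str) -> List[dict]:
        # Single-partition query: only the caller's partition is ever read.
        return [w for w in self._partitions.get(tenant_id, []) if w.get("color") == color]


if __name__ == "__main__":
    store = WineStore()
    store.add("user-a", {"name": "Example red", "color": "red"})
    store.add("user-b", {"name": "Example white", "color": "white"})
    print(store.search("user-a", "red"))
```

Since every read goes through the tenant's own partition, the same mechanism serves both performance (no cross-partition query) and isolation (no access to someone else's wines).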

I think this kind of approach is suitable for other B2C scenarios; that's why I wanted to share it!

Updated Apr 21, 2020


stephaneeyskens
