Hey everyone, and welcome to this first post on a topic that we will be talking a lot more about over time!
Microsoft 365 is one of the world’s largest enterprise and consumer cloud services, and customer trust is the foundation of our business: customers and people all around the world rely on us to securely operate and maintain some of their most critical assets. To maintain that trust, we invest heavily in securing the infrastructure that powers our services and hosts this data on behalf of our customers – keeping customer data private and secure is THE top priority for our business. This post, and the other ones we’ll share in this series, will shed light on what we do behind the scenes to secure the infrastructure powering the Microsoft 365 service.
As we think about how to secure our infrastructure, we recognize that the service continues to grow and evolve, both in terms of our user base and in terms of the products and experiences we provide to our customers, and so we must constantly work to stay on top of an ever-increasing surface area. Meanwhile, bad actors are not sitting still, either. Attacker groups seeking to exploit enterprise and consumer data continue to evolve, and customers looking to secure their most sensitive data are going up against the most sophisticated and well-funded adversarial organizations in the world, including nation state attackers with seemingly limitless resources.
To secure the service for our customers given these challenges, we focus on these three areas:
In the rest of this post we will briefly explore each of these areas, or if you’d like to go deep, you can check out the full whitepaper here.
Before getting into each of these areas, we wanted to touch on some of the major principles that guide our approach to service security. Here are some of the concepts that form the foundation of what we do to secure service infrastructure:
The next sections will look into how we put these principles into practice to protect the service, mitigate risk if compromise does occur, and validate our security posture to make sure all of this works.
Our favorite attack is the one that never gets started because we prevented it from happening in the first place. Broadly speaking, protecting the service from attack focuses on two vectors: people (making sure that the Microsoft employees who build and manage the service cannot compromise or damage it), and the technical infrastructure of the service itself (making sure that the machinery running the service has integrated defenses and is architected and configured in a most-secure default configuration).
When it comes to securing the infrastructure from internal operators, our motto here is Zero Standing Access (ZSA). This means that, by default, the teams and personnel charged with developing, maintaining, and repairing core Microsoft 365 services have no elevated access to the service infrastructure, and any elevated privileges must be authorized as shown in the flow below.
Illustration of the Lockbox JIT request process. No account has standing administrative rights in the service. Just in time (JIT) accounts are provisioned with just enough access (JEA) to perform the action that is needed
It is important to keep in mind that even with the approved elevated privileges, a specific restrictive account is provisioned just for that activity. This account is bound by time, scope and approved actions. Ultimately, this is all about making sure that the blast radius for a single account is minimized: even if an internal operator’s account is compromised, it is by design prevented from doing any damage unless additional steps are taken.
Our protections go beyond restricting the blast radius of accounts. Network controls restrict the types of connections that can be made into our services, we also restrict the types of connections permitted between service partitions. This reduces the surface area for attackers to target for initial entry, and it also makes it harder for attackers to move around the service to find what they’re looking for.
The assume breach model goes beyond designing architectural protections and access control policies: it means that no matter how effective those protections are, we cannot trust that they will always hold. We must assume a non-zero probability of successful attack, no matter how confident we are in our defenses. We need to have the ability to detect and mitigate these attacks against the service infrastructure before they result in a compromise of customer data.
Our work in this space spans security monitoring and incident response:
Incident response is cloud-powered and service-aware. It can be triggered autonomously for basic actions, or manually for more complex scenarios. Remediation can take effect on a small number of machines, or across a service partition if necessary
As the diagram illustrates, automation and scale are priorities for us in this area. For us to catch and stop attacks against a service the size of Microsoft 365, our systems need to be intelligent enough to proactively and accurately alert us to potential issues, and we need the ability to respond quickly and at scale. Anything less simply won’t do given the scale of the service.
Our assume breach principle is all about planning for the worst – given how seriously we take this philosophy, we would be remiss if we did not have a plan for mitigating potential gaps in our security posture. Indeed, we validate our security posture regularly, automatically, and through cloud-based tools (we hope that you notice a trend here).
We have two primary forms of validation:
Both forms of validation run directly against the service infrastructure, and they do so continuously. If any regression in security posture does occur, we want to learn about it as quickly as possible so that we can repair it before it gets exploited by attackers.
Securing the infrastructure of one of the world’s largest cloud services requires us to stay ahead of attackers while also keeping up with constantly increasing service scale and complexity. Maintaining customer trust in Microsoft 365 requires us to design our services to a robust set of core security principles and to make sure those principles are embedded deeply into service design and operations.
We have written a whitepaper that looks deeper into what this means, and we will expand on this and other security topics critical to our business in future papers. We hope you find this interesting and informative and look forward to hearing any feedback.
Thank you
@Adam Hall on behalf of the entire Datacenter Security team
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.