First published on TechNet on Jun 28, 2017
In this post, we share details on service-impacting incident response and service health for Intune. While we always want the service to be 100% available, there are infrequent times when we experience a service incident. The goal of this blog post is to demystify the experience. Below, we walk you through:
Who’s working on an incident?
First, it’s important to describe who you may interact with and who is working behind the scenes toward resolution:
How we detect a service-impacting incident
There are three main channels through which we detect a service-impacting incident:
Scenario 1: Intune detects a service outage
Intune has hundreds of monitors that track customer-facing components as well as back-end service health and responsiveness. Around the clock, we respond to alerts on a timeline set by their severity. For example, a certificate that expires in 90 days gets a lower-severity ticket than a spike in login failures. If an alert is categorized as Severity 0 through 2, our incident system immediately opens a ticket and automatically calls the 24x7 software engineers to review and respond to the alert.
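To make the severity-based triage above concrete, here is a minimal, hypothetical sketch of how alerts could be ordered and routed. It is not Intune's actual incident system: the 0-to-4 severity scale, the `Alert` type, the `triage` function, and the severities assigned to the two sample alerts are all illustrative assumptions. The only rule taken from the post is that Severity 0 through 2 alerts open a ticket and call the on-call engineers immediately, while less severe alerts are handled on a slower timeline.

```python
import heapq
from dataclasses import dataclass, field

# From the post: Severity 0-2 alerts page the 24x7 engineers immediately.
PAGE_MAX_SEVERITY = 2


@dataclass(order=True)
class Alert:
    # Hypothetical scale: lower number = more severe (0 is the worst).
    severity: int
    # Excluded from comparisons so alerts are ordered by severity only.
    name: str = field(compare=False)


def triage(alerts):
    """Return (paged, queued) alert names, most severe first (hypothetical)."""
    heap = list(alerts)
    heapq.heapify(heap)  # min-heap: the most severe alert pops first
    paged, queued = [], []
    while heap:
        alert = heapq.heappop(heap)
        if alert.severity <= PAGE_MAX_SEVERITY:
            paged.append(alert.name)   # open a ticket and call on-call engineers now
        else:
            queued.append(alert.name)  # ticketed, handled on a slower timeline
    return paged, queued


# Sample alerts from the post; the severity numbers are illustrative only.
alerts = [
    Alert(4, "certificate expires in 90 days"),
    Alert(1, "spike in login failures"),
]
paged, queued = triage(alerts)
print(paged)
print(queued)
```

A priority queue fits the "respond on a timeline based on severity" idea: however many alerts arrive, the most severe one is always handled first.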
Scenario 2: A customer reports an issue
A customer may see an issue and contact support to troubleshoot it. The support agent will gather some baseline information, which helps us determine whether the cause lies in the configuration of your environment, in another service, or in our environment. The support agent may work directly with you to resolve and close the ticket, or may escalate it to a higher support tier.
Scenario 3: A supporting service has an outage
Intune is not a standalone service; it depends on other services. For example, the Intune Company Portal app is downloaded from the Google Play and iTunes stores. If one of those stores has an outage, our customers' ability to download the Company Portal app could be impacted. The Intune software engineers and Microsoft maintain external relationships and support paths with each of these supporting companies, and we use those contacts and processes when there’s a service-impacting incident. Internally at Microsoft, cross-team incident alerting is built into the response process. Occasionally, you may see a reference to another service’s outage posted on our service health dashboard. In this scenario, our team is working behind the scenes to see whether we can make a service change to minimize the impact of the outage. For example, if a supporting service has a regional outage, we may choose to shift load to another region.
Where do you go to check service-impacting incidents?
You have two options. First, check the Tenant Status blade in https://portal.azure.com, where you'll see service health posts from the past 30 days. Second, go to the Office 365 Admin Console and select Health to view service health. Both sites show the same service data, so choose whichever matches where you administer the service.
There you can see your tenant’s health across the services you own, which could include Intune, Office 365, and CRM. Intune used to have its own standalone Service Health page, but we merged it with Office’s several years ago because many of you told us you wanted service health for all Microsoft IT Pro services in one location.
Several roles provide access to the Service Health Dashboard, so you don’t have to assign everyone the global admin role. The roles that have access are:
At the time of writing, Intune is healthy. If it weren’t, you’d see a different picture than the one below, as described in the article here.
With the Intune service, our goal is to post within one hour of determining which customers are impacted. The impact could be limited to a specific scale unit (where your account resides), a region, or customers using a specific feature set. However, there are a few scenarios where we don’t feel it’s appropriate to post:
NOTE: There are two kinds of posts: Incidents and Incident Advisories. An Incident is reserved for Sev0 events, which are extremely rare. All other incidents are categorized as Incident Advisories and show up on the Advisories tab. From Intune’s standpoint, both are incidents, and you will see an exclamation point (or another indicator) on the service health dashboard signaling that something’s up.
Where we post service changes
To stay informed about Intune service changes, again go to either the Tenant Status blade in https://portal.azure.com or the Office 365 Admin Console and log in with your Intune admin credentials. On the Tenant Status blade, you'll see the messages as soon as you open it. In the Office 365 Admin experience, select Message center on the landing page, or select Health > Message center in the left-hand navigation. There you’ll find messages about new features, planned changes, and planned maintenance with expected downtime.
In the screenshot of a test tenant below, there are a few things to call out:
NOTE: You can choose to see only Microsoft Intune announcements, or to have them emailed to you. Use Edit message center preferences to select which services you follow and whether you’d like to receive weekly email digest summaries.
By design, bugs, and other service non-incidents
Finally, there are times we won’t post because a change was released intentionally and is by design, not an incident. Similarly, we sometimes don’t post because the issue is actually a software bug that will be fixed either out of band or in the next build. If you feel you are impacted by an incident but no incident is posted to your service health dashboard, please contact support and they will assist you.
Other ways to get service notices
You have two additional options for accessing service health outside of the Office 365 console:
Hopefully this post helps demystify service health!
4/1/2019 - updated links and added email information