Better visibility of Office 365 outages and downtime


Office 365 is highly resilient, but it’s not impervious to incidents, and I wonder whether these are being handled as well as they could be.

 

There was a recent incident, EX172491, in which affected users were unable to access their mailboxes. It had a significant impact and attracted media interest. There has been confusion about how long the outage lasted, who was affected, why people couldn’t get more information on it, why it showed up in their tenant and then disappeared, and why updates weren’t forthcoming in a timely fashion.

 

Office 365 down.png


Here are a few suggestions. Easier said than done, I know, but I think they're important enough to mention:

 

  • Let us track service health incidents/advisories even if our tenant is seemingly not affected
  • Post more frequent public updates for the more pressing matters, with some expectations set even if that's difficult to do, and always indicate affected regions to take out the guesswork
  • Always post when an issue is resolved
  • Issues shouldn’t disappear from a tenant and not be trackable
  • Screenshots never hurt: show the issue at hand, along with the error messages if pertinent, so people can easily tell whether it’s the same issue affecting them

For me, I was trying to track EX172491 to help support the community, as I was seeing many questions about this incident, but it wasn’t appearing in my tenant, so I wasn't able to help. A few weeks ago, another incident I was tracking, TM172119, was in my tenant but later disappeared. It didn’t appear again, not even in the history section, and a resolution to the original issue was never acknowledged on the Twitter status account.


Do people agree, or is it working well enough as it is? Anything to add to the list above? Thanks!

 

9 Replies
Hi Cian,

Agree with all your suggestions.

I would love to see a dedicated mobile app for the Service Status with live updates, including the resolution notices you mentioned. It should also trigger notifications, and it shouldn't just be for Office 365 admins.

I would also love to see all the applications in the Service Status. Stream and Forms are two which are not in the service status at all and have periodic issues. I have had to personally reach out to the teams through the TC so that community members could get a response on an issue!

Hope that helps Cian,

Best, Chris

This has been discussed repeatedly, but no matter how many times different Microsoft representatives assure us they've heard the feedback, improvements are nowhere to be found. I highly doubt anyone will even read it here, given the number of MS folks we have in this space.

 

You can have my +1 :)

Thanks Chris, that is helpful; live updates would be nice. Someone asked me today why service health information on incidents/advisories can't be pushed to them (they mentioned email alerts) rather than them having to log on to the portal. The Office 365 admin mobile app has some notifications, but I don't know how well those work, and if the issue isn't listed in the tenant or vanishes, that's not much use anyway.

Good points also about missing services. I think that should be one of the criteria for GA: certain standards kick in, so that release notes, support information, documentation, service description, service limits, etc. are clear from the get-go.
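On the point about issues vanishing from a tenant: one way an admin could at least notice this is to keep periodic snapshots of the incident list and compare them. Below is a minimal Python sketch of that idea. It assumes each snapshot is a list of dictionaries with hypothetical `id` and `status` fields, however you obtain them (an export, an API, or manual notes). The function name and the status values are made up for illustration and are not part of any Microsoft tooling.

```python
def diff_snapshots(previous, current):
    """Compare two snapshots of service health issues.

    Each snapshot is a list of dicts with hypothetical "id" and "status"
    keys (an assumption for this sketch, not a documented format).
    """
    prev = {issue["id"]: issue["status"] for issue in previous}
    curr = {issue["id"]: issue["status"] for issue in current}

    return {
        # Issues seen now that were not in the earlier snapshot.
        "new": sorted(curr.keys() - prev.keys()),
        # Issues that were listed before and are now gone entirely.
        "vanished": sorted(prev.keys() - curr.keys()),
        # Issues present in both snapshots whose status differs.
        "changed": sorted(
            (issue_id, prev[issue_id], curr[issue_id])
            for issue_id in prev.keys() & curr.keys()
            if prev[issue_id] != curr[issue_id]
        ),
    }


if __name__ == "__main__":
    # Illustrative data only: the incident IDs come from this thread,
    # the status strings are invented.
    earlier = [
        {"id": "EX172491", "status": "investigating"},
        {"id": "TM172119", "status": "investigating"},
    ]
    later = [
        {"id": "EX172491", "status": "resolved"},
    ]
    print(diff_snapshots(earlier, later))
```

Anything reported under "vanished" without ever having reached a resolved status would be the case described above, where an incident drops out of the tenant without an acknowledged resolution.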

Agree with the discussion above! With the recent downtime and disruptions, this is in need of improvement.

Adam

Thanks for the input. I just thought this incident was another reminder that things don't always work, and that the disruption this causes for customers is larger than it needs to be, just because of a lack of information.

Of course it’s reasonable for Microsoft to get tickets about disruptions before they know the level of impact and so on, but most times you have to do research to find info, and only quite some time later is it published officially and on your tenant, etc.
Agree that there should be a more efficient way, somewhere in between at least.

That's true. It's also not to undervalue the fantastic work that goes on behind the scenes with the support that's provided, but I think a little more responsiveness would go a long way.

Yeah... this just in:

 

Updated Features                                                Current Status   Update Type
Service Health Dashboard Update: Report an Outage (preview)     Cancelled        Status
Service Health Dashboard Update: User level details             Cancelled        Status
Service Health Dashboard Update: Support for multi-geo tenants  Cancelled        Status
Service Health Dashboard: Personalized Tenant Resolution        Cancelled        Status

The 'Service health dashboard email notifications' feature is a good step forward: you get email notifications of new incidents and advisories affecting your tenant, as well as any status change for an active incident or advisory.

 

Every admin should have this configured in their tenant, ideally pointing to a shared mailbox for better visibility, at least for incidents, which have a more significant impact:

 

O365 Incidents and advisories.png

 

If you don't see it in your tenant yet, it should be there by the end of March according to a Message center post about GA - MC196504.

 

service health email.png

 

Overall there seems to be more investment in the Service Health experience, with the improved interface and options like Report an issue, which is a nice feature to have:

 

O365 report an issue.png