Jan 28 2019 07:11 AM
Office 365 is highly resilient, but it’s not impervious to incidents, and I am wondering whether these are being handled as well as they could be.
There has been a recent incident, EX172491, in which affected users were unable to access their mailboxes. It has had a significant impact and has attracted media interest. There has been confusion about how long the outage lasted, who was affected, why people couldn’t get more information on the incident, why it showed up in their tenant and then disappeared, and why more information wasn’t forthcoming in a timely fashion.
Here are a few suggestions, easier said than done I know, but I think they're important enough to mention:
I was trying to track EX172491, but it wasn’t appearing in my tenant. I wanted to help support the community, as I was seeing many questions about this incident, but I wasn’t able to. A few weeks ago, another incident I was tracking, TM172119, was in my tenant but later disappeared and didn’t appear again, not even in the history section, nor was a resolution to the original issue acknowledged on the Twitter status account.
Do people agree or is it working well enough as it is? Anything to add to the list above? Thanks!
Jan 28 2019 07:20 AM
Jan 28 2019 09:59 AM
This has been discussed repeatedly, but no matter how many times different Microsoft representatives assure us they've heard the feedback, improvements are nowhere to be found. Highly doubt anyone will even read it here, given the number of MS folks we have in this space.
You can have my +1 :)
Jan 28 2019 10:20 AM
Thanks Chris, that is helpful; live updates would be nice. Someone asked me today why the service health information on incidents/advisories can't be pushed to them (they mentioned email alerts) rather than them having to log on to the portal. The Office 365 admin mobile app has some notifications, but I don't know how well those work, and if the issue isn't listed in the tenant or vanishes, that's not much use anyway.
Good points also about missing services. I think that should be one of the criteria for GA: that certain standards kick in, so that release notes, support information, documentation, service description, service limits etc. are clear from the get-go.
Jan 28 2019 10:25 AM
Jan 28 2019 10:30 AM
Thanks for the input. I just thought this incident was another reminder that things don't always work, and that the disruption this causes for customers is larger than it needs to be, just because of a lack of information.
Jan 28 2019 10:36 AM
Jan 28 2019 10:44 AM
That's true. This also isn't to undervalue the fantastic work that goes on behind the scenes with the support that's provided, but I think a little more responsiveness would go a long way.
Feb 20 2019 10:48 AM
Yeah... this just in:
Updated Features | Current Status | Update Type
---|---|---
Service Health Dashboard Update: Report an Outage (preview) | Cancelled | Status
Service Health Dashboard Update: User level details | Cancelled | Status
Service Health Dashboard Update: Support for multi-geo tenants | Cancelled | Status
Service Health Dashboard: Personalized Tenant Resolution | Cancelled | Status
Feb 10 2020 04:05 AM
The 'Service health dashboard email notifications' feature is a good step forward: you get email notifications of new incidents and advisories affecting your tenant, as well as of any status change for an active incident or advisory.
Every admin should have this configured in their tenant, ideally pointing to a shared mailbox for better visibility, at least for incidents, which have a more significant impact.
If you don't see it in your tenant yet, it should be there by the end of March according to a Message center post about GA - MC196504.
Overall there seems to be more investment in the Service Health experience, with the improved interface and options like Report an issue, which is a nice feature to have.