Microsoft has revealed that the recent worldwide Microsoft 365 outage was triggered by a faulty Enterprise Configuration Service (ECS) deployment that affected availability across multiple regions.
The Microsoft Teams outage spread downstream to multiple Microsoft 365 services that integrate with Teams and also rely on ECS, including Exchange Online, Windows 365, and Office Online.
ECS is an internal central configuration repository designed to enable Microsoft services to make wide-scope dynamic changes across multiple services and features, as well as targeted ones such as specific configurations per tenant or user.
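Microsoft has not published how ECS resolves settings internally, so the following is only an illustrative sketch of the general idea of wide-scope defaults combined with per-tenant and per-user overrides. The function name `resolve_config`, the layer names, and the sample keys are all hypothetical and are not taken from ECS.

```python
from typing import Dict, Optional

Config = Dict[str, object]


def resolve_config(
    global_cfg: Config,
    tenant_overrides: Dict[str, Config],
    user_overrides: Dict[str, Config],
    tenant_id: str,
    user_id: Optional[str] = None,
) -> Config:
    """Merge configuration layers; narrower scopes override wider ones."""
    resolved = dict(global_cfg)  # wide-scope settings shared by everyone
    resolved.update(tenant_overrides.get(tenant_id, {}))  # tenant-specific settings
    if user_id is not None:
        resolved.update(user_overrides.get(user_id, {}))  # user-specific settings
    return resolved


# Hypothetical sample data, for illustration only.
global_cfg = {"new_ui": False, "max_call_size": 100}
tenant_overrides = {"contoso": {"new_ui": True}}
user_overrides = {"alice": {"max_call_size": 250}}

print(resolve_config(global_cfg, tenant_overrides, user_overrides, "contoso", "alice"))
```

In a layered scheme like this, a defect in the shared layer or in the serving logic reaches every consumer at once, which is consistent with the broad impact Microsoft describes.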
Microsoft said, “A deployment in the ECS service contained a code defect that affected backward compatibility with services that leverage ECS. The net result was that for services that utilize ECS it would return incorrect configurations to all its partners.”
The company explained in its preliminary report, “This issue affected the users' ability to connect to the Microsoft Teams Desktop, Web and Mobile clients. Telemetry indicated that approximately 300k calls were impacted by this event. The Asia Pacific (APAC) region was most affected due to business hours coinciding with the impact window. Additionally, Direct Routing and Skype MFA were mostly impacted services.”
The company said it is working on improving the resiliency of the Microsoft Teams service so that it can fall back to a cached ECS configuration version in the event of a future ECS failure. It is also investing in additional fault isolation to limit the impact of an ECS failure and updating monitoring thresholds to better identify such low-grade failures.
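The cached-configuration fallback described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Microsoft's implementation: the class `CachedConfigClient`, the `fetch` and `validate` callables, and the `endpoint` key are hypothetical names introduced for the example.

```python
from typing import Callable, Dict, Optional

Config = Dict[str, object]


class CachedConfigClient:
    """Serves the last known-good configuration when a fresh one is unusable."""

    def __init__(
        self,
        fetch: Callable[[], Config],
        validate: Callable[[Config], bool],
    ) -> None:
        self._fetch = fetch        # hypothetical call to the configuration service
        self._validate = validate  # hypothetical compatibility check
        self._last_good: Optional[Config] = None

    def get(self) -> Config:
        try:
            config = self._fetch()
            if not self._validate(config):
                raise ValueError("configuration failed validation")
        except Exception:
            # Fetch failed or returned an incompatible payload:
            # fall back to the cached copy if one exists.
            if self._last_good is None:
                raise
            return self._last_good
        self._last_good = config  # remember the newest valid configuration
        return config


# Simulate one good response followed by an incompatible one.
responses = iter([{"endpoint": "teams-a"}, {"unexpected": 1}])
client = CachedConfigClient(lambda: next(responses), lambda c: "endpoint" in c)
first = client.get()
second = client.get()  # invalid payload, so the cached copy is served
print(first, second)
```

The validation step matters here because, per Microsoft's description, ECS kept responding but returned incorrect configurations, so a fallback that only triggers on connection errors would not have helped.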