
Microsoft Teams Blog

Advance performance approaches to deliver high quality experiences in Microsoft Teams

Mark_Longton
Microsoft
Jan 28, 2022

In light of rapidly evolving customer expectations and massive growth in demand for the digital communications and collaboration services Microsoft Teams offers, we've learned a lot over the past two years about what it takes to deliver a high-performance experience with Teams. In fact, we’ve built out our product roadmap and have already made several enhancements designed to ensure Teams delivers the best performance possible.


Delivering the best performance means using device resources such as CPU and memory efficiently, and having the elasticity to provide a quality experience across a range of device types, network speeds, and variable connectivity. In this blog, we’re providing an overview of our approach to enhancing Teams’ performance, including the fundamentals, and sharing our continuous improvements to the performance and user experience of Teams. Here are some of the ways we’re doing that:

Figure 1 Performance Pillars of Investment

 

Setting performance goals

We have aggressive, centrally documented targets for application responsiveness, latency, memory, power, and disk footprint. We focus on edge-case metrics, which may result from very low-end devices, as well as on the average experience, both measured on the user’s device to best represent the user experience across different scenarios. These targets are then applied to the development of features. Factors that influence these scenarios and targets include customer and partner feedback, how frequently the scenario occurs, and the expected impact that changes will have on our users.
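To illustrate why edge-case metrics are tracked alongside the typical experience, here is a minimal Python sketch. The sample latencies, the target values, and the function names are hypothetical and not taken from Teams; the sketch only shows how a median can meet its target while a tail percentile does not.

```python
from statistics import median, quantiles

def summarize(samples_ms):
    """Return the median (typical) and 95th percentile (tail) of the samples."""
    # quantiles(..., n=100) returns 99 cut points; index 94 is the 95th percentile.
    p95 = quantiles(samples_ms, n=100, method="inclusive")[94]
    return {"p50": median(samples_ms), "p95": p95}

def meets_targets(summary, targets):
    """True only if every tracked percentile is at or below its target."""
    return all(summary[name] <= limit for name, limit in targets.items())

# Hypothetical device-measured latencies (ms) and hypothetical targets.
samples = [120, 135, 150, 160, 180, 210, 240, 300, 450, 900]
targets = {"p50": 250, "p95": 600}

summary = summarize(samples)
print(summary, meets_targets(summary, targets))
```

In this made-up data set the median is comfortably within its target, yet the slowest devices push the 95th percentile over its limit, so the scenario as a whole does not meet its goals.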


Analyzing performance with advanced reporting and insights

To analyze performance metrics in depth, we add extensive instrumentation to the clients. Metrics are captured first in a lab and then with a small set of users as we gradually deploy, to validate our changes. Once the metrics meet their targets, we expand to more users. Dashboards with different pivots provide visualizations that help us observe trends, identify improvement opportunities, and validate the impact of changes.


Performance improvements are typically rolled out to just a portion of the user base first, allowing us to experiment and compare metrics against the existing code used by other users. Below is an example chart generated from an A/B test as we moved data processing to a separate thread. This experiment provided real-world data that validated the impact and justified rolling out the change to the full user base.

Figure 2 Example A/B testing experiment
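The comparison behind an A/B test like this can be sketched in a few lines of Python. This is an illustrative sketch only: the latency values, the 5% improvement margin, and the function names are assumptions, and a real experiment would also check statistical significance before drawing conclusions.

```python
from statistics import median

def relative_change(control, treatment):
    """Fractional change of the treatment median relative to the control median.

    Negative values mean the treatment is faster (lower is better).
    """
    base = median(control)
    return (median(treatment) - base) / base

def should_expand(control, treatment, min_improvement=0.05):
    """Recommend a wider rollout only if the median improves by the given margin."""
    return relative_change(control, treatment) <= -min_improvement

# Hypothetical latencies (ms): existing code vs. the change under test.
control = [200, 210, 220, 230, 240]
treatment = [170, 180, 187, 195, 205]

print(f"{relative_change(control, treatment):+.0%}", should_expand(control, treatment))
```

With these made-up numbers the treatment group's median is 15% lower than the control group's, which clears the assumed margin and would support expanding the rollout.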

 

Focusing on performance while innovating

As Microsoft Teams developers introduce features, we’re closely focused on avoiding any regression of our core metrics. A combination of techniques is applied, including the use of reporting, experimentation, and gates. The gates are a set of automated tests that measure performance characteristics in a controlled environment. The performance gates cover latency, application responsiveness, memory, and disk footprint. If a threshold is breached, a live-site incident is created and assigned to the relevant team to mitigate before team members are allowed to advance the feature to the next ring.
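A gate of this kind can be thought of as a threshold check that runs on every change. The Python sketch below is a simplified illustration, not the Teams implementation: the metric names, threshold values, and the GateResult type are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class GateResult:
    metric: str
    measured: float
    threshold: float

    @property
    def passed(self) -> bool:
        # Lower is better for every metric in this sketch.
        return self.measured <= self.threshold

def run_gates(measurements, thresholds):
    """Compare each measured metric with its threshold."""
    return [GateResult(name, measurements[name], limit)
            for name, limit in thresholds.items()]

def can_advance(results):
    """A change may move to the next ring only if every gate passed."""
    return all(r.passed for r in results)

# Hypothetical thresholds and measurements from one automated test run.
thresholds = {"startup_memory_kb": 31.0, "latency_ms": 500.0}
measurements = {"startup_memory_kb": 30.0, "latency_ms": 620.0}

results = run_gates(measurements, thresholds)
failed = [r.metric for r in results if not r.passed]
print("advance" if can_advance(results) else f"blocked by: {failed}")
```

Here the memory gate passes but the latency gate is breached, so the change would stay in its current ring until the regression is mitigated.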


Below is an example dashboard showing a gate for the memory that the search component consumes when Microsoft Teams is launched. Each dot represents a new change that a developer has checked in to the code base, triggering the automated test to run and collect the metric. You can see that a change made on Nov. 11 reduced the memory at startup from about 31 KB to around 30 KB. In other cases, we see changes that have a negative impact on the metric, which means we block the specific feature from progressing until a fix is in place. Gates like this are configured for a variety of scenarios, and we continue to expand our coverage.

 

Creating tools to benefit Teams and other products

The performance team also invests in a collection of artifacts (dashboards, debugging and monitoring tools) that enables the organization to identify strategic fixes to meet our goals quickly in a scalable and repeatable manner. One of the tools was designed to find and fix memory leaks in Teams and display them in a simple visualization. This tool was shared with the Microsoft Edge team, which added it to the Microsoft Edge DevTools under the name of Detached Elements. While our charter is focused on Microsoft Teams improvements, it is rewarding when our efforts can benefit other developers around the world.

Figure 3 Detached Elements tool to find memory leaks

 

Investing in strategic improvements

A large share of our work consists of targeted fixes with incremental gains, as well as more structural architecture changes that are intended to achieve a step-function improvement in the metrics. Prioritization of these investments factors in the expected impact, how often the scenario occurs, our confidence in the fix, and the effort needed to achieve the targeted results. When it comes to major architectural changes, we often start with a proof of concept to validate the hypothesis. Our intent is to have continuous improvements landing every quarter and to invest in going beyond the technical constraints of the existing architecture.


A recent strategic investment by our team aimed to make the initial load of the compose box faster, reducing the time it takes before a user can start composing a message. Analysis showed that the client was loading non-visible components along with everything else and was making unnecessary renders. The fix was to prioritize making the compose box interactive before loading all other components, hooks, and extensions. This relatively small optimization had a significant impact on this metric (see chart below) and highlights the need for us to seek out both short-term and long-term improvements.

Figure 4 A/B test showing performance improvement of compose time to interactive
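The idea of making one critical component interactive before loading everything else can be sketched with Python's asyncio. This is only an analogy under stated assumptions: the component names and simulated load times are hypothetical, and the Teams client itself is not written this way.

```python
import asyncio

async def load(name, seconds, order):
    """Simulate loading a component and record when it finishes."""
    await asyncio.sleep(seconds)
    order.append(name)

async def start_compose(order):
    # Load the critical component first so it becomes interactive early.
    await load("compose_box", 0.01, order)
    interactive_after = list(order)  # what had loaded when typing became possible
    # Only then load the remaining (hypothetical) components, concurrently.
    await asyncio.gather(
        load("extensions", 0.02, order),
        load("hooks", 0.01, order),
    )
    return interactive_after

order = []
interactive_after = asyncio.run(start_compose(order))
print("interactive after:", interactive_after, "| final order:", order)
```

The compose box is the only thing loaded at the moment it becomes usable; the non-critical work is deferred rather than removed, so the total work is unchanged while the time to interactive shrinks.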

 

Advancing a performance-first culture

As the saying goes, “Culture eats strategy for breakfast,” and this certainly applies to ensuring the Microsoft Teams team is prioritizing performance. Our culture is possibly the most impactful tool to ensure our organization of designers, product managers, and engineers is building and executing with a performance-first mindset across all our features and commitments.

 

Shipping a feature that meets a functional need and creates a great user experience is not good enough. It also must meet the bar for the fundamental promises we make across performance, reliability, security, privacy, compliance, accessibility, manageability, scale, and operational efficiency. Driven by dedicated people on the performance team, shared priorities in listening to customers, planning, communications, and training, we are committed to addressing the needs and expectations of our users.


We are happy to hear more from you in this domain to learn and continue to improve, and we intend to publish additional blogs for key topics called out in this overview blog. Please share feedback and upvote your performance requests in the Teams Feedback portal.

Updated Jun 02, 2022
Version 2.0
  • Paul Cook
    Brass Contributor

    Having baselines for 'Work From Home' environments, where building cooling systems aren't available, may be of benefit to your testing scenarios.

  • Petri-X
    Bronze Contributor

    Mark_Longton 

    It is great to hear such stories; piece by piece you could improve Teams. Have you thought about putting these on the roadmap? Currently, my Teams eats 1.1 GB of RAM when it does nothing, just sitting in the background. So should I expect that I do not have the improvements applied to my version 1.4.00.35564, or are these still coming? Obviously the numbers change when I activate it, and e.g. CPU load gets surprisingly high when I am just typing a message to my colleague (not sent). When running Teams on a VDI platform, all improvements matter.

     

    The reason to ask for making these more visible (roadmap) is that, as many of us on these forums are saying, it is super hard for us to set any expectations for Teams quality when we do not know when changes have been applied to us. Changes like these should be easy to announce: "In version 1.4.xxxxx memory optimization has been applied."

     

    But one thing you really should focus on is "dirty data in cache". It is very hard to understand why you are not able to detect when Teams' cache (or part of it) has become corrupted/unavailable. Just too often the solution is "clear the cache"; no matter what kind of trouble you have with Teams, that is the solution.

     

    As a solution, different companies have built their own scripts for users to launch to clear the cache, and others have asked to get a [Clear Cache] button in Teams. I still believe (and hope) that you could build a Cache Health Check into Teams, which knows when the cache can be wiped and fresh data needs to be downloaded. Isn't that the correct way to fix it? Don't propose the feedback site, I have only two votes for that 😄

     

  • Paul Cook, thank you for the feedback. Yes, hybrid workplaces and working from home are very top of mind. This has increased the demand for us to support a wider variety of network conditions and, in some cases, higher packet loss. We can simulate some of this in our labs, and we generally look at the 95th percentile for perf metrics to factor in network impact and people who may be on lower-end devices.

     

    Petri-X, thank you for the suggestions. We do intend to add more transparency about performance improvements, and this is really the first blog to start that dialog. Memory, especially for low-end devices (4 GB memory), is an area we would like to improve, and the cache is a part of this analysis. Signing out and signing back in will clear the cache today, but I understand that is not obvious to everyone and there is an opportunity to be more elegant. Your comments remind me of the earlier days of Windows when we all would run the defrag process to organize storage and then over time Windows handled this for us in a much more graceful way. I would very much like to have Teams self-heal if the cache ever becomes corrupted or bloated.

  • Petri-X
    Bronze Contributor

    Hi Mark_Longton 

    "Your comments remind me of the earlier days of Windows when we all would run the defrag process to organize storage and then over time Windows handled this for us in a much more graceful way. "

     

    That was fun! 😂😂

    Indeed! That was an excellent example of how I see this. If I listen carefully, it sounds a bit like there could be some hope for this 🙂

     

    It is also quite interesting to hear that signing out should be enough to clear the Teams cache. That might have helped in some cases, but there have still been requirements to clear the cache manually. Here is one example of how to do it: Clearing the Cache for Microsoft Teams, which I really would like to avoid. But that starts with "Quit Teams" and not "sign out", so can signing out really do the same?

     

    Anyway, a self-healing Teams would be the best option.