Using machine learning to improve the Windows 10 update experience
Published Sep 26 2019 01:30 PM
Microsoft

This post was co-authored by @archana_ramesh (Senior Data Scientist, Microsoft Cloud and AI) and @michael_stephenson (Partner Data Scientist, Microsoft Cloud and AI).

Regular Microsoft updates to your Windows 10 PC help ensure that it’s kept secure from possible threats and empowered with the latest features for peak performance and productivity. Because of the wonderful diversity of hardware, devices and applications available to Windows customers, each PC’s update experience may be slightly different. To ensure that all PCs have a seamless update experience—regardless of their differences—we use a combination of testing, close partner engagement, feedback, diagnostic data, and real-life insights to manage quality.

To help with the complexity of the aspects we need to evaluate, we are increasing our investments in machine learning (ML) technologies. Machine learning helps us detect potential issues more quickly and helps us decide the best time to update each PC once a new version of Windows is available.

In this blog, we’ll cover the technical details of how machine learning is used in the rollout of Windows 10 feature updates.

Evolving the use of ML to update Windows 10 PCs

The rollout of any feature update is a gradual process. We start with PCs predicted to have a great update experience while safeguarding those PCs with known issues and, ultimately, expand to all eligible PCs as issues are resolved.

Windows 10, version 1803 (the April 2018 Update) was the first release in which we used ML on a broad scale. We started with six core areas of PC health (e.g. overall PC reliability) to determine whether the feature update process went smoothly. With Windows 10, version 1903 (the May 2019 Update), our third iteration of using ML in a feature update rollout, we can now evaluate 35 areas of PC health. The process will continue to evolve with additional health measures to improve your update experience.

Throughout our ML journey, we consistently see that PCs nominated for updates via ML have a significantly better update experience. For example, as shown in the chart below, PCs chosen via ML have fewer than half the number of system-initiated uninstalls, half the number of kernel mode crashes, and five times fewer post-update driver issues.

01_comparison.png
Figure 1. A comparison of system-initiated uninstalls, post-update kernel mode crashes, and post-update driver issues for the baseline and the ML model.

Building an ML model to support Windows 10 updates

Since building an ML model to effectively support the rollout of Windows 10 updates is a complex process, we are sharing more detail on the data science behind the model, including what makes this problem different from other ML problems. In addition, we are outlining how the ML model produces a probability of having a seamless update experience, how we identify possible safeguards, and how we determine if the model has learned enough to be put in action.

Operating system updates are unique

From an ML standpoint, operating system (OS) updates are unique for several reasons:

  • Inference relies on OS diagnostic data from millions of PCs collected in order to diagnose issues.
  • Features are high-dimensional and sparse (for example, there are millions of possible hardware drivers, each of which is installed on a small number of PCs).
  • Labels are not automatically available; instead, they need to be constructed by combining other diagnostic signals.
  • The class label distribution is intentionally skewed to have more examples of seamless updates and few issues.
  • The entire setup operates in a constantly changing and dynamic environment.

Given this complexity, we need a model that is dynamically trained on the most recent set of PCs that have been updated and a model that is capable of differentiating between PCs having a good update experience and those having a poor one.

Designing a dynamic ML solution

The graphic below shows the overall architecture of how ML is used to nominate PCs.

02_architecture.png
Figure 2. Machine learning architecture used for the Windows 10 intelligent rollout process

Every release starts with offering the Windows 10 update to early adopters (such as Windows Insiders and those actively seeking out the update). Once the initial set of PCs has been offered the update, we monitor their update experience via diagnostic data (e.g. kernel mode crashes, system-initiated uninstalls, abnormal shutdowns, and driver issues).

Machine learning provides two key capabilities here:

  • It identifies potential issues that result in safeguard holds to protect PCs that have yet to be updated so that those issues can be promptly investigated and fixed by Windows developers.
  • It predicts and nominates PCs that will have a seamless update experience and should, thus, be offered the update.

As this entire process repeats daily, the model constantly learns from the most recent set of updated PCs. Over time, as issues are fixed, PCs previously predicted to have a poor update experience will now be predicted to have a better one, leading to them being offered the update.

Nominating the "best" PCs

We build a classification-based ML solution for each update. The training data consists of the latest set of PCs on the newest Windows 10 feature update, the PC configurations at the time of the update (i.e. hardware characteristics, drivers, apps, etc.), and a binary label constructed from a set of core diagnostic signals (e.g. whether a PC had a system-initiated uninstall, or how reliable the PC was after the update).

03_example.png
Figure 3. An example of the diagnostic data used to train the ML model used for intelligent rollout
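
To make the label construction concrete, here is a minimal Python sketch of how several post-update diagnostic signals might be folded into one binary label. The field names and thresholds are hypothetical and chosen only for illustration; the actual signals and cut-offs used in production are not described in this post.

```python
from dataclasses import dataclass


@dataclass
class PostUpdateSignals:
    """Hypothetical per-PC diagnostic signals observed after an update."""
    system_initiated_uninstall: bool
    kernel_mode_crashes: int
    abnormal_shutdowns: int
    driver_issues: int


def seamless_update_label(signals: PostUpdateSignals,
                          max_crashes: int = 0,
                          max_shutdowns: int = 1) -> int:
    """Return 1 for a seamless update and 0 otherwise.

    The thresholds are illustrative assumptions, not production values.
    """
    if signals.system_initiated_uninstall:
        return 0
    if signals.kernel_mode_crashes > max_crashes:
        return 0
    if signals.abnormal_shutdowns > max_shutdowns:
        return 0
    if signals.driver_issues > 0:
        return 0
    return 1


good_pc = PostUpdateSignals(False, 0, 1, 0)
rolled_back_pc = PostUpdateSignals(True, 0, 0, 0)
print(seamless_update_label(good_pc), seamless_update_label(rolled_back_pc))
```

Because any single adverse signal turns the label to 0, a rule like this keeps the positive class reserved for PCs with no observed issues.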

We use Microsoft Azure Databricks to build the ML model:

  • We begin with a data preparation layer focused on ensuring that the model is only trained with high quality data. Data cleansing and monitoring (facilitated by Microsoft Power BI) help with this, allowing us to track anomalies, such as missing data, and then build in resiliency for cases where data is missing or unintentionally biased.
  • We rely on multiple encoding strategies (such as one-hot encoding and target encoding) to allow the model to learn about high-dimensional features, such as drivers.
  • For modeling, we rely on an ensemble approach, a combination of multiple classifiers, each predicting a separate diagnostic signal of post-update experience (for example, the likelihood of experiencing a kernel mode crash). This allows better understanding of why a PC is likely to have a good (or poor) update experience.

    04_bestPC.png
    Figure 4. Process for identifying the “best” PCs – preparing the data, creating individual models, and combining them into one score.

  • The classification algorithms used in this space tend to be logistic regression models or boosted trees (as those have been shown to deliver better, more consistent performance in this space).
  • The models are also tuned to the best parameters, covering both generic parameters (such as the regression penalization factor) and a dozen or so custom parameters (for example, the optimum training-validation-test-ensemble split). We use typical classification assessment metrics (e.g. the area under the receiver operating characteristic, or ROC, curve) to understand the accuracy of each model and to pick the best set of parameters. For each release, we thoroughly test hundreds of parameter combinations and modeling strategies to pick the best performing model.
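
As an illustration of the target encoding mentioned in the list above, the following Python sketch replaces a high-cardinality category (a driver identifier, say) with a smoothed estimate of the label mean for that category. The function names and the `smoothing` parameter are assumptions made for this example; the post does not describe the exact encoding used in production.

```python
from collections import defaultdict


def fit_target_encoding(categories, labels, smoothing=10.0):
    """Learn a smoothed mean label per category.

    Rare categories are pulled toward the global mean, which matters when
    each driver appears on only a small number of PCs.
    """
    global_mean = sum(labels) / len(labels)
    counts = defaultdict(int)
    sums = defaultdict(float)
    for category, label in zip(categories, labels):
        counts[category] += 1
        sums[category] += label
    encoding = {
        c: (sums[c] + smoothing * global_mean) / (counts[c] + smoothing)
        for c in counts
    }
    return encoding, global_mean


def apply_target_encoding(categories, encoding, default):
    """Encode categories, falling back to `default` for unseen values."""
    return [encoding.get(c, default) for c in categories]


drivers = ["drv_a", "drv_a", "drv_b"]   # hypothetical driver identifiers
seamless = [1, 1, 0]                    # 1 = seamless update
encoding, prior = fit_target_encoding(drivers, seamless, smoothing=1.0)
print(apply_target_encoding(["drv_a", "drv_b", "drv_new"], encoding, prior))
```

In practice the encoding should be fitted on data that excludes the rows being encoded (for example, out-of-fold), so that the label does not leak into the feature.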

These elements all come together as follows: If your PC is eligible to receive an update, we will apply the best-known ML model to your configuration to assess how likely your PC is to have a good update experience and which compatibility issues we need to fix in order to ensure your update experience will be great.
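
One way to picture the final scoring step is sketched below in Python. It assumes that each per-signal classifier outputs the probability of one kind of issue and that those issues are independent, so the probability of a seamless update is the product of the complements. Both the independence assumption and the `threshold` value are illustrative only; the post does not state how the individual models are actually combined.

```python
def seamless_score(issue_probabilities):
    """Combine per-signal issue probabilities into one seamless-update score.

    Assumes independent issues: P(seamless) = product of (1 - p_issue).
    """
    score = 1.0
    for signal, p in issue_probabilities.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability for {signal!r} out of range: {p}")
        score *= 1.0 - p
    return score


def nominate(pcs, threshold=0.9):
    """Return the ids of PCs whose combined score meets the threshold."""
    return [pc_id for pc_id, probs in pcs.items()
            if seamless_score(probs) >= threshold]


# Hypothetical model outputs for two PCs.
pcs = {
    "pc-1": {"kernel_crash": 0.01, "uninstall": 0.02, "driver_issue": 0.01},
    "pc-2": {"kernel_crash": 0.30, "uninstall": 0.10, "driver_issue": 0.05},
}
print(nominate(pcs))
```

Keeping the per-signal probabilities alongside the combined score preserves the explanation of why a PC was or was not nominated.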

Identifying safeguard holds

A key element of the ML-driven rollout process is the capability to identify compatibility issues early, enabling us to establish safeguard holds to protect specific PCs from receiving a given update. Historically, compatibility issues were detected via laborious lab tests, feedback, support calls, and other channels. While these channels are still used, applying ML to the diagnostic data from the PCs in our broad ecosystem enables us to identify the patterns (in hardware characteristics, drivers, applications, etc.) that are most correlated with any update-related disruption.

To achieve this, we use anomaly detection, which identifies when a feature or pattern (two or more features) results in a higher failure rate than we see for the entire population. Because the detection is implemented on Microsoft Azure Databricks, we can rapidly scale to millions of PCs and establish safeguard holds to prevent PCs from being disrupted by potential update-related issues.

05_chart.png
Figure 5. Chart showing a feature or pattern that is failing at 82% against a baseline failure rate of about 3%. This identifies where a safeguard hold is needed to prevent other PCs from experiencing similar issues.
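
The idea of comparing a feature's failure rate with the population baseline can be sketched in a few lines of Python. This example counts failures for every single feature and every pair of features, then flags those whose failure rate sits well above the baseline using a simple z-score for a proportion. The z-score test, the `min_support` and `z_threshold` values, and the feature names are assumptions for illustration; the production anomaly detection method is not specified in this post.

```python
from itertools import combinations
from math import sqrt


def find_anomalous_patterns(pcs, min_support=30, z_threshold=3.0,
                            max_pattern_size=2):
    """Flag features or feature pairs failing far above the baseline.

    `pcs` is a list of (features, failed) tuples, where `features` is a set
    of strings and `failed` is a bool.
    """
    baseline = sum(1 for _, failed in pcs if failed) / len(pcs)
    stats = {}  # pattern -> [pc_count, failure_count]
    for features, failed in pcs:
        ordered = sorted(features)
        for size in range(1, max_pattern_size + 1):
            for pattern in combinations(ordered, size):
                entry = stats.setdefault(pattern, [0, 0])
                entry[0] += 1
                entry[1] += int(failed)

    flagged = []
    for pattern, (count, failures) in stats.items():
        if count < min_support:
            continue  # too few PCs to trust the estimate
        stderr = sqrt(baseline * (1.0 - baseline) / count)
        if stderr == 0.0:
            continue
        rate = failures / count
        z = (rate - baseline) / stderr
        if z >= z_threshold:
            flagged.append((pattern, rate, z))
    flagged.sort(key=lambda item: item[2], reverse=True)
    return baseline, flagged


# Synthetic data: 50 PCs with a problematic driver, 950 without.
pcs = ([({"drv_x", "app_a"}, i < 41) for i in range(50)]
       + [({"drv_y", "app_a"}, i < 9) for i in range(950)])
baseline, flagged = find_anomalous_patterns(pcs)
for pattern, rate, z in flagged:
    print(pattern, f"failure rate {rate:.0%} vs baseline {baseline:.0%}")
```

Enumerating all pairs grows quickly with the number of features per PC, so a real system at this scale would need to prune candidate patterns; this sketch ignores that concern.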

You can find a list and details on the latest known issues and safeguards by visiting the Windows release health dashboard.

Monitoring saturation of feature diversity

With the diversity of the Windows ecosystem and an ML model that is refreshed every day, it's important to determine when the ML model is ready to be broadly applied. In other words, we can only use ML to determine when to offer the update to your PC if we have seen enough similar hardware configurations that have successfully updated.

Typical ML scenarios use a learning curve to determine when models are adequately trained; however, due to the unique diversity of the Windows ecosystem, we use a concept called saturation, which looks at how many of the diverse hardware components, drivers, applications, etc. have been seen so far from updated PCs. Saturation helps us understand the extent to which the feature update rollout has penetrated the hardware and component ecosystem and, thus, is representative of the population of PCs to be updated.

06_monitoring-saturation.png
Figure 6. Monitoring the saturation of the Windows feature update space. Continuous dynamic training of the ML model typically starts once saturation exceeds 60%, indicating that the training data is representative of the diversity of the Windows ecosystem.
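
A simple way to think about saturation is as a coverage ratio, sketched here in Python: the share of distinct components (hardware, drivers, applications) in the population still to be updated that have already been observed on updated PCs. This definition and the component names are assumptions for illustration; the post does not give the exact formula. The 60% cut-off is taken from the Figure 6 caption.

```python
def saturation(updated_pcs, eligible_pcs):
    """Fraction of the eligible population's components already seen.

    Each argument is an iterable of per-PC component sets.
    """
    seen = set().union(*updated_pcs)
    population = set().union(*eligible_pcs)
    if not population:
        return 0.0
    return len(seen & population) / len(population)


# Hypothetical component identifiers.
updated = [{"cpu_a", "drv_1"}, {"cpu_a", "drv_2"}]
eligible = [{"cpu_a", "drv_1"}, {"cpu_b", "drv_2"}, {"cpu_c", "drv_3"}]

value = saturation(updated, eligible)
ready_for_training = value > 0.60  # threshold from the Figure 6 caption
print(f"saturation = {value:.0%}, ready = {ready_for_training}")
```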

Measuring the impact of ML on the Windows 10 update process

Keeping Windows 10 PC users safe and current has been an exciting journey, not just in terms of building out the technologies to support a large-scale ML deployment, but also in terms of determining how to measure impact in this context. We measure impact via a few different progress indicators:

  • ML model performance. The earliest indicator of impact is the quantifiable performance of the machine learning model, such as the area under the ROC curve and how well the algorithm calibrates predictions to reflect the actual probability of success.
  • Feature update safeguard holds identified via ML. The second indicator of impact is the speed at which we are able to identify and put in place safeguard holds to protect PCs. This directly relates to the PCs that are protected from any adverse update experiences.
  • Monitoring health signals post-update. The final indicator of impact confirms that the ML algorithm is indeed achieving what it is trained to do. We have consistently observed that the PCs chosen by the ML algorithm fare 5-40% better than others. (See Figure 1 for examples.)
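
The two model performance measures named in the first indicator above, the area under the ROC curve and calibration, can be computed from scratch as in this Python sketch. It is intended only to show what the metrics measure, under the assumption that scores are probabilities in the range 0 to 1; a production pipeline would normally rely on a library implementation.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a random positive example is scored
    higher than a random negative one (ties count as half).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def calibration_table(labels, scores, bins=5):
    """Per-bin (mean predicted probability, observed rate, count).

    A well-calibrated model has the first two values close in every bin.
    """
    buckets = [[] for _ in range(bins)]
    for y, s in zip(labels, scores):
        index = min(int(s * bins), bins - 1)
        buckets[index].append((s, y))
    table = []
    for bucket in buckets:
        if bucket:
            mean_predicted = sum(s for s, _ in bucket) / len(bucket)
            observed = sum(y for _, y in bucket) / len(bucket)
            table.append((mean_predicted, observed, len(bucket)))
    return table


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))
print(calibration_table(labels, scores, bins=2))
```

The pairwise AUC is quadratic in the number of examples, which is fine for illustration but not for millions of PCs.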

Summary

While we are excited by the promise of machine learning, there is still much work to be done to ensure that ML is comprehensive, more automated, and agile enough to catch issues in a few seconds rather than hours. In upcoming feature updates, we will continue to evolve ML and share more details on the progress we make.
