Has the number of bugs in Windows updates increased in the past couple of years? If so, what is the reason for the increase? Former Microsoft Senior SDET (Software Development Engineer in Test) Jerry Berg set out to answer that question.
Berg worked at Microsoft for 15 years, and one of his roles was to design and develop tools and processes that automated testing of the Microsoft Windows operating system. He left the company after Windows 8.1 shipped to the public.
Microsoft has changed its testing processes significantly in the past couple of years. Berg describes how testing was done in the late 2014/early 2015 period and how Microsoft's testing processes have changed since then.
Back in 2014/2015, Microsoft employed an entire team dedicated to testing the operating system, builds, updates, drivers, and other code. The team consisted of multiple groups that ran tests and discussed bugs and issues in daily meetings. The team ran both manual and automated tests and, if the code passed, gave the okay to integrate it into Windows.
The teams ran automated tests on "real" hardware in a lab. The machines had different hardware components, e.g. processors, hard drives, video and sound cards, and other components, to cover a wide range of system configurations. This meant that bugs affecting only certain hardware components or configurations were detected in the process.
Microsoft laid off almost the entire Windows Test team as it shifted its focus from three different systems (Windows, Windows Mobile, and Xbox) to a single system. The company moved most of the testing to virtual machines, which meant that, for the most part, tests were no longer conducted on real and diverse hardware configurations.
Microsoft employees could self-host Windows, which meant that their own machines were also used for testing purposes. The main idea behind this was to get feedback from employees about issues they ran into during the workday. Berg notes that self-hosting is not as widely used anymore as it once was.
Apart from the automated test systems already in place, the main sources of testing data are Telemetry and Windows Insiders. Windows Insider builds are installed on millions of devices, and Microsoft collects Telemetry from all of these devices.
If something crashes, Microsoft gets information about it. One of the problems with relying on Telemetry, however, is that most bugs are not caught by it: if something does not work right without crashing, Microsoft may not be able to discern the relevant bits from Telemetry data. While users can in theory report issues, many don't, and at other times reports may get lost among all the other feedback Microsoft receives from Insiders. Additionally, when Insiders do report bugs, they often do not supply the necessary information, which poses huge problems for the engineers tasked with resolving these issues.
Back in 2014/2015, Microsoft's Testing team was tasked with analyzing bugs and issues and supplying engineers with the data they required to resolve them. Nowadays, engineers look at Telemetry to figure out how to fix these issues, and fixes are then pushed to customer devices running Insider Builds again to see whether the issue got fixed or whether the fix created new bugs.
One of the main reasons Microsoft stopped pushing out new feature updates to everyone at once was that issues not detected by this process could potentially affect a large number of customers.
To avoid disasters like the Windows 10 version 1809 launch, Microsoft introduced gradual rollouts, which keep feature updates from being delivered via Windows Update to the majority of machines in the early days of a release.
Microsoft replaced its in-house Testing team with Telemetry data gathered from the Insider Builds it pushes to consumer and business devices, and replaced many of the PCs it used for testing with virtual environments.
I hope Microsoft improves the Telemetry tools in Windows 10 so that they catch bugs better. That way, if a bug happens somewhere, the Telemetry and diagnostics tools in Windows 10 would catch it and report it back to Microsoft, even if the user doesn't report it.
I also hope they set up a small team that tests Windows 10 on real hardware instead of virtual machines, to prevent the driver issues people experience from time to time after a new Windows 10 build ships broadly to everyone.