Taking the Offensive against Malware

First published on CloudBlogs on Sep 08, 2014
I have led the anti-malware work here at Microsoft for a number of years, and this has been one of the most interesting responsibilities I have ever had. The best way to describe this work is hand-to-hand combat with the bad guys. The work never ends and it changes every few minutes. The malware we defend against today is very, very different than what it was in the past. I vividly remember when Slammer hit. As fate would have it, I was on Wall Street on that day visiting several large banks. Mid-way through a meeting, several people suddenly excused themselves and walked out. A couple hours later, I learned what was happening and how quickly it had spread through that bank. That early kind of malware was about individuals wanting to make a name for themselves or just trying to cause trouble. Today’s malware is a business – and, all too often, malware is custom built by organized crime or governments who are trying to steal money and secrets.
One of the things I find most rewarding about working on anti-malware is the opportunity I have to work with great minds across the tech industry. It’s no exaggeration to say that the anti-malware community is one of those places where it is, literally, the good guys vs. the bad guys. All of the good guys (the anti-malware vendors) are in constant communication with each other and we all share what we learn in real time. We all have the same common goal of protecting individuals and businesses – and we work very collaboratively to do exactly that.
If you’re interested in an inside look at how this joint effort works, there are a couple of books that give very good portrayals of this battle and how it operates. I recommend Worm by Mark Bowden, and any of the books by Mark Russinovich (one of my favorite people here at Microsoft!).
I am very proud of the work we have done over the past couple of years with Microsoft Security Essentials and Windows Defender, and I believe that we have the best anti-malware strategy and the right solution for our customers. Of course, the obvious response to a statement like that is, “Brad, what’s the definition of ‘best anti-malware strategy/solution’ that you’re using to benchmark that statement against?” To begin with, the philosophy guiding our anti-malware work has been based on three axes which we use to grade ourselves on the effectiveness of our work:
  1. Do we protect the device and user?
  2. How invasive are we on the device?
  3. Do we have any false positives?

Do We Protect the Device & User?

This is the measurement of how effective we are in blocking malware attacks, as well as a measurement of how quickly we are able to remove a threat when something does get through. Every week, we have an in-depth review of how many devices around the world are being actively protected, and we look at what percentage of them were secure and protected during the entire week. This is an area where we do very, very well.
Here are a couple figures that may be super surprising: Through our telemetry, we receive more than 1 million pieces of malware every single day. This telemetry comes in each day (it runs as an Azure service), and then through machine learning we are able to identify and categorize all of it within minutes. Most of the malware that comes in is an evolution of an existing family, and we can automatically recognize and update our signatures as necessary. A lot of this new malware is created by machines and these machines are constantly making small changes – thus, we’re always aiming at a moving target that we could never keep up with on our own. In other words, we fight machines with machines. Our machine learning is able to take the 1 million+ pieces of malware that come in every day, categorize and verify that we are already blocking them, and then, when something new is found, it is quickly flagged to humans who dig into the details. The system is really impressive, really fast, and very effective.
To ensure we stay on top of every piece of malware, we have teams of Microsoft engineers stationed around the globe, and, thanks to the intelligence gathered by our machine learning, we are constantly updating our services. For example, we deploy new signatures to our customers three times a day to protect their Windows devices and their users. One point of view that we have been communicating across the industry is about the importance of relevance in prioritizing our efforts.
By relevance I mean that there are certain families of malware that dominate the ongoing attacks and infections – and eradicating the most prevalent and destructive malware families should be where the anti-malware community focuses its collective efforts. I’ve been urging the organizations that test and rate the industry’s anti-malware solutions to do this, and I was excited recently when one of these testing organizations (AV-Comparatives, based out of Austria) published their latest report with “Relevancy” listed as one of the main criteria. You can read their report here.
Relevance is also critical because each month, when the security updates are released by Microsoft, we also update a tool called the Malicious Software Removal Tool (MSRT). One of the primary functions of MSRT is to remove the really ugly and tough malware, i.e. things like rootkits. MSRT is executed on nearly 1 billion PC’s around the world every month, and we get telemetry from it that helps us understand what malware is landing on PCs. This helps us prioritize our efforts to eradicate the most dangerous and problematic malware. There are over 7,500 malware families, and each month roughly 250 families make up 95% of all malware encounters. Roughly 500 families make up 99% of malware encounters. Like most things in life, a small, concentrated group of malware is responsible for the majority of the attacks. It is also interesting to note that a small set of PCs are constantly infected – the majority of which are running without any anti-malware at all.
Relevance is a matter of making sure you understand what malware is most common or prevalent and ensuring you are protected against that group. To improve the testing and protection of all the organizations we work with, we have offered all of the anti-malware testing organizations a view of the relevant malware so that they can build their tests in a way that reflects what is actually happening in the world.
In stark contrast to this, there have been cases where other testing organizations have published results that look pretty bad until, upon some simple investigation, we see that their reports flag malware that does not appear anywhere within the telemetry we get back from those 1 billion MSRT PC’s.
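To make the relevance idea concrete, here is a minimal Python sketch of the arithmetic behind a statement like "roughly 250 families make up 95% of encounters": rank families by encounter count and see how many you need before you cross a coverage threshold. The function name, the family names, and the counts are all made up for illustration; this is not how our telemetry pipeline is actually implemented.

```python
from typing import Dict


def families_for_coverage(encounters: Dict[str, int], target_percent: int) -> int:
    """Return how many of the most prevalent families are needed to account
    for at least `target_percent` percent of all encounters."""
    total = sum(encounters.values())
    if total == 0:
        return 0
    covered = 0
    # Walk the families from most to least prevalent, accumulating encounters.
    ranked = sorted(encounters.values(), reverse=True)
    for rank, count in enumerate(ranked, start=1):
        covered += count
        # Integer comparison avoids floating-point rounding at the threshold.
        if covered * 100 >= target_percent * total:
            return rank
    return len(ranked)


# Hypothetical counts for illustration only; these are not real telemetry.
sample = {
    "family_a": 700,
    "family_b": 200,
    "family_c": 60,
    "family_d": 30,
    "family_e": 10,
}
top_95 = families_for_coverage(sample, 95)
top_99 = families_for_coverage(sample, 99)
```

A test suite built around relevance would weight its samples toward the families at the top of that ranking rather than treating every family equally.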

How Invasive Are We on the Device?

I’m sure you’ve experienced some sluggish PC performance in the past – only to investigate the issue and find that it’s your anti-malware doing a scan or intercepting calls within the browser. The intention here is good (protecting the user and the device is job #1, after all), but it’s obviously coming at a significant price. I can’t tell you the number of times a family member, friend, or neighbor has asked for help with their PC or device (yes, my night job is 1-800-CALL-BRAD Tech Support :)), and, in so many of these cases, I have resolved the issue by “upgrading” them to Microsoft Security Essentials or Windows Defender. As we built our anti-malware here at Microsoft, we had, as a first-level requirement, that our solution be non-invasive. The user should never know we are there. As a team, and as an organization, we held ourselves accountable to this design requirement and have built what I believe is the least invasive protection solution that still delivers all the required protection (which, again, is job #1). To see the details of this work, I recommend checking out this site, which gives an overview of how we measure the most important scenarios in a customer's endpoint experience and ensure that we consistently deliver the best possible protection. To see a detailed overview of the work being done by the Microsoft Malware Protection Center, check out this in-depth whitepaper.

Do We Have Any False Positives?

A false positive in anti-malware is when we mistakenly identify a good piece of software as malware and then start removing it from PC’s around the world. Rather than lump this in with the previous section (not impacting the device/user) we expressly called it out as a first-class design principle. We have gone to great lengths to ensure we minimize false positives, and you would be amazed to see what our back-end services look like (built in Azure, of course) where we take every signature and run it through automation – including against millions of files that we know are “good” to ensure we minimize those false positives. False positives break customer trust/confidence and they cause significant expense and randomization for our enterprise customers. We believe that we have the lowest rate of false positives in the industry by a very large margin.
One quick story (trying to keep it real) before moving on: A couple years ago, I was driving into work on a Friday morning when I got a call from the leader of the anti-malware team. His first words were, “Are you sitting down?” That’s never a good way to start a conversation. That morning we had published a signature with a false positive – and that small error promptly began removing an old version of Chrome from PC’s all over the world. Because we are constantly receiving and analyzing so much high quality telemetry, we knew about the false positive within minutes and were able to stop/recall the signature immediately. When all was said and done, the signature had removed an old version of Chrome on about 50,000 PC’s – a very small number. That morning ended up being a big learning moment for our organization; it emphasized just how important it is to execute perfectly on every detail when 1,000,000,000 PC’s are counting on the software we create.
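The idea of running every signature against files we know are “good” before it ships can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: a signature is modeled as a plain function over file bytes, the corpus is a small in-memory dictionary, and every name here (`false_positive_hits`, `safe_to_publish`, the sample files and patterns) is hypothetical rather than part of any Microsoft service.

```python
from typing import Callable, Dict, List

# Assumption: a signature is a predicate that returns True when it flags a file.
Signature = Callable[[bytes], bool]


def false_positive_hits(signature: Signature, known_good: Dict[str, bytes]) -> List[str]:
    """Return the names of known-good files that the signature would flag."""
    return sorted(name for name, content in known_good.items() if signature(content))


def safe_to_publish(signature: Signature, known_good: Dict[str, bytes]) -> bool:
    """A signature is only safe to publish if it flags no known-good file."""
    return not false_positive_hits(signature, known_good)


# Hypothetical corpus of files known to be good (contents are placeholders).
known_good_corpus = {
    "browser.exe": b"MZ...browser code...",
    "editor.exe": b"MZ...editor code...",
}


def too_broad(data: bytes) -> bool:
    # Matches the "MZ" marker that starts Windows executables, good or bad.
    return data.startswith(b"MZ")


def narrow(data: bytes) -> bool:
    # Matches only a made-up byte pattern standing in for a malicious payload.
    return b"EVIL_PAYLOAD" in data


blocked = false_positive_hits(too_broad, known_good_corpus)
ok_to_ship = safe_to_publish(narrow, known_good_corpus)
```

In this sketch the overly broad signature is held back because it flags both known-good files, while the narrow one passes the check. A corpus can only catch false positives on software it contains, so its breadth matters a great deal.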

The Whole is More Than the Sum of the Parts

Every single day the anti-malware team makes decisions about how to constantly improve and strengthen our anti-malware service and the software we deliver. This is a constant and continual battle, and we put a premium on remaining agile and proactive. For me, the greatest validation that we are doing the right thing here is the number of family/friends/neighbors who tell me how much better their PC’s are running after I moved/upgraded them to MSE or Defender – something I’ve done on literally 100’s of PC’s over the years.

Working Together

The work that we have done to build System Center Endpoint Protection (SCEP) on top of System Center Configuration Manager (SCCM) is something I recommend learning more about. Every IT organization I know of has pressures to do more with less, and to do all of it in a shorter amount of time. One of the most effective ways to decrease costs and accelerate value is to minimize the number of infrastructures that you have to deploy, secure, manage, and update. If you are using SCCM and SCEP, there is only one infrastructure that you have to worry about. Because this is all running on a single infrastructure, the value you are able to extract is significantly easier to obtain, faster, and less expensive.
For example: You may want to generate a report that shows the devices that were infected over the course of the past month, as well as the user that was associated with that device, and then a view of the compliance of that device’s configuration relative to your organization’s corporate standards. Sounds like a big undertaking, right? If you are using SCCM and SCEP – all of this is in a single database on a single set of objects. On the other hand, if you are using SCCM and something other than SCEP for your anti-malware, you’re going to need to do a huge amount of work to pull this information together. The process will go like this: First, you’ll have to create a data warehouse (DW) and create the jobs to constantly sync the SCCM and other anti-malware data into the DW. Next, you’ll need to do the work to correlate devices, e.g. help the DW understand that device foo in SCCM is the same device bah in the anti-malware database. After you’re done with this, you can start creating some custom reports. Chances are you will have given up long before getting to this point. On the off chance you do get this set up, have fun holding it together.
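To give a feel for the correlation step described above, here is a minimal Python sketch of joining two separate data sources into the infection report. Everything in it is an assumption made for illustration: the record shapes, the field names, and especially the idea that both systems expose a shared `hardware_id` to join on (in practice, finding a reliable common key is often the hard part). It does not reflect the actual SCCM or SCEP schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ManagedDevice:
    """Hypothetical row from the management database."""
    name: str
    hardware_id: str
    user: str
    compliant: bool


@dataclass
class Detection:
    """Hypothetical row from a separate anti-malware database."""
    device_name: str
    hardware_id: str
    threat: str


@dataclass
class ReportRow:
    device: str
    user: str
    compliant: bool
    threats: List[str]


def infection_report(devices: List[ManagedDevice], detections: List[Detection]) -> List[ReportRow]:
    """Join detections to managed devices on the assumed shared hardware_id."""
    by_hardware_id = {d.hardware_id: d for d in devices}
    rows: Dict[str, ReportRow] = {}
    for det in detections:
        dev = by_hardware_id.get(det.hardware_id)
        if dev is None:
            continue  # detection on a device the management system does not know
        row = rows.setdefault(
            dev.hardware_id, ReportRow(dev.name, dev.user, dev.compliant, [])
        )
        row.threats.append(det.threat)
    return list(rows.values())


# The same machine is called "foo" in one system and "bah" in the other.
devices = [
    ManagedDevice("foo", "hw-1", "alice", False),
    ManagedDevice("baz", "hw-2", "bob", True),
]
detections = [
    Detection("bah", "hw-1", "ThreatX"),
    Detection("bah", "hw-1", "ThreatY"),
    Detection("unknown", "hw-9", "ThreatZ"),
]
report = infection_report(devices, detections)
```

Even this toy version has to decide what to do with detections that match no managed device, and a real data warehouse would also need sync jobs to keep both sides current.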
The good news is that we have already done all this work for you, and all of this is stored in a single database, and it all operates on a single set of objects. One last note: Many people reading this post purchased SCCM through an Enterprise Agreement with Microsoft – this means you also own SCEP (since it is part of the Enterprise Agreement) and don’t even have to purchase anything. What’s great is that SCEP/MSE/Defender all use the exact same agent on the protected device and the same cloud service. This is an example of how much architecture really matters when it comes to getting the best solution possible.