Microsoft collects, analyzes, and indexes internet data to help users detect and respond to threats, prioritize incidents, and proactively identify adversary infrastructure associated with actor groups targeting their organization. Module 2 covered how Defender TI provides raw and finished threat intelligence. This module dives into the raw intelligence that Defender TI includes, in the form of internet datasets.
Defender TI's internet data is categorized into two distinct groups: core and derived. Core datasets include Resolutions, Whois, SSL Certificates, Subdomains, DNS, Reverse DNS, and Services. Derived datasets include Trackers, Components, Host Pairs, and Cookies. Trackers, Components, Host Pairs, and Cookies datasets are collected by observing the Document Object Model (DOM) of web pages crawled. Components and Trackers are also observed from detection rules triggered based on the banner responses from port scans or SSL Certificate details.
Defender TI may include live, real-time observations and threat indicators, including malicious infrastructure and adversary-threat tooling. Any IP, domain, and host searches within our Defender TI platform are safe to search. Microsoft will share online resources (e.g., IP addresses, domain names) that should be considered real threats posing a clear and present danger. We ask that users use their best judgment and minimize unnecessary risk while interacting with malicious systems when performing exercises provided in this module. Please note that Microsoft has worked to reduce risk by defanging malicious IP addresses, hosts, and domains.
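The defanged notation used throughout this module (for example, `89.46.222[.]97`) has to be reversed before an indicator can be pasted into a search box. The helpers below are a minimal sketch of that conversion; the function names are illustrative and not part of Defender TI, and the sketch assumes only the `[.]` style of defanging seen in this module.

```python
def refang(indicator: str) -> str:
    """Turn a defanged indicator back into its searchable form."""
    return indicator.replace("[.]", ".")


def defang(indicator: str) -> str:
    """Bracket the last dot so the indicator is no longer clickable.

    Refanging first keeps the function safe to call on values that are
    already defanged.
    """
    head, dot, tail = refang(indicator).rpartition(".")
    return f"{head}[.]{tail}" if dot else indicator


print(refang("89.46.222[.]97"))
print(defang("dancevida.com"))
```

Bracketing only the final dot mirrors how indicators are written in this module; other conventions (such as bracketing every dot) would need a different replacement rule.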
The datasets available to search against within Defender TI are typically bucketed into core or derived datasets. A derived dataset is aggregated after some processing happens against data from a source system of record in order to identify, extract, or detect an element of interest.
Figure 1 – Defender TI Home page
Figure 2 – Defender TI Summary view after searching an IP address, domain, or host artifact
Figure 3 – Defender TI Data view after pivoting on an artifact from a project, article, or summary view
Figure 4 – Defender TI Resolutions (PDNS) dataset
Figure 5 – Defender TI DNS & reverse DNS datasets
The Domain Name System (DNS) is often referred to as the phone book of the Internet and is the system by which human-friendly computer hostnames or domain names are translated into IP addresses. This system is maintained by a distributed database of name servers. Every domain name should have at least one authoritative DNS server (name server) that publishes information about that domain in the form of various record types. Below are the most common record types:
| Record Type | Description |
| --- | --- |
| A | Returns a 32-bit IPv4 address. Most commonly used to map hostnames to an IP address of the host. |
| CNAME | Alias of one name to another. |
| MX | Mail exchange record. Maps a domain name to a list of message transfer agents for that domain. |
| TXT | Text record. |
| NS | Name server record. Delegates a DNS zone to use the given authoritative name servers. |
| SOA | Start of authority record. Specifies authoritative information about a DNS zone. |
| AAAA | Returns a 128-bit IPv6 address. Most commonly used to map hostnames to an IP address of the host. |
| PTR | Pointer record providing the domain name associated with an IP address. |
When a user interacts with an application such as a web browser or email client, the application sends requests to the DNS resolver in the local operating system, which issues requests in the form of queries to configured DNS servers when necessary to retrieve DNS record information for a host or domain. As these queries traverse the Internet, sensors can passively record some content of the responses sent for these queries. This is referred to as Passive DNS or PDNS.
As described in Module 2, our worldwide sensor network ingests and stores hundreds of millions of unique records daily. This historical resolution dataset allows users to not only view which domains resolved or are currently resolving to an IP address (or the other way around) but also understand when the resolutions occurred, which is important for an analyst.
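To make the value of first-seen and last-seen timestamps concrete, here is a small sketch of how a passive DNS record and a time-bounded pivot might be modeled. The `Resolution` class, the `domains_for_ip` helper, and the sample records are all hypothetical; they are not the Defender TI schema or API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Resolution:
    """One passive DNS observation: a domain resolving to an IP over a period."""
    domain: str
    ip: str
    first_seen: date
    last_seen: date


def domains_for_ip(records, ip, start, end):
    """Return domains whose resolution to `ip` overlapped the window [start, end]."""
    return sorted({
        r.domain
        for r in records
        if r.ip == ip and r.first_seen <= end and r.last_seen >= start
    })


# Illustrative records using reserved documentation addresses and names.
records = [
    Resolution("login.example.test", "192.0.2.10", date(2018, 1, 5), date(2018, 2, 1)),
    Resolution("shop.example.test", "192.0.2.10", date(2019, 6, 1), date(2019, 6, 30)),
    Resolution("login.example.test", "198.51.100.7", date(2018, 3, 1), date(2018, 4, 1)),
]

hits = domains_for_ip(records, "192.0.2.10", date(2018, 1, 1), date(2018, 3, 31))
print(hits)
```

Filtering by the observation window is what keeps a pivot on a shared or reassigned IP address from pulling in domains that resolved there years before or after the activity of interest.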
The Resolutions tab within Defender TI is where users will find data associated with A record queries. For example, at the time of writing this module, if I run a search for fabrikam.com, the Resolutions tab lists 18 IP addresses that this domain has resolved to over the period we have been tracking this data (more than ten years). Similarly, running a search for 20.103.85.33 and viewing the Resolutions tab provides a list of almost 1,000 domains (at the time of this writing) that have been reported as resolving to that IP address.
Figure 6 – Resolutions for a Domain
Figure 7 – Resolutions for an IP address
PDNS can enable the identification of additional threat actor infrastructure once an initial domain or IP address of interest has been discovered. For example, in a 2018 Palo Alto Networks Unit 42 report, an investigation centers on the IP address 89.46.222[.]97. Viewing resolution data for the early 2018 timeframe, it is easy to identify multiple domain names attempting to impersonate popular services offered by Google and Facebook. These domains all represent threat hunting and block listing opportunities if you have determined that your organization should defend against the attacks identified in the report.
Figure 8 – Resolutions for IP address, 89.46.222[.]97
The DNS and Reverse DNS tabs also hold information related to the various records that may be retrieved for a given domain. More specifically, the DNS tab lists all record types other than the A records, which are under the Resolutions tab. Data under the Reverse DNS tab populates when the target of the search has been observed as a record value for another IP or domain. For example, suppose my original search is for ns1.cloudflare.net, which is a nameserver. In that case, the Reverse DNS tab will display a list of domains that have ns1.cloudflare.net listed as a nameserver (which will be a lot) in addition to any other records where ns1.cloudflare.net is included as the value. This data becomes especially interesting when a threat actor exclusively controls a specific nameserver and you would like to know which other domains may be connected to the actor because they use that particular nameserver. Other good hunting material may come from investigating nameservers used by hosting services that require only minimal information to purchase a domain or that accept cryptocurrency as payment for domains. These types of hosting services are naturally more attractive to malicious actors.
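The reverse DNS pivot described above amounts to indexing DNS records by their value rather than by their name. The following sketch shows the idea with a handful of made-up records; the tuple layout and the `build_reverse_index` helper are assumptions for illustration and do not reflect how Defender TI stores or exposes this data.

```python
from collections import defaultdict

# (record name, record type, record value); all values are illustrative.
dns_records = [
    ("alpha.example.test", "NS", "ns1.hosting.example"),
    ("bravo.example.test", "NS", "ns1.hosting.example."),
    ("alpha.example.test", "MX", "mail.alpha.example.test"),
    ("charlie.example.test", "CNAME", "alpha.example.test"),
]


def build_reverse_index(records):
    """Map each record value to the (name, type) pairs that point at it."""
    index = defaultdict(set)
    for name, rtype, value in records:
        # Normalize case and the optional trailing dot of fully qualified names.
        index[value.lower().rstrip(".")].add((name, rtype))
    return index


reverse_index = build_reverse_index(dns_records)

# Which domains delegate to the nameserver of interest?
sharing_ns = sorted(
    name for name, rtype in reverse_index["ns1.hosting.example"] if rtype == "NS"
)
print(sharing_ns)
```

The same index answers the "any other records" case: looking up `alpha.example.test` returns the CNAME that points at it.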
Take a look at this 2016 ThreatConnect report focusing on name servers. Run searches for some of the referenced indicators in Defender TI to better understand how to search and pivot across these datasets.
Figure 9 – WHOIS Current Results for WHOIS Nameserver search, c358ea2d.bitcoin-dns[.]hosting
What do you notice about the domain reputation data returned by drilling down on the results?
Figure 10 – iscs-net[.]org Reputation data
Where would you click to view nameserver data for this domain?
Figure 11 – vice-news[.]com WHOIS Nameservers results
WHOIS is a widely used protocol for searching public registrars and registries which store information related to the individuals, businesses, organizations, or governments who register Internet resources such as domain names and IP address blocks. This information may include names, addresses, email addresses, phone numbers, and administrative and technical contacts for the given resource. Data provided is supposed to be accurate – providing fake credentials violates the Internet Corporation for Assigned Names and Numbers (ICANN) terms of service, and registrars are obligated to validate specific data fields.
Figure 12 – Defender TI Whois dataset
The WHOIS tab in Defender TI breaks out the various fields found within a WHOIS record. These values can be used to search against to find registration records containing the same data. This may be valuable in the case of tracking a threat actor who registered multiple domains using the same information. With more unique data comes a higher level of confidence that can be applied when making a case where the records are related.
As of 2018, however, several domain name registrars offer privacy protection services, which replace a user's information in a WHOIS record with that of a forwarding service. In many cases, this has hampered cyber investigators' ability to perform these types of searches to find valuable data. Defender TI does, however, include the ability to view the history of WHOIS record information for a given resource, which might still be helpful for an investigation. In a June 2020 Lookout report, researchers were able to identify potential command and control domains used by a single actor through WHOIS record history links between the email addresses mars-soft@gmail[.]com, alimjan@gmx[.]com, and cojina@gmx[.]com. Additionally, the use of Xin Net Technology Corporation for registration of the domains (which is included as a value in the Registrar field of the WHOIS record) provided further support for linking the domains together.
Defender TI supports directly searching the following fields of WHOIS records: email, name, organization, address, city, state, postal code, country, phone, and nameserver. Try running some WHOIS email searches for any of the above email addresses and compare the data returned to the diagram on page 14 of the referenced report to better understand how various connections are made. Keep in mind that you will need to examine the history of WHOIS records for any domains that turn up.
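As a rough illustration of why more shared WHOIS values raise confidence that records are related, the sketch below counts matching values across the searchable fields listed above. The record dictionaries, field keys, and the `shared_fields` helper are hypothetical; a real investigation would also weigh how unique each value is and whether it came from current or historical records.

```python
SEARCHABLE_FIELDS = (
    "email", "name", "organization", "address", "city",
    "state", "postal_code", "country", "phone", "nameserver",
)


def shared_fields(a, b):
    """Return the searchable fields where both records carry the same non-empty value."""
    return sorted(f for f in SEARCHABLE_FIELDS if a.get(f) and a.get(f) == b.get(f))


# Illustrative records; none of these values come from real registrations.
seed = {
    "domain": "seed.example.test",
    "email": "registrant@example.test",
    "phone": "+1.5555550100",
    "country": "US",
}
candidates = [
    {"domain": "one.example.test", "email": "registrant@example.test",
     "phone": "+1.5555550100", "country": "US"},
    {"domain": "two.example.test", "email": "other@example.test", "country": "US"},
]

# Rank candidates by how many fields they share with the seed record.
ranked = sorted(
    ((len(shared_fields(seed, c)), c["domain"]) for c in candidates),
    reverse=True,
)
print(ranked)
```

A match on country alone says very little, while a match on email and phone together is far harder to explain by coincidence.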
Figure 13 – WHOIS search types
Digital certificates enable encrypted communications between clients and servers. Certificates are issued by Certificate Authorities (CAs) which are responsible for managing domain control verification and verifying that the public key attached to the certificate belongs to the user or organization that requested it. A treasure trove of certificate information is gathered via our scanning capabilities including both the field names and values in a given certificate as well as the infrastructure with which the certificate is associated. The calculated SHA-1 hash of the certificate is used to identify an individual certificate, and the following fields are extracted for display within Defender TI, most providing pivot points for investigative purposes: Serial Number, Issued Date, Expiration Date, Subject and Issuer Common Names, Alternative Names, Subject and Issuer Organization Names, SSL Version, Subject and Issuer Organization Unit, Street Address, Locality, State/Province, and Country.
Figure 14 – Defender TI Certificates dataset
Each time an SSL certificate is indexed to our database, the IP address associated with the certificate is also noted. Having this database of mappings enables users to identify infrastructure associations when a certificate hash is spotted across multiple IP addresses. This is a great place to look at for connections when Passive DNS or WHOIS data aren't turning up much. There are also times when the certificate field values, themselves, will contain nuggets of information valuable to an investigation.
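The certificate-to-IP mapping can be pictured as grouping observations by the certificate's SHA-1 hash. This sketch uses placeholder byte strings in place of real DER-encoded certificates, and the observation list is invented for illustration; only `hashlib.sha1` is a real library call here.

```python
import hashlib
from collections import defaultdict


def sha1_fingerprint(der_bytes: bytes) -> str:
    """Hex SHA-1 digest of a certificate's DER encoding."""
    return hashlib.sha1(der_bytes).hexdigest()


# (IP address, certificate bytes); placeholders stand in for real DER data.
observations = [
    ("192.0.2.10", b"placeholder-certificate-A"),
    ("198.51.100.7", b"placeholder-certificate-A"),
    ("203.0.113.5", b"placeholder-certificate-B"),
]

ips_by_cert = defaultdict(set)
for ip, der in observations:
    ips_by_cert[sha1_fingerprint(der)].add(ip)

# Certificates seen on more than one IP address are the pivot opportunities.
shared = {fp: sorted(ips) for fp, ips in ips_by_cert.items() if len(ips) > 1}
print(shared)
```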
In a June 2020 Proofpoint report analyzing 2019 threat activity from TA410, researchers were able to link infrastructure related to various FlowCloud malware campaigns in part by looking at the Alternative Name field in an SSL certificate. This field contained additional, previously unknown infrastructure (domain names) that followed the theme of energy certification and education, which the threat actor was employing within phishing campaigns targeting the U.S. utility sector.
Figure 15 – IP address with A record SSL subject alternative name leads
Take a look at how the certificate details are displayed in Defender TI:
Figure 16 – Certificate Subject Alternative Name search with associated Certificate Details
Experiment with running searches against various certificate fields via the Certificate search on the Defender TI Home Page:
Figure 17 – Certificate search types
Finally, note that this report was selected for inclusion as an intelligence article within Defender TI and is available for review. Notice that indicators from the report were extracted and included as a list under the Public Indicators tab. If additional indicators had been discovered by our own in-house research team, those would be included in a separate list under the Defender TI Indicators tab, which is available to Defender TI Premium licensed users.
Subdomains are child domains to the root or parent domain and are also referred to as "hosts". This dataset can provide some interesting material during an investigation, whether by revealing links to previously unidentified infrastructure via IP resolutions or by revealing threat actor attempts to impersonate infrastructure belonging to legitimate organizations.
Figure 18 – Defender TI Subdomains dataset
A 2020 Trend Micro report analyzing a campaign called DRBControl describes an example of the former situation, where malware had subdomains hardcoded in the samples. The domain names on their own did not resolve to an IP address; however, the various subdomains did resolve and led to different malware families.
Figure 19 – livechatinc[.]org Resolutions
Figure 19 – livechatinc[.]org Subdomains
Are there any subdomains with more recent A record resolutions compared to livechatinc[.]org?
Figure 20 – mail.livechatinc[.]org Resolutions
Service names and port numbers are used to distinguish between different services that run over transport protocols such as TCP, UDP, DCCP, and SCTP. Port numbers can suggest what type of application is running on a particular port. But applications or services can be changed to use a different port to obfuscate or hide the service or application on an IP address. Knowing the port and header/banner information can identify the true application/service and the combination of ports being used. Defender TI surfaces 14 days of history within the Services tab, displaying the last banner response associated with a port observed.
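As a simple illustration of identifying a service from its banner rather than its port, the sketch below checks a banner against two well-known protocol prefixes (`SSH-` for SSH identification strings and `HTTP/` for HTTP status lines). The lookup tables and function names are assumptions made for this example; real detection rules are far richer.

```python
# Default ports for the two services this sketch knows about.
EXPECTED_BY_PORT = {22: "ssh", 80: "http"}

# Protocol-defined banner prefixes and the service they indicate.
BANNER_PREFIXES = (("SSH-", "ssh"), ("HTTP/", "http"))


def service_from_banner(banner: str):
    """Return the service name implied by the banner, or None if unrecognized."""
    for prefix, name in BANNER_PREFIXES:
        if banner.startswith(prefix):
            return name
    return None


def is_on_nondefault_port(port: int, banner: str) -> bool:
    """True when a recognized service answers on a port other than its default."""
    observed = service_from_banner(banner)
    return observed is not None and EXPECTED_BY_PORT.get(port) != observed


print(is_on_nondefault_port(8080, "SSH-2.0-OpenSSH_8.9"))
print(is_on_nondefault_port(22, "SSH-2.0-OpenSSH_8.9"))
```

An SSH banner answering on port 8080 is not malicious by itself, but it is the kind of mismatch that a port number alone would hide.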
Figure 21 – Defender TI Services dataset
Figure 22 – Services associated with an IP address
Trackers are unique bits of data embedded within websites and typically used to gather data about an end user's interactions with the page. Types of trackers frequently observed within Defender TI data include social media identifiers and identifiers used by various analytics platforms. The trackers dataset is also where users will find things like SSH server public keys or the three different types of JARM hashes that can be calculated for TLS handshakes relevant to the entity a user searches.
Figure 23 – Defender TI Trackers dataset
Take a look at this article on how JARM hashes can figure into analysis. Here is an example screenshot of JARM search results for an IP address with observed Cobalt Strike servers in Defender TI:
Figure 24 – JARM hash trackers associated with an IP address
Take a look at this intel article for additional ideas on how the trackers dataset may be used in an investigation. This article describes a tool called HTTrack and describes how within Defender TI users can identify web pages that were copied from other pages through the use of this tool. This is a common tactic used by threat actors during phishing campaigns when they need victims to believe that they are browsing a trusted website, when in fact they are browsing a threat actor-controlled replica of the trusted site.
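To show what "unique bits of data embedded within websites" can look like in practice, the sketch below pulls two kinds of values out of page source: a Google Analytics style account ID and the source URL recorded in the HTML comment that HTTrack typically leaves in mirrored pages. The regular expressions, the tracker labels, and the sample markup are assumptions for illustration and are not Defender TI's detection logic.

```python
import re

# Classic Google Analytics account IDs look like UA-<account>-<property>.
GA_ID = re.compile(r"\bUA-\d{4,10}(?:-\d{1,4})?\b", re.IGNORECASE)

# HTTrack usually stamps copies with "<!-- Mirrored from <url> by HTTrack ...".
HTTRACK = re.compile(r"<!--\s*Mirrored from\s+(\S+)\s+by HTTrack", re.IGNORECASE)


def extract_trackers(html: str):
    """Return sorted (label, value) pairs found in the page source."""
    found = {("google_analytics_id", m.lower()) for m in GA_ID.findall(html)}
    found |= {("httrack_source", m) for m in HTTRACK.findall(html)}
    return sorted(found)


# Illustrative page source; the host and ID are made up.
sample_page = """
<!-- Mirrored from www.example.test/login by HTTrack Website Copier/3.x [XR&CO'2014], Mon, 01 Jan 2024 00:00:00 GMT -->
<html><head>
<script>ga('create', 'UA-12345678-1', 'auto');</script>
</head></html>
"""

trackers = extract_trackers(sample_page)
print(trackers)
```

A copied page often carries the original site's analytics ID along with the HTTrack comment, which is why both values make useful pivots.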
Figure 25 – HTTrack trackers associated with a domain
This is a legitimate PayPal host.
Figure 26 – www.paypalobjects.com WHOIS record results
Figure 27 – PayPal headquarters on Google Maps search
Figure 28 – MarkMonitor headquarters on ZoomInfo
Figure 29 – www.paypalobjects.com Trackers search results
Figure 30 – ua-53389718 Trackers search results
What differences do you see between the storeofbrands[.]pk domain and www.paypalobjects.com host?
Figure 31 – Screenshot of storeofbrands[.]pk in urlscan.io
Does anything in this image itself look suspicious?
The Components dataset identifies technologies in use on Internet infrastructure as detected by evaluating responses received from web crawling and port scanning. Components can give users insight into technology stacks in use by threat actors and assist with identifying additional infrastructure under the control of the same actor. Components can also give users investigative insight into how a threat actor was able to compromise a certain website. We find that many of our users also like to take a look at their own web pages in order to get an idea of what they might look like to an attacker. Use cases are plentiful!
Figure 32 – Defender TI Components dataset
Here is an example of what you might find with a search like 'fabrikam.com'.
Figure 33 – fabrikam.com Components
Examining components can even lead to the identification of tools commonly used by threat actors during the course of setting up a campaign. Here is an example of how we are able to identify the admin panel for a sophisticated phishing kit. A component search for '16Shop Admin Panel' will reveal a list of sites deployed with this kit. Further reading may be found here.
Figure 34 – 16Shop Admin Panel Component Search Results
This investigation highlights how you can investigate Cobalt Strike C2 servers using Defender TI. It walks you through all datasets that are helpful, but also demonstrates how the Components dataset can be helpful in quickly identifying Cobalt Strike servers that have been detected by Defender TI. As an analyst, it will be important to decipher which Cobalt Strike servers are used for pen testing and which ones are used maliciously.
Figure 35 – Cobalt Strike keyword search
What Articles appear?
Figure 36 – Cobalt Strike Components on Hosts
What hosts have observed Cobalt Strike recently?
Figure 37 – Cobalt Strike Components on IPs
What IPs have observed Cobalt Strike recently?
When was an open port last detected?
Are there any immediate typosquats associated with this IP address?
Are there any anomalies with how some of the certificates were set up?
What tracker types do you see associated with this IP address?
Are there any JARM hashes?
How many IPs have observed this JARM hash?
Do you see Cobalt Strike listed as a component?
What ports have been observed in the last 14 days?
What statuses are they in?
What are their protocol responses?
What do you see in the banner response?
The Host Pairs dataset describes the relationships identified from web crawls between web pages. These relationships are described in terms of parent vs. child, meaning that the parent domain led to the child domain perhaps through something simple like a top-level redirect (HTTP 302) or via something more complex like an iframe or script source reference.
The modern web is a complex graph of dependent requests made up of images, code libraries, page content and other references. As we learned in Module 2, Microsoft's crawling technology makes nearly 2 billion HTTP requests online and saves the contents of the session inside of a database. Using years of this data, engineers at Microsoft put together a Defender TI Premium dataset, Host pairs.
Simply put, host pairs are two hosts (a parent and a child) that shared or currently share a connection observed from a Microsoft crawl. When viewing these host pair relationships, users should read the data as a parent host leading to a child host, perhaps through something simple like a top-level redirect (HTTP 302) or via something more complex like an iframe or script source reference. In other instances, analysts can observe odd behaviors such as host pair relationships between IP addresses and domains/hosts that go against DNS standards.
What makes this new dataset powerful is the ability to understand relationships between hosts based on details from visiting the actual page. Unlike our other datasets, the host pairs dataset relies on knowing the website content, so it's likely to surface different values that other sources like passive DNS and SSL certificates could miss.
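The parent and child relationships described above can be approximated by parsing a crawled page and recording every external host it references, along with the tag and attribute that caused the reference. The sketch below uses Python's standard `html.parser`; the `HostPairParser` class, the `CAUSE_ATTRS` table, and the sample page are hypothetical and cover only a few reference types (redirects, for example, would need the HTTP response rather than the page body).

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Tags that reference other resources, and the attribute holding the URL.
CAUSE_ATTRS = {"script": "src", "iframe": "src", "img": "src", "link": "href"}


class HostPairParser(HTMLParser):
    """Collect (parent, child, cause) tuples from one page's markup."""

    def __init__(self, parent_host):
        super().__init__()
        self.parent_host = parent_host
        self.pairs = set()

    def handle_starttag(self, tag, attrs):
        attr_name = CAUSE_ATTRS.get(tag)
        if attr_name is None:
            return
        url = dict(attrs).get(attr_name)
        # Relative URLs have no hostname and stay on the parent, so skip them.
        child = urlparse(url).hostname if url else None
        if child and child != self.parent_host:
            self.pairs.add((self.parent_host, child, f"{tag}.{attr_name}"))


def host_pairs(parent_host, html):
    parser = HostPairParser(parent_host)
    parser.feed(html)
    return sorted(parser.pairs)


# Illustrative page source with made-up hosts.
page = """
<link rel="stylesheet" href="https://kit.example.test/style.css">
<img src="https://cdn.example.test/logo.png">
<script src="/local.js"></script>
"""

pairs = host_pairs("victim.example.test", page)
print(pairs)
```

Recording the cause (such as `link.href` or `img.src`) alongside the pair is what lets an analyst tell a page that merely borrows a logo from one that loads a stylesheet from a phish kit host.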
Figure 38 – Defender TI Host pairs dataset
To illustrate how an analyst could use this data, take the host, amaltaricommunityhomestay.org[.]np, as an example. The host was observed utilizing a phish kit hosted by dancevida[.]com.
What is the reputation for this host?
Figure 39 – amaltaricommunityhomestay.org.np Reputation
What are some interesting child hosts amaltaricommunityhomestay.org[.]np is referencing?
What are the causes of these relationships?
Figure 40 – amaltaricommunityhomestay.org.np Host Pairs
What is the reputation score for dancevida[.]com?
Figure 41 – dancevida[.]com Reputation
What are some initial indicators that help to infer that aadcdn.msauth[.]net is a legitimate Microsoft host?
Figure 42 – aadcdn.msauth.net Resolutions
Notice how many other hosts are also referencing images hosted by this legitimate Microsoft authentication host.
Figure 43 – aadcdn.msauth.net Host Pairs
Figure 44 – Screenshot of amaltaricommunityhomestay.org[.]np in urlscan.io
Figure 45 – Screenshot of aadcdn.msauth.net search in amaltaricommunityhomestay.org[.]np DOM in urlscan.io
Figure 46 – Screenshot of aadcdn.msauth.net search in amaltaricommunityhomestay.org[.]np DOM in urlscan.io
Figure 47 – Screenshot of dancevida[.]com search in amaltaricommunityhomestay.org[.]np DOM in urlscan.io
Notice how these host pair relationships that were observed in the DOM of this crawl were also indexed in the Host Pairs results screenshot displayed in Step 2 of this exercise.
What article is associated with that keyword?
Franken-phish: TodayZoo built from other phishing kits
What else can you find out about the DanceVida phish kit?
How many other hosts are referencing the DanceVida phish kit?
Figure 48 – dancevida[.]com Host Pairs
447
You've now learned how Host Pairs can be incredibly helpful for chaining together related indicators of compromise. With regard to the DanceVida investigation, the additional hosts referencing the dancevida[.]com link.href phish kit can be added as threat intelligence indicators in your SIEM and proactively blocked from your network.
In a May 2021 Dragos blog post[6], cybersecurity researchers describe how they found malicious code hosted on the website of a Florida water utility contractor. In the page's source code, you can see a script reference to the host bdatac.herokuapp[.]com.
Can you identify which website must have been compromised in order to refer to the malicious script?
Figure 49 – bdatac.herokuapp[.]com Host Pairs
Cookies are small pieces of text-based data that a server sends to your browser when you visit a website, in order to identify you to the site. Cookies maintain state information as you navigate different pages on a site or return to the site at a later time. Frequently, websites will also host third-party cookies on their sites, which enable companies to target you with ads and track your browsing behaviors. For more reading on cookies, check out this article in the Windows Resource Center.
As we crawl the web, cookies are one of the many data items that are collected and stored in order to enable investigations. Cookies can be especially helpful when attempting to identify websites that have been stood up by a threat actor in order to attempt to impersonate legitimate brands.
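One way to picture the cookie pivot is to index crawl observations by cookie name and cookie domain, then flag hosts that do not belong to that domain. The sketch below parses `Set-Cookie` style strings with the standard `http.cookies` module; the observation list, host names, and helper functions are invented for illustration and do not represent the Defender TI data model.

```python
from collections import defaultdict
from http.cookies import SimpleCookie

# (host where the cookie was observed, Set-Cookie header value); all made up.
crawl_observations = [
    ("www.brand.example", "brand_session=abc123; Domain=.brand.example; Path=/"),
    ("login.brand.example", "brand_session=def456; Domain=.brand.example; Path=/"),
    ("brand-login.lookalike.test", "brand_session=zzz999; Domain=.brand.example; Path=/"),
]


def index_cookie_names(observations):
    """Map (cookie name, cookie domain) to the set of hosts that served it."""
    hosts_by_cookie = defaultdict(set)
    for host, header in observations:
        jar = SimpleCookie()
        jar.load(header)
        for name, morsel in jar.items():
            hosts_by_cookie[(name, morsel["domain"])].add(host)
    return hosts_by_cookie


def outlier_hosts(hosts, cookie_domain):
    """Hosts that served the cookie but are not under the cookie's own domain."""
    suffix = cookie_domain.lstrip(".")
    return sorted(h for h in hosts if h != suffix and not h.endswith("." + suffix))


index = index_cookie_names(crawl_observations)
key = ("brand_session", ".brand.example")
suspects = outlier_hosts(index[key], key[1])
print(suspects)
```

Hosts that set a brand's cookie name and domain while sitting outside that brand's domain are the ones that tend to stick out in the resulting list.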
Figure 50 – Defender TI Cookies dataset
This is a legitimate host owned by the United States Internal Revenue Service.
Figure 51 – www.irs.gov Cookies
Figure 52 – www.irs.gov Cookie Domain results
Figure 53 – Screenshot of a www.irs.gov Cookie domain search result in Defender TI
Figure 54 – Screenshot of www.lakshmanahospital[.]com in urlscan.io
Does anything in this image itself look suspicious?
This is a legitimate host owned by JPMorgan Chase.
Pivoting off unique-looking cookie names is a great place to start investigating. The resulting list of hostnames where the cookie has been observed will likely have some hostnames stick out like sore thumbs for further investigation. Take a look and see if you find anything worth further investigation.
Figure 55 – Screenshot displaying cookie name observed by shrapnelneptunium[.]com that originated from static.chasecdn[.]com's domain server
Figure 56 – Screenshot of shrapnelneptunium[.]com in urlscan.io
Does anything in this image itself look suspicious?
Notice that this host was first and last seen within a single day, between 9/27/2022 and 9/28/2022.
Figure 57 – Cookie Host observing cookie value originating from www.chase.com
What reputation has urlscan.io given this host?
What reputation has Google Safe Browsing given this host?
What brand is listed in the 'Targeting these brands'?
Figure 58 – Cookie Host historical web crawl details from urlscan.io
Defender TI offers a variety of search and pivot capabilities. The Data tab supports sorting, filtering, and downloading of data, to streamline the investigation process, while tags enable analysts to provide or view more context when searching artifacts during an investigation. Projects enable analysts to collaborate with one another during investigations or share tactical intelligence with their peers.