Why is it important to understand blob-hunting?
1. Exfiltrating sensitive information from misconfigured resources is one of the top three threats to cloud storage services*, and threat actors continuously hunt storage objects because it's easy, cheap, and there is much to find. In some cases, they target your storage accounts specifically.
2. Most people think they don’t have misconfigured storage resources. Most people do. Misconfiguration by end-users is a common problem; if you are safe today, there might still be a mistake tomorrow.
3. There are quick and effective ways to harden your security posture and prevent these threats from happening.
* Cloud storage services such as Azure Blob Storage, Amazon S3, and GCP Cloud Storage
Threat actors use tools to exfiltrate sensitive information from exposed storage resources open to unauthenticated public access. This process is called blob-hunting, also known as container enumeration on leaky buckets. It is a common collection tactic: it is easy and cheap to carry out, requires no authentication, and there is no shortage of open-source tools that facilitate and automate the process.
Numerous data breaches across storage services in all cloud providers originated from mistakenly exposing data to public access due to configuration errors in access to the storage objects or mistakenly uploading sensitive content to an already publicly accessible storage container.
Some tools can help detect storage resources open to public access, but there are always human errors, and prevention alone is not enough.
- 2022 Data Breach Investigations Report, Verizon
This is where Microsoft Defender for Storage comes into play: it detects blob-hunting attempts and other malicious activities by monitoring unusual activities from unexpected sources. It alerts you in time with the relevant information to help you understand what happened, and it helps you harden your configurations to prevent future attacks.
This post covers the top blob-hunting questions and explains how Microsoft Defender for Storage helps detect and prevent this type of threat.
To better understand Azure Storage, how it’s built, and its access policies, you can go to the Background - Azure Storage accounts and access levels section at the bottom of this post.
Blob-hunting is the act of guessing the URL of containers or blobs open to unauthenticated public access with the intent of exposing their data. To successfully expose and exfiltrate data from a storage account, certain conditions must be met, and they are controlled by the owners of the storage accounts (or by users/applications with the appropriate permissions): the storage account must allow public access from the network, and the access level of the target container must permit unauthenticated access.
There are several ways to expose blobs, with different starting points:
If the threat actors have discovered the storage account, they can start brute-force guessing the container names. If containers are found and their access level is set to 'Container', the actors can enumerate (list) all blobs within them and exfiltrate that data.
Another interesting starting point is a known blob URL: if threat actors somehow found a blob URL, or blobs were exposed while brute-force guessing full URLs, they now know that the account and container names they guessed or found exist. From there, it's possible to discover and expose other containers and, specifically, other blobs within the discovered container.
While there are other attack vectors, these are the main ones. We will focus on the steps of exposing the storage account name, container name, and blob name.
Publicly accessible storage accounts have a public endpoint URL (more information in the Background section), which means it's possible to guess storage account names by performing DNS queries on the URL and examining the response:
https://<<storage-account-name>>.blob.core.windows.net
There are multiple ways to query the DNS, and the simplest ones are to use the nslookup command line in the CLI or the Resolve-DnsName cmdlet in PowerShell, for example:
Resolve-DNS PowerShell
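The same check can also be scripted. Here is a minimal Python sketch (standard library only; 'mediaprod1' is the storage account from the example used later in this post) that resolves the endpoint to test whether an account exists:

import socket

account = "mediaprod1"  # candidate storage account name
hostname = f"{account}.blob.core.windows.net"
try:
    # A successful lookup means a storage account with this name exists
    print(hostname, "->", socket.gethostbyname(hostname))
except socket.gaierror:
    # The name does not resolve: no such storage account
    print(hostname, "does not exist")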
Threat actors enumerate multiple accounts at a time by automating the search for storage accounts with scripts that use a combination of custom/generic wordlists, DNS queries, and search engine APIs to guess and find storage accounts.
The following is an example of enumeration using the dnscan Python script in combination with a custom wordlist:
Python wordlist-based DNS subdomain scanner
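As a rough illustration of how such scanners work (a simplified sketch, not dnscan itself; 'wordlist.txt' is a hypothetical file of candidate account names), the single lookup above becomes a loop over a wordlist:

import socket

# Hypothetical wordlist of candidate storage account names
with open("wordlist.txt") as f:
    candidates = [line.strip() for line in f if line.strip()]

found = []
for name in candidates:
    try:
        socket.gethostbyname(f"{name}.blob.core.windows.net")
        found.append(name)  # the name resolves, so the account exists
    except socket.gaierror:
        pass  # the name does not resolve; keep guessing
print("Discovered accounts:", found)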
It is also possible to find storage accounts using search engines and search services, for example with Google Dorking or Shodan. The following is a basic example of Google Dorking; by adding more filters, threat actors can pinpoint the search for sensitive information:
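One illustrative query of this kind (the quoted keyword is a hypothetical example) is:

site:blob.core.windows.net "confidential"

Appending file extensions, company names, or terms like "backup" narrows the results to potentially sensitive content.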
Once the storage account name is known, threat actors can start looking for containers open to public access. As with the storage account names, to map and expose containers, threat actors manually guess the container names or use wordlists of known names that usually imply containers that store sensitive data, such as: 'audit', 'dbbackup', 'vulnerability-assessment', etc. The following is a wordlist example of possible container names taken from one of the blob-hunting tools:
This is where the container access level comes into play: it determines whether a container can be listed, meaning that if someone discovers a container set to the 'Container' access level, they can list all the blobs within it.
Threat actors use GET requests against the Blob service's REST API to validate that containers exist. List Blobs and Get Container Properties are the most common operations used to check whether a container exists and is open to public access.
These two operation types are quite different: if the container access level allows it, List Blobs enumerates all the blobs within the container, while Get Container Properties returns only the properties of the container without listing the blobs within (a smaller signature).
Using the container URL, it is also possible to guess the names of containers with REST requests from the browser or other API platforms. For example, the following GET request verifies that the container exists and returns all the stored blobs. You can test it yourself:
https://mediaprod1.blob.core.windows.net/audio?restype=container&comp=list
Note: If the access level is set to 'Blob', the blobs within the container can be publicly accessible, but querying the container and performing operations on it (such as listing the blobs or getting the properties of the container) will return a 404 ContainerNotFound error code, which helps mask the blobs within the container.
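To illustrate, the following Python sketch (using the requests package; 'mediaprod1' is from the running example, and the candidate container names are hypothetical) validates containers with the List Blobs operation:

import requests

account = "mediaprod1"
candidates = ["audio", "backup", "dbbackup"]  # hypothetical container wordlist

for container in candidates:
    # List Blobs succeeds anonymously only if the container exists and its
    # access level is set to 'Container'
    url = f"https://{account}.blob.core.windows.net/{container}?restype=container&comp=list"
    resp = requests.get(url, timeout=10)
    if resp.status_code == 200:
        print(f"[+] {container}: listable, blobs returned as XML")
    else:
        # 404 (ContainerNotFound) covers missing containers as well as the
        # 'Blob' and 'Private' access levels, which mask the container
        print(f"[-] {container}: not found or not listable")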
Once threat actors discover the names of storage accounts and the containers within them, they can start trying to expose the data stored in the blobs (objects). They first try the List Blobs operation, which lists all the blobs within a container if the container's access level permits it (set to the 'Container' access level). If the container access level is set to 'Blob', listing the container will not work, leaving threat actors with the option of brute-force guessing the blob names (the account and container names are already known).
In cases where the blob containers are not discoverable, threat actors can try brute-force guessing the full URL, but it will be harder.
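Continuing the sketch above (the 'audio' container is from the running example; the blob wordlist is hypothetical), brute-force guessing blob names when listing is blocked looks like this:

import requests

account, container = "mediaprod1", "audio"
blob_guesses = ["backup.zip", "users.csv", "audit.log"]  # hypothetical wordlist

for blob in blob_guesses:
    url = f"https://{account}.blob.core.windows.net/{container}/{blob}"
    # A HEAD request (Get Blob Properties) checks existence without downloading
    # the data; it succeeds anonymously at the 'Blob' and 'Container' levels
    resp = requests.head(url, timeout=10)
    if resp.status_code == 200:
        print(f"[+] Exposed blob: {url}")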
Common blob-hunting attacks are automated using dedicated tools such as feroxbuster, MicroBurst, and Gobuster. These tools allow easy discovery of storage account names; if these attempts are successful, threat actors can move on to enumerating containers and exposing the blobs within them.
The following is a basic example of using MicroBurst to guess container names of a known account (from our example – ‘mediaprod1’) by using a generic wordlist, exposing blobs by listing the blobs of the exposed containers, and downloading an exposed blob:
Enumerating blob containers of exposed storage accounts with wordlists
Blob-hunting is easy to achieve, cheap in terms of resource usage, does not require authentication, and can originate from any local machine or VM. In some cases, the blob-hunting activities originate from cloud resources.
Blob-hunting can be an ad-hoc activity or a continuous effort to search the web for exposed cloud storage resources in services like Azure Blob Storage, Amazon S3, and GCP Cloud Storage. There are searchable websites with databases of exposed content, intended to let organizations check their infrastructure for breaches. Unfortunately, these sites also attract malicious actors who wish to take advantage of the data found there or from similar sources.
It is not uncommon to find that the source of blob-hunting activities is infected, remotely controlled bots that are part of botnets. It is also common for threat actors to mask their identity behind Tor exit nodes, which hides the source and makes it difficult to investigate the activity and connect it to other activities.
Microsoft Defender for Storage detects blob hunters trying to discover resources open to public access and attempting to expose blobs with sensitive data, so that you can block them and remediate your posture. The service does this by continuously analyzing the telemetry stream generated by Azure Storage services, without requiring diagnostic logs to be enabled, without accessing the data, and without impacting performance. When potentially malicious activities are detected, security alerts are generated. These alerts are displayed in Microsoft Defender for Cloud with details on the suspicious activity, the threat actor, the access method, the affected resources, the performed operation types, the MITRE ATT&CK tactic, potential causes, proper investigation steps, and instructions on how to remediate the threat and improve the security posture. These alerts can also be exported to any SIEM solution.
The following security alerts are a subset of the Microsoft Defender for Storage detection suite and can be triggered in different stages of the full blob-hunting attack path. These alerts inform you if malicious attempts to expose blobs were carried out, if someone accessed the containers, and if data was exfiltrated. They also provide a heads-up if containers with potentially sensitive information are misconfigured.
There are three flavors of scanning-related (blob-hunting) alerts. They usually indicate a collection attack, where the threat actor tries to list blobs by guessing container names in the hope of finding open storage containers with sensitive data in them.
Scanning alerts contain information on the scanning source, what was scanned successfully, and what failed attempts were made to scan private or non-existent containers. The alert also indicates if the scanning activity originated from a Tor exit node or if the IP address is suspicious because it is associated with other malicious activities (data enriched by Microsoft Threat Intelligence).
There are two flavors of the data exfiltration detection alert. In the scope of blob-hunting attacks, these alerts are triggered if unusual exfiltration activities occur after successful scanning attempts.
An additional alert is triggered on possible access level configuration errors, to help prevent public exposure of sensitive data.
By examining the storage account's data plane logs, you will notice that blob-hunting activities are characterized by repeated anonymous (unauthenticated) attempts to get information from storage resources by guessing URLs. Most of these attempts result in 404 error codes (resource not found), but they may also be successful, meaning that storage containers have been discovered and even possibly that blobs have been enumerated.
The following instructions are the general steps we recommend for investigating blob-hunting-related alerts. Having resource (diagnostic) logs enabled on the compromised storage account helps deepen the investigation process:
In most cases, you should not rule out familiar or private IP addresses too quickly; they may indicate compromised identities or a breached environment. In the blob-hunting scenario specifically, however, authentication is not required, so it is unlikely that the source originated from your environment. If the activity comes from an unknown source and is repeated, it might well be true positive blob-hunting activity.
Damage control – if you could not rule out malicious activity, assume the activity is malicious. The first step is damage control, and in case there was a data breach, quick mitigation steps:
Look at the “List of containers successfully scanned” field in the alert to understand which containers were successfully discovered.
Is there sensitive data inside the discovered containers?
Are there other publicly open containers within the same account that may contain sensitive information?
Check whether the container access level was changed from 'Private' to a public level and is misconfigured. You can also check whether you received a "Storage account with potentially sensitive data has been detected with a publicly exposed container" alert before this alert – this may indicate that content in the container is sensitive.
Investigate further (in case you don’t have diagnostic logs enabled)
If you don’t have diagnostic logs enabled on the resource, you can still detect anonymous requests from client applications using Azure Metrics Explorer. This helps you understand whether there were unauthenticated requests, how many, and when.
Using the filters, you can look for unauthenticated requests (the Authentication dimension), look for repeated failed attempts (the Response Type dimension), and filter by different operation types, so you can detect successful anonymous GetBlob operations after a series of failed unauthenticated requests. Note that the Metrics information does not include context on the source of the requests and does not let you filter at the container level.
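The same anonymous-transaction data can also be pulled programmatically. The following Python sketch assumes the azure-identity and azure-monitor-query packages; the subscription and resource group segments of the resource ID are placeholders:

from datetime import timedelta
from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricsQueryClient, MetricAggregationType

# Placeholder resource ID of the storage account to inspect
resource_id = ("/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
               "/providers/Microsoft.Storage/storageAccounts/mediaprod1")

client = MetricsQueryClient(DefaultAzureCredential())
result = client.query_resource(
    resource_id,
    metric_names=["Transactions"],
    timespan=timedelta(days=7),
    granularity=timedelta(hours=1),
    aggregations=[MetricAggregationType.TOTAL],
    filter="Authentication eq 'Anonymous'",  # unauthenticated requests only
)
# Print each hour in which anonymous transactions occurred
for metric in result.metrics:
    for series in metric.timeseries:
        for point in series.data:
            if point.total:
                print(point.timestamp, int(point.total))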
If any of these signals arise during the investigation process, a faster escalation is required to prevent a possible data breach:
It can take as little as an hour to immensely improve your posture with prevention steps that help protect your accounts against blob-hunting (a scripted sketch follows the list below):
Minimize the number of containers that allow public access.
Reduce the access level to 'Blob' from 'Container' wherever possible. It will make the process of hunting blobs and exposing them much more difficult.
Make sure no sensitive information is inside containers that allow public access.
Manage the remaining containers that allow public access by ensuring that applications or users cannot upload sensitive information and that users with write permissions know that the uploaded data will be publicly accessible.
Consider changing the names of the containers to unrecognizable names (you can use randomly generated names as well) or adding random prefixes/suffixes to the container names. Changing the names will limit the effectiveness of blob-hunting tools based on word lists.
If you do not wish to receive scanning alerts, you can apply suppression rules to dismiss them at your desired scope.
If the alerts are recurring on the same IP addresses, consider blocking them with the networking rules.
You can also configure Monitor alert rules that notify you when a certain number of anonymous requests are made against your storage account.
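As a complement to these steps (a sketch assuming the azure-identity and azure-mgmt-storage packages; the subscription and resource group names are placeholders), anonymous blob access can even be switched off for an entire account:

from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient
from azure.mgmt.storage.models import StorageAccountUpdateParameters

client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Disallow anonymous (public) access to every container in the account;
# the 'Blob' and 'Container' access levels stop being honored
client.storage_accounts.update(
    "<resource-group>",
    "mediaprod1",
    StorageAccountUpdateParameters(allow_blob_public_access=False),
)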
For more security best practices for Blob storage, visit the Security recommendations for Blob storage documentation.
When diagnostic settings are enabled, you can proactively hunt blob enumeration activity using Microsoft Sentinel. The following two queries can be executed within Microsoft Sentinel to detect suspicious enumeration activity.
The first query combines the IP address and User Agent to create a unique identifier. This identifier is then used to detect enumeration activity by aggregating activity based on the unique user identifier into sessions. By default, this hunting query will detect any single user who has enumerated at least 10 files and has a failure rate of over 50%. When calculating the sessions of activity using row_window_session(), the query will group any requests that occur within 30 seconds of each other and span a maximum time window of 12 hours. Each parameter can be modified at the top of the query depending on your hunting requirements.
let maxTimeBetweenRequests = 30s;
let maxWindowTime = 12h;
let timeRange = 30d;
let authTypes = dynamic(["Anonymous"]);
let minFilesAccessed = 10;
let minFailureRate = 0.5;
//
StorageBlobLogs
| where TimeGenerated > ago(timeRange)
// Collect anonymous requests to storage
| where AuthenticationType has_any(authTypes)
| where Uri !endswith "favicon.ico"
| where Category =~ "StorageRead"
// Process the filepath out of the request URI
| extend FilePath = array_slice(split(split(Uri, "?")[0], "/"), 3, -1)
| extend FullPath = strcat("/", strcat_array(FilePath, "/"))
// Extract the IP address, removing the port used
| extend CallerIpAddress = tostring(split(CallerIpAddress, ":")[0])
// Ignore private IP addresses
| where not(ipv4_is_private(CallerIpAddress))
// Combine the IP address and User Agent into a single user identifier
| extend UserIdentifier = hash_sha256(strcat(CallerIpAddress, UserAgentHeader))
| project
    TimeGenerated,
    AccountName,
    FullPath,
    CallerIpAddress,
    UserAgentHeader,
    StatusCode,
    UserIdentifier
| order by UserIdentifier asc, TimeGenerated asc
| serialize
// Generate sessions of access activity per user identifier, where each request is within maxTimeBetweenRequests of the previous one and a session doesn't last longer than maxWindowTime
| extend SessionStarted = row_window_session(TimeGenerated, maxWindowTime, maxTimeBetweenRequests, UserIdentifier != prev(UserIdentifier))
// Summarize the results by session start time and user identifier
| summarize Paths=make_list(FullPath), Statuses=make_set(StatusCode), DistinctPathCount=dcount(FullPath),
    AllRequestsCount=count(), FailedRequestsCount=countif(toint(StatusCode) >= 400),
    CallerIpAddress=take_any(CallerIpAddress), UserAgentHeader=take_any(UserAgentHeader), SessionEnded=max(TimeGenerated)
    by SessionStarted, UserIdentifier, AccountName
// Keep sessions where a single user enumerated at least minFilesAccessed files with a failure rate above minFailureRate
| where DistinctPathCount >= minFilesAccessed and (todouble(FailedRequestsCount) / AllRequestsCount) > minFailureRate
| extend ["Duration (Mins)"] = datetime_diff("minute", SessionEnded, SessionStarted)
| order by DistinctPathCount desc
| project-reorder
    SessionStarted,
    SessionEnded,
    ['Duration (Mins)'],
    AccountName,
    CallerIpAddress,
    UserAgentHeader,
    DistinctPathCount,
    AllRequestsCount
IP address and User Agent are the only user identifiers available when investigating anonymous access, and both can be manipulated by the attacker. The User Agent can be trivially changed when constructing the request; IP addresses, on the other hand, are very difficult to spoof. For this reason, threat actors have moved to residential proxy services, which allow them to use a different IP address with each request. Because these services are served mostly from residential IP addresses, they are difficult to identify as part of a VPN network.
The second query does not rely on grouping activity based on the user's IP or User Agent. Instead, this query produces sessions of candidate scanning activity using the row_window_session() function. These results alone are interesting, and in some instances, the time between access can be reduced to as short as 1 second to detect enumeration activity spanning multiple IP addresses.
After sessions have been identified, the query exploits another aspect of enumeration by checking that each request in the session made a request to a unique file name. By avoiding the use of IP address and User Agent, this query can identify candidate scanning activity originating from a threat actor using volatile IP addresses.
let maxTimeBetweenRequests = 30s;
let maxWindowTime = 12h;
let timeRange = 30d;
let authTypes = dynamic(["Anonymous"]);
//
StorageBlobLogs
| where TimeGenerated > ago(timeRange)
// Collect anonymous requests to storage
| where AuthenticationType has_any(authTypes)
| where Uri !endswith "favicon.ico"
| where Category =~ "StorageRead"
// Process the filepath out of the request URI
| extend FilePath = array_slice(split(split(Uri, "?")[0], "/"), 3, -1)
| extend FullPath = strcat("/", strcat_array(FilePath, "/"))
// Extract the IP address, removing the port used
| extend CallerIpAddress = tostring(split(CallerIpAddress, ":")[0])
// Ignore private IP addresses
| where not(ipv4_is_private(CallerIpAddress))
| project
TimeGenerated,
AccountName,
FullPath,
CallerIpAddress,
UserAgentHeader,
StatusCode
| order by TimeGenerated asc
| serialize
// Generate sessions of access activity, where each request is within maxTimeBetweenRequests of the previous one and a session doesn't last longer than maxWindowTime
| extend SessionStarted = row_window_session(TimeGenerated, maxWindowTime, maxTimeBetweenRequests, AccountName != prev(AccountName))
| order by TimeGenerated asc
// Summarize the results using the Session start time
| summarize Paths=make_list(FullPath), Statuses=make_set(StatusCode), CallerIPs=make_list(CallerIpAddress),
DistinctPathCount=dcount(FullPath), AllRequestsCount=count(), CallerIPCount=dcount(CallerIpAddress), CallerUACount=dcount(UserAgentHeader), SessionEnded=max(TimeGenerated)
by SessionStarted, AccountName
// Validate that each path visited is unique, scanners will generally try files once
| where DistinctPathCount > 1 and DistinctPathCount == AllRequestsCount
| order by DistinctPathCount
| extend ["Duration (Mins)"] = datetime_diff("minute", SessionEnded, SessionStarted)
| project-reorder
SessionStarted,
SessionEnded,
['Duration (Mins)'],
AccountName,
DistinctPathCount,
AllRequestsCount,
CallerIPCount,
CallerUACount
Microsoft Sentinel also makes it possible to identify storage accounts where public access is allowed. The following query can be used to identify containers with Public Access or Public Network Access enabled.
AzureActivity
| where TimeGenerated > ago(30d)
// Extract storage write events
| where OperationNameValue =~ "MICROSOFT.STORAGE/STORAGEACCOUNTS/WRITE"
| where ActivityStatusValue =~ "Start"
// Extract public access details from the properties
| extend RequestProperties = parse_json(tostring(Properties_d["requestbody"]))["properties"]
| extend PublicAccess = RequestProperties["allowBlobPublicAccess"]
| extend PublicNetworkAccess = RequestProperties["publicNetworkAccess"]
| extend ResourceId = iff(isnotempty(_ResourceId), _ResourceId, ResourceId)
| extend StorageAccount = split(ResourceId, "/")[-1]
| project
TimeGenerated,
Account=tostring(StorageAccount),
ResourceId,
OperationNameValue,
PublicAccess,
PublicNetworkAccess,
RequestProperties,
ActivityStatusValue
| where isnotempty(PublicAccess)
| summarize arg_max(TimeGenerated, PublicAccess, PublicNetworkAccess) by Account
| where tobool(PublicAccess) == true
| project LastStatus=TimeGenerated, Account, PublicAccess, PublicNetworkAccess
| order by LastStatus
Azure Storage accounts store data objects, including blobs, file shares, queues, tables, and disks. The storage account provides a unique namespace for the data to be accessible from anywhere globally. Data in the storage account is durable, highly available, secure, and massively scalable.
Azure Blob Storage is one of the most popular services used in storage accounts. It's Microsoft's object storage solution for the cloud. Blob storage is optimized for storing massive amounts of unstructured data which doesn't adhere to a particular data model or definition, such as text or binary data.
The cloud provider's APIs make it easy to retrieve data directly from the storage service, and threat actors leverage it to collect and exfiltrate sensitive information from open resources.
Blob storage offers three types of resources: the storage account, containers within the storage account, and blobs within a container.
Let's look at the example used throughout this post. We created a storage account named "mediaprod1" with three containers named "pics", "vids", and "audio", holding blobs that represent pictures, videos, and audio files. The following diagram shows the relationship between the resources:
This is how it looks in the Azure Portal:
Containers list
List of blobs within the ‘pics’ container:
Blobs list
The following is important for our topic because it is exactly what threat actors exploit. Every blob stored in the account has an address that combines the account name, the blob service name, the container name, and the blob name. This information forms the endpoint URL that allows access to the blob. The structure is as follows:
https://<<storage-account-name>>.blob.core.windows.net/<<container-name>>/<<blob-name>>
If we take our example, the URL to one of the blobs in the ‘mediaprod1’ account looks like this:
Blob URL breakdown
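For instance, a blob named 'song1.mp3' (a hypothetical file name) in the 'audio' container would be addressable at:

https://mediaprod1.blob.core.windows.net/audio/song1.mp3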
Data is stored in blobs, and access to that data is determined by the networking rules, storage account access configuration, and the access level to the container that stores the data.
Storage accounts are configured by default to allow public access from the Internet, but it is possible to block it. Containers can be set to one of three access levels: 'Private' (no anonymous access), 'Blob' (anonymous read access to blobs only), and 'Container' (anonymous read and list access to the container and its blobs). This lets the resource owners determine whether the data can be accessed unauthenticated (also known as anonymous access) or only with authentication, which requires the storage account key, a SAS token, or Azure AD to access container and blob information.
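To make this concrete, here is a short Python sketch (assuming the azure-identity and azure-storage-blob packages, with names from the running example) that inspects a container's access level and revokes anonymous access:

from azure.identity import DefaultAzureCredential
from azure.storage.blob import ContainerClient

container = ContainerClient(
    account_url="https://mediaprod1.blob.core.windows.net",
    container_name="audio",
    credential=DefaultAzureCredential(),
)

policy = container.get_container_access_policy()
print(policy["public_access"])  # 'container', 'blob', or None (private)

# Set the access level back to 'Private' (no anonymous access)
container.set_container_access_policy(signed_identifiers={}, public_access=None)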
MITRE ATT&CK® tactics and techniques covered in this post

Cloud Infrastructure Discovery (T1580)
An adversary may attempt to discover available infrastructure and resources within an infrastructure-as-a-service (IaaS) environment. This includes computing resources such as instances, virtual machines, and snapshots, as well as resources of other services, including storage and database services.

Cloud Storage Object Discovery (T1619)
Adversaries may enumerate objects in cloud storage infrastructure and use this information during automated discovery to shape follow-on behaviors, including requesting all or specific objects from cloud storage. After identifying available storage services, adversaries may access the contents/objects stored in cloud infrastructure. Cloud service providers offer APIs allowing users to enumerate objects stored within cloud storage. Examples include ListObjectsV2 in AWS and List Blobs in Azure.

Data from Cloud Storage Object (T1530)
Adversaries may access data objects from improperly secured cloud storage. These solutions differ from other storage solutions (such as SQL or Elasticsearch) because there is no overarching application. Data from these solutions can be retrieved directly using the cloud provider's APIs.