Azure Web Application Firewall- Bot Manager Scenarios
Published Jun 26 2023 08:09 AM


This article is part of our ongoing efforts to continually develop strategies against malicious bots.

The continuous use of bots to simulate human engagement, especially for unethical activities in web applications, leads to both security incidents and the diversion of engagement from web resources. The advent of new AI projects and LLMs (Large Language Models) has also opened more avenues for vulnerabilities, including prompt injection, data leakage, training data poisoning, and unauthorized code execution.

It is therefore essential to control which bots are granted access to your websites. The Microsoft Bot Manager Ruleset, coupled with the Web Application Firewall, is positioned to reduce the effect of illegitimate non-human access using methods such as verified labels, static analysis (rate limiting), and other behavioral analysis.

Some bots are good, such as search engine indexing bots, content fetchers, and automated response services; others are bad, such as spam bots, scrapers, and brute-force bots. There is also a grey area of bots that overwhelm your resources simply through the sheer volume of their requests.

Search engine optimization tools, for example, use bots to perform SEO (search engine optimization) analytics for websites. These bots typically comply with the instructions and policies set in the robots.txt file. Careful pruning of bot access includes using the robots.txt file to specify access, rate limiting, and bot challenges. Some bots are also used to collect prices for web apps that compare product prices. However, if you are focused on search engine optimization for your business and experience an excessive load from too many requests, you may struggle to slow down crawlers without blocking them. To resolve this, you can implement a combination of bot control detection and rate-based rules with a response status code.
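Compliance with robots.txt can be checked programmatically. The short sketch below uses Python's built-in urllib.robotparser against a hypothetical robots.txt (the rules shown are illustrative, not taken from any real site):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block /admin/, allow /public/ for all
# crawlers, and set a crawl delay for Bingbot.
robots_txt = """\
User-agent: *
Disallow: /admin/
Allow: /public/

User-agent: bingbot
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A generic crawler may fetch /public/ but not /admin/.
print(parser.can_fetch("somebot", "/public/index.html"))  # True
print(parser.can_fetch("somebot", "/admin/config"))       # False
print(parser.crawl_delay("bingbot"))                      # 10
```

A well-behaved crawler performs exactly this check before each request; a bad bot simply ignores it, which is why robots.txt alone is not a security control.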

When you see content from your website mirrored on another site, you have seen the work of bots that aggregate content for the bot owner’s web page to increase traffic and divert engagement to the owner’s page. The mirrored page might even appear above the original source in search results.


Bot Protection Rule Action

Rule structure
A bot manager rule has several components: a rule ID, a state, an action, and exclusions. For example, rule Bot200100 below covers search engine crawlers. We can change the enabledState to disable it, or change the action if we do not want search engine crawlers.




           "ruleId": "Bot200100",

           "enabledState": "Disabled",

           "action": "Allow",

            "exclusions": []






Some bots have a verified classification, while others have an unknown classification. The example above is a good bot category.
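To sketch what toggling such a rule looks like in code, the helper below builds an override entry with the same field names as the rule shown above. The function name and surrounding usage are illustrative only, not the full Azure REST schema:

```python
def override_bot_rule(rule_id, enabled=True, action="Allow"):
    """Build a rule-override entry using the fields shown above:
    ruleId, enabledState, action, and exclusions."""
    return {
        "ruleId": rule_id,
        "enabledState": "Enabled" if enabled else "Disabled",
        "action": action,
        "exclusions": [],
    }

# Disable the search-engine-crawler rule from the example.
print(override_bot_rule("Bot200100", enabled=False))
```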

If you want to block a known bot, for example an evilbot, there are several ways to do so, as explained in the documentation.

We will explore how to use Custom Rules in Azure WAF to block bots in later sections.

Good Bot

As mentioned earlier in this post, some bots are necessary for ad targeting, indexing, social media, personal assistants, and similar services.

The step below highlights a simple way a known good bot may be used for SEO purposes.

A user might want to use an SEO bot to query the Bingbot API to confirm that a particular IP address is in their database. By appending the IP address to a GET request (in this case our Juice Shop website IP address), we obtain a negative response indicating that it is not in the database (false in the result) and a 200 OK. This is a simple request that is otherwise harmless.
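Besides an IP-lookup API, Microsoft also documents a reverse-then-forward DNS check for verifying Bingbot: the PTR record of the claimed IP must end in search.msn.com and must resolve back to the same IP. The sketch below separates the pure hostname check from the DNS calls, which require network access:

```python
import socket

def is_bing_hostname(hostname):
    """Pure check: a genuine Bingbot PTR hostname ends in search.msn.com."""
    return hostname.endswith(".search.msn.com")

def verify_bingbot(ip):
    """Reverse-resolve the IP, check the hostname suffix, then
    forward-resolve the hostname and confirm it maps back to the IP."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
    except OSError:
        return False
    if not is_bing_hostname(hostname):
        return False
    try:
        return ip in socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return False

print(is_bing_hostname("msnbot-157-55-39-1.search.msn.com"))  # True
print(is_bing_hostname("fake.search.msn.com.evil.example"))   # False
```

Because the suffix check requires a full label match ending in ".search.msn.com", a hostname that merely embeds the string is rejected, as the second call shows.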




When a good bot is allowed, e.g., for Google Ads, you can verify the match in the logs.



Bad bot

Bad bots are developed for malicious purposes such as botnets, click fraud, spam, and scraping. They may also be used in IP spoofing, which occurs when the IP header of a source packet is manipulated to reflect a different IP address in order to mask origin information for malicious activities. When spoofing, an attacker can deceive a target system into sending sensitive data to a false destination, thereby enabling unauthorized access to the data.

In this example, we will use the curl command to show how WAF with Bot Manager responds when a request attempts to falsify its packet origin.

Use the curl command with the user agent for Bingbot to query the Juice Shop application as a valid request. You can find more information about the latest user agents in the Bing webmaster documentation.



curl -I -A "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" http://juiceshop.server/#/


(Note: juiceshop.server has been mapped to the App Gateway public IP in my hosts file for this blog.)


Now, we craft a curl command that injects a different public IP as the request origin by adding an X-Forwarded-For (XFF) header, combined with a Yandex user agent.



curl -H "X-Forwarded-For: <spoofed-public-IP>" -A "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" http://juiceshop.server/#/
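The same spoofed request can be assembled in Python with the requests library this post also uses for scraping. Preparing the request without sending it lets you inspect the injected header first (203.0.113.10 is a placeholder from the TEST-NET-3 documentation range, not a real client):

```python
import requests

# Build, but do not send, a request claiming a Yandex user agent
# with a spoofed client IP injected via X-Forwarded-For.
req = requests.Request(
    "GET",
    "http://juiceshop.server/",
    headers={
        "User-Agent": "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)",
        "X-Forwarded-For": "203.0.113.10",  # TEST-NET-3 placeholder
    },
)
prepared = req.prepare()
print(prepared.headers["X-Forwarded-For"])

# To actually send it (and hit the WAF, like the curl command above):
# requests.Session().send(prepared)
```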





We can see the WAF/BotManager response in the log, flagging the false identity.



Scenario 2: Deny Bot Access - Web Scraper

Most websites use a robots.txt file to control bot access to their application. This page is an example of a robots.txt file that shows permission status for various web request types. While you can use “User-agent: * Disallow: /” to disable all bot access, SEO and ranking bots may be affected as well. We can also use the user-agent type to control access to resources provided in the response. As an example, the following is a Python script with a Chrome user agent.



import requests

url = "http://juiceshop.server/#/"
headers = {
    'authority': 'juiceshop.server',
    'dnt': '1',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-user': '?1',
    'sec-fetch-dest': 'document',
    'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
}

response = requests.get(url, headers=headers)
print(f"Status Response: [{response.status_code}] {response.reason}")



Replace the url variable with your URL, specify the user agent of your choice, and change the remaining fields as needed. (It is worth noting that even browser detection using the user agent string is unreliable, hence the version number should be specified as well.) The dnt (do not track) field has been set to 1 to opt out of tracking across online activities. The sec-fetch-dest field is an example of a logical field for restricting bot requests: it specifies the response type the client expects. If the destination type specified is audio and you do not serve audio content, you can deny such requests. In this example, we expect type: document in the response.
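That Sec-Fetch-Dest idea can be sketched as a simple server-side filter (a hypothetical helper for illustration, not a built-in Azure WAF feature): deny any request whose declared destination is a content type the site never serves.

```python
# Destinations this (hypothetical) site actually serves.
SERVED_DESTINATIONS = {"document", "image", "script", "style"}

def allow_by_fetch_dest(headers):
    """Deny requests whose sec-fetch-dest names a destination the
    site never serves, e.g. 'audio' on a site with no audio content."""
    dest = headers.get("sec-fetch-dest")
    if dest is None:
        return True  # header absent: defer to other WAF rules
    return dest in SERVED_DESTINATIONS

print(allow_by_fetch_dest({"sec-fetch-dest": "document"}))  # True
print(allow_by_fetch_dest({"sec-fetch-dest": "audio"}))     # False
```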


Run the .py file; it should print Status Response: [200] OK. Change the final print statement to print(response.text) to see the document.



We can now set up a custom rule to deny this type of access using this user-agent.




If we run the Python file this time, we should get Error: 403 as the App Gateway WAF prevents access, as shown in the log below.




Scenario 3: Allow blocked bot

There are scenarios where a legitimate or unverified bot is denied access by the web application firewall, and a user may want to permit that bot access to the application. For instance, an organization may want to take advantage of a customer-survey bot API or leverage new AI bot development.
In this example, a bot used to obtain and compare bank data such as interest rates needs to fetch information about card services from the target bank, Altoro Mutual (a demo website for OWASP attack validation). As seen in the log, the request was blocked by the BotManager ruleset under rule 300700.




While an exclusion could be used on rule 300700 to permit bot access to the web application, this becomes too permissive for other bots, because the selector field is no longer inspected for the User-Agents in the BotManager 300700 rule category.




To allow this bot specifically, we can use the origin IP (clientIp_s) or the User-Agent from the request header (recorded in details_data_s) in a custom rule, specifying the user agent python-requests/2.31.0 and setting the Action to "Allow" to grant access.



## Create the rule

$variable = New-AzApplicationGatewayFirewallMatchVariable `
   -VariableName RequestHeaders `
   -Selector User-Agent

$condition = New-AzApplicationGatewayFirewallCondition `
   -MatchVariable $variable `
   -Operator Contains `
   -MatchValue "python-requests/2.31.0" `
   -Transform Lowercase `
   -NegationCondition $False

$rule = New-AzApplicationGatewayFirewallCustomRule `
   -Name Allowbankbot `
   -Priority 4 `
   -RuleType MatchRule `
   -MatchCondition $condition `
   -Action Allow `
   -State Enabled

## Get the existing policy

$appGwpolicy = Get-AzApplicationGatewayFirewallPolicy -Name SOC-NS-AGPolicy -ResourceGroupName SEA_tf_demo

## Add the newly created rule to the policy

$appGwpolicy.CustomRules.Add($rule)

## Update the App Gateway policy

Set-AzApplicationGatewayFirewallPolicy -InputObject $appGwpolicy


The bot is now permitted to fetch the HTML elements.




By default, the Azure BotManager rule set is deployed with known good bots allowed, bad bots mostly blocked, and unknown bots reviewed by Threat Intelligence and logged or blocked depending on the actions assigned to them.


In conclusion, preventing unwanted bot access to your web resources involves a multi-layered approach that combines security practices such as an up-to-date WAF with Bot Manager, rate limiting, bot challenges, and IP blocking. Regularly monitoring and analyzing website traffic, user behavior, and server logs can provide insight into bot activity and help refine security measures.




