Virtual Network Flow Logs Recipes
Published May 08 2024 12:18 AM
Microsoft

You might have heard about the General Availability of Virtual Network Flow Logs in Azure, and even read the announcement blog post. While writing that post with Harsha CS I had the chance to play a bit with VNet Flow Logs and Traffic Analytics, and I would like to share some of what I learned.

What the heck am I talking about? Let me bring you up to speed very quickly (attention, oversimplification ahead!): NSG Flow Logs is a technology that logs every flow going through an NSG: in and out, allowed and dropped. The main issue with NSG Flow Logs is, well, that you need an NSG, and some resources in Azure do not support them, for example Azure Firewall, VPN gateways or ExpressRoute gateways. Enter VNet Flow Logs, which you can enable on a whole VNet or subnet, regardless of whether there are NSGs or not.

What is Traffic Analytics, I hear you say? VNet Flow Logs are stored in Azure Blob Storage. Optionally, you can enable Traffic Analytics, which does two things: it enriches the flow logs with additional information, and it sends everything to a Log Analytics Workspace for easy querying. This "enrich and forward to Log Analytics" operation happens at intervals, either every 10 minutes or every hour.

VNet Flow Logs give you much more, for example whether traffic is VNet-encrypted or not, whether traffic is dropped by AVNM security admin rules, and some more stuff, but you will find all that in the docs so I am not going to repeat it here.

Table structure: NTAIpDetails

This table contains enrichment data about public IP addresses, including whether they belong to Azure services and in which region, plus geolocation information for other public IPs. Here you can see a sample of what that table looks like:

image.jpg

For example, the NTAIpDetails table lets you see which kinds of communication are taking place. The query is very simple (NTAIpDetails | distinct FlowType, PublicIpDetails, Location), and it gives you a glimpse of what you can do, especially when joining this table to NTANetAnalytics (see the scenarios further down for examples of how to join this table):

image-1.jpg

NTAIpDetails
| distinct FlowType, PublicIpDetails, Location
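
If you prefer counts over a flat list, a small variation could aggregate the same table. This is only a sketch: it reuses FlowType and Location from the query above, plus the Ip column that the join examples further down rely on, so adjust it to whatever your workspace actually contains:

```kusto
// Sketch: number of distinct public IPs seen per flow type and location
NTAIpDetails
| summarize PublicIps = dcount(Ip) by FlowType, Location
| order by PublicIps desc
```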

Table structure: NTATopologyDetails

This table contains information about different elements of your topology, including VNets, subnets, route tables, routes, NSGs, Application Gateways and much more. Here you can see what it looks like:

image-2.jpg

For example, with a simple query you can get the routes in the route tables configured in a given resource group:

image-3.jpg

NTATopologyDetails
| where TimeGenerated > ago(600d)
| where AzureResourceType == "Route"
// Name is split on "/" into resource group, route table and route
| extend name_a = split(Name, "/")
| extend ResourceGroup = tostring(name_a[0]), RouteTableName = tostring(name_a[1]), RouteName = tostring(name_a[2])
| where ResourceGroup == "flowlogs"
| distinct ResourceGroup, RouteTableName, RouteName, NextHopType, NextHopIp

Yeah, there are many other ways of getting topology information in Azure, such as Azure Resource Graph, but having this in a table is pretty handy for join queries.

Table structure: NTANetAnalytics

Alright, now we are coming to more interesting things: this table is the one containing the flows we are looking for. Records in this table will contain the usual attributes you would expect such as source and destination IP, protocol, and destination port. Additionally, data will be enriched with information such as:

  • Source and destination VM
  • Source and destination NIC
  • Source and destination subnet
  • Source and destination load balancer
  • Flow encryption (yes/no)
  • Whether the flow is going over ExpressRoute
  • And many more
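
As a quick taste of these enrichment fields, the following sketch breaks down traffic volume by encryption status. It assumes the FlowEncryption column of NTANetAnalytics; the values it returns depend on your environment, so treat the output as something to explore rather than a fixed yes/no:

```kusto
// Sketch: traffic volume per encryption status over the last 7 days
NTANetAnalytics
| where SubType == 'FlowLog' and TimeGenerated > ago(7d)
| summarize TotalBytes = sum(BytesSrcToDest + BytesDestToSrc) by FlowEncryption
| order by TotalBytes desc
```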

Further below you can read some scenarios with detailed queries that show ways to extract information from VNet Flow Logs and Traffic Analytics. Of course, these are just some of the scenarios that came to mind for my topology. The idea is that you can get inspiration from these queries to support your individual use case.

Scenario 1: traffic to/from a virtual machine

For example, imagine you want to see which IP addresses a given virtual machine has been talking to in the last few days:

image-6.jpg

NTANetAnalytics
| where TimeGenerated > ago(60d)
| where SrcIp == "10.1.1.8" and isnotempty(DestIp)
| summarize TotalBytes=sum(BytesDestToSrc+BytesSrcToDest) by SrcIp, DestIp

What if we want to enrich this information with the NTAIpDetails table to get the geolocation of the public IP addresses? Let's have a look:

image-5.jpg

NTANetAnalytics
| where TimeGenerated > ago(60d)
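| where SrcIp == "10.1.1.8" and isnotempty(DestIp)
// Keep a single row per IP, in case NTAIpDetails holds several records for it; otherwise the join would duplicate flows and inflate TotalBytes
| join kind = leftouter (NTAIpDetails | summarize arg_max(TimeGenerated, PublicIpDetails, Location) by Ip | project Ip, PublicIpDetails, Location) on $left.DestIp == $right.Ip
| summarize TotalBytes=sum(BytesDestToSrc+BytesSrcToDest) by SrcIp, DestIp, PublicIpDetails, Location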

Mmmh, that packet host inc. looks suspicious, so now you might be interested in a time distribution of the protocols in use there. Nothing easier! We can see here that there have been two big data transfers of around 1GB on two different days, the last one on 23rd April.

image-7.jpg

NTANetAnalytics
| where TimeGenerated > ago(60d)
| where SrcIp == "10.1.1.8" and DestIp == "136.144.58.113"
// Build a protocol+port label to group the traffic by
| extend App = strcat(L4Protocol, tostring(DestPort))
| summarize TotalBytes=sum(BytesDestToSrc+BytesSrcToDest) by App, bin(TimeGenerated, 1d)
| render columnchart

Mmmmh, somebody exfiltrating data?
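
To tell an upload from a download, it can help to keep the two byte counters apart instead of adding them up. The following is a sketch under the assumption that the flows were initiated by the VM, so that BytesSrcToDest is what the VM sent. ipv4_is_private is a built-in KQL function, used here to keep only public destinations:

```kusto
// Sketch: top public destinations of a VM, with sent and received bytes kept separate
NTANetAnalytics
| where TimeGenerated > ago(60d)
| where SrcIp == "10.1.1.8" and isnotempty(DestIp)
| where not(ipv4_is_private(DestIp))
| summarize SentBytes = sum(BytesSrcToDest), ReceivedBytes = sum(BytesDestToSrc) by DestIp
| top 10 by SentBytes desc
```

A destination with a large SentBytes and a small ReceivedBytes would be the first candidate to look at.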

Scenario 2: load balancer traffic distribution

You want to look into the traffic distribution of a given application front-ended by a load balancer? Two of the enrichment fields in the NTANetAnalytics table are SrcLoadBalancer and DestLoadBalancer, which we can leverage for this purpose. We will look at the source IP first, meaning traffic going from the VMs to the load balancer:

image-8.jpg

NTANetAnalytics
| where SubType == 'FlowLog' and TimeGenerated > ago(60d)
| where SrcLoadBalancer contains 'web' or DestLoadBalancer contains 'web'
| summarize TotalBytes = sum(BytesSrcToDest + BytesDestToSrc) by tostring(SrcIp)
| render piechart

Interesting, it looks like 10.1.1.70 is not getting much traffic? Let's look at the time distribution as well, maybe that machine hasn't been there for long:

image-12.jpg

NTANetAnalytics
| where SubType == 'FlowLog' and TimeGenerated > ago(60d)
| where SrcLoadBalancer contains 'web' or DestLoadBalancer contains 'web'
| summarize TotalBytes = sum(BytesSrcToDest + BytesDestToSrc) by tostring(SrcIp), bin(TimeGenerated, 1d)
| render barchart 

We can have a look at the Destination IP as well, which will show an interesting picture:

image-13.jpg

NTANetAnalytics
| where SubType == 'FlowLog' and TimeGenerated > ago(60d)
| where SrcLoadBalancer contains 'web' or DestLoadBalancer contains 'web'
| summarize TotalBytes = sum(BytesSrcToDest + BytesDestToSrc) by tostring(DestIp), bin(TimeGenerated, 1d)
| render barchart 

Wow, it looks like we need to have a look at these VMs, something seems not to be right with our load distribution!
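
If you would rather have numbers than eyeball a chart, the share of traffic per destination can be computed explicitly. This is a sketch that reuses the same 'web' load balancer filter as above; lbFlows, totalBytes and SharePct are names made up for this example:

```kusto
// Sketch: percentage of the load-balanced traffic seen by each destination IP
let lbFlows = NTANetAnalytics
    | where SubType == 'FlowLog' and TimeGenerated > ago(60d)
    | where SrcLoadBalancer contains 'web' or DestLoadBalancer contains 'web';
// Total bytes across all matching flows, used as the denominator
let totalBytes = toscalar(lbFlows | summarize sum(BytesSrcToDest + BytesDestToSrc));
lbFlows
| summarize TotalBytes = sum(BytesSrcToDest + BytesDestToSrc) by DestIp
| extend SharePct = round(100.0 * TotalBytes / totalBytes, 1)
| order by SharePct desc
```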

Scenario 3: Traffic between IP ranges

Traffic Analytics will enrich the flows with details such as source and destination subnets:

image-14.jpg

NTANetAnalytics
| where SubType == 'FlowLog' and FaSchemaVersion == '3' and TimeGenerated > ago(60d)
| where isnotempty(SrcSubnet) and isnotempty(DestSubnet)
| summarize TotalBytes=sum(BytesSrcToDest + BytesDestToSrc) by SrcSubnet, DestSubnet, L4Protocol, DestPort

However, sometimes you want to do a different data aggregation, for example if you would like to see traffic between on-premises and Azure. In this case you can define the aggregation prefixes yourself and use the handy KQL function ipv4_is_in_range:

image-10.jpg

let prefix1="10.1.1.0/27";
let prefix2="10.1.1.64/27";
NTANetAnalytics
| where SubType == 'FlowLog' and FaSchemaVersion == '3' and TimeGenerated > ago(30d)
| extend SrcIpIsInPrefix1 = ipv4_is_in_range(SrcIp, prefix1), SrcIpIsInPrefix2 = ipv4_is_in_range(SrcIp, prefix2)
| extend DestIpIsInPrefix1 = ipv4_is_in_range(DestIp, prefix1), DestIpIsInPrefix2 = ipv4_is_in_range(DestIp, prefix2)
| where (SrcIpIsInPrefix1 and DestIpIsInPrefix2) or (SrcIpIsInPrefix2 and DestIpIsInPrefix1)
| summarize TotalBytes=sum(BytesSrcToDest + BytesDestToSrc) by SrcIp, DestIp, L4Protocol, DestPort, L7Protocol
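
If you want to check the prefix logic before running it against real data, you can try it on a handful of hand-written rows. The sketch below uses a datatable with made-up IP addresses (the rows and the Note column are purely illustrative) and applies the same ipv4_is_in_range filter:

```kusto
// Sketch with made-up sample rows: no workspace data needed
let prefix1 = "10.1.1.0/27";   // covers 10.1.1.0 to 10.1.1.31
let prefix2 = "10.1.1.64/27";  // covers 10.1.1.64 to 10.1.1.95
datatable(SrcIp: string, DestIp: string, Note: string)
[
    "10.1.1.8",  "10.1.1.70", "prefix1 to prefix2",
    "10.1.1.70", "10.1.1.8",  "prefix2 to prefix1",
    "10.1.1.8",  "10.1.1.40", "destination in neither prefix",
    "10.1.1.8",  "10.1.1.9",  "both ends in prefix1"
]
| extend SrcIpIsInPrefix1 = ipv4_is_in_range(SrcIp, prefix1), SrcIpIsInPrefix2 = ipv4_is_in_range(SrcIp, prefix2)
| extend DestIpIsInPrefix1 = ipv4_is_in_range(DestIp, prefix1), DestIpIsInPrefix2 = ipv4_is_in_range(DestIp, prefix2)
| where (SrcIpIsInPrefix1 and DestIpIsInPrefix2) or (SrcIpIsInPrefix2 and DestIpIsInPrefix1)
```

Only the first two rows survive the filter, since they are the only ones with one end in each prefix.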

You want this information distributed across time, to have a look at traffic evolution? You got it. We can create a new field that combines source, destination and protocol, and represent it in a stacked chart:

image-11.jpg

let prefix1="10.1.1.0/27";
let prefix2="10.1.1.64/27";
NTANetAnalytics
| where SubType == 'FlowLog' and FaSchemaVersion == '3' and TimeGenerated > ago(30d)
| extend SrcIpIsInPrefix1 = ipv4_is_in_range(SrcIp, prefix1), SrcIpIsInPrefix2 = ipv4_is_in_range(SrcIp, prefix2)
| extend DestIpIsInPrefix1 = ipv4_is_in_range(DestIp, prefix1), DestIpIsInPrefix2 = ipv4_is_in_range(DestIp, prefix2)
| where (SrcIpIsInPrefix1 and DestIpIsInPrefix2) or (SrcIpIsInPrefix2 and DestIpIsInPrefix1)
// Combine source, destination, protocol and port into a single label per flow
| extend FlowDescription = strcat(SrcIp, "-", DestIp, "-", L4Protocol, tostring(DestPort))
| summarize TotalBytes=sum(BytesSrcToDest + BytesDestToSrc) by FlowDescription, bin(TimeGenerated, 1d)
| render columnchart

Interesting, a lot of traffic on port 80, maybe somebody should look at migrating to HTTPS?
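
To find out who that somebody should talk to, you could list the servers receiving the most port 80 traffic. This is a sketch that assumes DestPort is a numeric column, as the tostring(DestPort) calls above suggest; Clients is a name invented for this example:

```kusto
// Sketch: destinations with the most traffic on port 80, and how many distinct sources use them
NTANetAnalytics
| where SubType == 'FlowLog' and TimeGenerated > ago(30d)
| where DestPort == 80
| summarize TotalBytes = sum(BytesSrcToDest + BytesDestToSrc), Clients = dcount(SrcIp) by DestIp
| top 10 by TotalBytes desc
```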

Scenario 4: ExpressRoute traffic

You know that VNet Flow Logs are not tied to NSGs, as we saw in the introduction. Guess what, this means they also cover VPN and ExpressRoute gateways. More concretely, with ExpressRoute we can even leverage the fields SrcExpressRouteCircuit and DestExpressRouteCircuit:

image-9.jpg

NTANetAnalytics
| where SubType == 'FlowLog' and TimeGenerated > ago(60d)
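// isnotempty instead of isnotnull: string values are never null in KQL, so isnotnull would match every row
| where isnotempty(SrcExpressRouteCircuit) or isnotempty(DestExpressRouteCircuit)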
| extend TargetResourceName = tostring(split(TargetResourceId, "/")[2])
| summarize TotalBytes=sum(BytesSrcToDest + BytesDestToSrc) by TargetResourceName, bin(TimeGenerated, 1d)
| render columnchart 
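
If you have more than one circuit, a variation could group by circuit instead of by target resource. This is a sketch that assumes both circuit columns are strings that stay empty when a flow does not traverse a circuit; Circuit is a name made up for this example:

```kusto
// Sketch: daily traffic per ExpressRoute circuit
NTANetAnalytics
| where SubType == 'FlowLog' and TimeGenerated > ago(60d)
| where isnotempty(SrcExpressRouteCircuit) or isnotempty(DestExpressRouteCircuit)
// Take the circuit from whichever side of the flow has it populated
| extend Circuit = iff(isnotempty(SrcExpressRouteCircuit), SrcExpressRouteCircuit, DestExpressRouteCircuit)
| summarize TotalBytes = sum(BytesSrcToDest + BytesDestToSrc) by Circuit, bin(TimeGenerated, 1d)
| render columnchart
```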

We can use the previous recipe for prefix aggregation to show traffic from on-premises to Azure and from Azure to on-premises:

image-15.jpg

let prefix1="10.4.0.0/16";  // on-premises range
let prefix2="10.1.0.0/16";  // Azure range
NTANetAnalytics
| where SubType == 'FlowLog' and FaSchemaVersion == '3' and FlowStartTime > ago(24h)
| extend SrcIpIsInPrefix1 = ipv4_is_in_range(SrcIp, prefix1), SrcIpIsInPrefix2 = ipv4_is_in_range(SrcIp, prefix2)
| extend DestIpIsInPrefix1 = ipv4_is_in_range(DestIp, prefix1), DestIpIsInPrefix2 = ipv4_is_in_range(DestIp, prefix2)
| where (SrcIpIsInPrefix1 and DestIpIsInPrefix2) or (SrcIpIsInPrefix2 and DestIpIsInPrefix1)
| extend Direction = iff((SrcIpIsInPrefix1 and DestIpIsInPrefix2), "Onprem2Azure", "Azure2Onprem")
| summarize TotalBytesSrcToDest=sum(BytesSrcToDest), TotalBytesDestToSrc=sum(BytesDestToSrc) by Direction
| render columnchart

Wrapping up

These were only some examples of how you can slice and dice the data in VNet Flow Logs. Please do not hesitate to let me know in the comments about other cool KQL queries you are using!
