This article describes how to aggregate the Azure Storage logs collected by Azure Monitor diagnostic settings when an Azure Storage account is selected as the destination. This approach downloads the logs and aggregates them on your local machine.
Please keep in mind that in this article, we only copy the logs from the destination storage account. They will remain on that storage account until you delete them.
At the end of this article, you will have access to a CSV file that will contain the information for the current log structure (e.g.: timeGeneratedUTC, resourceId, category, operationName, statusText, callerIpAddress, etc.).
This script was developed and tested using the following versions, but it is expected to work with previous versions as well:
This article is divided into two steps:
Create Diagnostic Settings to capture the Storage Logs and send them to an Azure Storage Account
Use AzCopy and Python to download and to aggregate the logs
Each step has a theoretical introduction and a practical example.
Critical and business processes that rely on Azure resources can be monitored (availability, performance, and operations) using diagnostic settings. Please review the Monitoring Azure Blob Storage documentation to learn more.
To collect resource logs, you must create a diagnostic setting. When creating a diagnostic setting, you can specify the categories of operations for which you want to collect logs (for more information, please see Collection and routing).
Support documentation:
As mentioned above, in this article we will explore the scenario of using an Azure Storage account as the destination. Please keep the following in mind:
To understand how to create a diagnostic setting, please review the Create a diagnostic setting documentation. That documentation shows how to create a diagnostic setting that sends the logs to a Log Analytics workspace. To follow this article, under Destination details, select "Archive to a storage account" instead of "Send to Log Analytics workspace". You can still use that documentation if you want to send the logs to a Log Analytics workspace instead.
An important remark: in this article we only copy the logs to the local machine; we do not delete any data from your storage account.
Following that documentation (Create a diagnostic setting), I created a diagnostic setting and selected the log categories ['StorageRead', 'StorageWrite', 'StorageDelete'] and the metric 'Transaction'. Please keep in mind that for this article I only create a diagnostic setting for blobs, although it is also possible to create diagnostic settings for Queue, Table, and File.
Please note that on the storage account defined as the "Destination", you should see the following containers: ['insights-logs-storagedelete', 'insights-logs-storageread', 'insights-logs-storagewrite']. It can take some time for the containers to be created; each container is only created once a log is written for its category, so the timing depends on which categories you selected and when operations of each type occur.
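If you prefer to verify those containers programmatically instead of through the portal, the sketch below lists them. It is not part of the main script; it assumes the azure-storage-blob package is installed and uses placeholder values for the destination account and SAS token.
from azure.storage.blob import BlobServiceClient

# Placeholder values - replace with your destination storage account and a SAS token with list permissions
account_url = "https://<destination-account>.blob.core.windows.net"
sas_token = "<sas-token>"

service = BlobServiceClient(account_url=account_url, credential=sas_token)
# List only the containers created by the diagnostic setting
for container in service.list_containers(name_starts_with="insights-logs-"):
    print(container.name)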
In this step, we will use AzCopy to retrieve the logs from the storage account, and then use Python to consolidate them.
AzCopy is a command-line tool that moves data into and out of Azure Storage. Please review the Get started with AzCopy documentation, which explains how to download AzCopy, run AzCopy, and authorize AzCopy.
For this practical example, we need two storage accounts:
The storage account that we want to monitor (the one where the diagnostic setting is enabled).
The storage account defined as the destination of the diagnostic setting (where the logs are archived).
Download AzCopy, unzip the file, and copy the path to the executable to a notepad. This path will be needed later.
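To confirm that the path you noted points to a working AzCopy executable, you can run a quick version check from Python. This is a minimal sketch with a placeholder path, not part of the main script:
import subprocess

# Placeholder path - use the AzCopy path you noted above
azcopy_path = "C:\\tools\\azcopy\\azcopy.exe"

# Print the AzCopy version to confirm the executable is reachable
subprocess.run(azcopy_path + " --version", shell=True)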
The script uses the following Python modules:
os
subprocess
shutil
pandas (install it with pip install pandas)
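If you want to confirm which Python and pandas versions you are running, a quick check:
import sys
import pandas as pd

# Print the interpreter and pandas versions in use
print(sys.version)
print(pd.__version__)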
Please find below all the Python script components, explained step by step. The full script is attached at the end of the article.
Imports needed for the script
import os
import subprocess
import shutil
import pandas as pd
Auxiliary functions
Function to list all files under a specific directory:
# Inputs:
# dirName - Directory path to get all the files
# Returns:
# A list of all files under the dirName
def getListOfFiles(dirName):
    # Create a list of file and subdirectory names in the given directory
    listOfFile = os.listdir(dirName)
    allFiles = list()
    # Iterate over all the entries
    for entry in listOfFile:
        # Create the full path
        fullPath = os.path.join(dirName, entry)
        # If the entry is a directory, get the list of files in this directory
        if os.path.isdir(fullPath):
            allFiles = allFiles + getListOfFiles(fullPath)
        else:
            allFiles.append(fullPath)
    return allFiles
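As a quick illustration of how this helper behaves (this example is not part of the main script and uses a placeholder folder):
# Placeholder folder - point it at any local directory
files = getListOfFiles("C:\\logs\\test")
print("Found {0} files".format(len(files)))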
Function to retrieve the logs using AzCopy:
# Inputs:
# azcopy_path: Path to the AzCopy executable
# storageEndpoint: Storage endpoint
# sasToken: SAS token to authorize the AzCopy operations
# path: Path where the logs are on the Azure Storage Account
# localStorage: Path where the logs will be stored on the local machine
# Returns:
# The logs as they are on the Azure Storage Account
def getLogs(azcopy_path, storageEndpoint, sasToken, path, localStorage):
    # Define any additional AzCopy command-line options as needed
    options = "--recursive"
    # Construct the source URL (endpoint + blob path + SAS token)
    source_url = storageEndpoint + path + sasToken
    # Construct the AzCopy command
    azcopy_command = azcopy_path + " copy " + '"' + source_url + '" ' + localStorage + " " + options
    # Execute the AzCopy command
    subprocess.run(azcopy_command, shell=True)
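For reference, a call with placeholder values such as the one below makes getLogs build and run a command of the form <azcopy.exe> copy "<source URL with SAS>" <local folder> --recursive. Note that, given how source_url is concatenated, the sasToken value should include the leading question mark.
# Placeholder values for illustration only
getLogs(
    azcopy_path="C:\\tools\\azcopy\\azcopy.exe",
    storageEndpoint="https://mydestinationaccount.blob.core.windows.net/",
    sasToken="?sv=...",
    path="insights-logs-storageread/resourceId=/subscriptions/.../y=2023",
    localStorage="C:\\logs\\test\\storagereadLogs",
)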
Parameters definition
Please see below the parameters that we need to specify - this information is needed during the script execution:
AzCopy:
azcopy_path: path to the AzCopy executable
Storage account logs destination info (Storage account name where the logs are being stored):
storageAccountName: the storage account name where the logs are stored
sasToken: SAS token to authorize the AzCopy operations
start: The blobs within the containers use the following naming convention (please see more in Send to Azure Storage):
insights-logs-{log category name}/resourceId=/SUBSCRIPTIONS/{subscription ID}/RESOURCEGROUPS/{resource group name}/PROVIDERS/{resource provider name}/{resource type}/{resource name}/y={four-digit numeric year}/m={two-digit numeric month}/d={two-digit numeric day}/h={two-digit 24-hour clock hour}/m=00/PT1H.json
If you want the logs for a specific year, 2023 for instance, define start = "y=2023"
If you want the logs for a specific month, May 2023 for instance, define start = "y=2023/m=05"
If you want the logs for a specific day, the 31st of May 2023 for instance, define start = "y=2023/m=05/d=31" (a small helper sketch follows these examples)
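If you prefer to build the start value programmatically instead of typing it, a small hypothetical helper (not part of the original script) could look like this:
# Hypothetical helper: builds the "start" prefix following the blob naming convention above
def build_start(year, month=None, day=None):
    start = "y={0:04d}".format(year)
    if month is not None:
        start += "/m={0:02d}".format(month)
        if day is not None:
            start += "/d={0:02d}".format(day)
    return start

print(build_start(2023))         # y=2023
print(build_start(2023, 5))      # y=2023/m=05
print(build_start(2023, 5, 31))  # y=2023/m=05/d=31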
Local machine information - Path on local machine where to store the logs
logsDest: Path on local machine (Where to store the logs)
This path is composed of a main folder, defined by you, where the logs will be stored; inside that folder, the script creates a folder named after the logged storage account and a subfolder named after the start field defined above.
For instance, if you want to store the logs collected for a storage account named test, for the entire year 2023, in a folder at the path c:\logs, then after executing the script you will have the following structure: c:\logs\test\logs_y=2023
# -------------------------------------------------------------------------------------------------------
# AzCopy path
# -------------------------------------------------------------------------------------------------------
azcopy_path = "C:\\XXX\\azcopy_windows_amd64_10.19.0\\azcopy.exe"
# -------------------------------------------------------------------------------------------------------
# Storage account information where the logs are being stored (storage account logs destination info):
storageAccountName = "XXX"
storageEndpoint = "https://{0}.blob.core.windows.net/".format(storageAccountName)
sasToken = "XXXX"
# -------------------------------------------------------------------------------------------------------
# Storage account to be logged. Information regarding the storage account where we enabled the Diagnostic Setting logs
subscriptionID = "XXX"
resourceGroup = "XXXX"
storageAccountNameGetLogs = "XXXX"
start = "XXXX"
# The next variables are composed based on the information presented above
storageDeleteLogs = "insights-logs-storagedelete/resourceId=/subscriptions/" + subscriptionID + "/resourceGroups/" + resourceGroup + "/providers/Microsoft.Storage/storageAccounts/" + storageAccountNameGetLogs + "/blobServices/default/" + start
storageReadLogs = "insights-logs-storageread/resourceId=/subscriptions/" + subscriptionID + "/resourceGroups/" + resourceGroup + "/providers/Microsoft.Storage/storageAccounts/" + storageAccountNameGetLogs + "/blobServices/default/" + start
storageWriteLogs = "insights-logs-storagewrite/resourceId=/subscriptions/" + subscriptionID + "/resourceGroups/" + resourceGroup + "/providers/Microsoft.Storage/storageAccounts/" + storageAccountNameGetLogs + "/blobServices/default/" + start
# -------------------------------------------------------------------------------------------------------
# Local machine information - Path on local machine where to store the logs
# -------------------------------------------------------------------------------------------------------
search = "logs_" + start.replace("/", "_")
logsDest = "C:\\XXX\\XXX\\Desktop\\XXX\\" + storageAccountNameGetLogs + "\\" + search + "\\"
# The next variables are composed based on the information presented above.
# The following folders will store temporarily all the individual logs. They will be deleted after all the logs are consolidated
localStorageDeleteLogs = logsDest + "storagedeleteLogs"
localStorageReadLogs = logsDest + "storagereadLogs"
localStorageWriteLogs = logsDest + "storagewriteLogs"
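Before starting the download, it can be useful to print the composed values and confirm that they point where you expect. This quick check is optional and not part of the original script:
# Optional sanity check of the composed parameters (read logs shown as an example)
print("Source path (read logs):", storageEndpoint + storageReadLogs)
print("Local destination:      ", localStorageReadLogs)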
To download all the logs
If you want to download all the logs (Delete, Read, and Write operations), keep the code below as it is. To skip a category, comment out the corresponding line by adding # at the beginning of the line.
print("\n")
print("#########################################################")
print("Downloading logs from the requests made on the storage account name: {0}".format(storageAccountNameGetLogs))
print("\n")
getLogs(azcopy_path, storageEndpoint, sasToken, storageDeleteLogs, localStorageDeleteLogs)
getLogs(azcopy_path, storageEndpoint, sasToken, storageReadLogs, localStorageReadLogs)
getLogs(azcopy_path, storageEndpoint, sasToken, storageWriteLogs, localStorageWriteLogs)
Merge all log files into a single file
To merge all the logs into a single file (CSV format), please run the following code:
# Inputs:
# logsDest: Path on local machine (Where to store the logs)
# Returns:
# A csv file sorted by time asc, and some expanded fields
print("#########################################################")
print("Merging the log files")
print("\n")
read_files = getListOfFiles(logsDest)
destinationFileJson = logsDest + "logs.json"
with open(destinationFileJson, "wb") as outfile:
    for f in read_files:
        with open(f, "rb") as infile:
            outfile.write(infile.read())
# Read the JSON file into a DataFrame
df = pd.read_json(destinationFileJson, lines=True)
# Sort by time asc
df = df.sort_values('time')
# Change time format
df['time'] = pd.to_datetime(df['time'])
# Split resourceId to create three new columns (subscription, resourceGroup, provider)
df['subscription'] = df['resourceId'].apply(lambda row: row.split("/")[2])
df['resourceGroup'] = df['resourceId'].apply(lambda row: row.split("/")[4])
df['provider'] = df['resourceId'].apply(lambda row: row.split("/")[6])
# Split properties column to create a column for each property
df = pd.concat([df.drop('properties', axis=1), df['properties'].apply(pd.Series)], axis=1)
# Split identity column to create a column for each identity field
df = pd.concat([df.drop('identity', axis=1), df['identity'].apply(pd.Series)], axis=1)
df = df.rename(columns={'time' : 'timeGeneratedUTC', 'type': 'authenticationType', 'tokenHash': 'authenticationHash'})
df = df.reset_index(drop=True)
# Save log file in csv format
destinationFileCSV = logsDest + "logs.csv"
df.to_csv(destinationFileCSV, sep = ",", index = False)
print("######################################################### \n")
print("Clean temporary files \n")
if os.path.exists(destinationFileJson):
    os.remove(destinationFileJson)
    print(f"{destinationFileJson} has been deleted.")
else:
    print(f"{destinationFileJson} does not exist.")
print("\n")
try:
    shutil.rmtree(localStorageDeleteLogs)
    print(f"{localStorageDeleteLogs} and its contents have been deleted.")
except OSError as e:
    print(f"Error: {localStorageDeleteLogs} and its contents cannot be deleted. {e}")
print("\n")
try:
    shutil.rmtree(localStorageReadLogs)
    print(f"{localStorageReadLogs} and its contents have been deleted.")
except OSError as e:
    print(f"Error: {localStorageReadLogs} and its contents cannot be deleted. {e}")
print("\n")
try:
    shutil.rmtree(localStorageWriteLogs)
    print(f"{localStorageWriteLogs} and its contents have been deleted.")
except OSError as e:
    print(f"Error: {localStorageWriteLogs} and its contents cannot be deleted. {e}")
print("\n ######################################################### \n")
print("Script finished. The logs from the requests made on the storage account name {0} are merged.".format(storageAccountNameGetLogs))
print("Please see below resources created. \n")
print("Local machine storage merged logs location:")
print("- csv file: ", destinationFileCSV)
print("\n#########################################################")
The full Python script is attached to this article.
Output
To better understand the fields included in the logs after executing the full script, please review Azure Monitor Logs reference - StorageBlobLogs.
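As a quick way to explore the output, you can load the generated CSV back into pandas and inspect some of the columns mentioned at the beginning of the article. The path below is a placeholder; use the csv file location printed by the script:
import pandas as pd

# Placeholder path - use the destinationFileCSV value printed by the script
logs = pd.read_csv("C:\\logs\\test\\logs_y=2023\\logs.csv")
print(logs.columns.tolist())
print(logs[["timeGeneratedUTC", "operationName", "statusText", "callerIpAddress"]].head())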