Exporting AWS CloudWatch Logs to Azure Data Explorer
Published Mar 31 2022

If you’re working in Amazon Web Services (AWS) or have a multi-cloud environment that includes AWS, a solid logging and monitoring solution should be an essential part of your deployment and monitoring strategy. In AWS, this takes the form of CloudWatch, a service designed to collect diagnostic logs, utilization metrics, and events from your services. If you’re looking for a fast, scalable way to query those logs, you can take the data collected in your CloudWatch Log Groups, send it over to Azure Data Explorer, and then use Kusto to analyze and report on any volume of logging data across all of your log groups.

 

Our product engineers have put together a sample architecture guide to help you achieve this. The process looks like this:

 

[Architecture diagram: CloudWatch log groups flow through subscription filters to Lambda functions, which forward the log data to the Azure Data Explorer ingestion endpoint]

 

Essentially, you’ll use an AWS Lambda function for each log group you want to capture: a subscription filter streams the log events to the function, which then sends them to your Azure Data Explorer ingestion endpoint.

 

Let’s dive into some more details of the solution.

CloudWatch Log Groups

Inside CloudWatch, your log groups are the starting point. In the example below, let’s assume you have a log group (kusto_log_group) already defined:

 

[Screenshot: the kusto_log_group log group in the CloudWatch console]

 

For each log group, you can define subscription filters that grab the events from the log group (optionally filtering them) and send them to a Lambda function. You'll come back to this interface once you deploy the function.

AWS Lambda

 

The core “mover” of the logs is a Lambda function. Subscription filters let you create a stream of events from your log groups and support a Lambda function as a target. As a result, your log data is delivered to the function as a base64-encoded, gzip-compressed string. The Lambda function in turn relays this data to the Azure Data Explorer ingestion endpoint. We have a sample, written in Node.js, which you can view on GitHub.

 

One thing to keep in mind is that you want to keep your Lambda function as lightweight as possible. As such, don’t do any processing of the data as the function receives it; leave that to Azure Data Explorer. Instead, the event metadata determines where the log data is routed and is used as the data is ingested.

 

The other thing to call out in the code sample is the endpoint you’ll be sending the data to: the ingestion endpoint of your cluster on the Azure Data Explorer side. Note that in the example below, environment variables store the cluster details and authentication credentials, and the ingestion endpoint is built dynamically with KustoConnectionStringBuilder:

 

// KustoConnectionStringBuilder comes from the azure-kusto-data npm package
const { KustoConnectionStringBuilder } = require("azure-kusto-data");

const clusterName = process.env.ADX_CLUSTER_NAME;
const appId = process.env.AAD_APP_ID;
const appKey = process.env.AAD_APP_KEY;
const authorityId = process.env.AAD_AUTHORITY_ID;

const kcsb = KustoConnectionStringBuilder.withAadApplicationKeyAuthentication(`https://ingest-${clusterName}.kusto.windows.net`, appId, appKey, authorityId);
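
Building on that connection string, here’s a rough sketch of what the Lambda handler itself could look like. This is illustrative only and is not the published GitHub sample; it assumes a recent version of the azure-kusto-data and azure-kusto-ingest npm packages, plus the SourceTable and SourceTableReferenceMapping objects created in the Azure Data Explorer section below (ADX_DATABASE is a hypothetical environment variable for your database name):

// Illustrative sketch only -- not the published sample.
const { KustoConnectionStringBuilder } = require("azure-kusto-data");
const { IngestClient, IngestionProperties, DataFormat } = require("azure-kusto-ingest");
const { Readable } = require("stream");

const kcsb = KustoConnectionStringBuilder.withAadApplicationKeyAuthentication(
    `https://ingest-${process.env.ADX_CLUSTER_NAME}.kusto.windows.net`,
    process.env.AAD_APP_ID, process.env.AAD_APP_KEY, process.env.AAD_AUTHORITY_ID);

// Default ingestion properties: land everything in the raw SourceTable using
// the JSON mapping defined later in this article.
const ingestClient = new IngestClient(kcsb, new IngestionProperties({
    database: process.env.ADX_DATABASE,   // hypothetical environment variable
    table: "SourceTable",
    format: DataFormat.JSON,
    ingestionMappingReference: "SourceTableReferenceMapping"
}));

exports.handler = async (event) => {
    // CloudWatch delivers each batch as a base64-encoded, gzip-compressed
    // string in event.awslogs.data. No decoding happens here; the payload is
    // wrapped as-is and decoded later by Azure Data Explorer.
    const record = JSON.stringify({ payload: event.awslogs.data });
    await ingestClient.ingestFromStream(Readable.from([record]));
    return { statusCode: 200 };
};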

 

These credentials take the form of an Azure App Registration. If you've never created an application registration or secret before, note that you need an Azure account to do so (check out the bottom of the article for a link to get started). If you want more details about creating your sample application registration, we've provided a quick start here: https://docs.microsoft.com/en-us/azure/active-directory/develop/quickstart-register-app (you'll need to create the application registration and then create a secret to use in your environment variables; your authority is also known as your Tenant ID).

 

One final note: once you deploy your function, consider increasing its timeout to around 10 seconds to account for any network or function latency.

 

Azure Data Explorer

 

As stated above, since you’ll want to keep your Lambda function processing both lightweight and easy to maintain, you don’t want separate functions and services for each log type you want to ingest. Instead, leverage update policies in Azure Data Explorer to take the raw incoming data stream, decode the base64 data, and push it to the appropriate tables within your database. For example, create a landing (raw) table and a table mapping like so:

 

.create table SourceTable (payload:string) 

.create-or-alter table SourceTable ingestion json mapping "SourceTableReferenceMapping" ```
[
{ "column" : "payload", "datatype" : "string", "Properties":{"Path":"$.payload"}}
]```

The first command creates the landing table for the data coming from AWS. Since the data arrives as a single base64-encoded string, you only need one column. The second command creates an ingestion mapping that maps the incoming data to that column.
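
If you want to sanity-check the table and mapping before wiring up the Lambda function, you can ingest a single test record inline. The payload value below is just a placeholder rather than real CloudWatch data, and you can remove the test row afterwards with `.clear table SourceTable data`:

.ingest inline into table SourceTable with (format = "json", ingestionMappingReference = "SourceTableReferenceMapping") <|
{"payload": "<base64-gzip-string-from-cloudwatch>"}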

 

Putting it all together

Once the Lambda function is deployed and the destination table created, turn on the Lambda trigger for your log group. For example, inside your “kusto_log_group” log group, set up a subscription filter to call the Lambda endpoint:

 

[Screenshot: creating a subscription filter on the kusto_log_group log group with the Lambda function as the destination]

 

When setting up the subscription filter, you can leave most of the options at their defaults. If you change the log format, keep that in mind when processing your messages in Azure Data Explorer. For now, if you leave the log format as "Other", the raw message data comes across as-is:

 

[Screenshot: the subscription filter's log format setting left as "Other"]

 

As part of the data ingestion, the gzip_decompress_from_base64_string function is responsible for decoding the payload. You can take this a step further by creating an update policy on the table to automatically take the raw payload and place it into a target table using KQL functions. For instance, if you want to capture the logs of a specific log group, first create a function that queries the raw data and filters for those events:

 

.create-or-alter function ExtractKustoLogGroupLogs() 
{
SourceTable
| project awsObject = todynamic(gzip_decompress_from_base64_string(payload))
| where awsObject.logGroup == "kusto_log_group"
| mv-expand logEvents = awsObject.logEvents
| project message = tostring(logEvents.message), id = tostring(logEvents.id), timestamp = unixtime_milliseconds_todatetime(todouble(logEvents.timestamp))
}
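
Before wiring this function into an update policy, you can run it directly to verify its output against any records that have already landed in SourceTable:

ExtractKustoLogGroupLogs()
| take 10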

Next, create a table to hold the decompressed records:

 

.create table KustoLogGroupTable (message:string, id:string, timestamp:datetime )

Then, on your target table, create an update policy that will execute any time new records are ingested into the source table:

 

.alter table KustoLogGroupTable policy update 
@'[{ "IsEnabled": true, "Source": "SourceTable", "Query": "ExtractKustoLogGroupLogs()", "IsTransactional": true, "PropagateIngestionProperties": false}]'
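
If you want to double-check that the policy is in place, you can inspect it with:

.show table KustoLogGroupTable policy update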

And finally, set the soft-delete retention policy on SourceTable to 0 seconds. This means that after the update policy places the decoded data in the destination table, the “raw” records aren’t retained (and don’t take up unneeded space):

.alter-merge table SourceTable policy retention softdelete = 0s

With all of this in place, when you query the new table, you’ll see your data:

[Screenshot: query results from the KustoLogGroupTable table in Azure Data Explorer]
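
A query as simple as the following is enough to pull back the most recent events:

KustoLogGroupTable
| sort by timestamp desc
| take 100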

 

Try this Today, For Free!

 

If you’re interested in trying this out today, you can start with your own free Azure Data Explorer cluster to host your incoming data. In addition, you’ll need an Azure Active Directory tenant for automating your data ingestion. For more details on getting started with your own free Data Explorer cluster, you can read about Start for Free clusters here. If you need to set up an Azure account (for creating your automation accounts), you can do so by following this link.
