An Approach to Unit Testing ADX Functions

Published Jun 01 2022 08:33 AM

Overview

Our application contains many functions that return data stored in Azure Data Explorer (ADX). We wrote these functions in Kusto Query Language (KQL) and each function returns a set of data based on the arguments passed. Although developers tested these functions as they wrote them, we needed a way to validate that the functions continued to work as the code and the data changed.

Automated unit testing is an essential part of any application development life cycle. It validates that code works as intended and minimizes the risk that future changes will break existing functionality.

In this article, I will discuss the approach we took in automating the testing of ADX functions.

The KQL Assert Function

Azure Data Explorer does not ship with a unit testing framework, but Kusto Query Language includes an assert function that we can use to test functions and queries.

The assert function accepts two arguments: a Boolean condition and a string message. If the condition evaluates to false, the query fails with an error that includes the message; if it evaluates to true, assert returns true.

We can use this to test that our ADX functions perform as expected.
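As a minimal sketch of the assert semantics:

```kusto
// Succeeds and prints true, because the condition holds.
// Changing the condition to 1 > 2 would instead fail the query
// with the message 'expected 1 to be less than 2'.
print assert(1 < 2, 'expected 1 to be less than 2')
```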

Currently, a test will continue to run even after an error occurs; but, a future version of Kusto.CLI will include a switch allowing a script to terminate as soon as an error occurs.

A Sample Test

For example, the following code tests if table1 contains any rows.

let testTitle = 'table1 should return some rows';
let checkfn = (rowCount: long) 
{
  assert(rowCount > 0, testTitle)
};
let results = table1;
print testTitle, checkfn(toscalar(results
    | summarize rowCount = count()
    ))

The code above throws an exception and outputs "table1 should return some rows" if no data is in table1.

If we populate our tables with well-known data, we can be more deliberate in our tests and also verify the flow of data into the table (via update policies or into materialized views). ADX provides the .ingest command, which allows us to script data ingestion.

For small datasets, we can use the .ingest inline command. For larger datasets, we can ingest from Azure storage.
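As a sketch, an inline ingestion of a couple of well-known rows might look like the following (the table name and column order match the AssetHistory example below; the exact schema of your own tables is an assumption):

```kusto
// Hypothetical inline ingestion of two well-known test rows.
// Assumed column order: timeStamp, organizationId, assetId, location.
.ingest inline into table AssetHistory <|
2022-02-08T21:25:03Z,ORG-0001,90000001,"{""lat"":34.13803101,""lon"":-84.13373566}"
2022-02-08T21:25:04Z,ORG-0001,99999934,"{""lat"":34.13801193,""lon"":-84.13371277}"
```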

Assume we have an AssetHistory table with the data below:

| timeStamp | organizationId | assetId | location |
|---|---|---|---|
| 2022-02-08T21:25:03Z | ORG-0001 | 90000001 | {"lat":34.13803101,"lon":-84.13373566} |
| 2022-02-08T21:25:04Z | ORG-0001 | 99999934 | {"lat":34.13801193,"lon":-84.13371277} |
| 2022-02-08T21:25:04Z | ORG-0001 | 99999939 | {"lat":34.06805801,"lon":-84.28517914} |
| 2022-02-08T21:25:06Z | ORG-0001 | 90000003 | {"lat":34.11829376,"lon":-84.24249268} |
| 2022-02-08T21:25:36Z | ORG-0001 | 99999931 | {"lat":43.05483246,"lon":-89.44286346} |
| 2022-02-08T21:27:08Z | ORG-0001 | 99999935 | {"lat":40.12084579,"lon":-75.38314056} |
| 2022-02-08T21:29:28Z | ORG-0001 | 99999936 | {"lat":40.02775955,"lon":-82.80722046} |
| 2022-02-08T21:29:38Z | ORG-0001 | 99991000 | {"lat":43.86902237,"lon":-78.85910034} |
| 2022-02-08T21:30:04Z | ORG-0001 | 99999939 | {"lat":34.06805420,"lon":-84.28518677} |
| 2022-02-08T21:30:04Z | ORG-0001 | 99999934 | {"lat":34.13801193,"lon":-84.13371277} |
| 2022-02-08T21:30:06Z | ORG-0001 | 90000003 | {"lat":34.11829376,"lon":-84.24250031} |
| 2022-02-08T21:32:08Z | ORG-0001 | 99999935 | {"lat":40.12084579,"lon":-75.38313293} |
| 2022-03-12T21:32:08Z | ORG-0001 | 99999935 | {"lat":42.12084579,"lon":-77.38313293} |

The getassetHistory function listed below filters the AssetHistory table by the input arguments. However, it also has some logic to ignore a filter when the corresponding argument is empty or null.

.create-or-alter function
  with (docstring = "Find Asset History of an asset", skipvalidation = "true" )
    getassetHistory(
        organizationIdFilter:string,
        assetIdFilter:string,
        startTimeStamp:datetime,
        endTimeStamp:datetime,
        locationInPolygon:string=""){
        AssetHistory
          | where isnull(startTimeStamp) or isnull(endTimeStamp)
              or timeStamp between (startTimeStamp .. endTimeStamp)
          | where (isempty(organizationIdFilter) or organizationId == organizationIdFilter)
              and assetId == assetIdFilter
          | where isempty(locationInPolygon)
              or geo_point_in_polygon(todouble(location.lon), todouble(location.lat), todynamic(locationInPolygon))
          | project
                timeStamp,
                assetId,
                organizationId,
                location = tostring(location)
    }

We can use assert to write some tests. We know in advance the data in the table, so we know how many rows the function should return for a given set of input arguments. Here are some examples:

let testTitle = 'getassetHistory: brief time range should return 1 row';
let results = getassetHistory('ORG-0001', '99999935', datetime(2022-02-08T21:27:00Z), datetime(2022-02-08T21:28:00Z));
print testTitle, assert(toscalar(results | count) == 1, testTitle)

let testTitle = 'getassetHistory: longer time range should return 3 rows';
let results = getassetHistory('ORG-0001', '99999935', datetime(2022-02-08T21:27:00Z), datetime(2022-03-12T21:35:00Z));
print testTitle, assert(toscalar(results | count) == 3, testTitle)

let testTitle = 'getassetHistory: unknown organizationId should return 0 rows';
let results = getassetHistory('ORG-9999', '99999935', datetime(2022-02-08T21:25:03Z), datetime(2022-02-08T22:33:06Z));
print testTitle, assert(toscalar(results | count) == 0, testTitle)

As you can see, each of the tests above passes a distinct set of arguments to the function, so that each test filters the table for a different time range. Each test validates that the function correctly applies the filters by verifying that it returns the correct number of rows. But it also validates that the function handles things correctly when a null argument is passed or when an argument (e.g., locationInPolygon) is omitted.

We can even test passing structured data, as in this function call, which builds a GeoJSON polygon and passes it to the function (serialized with tostring, since the locationInPolygon parameter is typed string):

let polygon = dynamic({
        "type": "Polygon",
        "coordinates": [ [
            [ -75.8056640625, 40.052847601823984],
            [ -74.970703125, 40.052847601823984 ],
            [ -74.970703125, 40.43022363450862  ],
            [ -75.8056640625, 40.43022363450862 ],
            [ -75.8056640625, 40.052847601823984] ] ]
      });
let testTitle = 'getassetHistory: with Bounding Box polygon should return 2 rows';
let startTime = datetime(2022-01-09T21:32:08Z);
let endTime = datetime(2022-03-12T21:32:08Z);
let results = getassetHistory('ORG-0001', '99999935', startTime, endTime, tostring(polygon));
print testTitle, assert(toscalar(results | count) == 2, testTitle)

Each of the tests described above expects the data to be in a well-known state, so it is important to initialize the data correctly. For our project, we created automated scripts that recreate all the database objects and then populate the tables with test data. We run this script before each test run, so we always start with a clean database. This makes the data predictable and makes the function results deterministic for a given set of inputs.
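A minimal sketch of such a reset script (the table name and schema come from the AssetHistory example above; a real project would recreate every table, function, and policy it depends on):

```kusto
// Drop and recreate the table so every test run starts from a clean state.
.drop table AssetHistory ifexists

.create table AssetHistory (timeStamp:datetime, organizationId:string, assetId:string, location:dynamic)

// Re-ingest the well-known test rows (one shown here for brevity).
.ingest inline into table AssetHistory <|
2022-02-08T21:25:03Z,ORG-0001,90000001,"{""lat"":34.13803101,""lon"":-84.13373566}"
```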

Other Testing Approaches

Alternatively, we can test a query by using the appropriate SDK to call it from a program written in Java, .NET, or some other high-level language. This has the advantage of allowing us to write tests in a robust testing framework, such as xUnit, or JUnit. However, this approach adds complexity to our tests and introduces the possibility that errors may occur in our code outside ADX. Keeping all our tests in ADX tends to reduce the amount of code we need to write and focuses the tests on the KQL functions we have written.

Conclusion

The KQL tests above show how to use the assert function to create automated unit tests that validate that our ADX functions are working correctly. This is a powerful tool for improving and maintaining code quality.

You can read more about this pattern at the Microsoft Documentation site.
