Microsoft Sentinel Blog

13 MIN READ

Guided Hunting Notebook: Base64-Encoded Linux Commands

Microsoft

Aug 10, 2020

MSTIC has observed an increasing number of Linux attackers encoding their scripts into Base64 both for ease of use and to avoid detection. Because the commands are encoded, it can be time-intensive and inefficient to hone in on malicious Base64-encoded commands by looking at raw log data. To solve this problem, we have created a Jupyter notebook that makes this process easier by scoring and ranking Base64 commands found in your Sentinel data. This notebook can be accessed here and will be explained in detail in this blog. A full video walkthrough of the notebook is also available here.

A blog series written last year covers the use of Jupyter notebooks in threat hunting in more detail. Here are links to Part 1, Part 2, and Part 3.

Other notebooks you can access now are available on the Azure Sentinel Notebook Github and cover Windows host exploration, IP Addresses, Domains & URLs, Linux hosts, and much more. These notebooks are all built using Microsoft Threat Intelligence Center’s Python API MSTICpy.

Please note that all notebooks are live on Github and under revision so be aware that the notebooks you use might be slightly different from those described in blog posts.

Intro

Many of our Azure customers use Linux virtual machines, and we are always looking for ways to help our customers in their security investigations. Thus, we’ve been working on expanding coverage on Linux-specific investigations. This Guided Hunting: Base64-Encoded Linux Commands Notebook was created in response to an increasing number of attackers encoding their bash commands into Base64. This is often seen in crypto mining attacks. A specific case of this is discussed and analyzed here.

This notebook attempts to query for and analyze Base64-encoded commands found in execve logs in your Azure Sentinel workspace. It then walks you through an investigation and scoring process that will highlight the commands most likely to be malicious, which can focus your investigation down to individual hosts and lead to further exploration using the Linux Host Explorer Notebook or any other tools and methodologies you prefer.

This blog will walk you through the setup and use of this notebook. It may be helpful to run the notebook yourself as you read through it.

AUOMS Agent Setup

AUOMS is a Microsoft audit collection tool that can collect events from kaudit or auditd/audisp. The MSTIC research branch of it can forward events to Syslog, which can be collected and accessed on Azure Sentinel (see this link for more information on collecting Syslog in Sentinel), making it a great option for collecting command line data and parsing it for Base64 encodings. It can also be used for other command line parsing, which you can explore through your own Jupyter Notebooks or through queries on Sentinel.

AUOMS can be installed and set up through the Linux terminal. You will need to install it on each host you’d like to collect data from in your Log Analytics workspace.

This blog covers the installation and troubleshooting options in detail, as well as how you can create a Log Analytics workspace.

Once you’ve installed AUOMS, you may want to check if the currently distributed version from Log Analytics/Azure Sentinel includes the Syslog output option by default. Do this by running the following code from the terminal of the machine you want to test:

/opt/microsoft/auoms/bin/auomsctl status

This will check if AUOMS running and reveal its version if it is. If the version is 2.1.8 then it should include the Syslog output option. This means that if you put the syslog.conf file from the MSTIC-Research repo into the /etc/opt/microsoft/auoms/output.d directory and restart AUOMS it should start outputting to Syslog.

You may also want to check out some of these parsers to test your system once AUOMS is installed.

Once you have your machines connected to Log Analytics, you can run the parsing queries by following these steps:

Go to the Azure Portal
Search for “Azure Sentinel” in the search bar and press enter
Choose your workspace
Click “Logs” on the sidebar

Type and run your query

This query is available in the SyslogExecve.txt file found in the Git repo.

Setting Up Jupyter Notebooks

You can run this notebook in Azure via Azure Sentinel's notebook feature or locally.

Running the Notebook in Azure

This notebook is available to launch directly in Azure Machine Learning. You may use the steps below to launch. For more information, see this link.

From the Azure Sentinel portal, navigate to the Threat Management section and open the Notebooks blade.
Go to the Templates tab.
Search for and select Guided Hunting: Base64-Encoded Linux Commands.
On the right panel, select Save notebook. You can rename the selected notebook or keep the default name and save it to a default AML workspace. Then select OK.
The notebook is now accessible to your AML workspace. From the same panel, select Launch notebook. Then you are prompted to log into the AML workspace.
In the AML workspace, notice that a Guided Hunting - Base64-Encoded Linux Commands.ipynb file and a config.json file have been automatically generated from step 6 above.
- The Guided Hunting - Base64-Encoded Linux Commands.ipynb file has the main content of the notebook.
- The config.json file has configuration information about your Azure Sentinel environment where your notebook was launched from. It contains tenant_id, subscription_id, resource_group, workspace_id, and workspace_name, which are used for Azure Authentication (see step 9 below).
Select a compute instance for your notebook server. If you don’t have a compute instance, create one by following step 5 in Launch a notebook using your Azure ML workspace.

Running the Notebook Locally

If you have never run a Jupyter notebook or are unfamiliar with the process, this article walks through installing and running Jupyter Notebook and Anaconda on your computer.

For a more friendly user interface, you can also try running Jupyter Lab locally. Jupyter Lab is included as an application within your Anaconda installation. By running Anaconda Navigator on your machine (search for “Anaconda Navigator” in your Start search bar), you should be able to choose between the two in the GUI.

It may be best to clone the Azure Sentinel Notebook repo first so you can use the guides to open up the Base64 Notebook when the installation is finished.

Configuring the Notebook

Once you have set up your Notebook environment, open up the Guided Hunting: Base64-Encoded Linux Commands notebook.

If you are unfamiliar with using a Jupyter notebook, you may want to take a look at the Getting Started with Azure Sentinel Notebooks notebook first. This notebook will guide you through the basics of Jupyter Notebook features and shortcuts, as well as some configuration tips and uses of notebooks in threat hunting.

The top of the notebook describes its purpose: to find and rank malicious command line data that has been encoded in Base64, with the hope that this will launch further investigation.

It also lists the Python edition and libraries that will be used and the other data sources that will be included in the investigation. Running the first cell in the notebook will confirm that all of the technical configurations are in place. If not, it lists options for troubleshooting.

We also recommend that at this point you fill out the msticpyconfig.yaml file found in the Azure Sentinel Notebooks folder. See the MSTICpy documentation if you need help.

At minimum, the notebook will require you to fill out the Azure Sentinel default workspace WorkspaceId and TenantId.

The WorkspaceId can be found on the Workplace Settings tab of Azure Sentinel.

The WorkspaceID will be on the first page once you’re there.

The TenantID can be found on Azure Active Directory
- Search for Azure Active Directory in the Azure Portal and press enter.

Your Tenant ID will be in the information box on the first page.

Threat Intelligence Configuration

To utilize Threat Intelligence in the notebook, you will also need to fill out TI Providers Section with at least one provider. In this blog's examples, we show how to use AlienVault OTX and IBM XForce, as they accept many queries at once. VirusTotal is also shown since it is commonly used for TI investigations. Note, however, that it may not process multiple queries as quickly so results will be varied.

You can create free accounts on the following sites if you don’t have an account already.

To find your AuthKey, navigate to the API page sections/settings on these websites once you’ve made an account.

Connecting to Azure Sentinel

The next two cells will authenticate you with Azure Sentinel using MSTICpy’s data connector features. If you haven’t included your workplace ID and tenant ID in the msticpyconfig.yaml file, it will prompt you for them when you run the first cell. With this information, it will authenticate you with Log Analytics.

The second cell will authenticate you with your Workspace. Running it will load a feature called KQLMagic that will give you a code to paste into a separate browser window that will open. You can then log in with the appropriate Workspace credentials.

Set Time Parameters

After running the cell in the Set Time Parameters section following your authentication, you will be prompted to enter a time range using a slider and some input boxes. This is the time range the KQL query will use to search for Base64 commands so set it to the range you are investigating. Consider how many machines and how much data you will be querying when you set the time range, as that will affect performance.

Get Base64 Commands

In this section, the cell will run a query using the MSTICpy KQL functionalities and return a Python Pandas data frame you will be able to explore, similar to what you would see if you ran the same query in Log Analytics in your workspace on Sentinel.

You can edit this KQL query as if you were in Log Analytics if you want to specify a type of Base64 command or if you choose not to use AUOMS but have another equivalent data source. Note that the rest of the notebook relies upon the projected columns in their current format so if you make any changes, try not to change these columns. If you do, you will have to edit much of the rest of the notebook as well to accommodate those changes.

Categorize Decoded Base64 Commands

The cell in this section will tag all the commands pulled from the query with its relevant categories. These categories are based on the bash commands and files present in the decoded commands and what they tend to be used for.

Network connections/downloading: wget, curl, urllib.urlopen
File manipulation: chmod, chattr, touch, cp, mv, ln, sed, awk, echo
Host enumeration: uname, grep, /proc/cpuinfo
File/process deletion/killing: rm, pkill
Archive/compression programs: tar, zip, gzip, bzip2, lzma, xz

Consider this a basic list of categories and commands to get started. If there are other categories or commands/file names/processes you want to look for and categorize here, feel free to add them to the code. This can be especially useful if you are looking for a specific type of command as you can create your own standards.

After the cell is run, it will display all unique decoded commands found and a list of their categories. This can help you focus your investigation if you’re looking for a certain pattern of behavior.

GTFOBins Classification

This is similar to the “Categorize Decoded Commands” section above, but uses the open source GTFOBins. GTFOBins is a vetted collection of bash commands frequently exploited by attackers as well as a reference as to how those commands may be used. We are using it to find potentially exploited commands in the dataset and tag those with their corresponding functionalities.

The first cell in this section will print out GTFO Functions and what they mean if you are unfamiliar. It may also help to take a look at the GTFOBins website itself.

Running the second cell will create two new columns in the data frame, “GTFO Bins” and “GTFO Functions,” and display these. The GTFO Bins column will consist of links to more information on the GTFOBins website.

Be aware that retrieving GTFO Functions may take a bit longer to run.

Generating Scores and Rankings

The cells under this section generate scores for each unique Base64 command based on criteria such as frequency of the command, severity of TI lookup results, and related commands run. Each score is added to the data frame at the end, so you can view and rank each score individually or by the aggregate score.

These scores are not intended to be an exact scoring of whether a command is malicious. Rather, they provide a guide to analysts as to which commands to prioritize for investigation. They do not represent any mathematical value and are not calculated in comparison to any particular number other than each other. In general, higher scores are more likely to be malicious commands.

If you have your own ideas for how you want to score or rank, you can adjust the code in the following sections to do so.

Frequency Analysis

This cell creates a frequency score for each command based on its frequency within the workspace and within its host.

This is calculated as follows: [(1 / frequency in workspace) + (1 / frequency in its host)] / 2.

1 divided by the frequency numbers results in bigger numbers for commands that occur fewer times, which gives rare commands higher scores. Dividing by 2 normalizes this score to be under 1.

The data frame is shown again at the bottom, with frequency score added to the total score count.

IoC Extraction

This section extracts Indicators of Compromise (IoCs) from the command lines using MSTICPy’s IoC extraction features. It will display the observables found (IPv4, IPv6, DNS, URL, Windows path, Linux path, MD5 hash, SHA1 hash, and SHA256 hash), the IoC types, and the original source indices of the rows in the previous data frame it was found in.

If you are interested in an observable pattern that is not being matched, see the MSTICpy IoC Extraction documentation for more information on how to add your own.

You can use this data frame to get a quick glimpse of the unique IoCs found. This can be helpful if you are looking for something specific or to give another clue as to what the encoded command was doing.

This data will also be merged with the original data frame so you will be able to see which IoCs came from which commands.

Threat Intel Lookup

This section allows you to use outside TI data to take in the IoCs you want to look up and get TI information on them.

The first cell in this section will print out your current TI configurations, which you can set up in msticpyconfig.yaml. This was covered earlier in the blog in the “Configuring the Notebook” section. Once you have this set up, run the second cell and use the select tool to choose which providers you want to use.

Then, use the select widget created by the third cell to pick the IoCs you want to look up. Run the next cell in this section for a data frame of TI results from different providers.

The last cell will calculate severity scores based on how the providers ranked each IoC. Scores are calculated by looking up each unique IoC in each command, and adding the score corresponding to the highest severity rating given out of the providers you used. Commands with more high severity IoCs will score higher on this.

Related Alerts

This section scores related Azure Sentinel alerts that took place in the given time frame.

If you know of a time frame that you’re interested in, whether from perusing the data or from prior knowledge, you can use the time selector to focus in on this.

Run the first cell to select a time in the widget. The second cell will display a list of alert types found, their names, and how many times they occurred in that time frame for each host. There will also be a data frame with in-depth information on each of the alerts found in each host. Finally, there will be a timeline that illustrates when alerts occurred within the time frame. This is done through the use of MSTICpy’s related alerts feature.

The final cell in this section will return a further revised Base64 command data frame with Alert Scores, as well as the previous TI Severity Scores and Frequency Scores, both individually and added to Total Scores.

Scoring will be done based on the presence of high severity alerts found: each unique alert will add points to the score corresponding to severity, with high severity alerts scoring higher.

Final Scores and Ranking

The final section of the notebook brings together all the previous scoring and provides you with a final ranking of Base64 encoded commands that are most likely to be malicious.

Customizing Your Data Frame

You can use the selector widget from the first cell to choose which columns you’d like to view. The second cell will show those columns with the highest total scoring commands at the top. The widgets generated here allow you to choose a cutoff score for any of the scores displayed. The data frame also shows the relative makeup of each total score.

The bar chart created underneath further enhances this perception and allows you to see which factors led commands to score the way they did. The x-axis highlights the indices, so you can view the rest of the information by finding that index in the data frame.

Overall Timeline

The final cell in the section creates a timeline out of all the base 64 command data retrieved. Mouse over each dot on the timeline for information on which command it referred to and its categories. This may be helpful in gaining a better understanding of possible attacker behavior and noting anomalies.

Summary

Base64 commands are increasingly used by attackers on Linux hosts and are often uncaught by traditional detections. It can be helpful, when hunting or investigating, to take a deeper look at Base64-encoded commands.

This notebook creates a method for investigators to understand Base64 encoded activity in their workspaces by providing an accessible way to rank encoded commands that are most likely to be malicious and multiple avenues through which to view and explore this data. Other notebooks and methodologies can be helpful for deeper explorations. You can find more blogs on these here.

If you are interested in building your own notebooks in the TI space, you may also want to take a look at MSTICpy, which offers plenty of tools and features for you to easily explore your datasets. Feel free to let the MSTIC team know if you encounter any issues or have any suggestions and contributions to make to our notebooks and MSTICpy.