Microsoft Tech Community
msticpy - Python Defender Tools
Published Jun 17 2019 08:27 AM 15.5K Views
Microsoft

Introduction

This article has been superseded by a newer version - please see the "MSTICPy and Jupyter Notebooks in Azure Sentinel" article.

 

msticpy is a package of python tools intended to be used for security investigations and hunting (primarily in Jupyter notebooks). Most of the tools originated from code written in Jupyter notebooks which was tidied up and re-packaged into python modules. I’ve added some references to other blogs in the References section, where I describe some of these notebooks in more detail.

 

The goals of the package are twofold:

  1. Reduce the clutter of code in notebooks making them easier to use and read.
  2. Provide building-blocks for future notebooks to make authoring them simpler and quicker.

There are some side benefits from this:

  • The functions and classes are easier to test when extracted into standalone modules, so (hopefully) they are more robust.
  • The code is easier to document, and the functionality is more discoverable than having to wade through old notebooks and copy and paste the desired functions.

While much of the functionality is only useful in Jupyter notebooks (e.g. much of the nbtools sub-package), there are several modules that are usable in any python application - most of the modules in the sectools sub-package fall into this category.

 

msticpy is organized into three main sub-packages:

  • sectools - python security tools to help with data analysis or investigation. These are all focused on data transformation, data analysis or data enrichment.
  • nbtools - Jupyter-specific UI tools such as widgets and data display. These are mostly presentation-layer tools concentrating on how to view or interact with the data.
  • data - data interfaces and query library for log and alert APIs including Azure Sentinel/Log Analytics, Microsoft Graph Security API and Microsoft Defender Advanced Threat Protection (MDATP).

The package is still in an early preview mode so there are likely to be bugs, possible API changes and much is not yet optimized for performance. We welcome feedback, bug reports and suggestions for new or improved features as well as contributions directly to the package.

 

In this article I'll give a brief overview of the main components. This is intended as an overview of some of the features rather than a full user guide. Although the modules/functions/classes are documented at the API level, we are still missing more detailed user guidance. In future blogs I will drill down into some of the specific components to describe their use (and limitations) in more detail, which will help fill some of this gap. Some of the modules have user documentation notebooks, which are listed in the References section at the end of the document. The API documentation is available on msticpy ReadTheDocs.

 

Request for Comments

 

We would really appreciate suggestions for future or better features. You can add these in comments to this doc or directly as issues on the msticpy GitHub.

 

Installing

 

The package requires Python 3.6 or later (see Supported Platforms for more details).

pip install msticpy

or for the latest dev build (although usually we publish direct to PyPI)

pip install git+https://github.com/microsoft/msticpy

A conda recipe and package is in the works but not yet available.

Installing the package will also install its dependencies if the required versions are not already installed. If you are installing into an environment where you already use some of these dependencies (especially if you are using conflicting versions), you should create a python or conda virtual environment and use your notebooks from within that.

 

Security Tools Sub-package - sectools

 

This sub-package contains several modules helpful for working on security investigations and hunting. These are mostly data processing modules and classes and usually not restricted to use in a Jupyter/IPython environment (some of the modules have a visualization component that may not work outside a notebook environment).

 

base64unpack

This is a Base64 and archive (gz, zip, tar) extractor intended to help decode obfuscated attack command lines and http request strings. Input can either be a single string or a specified column of a pandas dataframe. The module will try to identify any base64 encoded strings and decode them. If the result of a decoding looks like one of the supported archive types, it will try to unpack the contents. The results of each decode/unpack are rechecked for further base64 content and it will recurse down up to 20 levels (the default can be overridden, but if you need more than 20 levels, there is probably something wrong!). Output is a decoded string (for single string input) or a DataFrame (for dataframe input).

Base64unpack.png
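The recursive decode idea can be illustrated with a short standalone sketch. This is not the base64unpack code: the names `B64_PATTERN`, `MAX_DEPTH` and `unpack_b64` are made up for this example, it only handles base64 that decodes to UTF-8 text, and it skips the archive unpacking step entirely.

```python
import base64
import binascii
import re

# Candidate runs: 16+ characters from the base64 alphabet plus optional padding.
# The minimum length is an arbitrary choice for this sketch.
B64_PATTERN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")
MAX_DEPTH = 20  # mirrors the 20-level limit described above


def _try_decode(candidate):
    """Return the decoded text, or None if this is not valid base64/UTF-8."""
    if len(candidate) % 4 != 0:
        return None
    try:
        return base64.b64decode(candidate, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None


def unpack_b64(text, depth=0):
    """Replace decodable base64 runs with their decoded text, recursively."""
    if depth >= MAX_DEPTH:
        return text

    def _replace(match):
        decoded = _try_decode(match.group(0))
        if decoded is None:
            return match.group(0)  # leave anything we cannot decode untouched
        return unpack_b64(decoded, depth + 1)  # re-check the decoded content

    return B64_PATTERN.sub(_replace, text)
```

Calling `unpack_b64` on a command line that contains a doubly encoded payload returns the same command line with the innermost plain text substituted in place.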

 

iocextract

This uses a set of built-in regular expressions to look for Indicator of Compromise (IoC) patterns. Input can be a single string or a pandas dataframe with one or more columns specified as input. You can add additional patterns and override built-in patterns.

The following types are built-in: IPv4 and IPv6, URLs, DNS domains, Hashes (MD5, SHA1, SHA256), Windows file paths and Linux file paths (this latter regex is kind of noisy because a legal linux file path can have almost any character). The two path regexes are not run by default.

 

Output is a dictionary of matches (for single string input) or a DataFrame (for dataframe input).

ioc_extract.png
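As a rough illustration of the approach (not the iocextract implementation), the sketch below runs a small table of regular expressions over a string and returns a dictionary of matches. The pattern names, the simplified regexes and the `extract_iocs` function are assumptions made for this example; the real module's patterns are more thorough.

```python
import re

# Simplified, illustrative patterns only.
IOC_PATTERNS = {
    "ipv4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
    "url": re.compile(r"\bhttps?://[^\s\"'<>]+", re.IGNORECASE),
    "md5_hash": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "sha256_hash": re.compile(r"\b[a-fA-F0-9]{64}\b"),
}


def extract_iocs(text, extra_patterns=None):
    """Return {ioc_type: set of matches} for every pattern that matched.

    extra_patterns maps a name to a regex string; a name that already
    exists in IOC_PATTERNS overrides the built-in pattern.
    """
    patterns = dict(IOC_PATTERNS)
    if extra_patterns:
        patterns.update(
            {name: re.compile(regex) for name, regex in extra_patterns.items()}
        )
    results = {}
    for ioc_type, regex in patterns.items():
        matches = set(regex.findall(text))
        if matches:
            results[ioc_type] = matches
    return results
```

Passing `extra_patterns` shows how adding or overriding a pattern can work: the caller's dictionary is merged over the built-in table before matching.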

 

vtlookup

Wrapper class around the VirusTotal API. Input can be a single IoC observable or a pandas DataFrame containing multiple observables. Processing requires a VirusTotal account and API key, and processing performance is limited by the number of requests per minute allowed for your account type. For example, a VirusTotal free account is limited to 4 requests per minute. Supported IoC types are: Filehash (MD5, SHA1, SHA256), URL, DNS Domain, IPv4 Address.

vt_lookup.png

 

geoip

Geographic location lookup for IP addresses is implemented as a generic class with support for different data providers. The shipped module has two data providers:

  • Maxmind geolite
  • IPStack

Both services offer a free tier for non-commercial use. However, a paid tier will normally get you more accuracy, more detail and a higher throughput rate. Maxmind geolite uses a downloadable database, while IPStack is an online lookup (an account and API key are required).
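The "generic class with pluggable providers" design could look something like the sketch below. The class and function names (`GeoIpLookup`, `StaticTableLookup`, `locate_all`) are invented for this illustration and are not the msticpy API, and the stand-in provider uses an in-memory table with made-up location data instead of calling Maxmind or IPStack.

```python
import ipaddress
from abc import ABC, abstractmethod


class GeoIpLookup(ABC):
    """Common interface that every provider implements."""

    @abstractmethod
    def lookup_ip(self, ip_address):
        """Return a dict of location details for one IP address."""


class StaticTableLookup(GeoIpLookup):
    """Stand-in provider backed by an in-memory {network: location} table."""

    def __init__(self, table):
        self._table = [
            (ipaddress.ip_network(network), location)
            for network, location in table.items()
        ]

    def lookup_ip(self, ip_address):
        address = ipaddress.ip_address(ip_address)
        for network, location in self._table:
            if address in network:
                return {"ip": ip_address, **location}
        return {"ip": ip_address}  # no location data for this address


def locate_all(provider, ip_addresses):
    """Calling code depends only on the interface, not on a specific provider."""
    return [provider.lookup_ip(ip) for ip in ip_addresses]
```

Because `locate_all` only relies on `lookup_ip`, swapping a database-backed provider for an online one would not require changes to the calling notebook code.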

 

The following screen shot shows both the use of the GeoIP lookup classes and map display with another msticpy module using folium (a python package using leaflet.js).

geo_ip.png

 

eventcluster

This module is intended to be used to summarize large numbers of events into clusters of different patterns. High volume repeating events can often make it difficult to see unique and interesting items.

The module contains functions to generate clusterable features from string data. For example, an administration command that does some maintenance on thousands of servers with a commandline such as:

install-update -hostname {host.fqdn} -tmp:/tmp/{some_GUID}/rollback

These repetitions can be collapsed into a single cluster pattern by ignoring the character values in the string and using delimiters or tokens to group the values.

This module uses an unsupervised learning model implemented using scikit-learn DBSCAN.

event_cluster.png
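To make the feature idea concrete, here is a small sketch that turns a command line into numeric features based only on its delimiters and tokens, ignoring the actual character values. The function names and the choice of delimiters are assumptions for this example. It also groups by exact feature match, which is a simplification: the module itself uses DBSCAN so that events with similar, not just identical, features end up in the same cluster.

```python
import re
from collections import defaultdict

# Characters treated as delimiters in this sketch.
DELIMITERS = re.compile(r"[\s\-/\\.,:;=()\[\]{}]")


def command_features(cmd):
    """Return (token count, delimiter count) for a command line."""
    tokens = [token for token in DELIMITERS.split(cmd) if token]
    delimiter_count = len(DELIMITERS.findall(cmd))
    return (len(tokens), delimiter_count)


def group_by_features(commands):
    """Group command lines that share exactly the same feature tuple."""
    groups = defaultdict(list)
    for cmd in commands:
        groups[command_features(cmd)].append(cmd)
    return dict(groups)
```

Two `install-update` command lines that differ only in host name and GUID produce the same feature tuple and so collapse into one group, while an unrelated command lands in a group of its own.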

 

outliers

Similar to the eventcluster module but a little bit more experimental (read 'less tested'). It uses SciKit Learn Isolation Forest to identify outlier events in a single data set or using one data set as training data and another on which to predict outliers.

 

auditdextract

Module to load and decode Linux audit logs. It collapses messages sharing the same message ID into single events, decodes hex-encoded data fields and performs some event-specific formatting and normalization (e.g. for process start events it will re-assemble the process command line arguments into a single string).
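The processing steps above can be sketched in a few lines. This is a simplified illustration rather than the auditdextract code: the regular expressions and function names are my own, only EXECVE records get special handling, and the sample records in the usage note are hand-written in the usual auditd `type=... msg=audit(timestamp:id): key=value` layout.

```python
import re
from collections import defaultdict

LINE_RX = re.compile(
    r"type=(?P<type>\S+) msg=audit\((?P<ts>[\d.]+):(?P<id>\d+)\): (?P<body>.*)"
)
FIELD_RX = re.compile(r'(\w+)=("[^"]*"|\S+)')


def _decode_arg(value):
    """Quoted arguments are literal; unquoted ones are treated as hex-encoded."""
    if value.startswith('"'):
        return value.strip('"')
    try:
        return bytes.fromhex(value).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return value  # not valid hex after all, keep the raw value


def parse_audit_events(lines):
    """Merge records sharing a message ID and rebuild EXECVE command lines."""
    events = defaultdict(dict)
    for line in lines:
        match = LINE_RX.match(line)
        if not match:
            continue
        event = events[match["id"]]
        event["timestamp"] = float(match["ts"])
        fields = dict(FIELD_RX.findall(match["body"]))
        if match["type"] == "EXECVE":
            argc = int(fields.get("argc", 0))
            args = [
                _decode_arg(fields[f"a{i}"])
                for i in range(argc)
                if f"a{i}" in fields
            ]
            event["cmdline"] = " ".join(args)
        else:
            for key, value in fields.items():
                event[key] = value.strip('"')
    return dict(events)
```

Given a SYSCALL record and an EXECVE record with the same message ID, where the last argument is hex-encoded because it contains a space, the result is a single event whose `cmdline` field holds the reassembled command.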

 

The following figures show examples of raw audit messages and converted messages (these are two different event sets, so don’t show the same messages).

auditd_raw.png

 

auditd_processed.png

 

 

Notebook tools sub-package - nbtools

 

This is a collection of display and utility modules designed to make working with security data in Jupyter notebooks quicker and easier.

  • nbwidgets - groups common functionality such as list pickers, time boundary settings, saving and retrieving environment variables into a single line callable command. In most cases these are simple wrappers and collections of the standard IPyWidgets.
  • nbdisplay - functions that implement common display of things like alerts, events in a slightly prettier and more consumable way than print().

 

nbwidgets

 

Query time selector

 

query_time_widget.png

Session browser

 

session_browser.png

 

Alert browser

alert_selector.png

 

nbdisplay

 

Event timeline

 

event_timeline.png

 

Logon display

 

logon_display.png

 

Process Tree

 

process_tree.png

 

Data sub-package - data

 

Some of these components are currently part of the nbtools sub-package but will be migrated to the data sub-package.

 

Parameterized query manager

This is a collection of modules that includes a set of commonly used queries, which can be supplemented by user-defined queries supplied in yaml files. The purpose of these is to give you quick access to commonly used queries in a way that allows easy substitution of parameter values such as date range, host name, account name, etc. The package currently supports Kusto query language (KQL) queries targeted at Log Analytics and OData queries targeted at Microsoft Graph Security API. We are building driver modules to work with the Microsoft Defender Advanced Threat Protection API and, in principle, the query manager could be extended to cover any query expressed as a simple string expression. The architecture and yaml format were inspired by the Intake package, although some of the parameter substitution gymnastics meant that I was not able to use this package directly.
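The core idea, a named query template plus default and caller-supplied parameters, can be shown with a minimal sketch. Everything here is illustrative: `QUERY_DEFS`, `build_query` and the KQL text are assumptions for this example (the real definitions live in yaml files and the query manager does considerably more).

```python
from datetime import datetime

# A hypothetical in-memory query definition; msticpy reads these from yaml.
QUERY_DEFS = {
    "list_host_logons": {
        "description": "Logon events for a host within a time range",
        "defaults": {"table": "SecurityEvent"},
        "query": (
            "{table}\n"
            "| where EventID == 4624\n"
            "| where Computer has '{host_name}'\n"
            "| where TimeGenerated >= datetime({start})\n"
            "| where TimeGenerated <= datetime({end})"
        ),
    }
}


def build_query(name, **params):
    """Substitute defaults and caller-supplied parameters into a template."""
    definition = QUERY_DEFS[name]
    merged = {**definition["defaults"], **params}
    # Render datetime values in ISO format so they fit a KQL datetime() literal.
    values = {
        key: value.isoformat() if isinstance(value, datetime) else value
        for key, value in merged.items()
    }
    try:
        return definition["query"].format(**values)
    except KeyError as err:
        raise ValueError(f"Missing query parameter: {err.args[0]}") from err
```

Calling `build_query("list_host_logons", host_name=..., start=..., end=...)` returns the finished query string, and leaving out a required parameter raises a `ValueError` that names it.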

 

Sample query definition

 

yaml_query_definition.png

 

Query provider setup

 

query_provider_setup.png

 

Running a query

 

running_query.png

 

Note: the parameters for the query are auto-extracted from the query_times date widget object.

 

Other Modules

 

security_alert and security_event

These are encapsulation classes for alerts and events. Each has a standard 'entities' property reflecting the entities found in the alert or event. These can also be used as meta-parameters for many of the queries. For example, the query:

qry.list_host_logons(query_times, alert)

will extract the value for the hostname query parameter from the alert.
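One way this kind of meta-parameter lookup could work is to search the objects passed to the query for attributes whose names match the query's required parameters. The sketch below is an assumption about the general technique, not a description of the msticpy internals; `resolve_params` and the stand-in objects are invented for the example.

```python
from datetime import datetime
from types import SimpleNamespace


def resolve_params(required, *sources, **explicit):
    """Resolve each required parameter name to a value.

    Explicit keyword arguments win; otherwise the first source object
    that has an attribute with the parameter's name supplies the value.
    """
    resolved = {}
    for name in required:
        if name in explicit:
            resolved[name] = explicit[name]
            continue
        for source in sources:
            if hasattr(source, name):
                resolved[name] = getattr(source, name)
                break
        else:
            raise ValueError(f"No value found for parameter '{name}'")
    return resolved


# Stand-ins for a time-range widget and an alert object.
query_times = SimpleNamespace(start=datetime(2019, 6, 1), end=datetime(2019, 6, 2))
alert = SimpleNamespace(host_name="web01", alert_name="Suspicious logon")

params = resolve_params(["host_name", "start", "end"], query_times, alert)
```

Here `params` picks up `start` and `end` from the time-range object and `host_name` from the alert, which is the behaviour the `qry.list_host_logons(query_times, alert)` call above relies on.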

 

entityschema

This module implements entity classes (e.g. Host, Account, IPAddress, etc.) used in Log Analytics alerts and in many of these modules. Each entity encapsulates one or more properties related to the entity. This example shows a Linux alert with the related entities.

entity_view.png

 

To-Do Items

 

Some of the items on our to-do list are shown below. However, other things requested by popular demand or contributed by others can certainly change this.

  • Create generic Threat Intel lookup interface supporting multiple providers.
  • Add additional modules for host-to-ip and ip-to-host resolution.
  • Add syslog queries, processing and visualizations.
  • Add network queries, processing and visualizations.
  • Add additional notebooks to document use of the tools.

 

Supported Platforms and Packages

 

  • msticpy is OS-independent
  • Requires Python 3.6 or later
  • Requires the following python packages: pandas, bokeh, matplotlib, seaborn, setuptools, urllib3, ipywidgets, numpy, attrs, requests, networkx, ipython, scikit_learn, typing
  • The following packages are recommended and needed for some specific functionality: Kqlmagic, maxminddb_geolite2, folium, dnspython, ipwhois

Contributing to msticpy

 

msticpy is intentionally an open source package so that it is available to be used as-is or in modified form by anyone who wants to. We also welcome contributions – whether these are whole features, extensions of existing features, bug-fixes or additional documentation.

 

I’m a little finicky about code hygiene so I would (politely) ask the following for potential contributors:

  • Include doc comments in all modules, classes, public functions and public methods. Please use numpy docstring standard for consistency and to allow our auto-documentation to work well.
  • We are converting to Black code formatting throughout the project. This will happen whether you format your code like this or not. 😊
  • Type annotations are a great thing. History and I will thank you for adding type annotations. See this section of the docs for more information.
  • We write unit tests using Python unittest format but run these with pytest. Please add unit tests for any substantial PRs – and please make sure that the existing unit tests complete successfully.
  • Linters and other stuff. Committed branches will kick off tests and linting in the Azure build pipeline. Many of these are non-breaking (i.e. your build will complete with warnings) but please try to avoid introducing any new warnings (I’m having a hard enough time fixing my own warnings!). Using pylint, prospector, mypy and pydocstyle is a good minimum combination.

 

References

 

Notebook blogs

Notebooks illustrating uses of msticpy:

User Documentation notebooks

 

Last update:
‎Apr 26 2021 12:36 PM