msticpy is a package of python tools intended to be used for security investigations and hunting (primarily in Jupyter notebooks). Most of the tools originated from code written in Jupyter notebooks which was tidied up and re-packaged into python modules. I’ve added some references to other blogs in the References section, where I describe some of these notebooks in more detail.
The goals of the package are twofold:
There are some side benefits from this:
While much of the functionality is only useful in Jupyter notebooks (e.g. much of the nbtools sub-package), there are several modules that are usable in any python application - most of the modules in the sectools sub-package fall into this category.
msticpy is organized into three main sub-packages:
The package is still in an early preview mode so there are likely to be bugs, possible API changes and much is not yet optimized for performance. We welcome feedback, bug reports and suggestions for new or improved features as well as contributions directly to the package.
In this article I'll give a brief overview of the main components. This is intended as an overview of some of the features rather than a full user guide. Although the modules/functions/classes are documented at the API level, we are still missing more detailed user guidance. In future blogs I will drill down into some of the specific components to describe their use (and limitations) in more detail, which will help fill some of this gap. Some of the modules have user document notebooks, which are listed in the References section at the end of the document. The API documentation is available on mstipy ReadTheDocs.
We would really appreciate suggestions for future or better features. You can add these in comments to this doc or directly as issues on the msticpy GitHub.
The package requires Python 3.6 or later (see Supported Platforms for more details).
pip install msticpy
or for the latest dev build (although usually we publish direct to PyPi)
pip install git+https://github.com/microsoft/msticpy
A conda recipe and package is in the works but not yet available.
Installing the package will also install dependencies if required versions of these are not already installed. If you are installing into an environment where you are using some of these dependencies (especially if you are using conflicting versions), you should to create a python or conda virtual environment and use your notebooks from within that.
This sub-package contains several modules helpful for working on security investigations and hunting. These are mostly data processing modules and classes and usually not restricted to use in a Jupyter/IPython environment (some of the modules have a visualization component that may not work outside a notebook environment).
base64unpack
This is a Base64 and archive (gz, zip, tar) extractor intended to help decode obfuscated attack command lines and http request strings. Input can either be a single string or a specified column of a pandas dataframe. The module will try to identify any base64 encoded strings and decode them. If the result of a decoding looks like one of the supported archive types, it will try to unpack the contents. The results of each decode/unpack are rechecked for further base64 content and it will recurse down up to 20 levels (the default can be overridden, but if you need more than 20 levels, there is probably something wrong!). Output is to a decoded string (for single string input) or a DataFrame (for dataframe input).
iocextract
This uses a set of built-in regular expressions to look for Indicator of Compromise (IoC) patterns. Input can be a single string or a pandas dataframe with one or more columns specified as input. You can add additional patterns and override built-in patterns.
The following types are built-in: IPv4 and IPv6, URLs, DNS domains, Hashes (MD5, SHA1, SHA256), Windows file paths and Linux file paths (this latter regex is kind of noisy because a legal linux file path can have almost any character). The two path regexes are not run by default.
Output is a dictionary of matches (for single string input) or a DataFrame (for dataframe input).
vtlookup
Wrapper class around Virus Total API. Input can be a single IoC observable or a pandas DataFrame containing multiple observables. Processing requires a Virus Total account and API key and processing performance is limited to the number of requests per minute for the account type that you have. For example a VirusTotal free account is limited to 4 requests per minute. Supported IoC types are: Filehash (MD5, SHA1, SHA256), URL, DNS Domain, IPv4 Address.
geoip
Geographic location lookup for IP addresses is implemented as generic class with support for different data providers. The shipped module has two data providers:
Both services offer a free tier for non-commercial use. However, a paid tier will normally get you more accuracy, more detail and a higher throughput rate. Maxmind geolite uses a downloadable database, while IPStack is an online lookup (an account and API key are required).
The following screen shot shows both the use of the GeoIP lookup classes and map display with another msticpy module using folium (a python package using leaflet.js)
eventcluster
This module is intended to be used to summarize large numbers of events into clusters of different patterns. High volume repeating events can often make it difficult to see unique and interesting items.
The module contains functions to generate clusterable features from string data. For example, an administration command that does some maintenance on thousands of servers with a commandline such as:
install-update -hostname {host.fqdn} -tmp:/tmp/{some_GUID}/rollback
These repetitions can be collapsed into a single cluster pattern by ignoring the character values in the string and using delimiters or tokens to group the values.
This module uses an unsupervised learning module implemented using SciKit Learn DBScan.
outliers
Similar to the eventcluster module but a little bit more experimental (read 'less tested'). It uses SciKit Learn Isolation Forest to identify outlier events in a single data set or using one data set as training data and another on which to predict outliers.
auditdextract
Module to load and decode Linux audit logs. It collapses messages sharing the same message ID into single events, decodes hex-encoded data fields and performs some event-specific formatting and normalization (e.g. for process start events it will re-assemble the process command line arguments into a single string).
The following figures shows examples of raw audit messages and converted messages (these are two different event sets, so don’t show the same messages).
This is a collection of display and utility modules designed to make working with security data in Jupyter notebooks quicker and easier.
Query time selector
Session browser
Alert browser
Event timeline
Logon display
Process Tree
Some of these components are currently part of the nbtools sub-package but will be migrated to the data sub-package.
This is a collection of modules that includes a set of commonly used queries and can be supplemented by user-defined queries supplied in yaml files. The purpose of these is to give you quick access to commonly used-queries in a way that allows easy substitution of parameter values such as date range, host name, account name, etc. The package current supports Kusto query language (KQL) queries targeted at Log Analytics and OData queries targeted at Microsoft Graph Security API. We are building driver modules to work with Microsoft Defender Advanced Threat Protection API and, in principle could be extended to cover queries expresses as a simple string expression. The architecture and yaml format was inspired by the Intake package – although some of the parameter substitution gymnastics meant that I was not able to use this package directly.
Sample query definition
Query provider setup
Running a query
Note: the parameters for the query are auto-extracted from the query_times date widget object.
security_alert and security_event
These are encapsulation classes for alerts and events. Each has a standard 'entities' property reflecting the entities found in the alert or event. These can also be used as meta-parameters for many of the queries. For example, the query:
qry.list_host_logons(query_times, alert)
will extract the value for the hostname query parameter from the alert.
entityschema
This module implements entity classes (e.g. Host, Account, IPAddress, etc.) used in Log Analytics alerts and in many of these modules. Each entity encapsulates one or more properties related to the entity. This example shows a Linux alert with the related entities.
Some of the items on our to-do list are shown below. However, other things requested by popular demand or contributed by others can certainly change this.
msticpy is intentionally an open source package so that it is available to be used as-is or in modified form by anyone who wants to. We also welcome contributions – whether these are whole features, extensions of existing features, bug-fixes or additional documentation.
I’m a little finicky about code hygiene so I would (politely) ask the following for potential contributors:
Notebook blogs
Notebooks illustrating uses of msticpy:
User Documentation notebooks
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.