We published an overview of MSTICPy 18 months ago and a lot has happened since then with many changes and new features. We recently released 1.0.0 of the package (it's fashionable in Python circles to hang around in "beta" for several years) and thought that it was time to update the aging Overview article.
MSTICPy is a package of Python tools for security analysts to assist them in investigations and threat hunting, and is primarily designed for use in Jupyter notebooks. If you've not used notebooks for security analysis before we've put together a guide on why you should.
The goals of MSTICPy are to:
MSTICPy is organized into several functional areas:
There are also some additional benefits that come from packaging these tools in MSTICPy:
Like many of our blog articles, this one has a companion notebook. This is the source of the examples in the article and you can download and run the notebook for yourself. The notebook has some additional sections that are not covered in the article.
The notebook is available here.
Since the original Overview article we have invested a lot of time in improving and expanding the documentation - see msticpy ReadTheDocs. There are still some gaps but most of the package functionality has detailed user guidance as well as the API docs. We do also try to document our code well so that even the API documents are often informative enough to work things out (if you find examples where this isn't the case, please let us know).
In most cases we also have example notebooks providing an interactive illustration of the use of a feature (these often mirror the user guides since this is how we write most of the documentation). They are often a good source of starting code for projects. These notebooks are on our GitHub repo.
If you are new to MSTICPy and use Azure Sentinel the first place to go is the Use Notebooks with Azure Sentinel document. This will introduce you the the Azure Sentinel user interface around notebooks and walk you through process of setting up an Azure Machine Learning (AML) workspace (which is, by default, where Azure Sentinel notebooks run). One note here - when you get to the Notebooks tab in the Azure Sentinel portal, you need to hit the Save notebook button to save an instance of one of the template notebooks. You can then launch the notebook in the AML notebooks environment.
The next place to visit is our Getting Started for Azure Sentinel notebook. This covers some basic introductory notebook material as well as essential configuration. More advanced configuration is covered in Configuring Notebook Environment notebook - this covers configuration settings in more detail and includes a section on setting up a Python environment locally to run your notebooks.
Although this article is aimed primarily at Azure Sentinel users, you can use MSTICPy with other data sources (e.g. Splunk or anything you can get into a pandas DataFrame) and in any Jupyter notebook environment. The Azure Sentinel notebooks can be found in our Notebooks GitHub repo.
Assuming that you have a blank notebook running (in either AML or elsewhere) what do you do next?
Most of our notebooks include a more-or-less identical setup sections at the beginning. These do three things:
If you see warnings in the output from the cell about configuration sections missing you should revisit the previous Getting Started Guides section. This cell includes the first two functions in the list above. The first one - running utils.check_versions() - is not essential in most cases once you have your environment up and running but it does do a few useful tweaks to the notebook environment, especially if you are running in AML.
The init_notebook function automates a lot of import statements and checks to see that the configuration looks healthy.
The third part of the initialization loads the Azure Sentinel data provider (which is the interface to query data) and authenticates to your Azure Sentinel workspace. Most data providers will require authentication.
Assuming you have your configuration set up correctly, this will usually take you through the authentication sequence, including any two-factor authentication required.
Once this setup is complete, we're at the stage where we can start doing interesting things!
MSTICPy has many pre-defined queries for Azure Sentinel (as well as for other providers). You can choose to run one of these predefined queries or write your own. This list of queries documented here is usually up-to-date but the code itself is the real authority (since we add new queries frequently). The easiest way to see the available queries is with the query browser. This shows the queries grouped by category and lets you view usage/parameter information for each query.
These options are shown below.
Now that we can get data, let's do something more interesting with it.
One of the most basic but also most useful visualizations is to project events onto a timeline. You can do this using MSTICPy's separate Timeline function or, more conveniently call it directly from a DataFrame using the mp_timeline pandas extension.
MSTICPy makes extensive use of the interactive graphics of Bokeh. These charts can be panned and zoomed. Each event also has a hover-over tooltip containing summary information about the event (the summary is derived from the source_columns parameter list).
A process tree is another common visualization used when investigating endpoint (host) data.
Like the timeline, the process tree supports panning, zooming, hover details and has an optional data table viewer (you need to specify show_table=True when calling the function).
MSTICPy also has a number of special-purpose viewers for things like alerts, where it is often difficult to see the required data in a tabular format.
This example combines both the timeline viewer and the SelectAlert browser.
MSTICPy contains many enrichment components for geo-location, ASN/whois, threat intelligence, Azure resource data and others.
This example shows calling a method of the IpAddress entity class to get WhoIs information for an IP address.
Although the whois feature is available as a standalone function, we've used a pivot function of the IpAddress class here. To do this we needed to load the Pivot class.
If what you want to do is entity-related, there is a good chance that the MSTICPy function will appear as an entity pivot function. Queries, enrichment functions and analysis functions that relate to a particular entity type are all exposed as pivot functions of that entity.
Wait - what is an Entity?
An entity is essentially a "noun" in the CyberSec world - for example: an IP Address, host, URL, account, etc. They are typically things that do stuff or have stuff done to them. Entities will always have one or more properties that identify the entity and might have additional context properties. For example, an IpAddress entity has its primary Address property and it might also have contextual properties like geo-location or ASN data.
Pivot functions are "verbs" to the entities "nouns". They perform investigative actions (like data queries) on the entity and return a result. The Host entity class, for example, has data queries that retrieve process or logon events logged for that host. The IpAddress entity has functions to lookup its geolocation or query information about the address from threat intelligence providers.
Pivot functions are not statically coded into the entity classes. Instead, the pivot subsystem harvests pivot functions from available queries and components and dynamically adds them to the entities. This gives us a lot of flexibility to add new functions as the features of MSTICPy evolve. It also allows you to use functions from third party libraries or write your own functions and expose them as pivot functions.
How do you find what pivot functions (and even what entities) are available? The easiest way to view the entities, their pivot functions and the help associated with each function is to use the Pivot browser (similar to the query browser shown earlier)
Being grouped with their respective entities makes the pivot functions easy find (compared with hunting through documents to find the right module or function to import). Pivot functions are grouped into related sub-containers of the entity (so all AzureSentinel queries have the form entity.AzureSentinel.query_function().
Another advantage of pivot functions (over standalone functions) is that they have a homogenous interface. They will all accept inputs as single values, lists of values or values stored in DataFrames. They also always return their results as DataFrames.
A nice side benefit of pivot functions using DataFrames as both input and output is that we can chain several together in a pandas pipeline. Here we're taking IP addresses from an alert and successively getting WhoIs data, geo-location data. Finally, we're querying multiple Threat Intelligence providers to see if they have any data about the IP address. At each stage we're asking for the new data obtained by each stage to be joined to the previous stage (via the join parameter) - although joining is optional.
We then display the results received from the threat intel providers in another special-purpose viewer - the TI browser.
Here is an example of a more manual pipeline that stitches together the base64 decoder, IoC pattern extractor and threat intel lookup. It's taking an obfuscated PowerShell command line and extracting and examining the contents found in the decoded string. Finally, it displays the TI results in the TI browser.
We might also want to know where this IP address is physically located. We can do that by using our geo-location enrichment and displaying the results on a map.
MSTICPy has several more-advanced analysis components that help with identifying anomalous patterns in large data sets.
First we'll show the time series decomposition and visualization component. This works by determining regular patterns of bulk events (think of outbound network byte counts or logon failures) and then identifies outliers to this pattern with a simple statistical calculation.
You can see the daily cadence network traffic and the presence of two outlying events on 7-11 (the date, not the store).
Where the data is more complex than bulk counts, we can use another anomaly identification technique using Markov Chains. Markov chains is a technique used to predict the future probability of something occurring given the probability of things that happened previously.
Here we are using it to analyze Office 365 data. We will build a model to determine the probability of specific sequences of actions and then identify rare sequences that deviate from this base probability. Office activity data is first grouped into sessions based on user name and source IP address. The probability of a given sequence of actions within a session is measured during the creation of the model. For example, it would be very common to see actions like opening emails, reading a few files, etc. A session that reads hundreds of files or performs unusual actions like setting a mail forwarding rule or delegating control of a mailbox would be much less probable and so would detectable.
(the full code for this is given in the notebook)
There is clearly one session that stands out from the crowd in terms of the unusual actions seen in that session. We can use the rarity score (the inverse of the probability) to quickly filter out the other sessions and see what was happening. In this case, the event revealed a series of unexpected privilege assignment actions.
Thanks for sticking with me in through this marathon article. I hope it has given you a flavor of the power of Jupyter notebooks in CyberSec hunting and investigation tasks and a reasonable overview of many of the capabilities in MSTICPy.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.