With special thanks to @ianhelle, @robmead, and @Pete Bryan.
In this blog, we introduce the MitreMap Notebook, which takes a threat report as input and infers the MITRE ATT&CK technique(s) most likely to map to that report. This lets you get more value out of unstructured threat data by identifying the motivations and the tactics, techniques, and procedures (TTPs) an actor group uses when attacking your enterprise’s digital infrastructure. It also adds context to the Indicators of Compromise (IoCs) in the report, which you can use to detect patterns and trends in cyber-attacks across your workspace.
As attacks on an organisation’s digital infrastructure grow in number and sophistication, SecOps teams increasingly rely on Threat Intelligence to document the operations of an actor group and to record their investigation framework, results, and any entities or IoCs discovered. A variety of entities, ranging from public and private organisations to social media platforms and the open source community, publish threat reports in the form of unstructured text data, blogs, and white papers. These reports describe the TTPs used by actor groups in an operation and the best practices that enterprises can adopt to protect themselves from these attacks.
However, with the growing corpus of unstructured threat intel, it is not easy to extract the patterns of attack that an enterprise or the associated industry vertical has observed on its infrastructure.
MITRE ATT&CK is a framework used to standardise the discovery and explanation of attacker behaviour. It is an open-source knowledge base of TTPs used by adversaries across enterprise, mobile, and ICS applications. MITRE TTPs allow people and organisations to proactively identify vulnerabilities in their systems, based on the behaviours, methods, and patterns of activity used by an actor group in different stages of a cyber operation.
MITRE ATT&CK Enterprise Techniques - Snapshot
Our work builds on the pioneering work done by TRAM and Death to the IoC – What’s Next in Threat Intelligence, both of which developed Machine Learning models to infer MITRE techniques from TI reports. TRAM uses scikit-learn’s simpler Machine Learning algorithms to train a model on 1.5K training data points, while Death to the IoC uses a combined Word2Vec/LSTM model with limited training data. In our work, we extracted training data from 7 datasets, giving 13K training data points (including the training data used by TRAM). We used GPT-2, an advanced transformer-based NLP (Natural Language Processing) model with better performance metrics.
The MitreMap notebook is a Jupyter notebook that aims to automate the process of inferring MITRE ATT&CK Enterprise techniques from unstructured threat intelligence data, using Natural Language Processing. We trained the GPT-2 language model on threat reports and descriptions of past incidents scraped from open-source repositories.
When processing a new threat report, we correlate the unstructured threat intel data to one of the MITRE Enterprise techniques in the open-source framework, with a certain degree of confidence. We also extract any IoCs present in the threat report, and we provide some explainability into which words and phrases in the input threat report contributed to the prediction and its confidence score.
MitreMap Notebook
The MitreMap notebook is currently in public preview in the Microsoft Sentinel Notebooks repository, and the documentation for setting up, configuring, and running the notebook can be found here. We have also outlined steps to run the notebook outside of Microsoft Sentinel Notebooks, in your own notebook infrastructure.
After setting up your virtual environment with the steps outlined in the documentation, the notebook requires you to configure the following parameters:
User-defined configurations
You input the Threat Intel Report text that you would like the model to analyse, along with a Confidence score threshold, which is the lower-bound threshold for the confidence associated with the MITRE Technique prediction. You can also choose whether you would like to obtain the Model Explainability Summary, which highlights words and phrases in the threat intel report that contribute highly to the technique prediction.
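To make the role of these three inputs concrete, here is a minimal Python sketch of how a confidence threshold could be applied to a set of technique predictions. It is an illustration only, not the notebook's actual code: the names `NotebookConfig` and `filter_predictions` are hypothetical, and the scores in the example are made up rather than produced by the model.

```python
from dataclasses import dataclass


@dataclass
class NotebookConfig:
    """Hypothetical container for the three user-defined inputs."""
    report_text: str                   # the threat intel report to analyse
    confidence_threshold: float = 0.6  # lower bound on prediction confidence
    explain: bool = True               # whether to request the explainability summary


def filter_predictions(scores, threshold):
    """Keep techniques whose confidence is at or above the threshold.

    `scores` maps a MITRE technique ID to a confidence in [0, 1].
    Returns (technique, confidence) pairs, highest confidence first.
    """
    kept = [(tech, conf) for tech, conf in scores.items() if conf >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Illustrative scores only; these are not real model output.
example_scores = {"T1059": 0.82, "T1218": 0.64, "T1003": 0.11}
config = NotebookConfig(report_text="(report text goes here)")
predictions = filter_predictions(example_scores, config.confidence_threshold)
```

Raising the threshold returns fewer, higher-confidence techniques, while lowering it surfaces more candidates at the cost of more noise.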
After analysis, an output summary appears as follows:
Output of the Notebook
The Predicted MITRE Technique is the technique predicted by the trained GPT-2 model, shown with its associated confidence score. The Model Explainability summary highlights the words and phrases in the report that contributed to the prediction. In this example, words such as “Adversary”, “.exe”, “Proxy”, and “Execution” contribute positively to the technique prediction. If any IoCs are mentioned in the TI report, they are extracted and stored in a data frame.
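As a rough illustration of what IoC extraction involves, the sketch below pulls a few common indicator types out of report text with regular expressions. It is a simplified stand-in and not the extractor the notebook uses: the `IOC_PATTERNS` table and the `extract_iocs` function are hypothetical names, and the patterns cover only a handful of cases.

```python
import re

# Hypothetical, deliberately simplified patterns. A production extractor
# would cover more indicator types and handle defanged values such as hxxp://.
IOC_PATTERNS = {
    "ipv4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    "url": re.compile(r"\bhttps?://[^\s\"'<>]+"),
}


def extract_iocs(report_text):
    """Return one row (a dict) per indicator found in the report text."""
    rows = []
    for ioc_type, pattern in IOC_PATTERNS.items():
        for match in pattern.finditer(report_text):
            rows.append({"ioc_type": ioc_type, "value": match.group(0)})
    return rows


sample_report = (
    "The dropper contacted 10.0.0.5 and fetched http://example.com/payload "
    "with hash d41d8cd98f00b204e9800998ecf8427e."
)
ioc_rows = extract_iocs(sample_report)
```

Each row is a plain dictionary, so the list can be passed to `pandas.DataFrame(ioc_rows)` to obtain a data frame with `ioc_type` and `value` columns.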
We would appreciate any feedback or thoughts on this notebook. Feel free to reach out to @vani_asawa or @ianhelle with your suggestions for the next version of the package!