Using threat intelligence (TI) is vital part of most hunts and investigations. You see a suspicious IP address in your logs and you want to check its reputation: is it a known C2 (Command and Control) address? is it associated with malware drops? You want to check a file hash to see if it is known malware. In many cases, getting a confirmation that your suspected IoC (indicator of compromise) is known to be bad can shortcut hours of investigation time.
Since its early days the msticpy package has had a module to perform look-ups at the popular VirusTotal service. We've recently released an update that gives you a generic TI lookup capability. You can configure multiple TI providers and submit one or more IoCs to them in a single function call.
One of the first set of providers is for Azure Sentinel TI. If you have set up a TI feed from the Microsoft Graph Security API, the indicators are made available as a table in Azure Sentinel Log Analytics store (see more about this below). This can be queried using the msticpy TILookup class alongside other providers such as VirusTotal, AlienVault OTX, and IBM XForce.
Here's an example, looking up a single observable. In the remainder of the article we'll take you through setting up your TI providers, querying IoCs and interpreting the results.
The terms observable, indicator, indicator of compromise (IoC) are almost interchangeable. Strictly speaking, an observable is an identifier such as a domain name, IP address or filehash before you have done any determination on whether it is bad or not. Indicator/IoC are observables that have a known bad determination - i.e. what you get back from a TI provider along with data describing its context.
Many TI providers offer sophisticated libraries to perform look-ups against their APIs. We are not trying to compete with these. In fact, for detailed examination of the results from a specific provider, you should consider using their client libraries. TILookup is specifically designed to allow look-ups against multiple providers, minimizing your need to worry about the specifics of each. It was built to quickly answer the question "is there anything suspicious about this item?". As such, we try to return raw data and a quick verdict rather than attempt to render the details of the provider results in high fidelity.
As mentioned above, Microsoft recently announced the ability to pipe TI from the MS Graph Security API into Azure Sentinel. You can stream logs from Threat intelligence providers into Azure Sentinel using the new Threat Intelligence data connector. Read more about how to set this up here.
The msticpy TILookup class is provider-independent but works really well with TI data stored in Azure Sentinel. This is especially true for bulk look-ups, because of the high-performance Log Analytics data store that backs the underlying searches.
If you want to follow along with real code examples, there is a sample notebook showing more extended usage of TILookup here.
Documentation based on this notebook is also available here.
The full API documents can be found on ReadTheDocs.
To check which providers are currently supported by msticpy (we intend to add more as time goes on) you need to create an instance of the TILookup class. The available_providers property will return a list of the providers that you can use.
None of these are yet configured so you won't be able to do anything with them just yet. For most providers, you will need to create an account at the TI provider site and obtain an API key that authorizes you to query the data. The Azure Sentinel provider uses your existing workspace credentials, so you do not need additional credentials for this. Once you have these you can enter the details in a configuration file to be read by the respective provider.
If you do not have accounts with these TI providers, go here to create them. Be sure to save your API keys somewhere safe. Please read and abide by the Terms of Use of each provider.
VirusTotal - https://developers.virustotal.com/reference
AlienVault Open Threat Exchange - https://otx.alienvault.com/api
IBM XForce - https://api.xforce.ibmcloud.com/doc/
The msticpyconfig.yaml file is read from the current directory or its location can be specified with an environment variable MSTICPYCONFIG. The former method is useful if you want to vary the configuration for different investigation patterns (e.g. use a different set of providers). The latter is more useful if you want to keep a single configuration and use it everywhere.
Although other things can be specified in the file, you only need the TIProviders section. You can put the values directly into the file or give the name of an environment variable to fetch the value from (this is shown in the XForce example above) .
Note, that the guids shown here are all dummy values, so don't try to use them.
Each provider section has a name (you can use whatever you like for this) and an number of configuration options:
Once you have your configuration set up you are ready to use the library. Reloading the new settings as shown below:
The provider_status property will list the currently loaded providers.
The method to lookup a single IoC is lookup_ioc(). Let's have a look at the parameters (you can type ti_lookup.lookup_ioc? in a Jupyter cell and execute it to get help for the method).
The observable parameter is the only required parameter - this is the suspected IoC that you want to look up. If you do not specify an ioc_type the type will be inferred. This inference uses the msticpy IoCExtract class to try to determine the IoC type using regular expressions. This is not foolproof though, so if you know the IoC type, it is always better and more efficient to specify it explicitly.
The raw output from lookup_ioc is a little terse so it's better to transform it into a more readable format. There is a built-in method called result_to_df that will convert the output into a pandas DataFrame. The number of rows in the DataFrame will depend on how many providers you have configured. In the example below, we've converted to a DataFrame and also flipped the frame using the dataframe.T (matrix transform) property to show the results from each provider in columns.
The fields returned are:
Note: due to the complexities of parsing data across different providers the accuracy of the Severity and Details may vary. You will normally want to check the full RawResult field for details about the IoC.
For the curious the raw output from lookup_ioc has the following format:
You can use the providers parameter of lookup_ioc to supply a list of providers to use.
Most providers support multiple IoC types. Many providers also offer different types of sub-query associated with some of the types. For example, GeoIP, Whois and Passive DNS are commonly provided for IP addresses, even though they are not strictly TI data.
To view the supported types for a single provider, you first need to know the provider short name - you can get this from ti_lookup.provider_status. Using the short name type the following:
You can also just list the usage for all loaded providers more simply using provider_usage().
In this last example you can see some IoC types that have ioc_query_type entries. Where sub-query types are supported you can chose to lookup just the data of this type for an IoC. The string shown here (e.g. "passivedns", "geo", etc.) is value that you supply for the optional ioc_query_type parameter mentioned above.
Here's an example with a query type requesting passive DNS data.
TILookup contains a couple of optimizations to prevent needlessly performing expensive lookups:
Sometimes you will want to lookup sets of IoCs. You can do this with the lookup_iocs() method. This is similar to the method to lookup a single IoC but takes a data parameter which holds the collection of IoCs to search for.
The input can be a pandas DataFrame, a python dict or a python iterable (such as a list).
You can specify a sub-query type (this will apply to all look-ups), a provider list and a provider scope.
Most HTTP TI providers have throttle limits for data queries. If you are using a free tier account from these providers, these limits can be easily exceeded if you submit large collections of IoCs. There is currently no throttling mechanism in TILookup. As long as there are things in the queue it will keep submitting them. So please, do yourself and your TI provider a favor and be mindful of these limits when performing bulk look-ups.
A good strategy may be restricting the provider list you use to providers where you have a paid tier supporting high lookup rates when querying large data sets. You can do this using the providers=["prov1", "prov2"] parameter. Once you have done the initial lookup, you can submit smaller numbers to other providers with lower limits for a second opinion.
This is the primary use case for the Primary configuration parameter in the msticpyconfig.yaml file (see Setting up Your Providers earlier in the article). Set your high-bandwidth providers as Primary=True and the remainder as Primary=False. You can then perform secondary look-ups on a subset of your suspected IoCs to get more detail.
The second caveat is performance-related. The HTTP requests are currently not implemented as asynchronous calls (although we plan to change this). All requests are submitted in sequence, blocking until each one returns. While this is not a huge problem for a few tens of look-ups and small numbers of providers, it will be an issue if you are searching for very large sets (provider throttling limits, notwithstanding).
One performance optimization that we have built for the HTTP provider lookup is to add a Least-recently-used cache to the look-ups. This means that you can freely re-run the same query and it will return the result without hitting the provider site. The cache size is 1024 entries and is maintained per-provider. It is a memory-only cache, so will disappear if you reset your kernel and is not shared across notebooks.
lookup_iocs always returns a DataFrame. This has the same schema as the individual IoC lookups (see
This example shows multiple URLs using all configured providers.
The final example shows that you can send a mixed set of IoCs at once.
Conclusion
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.