Log Analytics vs. AML Jupyter Notebooks for log anomaly detection


Hi there,

 

I'm new to Sentinel, and as a data scientist I'm helping out on a project to detect anomalies in the view logs of one of our internal systems. For example, those logs show who viewed a given page in (the front-end application of) an internal database system. Since those pages can contain sensitive data, we're attempting to set up a system to flag potential misuse of those access/viewing rights.

 

Although I have some ideas for the detection system (and it's going to be pretty simple for starters), I'm new to Sentinel and to the whole cybersecurity angle. I was hoping you could help me sum up some of the pros and cons of using Log Analytics versus AML Notebooks for my use case.

 

Some background details of what I am trying to achieve:

  • I have to create some kind of alerting/signaling that will ultimately email team leads if an employee is flagged as potentially misusing access. (The emailing isn't part of my scope, but I need to be able to provide the input 'flags'.)
  • These alerts would be generated periodically, e.g. once a month.
  • We're planning to connect the data source via a connector from Azure Storage. The data itself is a set of logs from an internal system, so I assume there is no relevant "plug and play" solution.
  • The anomaly detection is going to be pretty simple for starters: I want to compare the logs for each user ID against other users within the same role, e.g. if the number of views from user X exceeds the mean for their role plus two standard deviations, flag them; otherwise don't (see the sketch after this list). ML solutions may be desired down the line.
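To make that concrete, here's a minimal pandas sketch of the rule I have in mind. The input layout is an assumption on my part: one row per page view, with made-up column names user_id and role.

```python
# Minimal pandas sketch of the role-baseline rule; the input layout
# (one row per page view, columns user_id and role) is hypothetical.
import pandas as pd


def flag_outliers(logs: pd.DataFrame) -> pd.DataFrame:
    """Flag users whose view count exceeds their role's mean + 2 std devs."""
    # Count views per user within each role.
    views = (
        logs.groupby(["role", "user_id"])
        .size()
        .rename("views")
        .reset_index()
    )
    # Per-role baseline: mean and standard deviation of the per-user counts.
    stats = views.groupby("role")["views"].agg(["mean", "std"]).reset_index()
    merged = views.merge(stats, on="role")
    return merged[merged["views"] > merged["mean"] + 2 * merged["std"]]
```

The logic itself is only a few lines either way; my question is more about where this should run and how it gets scheduled.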

 

So as I figure it, both Log Analytics and AML Notebooks are suitable contenders for a solution, but I have a hard time deciding which is the way to go.

 

LA Pros/Cons:

+ Seems easy to set up the rule via KQL (managed to do this with some dummy data in the demo environment)

+ Seems easy to automate

- I'd rather use Python or SQL than KQL

- Limited to basic statistics and simple rules (?)

- Unclear to me where the "compute" takes place

 

AML Pros/Cons:

+ Much more powerful, full suite of Python libraries, easy to expand and customize to our needs

+ Runs on specific compute instances (well, normally)

+ Could be used for further analysis of signals

- Automation seems like a pain; it also seems to require a custom VM and all that, losing flexibility.

 

Would love some input on whether I'm on the right track here and whether my assumptions make sense. Let me know what you think!

1 Reply

@NigelSt666 If you are already going to ingest the data into Microsoft Sentinel, using a scheduled rule that runs KQL would be the best overall solution. Some of the reasons: A) KQL is designed to query the data in Microsoft Sentinel very quickly, and if you are familiar with SQL, learning KQL will not be hard at all. B) It is easy to schedule the query to run (although the longest interval you can schedule between runs is 14 days, so a strictly monthly cadence isn't directly supported). C) You can create a playbook that executes a workflow when the incident is created to perform specific tasks, such as emailing the team leads.

 

While you can do all of this in a Microsoft Sentinel notebook, keep in mind that there is some additional cost, since it uses a VM to perform the actions. It is also, currently, a bit harder to execute notebooks on a schedule. I would recommend notebooks if the data is not going to be stored in Microsoft Sentinel (which is something to consider: if you do not need to ingest the data, why do it?), since a notebook can easily access data outside of Microsoft Sentinel.
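As a sketch of that last point: if the data does land in a Log Analytics workspace, a notebook can pull it straight into pandas with the azure-monitor-query package. The workspace ID and the ViewLogs_CL table/column names below are placeholders for whatever your connector actually produces.

```python
# Minimal sketch: query a Log Analytics workspace from an AML notebook.
# Requires: pip install azure-monitor-query azure-identity pandas
# The workspace ID and the ViewLogs_CL table/columns are placeholders.
from datetime import timedelta

import pandas as pd
from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())

query = """
ViewLogs_CL
| summarize Views = count() by UserId_s, Role_s
"""

response = client.query_workspace(
    workspace_id="<your-workspace-id>",
    query=query,
    timespan=timedelta(days=30),  # evaluation window, e.g. roughly one month
)

# Each returned table carries its column names and row data.
table = response.tables[0]
df = pd.DataFrame(data=table.rows, columns=table.columns)
print(df.head())
```

From there you are in regular Python, so the same flagging logic (or an ML model later) can run unchanged.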