In the last few weeks I have been hearing more and more prospects and customers ask about our response service-level agreement (SLA), and, to be honest, it’s great and about time! Security operators, incident responders, and security analysts should all have an easy way to evaluate their work. And what’s better than well-defined key performance indicators (KPIs)?
When we meet prospects, some of the things we ask them are "What’s your mean time to detect?", "What’s your mean time to investigate?", and "What’s your mean time to respond and mean time to repair?", and the most common response is: “What? We don't formally track our time … we think it's X minutes/hours.”
The next questions we would usually ask are "So how do you know that you are doing a good job?" and "How do you know that you are improving over time?"
The answers we typically get to these questions are:
My job is to "clean" the queue (SIEM/product-specific).
We are overwhelmed anyway, so we are doing what we can.
As long as there is no breach on my network, I am doing my job.
As long as my company is not on the front page of the newspaper, I’m fine.
Usually, the last question would be, "Have you heard about the 'golden hour'?" The answer was usually "Yes."
In case you’re not familiar with the term, “golden hour” originated in emergency medicine. From Wikipedia: "The golden hour, also known as golden time, refers to the period of time following a traumatic injury during which there is the highest likelihood that prompt medical and surgical treatment will prevent death". The term was used in the early 2010s to express how important it is to respond to and repair a cybersecurity threat within the first hour.
Fast forward to today's world: the term “golden hour” was deemed a buzzword, and vendors have stopped using it. The concept was not lost, though; it’s just called something else. From what we can see, nothing has really changed, and it should. I think we should learn from our past so we can progress: although the concept of the golden hour is great for medicine, it’s time to re-evaluate how it’s used in cybersecurity.
In some of the attacks we’ve seen in the last few years (WannaCry, Petya, Bad Rabbit, etc.), the "golden hour" concept just doesn't hold. On top of malware threats that spread quickly, we have seen more and more attackers using automation (we provide some examples in this blog).
We should go back to basics: we need to do our best to prevent attacks from happening in the first place. And just because it's fun I’ll now provide you with one analogy from the software development world:
If you really want to secure your organization, I would recommend the following (divided into three categories):
Known knowns - We can't continue to ignore this area (Yes, it’s less ), but most (not all) of your problems (threats) could be solved or dramatically reduced by implementing a vulnerability management program, exploit mitigations, and secure configuration (fixing all misconfigurations, security hardening), and by building secure perimeters (network perimeter, physical devices, application control, etc.).
Known unknowns - In this area our mindset should be, "We haven't seen this before, but we have the tools to stop it." We should consider solutions like machine learning-driven next-generation antimalware, automatic containment (e.g., conditional access), and automatic investigation and remediation to reduce risk as fast as possible.
Unknown unknowns - In this area we should focus on visibility and expert assistance: a best-in-class endpoint detection and response (EDR) solution and managed detection and response (MDR) services that have your back.
One last recommendation is to define a set of KPIs and to monitor your efficiency and progress. You can start with the following:
Mean time to detect - The average time it takes to recognize a threat.
Mean time to investigate - The average time it takes to analyze the potential threat and identify whether any further response actions are needed.
Mean time to respond - The average time it takes to respond to and stop a potential threat from spreading or causing any damage.
Mean time to repair/recover - The average time it takes to restore normal operations.
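If you already record timestamps for each incident, these four KPIs are simple averages. Here is a minimal sketch in Python of how that could look. The Incident fields, the sample data, and the choice to measure each phase from the end of the previous one are all my assumptions for illustration; your SIEM or ticketing system will have its own field names, and your team may define the start and end points differently.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class Incident:
    # Hypothetical timestamps; map these to whatever your tooling records.
    started: datetime       # when the malicious activity began
    detected: datetime      # when the threat was recognized
    investigated: datetime  # when analysis finished and a response was decided
    contained: datetime     # when the threat was stopped from spreading
    recovered: datetime     # when normal operations were restored


def mean_minutes(durations):
    """Average a sequence of timedeltas and return the result in minutes."""
    return mean(d.total_seconds() for d in durations) / 60


def compute_kpis(incidents):
    """Return the four mean-time KPIs in minutes.

    Assumption: each phase is measured from the end of the previous phase.
    """
    return {
        "mean_time_to_detect": mean_minutes(
            i.detected - i.started for i in incidents),
        "mean_time_to_investigate": mean_minutes(
            i.investigated - i.detected for i in incidents),
        "mean_time_to_respond": mean_minutes(
            i.contained - i.investigated for i in incidents),
        "mean_time_to_recover": mean_minutes(
            i.recovered - i.contained for i in incidents),
    }


# Made-up sample data, not taken from any real environment.
SAMPLE_INCIDENTS = [
    Incident(
        started=datetime(2024, 1, 1, 9, 0),
        detected=datetime(2024, 1, 1, 9, 30),
        investigated=datetime(2024, 1, 1, 10, 0),
        contained=datetime(2024, 1, 1, 10, 10),
        recovered=datetime(2024, 1, 1, 12, 10),
    ),
    Incident(
        started=datetime(2024, 1, 2, 14, 0),
        detected=datetime(2024, 1, 2, 14, 10),
        investigated=datetime(2024, 1, 2, 15, 0),
        contained=datetime(2024, 1, 2, 15, 30),
        recovered=datetime(2024, 1, 2, 16, 30),
    ),
]

if __name__ == "__main__":
    for name, minutes in compute_kpis(SAMPLE_INCIDENTS).items():
        print(f"{name}: {minutes:.1f} minutes")
```

Tracking the same numbers month over month is what answers the "are we improving?" question. A plain mean can hide outliers, so you may also want to look at the median or the worst case alongside it.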
I hope you found this blog interesting and worth your time. Use the comments to give feedback and share your thoughts on KPIs that you use in your organization. (Hint: We might have another blog in the oven on the "Top six security KPIs organizations should track".)