Forum Discussion
davinelm
Aug 08, 2022Copper Contributor
Quantity of bad emails vs legit email query
Hello, I'm trying to create a kql hunting query that will display the total number of bad emails (malware, spam, phishing) but am having some trouble combining the columns of various bad emails toget...
- Aug 10, 2022Ok, I think I understand the ask now. When you get a chance try the below query. If you don't care about clean emails you can comment \ uncomment the lines noted to make it more performant. Let me know if this works for you.
let vip = pack_array("email address removed for privacy reasons");
EmailEvents
| where RecipientEmailAddress in~ (vip)
//| where isnotempty( ThreatTypes) // Uncomment this line if you don't care about clean email count
| summarize
Phish = countif(ThreatTypes has 'Phish')
, Spam = countif(ThreatTypes has 'Spam')
, Malware = countif(ThreatTypes has 'Malware')
, Clean = countif(isempty(ThreatTypes)) // You can comment this line if you don't care about clean email count
by RecipientEmailAddress
MichaelJMelone
Microsoft
Hello davinelm,
While this sounds like it should be a simple query its actually quite complex :). Technically you're representing multiple dimensions in a single table which is great from a reporting perspective, but a bit awkward from a data perspective. To accomplish this you need to union together two tables:
- one that is a pivot of the e-mails per recipient and threat type
- one that is a sum of the e-mails per threat type
If you simply summarize the dataset you will get one row per column name which is a different schema from your format of counts per user and data type. Essentially we need to flip one of these tables on its side.
The query below unions together your pivot query above with a summary per column. To flip it I pack the rows up as JSON objects, make them a property bag, then mv-expand them to turn it back into a table. Let me know if this accomplishes what you hope!
let PerUserSummary = (
EmailEvents
| where RecipientEmailAddress has_any (vip)
| project RecipientEmailAddress, ThreatTypes
);
union (
PerUserSummary
| evaluate pivot(ThreatTypes)
), (
PerUserSummary
| extend ThreatTypes = iff(isempty(ThreatTypes), '_Empty', ThreatTypes)
| summarize count() by ThreatTypes
| project ThreatTypes, count_
| where isnotempty( ThreatTypes)
| project packed = pack(ThreatTypes, count_)
| summarize make_bag(packed)
| evaluate bag_unpack(bag_packed)
| extend RecipientEmailAddress = 'Total'
)
While this sounds like it should be a simple query its actually quite complex :). Technically you're representing multiple dimensions in a single table which is great from a reporting perspective, but a bit awkward from a data perspective. To accomplish this you need to union together two tables:
- one that is a pivot of the e-mails per recipient and threat type
- one that is a sum of the e-mails per threat type
If you simply summarize the dataset you will get one row per column name which is a different schema from your format of counts per user and data type. Essentially we need to flip one of these tables on its side.
The query below unions together your pivot query above with a summary per column. To flip it I pack the rows up as JSON objects, make them a property bag, then mv-expand them to turn it back into a table. Let me know if this accomplishes what you hope!
let PerUserSummary = (
EmailEvents
| where RecipientEmailAddress has_any (vip)
| project RecipientEmailAddress, ThreatTypes
);
union (
PerUserSummary
| evaluate pivot(ThreatTypes)
), (
PerUserSummary
| extend ThreatTypes = iff(isempty(ThreatTypes), '_Empty', ThreatTypes)
| summarize count() by ThreatTypes
| project ThreatTypes, count_
| where isnotempty( ThreatTypes)
| project packed = pack(ThreatTypes, count_)
| summarize make_bag(packed)
| evaluate bag_unpack(bag_packed)
| extend RecipientEmailAddress = 'Total'
)
davinelm
Aug 08, 2022Copper Contributor
Thank you for your reply! I tried out this query and it returned the same result from my first query minus one row. From messing around with it some more it is turning out to be a bit of a mind bender (at least for a kql noob as myself).
so just to clarify if i have columns Red, Blue, Yellow and Red and Blue and Yellow, with 1 count in each column, how to combine them in a new column called "Colors" with a count of all the 1s from each of the individual columns? As of now it works to massage the data in excel and open it in powerbi but eventually i'd like to automate this for reporting.
Thanks again for your reply.
- MichaelJMeloneAug 10, 2022MicrosoftOk, I think I understand the ask now. When you get a chance try the below query. If you don't care about clean emails you can comment \ uncomment the lines noted to make it more performant. Let me know if this works for you.
let vip = pack_array("email address removed for privacy reasons");
EmailEvents
| where RecipientEmailAddress in~ (vip)
//| where isnotempty( ThreatTypes) // Uncomment this line if you don't care about clean email count
| summarize
Phish = countif(ThreatTypes has 'Phish')
, Spam = countif(ThreatTypes has 'Spam')
, Malware = countif(ThreatTypes has 'Malware')
, Clean = countif(isempty(ThreatTypes)) // You can comment this line if you don't care about clean email count
by RecipientEmailAddress- davinelmAug 10, 2022Copper Contributor
Yes! That did it. Thanks a lot!