Forum Discussion
Content Search: Stacking Keyword Groups
I am trying to create a Content Search for a data subject request, and I am having a really hard time building out my KQL. The issue is that I need to stack two sets of keyword searches, but the estimated results are wildly high so I feel I must be doing something wrong.
In English, the search requirement is (using example keywords):
Find all mail or SharePoint content where:
- Keywords include (Max OR John OR Sally)
- AND
- Keywords include (White OR Black OR Red)
- AND
- Date between Jun 2024 and Nov 2024
I have tried many different forms of this KQL, and have essentially ended up with this:
((Max John Sally) AND (White Black Red)) AND (Date=2024-06-01..2024-11-04)
Does anybody have an idea where I'm going wrong?
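For reference, the requirement above can be written with every operator spelled out, so nothing depends on how a bare space between keywords is interpreted. Below is a minimal Python sketch that assembles such a query string. The helper names `or_group` and `build_query` are hypothetical, and the output is only text to paste into the search box; it has not been verified against the Content Search service.

```python
from datetime import date


def or_group(terms):
    """Join keywords with an explicit uppercase OR and wrap them in parentheses."""
    # Quote multi-word terms so they are treated as a single phrase.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(group_a, group_b, start, end, date_prop="Date"):
    """AND two OR-groups together with a date range (hypothetical helper)."""
    # Same range form as the original post: Date=start..end
    date_clause = f"({date_prop}={start.isoformat()}..{end.isoformat()})"
    return " AND ".join([or_group(group_a), or_group(group_b), date_clause])


query = build_query(
    ["Max", "John", "Sally"],
    ["White", "Black", "Red"],
    date(2024, 6, 1),
    date(2024, 11, 4),
)
print(query)
```

Generating the string this way keeps the parentheses balanced and the operators uppercase, which is easy to get wrong when typing long keyword lists by hand.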
6 Replies
- FavoriteMartian (Copper Contributor)
I've been having similar issues for a while with eDiscovery advanced logical searches. The KQL editor is very awkward and, I'm quite sure, possessed by a playful demon. It's when trying to AND two sets of keywords that I have issues. When I tried to force it, the editor removed (keyword=word) from the submitted KQL query. Trying to manually add (c:c) or (c:s) seems to be a further nightmare.
My case is something like the logic below, where one primary keyword is AND'd with a set of keywords that have an OR between them. Using simple query building like Word1 AND (Word2 OR Word3) breaks search statistics.
Keywords=Martian
AND
(Keywords=Exchange)
(c:s) Keywords=microsoft
(c:s) keywords=powerpoint
AND(Received=2024-10-01..2025-01-11)
- Jason E. Heiser (Iron Contributor)
FavoriteMartian "playful demon" 🤣 I often feel that we have the same issues with those playful demons.
It's great that you seem to have had success combining KQL with the auto (c:c) or (c:s). I had explicitly tried to stay away from that because one of the documents in Learn stated that it wouldn't work.
Copilot finally helped me figure out something that at least provided the expected results, though I'm not convinced they're 100% accurate. I ended up having to write out different combinations of Subject: and Keywords:.
For instance, this provided the best results (note that I even had to break out the date component, because the standard syntax (Received=Date..Date) was returning dates outside of my range):
((Subject:"Max" OR Subject:"John" OR Subject:"Sally") OR (Keywords:"Max" OR Keywords:"John" OR Keywords:"Sally"))
AND
((Subject:White OR Keywords:White) OR (Subject:Black OR Keywords:Black) OR (Subject:Red OR Keywords:Red))
AND
((Received>=2024-06-01 AND Received<=2024-11-01))
Looks OK, and should apply your logic above. I suppose you'll have to go over the results and try to figure out if there are any false matches therein.
One thing that comes to mind is to split the query per workload, as I'm not sure the "date" keyword is the best choice for email items.
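The per-workload split could be sketched roughly like this in Python. It is an illustration only: pairing `Received` with mail and `LastModifiedTime` with SharePoint/OneDrive documents is an assumption to check against the searchable-property list for your tenant, and the helper names are hypothetical. The `>=` / `<=` date form is borrowed from the query posted above.

```python
def or_group(terms):
    """Join keywords with an explicit uppercase OR inside parentheses."""
    return "(" + " OR ".join(terms) + ")"


# Assumption: one date property per workload; adjust to what your search supports.
DATE_PROPERTY = {
    "exchange": "Received",
    "sharepoint": "LastModifiedTime",
}


def per_workload_queries(group_a, group_b, start, end):
    """Build one query string per workload, sharing the same keyword logic."""
    keywords = f"{or_group(group_a)} AND {or_group(group_b)}"
    return {
        workload: f"{keywords} AND ({prop}>={start} AND {prop}<={end})"
        for workload, prop in DATE_PROPERTY.items()
    }


queries = per_workload_queries(
    ["Max", "John", "Sally"],
    ["White", "Black", "Red"],
    "2024-06-01",
    "2024-11-01",
)
for workload, q in queries.items():
    print(f"{workload}: {q}")
```

Running each string as its own search, scoped to mailboxes or sites respectively, makes it easier to see which workload is producing the inflated estimate.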
- Jason E. Heiser (Iron Contributor)
Thanks, Vasil - I feel privileged to get a response from the man himself! 😁
Good idea on the date property - I have switched it to Received and am segregating the Exchange and SharePoint searches now, but unfortunately it yields no better results - there are tens of thousands of hits that have none of the keywords.
It simply seems that the Content Search cannot handle two sets of Keyword searches. (I have tried with the new and old content search experiences) 🤬
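To put a number on those false hits, one option is to check a sample of exported items against the intended logic outside of the search tool. A minimal Python sketch, assuming the item text (subject plus body) has already been exported to plain strings; the sample items and the function names are made up for illustration, and whole-word, case-insensitive matching is my assumption about what counts as a keyword match.

```python
import re


def contains_any(text, terms):
    """True if any term appears in text as a whole word, ignoring case."""
    pattern = r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def matches_both_groups(text, group_a, group_b):
    """The intended logic: at least one term from each keyword group."""
    return contains_any(text, group_a) and contains_any(text, group_b)


NAMES = ["Max", "John", "Sally"]
COLOURS = ["White", "Black", "Red"]

# Hypothetical sample of exported item text, keyed by an item identifier.
items = {
    "item-1": "Max asked about the red folder",
    "item-2": "Quarterly budget review",
    "item-3": "Sally sent the maximum figures",  # a name, but no colour
}

false_hits = [
    item_id
    for item_id, text in items.items()
    if not matches_both_groups(text, NAMES, COLOURS)
]
print(false_hits)
```

This only sees the text you feed it, so attachments or metadata that the service indexes but the export omits could still explain some hits.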
If the first set corresponds to users within the organization, perhaps try limiting the search scope to just those mailboxes and OneDrive for Business (ODFB) sites instead, and remove those keywords?
Tony Redmond is quite proficient with eDiscovery; he might have better ideas.