Forum Discussion
matt0020190
Nov 07, 2024Brass Contributor
Searching column cells for similar text values and summarising in another column
Hi Experts
I hope you can help here. To be honest without Macros not sure this is possible with formulas?
I attach an example workbook with my desired results from the data.
Basically we have a support ticket system for users and can export the "summary" or "titles" for each ticket. We want to have a formula to analyse the column with all the ticket titles and produce a list of common similar themed topics, and the number of deemed occurrences.
Now I know this can be done either looking for an exact text word match but the difficulty comes where I am hoping excel can attempt to categorise similar patterns of words / phrases / text.
Any solution via formula or am I expecting too much?
Thank you for your help!
so I have a couple of thoughts. First off the example was very simple in the sense that each output was basically based on a key word. So you could use:
=COUNTIFS($B$2:$B$10,"*fresh*")
to count all items with "fresh" somewhere in the topic. you could create a column next to your output table with a list of keywords so then you could use:
=COUNTIFS($B$2:$B$10,"*"&D2&"*")
where D2 is the keyword "fresh" or "freshness" or whatever you want
you can even make it an array output using:
=COUNTIFS($B$2:$B$10,"*"&D2:D4&"*")
you could get more fancy and have a list of possible words and use something like:
=LET(list,TEXTSPLIT(C2," "),SUM(--(MMULT(--ISNUMBER(SEARCH(list,$B$2:$B$10)),SEQUENCE(COLUMNS(list),,1,0))>0)))
where this will break each word in C2 up and check if it exists in each row of the inputs and then OR all those together to find how many rows have at least 1 of those key words. In your example this procudes a wrong result for the 3rd row as "issues" is found in another line.
there are MANY other options and roads you could go down with excel functions and could simulate your macro solution very closely if not the same.
That all said another option is to add the Excel Labs plug-in and use the Labs GenerativeAI plugin and let an AI service help determine the results:
=LABS.GENERATIVEAI("does """&B2&""" have to do with """&H1&"""?")
this requires you to have an AI account and give the plug-in an access key
- matt0020190Brass Contributor
Hi m_tarler
Thanks again for all your guidance on the above.
The code has been working well but I now need to add an IF condition to the formula.
I tried changing the the SUM to SUMIF but it doesn't work.
I need to perform the search in $B$2:$B$10 only IF the date in $C$2:$C$10 matches a month defined in F1
Are you able to advise where this condition could be added to the formula?
My other question would be, is it possible with another formula to actually list those rows matching the count?
- m_tarlerBronze Contributor
Ok so it would seem the prior solution was:
=LET(list,TEXTSPLIT(C2," "),SUM(--(MMULT(--ISNUMBER(SEARCH(" "&list&" "," "&$B$2:$B$10&" ")),SEQUENCE(COLUMNS(list),,1,0))>0)))
so lets build on that:
=LET(list,TEXTSPLIT(C2," "), data, $B$2:$B$10, dates, $C$2:$C$10, month, $F$1, fdata, FILTER(data, (MMULT(--ISNUMBER(SEARCH(" "&list&" "," "&data&" ")),SEQUENCE(COLUMNS(list),,1,0))>0) * (dates> EOMONTH(month,-1) ) * (dates <= EOMONTH(month,0) ), "none found") IF( TAKE(fdata,1)="none found" , 0 , ROWS(fdata) ) )
so on row 6 is the same check used before but instead of doing a 'sum' of the 0's and 1's it produces, I use it as a filter and multiply it by line 7 which is a filter for the date range (assuming F1 is an actual date in the month of interest).
Line 9 wil then output how many rows were found
To output the actual rows just change row 9 to "fdata" to return that filtered range.
hope that makes sense and works
- matt0020190Brass Contributor
m_tarlerThanks again for your advice as always.
I tried your latest formula and I keep getting a "too few arguments for this function" error. I cant see where the error might be. Any ideas?
- matt0020190Brass Contributor
Thanks very much for this very helpful reply.
I ended up using the powerful LET function example above that did the trick.
Only thing that could be better is I like how the terms are searched within each cell from the list. However if one of my search terms in the list is for example "topic", all words in the cell with this sequence are still flagged e.g "topical" or "atopic".
Is there a way to ensure the searching from the list is only for exact matches?
- m_tarlerBronze Contributor
First off I'm glad this is helpful. As for a tweak to make it match words exactly might be:
=LET(list,TEXTSPLIT(C2," "),SUM(--(MMULT(--ISNUMBER(SEARCH(" "&list&" "," "&$B$2:$B$10&" ")),SEQUENCE(COLUMNS(list),,1,0))>0)))
but of course then "topic" will NOT find "topics" or "topic-1" or "1)topic"
fyi what i did was add a " " before and after each search term AND before and after each cell so if the cell has "topic" as the beginning, end or only word it will still be found.
- matt0020190Brass Contributor
Thanks again, I thought it worked at first but the above does produce some strange results. For example if I search the column manually, I sometimes find less than what this formula does. Maybe something to do with my list?
I have one cell with search terms a few examples:
store splits, regional splits
security, guarding, guards, SG
FF4, Fit for 4, FFT, Fit for trade
With this new formula, is it searching exactly for "store splits" or "store" and "splits" separately?
A little confused as I want every term separated by commas searched exactly.
If I remove the space between the terms ie. "store splits,regional splits" I get less results so unsure what is expected if you can clarify?
- m_tarlerBronze Contributor
so I have a couple of thoughts. First off the example was very simple in the sense that each output was basically based on a key word. So you could use:
=COUNTIFS($B$2:$B$10,"*fresh*")
to count all items with "fresh" somewhere in the topic. you could create a column next to your output table with a list of keywords so then you could use:
=COUNTIFS($B$2:$B$10,"*"&D2&"*")
where D2 is the keyword "fresh" or "freshness" or whatever you want
you can even make it an array output using:
=COUNTIFS($B$2:$B$10,"*"&D2:D4&"*")
you could get more fancy and have a list of possible words and use something like:
=LET(list,TEXTSPLIT(C2," "),SUM(--(MMULT(--ISNUMBER(SEARCH(list,$B$2:$B$10)),SEQUENCE(COLUMNS(list),,1,0))>0)))
where this will break each word in C2 up and check if it exists in each row of the inputs and then OR all those together to find how many rows have at least 1 of those key words. In your example this procudes a wrong result for the 3rd row as "issues" is found in another line.
there are MANY other options and roads you could go down with excel functions and could simulate your macro solution very closely if not the same.
That all said another option is to add the Excel Labs plug-in and use the Labs GenerativeAI plugin and let an AI service help determine the results:
=LABS.GENERATIVEAI("does """&B2&""" have to do with """&H1&"""?")
this requires you to have an AI account and give the plug-in an access key