Forum Discussion
Amber McCurtis
Jun 09, 2017Copper Contributor
Help counting duplicate values/words
Hello, I have a worksheet with about 80 responses that are in a sentence format that I am trying to figure out how to count and sort keywords that are used multiple times. These are verbatim respons...
SergeiBaklan
Jun 09, 2017MVP
Amber,
Back to this. I don't know if you are familiar with Power Query / Get & Transform in Excel, but with it the task could be solved quite easy. As the basis we may take published by Chris Webb solution Finding Shakespeare’s Favourite Words With Data Explorer (Power Query) which is very good explained and could be applied in your case practically without changes.
It is in the attached file. Simply copy/paste your text strings into the table Answers, in Ribbon->Data->Refresh All and the list of multiply words is in the next sheet.
Code of the query whic performs the task is below, it's practicaly self-explained, more details in the above blog. If you know Power Query at least a bit you may easy modify it if necessary, for example add more words to exclude from the calculation (in general could be taken from another table in Excel, but takes time to implement).
/* That is literally Chris Webb's solution "Finding Shakespeare’s Favourite Words With Data Explorer (Power Query)" published on March 15, 2013 at https://blog.crossjoin.co.uk/2013/03/15/finding-shakespeares-favourite-words-with-data-explorer/ */ let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content], OurList = Table.ToList(Source), RemoveLineBreaks = Lines.ToText(OurList, " "), RemovePunctuation = Text.Remove(RemoveLineBreaks, { "," ,"." ,"?" ,";" ,":" ,";" ,"'", "@" ,"#" ,"~" ,"{" ,"[" ,"}" ,"]" ,"(" ,")", "*" } ), Lowercase = Text.Lower(RemovePunctuation), TurnIntoTable = Table.FromValue(Lowercase), SplitIntoWords = Table.TransformColumns(TurnIntoTable, {"Value", Splitter.SplitTextByWhitespace()}), ListOfWords = SplitIntoWords{0}[Value], TableFromList = Table.FromList(ListOfWords, Splitter.SplitByNothing(), null, null, ExtraValues.Error), RenameColumnToWord = Table.RenameColumns(TableFromList,{{"Column1", "Word"}}), RemoveBlanks = Table.SelectRows(RenameColumnToWord, each ( [Word] <> "" and [Word] <> "1" and [Word] <> "2" and [Word] <> "3" and [Word] <> "4" and [Word] <> "5" and [Word] <> "6" and [Word] <> "7" and [Word] <> "8" and [Word] <> "9" and [Word] <> "0" and [Word] <> "i" and [Word] <> "me" and [Word] <> "and" and [Word] <> "are" and [Word] <> "be" and [Word] <> "the" and [Word] <> "to" and [Word] <> "was" ) ), FindWordCounts = Table.Group(RemoveBlanks, {"Word"}, {{"Count", each Table.RowCount(_), type number}}), SortedRows = Table.Sort(FindWordCounts,{{"Count", Order.Descending}}), RemoveSingleWords = Table.SelectRows(SortedRows, each [Count] > 1) in RemoveSingleWords
Amber McCurtis
Jun 10, 2017Copper Contributor
Thank you! This is really over my head. Maybe I can put the text in Microsoft word and accomplish the same thing?
- SergeiBaklanJun 10, 2017MVP
Amber, you may use Microsoft Word, whatever, if you are only able to save your text as .txt file. Since you don't need formatting for frequency analysis that shall not be an issue. With Word that's simple, you always may do Save As into txt.
From Power Query solution point of view that's easy, just couple more lines in the script. I'll update in a while.
- SergeiBaklanJun 10, 2017MVP
Hi Amber,
Perhaps in your case is better to use available frequency analysis tools, for example this free one http://countwordsfree.com/. You just paste your text into the web from and receive back a lot of statistic, includes frequency analysis.
However, I updated the Excel solution as well, see attached file. To use it you shall put full path to your txt file in B2, minimum length of words to count in next down cell and into the right table words which you would like to exclude from statistics. Don't forget to Refresh All after any change:
I won't explain the script here, it was slightly modified compare to previous post.
- Amber McCurtisJun 14, 2017Copper Contributor
Thank you so much Sergei! This was perfect and did the job. I have to figure out how to exclude certain words but at least I have the numerical information. You are the best!