Forum Discussion
Help counting duplicate values/words
Amber,
Back to this. I don't know if you are familiar with Power Query / Get & Transform in Excel, but with it the task could be solved quite easy. As the basis we may take published by Chris Webb solution Finding Shakespeare’s Favourite Words With Data Explorer (Power Query) which is very good explained and could be applied in your case practically without changes.
It is in the attached file. Simply copy/paste your text strings into the table Answers, in Ribbon->Data->Refresh All and the list of multiply words is in the next sheet.
Code of the query whic performs the task is below, it's practicaly self-explained, more details in the above blog. If you know Power Query at least a bit you may easy modify it if necessary, for example add more words to exclude from the calculation (in general could be taken from another table in Excel, but takes time to implement).
/*
That is literally Chris Webb's solution
"Finding Shakespeare’s Favourite Words With Data Explorer (Power Query)"
published on March 15, 2013 at
https://blog.crossjoin.co.uk/2013/03/15/finding-shakespeares-favourite-words-with-data-explorer/
*/
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
OurList = Table.ToList(Source),
RemoveLineBreaks = Lines.ToText(OurList, " "),
RemovePunctuation = Text.Remove(RemoveLineBreaks,
{
"," ,"." ,"?" ,";" ,":" ,";" ,"'",
"@" ,"#" ,"~" ,"{" ,"[" ,"}" ,"]" ,"(" ,")", "*"
}
),
Lowercase = Text.Lower(RemovePunctuation),
TurnIntoTable = Table.FromValue(Lowercase),
SplitIntoWords = Table.TransformColumns(TurnIntoTable, {"Value", Splitter.SplitTextByWhitespace()}),
ListOfWords = SplitIntoWords{0}[Value],
TableFromList = Table.FromList(ListOfWords, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
RenameColumnToWord = Table.RenameColumns(TableFromList,{{"Column1", "Word"}}),
RemoveBlanks = Table.SelectRows(RenameColumnToWord, each (
[Word] <> "" and
[Word] <> "1" and
[Word] <> "2" and
[Word] <> "3" and
[Word] <> "4" and
[Word] <> "5" and
[Word] <> "6" and
[Word] <> "7" and
[Word] <> "8" and
[Word] <> "9" and
[Word] <> "0" and
[Word] <> "i" and
[Word] <> "me" and
[Word] <> "and" and
[Word] <> "are" and
[Word] <> "be" and
[Word] <> "the" and
[Word] <> "to" and
[Word] <> "was"
)
),
FindWordCounts = Table.Group(RemoveBlanks, {"Word"}, {{"Count", each Table.RowCount(_), type number}}),
SortedRows = Table.Sort(FindWordCounts,{{"Count", Order.Descending}}),
RemoveSingleWords = Table.SelectRows(SortedRows, each [Count] > 1)
in
RemoveSingleWords
- Amber McCurtisJun 10, 2017Copper ContributorThank you! This is really over my head. Maybe I can put the text in Microsoft word and accomplish the same thing?
- SergeiBaklanJun 10, 2017Diamond Contributor
Amber, you may use Microsoft Word, whatever, if you are only able to save your text as .txt file. Since you don't need formatting for frequency analysis that shall not be an issue. With Word that's simple, you always may do Save As into txt.
From Power Query solution point of view that's easy, just couple more lines in the script. I'll update in a while.
- SergeiBaklanJun 10, 2017Diamond Contributor
Hi Amber,
Perhaps in your case is better to use available frequency analysis tools, for example this free one http://countwordsfree.com/. You just paste your text into the web from and receive back a lot of statistic, includes frequency analysis.
However, I updated the Excel solution as well, see attached file. To use it you shall put full path to your txt file in B2, minimum length of words to count in next down cell and into the right table words which you would like to exclude from statistics. Don't forget to Refresh All after any change:
I won't explain the script here, it was slightly modified compare to previous post.