Lookup five characters appearing in any order in a wordlist

Copper Contributor

Hi, all,

 

I have a wordlist and want to write a formula that will find every word that contains five alphabet characters in any order.

 

Here is my formula. It indicates what the first letter must be (I1) and what letters must be in the word (anywhere from three to five characters). My formula only searches for words in the wordlis that contain the characters appearing in the same order as in the formula. If I search for w-o-n-d-e, it will find wonderful, but it won't find endow.

 

Column E is a string containing the letters to search for. Columns R - V show each character in the string in a separate cell.

 

=IF(C1>=9,VLOOKUP(I1&"*"&LEFT(E1,1)&"*"&MID(E1,2,1)&"*"&RIGHT(E1,1)&"*",$A$2:$F$1200,1,0),IF(C1=8,VLOOKUP(I1&"*"&S1&"*"&R1&"*"&U1&"*"&T1&"*",$A$2:$F$1200,1,0),VLOOKUP(I1&"*"&LEFT(E1,1)&"*"&MID(E1,2,1)&"*"&MID(E1,3,1)&"*"&MID(E1,4,1)&"*"&RIGHT(E1,1)&"*",$A$2:$F$1200,1,0)))

 

Can someone show me a formula that will work?

 

Also, even if the order of the characters happens to be right, it will only find the first instance of a word that contains all the characters. I would like to have a list of all the words that contain all the characters.

 

Thanks,

Michael

23 Replies

@mhlester 

Could you attach a small sample workbook demonstrating the problem (without sensitive data), or if that is not possible, make it available through OneDrive, Google Drive, Dropbox or similar?

Hi, Hans,

Thank you for your reply. Here is a link to the complete dropbox file.

https://www.dropbox.com/scl/fi/otl468c3tqoap0cnetd66/Scrabble-Dicitionary-for-Letter-Boxed-230705-2....

The spreadsheet is intended to solve the New York Times Letter Boxed puzzle, which puts 12 unique alphabet characters around a square, three to a side. You must use all twelve letters in as few words as possible. No two characters from the same side of the square can be adjacent in any of the words. (I only want to solve for two words even though the puzzle lets you solve in more than two words.) The second word must start with the last letter of the first word.

Column H is a solver but it can only solve one row at a time and I have to change the formula to reflect the row number each time. Too much work! I would like to automate that.

Columns I through L are a second solver that could work to find the first legal word for each word in the list if the lookup criteria didn't have to appear in the same order as in the second word of the pair.

Column A = wordlist
Column B = words from the wordlist that contain one or more of the twelve letters
Column C = number of unique letters in the word in each row
Column D = filled with the twelve letters from the day's puzzle
Column E = letters that are missing from the words in each row
Column F = the unique letters in each row
Column G = repeat of the wordlist (can delete)

Thanks, Michael

@mhlester 

O dear, that's far too complicated for me, sorry! I hope that the sample workbook enables someone else to help you.

You may want to include a stripped down workbook with less formulas. All that's really needed is a word list and some examples of the desired return (even if done manually). The workbook you linked above has a ton of resource-heavy formulas that make using the workbook very difficult.

@Patrick2788 

 

Hi, Patrick,

 

Thank you for your reply.

 

Here is a screenshot with no formulas containing what I believe is all the information necessary to solve the problem from scratch and showing what the results might look like.

 

mhlester_0-1688762221355.png

 

@mhlester 

You say you are using 365, in which case formulae such as 

= LET(
      characters, TEXTSPLIT(letters,"-"),
      hasChr?,    ISNUMBER(FIND(characters, wordlist#)),
      wordMatch?, BYROW(hasChr?, LAMBDA(x, AND(x))),
      FILTER(wordlist#, wordMatch?)
  )

might provide a good starting point.

Hi, Peter,
Thank you for this. I don't how to implement. Would you mind explaining a bit more how this would work with the sample spreadsheet I sent in reply to Patrick? In the meantime, I will do some research on the functions that are unfamiliar to me.

@mhlester 

 

Column B = words from the wordlist that contain one or more of the twelve letters

What do you mean one or more?

In the attached from Peter Bartholomew ,I have found All 5 letters matched at the same time not include 1,2,3,4 these 4 scenarios.

 

 

About the picture you uploaded,why diptych and forward match? 

Are they the expected result?

 

I am really confused about the expected result?

 

Can you give more detail explains by an example?

 

Can you explain 

@peiyezhu 

 

The word list is long... 214,066 words. First, we need to eliminate every word that has adjacent letters from the same side of the square and every word that includes one of the 14 letters of the alphabet that are not included in the puzzle square. That will narrow the list down to 2,000 (give or take 1,000) words.

 

From the remaining list, the challenge is to find the fewest number of words combined that use all the letters in the puzzle squared. I shoot for two words. That is the real challenge.

 

Today's letters are

 

Top of puzzle square - YOD

Right side of puzzle square - WHI

Bottom of puzzle square - ATC

Left side of puzzle square - FRP.

 

"Forward" contains only letters from the puzzle square (in red), no letters that are on the same side of the square are adjacent to each other, and no letters that aren't in the puzzle, so it meets all the rules.

 

"Diptych" contains all the rest of the letters in the puzzle square, no letters that are on the same side of the square are adjacent to each other, no letters that aren't in the puzzle, and it starts with "D," the last letter of "Forward," thus meeting all conditions of the puzzle.

 

Forward and Diptych are one matching pair, but there could be others, and they all must meet the same conditions.

 

On some days, It might be possible for there to be a word that uses 11 of the letters in the square and meets all other conditions of the puzzle. In that case, a two-letter word could be a match if it starts with the last letter of the first word and uses the twelfth letter in the square.

 

The second word can re-use letters from the first word as long as it also uses all the remaining letters from the square and meets all the other conditions.

 

I was mistaken about a one-letter word, because if the first word uses all twelve letters and meets all the other conditions, the puzzle is solved with just the one word. It's rare, though.

 

Thanks,

Michael

@mhlester 

 

 

 

Suppose we have

bsa

olu

te

xit

 

expected result:

absolute

exit

 

Or

absolute

exist

 

Just include like absolute

exist

 

or must match fully no redundancy letter like absolute

exit ?

 

The first letter of the second word should be the last letter of the first word?

 

Only 2 words match whole puzzle square

How about 3?

 

Re:

, we need to eliminate every word that has adjacent letters from the same side of the squar

 

Can you provide a sample of this sentence?

eliminate every word that has adjacent letters like

  boundless

 

with adjacent letter ss?

 

Re:

The word list is long... 214,066 words. 

 

your dictionary has 214,066 words?

But my dictionary on hand only 13,532 words and phrases

Screenshot_2023-07-08-13-25-45-175_cn.uujian.browser.jpg

 

 

@peiyezhu 

 

Your third line "te" is missing a letter, but we can ignore that for our purposes.

 

*Yes, "absolute - exist" is okay. There can be redundancy.

*Yes, the first letter of the second word must be the last letter of the first word.

*Yes, you can use all the puzzle letters with three words "bobs - salute - exist," but I don't care to solve for that. I only want to solve for two words.

*Boundless is illegal because of the two ss. In your example letters, boundless would also be illegal because o and u are adjacent, and they appear together on one side of the square "olu."

*If your wordlist is too short, solutions may not be possible. I use the Scrabble word list. It doesn't have all the words that the puzzle maker uses, but it's close enough.

 

 

 

@mhlester 

Sorry, this task is going to take too long.  So far all I have done is filter the original scabble list to return only words comprising letters taken from the square.  The adjacent letters rule is not as straightforward but most likely your excluded character pairs approach will prove effective.

 

The new Excel functions LET and LAMBDA enable the development of array solutions that do not look remotely similar to any past spreadsheet practice.  LET introduces local variables scoped to the formula.  LAMBDA builds on that to allow values to be passed to such variables by using an argument list.  The result is to allow a spreadsheet formula to be laid out like a program module, calling other Lambda functions where required.

because o and u are adjacent, and they appear together on one side of the square "olu."
The adjacent letters rule is not as straightforward
why o and u are adjacent?


at last summary:
Have 12 unique letters.
Divided to 2 or 3 groups, each group should be part of an English word regardless of sequence。

The expected result are two English or 3 English words.
Each word in those 2 or 3 words,all letters should be unique. e.g. class boundless are illegal.
And the last letter of the first word should be the first letter of the second word and the first letter of third same as the last of second word. head letter same as former foot letter.

@PeterBartholomew1 

 

I've come a lot closer to solving the problem. All I am missing now is a way to find all solutions for each valid word in the list. VLOOKUP and XLOOKUP will only return the first match in the list. I tried using Filter in place of XLOOKUP but got a #calc error.

 

Here is a screenshot from the spreadsheet after hiding the dozen or more columns that do all the work.

 

mhlester_0-1688861176879.png

 

Gauche is also a valid word with a solution, but the formula only finds the first instance that meets the conditions, and that word happens to be invalid, so gauche is excluded from the list.

 

I would love to see an example of the LET LAMBDA combo as it applies to this problem.

 

Thanks for your help,

Michael

@peiyezhu 

 

Hi,

 

o and u are adjacent in the word doubtless, and o and u come from the same side of the square. The letters on the square can be in any order. We only look at the word for adjacency.

 

The pair, class - boundless are illegal because they have adjacent letters... s s and o u.

 

It is possible to solve the puzzle with two or more words that use letters from the square, each word's letters unique as to the other words, but that is not necessary. As you can see in the list below, letters from the square can be used multiple times in all the words that solve the puzzle, so long as none of the words have adjacent letters from the same side.

 

As I indicated previously, I only care about two-word solutions.

 

I am still working on a way to find all matches, not just the first one in the list.

Thank you for your help.

 

mhlester_0-1688861928594.png

 

@mhlester 

Worn out!

@PeterBartholomew1 

 

Don't blame you! I can't let it go, yet. I want to find all the instances of matching pairs that are in my dictionary... 

@PeterBartholomew1 

@HansVogelaar 

@Patrick2788 

@peiyezhu 

 

I was finally able to make Filter work to find all possible matches for all legal words with this formula.

 

=IFERROR(TRANSPOSE(FILTER($A$2:$A$3570,ISNUMBER(SEARCH(AA451,$F$2:$F$3570)))),"")

 

The word in column AH in the sheet snippet below is the first word to solve the puzzle and the words in columns AI, AJ, et. seq. are candidates for the second word.

 

I could not figure out how to make the formula return only words that start with the last letter of AH or end with the first letter of AH. Only the words highlighted in yellow in cells AI, AJ, et. seq., in the sheet below (a small snippet of the full list) solve the problem.

 

Will someone please help me revise the formula to only find legal words (in AI, AJ, et. seq.) that start with the last letter of the first word (AH) or end with the first letter of the first word (AH)?

 

Addendum:

 

=FILTER($AK$451:$JL$451,RIGHT($AK$451:$JL$451,LEN(AI451))=AI451,”No Results!”)

 

This filter would filter the results of the above filter to find only the words that end with the first letter of the first word. So, I am part way there. I just need to add the filter that also finds any words that start with the last letter of the first word. Maybe these filters could be nested? That would be it, I think.

 

mhlester_0-1688974399401.png

 

@mhlester 

The formula used in the solution I posted above was

...
firstWord,    TOCOL(IF(allChar=maxChar, legalList, NA()), 3),
secondWord,   TOCOL(IF(allChar=maxChar, TRANSPOSE(legalList), NA()), 3),
FILTER(HSTACK(firstWord, secondWord), RIGHT(firstWord, 1) = LEFT(secondWord, 1))

The next task would be to optimise the formula to improve performance.

For example, I checked every character within a Scrabble word to see whether it was drawn from the Square.  Had I followed your suggestion and checked for the occurrence of other characters, then recursion would allow one to bail out before the whole word is validated.