I've recently become a little obsessed with an online word puzzle game in which you have six attempts to guess a random five-letter word. The word changes every day, and you can only play once per day. After each guess, each of the letters in your guess is highlighted: gray means that letter does not appear in the mystery word, yellow means that letter appears in the word but not at that position, and green means the letter appears in the word at that correct position.
Here's how you can use the Linux command line to help you play guessing games like Wordle. I used this method to help me solve the January 6 puzzle:
First try
Linux systems keep a dictionary of words in the /usr/share/dict/words
file. This is a very long plain text file. My system's words file has over 479,800 entries in it. The file contains both plain words and proper nouns (names, places, and so on).
To start my first guess, I just want a list of plain words that are exactly five letters long. To do that, I use this grep
command:
$ grep '^[a-z][a-z][a-z][a-z][a-z]$' /usr/share/dict/words > myguess
The grep
command uses regular expressions to perform searches. You can do a lot with regular expressions, but to help me solve Wordle, I only need the basics: The ^
means the start of a line, and the $
means the end of a line. In between, I've specified five instances of [a-z]
, which indicates any lowercase letter from a to z.
I can also use the wc
command to see my list of possible words is "only" 15,000 words:
$ wc -l myguess
15034 myguess
From that list, I picked a random five-letter word: acres. The a was set to yellow, meaning that letter exists somewhere in the mystery word but not in the first position. The other letters are gray, so I know they don't exist in the word of the day.
Second try
For my next guess, I want to get a list of all words that contain an a, but not in the first position. My list should also not include the letters c, r, e, or s. Let's break this down into steps:
To get a list of all words with an a, I use the fgrep
(fixed strings grep) command. The fgrep
command also searches for text like grep
, but without using regular expressions:
$ fgrep a myguess > myguess2
That brings my possible list of next guesses down from 15,000 words to 6,600 words:
$ wc -l myguess myguess2
15034 myguess
6634 myguess2
21668 total
But that list of words also includes the letter a in the first position, which I don't want. The game already indicated the letter a exists in some other position. I can modify my command with grep
to look for words containing some other letter in the first position. That narrows my possible guesses to just 5,500 words:
$ fgrep a myguess | grep '^[b-z]' > myguess2
$ wc -l myguess myguess2
15034 myguess
5566 myguess2
20600 total
But I know the mystery word also does not include the letters c, r, e, or s. I can use another grep
command to omit those letters from the search:
$ fgrep a myguess | grep '^[b-z]' | grep -v '[cres]' > myguess2
$ wc -l myguess myguess2
15034 myguess
1257 myguess2
16291 total
The -v
option means to invert the search, so grep
will only return the lines that do not match the regular expression [cres]
or the single list of letters c, r, e, or s. With this extra grep
command, I've narrowed my next guess considerably to only 1,200 possible words with an a somewhere but not in the first position, and that do not contain c, r, e, or s.
After viewing the list, I decided to try the word balmy.
Third try
This time, the letters b and a were highlighted in green, meaning I have those letters in the correct position. The letter l was yellow, so that letter exists somewhere else in the word, but not in that position. The letters m and y are gray, so I can eliminate those from my next guess.
To identify my next list of possible words, I can use another set of grep
commands. I know the word starts with ba, so I can begin my search there:
$ grep '^ba' myguess2 > myguess3
$ wc -l myguess3
77 myguess3
That's only 77 words! I can narrow that further by looking for words that also contain the letter l in anywhere but the third position:
$ grep '^ba[^l]' myguess2 > myguess3
$ wc -l myguess3
61 myguess3
The ^
inside the square brackets [^l]
means not this list of letters, so not the letter l. That brings my list of possible words to 61, not all of which contain the letter l, which I can eliminate using another grep
search:
$ grep '^ba[^l]' myguess2 | fgrep l > myguess3
$ wc -l myguess3
10 myguess3
Some of those words might contain the letters m and y, which are not in today's mystery word. I can remove those from my list of guesses with one more inverted grep
search:
$ grep '^ba[^l]' myguess2 | fgrep l | grep -v '[my]' > myguess3
$ wc -l myguess3
7 myguess3
My list of possible words is very short now, only seven words!
$ cat myguess3
babul
bailo
bakal
bakli
banal
bauld
baulk
I'll pick banal as a likely word for my next guess, which happened to be correct.
The power of regular expressions
The Linux command line provides powerful tools to help you do real work. The grep
and fgrep
commands offer great flexibility in scanning lists of words. For a word-based guessing game, grep
helped identify a list of 15,000 possible words of the day. After guessing and knowing what letters did and did not appear in the mystery word, grep
and fgrep
helped narrow the options to 1,200 words and then only seven words. That's the power of the command line.
1 Comment