Cheating on Word Games

This was a class I gave on Alumni Day at NCSSM in October of 2023. Much of the audience consisted of graduates from the class of 2003, back for their 20-year reunion, along with a motley assortment of other brigands. Jenna Ingersoll '03 asked me to do it.

What is a UNIX Filter?

Every UNIX command has three parts: the command name, options, and arguments.

A UNIX filter is a special type of command which accepts a file as input, processes that file, and then makes a file as output, which by default goes to the screen. Filters traverse a file line by line.

         __    
         \ \----------------------
          \    Entrails that     O\
           \   process the      ___\
            \  input file       |------<  INPUT FILE
             \ | |              ----
              --O-----------------/
                ↓
             OUTPUT FILE

These tools allow you to dig for information in a file.
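
For readers who think in Python, here is a minimal sketch of the filter idea: walk the input one line at a time and pass along whatever survives. The names run_filter and cat are made up for this illustration, and a plain list of strings stands in for the input file.

```python
from typing import Callable, Iterable, Iterator


def run_filter(lines: Iterable[str], keep: Callable[[str], bool]) -> Iterator[str]:
    """Traverse the input line by line and yield only the lines we keep."""
    for line in lines:
        if keep(line):
            yield line


def cat(lines: Iterable[str]) -> Iterator[str]:
    """A 'filter' that keeps every line, so nothing is really filtered."""
    return run_filter(lines, lambda line: True)


# A short list of strings stands in for an input file (an assumption of this sketch).
sample = ["a", "b", "c"]

for line in cat(sample):
    print(line)
```

Swapping the lambda for a pickier test turns the same skeleton into a real filter.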

The filter cat does not filter; it just spews the entire file to the screen. Note: the unix> is just the system prompt.

unix> cat chars.txt
a
b
c
d
e
f
g
h
i
j
k
l
m
n
n
o
p
q
r
s
t
u
v
w
x
y
z

A
B
C
D
E
F
G
H
I
J
K
L
M
N
N
P
Q
R
S
T
U
V
W
X
Y
Z
0
1
2
3
4
5
6
7
8
9
,
.
?
!
@
#
$
%
^
&
*
(
)
=
]
[

Here we will see an option at work. The -n option puts line numbers in the output file.

unix> cat -n chars.txt
     1  a
     2  b
     3  c
     4  d
     5  e
     6  f
     7  g
     8  h
     9  i
    10  j
    11  k
    12  l
    13  m
    14  n
    15  n
    16  o
    17  p
    18  q
    19  r
    20  s
    21  t
    22  u
    23  v
    24  w
    25  x
    26  y
    27  z
    28
    29  A
    30  B
    31  C
    32  D
    33  E
    34  F
    35  G
    36  H
    37  I
    38  J
    39  K
    40  L
    41  M
    42  N
    43  N
    44  P
    45  Q
    46  R
    47  S
    48  T
    49  U
    50  V
    51  W
    52  X
    53  Y
    54  Z
    55  0
    56  1
    57  2
    58  3
    59  4
    60  5
    61  6
    62  7
    63  8
    64  9
    65  ,
    66  .
    67  ?
    68  !
    69  @
    70  #
    71  $
    72  %
    73  ^
    74  &
    75  *
    76  (
    77  )
    78  =
    79  ]
    80  [
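
What the -n option does can be sketched in a few lines of Python. This is only an imitation: the function name number_lines is invented here, and the layout (a number right-aligned in six columns, then a tab) is an assumption based on how the listing above looks.

```python
def number_lines(lines):
    """Prefix each line with a 1-based, right-aligned line number."""
    # Assumed layout: number padded to six columns, then a tab, then the line.
    return [f"{n:6d}\t{line}" for n, line in enumerate(lines, start=1)]


# The first three lines of chars.txt, typed in by hand for this sketch.
for numbered in number_lines(["a", "b", "c"]):
    print(numbered)
```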

The filter grep needs two arguments. The first is a search string. The second is a file. This filter will filter in all lines containing the search string, and ignore the rest.

The file scrabble.txt is a Scrabble dictionary. Let's print out all lines containing the string COW.

unix> grep COW scrabble.txt
BECOWARD
BECOWARDED
BECOWARDING
BECOWARDS
COW
COWAGE
COWAGES
COWARD
COWARDICE
COWARDICES
COWARDLINESS
COWARDLINESSES
COWARDLY
COWARDS
COWBANE
COWBANES
COWBELL
COWBELLS
COWBERRIES
COWBERRY
COWBIND
COWBINDS
COWBIRD
COWBIRDS
COWBOY
COWBOYED
COWBOYING
COWBOYS
COWCATCHER
COWCATCHERS
COWED
COWEDLY
COWER
COWERED
COWERING
COWERS
COWFISH
COWFISHES
COWFLAP
COWFLAPS
COWFLOP
COWFLOPS
COWGIRL
 
   .
   .
   . 
PICOWAVED
PICOWAVES
PICOWAVING
SCOW
SCOWDER
SCOWDERED
SCOWDERING
SCOWDERS
SCOWED
SCOWING
SCOWL
SCOWLED
SCOWLER
SCOWLERS
SCOWLING
SCOWLINGLY
SCOWLS
SCOWS
STUCCOWORK
STUCCOWORKS
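
In Python terms, a search for a plain string comes down to a substring test on every line. The sketch below assumes a hand-typed list of words in place of scrabble.txt, and the function name grep_fixed is made up for the occasion.

```python
def grep_fixed(needle, lines):
    """Keep the lines that contain needle anywhere; ignore the rest."""
    return [line for line in lines if needle in line]


# A tiny stand-in for the dictionary file (hand-picked words, not the real file).
words = ["BECOWARD", "COW", "CROW", "SCOWL", "STUCCOWORK", "OCELOT"]

print(grep_fixed("COW", words))
# ['BECOWARD', 'COW', 'SCOWL', 'STUCCOWORK']
```

Note that CROW is filtered out: the letters C, O, W must sit right next to each other.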

What is a character class?

A character class is a wildcard that matches exactly one character. We will learn about some basic character classes. Most characters, with the exception of some magic characters, are character classes representing only themselves. F'rinstance, the character a is just the letter a.

You can have a range character class. Let's make one. Throughout, we pass grep the -E option, which selects extended regular expressions; plain character classes like these work without it too.

unix> grep -E '[a-f]' chars.txt
a
b
c
d
e
f

Now let's make a list character class. You just put the characters you want inside of [ ... ].

unix> grep -E '[arqs]' chars.txt
a
q
r
s

Now look at this mystery.

unix> grep -E '[Z-a]' chars.txt
a
Z
^
]
[

What is happening here? First, a bit of computer history. In the Fred and Barney days, all code was written using only upper-case letters. Lower case letters were added to the character set later.

Another interesting fact is that the characters you see (even in plain text) on the screen are illusions. Every character has a numerical value. English characters are stored in a single byte. For example the letter a is 01100001. The letter A is 01000001. This encoding is called ASCII. There is also a Wikipedia article on this encoding.

Character ranges are, in my parlance, asciicographical; to wit, characters are ordered by their numerical (byte) values.

Python allows us to see this correspondence. The ord function, given a one-character string, will tell you the numerical value for that character. The chr function goes the other way. Here I invoke Python for this demonstration.

Python 3.11.5 (main, Sep 29 2023, 18:17:13) [Clang 15.0.0 (clang-1500.0.40.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> ord("a")
97
>>> ord("]")
93
>>> ord("Z")
90
>>> chr(91)
'['
>>> chr(92)
'\\'

Mystery solved.
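
To see the whole range at once, here is a small sketch that lists every character from Z to a in asciicographical order. The helper name ascii_range is invented for this example.

```python
def ascii_range(first, last):
    """List every character whose numerical value lies between first and last."""
    return [chr(code) for code in range(ord(first), ord(last) + 1)]


z_to_a = ascii_range("Z", "a")
print(" ".join(z_to_a))
# Z [ \ ] ^ _ ` a

# A few characters from chars.txt, typed in by hand; keep the ones in the range.
sample = ["a", "b", "Y", "Z", "^", "]", "["]
print([ch for ch in sample if ch in z_to_a])
# ['a', 'Z', '^', ']', '[']
```

The range holds eight characters, but chars.txt has no backslash, underscore, or backtick, so grep showed only five lines.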

Regexes

A regular expression (regex) is something you build with character classes. Its most basic operation is juxtaposition, which means "and then immediately". A character class is just a one-character regex.

Let us introduce the character class . (a period). This character class represents any single character except for a newline. We will demonstrate some simple regexes now.

Let's Cheat on an NYT Crossword!

We have a few tasty situations you might find in a crossword. Let us introduce two items into regular expressions. The character ^ means "beginning of line," and the character $ means "end of line."

Our dictionary file scrabble.txt has one word to a line. We will use the -i option on grep to make the search case-insensitive.

Puzzle 1:

cat on the spot: --el-t

We have

  1. word starting
  2. a blank (one character)
  3. a blank
  4. an e
  5. an l
  6. a blank
  7. a t
  8. word ending

all in immediate succession. Now let's build our regex.

  1. word starting: ^
  2. a blank (one character): ^.
  3. a blank: ^..
  4. an e: ^..e
  5. an l: ^..el
  6. a blank: ^..el.
  7. a t: ^..el.t
  8. word ending: ^..el.t$

A hunting we will go!

unix> grep -iE '^..el.t$' scrabble.txt
EYELET
OCELOT
OMELET

Our cat on the spot has spots: OCELOT.
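
The step-by-step build above is mechanical enough to hand to Python. This is a sketch under some assumptions: the function names are made up, a dash always means one blank, and a short hand-typed word list stands in for scrabble.txt.

```python
import re


def pattern_to_regex(pattern):
    """Turn a crossword pattern such as --el-t into an anchored regex."""
    # Each dash becomes . (any one character); known letters stand for themselves.
    body = "".join("." if ch == "-" else re.escape(ch) for ch in pattern)
    return f"^{body}$"


def solve(pattern, words):
    """Keep the words that fit the pattern, ignoring case like grep -i."""
    regex = re.compile(pattern_to_regex(pattern), re.IGNORECASE)
    return [word for word in words if regex.search(word)]


# Hand-picked words, not the real dictionary file.
words = ["EYELET", "OCELOT", "OMELET", "COWBELL", "GOAT"]

print(pattern_to_regex("--el-t"))
# ^..el.t$
print(solve("--el-t", words))
# ['EYELET', 'OCELOT', 'OMELET']
```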

Puzzle 2:

8 lb baby: --ed--h-m---

unix> grep -iE '^..ed..h.m...$' scrabble.txt
SLEDGEHAMMER

A regular sledgehammer is 16 lb. The 8 lb version is a baby sledge.

Puzzle 3:

butter: g--t

unix> grep -iE '^g..t$' scrabble.txt
GAIT
GAST
GELT
GENT
GEST
GHAT
GIFT
GILT
GIRT
GIST
GLUT
GNAT
GOAT
GOUT
GRAT
GRIT
GROT
GUST

This looks mysterious until you remember what goats do: they butt heads. The goat is a butter!

Puzzle 4:

amative preparation: --il--e

unix> grep -iE '^..il..e$' scrabble.txt
ANILINE
AXILLAE
EPILATE
FAILURE
GHILLIE
PHILTRE
SOILAGE
SOILURE
UTILISE
UTILIZE

PHILTRE is the love potion.

Puzzle 5:

nose gutter: p---t--m

unix> grep -iE '^p...t..m$' scrabble.txt
PHANTASM
PHILTRUM
PLASTRUM
PLECTRUM

PHILTRUM is the word.

Let's Cheat at Keyword in the WaPo!

Here is the game we solved.

[Image: a Keyword game]

Here is what we did. We figured out all letters we could put in the blanks.

Now our regex: ^[bcfghlmoprstv]e[lnst][aio][oy][hr]$

unix> grep -iE '^[bcfghlmoprstv]e[lnst][aio][oy][hr]$' scrabble.txt
SENIOR

Boom.
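
Building a regex like this one is just gluing list character classes together, one per blank. Here is a rough sketch that does the gluing; the function name and the slots list are invented for illustration, the letters are copied from the grep command above, and the sketch assumes every slot holds plain letters (no ], ^, or - that would need special care inside brackets).

```python
import re


def slots_to_regex(slots):
    """Join one character class per position into an anchored regex."""
    # A slot with a single known letter needs no brackets.
    parts = [slot if len(slot) == 1 else f"[{slot}]" for slot in slots]
    return "^" + "".join(parts) + "$"


# Candidate letters for each of the six positions, as in the grep command above.
slots = ["bcfghlmoprstv", "e", "lnst", "aio", "oy", "hr"]
regex = slots_to_regex(slots)
print(regex)
# ^[bcfghlmoprstv]e[lnst][aio][oy][hr]$

# Hand-picked test words, not the real dictionary file.
words = ["SENIOR", "SENSOR", "TENNIS"]
print([word for word in words if re.search(regex, word, re.IGNORECASE)])
# ['SENIOR']
```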

Hunting in the Dictionary

Here we find all words with three Zs in the scrabble dictionary. The * after the . means "zero or more of". So we look for a Z followed by some characters (possibly none), another Z followed by more characters (possibly none), then a final Z.

unix> grep -iE 'z.*z.*z' scrabble.txt
BEZAZZ
BEZAZZES
PAZAZZ
PAZAZZES
PIZAZZ
PIZAZZES
PIZAZZY
PIZZAZ
PIZZAZES
PIZZAZZ
PIZZAZZES
PIZZAZZY
RAZZAMATAZZ
RAZZAMATAZZES
RAZZMATAZZ
RAZZMATAZZES
ZIZZLE
ZIZZLED
ZIZZLES
ZIZZLING
ZYZZYVA
ZYZZYVAS
ZZZ

We also did this in the bigger file hughJass.txt.

unix> grep 'z.*z.*z' hughJass.txt
benzeneazobenzene
bezazz
bezazzes
drizzle-drozzle
fuzzy-guzzy
fuzzy-wuzzy
mezzo-mezzo
pazazz
pazazzes
pizazz
pizazzes
pizazzy
pizzazz
pizzazzes
razzle-dazzle
razzmatazz
zizz
zizzle
zizzled
zizzles
zizzling
zyzzyva
zyzzyvas
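
The same three-Z hunt can be tried in Python with the re module. As before, this is a sketch: the word list is typed in by hand and is not the real dictionary.

```python
import re

# z, then zero or more of anything, then z, then zero or more of anything, then z.
THREE_Z = re.compile(r"z.*z.*z", re.IGNORECASE)

# Hand-picked words for the demonstration.
words = ["PIZZAZ", "ZYZZYVA", "ZZZ", "PIZZA", "ZEBRA"]

hits = [word for word in words if THREE_Z.search(word)]
print(hits)
# ['PIZZAZ', 'ZYZZYVA', 'ZZZ']
```

For this particular question, counting works just as well: a word matches exactly when word.upper().count("Z") is at least 3.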

Let's Cheat at Wordle!

We began with ADIEU. Wordle told us we had an A in the right place and that the E was in the word but in the wrong place; D, I, and U were not in the word at all. If you have a character class such as [ABC], the character class [^ABC] matches any single character except A, B, or C.

So our hint gives us the regex ^A..[^E].$

unix> grep -iE '^A..[^E].$' scrabble.txt
AALII
AARGH
ABACA
ABACI
ABACK
ABAFT
ABAKA
ABAMP
ABASE
ABASH
ABATE
ABAYA
ABBAS
ABBOT
ABEAM
ABELE
ABETS
ABHOR
ABIDE
ABMHO
ABODE
ABOHM
ABOIL
ABOMA
ABOON
ABORT
  
  .
  .
  .

AWOKE
AWOLS
AXELS
AXIAL
AXILE
AXILS
AXING
AXIOM
AXION
AXITE
AXMAN
AXONE
AXONS
AYAHS
AYINS
AZANS
AZIDE
AZIDO
AZINE
AZLON
AZOIC
AZOLE
AZONS
AZOTE
AZOTH
AZUKI
AZURE

There is a lot of crap we don't want. For instance, AZURE has U. Wordle ruled that out. And AXMAN has no E, so it's a dud.

We can filter for items with an E using grep E. We can exclude the duds using grep -v '[DIU]'; the quotes keep the shell from treating the brackets as a filename pattern. We can connect these with a pipe (|). Videte et Spectate!

unix> grep -iE '^A..[^E].$' scrabble.txt | grep -v '[DIU]' | grep E
ABATE
ABEAM
ABELE
ABOVE
ACETA
AGAPE
AGATE
AGAVE
AGAZE
AGENE
AGENT
AGONE
AKELA
AKENE
ALANE
ALATE
ALEPH
ALGAE
ALONE
AMAZE
AMBLE
AMEBA
AMENT
AMOLE
AMPLE
ANELE
ANENT
ANGLE
ANKLE
ANOLE
ANTAE
APACE
APEAK
APPLE
ATONE
AWAKE
AWOKE
AXONE
AZOLE
AZOTE

Still a lot. Vanna White reminds us of the magic of RSTLNE. We eliminated R, S, and O, and the [^E] in the last position rules out a final E. So we revise our query as follows.

unix> grep -iE '^A..[^E][^E]$' scrabble.txt | grep -v '[ORSDIU]' | grep E
ABEAM
ACETA
AGENT
AKELA
ALEPH
AMEBA
AMENT
ANENT
APEAK

The answer was AGENT.
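
A pipe just feeds the output of one filter into the next, and that chain can be imitated in Python. The sketch below is an approximation: the grep function here is home-made (with an invert flag playing the part of -v and ignore_case playing the part of -i), and the six words are hand-picked stand-ins for the dictionary.

```python
import re


def grep(pattern, lines, invert=False, ignore_case=False):
    """Keep lines matching pattern; with invert=True keep the ones that do not."""
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    return [line for line in lines if bool(regex.search(line)) != invert]


# Hand-picked words, not the real dictionary file.
words = ["AGENT", "AZURE", "AXMAN", "ABOVE", "ABEAM", "ALERT"]

step1 = grep(r"^A..[^E][^E]$", words, ignore_case=True)  # shape of the word
step2 = grep(r"[ORSDIU]", step1, invert=True)            # throw out ruled-out letters
step3 = grep(r"E", step2)                                # must contain an E somewhere

print(step1)
# ['AGENT', 'AXMAN', 'ABEAM', 'ALERT']
print(step2)
# ['AGENT', 'AXMAN', 'ABEAM']
print(step3)
# ['AGENT', 'ABEAM']
```

Each step takes the previous step's output as its input, which is all the | is doing on the command line.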

Where do I get this tool?

Go to "Getting BASH" to find out. This tool is on all Macs, and you can install the UNIX subsystem on Windoze machines.