Following my post on repeated phrases in HBO’s Westworld, I was curious to see what else I could do with the transcripts dataset (made available as an .RData file here). In that vein, I attempted to make a word cloud for each episode. However, even after removing common English words, word clouds based solely on word frequency were completely boring. Common spoken terms dominated, so the clouds revealed little about what made each episode unique, especially in the context of the show.

The final metric I settled on for the word cloud below measures how frequently a word appears in a given episode, relative to how common it is across the show. The exact formula is given below:

$$
\begin{aligned}
\mbox{Metric} &= (\mbox{Freq in Episode}) \times (\mbox{Relative Freq in Episode}) \\
&= (\mbox{Freq in Episode}) \times \frac{\mbox{Freq in Episode}}{\mbox{Freq in All Episodes}}
\end{aligned}
$$

where $$(\mbox{Freq in All Episodes})$$ counts occurrences across all episodes up to and including the episode of interest. This metric led to a rather interesting word cloud: it seems to highlight the words most important to each episode, giving a good sense of what the episode was about. For those curious about the color scheme, I use a color palette based on the episode cover image (see above), and color all words with a Metric of 1 or more.
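A minimal Python sketch of the metric, assuming each episode has already been reduced to a `Counter` of word frequencies. The function name `episode_metric` and the word counts below are hypothetical, made up purely to illustrate the arithmetic. The denominator sums counts over episodes 1 through k only, matching the "up to and including" definition, and the final line applies the same Metric ≥ 1 cutoff used for coloring.

```python
from collections import Counter

def episode_metric(episodes: list[Counter], k: int) -> dict[str, float]:
    """Metric for episode index k (0-based):
    freq_in_episode * freq_in_episode / freq_in_all_episodes,
    where the denominator sums counts over episodes 0..k inclusive.
    """
    current = episodes[k]
    cumulative = Counter()
    for counts in episodes[: k + 1]:
        cumulative.update(counts)  # adds counts rather than replacing them
    # Every word in `current` is also in `cumulative`, so no division by zero.
    return {word: n * n / cumulative[word] for word, n in current.items()}

# Made-up counts for two episodes, for illustration only.
ep1 = Counter({"maze": 1, "park": 4})
ep2 = Counter({"maze": 6, "park": 2})

metric = episode_metric([ep1, ep2], k=1)
# maze: 6 * 6 / 7 ≈ 5.14    park: 2 * 2 / 6 ≈ 0.67
highlighted = {word for word, m in metric.items() if m >= 1}
print(metric, highlighted)
```

Because a word's frequency in one episode can never exceed its cumulative frequency, the metric is at most the raw episode frequency. It equals the raw frequency only when the word has not appeared in any earlier episode, so words that are new or suddenly prominent rise to the top, while words that recur throughout the show are damped.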