Words have always been important when it comes to communicating concepts and emotions. Given the short attention span with which we now consume words on social media platforms, the choice of what words to use has become even more pressing.
I wanted to see how word selection, either individually or in context, could impact an emotional response. This is my search for love, through text mining in R, using “the greatest love story ever written,” Wuthering Heights (1847) by Emily Brontë.
I started with the prescribed textbook approach and came up with a bag of words, or rather a sack of WTH?!
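For anyone wondering what that "bag of words" looks like, here is a minimal base-R sketch. It is not the script I actually ran: `sample_text` is a two-line stand-in for the novel, and the tokenizing rule (split on anything that is not a letter or apostrophe) is my own simplifying assumption.

```r
# Two lines standing in for the full text of the novel (illustrative only)
sample_text <- c("Nelly, I am Heathcliff!",
                 "He's more myself than I am.")

# Lowercase, then split on anything that is not a letter or apostrophe
tokens <- unlist(strsplit(tolower(sample_text), "[^a-z']+"))
tokens <- tokens[tokens != ""]   # drop any empty strings left by the split

# The "bag": word frequencies, with word order and context thrown away
bag <- sort(table(tokens), decreasing = TRUE)
print(bag)
```

The counts tell you which words show up most, but nothing about what they make a reader feel, which is exactly the gap I ran into.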
I just needed context as to what emotions some of these words evoke, plain and simple.
As I delved further, I stumbled on Text Mining with R by Julia Silge & David Robinson, as well as Julia Silge's incredible blog, juliasilge.com. Many thanks for the inspiration and help in the second half of my quest. Namaste.
Using the NRC lexicon through R's tidytext package, the sentiment words evoked 3,478 moments/sentiments, of which only 43% were “positive,” a.k.a. feelings of love (see below). [Heads up, just for the sake of optimism, I categorized “surprise” at 4% as a “positive.”]
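# Ranked Sentiment & Sentiment Word cloud
library(dplyr)         # %>%, inner_join(), count()
library(tidytext)      # get_sentiments()
library(wordcloud)     # wordcloud()
library(RColorBrewer)  # brewer.pal()

twh %>%                                   # file, one word per row
  inner_join(get_sentiments("nrc")) %>%   # my lexicon of choice
  count(sentiment, sort = TRUE) %>%       # stop here for the ranked tibble
  with(wordcloud(sentiment, n, max.words = 100, random.order = FALSE,
                 rot.per = 0.35, colors = brewer.pal(8, "Dark2")))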
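The percentages quoted earlier come from turning sentiment counts into shares of the total. Here is a rough sketch of that step on toy data. `toy_lexicon` and `toy_words` are hypothetical stand-ins I made up for illustration (the real NRC lexicon needs a one-time download and can tag a single word with several sentiments), so the numbers here are not the novel's.

```r
library(dplyr)

# Hypothetical mini-lexicon standing in for get_sentiments("nrc");
# the words and labels are illustrative only
toy_lexicon <- data.frame(
  word      = c("love", "hate", "ghost", "sweet", "storm"),
  sentiment = c("joy", "anger", "fear", "joy", "fear")
)

# Hypothetical one-word-per-row text, shaped like the tidy novel file
toy_words <- data.frame(
  word = c("love", "hate", "ghost", "love", "moor", "sweet", "storm")
)

shares <- toy_words %>%
  inner_join(toy_lexicon, by = "word") %>%  # "moor" drops out: not in the lexicon
  count(sentiment, sort = TRUE) %>%         # ranked sentiment counts
  mutate(pct = round(100 * n / sum(n)))     # each sentiment's share of the total

print(shares)
```

Adding up the `pct` values of whichever sentiments you decide to call "positive" gives a headline figure like the 43% above.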
The sentiment word cloud of Wuthering Heights below summarizes what to expect when one does find the greatest love of all. I hope I didn’t put Whitney Houston in your head right at this moment! That said, I see potential future blog(s) to explore further, like looking into the emotive cycle of the book itself, comparing this to other love stories or even other classics, etc. In the meantime, I am contemplating my next post on tidytext and text or sentiment mining! Thank you for reading.
Preview the full R script, and before you leave, check out the caption on the picture of Wuthering Heights for a good chuckle. IMPORTANT NOTE: There are two text-mining options in the script, Option 1 (oldskool) and Option 2 (with the ever-efficient tidytext): Terry’s ‘WHeights’ Project on GitHub