
Three Text Sentiment Lexicons in R’s tidytext

For this blog post, I would like to share my exploration of the three sentiment lexicons (NRC, Bing, and AFINN) that I used with R’s tidytext in my last post on sentiment analysis. This is also an opportunity to re-ground oneself in tidy data¹ principles and to showcase the tidytext package. The simplicity and efficiency of tidytext let you get creative with your analysis, using three very different output options.

Using Brontë’s Wuthering Heights, the table below illustrates the output of the three lexicons. I have also included, under each of the headers, a topline description of each lexicon.

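[Table: sample output of the NRC, Bing, and AFINN lexicons for Wuthering Heights, with a topline description of each lexicon under its header]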

As each output shows, your choice of lexicon will shape how you summarize and assess your project.

It is important to note that the tidytext package is firmly grounded in tidy data principles, where “each variable is a column, each observation is a row, and each type of observational unit is a table.”¹ The unnest_tokens() function breaks text into individual tokens (tokenization) and returns them in a tidy data structure, one token per row; hence the output above. Couple this with the built-in stop_words dataset, which you remove with an anti_join(), and your text project is ready for analysis in minutes (see this in action with the GitHub link below). All of this made me hungry for an overall approach to text mining, which I hope will help you as well.
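To make the tokenize-then-filter step concrete, here is a minimal sketch. The two-line sample_text table and the names sample_text and tidy_words are my own stand-ins for illustration, not the data or objects from the original project.

```r
library(dplyr)
library(tidytext)

# Hypothetical two-line sample standing in for the novel's text
sample_text <- tibble(
  line = 1:2,
  text = c("I have just returned from a visit to my landlord",
           "the solitary neighbour that I shall be troubled with")
)

tidy_words <- sample_text %>%
  unnest_tokens(word, text) %>%        # one lowercased token per row
  anti_join(stop_words, by = "word")   # drop common stop words

tidy_words
```

The result keeps the line column next to each remaining word, so every row is still one observation, which is exactly the tidy structure the lexicons expect when you join on word.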

As a structuralist, I found a succinct workflow in Text Mining with R by Julia Silge & David Robinson, further affirming my love for tidytext.

[Figure: text mining workflow from Text Mining with R]

As per the workflow, the next step is to summarize (and assess) the output from the three sentiment lexicons.

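library(dplyr)
library(tidytext)

# lxw1: the tidy, one-word-per-row table of Wuthering Heights from the
# previous post (assumed here to have a `word` column)

# summarize "nrc"
lxw1 %>%
  inner_join(get_sentiments("nrc"), by = "word") %>%
  count(sentiment, sort = TRUE)

# summarize "bing"
lxw1 %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(sentiment, sort = TRUE)

# summarize "afinn"; opted for sort = FALSE to see the distribution across all scores
# (newer tidytext releases name this column `value` rather than `score`)
lxw1 %>%
  inner_join(get_sentiments("afinn"), by = "word") %>%
  count(score, sort = FALSE)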

[Table: summary counts for Wuthering Heights under the NRC, Bing, and AFINN lexicons]
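If you want to try the join-and-count pattern without the novel loaded, here is a minimal sketch. It assumes a hypothetical toy_words table of my own in place of lxw1, and it uses the Bing lexicon because that one ships with tidytext, while NRC and AFINN are downloaded on first use.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for lxw1: a tidy table with one word per row
toy_words <- tibble(
  word = c("love", "happy", "misery", "hate", "wretched", "moor")
)

bing_summary <- toy_words %>%
  inner_join(get_sentiments("bing"), by = "word") %>%  # keep only words found in the lexicon
  count(sentiment, sort = TRUE)                        # tally rows per sentiment label

bing_summary
```

Words that are absent from the lexicon simply drop out at the inner_join(), so the counts describe only the matched vocabulary, not the whole text.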

I hope this quick preview will help jumpstart your next text sentiment project and help you visualize where you want to take it. Don’t forget the workflow guide above.

As for those of you wondering if something was missed: yes, I have stopped short of the visualization, which is a whole other discussion! Happy text mining!

¹ Hadley Wickham, “Tidy Data,” Journal of Statistical Software (Volume 59, 2014).
