
#elxn2018 in R: Ontario Tweeters’ Top Priority is Healthcare

Using only a few lines of code with twitteR and tidyverse

I am NOT politically savvy, and I decided the quickest way to know more about the upcoming June elections in Ontario was to look at what is being discussed on Twitter.

As this was a quick midday break project, my goal was to sift through “#elxn2018” tweets from the past seven days (as of May 19, 2018). I loaded the twitteR package to pull the tweets from Twitter and used R’s tidyverse & tidytext to strip out noise and produce a succinct word count. I did not focus on specific parties or platforms, as I had given myself only two hours.

DISCLAIMER: Data is as of May 19, 2018. You need to create a Twitter app and access token in order to use the twitteR package (see the official CRAN documentation).

library(ROAuth)
library(twitteR)
library(tidyverse)
library(tidytext)

#Twitter API credentials (replace the placeholders with your own and keep them private)
consumer_key <- "xxxxx"
consumer_secret <- "xxxx"
access_token <- "xxxxx"
access_secret <- "xxxxx"

#pass the tokens defined above so authentication does not fall back to the browser
setup_twitter_oauth(consumer_key, 
                    consumer_secret, 
                    access_token, 
                    access_secret)

#search query
ontvote2018 <- searchTwitter("elxn2018",n=1000)

#convert to data frame
ontvote2018_df <- twListToDF(ontvote2018) 

#tokenization
ontvotetkn <- ontvote2018_df %>% 
              select(id, text) %>% 
              unnest_tokens(word,text) 

#custom stop words
stop_words <- stop_words %>%
 bind_rows(tibble(word = c("https", "t.co", "rt", "amp","elxn2018","onpoli",
 "public","ontario","ontario's","2","7","june","day","don't","1","2018","election","party",
 "parties","9pm","00","30pm","saturday","brampton")))

#drop every token that appears in the stop word list
onvote1 <- ontvotetkn %>% 
           anti_join(stop_words, by = "word")

#top 20 word count
onvote1 %>% 
  group_by(word) %>% 
  tally(sort=TRUE) %>% 
  slice(1:20) %>% 
  ggplot(aes(x = reorder(word, n, function(n) -n), y = n)) + 
    geom_bar(stat = "identity",fill="yellow1") + 
    theme(axis.text.x = element_text(angle = 60, hjust = 1)) + 
    xlab("") + ggtitle("Ontario Votes 2018 - Overall Top 20 words")

[Figure: bar chart of the top 20 words in #elxn2018 tweets]
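
If you do not have Twitter credentials yet, the tokenize, filter and count steps can still be tried on a few made-up tweets. This is only a minimal sketch: the tweets and the names toy_tweets, toy_stop_words and toy_counts are invented for illustration, and count() is used as a shorthand for group_by() followed by tally().

```r
library(dplyr)
library(tibble)
library(tidytext)

# Hypothetical tweets, invented purely for illustration
toy_tweets <- tibble(
  id   = c("1", "2", "3"),
  text = c("RT Healthcare is in crisis #elxn2018",
           "Doctors and nurses need support #elxn2018",
           "Healthcare cuts are hurting patients")
)

# Extra noise words appended to tidytext's built-in stop_words lexicon
toy_stop_words <- bind_rows(
  stop_words,
  tibble(word = c("rt", "elxn2018"), lexicon = "custom")
)

toy_counts <- toy_tweets %>%
  unnest_tokens(word, text) %>%               # one lowercase token per row
  anti_join(toy_stop_words, by = "word") %>%  # drop stop words and custom noise
  count(word, sort = TRUE)                    # word frequencies, largest first

toy_counts
```

In this toy data "healthcare" appears twice and lands at the top, while "rt" and "elxn2018" are filtered out along with the ordinary stop words.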

The top two words (beyond the typical focus on candidates, debates, current party-in-power, challenging the status quo, etc.) were:

  1. healthcare
  2. oncall4on

On closer examination, oncall4on turned out to be the Twitter account for Your Ontario Doctors, which uses the hashtag #CarenotCuts.
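
One way to run this kind of check, not necessarily the one used for this post, is to pull up the tweets that contain the unfamiliar word and see whether it is written as an @mention. The sketch below uses a made-up data frame (toy_df) in place of ontvote2018_df; the tweets and screen names are invented.

```r
library(dplyr)
library(stringr)
library(tibble)

# Hypothetical stand-in for ontvote2018_df (the real one comes from twListToDF())
toy_df <- tibble(
  id         = c("1", "2", "3"),
  screenName = c("voter_a", "voter_b", "voter_c"),
  text       = c("RT @OnCall4ON: Ontario needs care, not cuts #CarenotCuts",
                 "Watching the debate tonight #elxn2018",
                 "Thanks @oncall4on for speaking up #CarenotCuts #elxn2018")
)

# Tweets that contain the term, regardless of case
mentions <- toy_df %>%
  filter(str_detect(text, regex("oncall4on", ignore_case = TRUE)))

# How many of those write it as an @mention, which points to an account
n_at_mentions <- sum(str_detect(mentions$text,
                                regex("@oncall4on", ignore_case = TRUE)))

mentions$text
n_at_mentions
```

Reading the matching tweets in full also surfaces the hashtags that travel with the account, which is what led to #CarenotCuts here.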

I decided to explore #CarenotCuts.

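#twitter query
carenotcuts <- searchTwitter("carenotcuts",n=1000)

#convert to data frame and tokenize, as above
carenotcuts_df <- twListToDF(carenotcuts)

carenotcutstkn <- carenotcuts_df %>% 
                  select(id, text) %>% 
                  unnest_tokens(word,text)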

#update stop words
stop_words2 <- stop_words %>%
 bind_rows(tibble(word = c("https", "t.co", "rt", "amp","elxn2018","onpoli",
 "public","ontario","ontario's","2","7","june","day","don't","1","2018","election","party",
 "parties","9pm","00","30pm","saturday","brampton",
 "carenotcuts","candidate","candidates","debates","oncall4on","healthcare",
 "dockaurg","onhealth","nj6tgy7pcg","learn","onelxn","oma","mds")))

carenotcuts1 <- carenotcutstkn %>% anti_join(stop_words2, by = "word")

#top 15 word count
carenotcuts1 %>% 
  group_by(word) %>% 
  tally(sort=TRUE) %>% 
  slice(1:15) %>% 
  ggplot(aes(x = reorder(word, n, function(n) -n), y = n)) + 
    geom_bar(stat = "identity",fill="yellow1") + 
    theme(axis.text.x = element_text(angle = 60, hjust = 1)) + 
    xlab("") + ggtitle("Ontario Votes 2018 - #1 Issue is Healthcare")

[Figure: bar chart of the top 15 words in #CarenotCuts tweets]

A few key observations, beyond the words themselves:

  1. There is a definite rallying cry, with action verbs like #join and #fight.
  2. There is finger pointing at #kathleen_wynne and #ontliberals and their healthcare policy.
  3. The situation is in #crisis and is #hurting #doctors, #nurses and #patient(s).
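
Because unnest_tokens() drops the # symbol, the word counts above cannot tell a literal hashtag from an ordinary word. If that distinction matters, hashtags can be pulled out of the raw text before tokenizing. This is a rough sketch on made-up tweets; toy_tweets and hashtag_counts are illustrative names, and the simple pattern "#\\w+" is an assumption that will not cover every valid hashtag.

```r
library(dplyr)
library(stringr)
library(tibble)
library(tidyr)

# Hypothetical tweets, invented purely for illustration
toy_tweets <- tibble(
  id   = c("1", "2", "3"),
  text = c("Join the fight #CarenotCuts #onpoli",
           "Health care is in crisis #carenotcuts",
           "No hashtags in this one")
)

hashtag_counts <- toy_tweets %>%
  # lowercase first so #CarenotCuts and #carenotcuts are counted together
  mutate(hashtag = str_extract_all(str_to_lower(text), "#\\w+")) %>%
  unnest(hashtag) %>%            # one row per hashtag; tweets without any drop out
  count(hashtag, sort = TRUE)

hashtag_counts
```

Here #carenotcuts is counted twice and #onpoli once, and the third tweet contributes nothing.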

We read so much about how amazing healthcare is in Canada. This lunchtime project made me realize that may not be the case, especially in Ontario. I need, or rather we all need, to get on top of this and all other issues. Go on Twitter & explore the Liberals’ Kathleen Wynne, the Progressive Conservatives’ Doug Ford, the NDP’s Andrea Horwath and the Ontario Greens’ Mike Schreiner.

 

Ontario is Home. June 7, 2018. VOTE.

Storyteller of Content & Data Analytics | Media, Sponsorship & Content Specialist | Data For Change Advocate | Yoga & Mindfulness Enthusiast | Instagram Lifer #onebillionhappy NAMASTE

