
Scraping & Plotting: Who Is the “Worst” Bachelor?

Data Scientists gather data to prove once and for all who the worst Bachelor was.

This article was written in collaboration with Leila Taweel. Leila had posted a suggestion on my previous post, The Fall of The Simpsons, with an article premise I couldn’t help but laugh over: how bad was Arie, the most recent bachelor, compared to the others? On a personal level we both agree: he’s the worst. So, we modified the web scraper used in the Simpsons article to cope with some missing episode ratings and began examining the evidence.
Connect with Leila on LinkedIn
GitHub Repository for Project Code & Visuals


Hey all, Jake here! And today I am joined by Leila Taweel.

Hello everyone!

Backtrack and read more of the story about how we came to write this article in her introduction above.

Together, we are going to write an exciting data science article on the hit reality show, The Bachelor.

If you’ve watched the last few seasons or kept up with pop culture, you may have heard that Arie Luyendyk Jr. (the most recent bachelor) is considered by many to be the “worst” of all time.

We’re here to look at the data and see if Arie upset the rest of Bachelor Nation as much as he upset us. At the end, we’ll crown our pick for “worst” bachelor.

You can already tell we’re trying a slightly different format. Let us know in the comments if you liked it!

Getting our Data

We wanted to pull the episode ratings from IMDb, as Jake had done in his last post. One issue was that seasons 1-12 were missing a lot of episode ratings, so we decided to start our comparisons at season 13.

There were still some missing ratings in those seasons too, so we broadened the CSS selectors we passed to html_nodes() and pulled in a mess of other text along with the rating.

library(rvest)   # read_html(), html_nodes(), html_text(); also provides %>%
library(tibble)

imdbScrape <- function(x){
  # Download and parse the season page once, then reuse it for every selector
  page <- read_html(x)
  name <- page %>% html_nodes('#episodes_content strong a') %>% html_text()
  rating <- page %>% html_nodes('.ipl-rating-widget') %>% html_text()
  details <- page %>% html_nodes('.zero-z-index div') %>% html_text()

  # One row per episode; this only works if every selector matches once per episode
  chart <- tibble(Name = name, Rating = rating, Details = details)
  Sys.sleep(5)  # pause before returning so back-to-back calls don't hammer IMDb
  return(chart)
}
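For anyone following along, here is a rough sketch of how a scraper like this could be run over seasons 13 through 22 (Arie’s season) and stacked into one table. It is an illustration under assumptions, not necessarily what the repository does: the title ID in the URL is a placeholder, and scrape_all() and the Season column are names made up for this example. The scraper is passed in as an argument so the loop can be tried offline with a stub.

```r
# Placeholder title ID (an assumption); swap in the show's real IMDb ID
base_url <- "https://www.imdb.com/title/tt0000000/episodes?season="
seasons <- 13:22
season_urls <- paste0(base_url, seasons)

# Hypothetical helper: run `scraper` on each URL, tag the rows with the
# season number, and stack the per-season tables into one data frame
scrape_all <- function(urls, seasons, scraper) {
  pages <- lapply(seq_along(urls), function(i) {
    chart <- scraper(urls[i])
    chart$Season <- seasons[i]
    chart
  })
  do.call(rbind, pages)
}
```

Something like scrape_all(season_urls, seasons, imdbScrape) would then make one request per season.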

It got the job done. Here’s a look at what we had to work with:

scrape_output.PNG

You can see the rating value is at the beginning of the string. It really wasn’t too hard to pull out the rating with some regex:

library(stringr)
str_extract(bachelor$Rating, "[0-9]\\.[0-9]")  # escaped dot matches only a literal decimal point

There’s definitely more than one way to get the ratings out, but this did the trick.

I thought it was simple enough, but if you’d do it differently, share in the comments.
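As one such alternative, here is a minimal base R sketch that does the same extraction without stringr and turns the result into numbers, leaving NA where an episode has no rating. The sample strings are invented for illustration and are not actual scrape output.

```r
# Hypothetical rating strings; the last one has no rating at all
raw_rating <- c("8.1 (24) Rate", "7.4\n (19)\n Rate", "Rate")

# regexpr() returns -1 where the digit-dot-digit pattern is not found
hits <- regexpr("[0-9]\\.[0-9]", raw_rating)

# Start from all-NA, then fill in only the rows that had a match
rating <- rep(NA_real_, length(raw_rating))
rating[hits > 0] <- as.numeric(regmatches(raw_rating, hits))
rating
```

Keeping the missing ratings as NA, rather than dropping those rows, means the episode order within each season stays intact.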

About The Bachelor

So to give those not familiar with The Bachelor franchise some background, let’s establish a few key episodes:

  • “Women Tell All” is a segment where all the eliminated women rejoin host Chris Harrison and dissect their relationship and all the house drama.
  • The finale, usually episode 11, is where the bachelor picks between the two remaining women. The episode number varies from season to season, and we took that into account.
  • “After the Final Rose” is the final episode of the season where the couple comes back to talk about where they are in their relationship and if they’re still together.

Here’s a recap of the most recent season to get the gist of why there was such outrage.

TL;DW: Arie bombards Becca with cameras, calls off their engagement, and proposes to the runner-up.

Alright. Gloves are off. Let’s hit him with some ggplot-powered justice.

First we compare our bachelors based on their season rating average. Surely, people hated Arie as much as we did.
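A comparison like this can be sketched in a few lines of base R. The numbers below are invented, not the scraped ratings, and the definition of “average” (the mean of all rated episodes across seasons) is an assumption made for this example.

```r
# Hypothetical episode ratings for two seasons; NA marks a missing rating
bachelor <- data.frame(
  Season = c(13, 13, 13, 14, 14, 14),
  Rating = c(7.0, 6.5, NA, 8.0, 7.0, 6.0)
)

# Mean rating per season; the formula interface drops NA rows by default
season_avg <- aggregate(Rating ~ Season, data = bachelor, FUN = mean)

# Overall mean across every rated episode, used as the benchmark
overall_avg <- mean(bachelor$Rating, na.rm = TRUE)
season_avg$AboveAverage <- season_avg$Rating > overall_avg
season_avg
```

Using the mean of the season means as the benchmark instead would weight short and long seasons equally, so the two choices can disagree.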

Bachelor Season Averages

I’m sorry, how on earth was Arie above average?

I’m as surprised as you are. I at least expected him to be below average when you consider how things ended. But, I do remember Juan Pablo being a total meathead.

Makes sense, given that Juan Pablo was the only bachelor who didn’t propose in the finale.

Juan Pablo. The worst-rated bachelor in recent history.

Like I said, total meathead. Take a look at his track record:

Juan Pablo Chart

It’s pretty clear that he didn’t have a large fan base. He may have started to win people over in the first few episodes, but that quickly went downhill. Once viewers found out he didn’t propose, his “After the Final Rose” episode earned the lowest rating in recent Bachelor history.

Not proposing may actually have saved him from having the worst finale. Everyone was probably relieved he ended up alone.

Arie Chart

We know Arie upset the viewers—and definitely one particular woman—when he overturned his decision. But I don’t think we can statistically call him “the worst.” Not with this data anyway.

I still think there might be some merit to believing Arie was the worst.

I think you might be experiencing something called recency bias.

Funny, but no. Arie may not have had the lowest-rated season, but I think I can prove he did have the most disappointing season with my good friend the linear model (y ~ x):

Linear Comparison

And here I was thinking Arie got away unscathed by the data.

I mean, Juan Pablo certainly had the lower average, but the data shows Arie had the steepest decline in ratings as the season progressed among all the bachelors included in the analysis. I think that qualifies as disappointing, wouldn’t you?

Oh yes, I can agree with that.
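The slope comparison can be sketched roughly like this: fit Rating ~ Episode separately for each season and see which slope is the most negative. The two seasons and their ratings are made up for illustration, and the data frame layout is an assumption rather than the structure used in the project code.

```r
# Hypothetical ratings for two invented seasons, not the scraped data
ratings <- data.frame(
  Season  = rep(c("A", "B"), each = 5),
  Episode = rep(1:5, times = 2),
  Rating  = c(8.0, 7.2, 6.5, 5.9, 5.0,   # season A falls steadily
              6.0, 5.9, 6.1, 5.8, 6.0)   # season B stays roughly flat
)

# Fit a linear model within each season and keep only the Episode slope
slopes <- sapply(split(ratings, ratings$Season), function(season) {
  unname(coef(lm(Rating ~ Episode, data = season))["Episode"])
})

# The most negative slope marks the steepest decline over the season
steepest <- names(which.min(slopes))
slopes
steepest
```

A slope only captures the overall trend, so a season with one terrible finale and a season with a slow slide can end up with similar values.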

the-bachelor-arie-final-rose.png
By our interpretation of the data, Arie disappointed fans the most, despite that meathead Juan Pablo.

Picking “The Worst” from the Bad

I may or may not have some personal bias, so I’ll stick to the facts.

Arie began the season with a lot of momentum and an introductory rating of 7.8, the highest in the seasons we analyzed. After such a strong start, the drop to an all-time low rating of 4.1 at the finale is evidence that he lost the support of his fanbase.

Leila’s Pick: Arie

Arie hurt the fans more than any bachelor in the past ten seasons. I think that carries more weight than Juan Pablo’s underwhelming performance.

In fact, I am going to dub Juan Pablo “the best bachelor” since he started and left the show maintaining his bachelor status.

Jake’s Pick: Arie

That’s all. Thank you for reading!

 

Get the complete GitHub project with R code and graphs here: The Bachelor Repository.

If you have ideas for some TV show analysis and want to collaborate with a DataCritics member, send an email to team@datacritics.com with your idea!

 

 
