Beginner Free Learning News SQL

[Review]: DataCamp’s Intro to SQL for Data Science

Intro to SQL for Data Science

Pre-requisite Knowledge Required: None
Function Summary: SELECT, FROM, WHERE, AGGREGATES, GROUP BY and ORDER BY
Interactivity: In-browser Interface
Content Types: Multiple-Choice Questions, Building Queries, Video Instruction

Quick Course Overview

Learning to query structured data is a hassle-free way for an amateur data scientist to begin discovering details within an organized dataset instead of wrestling with new functions and programs required for tidying semi-structured data.

Most structured data is composed in a relational database where the data is neatly sorted into tables with records and fields. These tables are efficient at mapping the data so it may be effectively returned from simple queries using SQL.

a hassle-free way for an amateur data scientist to begin discovering details within an organized dataset

Learning how to build an effective query is always in demand in the data science marketplace, and DataCamp’s Intro to SQL for Data Science is a great way to get started on a few of the basics. Like David Robinson tweeted, SQL is underrated in data science.

The SQL Environment

A table represents an entity, like EMPLOYEE, described by fields (top to bottom) and records (left to right). Databases have many tables with different information inside them that feed into other tables to create your database.

id name age nationality
1 Mike 47 Canada
2 Adelaide 24 UK
3 Maureen 38 Ireland

This entity PEOPLE has four fields and three records.

While SQL can alter these databases, this introduction course focuses on composing a well-manufactured data request from a database table, known as a query. Querying is an essential skill for a data scientist, since the data you need for your analyses will often live in databases.

Querying is an essential skill for a data scientist, since the data you need for your analyses will often live in databases.

By the end of the course you’ll compose a query to find a national film’s average budget and gross by country, where the titles are longer than ten characters. Seriously, that’s the final question. There’s no time left to waste; below is a glimpse at the course content which is all covered within the exercises.

pexels-photo-276452.jpeg

Selecting Columns Statements

SELECT: Use this statement to filter which columns you want to return in your result.

* returns all fields in a table

COUNT(field): returns how many total rows are in the table

DISTINCT: returns how many unique rows are in the table

FROM: This is where the fields you selected are located.

LIMIT: limits the number of results returned to preserve computing time; any numeric follows

SELECT *
 FROM people;

SELECT name, birthdate
 FROM people
LIMIT 10;

SELECT COUNT(DISTINCT birthdate)
 FROM people;

Filtering Rows

WHERE: Here you can filter text or numbers to meet a stated condition using the operators below:

  • = equal
  • <> not equal
  • < less than
  • > greater than
  • <= less than or equal to
  • >= greater than or equal to

AND: add another condition statement necessary to return a result

OR: add an optional  condition necessary to return a result

SELECT title
FROM films
WHERE (release_year = 1994 OR release_year = 1995)
AND (certification = 'PG' OR certification = 'R')

Aggregate Functions

These are included in the SELECT statements.

SUM: find the total value of the records in a field

MAX/MIN: find the maximum or minimum value of the field

AVG: find the average value of the field

SELECT SUM(budget)
FROM films
WHERE release_year >= 2010;

Sorting, Grouping and Joins

ORDER BY: sort results in ascending (default) or descending order

DESC: modify the ORDER BY statement to descend results

SELECT name
FROM people
ORDER BY name DESC;

GROUP BY: used to filter the results to a column’s distinct values

HAVING: a condition statement for the GROUP BY clause similar to the WHERE statement, but specifically dealing with aggregates

SELECT release_year
FROM films
GROUP BY release_year
HAVING COUNT(title) > 10;

My Course Review

When I was a data analytics student, this course presented its content in a gradual manner that made it digestible for me to identify each function’s purpose in the query-making process. I love DataCamp’s in-browser interface so there’s no tabbing out to my R environment, and the videos help break up the content style from written to verbal.

The one essential SQL query element this course lacks is the JOIN condition used when selecting fields from different tables, which it alludes to at the end of the course in the final section, “A Taste of Things to Come.”

I have repeatedly recommended this course to new students in a data science program, as it is an efficient way to learn the basics of SQL. Anyone should yield high returns from the time invested in this course, and upon taking Joining Data in PostgreSQL to learn joins, should be ready to scrape around any relational database with ease.

0 comments on “[Review]: DataCamp’s Intro to SQL for Data Science

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: