Pre-requisite Knowledge Required: None
Function Summary: SELECT, FROM, WHERE, AGGREGATES, GROUP BY and ORDER BY
Interactivity: In-browser Interface
Content Types: Multiple-Choice Questions, Building Queries, Video Instruction
Quick Course Overview
Learning to query structured data is a hassle-free way for an amateur data scientist to begin discovering details within an organized dataset instead of wrestling with new functions and programs required for tidying semi-structured data.
Most structured data is composed in a relational database where the data is neatly sorted into tables with records and fields. These tables are efficient at mapping the data so it may be effectively returned from simple queries using SQL.
a hassle-free way for an amateur data scientist to begin discovering details within an organized dataset
Learning how to build an effective query is always in demand in the data science marketplace, and DataCamp’s Intro to SQL for Data Science is a great way to get started on a few of the basics. Like David Robinson tweeted, SQL is underrated in data science.
Underrated tech: SQL
Underrated model: extensions to linear (glm, gam, LASSO, etc)
Underrated skill: writing/communication
— David Robinson (@drob) January 29, 2018
The SQL Environment
A table represents an entity, like EMPLOYEE, described by fields (top to bottom) and records (left to right). Databases have many tables with different information inside them that feed into other tables to create your database.
This entity PEOPLE has four fields and three records.
While SQL can alter these databases, this introduction course focuses on composing a well-manufactured data request from a database table, known as a query. Querying is an essential skill for a data scientist, since the data you need for your analyses will often live in databases.
Querying is an essential skill for a data scientist, since the data you need for your analyses will often live in databases.
By the end of the course you’ll compose a query to find a national film’s average budget and gross by country, where the titles are longer than ten characters. Seriously, that’s the final question. There’s no time left to waste; below is a glimpse at the course content which is all covered within the exercises.
Selecting Columns Statements
SELECT: Use this statement to filter which columns you want to return in your result.
* returns all fields in a table
COUNT(field): returns how many total rows are in the table
DISTINCT: returns how many unique rows are in the table
FROM: This is where the fields you selected are located.
LIMIT: limits the number of results returned to preserve computing time; any numeric follows
SELECT * FROM people; SELECT name, birthdate FROM people LIMIT 10; SELECT COUNT(DISTINCT birthdate) FROM people;
WHERE: Here you can filter text or numbers to meet a stated condition using the operators below:
<=less than or equal to
>=greater than or equal to
AND: add another condition statement necessary to return a result
OR: add an optional condition necessary to return a result
SELECT title FROM films WHERE (release_year = 1994 OR release_year = 1995) AND (certification = 'PG' OR certification = 'R')
These are included in the SELECT statements.
SUM: find the total value of the records in a field
MAX/MIN: find the maximum or minimum value of the field
AVG: find the average value of the field
SELECT SUM(budget) FROM films WHERE release_year >= 2010;
Sorting, Grouping and Joins
ORDER BY: sort results in ascending (default) or descending order
DESC: modify the ORDER BY statement to descend results
SELECT name FROM people ORDER BY name DESC;
GROUP BY: used to filter the results to a column’s distinct values
HAVING: a condition statement for the GROUP BY clause similar to the WHERE statement, but specifically dealing with aggregates
SELECT release_year FROM films GROUP BY release_year HAVING COUNT(title) > 10;
My Course Review
When I was a data analytics student, this course presented its content in a gradual manner that made it digestible for me to identify each function’s purpose in the query-making process. I love DataCamp’s in-browser interface so there’s no tabbing out to my R environment, and the videos help break up the content style from written to verbal.
The one essential SQL query element this course lacks is the JOIN condition used when selecting fields from different tables, which it alludes to at the end of the course in the final section, “A Taste of Things to Come.”
I have repeatedly recommended this course to new students in a data science program, as it is an efficient way to learn the basics of SQL. Anyone should yield high returns from the time invested in this course, and upon taking Joining Data in PostgreSQL to learn joins, should be ready to scrape around any relational database with ease.