
Importing Data to R: The First Step Towards Your Data Science Project

The aim of this article is to provide you with a quick look-up guide for your first step towards a data science project.

Before importing data, a data scientist needs to identify the relevant sources of data for the problem at hand. Data collection and data management are the foundation stones of any data-related project. Many enterprises have a dedicated data management team that continuously identifies different sources of data, then extracts, transforms and loads the data (a.k.a. ETL) into a central repository called a data warehouse.

[Figure omitted. Image Source: Segue Technologies]

This topic is enormous and, hence, out of scope for this article, but in my opinion it is a very important concept to be understood by any aspiring data scientist.

Once you have identified your source of data, you can import it into R for further analysis. R provides different import functions depending on the file type (e.g., CSV, TXT, HTML, XLSX).

Importing TXT/CSV Files to R

Using Base Functions

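The list below summarizes the base functions (i.e., no additional package installation required) for importing your data to R, based on file format.

  • read.table(): general-purpose reader for any delimited text file. You set sep yourself (the default is whitespace).
  • read.csv(): comma-separated values, with “.” as the decimal point.
  • read.csv2(): semicolon-separated values, with “,” as the decimal point.
  • read.delim(): tab-delimited files, with “.” as the decimal point.
  • read.delim2(): tab-delimited files, with “,” as the decimal point.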

Each of these functions comes with its own set of default argument values, which is what distinguishes one function from another. The main arguments are:

  • header: logical value. If TRUE, the function assumes your file has a header row. If that’s not the case, set header = FALSE.
  • fill: logical value. If TRUE, rows of unequal length are implicitly padded with blank fields.
  • sep: the field separator character. For example, “\t” is used for a tab-delimited file.
  • dec: the character used in the file for decimal points.
  • stringsAsFactors: logical value. Set it to FALSE if you don’t want your text data to be converted to factors (since R 4.0.0, FALSE is already the default).
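To make the arguments above concrete, here is a minimal sketch that sets each of them explicitly. The sample data, the file contents and the variable names are made up for illustration; the file is written to a temporary location so the example runs on its own.

```r
# Write a small sample file to a temporary location (hypothetical data).
path <- tempfile(fileext = ".txt")
writeLines(c(
  "city;temp;note",
  "Oslo;3,5;cold",
  "Lima;19,2"          # this row is one field short
), path)

weather <- read.table(
  path,
  header = TRUE,             # first row holds the column names
  sep = ";",                 # fields are separated by semicolons
  dec = ",",                 # decimals are written with a comma
  fill = TRUE,               # pad the short row with a blank field
  stringsAsFactors = FALSE   # keep text columns as character vectors
)

str(weather)
```

With dec = "," the temp column is read as numeric rather than text, and fill = TRUE lets the incomplete last row through instead of stopping with an error.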

Watch Out! If you don’t explicitly set the above arguments, the function will assume the default argument values.
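The effect of relying on defaults is easy to see by reading the same file with two different functions. The sketch below uses a small made-up CSV file; the names with_csv and with_table are only illustrative.

```r
path <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,90", "2,85"), path)

# read.csv() defaults to header = TRUE and sep = ","
with_csv <- read.csv(path)

# read.table() defaults to header = FALSE and sep = "" (whitespace),
# so the same file comes back as a single text column, header row included
with_table <- read.table(path)

dim(with_csv)    # 2 rows, 2 columns
dim(with_table)  # 3 rows, 1 column
```

Neither call fails, which is why it pays to check the result with dim() or str() right after importing.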

Within each of the above functions, you also need to specify either the file name (if it is on your local machine) or the URL (if the file is located on the web).
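As a rough illustration of the two cases, the sketch below reads a local file by its path and shows the equivalent call for a web address. The sample data is invented, and the URL is a placeholder rather than a real data set, so that call is left commented out.

```r
# A local file: pass its path as the first argument.
local_path <- tempfile(fileext = ".csv")
writeLines(c("name,age", "Ana,31", "Raj,45"), local_path)
people <- read.csv(local_path)

# A file on the web: pass the URL instead. This address is a placeholder,
# so the call is left commented out.
# people_web <- read.csv("https://example.com/people.csv")

head(people)
```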


Reading a Local File

To locate a file on your machine, you can follow one of these approaches:

  • Set your working directory to the folder containing your file with setwd(), passing the folder path as a string, and then provide just the file name in the import function.
  • Use file.choose() within the import function. This lets you interactively choose your file from your machine.
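The two approaches above can be sketched as follows. The folder here is R's temporary directory, standing in for wherever your data actually lives, and stock.csv is a made-up file created just for the example.

```r
# Stand-in for the folder that holds your data (an assumption for this sketch).
data_dir <- tempdir()
writeLines(c("item,qty", "pen,10", "ink,4"), file.path(data_dir, "stock.csv"))

# Approach 1: point the working directory at the folder, then use the bare file name.
old_wd <- setwd(data_dir)
stock <- read.csv("stock.csv")
setwd(old_wd)  # restore the previous working directory

# Approach 2: pick the file in a dialog. Only meaningful in an interactive session.
if (interactive()) {
  stock <- read.csv(file.choose())
}
```

setwd() returns the previous working directory, which makes it easy to switch back once the file has been read.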


Tip: read.table() is a general function that can read any file in table format, provided you set the arguments to match your file. The data is imported as a data frame. For example, if you have a text file with data fields separated by “|”, pass sep = "|" to read.table().
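A minimal sketch of the pipe-separated case, using a small made-up file written to a temporary location:

```r
path <- tempfile(fileext = ".txt")
writeLines(c("id|product|price", "1|lamp|12.5", "2|desk|89"), path)

products <- read.table(path, header = TRUE, sep = "|", stringsAsFactors = FALSE)

class(products)
```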

Using the readr Package

The functions in this package are used in much the same way as the base functions. The readr functions can be much faster than their base counterparts (over 10 times in some cases, depending on the data) and are therefore very useful with large TXT or CSV files.
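Here is a short sketch of reading a CSV file with readr, assuming the package is installed. The sample file is invented, and the requireNamespace() check is only there so the example does not stop with an error on a machine without readr.

```r
path <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,90", "2,85"), path)

# readr is not part of base R; install it once with install.packages("readr").
has_readr <- requireNamespace("readr", quietly = TRUE)

if (has_readr) {
  scores <- readr::read_csv(path)  # returns a tibble, a data frame variant
  print(scores)
}
```

In your own scripts you would normally call library(readr) once at the top and then use read_csv() directly.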

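The main import functions in readr are read_delim() (general, for any delimiter), read_csv() (comma-separated), read_csv2() (semicolon-separated, with “,” as the decimal point) and read_tsv() (tab-separated). Two arguments you will use often: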

  • delim: the character that separates values in the data file.
  • col_names: can be either TRUE (default value), FALSE or a character vector specifying column names. If TRUE, the first row of the input will be used as the column names.
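Both arguments can be seen together in the sketch below, which reads a pipe-separated file that has no header row. It assumes readr is installed; the data and column names are hypothetical.

```r
path <- tempfile(fileext = ".txt")
writeLines(c("1|lamp|12.5", "2|desk|89"), path)  # no header row

has_readr <- requireNamespace("readr", quietly = TRUE)

if (has_readr) {
  products <- readr::read_delim(
    path,
    delim = "|",                                # values are separated by pipes
    col_names = c("id", "product", "price")     # supply names, since the file has none
  )
  print(products)
}
```

Leaving col_names at its default of TRUE here would turn the first data row into column names, so files without a header need either FALSE or a vector of names.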

Similar to base functions, within each of the above functions you also need to specify either the file name (if it is on your local machine, or use file.choose()) or the URL (if the file is located on the web).


Summary

In this article, we looked at different ways to import TXT/CSV files into R depending on the data volume, file location and data separators. The base functions need no additional package, whereas for the readr functions you first need to install the package with install.packages("readr") and then load it with library(readr).

In the next article we shall summarize the functions for importing other file types to R. Cheers!

I come from a Business & Technology background and have rich global experience in solving clients' Business & Data problems through IT & Analytics solutions. I love programming (in R, SQL and Python), painting and interacting with people. Connect with me on LinkedIn: https://www.linkedin.com/in/gupta-sakshi/
