Download data programmatically

Overview

Download a file from a URL and store it on your local machine. That way, it’s easy for others (e.g., team members) to run your workflow, or to refresh the data once it’s been updated. All you need to do is rerun your code, and that’s it!

Code

Here’s an example of how to download data from within R.

download_data <- function(url, filename, filepath) {
  # create the target directory (and any parent directories) if it does not exist yet
  dir.create(filepath, recursive = TRUE, showWarnings = FALSE)
  # download the file; mode = "wb" keeps binary files intact on Windows
  download.file(url = url, destfile = file.path(filepath, filename), mode = "wb")
}

download_data(url = "http://data.insideairbnb.com/the-netherlands/north-holland/amsterdam/2020-12-12/visualisations/reviews.csv", filename = "airbnb_reviews.csv", filepath = "data")
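If the data rarely changes, you may not want to download it again on every run. The following is a minimal sketch of one way to handle that, not part of the original snippet: the helper name download_if_missing and its refresh argument are hypothetical, and the function simply skips the download when the target file already exists.

```r
# Hypothetical helper: download only when the file is missing,
# unless a refresh is explicitly requested.
download_if_missing <- function(url, destfile, refresh = FALSE) {
  # make sure the target directory exists
  dir.create(dirname(destfile), recursive = TRUE, showWarnings = FALSE)
  if (file.exists(destfile) && !refresh) {
    message("Using existing file: ", destfile)
    return(invisible(FALSE))
  }
  # mode = "wb" keeps binary files intact on Windows
  download.file(url = url, destfile = destfile, mode = "wb")
  invisible(TRUE)
}
```

Calling download_if_missing(url, "data/reviews.csv") a second time would then reuse the local copy, while refresh = TRUE forces a new download once the data has been updated.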

Advanced Use Cases

Downloading data from Dropbox or Google Drive

You can also use the code snippet above to download data directly from your personal Dropbox or Google Drive.

Just generate a shareable link to your file in Dropbox or Google Drive.

Then pass that link as the url argument in the code snippet above.
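One detail to watch for with Dropbox: share links typically carry the query parameter dl=0, which opens a preview page rather than the file itself, and switching it to dl=1 requests the file directly. The sketch below shows that substitution. The helper name dropbox_direct_link is hypothetical, and the link is a made-up placeholder that you would replace with your own.

```r
# Hypothetical helper: turn a Dropbox share link into a direct-download link
# by switching the dl query parameter from 0 to 1.
dropbox_direct_link <- function(share_link) {
  sub("([?&])dl=0", "\\1dl=1", share_link)
}

# Placeholder link for illustration only; replace it with your own share link
share_link <- "https://www.dropbox.com/s/abc123/mydata.csv?dl=0"
direct_link <- dropbox_direct_link(share_link)
print(direct_link)
```

The resulting direct_link can be passed as the url argument of the download function.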

Running the download code from the terminal

If you want to download data to work on it in a data pipeline, it’s useful to include the download snippet in a source file (e.g., download.R). You can then save the script, and run it from the terminal (e.g., as part of a make workflow).

In your command line/terminal, you can enter:

R --vanilla < download.R
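If you want to reuse the same script for different files, one option is to pass the URL and the target file on the command line. The following is a minimal sketch of what download.R could look like under that assumption. The helper name parse_args is hypothetical, and the script expects exactly two arguments after R's --args flag.

```r
# download.R (sketch): read the URL and the target file from the command line

# Hypothetical helper: check and name the two expected arguments
parse_args <- function(args) {
  if (length(args) != 2) {
    stop("Usage: R --vanilla --args <url> <destfile> < download.R")
  }
  list(url = args[1], destfile = args[2])
}

# everything after --args on the command line
args <- commandArgs(trailingOnly = TRUE)

if (length(args) > 0) {
  opts <- parse_args(args)
  # create the target directory if needed, then download
  dir.create(dirname(opts$destfile), recursive = TRUE, showWarnings = FALSE)
  download.file(url = opts$url, destfile = opts$destfile, mode = "wb")
} else {
  message("No arguments given; nothing downloaded.")
}
```

With this version, the call would look like R --vanilla --args <url> data/reviews.csv < download.R, where <url> stands for the link to your file.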

Download data to different directories

Keep in mind that a relative filepath is resolved against the working directory from which your R script is called, not against the location of the script itself. Avoid absolute directory names (e.g., c:/research/project) so that the code remains portable to other computers and work environments.
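To see where a relative path ends up on your machine, you can combine it with the current working directory. The folder and file names below are hypothetical and only serve as an illustration.

```r
# Build a relative path from its parts (hypothetical folder and file names)
data_dir  <- file.path("data", "raw")
dest_file <- file.path(data_dir, "reviews.csv")

# Show where the file would end up when the script is run from here
message("Working directory: ", getwd())
message("Target file: ", file.path(getwd(), dest_file))
```

Because file.path joins the parts with a forward slash on every platform, the same code runs unchanged on Windows, macOS, and Linux.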

Open (rather than download) data

The code snippet above just downloads the data from the web, but does not yet open it in R. If the target data is in tabular format (i.e., has rows and columns), you could directly load it into R using the read.table function.

airbnb <- read.table("http://data.insideairbnb.com/the-netherlands/north-holland/amsterdam/2020-12-12/visualisations/reviews.csv", sep = ',', header = TRUE)
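If you have already downloaded the file, reading the local copy avoids contacting the server on every run. The following sketch combines both options. The helper name read_local_or_remote is hypothetical, and it assumes the data is a comma-separated file with a header row, as in the example above.

```r
# Hypothetical helper: read the local copy if it exists,
# otherwise read the data straight from the URL.
read_local_or_remote <- function(url, local_file) {
  source <- if (file.exists(local_file)) local_file else url
  # read.csv is read.table with sep = "," and header = TRUE as defaults
  read.csv(source, stringsAsFactors = FALSE)
}
```

A call such as read_local_or_remote(url, "data/reviews.csv") would then return a data frame from whichever source is available.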