Hannes Datta
make cheatsheet
Let's describe our tool stack…
ggplot2, tidyverse & data preparation/engineering, setup + ITO
Today, we learn how to automate projects.
target in R)
make
How to use automation tools?
make/automation throughout
make is properly installed
If you can't run make and Rscript from your terminal, you won't be able to proceed.
Verify make is properly installed
Type make (enter). If you see "make: *** No targets specified …", then it is correctly installed!
Verify you can run Rscript from the terminal/command prompt
Type Rscript (followed by enter). If you see "Usage: Rscript [options] file [args]", then you are all set!
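The two checks above can also be scripted. The following is a minimal sketch, assuming a POSIX-like shell (e.g., Git Bash on Windows, or the terminal on macOS/Linux); check_tool is a hypothetical helper name introduced here for illustration, not part of make or R.

```shell
#!/bin/sh
# check_tool NAME: report whether a command-line tool can be found on the PATH.
# (check_tool is a hypothetical helper, introduced for this sketch only.)
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
    return 0
  else
    echo "$1: NOT found on PATH"
    return 1
  fi
}

# Check both tools; '|| true' keeps the script going so both results are shown.
check_tool make || true
check_tool Rscript || true
```

If either tool is reported as not found, fix the installation (or the PATH) before continuing.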
Today's project is based on a prototype project using Airbnb data. Please download the source code now (run code below).
download.file('https://github.com/hannesdatta/course-dprep/raw/refs/heads/master/content/docs/modules/week5/tutorial/workflow.zip', 'workflow_in_class.zip')
unzip('workflow_in_class.zip')
Open run.txt and check out the running instructions.
(download.R)
Run Rscript download.R. What happens?
Run R --vanilla < download.R. What difference does it make?
We will automate each step in this research project
Run script_X.R from the command line - does it run & build what is expected?
Write a make “recipe” to automate step 2 [→ next slide]
(make -n)
Create a makefile and put it in the root of our project directory
A makefile consists of one or more “recipes”
Example:
output.csv: input.csv script.R
	Rscript script.R
In our own words…
Check whether input.csv and script.R have been updated more recently than output.csv (or whether output.csv does not exist yet)
If so, run script.R, which builds output.csv
If input.csv or script.R do not exist → find another recipe that instructs how to build these
We now start automating the entire project.
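The decision make takes for the example recipe (output.csv built from input.csv and script.R) can be sketched in plain shell. This is an illustration only, not how make is implemented: needs_rebuild is a hypothetical helper, the sketch assumes a shell whose test command supports -nt ("newer than"; bash and dash do), and it ignores the case where a prerequisite is missing and has to be built by another recipe.

```shell
#!/bin/sh
# needs_rebuild TARGET DEP...
# Hypothetical helper mimicking make's timestamp check:
# returns 0 (true) when TARGET is missing or any DEP is newer than TARGET.
needs_rebuild() {
  target=$1
  shift
  # A missing target always has to be built.
  [ -e "$target" ] || return 0
  for dep in "$@"; do
    # -nt: true if $dep was modified more recently than $target.
    if [ "$dep" -nt "$target" ]; then
      return 0
    fi
  done
  return 1
}

if needs_rebuild output.csv input.csv script.R; then
  echo "would run: Rscript script.R"
else
  echo "output.csv is up to date"
fi
```

Because the check relies only on file modification times, saving a script without changing its content is enough to trigger a rebuild.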
Create a new file called makefile (no extension, i.e., not makefile.txt)
Open makefile and type the code below.
listings.csv reviews.csv: download.R
	R --vanilla < download.R
Important: Use a tab to indent the second line (R --vanilla or Rscript …)! NO spaces.
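Space-indented recipe lines are easy to miss by eye; GNU make typically reports them as "missing separator". A minimal sketch for spotting them before running make, assuming a POSIX-like shell with grep; check_recipe_indent is a hypothetical helper name, not a make feature.

```shell
#!/bin/sh
# check_recipe_indent FILE
# Hypothetical helper: prints every line of FILE that starts with a space
# (with its line number) and returns 1 if any such line exists.
check_recipe_indent() {
  if grep -n '^ ' "$1"; then
    echo "These lines start with spaces; recipe lines need a tab." >&2
    return 1
  fi
  echo "No space-indented lines found in $1"
  return 0
}

# Only check when a makefile is present in the current directory.
if [ -f makefile ]; then
  check_recipe_indent makefile || true
fi
```

Many editors silently convert tabs to spaces, so it can help to switch that setting off for the makefile.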
Type make -n and confirm with enter. What happens?
Type make and confirm with enter. What happens?
Run script_X.R from the command line - does it run & build what is expected?
Write a make “recipe” to automate the next step
(make -n)
Repeat make workflow for the next step of the project.
When working with make, it's handy to keep our project tidy (i.e., get rid of “old” files automatically).
Open makefile again and enter a next recipe (put below what was done earlier).
wipe:
	R -e "unlink('*.csv')"
Important: Use a tab to indent the second line! NO spaces.
Type make -n (enter), and then make (enter). What happens at every step?
Type make wipe -n (enter), and then make wipe (enter). What happens at every step?
Run script_X.R from the command line - does it run & build what is expected?
Write a make “recipe” to automate the next step
(make -n)
Repeat make workflow for the next step of the project.
QUESTION: What's our next move?!
Reflect:
clean.R
clean.R
+ also use file name of current script.R
files from the terminal.
Put the new rule below what has been written earlier. Try to run: make -n, make. What happens?
Put the new rule above everything that has been written earlier. Try to run: make -n, make. What happens?
aggregated_df.csv: clean.R reviews.csv listings.csv
	Rscript clean.R
listings.csv reviews.csv: download.R
	R --vanilla < download.R
wipe:
	R -e "unlink('*.csv')"
Reflect:
clean.R
clean.R
+ also use file name of current script.R
files from the terminal.
Write the rule below all the code you've written in the makefile earlier.
Then, run make: make -n, make.
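The order of rules matters: running make without arguments builds the first target in the makefile.
Builds data1 only:
data1: fileA
	run fileA
data2: fileB data1
	run fileB
Builds data2: requires & builds data1, then builds data2.
data2: fileB data1
	run fileB
data1: fileA
	run fileA
Alternatively, name the final target in a first rule at the top of the makefile.
first_rule: data2
data1: fileA
	run fileA
data2: fileB data1
	run fileB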
Goal: automate plot_all.R and plot_antwerp.R
Reflect:
clean.R
clean.R
+ also use file name of current script.R
files from the terminal.
Please consider implementing a first rule, similar to how I've just shown you (first_rule: ...)
Run make: make -n, make at every iteration.
first_rule: plot_all.pdf plot_antwerp.pdf
plot_all.pdf: aggregated_df.csv plot_all.R
	Rscript plot_all.R
plot_antwerp.pdf: pivot_table.csv plot_antwerp.R
	Rscript plot_antwerp.R
aggregated_df.csv: clean.R reviews.csv listings.csv
	Rscript clean.R
listings.csv reviews.csv: download.R
	R --vanilla < download.R
wipe:
	R -e "unlink('*.csv')"
Let us now change something in the source code of plot_antwerp.R, e.g., change Universiteitsbuurt to Stadspark, to experience a make workflow.
make -n (preview what will have to be done)
make (to run the pipeline)
make again (verify everything up-to-date & makefile has been correctly specified)
The last step is really important and often forgotten in projects.
Do we need to update our cleaning rule to wipe “more” temporary files?
Run make wipe. What happens? What does R -e do?
But… we have not cleanly separated src files from generated output files. See also here for full description on how to name directories.
Do:
Create a src code directory and move source code files there
Create the gen directory in an R file (preferably run early on)
Update the .R files
Check that make still runs.
Let's take a step back…