Project Oriented Workflows

Working in R and RStudio

Athanasia Mo Mowinckel

Adopt a project-oriented workflow


Why

  • work on more than 1 thing at a time

  • collaborate, communicate, distribute

  • start and stop

How

  • dedicated directory

  • RStudio Project

  • Git repo, probably syncing to a remote

Project workflows

  • All necessary files contained in the project and referenced relatively

  • All necessary outputs are created by code in the project

  • All code can be run in fresh sessions and produce the same output

  • Does not force other users to alter their own work setup

If the top of your script is


setwd("C:\Users\jenny\path\that\only\I\have")
rm(list = ls())


Jenny will come into your office and SET YOUR COMPUTER ON FIRE 🔥.


Project-oriented workflow designs this away. 🙌

Which persist after rm(list = ls())?

Option Persists?
A. library(dplyr)
B. summary <- head
C. options(stringsAsFactors = FALSE)
D. Sys.setenv(LANGUAGE = "fr")
E. x <- 1:5
F. attach(iris)
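
One way to check your answers is to run all six lines in a fresh session, call rm(list = ls()), and look at what is left. The sketch below is illustrative only. It attaches tools (a package that ships with R) in place of dplyr, and changes the digits option in place of stringsAsFactors, so that it runs without extra installs and the option change is visible. Both substitutions are assumptions of this example, not part of the quiz.

```r
# Recreate the six quiz lines in a fresh session.
library(tools)               # A: attach a package (stand-in for dplyr)
summary <- head              # B: object in the global environment
options(digits = 4)          # C: global option (stand-in for stringsAsFactors)
Sys.setenv(LANGUAGE = "fr")  # D: environment variable
x <- 1:5                     # E: object in the global environment
attach(iris)                 # F: data frame added to the search path

rm(list = ls())              # removes objects from the global environment only

# Check what survived the "clean-up".
persists <- c(
  A = "package:tools" %in% search(),
  B = exists("summary", envir = globalenv(), inherits = FALSE),
  C = getOption("digits") == 4,
  D = Sys.getenv("LANGUAGE") == "fr",
  E = exists("x", envir = globalenv(), inherits = FALSE),
  F = "iris" %in% search()
)
print(persists)
```

Only B and E are gone. The attached package, the option, the environment variable, and the attached data frame all survive, which is why restarting R is the dependable reset.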

What does it mean to be an RStudio Project?


RStudio leaves notes to itself in foo.Rproj


Open Project = dedicated instance of RStudio

  • dedicated R process

  • file browser pointed at Project directory

  • working directory set to Project directory

Many projects open


Use a “blank slate”


usethis::use_blank_slate()


OR


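Tools -> Global Options -> General: uncheck “Restore .RData into workspace at startup” and set “Save workspace to .RData on exit” to “Never”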

Restart R often


Session -> Restart R

Windows

  • Ctrl + Shift + F10

Mac

  • Cmd + Shift + 0

  • Cmd + Shift + F10

Project initiation: the local case

  1. New folder + make it an RStudio Project
  • usethis::create_project("~/i_am_new")

  • File -> New Project -> New Directory -> New Project

  2. Make existing folder into an RStudio Project
  • usethis::create_project("~/i_exist")

  • File -> New Project -> Existing Directory

Try option 2 now for wtf-explore-libraries.


Safe paths

On reproducibility of code


A large-scale study on research code quality and execution.
Trisovic, A., Lau, M.K., Pasquier, T. et al. 
Sci Data 9, 60 (2022).

Do you know where
your files are?

Practice “safe paths”


relative to a stable base


use file system functions

    not paste(), strsplit(), etc.
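
As a rough illustration of "use file system functions", here is a minimal base R sketch that builds a path and takes it apart without paste() or strsplit(). The file name is hypothetical. The fs package offers counterparts such as fs::path(), fs::path_file(), and fs::path_ext().

```r
# Build a path from components; file.path() inserts the separator for us.
csv_path <- file.path("data", "raw", "raw-data.csv")  # hypothetical file

# Take the path apart with dedicated helpers rather than strsplit().
file_name <- basename(csv_path)                       # last component
parent    <- dirname(csv_path)                        # everything before it
extension <- tools::file_ext(csv_path)                # extension without the dot
stem      <- tools::file_path_sans_ext(file_name)     # file name without extension

print(c(csv_path, file_name, parent, extension, stem))
```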

Packages with file system functions


install.packages("fs")

fs = file path handling


install.packages("here")

here = project-relative paths

Examples of a stable base

Project directory

here::here("data", "raw-data.csv")
here::here("data/raw-data.csv")

Automatically complete paths with Tab.

User’s home directory

file.path("~", ...)
fs::path_home(...)

Official location for installed software

library(thingy)
system.file(..., package = "thingy")

See example in gapminder readme.
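
To make the system.file() pattern concrete without needing a placeholder package, this small sketch asks for a file inside stats, which ships with every R installation. The choice of stats and of its DESCRIPTION file is only for illustration.

```r
# Ask R where the installed stats package keeps its DESCRIPTION file.
desc_path <- system.file("DESCRIPTION", package = "stats")
print(file.exists(desc_path))

# A file that is not there yields an empty string rather than an error,
# unless mustWork = TRUE is set.
not_found <- system.file("no-such-file.txt", package = "stats")
print(nzchar(not_found))
```

The returned path differs from machine to machine, but the code that produces it does not.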

Absolute paths

I have nothing against absolute paths.

Some of my best friends are absolute paths!

But don’t hard-wire them into your scripts.

Instead, form them at runtime relative to a stable base.

> (BAD <- "/Users/shannon/tmp/test.csv")
[1] "/Users/shannon/tmp/test.csv"

> (GOOD <- fs::path_home("tmp/test.csv"))
[1] "/Users/shannon/tmp/test.csv"

Practice safe paths

  • Use the here package to build paths inside a project.

  • Leave the working directory at the project's top level at all times during development.

  • Absolute paths are formed at runtime.

here example

ggsave(here::here("figs", "built-barchart.png"))
  • Works on my machine, works on yours!

  • Works even if working directory is in a sub-folder.

  • Works for RStudio Projects, Git repos, R packages, etc.

  • Works with knitr / rmarkdown.

here::here()

The here package is designed to work inside a project, where that could mean:

  • RStudio Project

  • Git repo

  • R package

  • Folder with a file named .here

here::here() does not create directories; that’s your job.
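
A minimal sketch of doing that job before writing a file. To keep it runnable anywhere, a folder under tempdir() stands in for the project root. Inside a real project the path would come from here::here("figs", "summary.csv"). The folder and file names are hypothetical.

```r
# Stand-in for the project root (assumption: a real project would use
# here::here("figs", "summary.csv") instead).
project_root <- file.path(tempdir(), "demo-project")
out_path <- file.path(project_root, "figs", "summary.csv")

# Create the target folder first. recursive = TRUE also creates parents;
# showWarnings = FALSE keeps reruns quiet if the folder already exists.
dir.create(dirname(out_path), recursive = TRUE, showWarnings = FALSE)

write.csv(head(iris), out_path, row.names = FALSE)
print(file.exists(out_path))
```

fs::dir_create() plays the same role if you are already using fs.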

Kinds of paths

Absolute path.

dat <- read.csv("C:/Users/pileggis/Documents/wtf-fix-paths/data/installed-packages.csv")


Relative path to working directory, established by the RStudio Project.

dat <- read.csv("data/installed-packages.csv")


Relative path within the RStudio Project directory.

dat <- read.csv(here::here("data/installed-packages.csv"))
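
The difference between the last two shows up as soon as the working directory moves. The sketch below simulates that with a throwaway project under tempdir(). The setwd() calls are there only to imitate a script or report running from a sub-folder, and the project layout is hypothetical.

```r
# Throwaway project with a data/ folder and an analysis/ sub-folder.
proj <- file.path(tempdir(), "wtf-demo")
dir.create(file.path(proj, "data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(proj, "analysis"), showWarnings = FALSE)
write.csv(head(mtcars), file.path(proj, "data", "installed-packages.csv"))

# From the project root the plain relative path resolves...
old_wd <- setwd(proj)
found_from_root <- file.exists("data/installed-packages.csv")

# ...but from a sub-folder the same string points nowhere.
setwd("analysis")
found_from_subdir <- file.exists("data/installed-packages.csv")

# A path anchored at the project root still works from the sub-folder,
# which is the kind of anchoring here::here() provides.
found_when_anchored <- file.exists(file.path(proj, "data", "installed-packages.csv"))

setwd(old_wd)  # restore the original working directory
print(c(found_from_root, found_from_subdir, found_when_anchored))
```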

Your turn


Practice calling here::here() in a project
to get a feel for it.


library(usethis)
# saves project on desktop by default for most users
use_course("rstats-wtf/wtf-fix-paths")
# use_course("rstats-wtf/wtf-fix-paths", destdir = "my/new/location")
# can alternatively download from 
# https://github.com/rstats-wtf/wtf-fix-paths


Read the README.md to get started.


What if my data can’t live in my project directory?

  1. Are you sure it can’t?

  2. Review the Good Enough Practices paper for tips.

  3. Create a symbolic link to access the data. (fs::link_create(), fs::link_path())

  4. Put the data in an R package.

  5. Use pins.

  6. Explore other data warehousing options.
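
For the symbolic link option, here is a rough sketch using base R's file.symlink() and Sys.readlink() as stand-ins for fs::link_create() and fs::link_path(). All locations are hypothetical temporary folders. It assumes a Unix-alike system; on Windows, creating symbolic links may need extra privileges.

```r
# Hypothetical locations: data that lives outside the project, and a project.
external_data <- tempfile("external-data-")
project_root  <- tempfile("project-")
dir.create(external_data)
dir.create(project_root)
write.csv(head(mtcars), file.path(external_data, "big-file.csv"))

# Link <project>/data to the external folder.
link <- file.path(project_root, "data")
linked <- file.symlink(from = external_data, to = link)

# Inspect where the link points, then read through it as if the data
# lived inside the project.
print(Sys.readlink(link))
print(file.exists(file.path(link, "big-file.csv")))
```

Project code can then keep using project-relative paths while the data stays where it has to live.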

RStudio Community threads: