1  Environments as Code

1.1 Environments:

  • stack of software and hardware below our code

  • should be treated as “cattle not pet” / should be stateless

  • Risk of it “only works on my machine”

Building a completly reproducible environement is a “fool’s errand” but first step should be easy.

(any trouble with renv and sf anyone?)

1.2 Environements have layers

Layer Contents Example
Packages R packages cori.db
System R versions / GDAL / MacOS 14.4.1
hardware Physical / Virtual hardware Apple M3

Hardware and System should be in the hand of IT (see later chapter 7 and 14), packages layer should be the data scientist.

1.3 The package layer

Package can in 3 places:

  • repository: CRAN / GH / “Supermarket”

  • library: a folder on a drive / “pantry”

  • loaded : “ready to cook”

Each project should have it’s own “pantry”

Project was higlighted in text but I think it is important: if you do not have a project workflow it is way harder to do it.

A package environement shouldbe :

  • isolated and cannot be disrupted (example updating a packge in an other project)

  • can be “captured” and “transported”

In R: {Renv} (“light”/“not exactly the same” option also exist, Box, capsule)

Author does not like Conda (good to not being alone!)

1.4 Workflow

  • Create a standalone directory with a virtual environment

(spend time exploring renv/ and .gitignore)

  • Document environment state (see lockfile)

  • Collaborate / deploy: you can’t share package because their binay can be OS or system specific, hence specific package need to be installed (could be a pain point).

  • Use virtual env

1.5 Under the hood

  • test .libpaths() in a specific project and in a “random” R session

  • order of Paths matter

1.6 Key points

  • being in production is what make a DS a software developper

  • kill and create new environment fast is important