1 Environments as Code
1.1 Environments:
stack of software and hardware below our code
should be treated as “cattle, not pets” / should be stateless
Risk: “it only works on my machine”
Building a completely reproducible environment is a “fool’s errand”, but the first step should be easy.
(any trouble with renv and sf anyone?)
1.2 Environments have layers
| Layer | Contents | Example |
|---|---|---|
| Packages | R packages | cori.db |
| System | R versions / GDAL / macOS | 14.4.1 |
| Hardware | Physical / Virtual hardware | Apple M3 |
Hardware and System should be in the hands of IT (see later chapters 7 and 14); the packages layer should be in the hands of the data scientist.
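To make the three layers concrete, here is a minimal sketch that asks the current R session about each one. It only uses base R; the variable names are made up for illustration, and it does not cover system libraries such as GDAL.

```r
# Packages layer: how many packages are visible to this session.
n_packages <- nrow(installed.packages())

# System layer: R version and operating system.
r_version <- R.version.string
os_name <- Sys.info()[["sysname"]]

# Hardware layer: CPU architecture reported by the operating system.
machine <- Sys.info()[["machine"]]

cat("Packages:", n_packages, "\n")
cat("System:  ", r_version, "on", os_name, "\n")
cat("Hardware:", machine, "\n")
```

Running this on two machines is a quick way to see which layer differs when something “only works on my machine”.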
1.3 The package layer
A package can be in 3 places:
repository: CRAN / GH / “Supermarket”
library: a folder on a drive / “pantry”
loaded : “ready to cook”
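The three places can be looked at from an R session. This is a small sketch in base R; it uses the stats package only because it ships with R, so nothing needs to be installed.

```r
# Repository ("supermarket"): where install.packages() downloads from.
print(getOption("repos"))

# Library ("pantry"): folders on disk that hold installed packages.
print(.libPaths())
in_pantry <- "stats" %in% rownames(installed.packages())

# Loaded ("ready to cook"): packages attached to the current session.
library(stats)
ready_to_cook <- "package:stats" %in% search()

print(c(in_pantry = in_pantry, ready_to_cook = ready_to_cook))
```

A package has to be in the pantry before it can be loaded, and it only gets into the pantry by being installed from a repository.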
Each project should have its own “pantry”
“Project” was highlighted in the text, and I think it is important: if you do not have a project-based workflow, this is much harder to do.
A package environment should be:
isolated and cannot be disrupted (for example, by updating a package in another project)
can be “captured” and “transported”
In R: {renv}
(“light” / “not exactly the same” options also exist: {box}, {capsule})
Author does not like Conda (good to not be alone!)
1.4 Workflow
- Create a standalone directory with a virtual environment (spend time exploring renv/ and .gitignore)
- Document the environment state (see lockfile)
- Collaborate / deploy: you can’t share the packages themselves because their binaries can be OS or system specific, hence the specific packages need to be installed on each machine (could be a pain point)
- Use the virtual env
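With {renv}, the workflow steps map roughly onto three calls. The sketch below wraps them in helper functions whose names are hypothetical (they are not part of renv); nothing runs until a helper is called, and renv has to be installed first.

```r
# Hypothetical helpers, one per workflow step. Defining them has no
# side effects; calling them requires the renv package.

# Step 1: create the project library (renv/) and the lockfile.
start_project <- function(path = ".") {
  renv::init(project = path)
}

# Step 2: record the package versions currently in use in renv.lock.
document_state <- function(path = ".") {
  renv::snapshot(project = path)
}

# Step 3: on another machine, reinstall what the lockfile describes.
rebuild_from_lockfile <- function(path = ".") {
  renv::restore(project = path)
}
```

The lockfile is what gets committed and shared; the installed packages under renv/ are rebuilt on each machine, which is why the binaries never need to travel.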
1.5 Under the hood
Test .libPaths() in a specific project and in a “random” R session
The order of the paths matters
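A quick way to see why the order matters, as a base R sketch: R looks through the library paths from first to last and uses the first copy of a package it finds. The stats package is used here only because it is always installed.

```r
# Library paths in search order; inside an renv project the
# project library is expected to come first.
paths <- .libPaths()
print(paths)

# find.package() reports the location that is actually used.
first_hit <- find.package("stats")
print(first_hit)

# The folder containing that package is the library it came from.
print(dirname(first_hit))
```

Running the same lines inside a project and in a fresh session outside any project shows how the first entry of the path list changes.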
1.6 Key points
being in production is what makes a DS a software developer
being able to kill and recreate an environment quickly is important