This is the CORI/RISI MDA team style guide.
This is an evolving set of best practices for R programming on the MDA team. A template GitHub repo is available for new projects and provides suggestions for base structure and minimal documentation.
Naming Conventions
- Variable and table names, both in code and on the database, should always be in snake_case, i.e. all lower case with underscore separation of words.
- Github repositories dedicated to a new project should begin with the prefix
proj-
. Similarly, repositories dedicated to ETL of a specific data set should begin with the prefixdata-
. All repositories should have clear, descriptive names. - Functions are verbs. Whenever possible use a descriptive verb or action phrase to name your function (e.g.
add_column()
, rather thancolumn_adder()
). - Scripts that need to be run in a specific order should be prefixed with two digits and and an underscore, e.g.
01_
,02_
…10_
. Scripts numbered with a single digit will be displayed out of order in most file explorers when there are 10 or more scripts.
Code
- Always use
library()
to load packages.require()
is used in a lot of MDA legacy code, and should be replaced withlibrary()
when it is encountered. - Neither
library()
norrequire()
should ever appear inside of functions.- State package dependencies in your scripts explicitly at the top level using
library()
.
- State package dependencies in your scripts explicitly at the top level using
- When creating functions that call tidyverse functions, use the embracing operator, e.g.
{ var }
, instead of!!as.name(var)
or similar.- Variables inside the embracing operator should always be padded with a space on either side.
- There is no such thing as over-commented code. To the extent possible, a script should be self-documenting. Be kind to your reviewers (and six-months-from-now you) and write the comments even when you’re in a hurry.
- Avoid
while
loops and nested loops of any kind.- While both have applications, they are rare and there are almost always cleaner solutions that are easier to understand and reason about in the long run.
- Always use a project-oriented workflow.
- A great deal of legacy MDA code prefixes all variables with
x.
. This practice should be avoided moving forward. - Separate tasks, such as downloading a source file and then processing that data into a usable table, should be separated into numbered scripts.
- Whitespace makes code more readable. Always pad assignment operators and equals signs (including within function calls!) with a single space on both sides. Separate commands with one or more new lines.
- The assignment operator (
<-
) is preferred to=
for object creation, consistent with the tidyverse style guide. - When in doubt about good formatting, defer to the tidyverse style guide
Tools
- If using RStudio, ensure that under
Tools >> Global Options
the option ‘Restore .RData into workspace at startup’ is not checked and the option ‘Save workspace to .RData on exit’ is set toNever
. Using .RData to save presets and variables trades a minor convenience for major code portability headaches. Always start from a fresh session.
The Zen of Functional Programming (The Zen of Python +)
- Beautiful is better than ugly.
- Explicit is better than implicit.
- Simple is better than complex.
- Complex is better than complicated.
- Flat is better than nested.
- Sparse is better than dense.
- Readability counts.
- Special cases aren’t special enough to break the rules.
- Although practicality beats purity.
- Errors should never pass silently.
- Unless explicitly silenced.
- In the face of ambiguity, refuse the temptation to guess.
- There should be one– and preferably only one –obvious way to do it.
- If you find the one obvious way to do it, document it here for the sake of your coworkers and yourself in six months.
- Now is better than never.
- Although never is often better than right now.
- If the implementation is hard to explain, it’s a bad idea.
- If the implementation is easy to explain, it may be a good idea.
- Copy and paste is the root of 42% of all evil.
- If you would copy and paste code a third time, instead write a function.
- If you would copy and paste a function for the third time in a script, instead use a functional.
- Although a loop is better than nothing.
- If you would copy a function into a new script even once, document it and add it to the coriverse.