Code Analysis in R

You’ve been analysing data all day, now let’s analyse your code …

Within its programming toolkit, R has some really cool tools for analysing code and for identifying and fixing issues in it.

What kinds of code-level stuff (eg, software-design/architectural properties, code smells) might you want to be aware of when developing packages or writing analysis scripts? And what tools are available to do this?

  • Dependencies

    • between packages (pkgnet)

    • between functions within a package or script (pkgnet / CodeDepends / codetools)

  • Complexity

    • Labyrinthine functions (cyclocomp)

    • Elongated functions

  • Style

    • Code formatting (lintr / styler / formatR)

    • Refactorings: alternative (eg, safer / more idiomatic) language constructs with similar behaviour (goodpractice does a bit of this; but R doesn’t have much refactoring support)

  • Test quality

    • Are all your functions tested? (covr)

    • Are your tests good enough? (See mutate / hedgehog)

  • Performance

    • microbenchmark / profvis etc

  • Documentation / Installability / Loadability

    • R CMD check / goodpractice (for packages at least, running R CMD check and goodpractice should be essential)

… and being aware of these things can help you restructure your code, so that it’s more easily used, maintained and extended.
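To give a flavour of the dependency and function-length checks in the list above, here's a minimal sketch that uses only base R and codetools (which ships with R as a recommended package). The three functions `clean`, `summarise_data` and `report` are made up for illustration, and counting deparsed lines is just one rough way of measuring function length; the dedicated packages above do a much more thorough job.

```r
library(codetools)

# Hypothetical toy functions, defined only for this illustration
clean <- function(x) x[!is.na(x)]
summarise_data <- function(x) mean(clean(x))
report <- function(x) {
  avg <- summarise_data(x)
  paste("Mean:", round(avg, 2))
}

toy_funs <- list(
  clean = clean,
  summarise_data = summarise_data,
  report = report
)

# Dependencies: which of the toy functions does each function call?
# findGlobals() inspects the function body without running it.
calls_between <- lapply(toy_funs, function(f) {
  intersect(findGlobals(f, merge = FALSE)$functions, names(toy_funs))
})

# Length: the number of lines in each function once deparsed
fun_lengths <- vapply(toy_funs, function(f) length(deparse(f)), integer(1))

print(calls_between)
print(fun_lengths)
```

Here `report` should turn out to depend on `summarise_data`, which in turn depends on `clean`, and `report` should be the longest of the three.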

Code analysis splits neatly into two camps, dynamic and static analysis, loosely depending on whether your code is run during the analysis. For example, you could determine whether all the variable names in a script conform to some style guide without having to run that script (static); but you’d need to run a script to work out which function calls have the largest appetite for processor time or memory (dynamic).
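The static / dynamic split can be sketched in a few lines of base R. This is only an illustration: the two-line script, the snake_case rule and all of the variable names are assumptions of mine, and a proper linter or profiler does far more than this.

```r
# A hypothetical two-line script, held as text so that nothing is run yet
script <- "myTotal <- sum(1:10)\nrunning_mean <- myTotal / 10"

# Static: parse (but don't evaluate) the script, then inspect its tokens
parsed <- parse(text = script, keep.source = TRUE)
tokens <- utils::getParseData(parsed)
symbols <- unique(tokens$text[tokens$token == "SYMBOL"])

# Assumed style rule: variable names should be snake_case
not_snake_case <- symbols[!grepl("^[a-z][a-z0-9_]*$", symbols)]

# Dynamic: actually run the script (in its own environment) and time it
env <- new.env()
timing <- system.time(eval(parsed, envir = env))

print(not_snake_case)
print(timing[["elapsed"]])
```

The first half never evaluates the script, so it would still work on code that is slow, broken or has side effects; the second half can only tell you about the code paths that actually ran.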

In the following posts, I’m going to explain how to use these tools (and a couple of tools that are less well developed) on your analysis scripts or packages.

In the face of these automatic code-analysis tools, it’s important to remember that there’s no real substitute for getting a human to code-review your stuff, because there isn’t any automated way to address the most important thing: code readability.

Resources

“Nice R Code” has a blog post on function lengths in R packages and also a post about why nice code is something to strive for.

The Software Sustainability Institute also has a webpage about source code readability.

Maëlle Salmon from rOpenSci has a nice blog post that covers some tools that help during package development.

For more on “code smells” see Martin Fowler’s “Refactoring”, Robert C. Martin’s “Clean Code” or Jenny Bryan’s useR!2018 talk.

Some code smells that are particularly relevant here pave the many rings of the R inferno.

There’s a list of the different aspects of software architecture here.
