R-function Pokemon

R-Function Pokemon and the Informal Formats of Formals

When writing R, I tend to use snake_case for object names. The Bioconductor project tends to use camelCase (limma::makeContrasts, biomaRt::useMart), and a lot of base functions use dotted.case.

Some functions in R mix several of these formats across their own name and their argument names. For example,

  • scan has both dotted and camelCase parameters (na.strings, allowEscapes),

  • sapply has ALLUPPERCASE, DOTTED.UPPER.CASE and alllowercase parameters (FUN, USE.NAMES, simplify).

Makes you wonder a few things:

  • which function has the greatest diversity of parameter-name formats;

  • which function has the most parameters;

  • whither consistency?

The dev version of lintr has regexes for identifying some of the more common formats (it uses rex to build regexes). I’ve modified these a bit so that they’re stricter:

library(rex)

# swiped from `lintr` and modified a bit to reduce overlaps between the styles:
loweralnum <- rex(one_of(lower, digit))
upperalnum <- rex(one_of(upper, digit))

style_regexes <- list(
  "UpperCamelCase" = rex(start, upper, zero_or_more(alnum), lower, upper, zero_or_more(alnum), end),
  "lowerCamelCase" = rex(start, lower, zero_or_more(alnum), lower, upper, zero_or_more(alnum), end),
  "snake_case" = rex(start, one_or_more(loweralnum), one_or_more("_", one_or_more(loweralnum)), end),
  "dotted.case" = rex(start, one_or_more(loweralnum), one_or_more(dot, one_or_more(loweralnum)), end),
  "alllowercase" = rex(start, one_or_more(loweralnum), end),
  "ALLUPPERCASE" = rex(start, one_or_more(upperalnum), end)
)
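As a rough check of how these patterns divide names up, here is a minimal sketch that classifies single parameter names. It deliberately avoids rex: `base_regexes` holds hand-written base-R approximations of the patterns above (an assumption, not the strings rex actually generates), and `styles_of` is a hypothetical helper introduced only for illustration.

```r
# Hand-written base-R approximations of the rex patterns above
# (assumption: these are not the exact strings that rex generates).
base_regexes <- c(
  UpperCamelCase = "^[[:upper:]][[:alnum:]]*[[:lower:]][[:upper:]][[:alnum:]]*$",
  lowerCamelCase = "^[[:lower:]][[:alnum:]]*[[:lower:]][[:upper:]][[:alnum:]]*$",
  snake_case     = "^[[:lower:][:digit:]]+(_[[:lower:][:digit:]]+)+$",
  dotted.case    = "^[[:lower:][:digit:]]+([.][[:lower:][:digit:]]+)+$",
  alllowercase   = "^[[:lower:][:digit:]]+$",
  ALLUPPERCASE   = "^[[:upper:][:digit:]]+$"
)

# Hypothetical helper: the names of every style that a single name matches
styles_of <- function(name) {
  hits <- vapply(base_regexes, function(re) grepl(re, name), logical(1))
  names(hits)[hits]
}

styles_of("na.strings")    # "dotted.case"
styles_of("allowEscapes")  # "lowerCamelCase"
styles_of("USE.NAMES")     # character(0)
```

Under these approximations, USE.NAMES matches none of the six styles: there is no pattern for DOTTED.UPPER.CASE, so names in that format go uncounted.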

Can any function in base-R catch all these stylistic Pokemon?

How are we going to count the number of styles present for a single function?

# Which of the coding-name-styles are matched by the parameters
# of a given function?
matches <- function(fun) {
  params <- names(formals(fun))
  # vapply guarantees one logical value per style
  vapply(
    style_regexes,
    function(regex) any(rex::re_matches(params, regex)),
    logical(1)
  )
}
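One caveat with `formals()`: it returns NULL for primitive functions, so a function like `sum` contributes no parameter names to `matches`. The sketch below shows one possible workaround via `args()`. `param_names` is a hypothetical helper, not part of the original analysis.

```r
# formals() has nothing to report for primitives
is.primitive(sum)  # TRUE
formals(sum)       # NULL

# Hypothetical helper: parameter names for closures and primitives alike
param_names <- function(fun) {
  fun <- match.fun(fun)                  # accepts a function or its name
  if (is.primitive(fun)) {
    fun <- args(fun)                     # closure with the same argument list
  }
  if (is.null(fun)) {
    return(NULL)                         # args() is NULL for some primitives
  }
  names(formals(fun))
}

param_names("sum")
param_names(sapply)
```

Whether primitives should count at all is a judgment call. Including them changes the totals, so this is an optional refinement and not a correction.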

How are we going to get the names of all the functions in my attached packages?

# https://stackoverflow.com/questions/20535247/seeking-functions-in-a-package

# lsf.str("package:<pkg_name>") can be used to get all function names from the
# package `pkg_name` (this must have been loaded into R first)

# all the attached packages
packages <- c(
  sessionInfo()$basePkgs,
  names(sessionInfo()$otherPkgs)
)

# all the functions within those packages:
function_names <- unlist(
  Map(
    function(pkg) {
      lsf.str(paste0("package:", pkg))
    },
    packages
  )
)

names(function_names) <- NULL

That gives us 2294 functions to consider.
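The same list of names is enough to approach one of the earlier questions: which function has the most parameters? The sketch below is restricted to the stats package so that it stays self-contained (an assumption made for illustration; the post considers every attached package). `n_params` and `pkg_env` are names made up for the example.

```r
library(stats)
library(utils)

# Function names exported by one attached package
fns <- as.character(lsf.str("package:stats"))
pkg_env <- as.environment("package:stats")

# Number of formal arguments per function (0 if formals() returns NULL)
n_params <- vapply(
  fns,
  function(f) length(formals(get(f, envir = pkg_env))),
  integer(1)
)

# The functions with the longest argument lists
head(sort(n_params, decreasing = TRUE), 3)
```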

Which formats (columns) are present for each function (row)?

formats_by_function <- t(sapply(function_names, matches))

Which function has caught the most formats?

fun <- rownames(formats_by_function)[which.max(rowSums(formats_by_function))]

t(formats_by_function[fun, , drop = FALSE])
##                fisher.test
## UpperCamelCase       FALSE
## lowerCamelCase        TRUE
## snake_case           FALSE
## dotted.case           TRUE
## alllowercase          TRUE
## ALLUPPERCASE          TRUE
args(fun)
## function (x, y = NULL, workspace = 2e+05, hybrid = FALSE, hybridPars = c(expect = 5,
##     percent = 80, Emin = 1), control = list(), or = 1, alternative = "two.sided",
##     conf.int = TRUE, conf.level = 0.95, simulate.p.value = FALSE,
##     B = 2000)
## NULL
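Note that `which.max` reports only the first row that reaches the maximum, so any other function tied on the number of formats would go unseen. A minimal sketch with a made-up logical matrix (`toy` and its row names are purely illustrative):

```r
# Toy stand-in for formats_by_function: rows are functions, columns are styles
toy <- rbind(
  f1 = c(lowerCamelCase = TRUE,  dotted.case = TRUE,  alllowercase = FALSE),
  f2 = c(lowerCamelCase = TRUE,  dotted.case = FALSE, alllowercase = FALSE),
  f3 = c(lowerCamelCase = FALSE, dotted.case = TRUE,  alllowercase = TRUE)
)

n_styles <- rowSums(toy)

names(which.max(n_styles))                  # "f1": first maximum only
names(n_styles)[n_styles == max(n_styles)]  # "f1" "f3": every tied row
```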

Well done Fisher, but you didn’t catch them all!
