R-Function Pokemon and the Informal Formats of Formals
When writing R, I tend to use snake_case for object names. The Bioconductor
project tends to use camelCase (limma::makeContrasts, biomaRt::useMart),
and a lot of base functions use dotted.case.
Some R functions mix several formats in their argument names. For example,
scan has both dotted.case and camelCase parameters (na.strings, allowEscapes), and sapply has ALLUPPERCASE, DOTTED.UPPER.CASE and alllowercase parameters (FUN, USE.NAMES, simplify).
Makes you wonder a few things:
which function has the greatest diversity of parameter-name formats;
which function has the most parameters;
whither consistency?
The dev version of lintr has regexes for identifying some of the more common
formats (it uses rex to build them). I've modified these a bit so that
they're stricter:
library(rex)
# swiped from `lintr` and modified a bit to reduce overlaps between the styles:
loweralnum <- rex(one_of(lower, digit))
upperalnum <- rex(one_of(upper, digit))
style_regexes <- list(
"UpperCamelCase" = rex(start, upper, zero_or_more(alnum), lower, upper, zero_or_more(alnum), end),
"lowerCamelCase" = rex(start, lower, zero_or_more(alnum), lower, upper, zero_or_more(alnum), end),
"snake_case" = rex(start, one_or_more(loweralnum), one_or_more("_", one_or_more(loweralnum)), end),
"dotted.case" = rex(start, one_or_more(loweralnum), one_or_more(dot, one_or_more(loweralnum)), end),
"alllowercase" = rex(start, one_or_more(loweralnum), end),
"ALLUPPERCASE" = rex(start, one_or_more(upperalnum), end)
)
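If rex isn't your thing, here is a rough base-R sketch of what those six patterns amount to as plain POSIX regexes. This is my own hand translation, not lintr's code, and `style_patterns` and `classify_names` are hypothetical names introduced only for illustration. It classifies the scan and sapply parameters mentioned earlier.

```r
# Hand-translated, approximate base-R equivalents of the rex patterns
# (an assumption: these mirror the rex definitions, they are not lintr's code)
style_patterns <- c(
  UpperCamelCase = "^[[:upper:]][[:alnum:]]*[[:lower:]][[:upper:]][[:alnum:]]*$",
  lowerCamelCase = "^[[:lower:]][[:alnum:]]*[[:lower:]][[:upper:]][[:alnum:]]*$",
  snake_case     = "^[[:lower:][:digit:]]+(_[[:lower:][:digit:]]+)+$",
  dotted.case    = "^[[:lower:][:digit:]]+([.][[:lower:][:digit:]]+)+$",
  alllowercase   = "^[[:lower:][:digit:]]+$",
  ALLUPPERCASE   = "^[[:upper:][:digit:]]+$"
)

# For each name, report which style(s) it matches ("none" if no pattern fits)
classify_names <- function(x) {
  vapply(x, function(name) {
    hits <- names(style_patterns)[vapply(style_patterns, grepl, logical(1), x = name)]
    if (length(hits) == 0) "none" else paste(hits, collapse = ", ")
  }, character(1))
}

classify_names(c("na.strings", "allowEscapes", "FUN", "USE.NAMES", "simplify"))
```

Note that USE.NAMES comes back as "none": none of the six patterns covers DOTTED.UPPER.CASE, so that style can never be caught by the counting below.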
Can any function in base R catch all these stylistic Pokemon?
How are we going to count the number of styles present for a single function?
# Which of the coding-name-styles are matched by the parameters
# of a given function?
matches <- function(fun) {
params <- names(formals(fun))
sapply(style_regexes, function(regex){
any(rex::re_matches(params, regex))
})
}
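One wrinkle: formals() returns NULL for primitive functions, so matches() sees no parameters at all for something like sum(). A small sketch of a workaround, assuming you want primitives included, is to go through args() first, which builds a closure carrying the primitive's documented signature. The helper name `param_names` is hypothetical.

```r
# formals() gives NULL for primitives; args() returns a closure with the
# documented signature, whose formals can then be read.
# param_names() is a hypothetical helper, not part of the original code.
param_names <- function(fun) {
  names(formals(args(fun)))
}

formals(sum)            # NULL: sum is a primitive
param_names(sum)        # "..." "na.rm"
param_names("sapply")   # "X" "FUN" "..." "simplify" "USE.NAMES"
```

Swapping `names(formals(fun))` for `param_names(fun)` inside matches() would let primitives take part; I haven't checked whether that changes the result below.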
How are we going to get the names of all the functions in my attached packages?
# https://stackoverflow.com/questions/20535247/seeking-functions-in-a-package
# lsf.str("package:<pkg_name>") can be used to get all function names from the
# package `pkg_name` (this must have been loaded into R first)
# all the attached packages
packages <- c(
sessionInfo()$basePkgs,
names(sessionInfo()$otherPkgs)
)
# all the functions within those packages:
function_names <- unlist(
Map(
function(pkg) {
lsf.str(paste0("package:", pkg))
},
packages
)
)
names(function_names) <- NULL
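sessionInfo() is one route to the attached packages. Another, sketched here as an alternative rather than what I actually ran, is to read them straight off the search path, where each attached package shows up as package:<name>. The names `attached` and `funs_by_pkg` are my own. Unlike the version above, this one drops duplicate names exported by more than one package, so the total may differ slightly.

```r
# Attached packages appear on the search path as "package:<name>"
attached <- grep("^package:", search(), value = TRUE)

# For each attached package environment, keep only the names bound to functions
funs_by_pkg <- lapply(attached, function(env_name) {
  env <- as.environment(env_name)
  nms <- ls(env)
  nms[vapply(nms, exists, logical(1),
             envir = env, mode = "function", inherits = FALSE)]
})
names(funs_by_pkg) <- sub("^package:", "", attached)

# One flat, de-duplicated character vector of function names
function_names <- unique(unlist(funs_by_pkg, use.names = FALSE))
length(function_names)
```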
That gives us 2294 functions to consider (in my R session; the count depends on which packages are attached).
Which formats (columns) are present for each function (row)?
formats_by_function <- t(sapply(function_names, matches))
Which function has caught the most formats?
fun <- rownames(formats_by_function)[which.max(rowSums(formats_by_function))]
t(formats_by_function[fun, , drop = FALSE])
## fisher.test
## UpperCamelCase FALSE
## lowerCamelCase TRUE
## snake_case FALSE
## dotted.case TRUE
## alllowercase TRUE
## ALLUPPERCASE TRUE
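A caveat on that answer: which.max() returns only the first maximum, so if another function also scored four styles, the code above would silently report whichever came first in function_names. A sketch on a made-up matrix shows how to list every tied function; `toy` and `fun_a`/`fun_b`/`fun_c` are hypothetical stand-ins, not real results.

```r
# Toy stand-in for formats_by_function (hypothetical values, not real results)
toy <- rbind(
  fun_a = c(snake_case = TRUE,  dotted.case = TRUE, alllowercase = TRUE),
  fun_b = c(snake_case = FALSE, dotted.case = TRUE, alllowercase = TRUE),
  fun_c = c(snake_case = TRUE,  dotted.case = TRUE, alllowercase = TRUE)
)
n_styles <- rowSums(toy)

rownames(toy)[which.max(n_styles)]        # "fun_a": only the first maximum
rownames(toy)[n_styles == max(n_styles)]  # "fun_a" "fun_c": every tied function
```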
args(fun)
function (x, y = NULL, workspace = 2e+05, hybrid = FALSE, hybridPars = c(expect = 5,
percent = 80, Emin = 1), control = list(), or = 1, alternative = "two.sided",
conf.int = TRUE, conf.level = 0.95, simulate.p.value = FALSE,
B = 2000)
NULL
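Well done Fisher: hybridPars, conf.level, workspace and B cover four of the six styles. But with no UpperCamelCase or snake_case parameter, you didn't catch them all!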
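The second question from the top, which function has the most parameters, can be tackled with the same ingredients. Here is a minimal sketch for a single attached package, assuming "..." counts as one parameter and using args() so primitives report their signatures; `count_params` is a hypothetical helper, and looping it over the search path would cover the other packages.

```r
# Count the formal arguments of every function bound in one attached package.
# count_params() is a hypothetical helper; "..." counts as a single parameter.
count_params <- function(env_name) {
  env <- as.environment(env_name)
  nms <- ls(env)
  # keep only the names bound to functions
  is_fun <- vapply(nms, exists, logical(1),
                   envir = env, mode = "function", inherits = FALSE)
  nms <- nms[is_fun]
  n <- vapply(nms, function(nm) {
    sig <- args(get(nm, envir = env))
    # args() returns NULL for primitives without a recorded signature
    if (is.null(sig)) 0L else length(formals(sig))
  }, integer(1))
  sort(n, decreasing = TRUE)
}

head(count_params("package:base"))
```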