Function naming for R packages

I am writing an R package, and I would really like to avoid using function names found in other packages. For example, I planned to name a function annotate, but that name is already used in the NLP package. Some collisions can be avoided simply by steering clear of obvious variations on a name, but is there a systematic way to search an exhaustive list of published CRAN function names to avoid duplication? I recognize that this matters mainly for common CRAN packages, but it can also be relevant when sharing a package only locally, in case it conflicts with another installed package.

1 answer

Name collisions occur when two loaded packages contain functions with the same name. Thus, name collisions can be avoided in two places:

  • when defining function names in a package
  • when calling functions from a package

Creating Functions with Unique Names

At the time of writing (August 23, 2017), a remarkable 11272 packages were available on CRAN (the latest figure can be found here), and new packages are added every day.

Thus, function names that are unique today can still run into name conflicts in the future as other packages are added.
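For a check against what exists today, a minimal sketch like the following can report which packages already export a candidate name. The helper find_exporters is hypothetical (it is not part of any package), and it only inspects the packages you pass to it, so it does not cover all of CRAN.

```r
# Hypothetical helper (not from any package): list which of the given
# packages export a function or object called `name`.
find_exporters <- function(name, pkgs) {
  exported <- vapply(pkgs, function(pkg) {
    # getNamespaceExports() loads the namespace if needed;
    # packages that fail to load are treated as exporting nothing.
    exports <- tryCatch(getNamespaceExports(pkg),
                        error = function(e) character(0))
    name %in% exports
  }, logical(1))
  pkgs[exported]
}

# Checked against three packages that ship with R.
find_exporters("filter", c("base", "stats", "utils"))
```

Passing rownames(utils::installed.packages()) as pkgs would scan everything installed locally, but that loads every namespace and can be slow.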

Alistaire already mentioned the option of prefixing all of your functions. Besides stringi and stringr, the forcats package is another example; it uses the prefixes fct_ and lvls_.

This approach can significantly reduce the chance of name collisions.

(Although this does not guarantee that no other package maintainer will choose the same prefix.)
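As an illustration of the prefix convention, here is a small sketch. The prefix ann_ and both function names are made up for this example; they do not come from any existing package.

```r
# Hypothetical package prefix "ann_": every exported function starts with it,
# so a generic name such as annotate() is never claimed by this package.
ann_annotate <- function(x, label) {
  # Attach a label to an object as an attribute.
  structure(x, label = label)
}

ann_label <- function(x) {
  # Read the label back.
  attr(x, "label")
}

ids <- ann_annotate(1:3, "ids")
ann_label(ids)
```

A side benefit of a shared prefix is that auto-completion lists all functions of the package once the prefix is typed.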

Calling Functions Unambiguously Using the Double Colon Operator

IMHO, the ultimate responsibility for preventing name conflicts lies with the user.

I have seen questions here on SO that load more than half a dozen packages. Or library(tidyverse) is called for convenience, which loads 19 other packages, where dplyr and tidyr would have been enough.

Cluttering the namespace with many loaded packages increases the risk of name conflicts. And even when loading only two packages, name collisions can occur. For example, the lubridate and data.table packages both define

 hour, isoweek, mday, minute, month, quarter, second, wday, week, yday, year 

Which function is called depends on the order in which the packages are loaded. (You can use conflicts() to find objects that exist with the same name in two or more places on the search path.)
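The masking behaviour can be reproduced with base R alone. This sketch deliberately attaches a dummy nchar to the search path (the entry name demo_masking and the dummy are assumptions made for the demonstration) and then asks conflicts() about it.

```r
# Put a dummy nchar() on the search path, in front of package:base.
attach(list(nchar = function(x) "masked"),
       name = "demo_masking", warn.conflicts = FALSE)

# conflicts() lists names that occur more than once on the search path.
has_conflict <- "nchar" %in% conflicts()

# An unqualified call finds the dummy first ...
via_search_path <- nchar("abc")
# ... while the double colon operator always reaches the base version.
via_namespace <- base::nchar("abc")

# Clean up the search path again.
detach("demo_masking")
```

After detach(), an unqualified nchar() call resolves to the base function again.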

To avoid ambiguities and unexpected results, I suggest loading as few packages as possible and using the double colon operator (see ?"::") to call functions from packages without attaching the package first, for example,

 library(data.table)
 DT <- data.table(t = lubridate::now() + 0:3)
 # call function from loaded package data.table
 DT[, second(t)]
 [1] 18 19 20 21
 # call function from lubridate package
 DT[, lubridate::second(t)]
 [1] 18.88337 19.88337 20.88337 21.88337
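For readers without data.table and lubridate installed, the same idea can be sketched with a package that ships with R. The tools package is used here only because it is normally not attached in a fresh session; the file name is made up.

```r
# Call a function from the tools package without library(tools).
# The namespace is loaded on demand, but the package is not attached,
# so none of its other functions are added to the search path.
ext <- tools::file_ext("report.csv")
ext
```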

There is another advantage to using the double colon operator: it documents in the code which package a function is called from.

This comes at the cost of a few extra keystrokes, but it can save a lot of time when the code is reviewed, modified, or debugged weeks or years later. I have seen many questions on SO where the OP did not mention which package was used.


Source: https://habr.com/ru/post/1271110/

