A Quickstart Guide for Building Your First R Package

Post Provided By DR IAIN STOTT

Iain is a Postdoctoral Researcher at the Max Planck Institute for Demographic Research and the MaxO Center at the University of Southern Denmark. He is currently working as a part of MaxNetAging, a Research Network on Aging. Iain was one of the presenters at the UK half of the Methods in Ecology and Evolution 5th Anniversary Symposium in April. You can watch his talk, ‘Methods Put to Good Use: Advances in population ecology through studies of transient demography’ here.

If you’re anything like me, you might experience a minor existential crisis weekly. As scientists we question the world around us and, for me, this questioning turns all-too-often inwards to my career. I don’t think that’s unusual: ask any scientist about their ‘Plan B’, and the extent to which it’s thought through is often astonishing (if a café-cum-cocktail bar ever opens in Glasgow’s West End, which specialises in drinks that employ spice blends from around the world and are named after old spice trade routes and trading vessels, then you know I’ve jumped the science ship).

Contributing open-source software is something which has made my work feel a bit more relevant and helped me feel a bit less of an imposter. I’ll explain why that is, give some tips to beginners for building a first R package, and hopefully persuade other (especially early-career) researchers to do the same.

My Experience

In 2012, I published an applications paper in Methods in Ecology and Evolution, introducing an R package called popdemo. The package works with matrix population models to forecast POPulation-level DEMOgraphy and is specifically designed to measure transient dynamics of non-stable populations. If you’re interested as to why that’s important, please take a look at my Methods 5th Anniversary Symposium talk.

Popdemo consists of a bunch of R scripts I used in my PhD research. I decided it would be cool for the code to be freely available and wanted to learn how to build an R package. However, I felt it was probably of limited use: many of my peers could write it themselves and the methods do very specific things (although that’s probably the nature of PhD research generally!). I was pleased to find popdemo was useful to some people though: it has been used to inform conservation of endemic island orchids, assess impacts of land use change on grassland bioindicator species and uncover relationships between mammal life histories and their demographic responses to disturbance. Some people have been nice enough to recommend it for comparative population biology. At least one person found it useful for their own PhD research.

My software won’t change the world much, but it makes me happy if it makes someone’s life easier, helps conserve biodiversity, or contributes indirectly to scientific knowledge. So, if you’re in a similar situation…

Building Your First R Package: A Quickstart Guide

This is a very concise tutorial covering the basic steps required to build an R package using devtools (I didn’t actually use devtools to develop popdemo, but it seems a better option now). I’ve included a few links to other places for extra information. To start with: here’s an excellent comprehensive guide by Hadley Wickham to using devtools and here is an exhaustive guide to developing R packages.

The steps we will take are:

  1. Install tools
  2. Create a package skeleton
  3. Add/edit .R files
  4. Add/edit roxygen2 blocks in .R files
  5. Build .Rd files and NAMESPACE file
  6. Edit DESCRIPTION file
  7. Check the package
  8. Repeat 3-7 as necessary
  9. Build package
  10. Distribute (if you want to)

Before beginning, you should be familiar with coding in R and have some R functions you want to include in your package. Also, it’s always best to use the most recent version of R.

  1. Install tools

First, it’s best to install some extra development tools (these are not R packages)

  • Windows: RTools (choose to allow the environment variables to be changed)
  • Mac: Xcode
  • Linux (Ubuntu/Debian only): r-base-dev package

Second, install a LaTeX compiler (also not an R package)

  • Windows: MiKTeX (choose ‘Yes’ to install missing packages on-the-fly and run the updater after installation)
  • Mac: MacTeX
  • Linux (Ubuntu/Debian only): texlive-full package

Third, install the devtools package (this is an R package), using install.packages("devtools") from within R.

All of the above development tools can be accessed and installed for free.
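Once everything is installed, a quick sanity check from within R can confirm that the tools are visible. This is only a rough sketch using base R: it assumes the build tools provide a ‘make’ executable and the LaTeX distribution provides ‘pdflatex’, and that both are on the system path.

```r
# Which version of R is running?
R.version.string

# Look up the (assumed) executables on the system path.
# Sys.which() returns an empty string for anything it cannot find.
tools_found <- Sys.which(c("make", "pdflatex"))

# TRUE means the tool was found, FALSE means it was not
nzchar(tools_found)
```

If either entry comes back FALSE, revisit the installation (on Windows, check that the environment variables were changed).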

  2. Create a package skeleton

While a package is under development, it is called the ‘source’ package. Every source package has the same basic ‘skeleton’ structure, as illustrated below, which should include at least the following:

  • DESCRIPTION file
  • NAMESPACE file
  • ‘R’ folder (Actually you don’t HAVE to have the R folder if your package only contains data, but that’s fairly unusual)
  • ‘man’ folder

To create the skeleton structure for your package, use:

devtools::create("path/to/package/pkgname")

If you’re using RStudio, there is a GUI option.


Source package structure. Solid objects are mandatory, dotted objects are common optional extras.
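To make the structure concrete, here is a minimal sketch that builds the mandatory parts by hand in a temporary folder using only base R. The package name ‘mypkg’ and the field values are hypothetical, and this is an illustration of the layout rather than a reproduction of what devtools::create() writes.

```r
# Hypothetical package called 'mypkg', created in a temporary folder
pkg_dir <- file.path(tempdir(), "mypkg")

# The 'R' folder (functions) and the 'man' folder (documentation)
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(pkg_dir, "man"), showWarnings = FALSE)

# A bare-bones DESCRIPTION file (placeholder values, to be edited later)
writeLines(
  c("Package: mypkg",
    "Title: What the Package Does in One Line",
    "Version: 0.1-0"),
  file.path(pkg_dir, "DESCRIPTION")
)

# An empty NAMESPACE file (to be filled in later)
file.create(file.path(pkg_dir, "NAMESPACE"))

# List the contents of the skeleton
list.files(pkg_dir)
```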

  3. Add/edit .R files

The ‘R’ folder is where your functions go. They must be .R files (as used by default in R and RStudio for scripts), and you may have multiple functions in one file if you wish. Copy and paste the script files into the R folder (or if you use RStudio’s GUI to create the package, you are given the option to include these files upon creation). Try to foresee what might go wrong when people use the functions, and write in errors using stop() and warnings using warning().
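As a sketch of what that might look like, here is a small hypothetical function (not taken from popdemo) that checks its input before doing any work. Problems that make the calculation impossible raise an error with stop(); input that is suspicious but still usable raises a warning with warning().

```r
# Hypothetical example: asymptotic growth rate of a projection matrix,
# calculated as the largest eigenvalue modulus.
growth_rate <- function(A) {
  # Errors: the calculation cannot proceed
  if (!is.matrix(A) || !is.numeric(A)) {
    stop("A must be a numeric matrix")
  }
  if (nrow(A) != ncol(A)) {
    stop("A must be a square matrix")
  }
  if (anyNA(A)) {
    stop("A must not contain missing values")
  }
  # Warning: the calculation can proceed, but the result may be meaningless
  if (any(A < 0)) {
    warning("A contains negative entries; is it really a projection matrix?")
  }
  max(Mod(eigen(A, only.values = TRUE)$values))
}

# A 2 x 2 example matrix
A <- matrix(c(0, 0.5, 2, 0.5), nrow = 2)
growth_rate(A)
```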

  4. Add/edit roxygen2 blocks

The ‘man’ folder is where the documentation (‘MANuals’) for your functions goes. Documentation files must be .Rd (R documentation) files. Rd files are a bit LaTeX-like (which is why you need a LaTeX compiler). Rd files are NOT created with the skeleton structure. devtools recommends adding roxygen2 blocks in front of each function in your .R files (in the ‘R’ folder), which can be used to generate Rd files. I’ve provided examples of roxygen, Rd and html for a simple popdemo function below, to illustrate the syntaxes. For an R package to function, Rd files aren’t actually needed. But to be submitted to CRAN, the Rd files must meet certain requirements.

Your documentation must include:

  • Name: the name of the document (added automatically by roxygen2)
  • Alias: The function documented (added automatically by roxygen2, but can be added manually using @aliases)
  • Title: Documentation title (first line of roxygen2 block)
  • Usage: a list of arguments and their defaults (added automatically by roxygen2)
  • Arguments: description of each argument (@param)
  • Description: Short description of what the function does (added automatically by roxygen2, but can be added manually using @description)

It should include:

  • Value: what the function returns (@return)
  • Details: detailed description of function usage (@details)
  • Examples: self-contained examples of the code in action (@examples)

It may include:

  • Author: Who wrote the functions (@author)
  • One document with multiple functions (multiple aliases: use @aliases and separate each function with a space)
  • Anything else that’s relevant

ALL your functions must be documented somewhere with an alias, and ALL the arguments of each function must be described under @param. Make your documentation as useful, thorough and accurate as possible.
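To show how the pieces above fit together, here is a sketch of a roxygen2 block sitting in front of a small function. The function, its arguments and the author name are hypothetical (this is not a popdemo function). The first line becomes the title, the next paragraph becomes the description, and the tags fill in the remaining sections.

```r
#' Project a population vector through time
#'
#' Multiplies a population vector by a projection matrix a given number
#' of times.
#'
#' @param A A square numeric projection matrix.
#' @param n0 A numeric vector of starting abundances, one per column of A.
#' @param steps The number of time steps to project (default 1).
#' @return A numeric vector of abundances after \code{steps} time steps.
#' @details The projection is done by repeated matrix multiplication, so
#'   \code{steps = 0} returns \code{n0} unchanged.
#' @examples
#' A <- matrix(c(0, 0.5, 2, 0.5), nrow = 2)
#' project_population(A, n0 = c(10, 5), steps = 3)
#' @author Jane Doe
#' @export
project_population <- function(A, n0, steps = 1) {
  n <- n0
  for (i in seq_len(steps)) {
    n <- as.vector(A %*% n)
  }
  n
}
```

Because roxygen2 lines start with #', R treats them as ordinary comments, so the file still works as a normal script.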

Rd syntax, corresponding roxygen2 syntax (included in the function’s .R file), and the resulting html helpfile


The NAMESPACE file allows R packages to communicate with one another. You can export your functions to make them available to others and import functions from other packages if you use them in your code. roxygen2 can be used to export and import functions.

  • export() exports your functions (@export in roxygen2).
  • import() imports all functions from other packages (@import).
  • importFrom() imports specific functions from a single package (@importFrom)

Imagine you want to export the minCS function from popdemo, and that you need to import functions from expm for it to work. To its roxygen2 block, you would add:

#' @export

#' @import expm
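If you only need one function from another package, @importFrom is the narrower option. The sketch below uses a hypothetical function (not minCS itself) that needs only the matrix power operator %^% from expm. The directives shown in the closing comments are what I would expect roxygen2 to write to the NAMESPACE file, so treat them as an assumption and check the generated file yourself.

```r
#' Raise a projection matrix to a power
#'
#' @param A A square numeric matrix.
#' @param n A non-negative whole number.
#' @return The matrix A multiplied by itself n times.
#' @importFrom expm %^%
#' @export
matrix_power <- function(A, n) {
  # %^% comes from expm, so expm must be installed for this call to work
  A %^% n
}

# Expected NAMESPACE directives after the documentation is built
# (an assumption, not copied from a real file):
#   export(matrix_power)
#   importFrom(expm,"%^%")
```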

The real NAMESPACE file for popdemo exports all functions and imports the matrix power function %^% from expm (note that I have edited this by hand, so it looks different to the roxygen2-generated NAMESPACE).

popdemo’s NAMESPACE file (version 0.1-4)


  5. Build .Rd files and NAMESPACE file

If your roxygen blocks are correct then the .Rd files and the NAMESPACE file are created from them using

devtools::document("path/to/package/pkgname")

  6. Edit DESCRIPTION file

The DESCRIPTION file contains important information about the package. View/edit it using a text editor (or directly in RStudio).

It must include:

  • Package: package name
  • Title: short package description
  • Version: package version (numbers, separated using . or -)
  • Author: authors and emails
  • Maintainer: Package maintainer (one name, and one valid email address in angle brackets)
    • Pro Tip: Make sure this email address is valid… embarrassingly, popdemo was pulled from CRAN for a while when my PhD student email account expired and it wasn’t a straightforward issue to sort out!
  • License: distribution license (see here)
  • Description: One paragraph saying what the package does

It should include:

  • Imports: other packages containing functions used in your package (if you use any).
    • Alternative ways of importing functions exist, including Depends and Suggests, but it is often better and/or quicker to avoid using these if you can.

It may include:

  • Authors and maintainer under a single heading Authors@R using the person function.
  • Date: when was the package last changed
  • Anything else of relevance

popdemo’s DESCRIPTION file (version 0.1-4)
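For comparison, here is a sketch of a minimal DESCRIPTION file for a hypothetical package, written to a temporary file and read back with base R’s read.dcf() as a quick check that the fields parse. Every value (package name, author, email, license, imports) is a placeholder, and the sketch uses the Authors@R option in place of separate Author and Maintainer fields.

```r
# Placeholder DESCRIPTION contents for a hypothetical package 'mypkg'
desc_lines <- c(
  'Package: mypkg',
  'Title: What the Package Does in One Line',
  'Version: 0.1-0',
  'Authors@R: person("Jane", "Doe", email = "jane.doe@example.org", role = c("aut", "cre"))',
  'Description: One paragraph saying what the package does.',
  'License: GPL-3',
  'Imports: expm'
)

# Write the file, then read it back as a one-row character matrix
desc_file <- file.path(tempdir(), "DESCRIPTION")
writeLines(desc_lines, desc_file)
desc <- read.dcf(desc_file)

# Field names that were found
colnames(desc)

# The version string should be accepted by R's version parser
package_version(desc[1, "Version"])
```

In Authors@R, the role "cre" marks the maintainer and "aut" marks an author.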

  7. Check the package

You need to check if all the files are correct before building the package. Run

devtools::check("path/to/package/pkgname")

(or use the RStudio GUI option in the ‘Build’ tab). If something is wrong, this function will find it for you. ERRORS mean the package can’t be built. WARNINGS mean it can be built but won’t pass CRAN checks. Ideally for CRAN you would have no NOTES either.

  8. Repeat 3-7 as necessary

This step is pretty self-explanatory. Essentially, this is where you fix any of the ERRORS, WARNINGS or NOTES that came up in the previous step.

  9. Build the package

Packages are distributed as .tar.gz files. When you’re happy with the package and checks run smoothly, create this using

devtools::build("path/to/package/pkgname")

(or again, use the RStudio ‘Build’ tab). You should be able to install this package immediately on your machine using

install.packages("path/to/package/packagename_version.tar.gz", repos = NULL, type = "source")
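Steps 5, 7 and 9 can be strung together in a small helper, sketched below. The function name is hypothetical and the sketch assumes devtools is installed. It relies on devtools::build() returning the path of the .tar.gz file it creates.

```r
# Hypothetical helper: document, check, build and install a source package
build_and_install <- function(pkg_path) {
  if (!requireNamespace("devtools", quietly = TRUE)) {
    stop("The devtools package is needed for this function")
  }
  devtools::document(pkg_path)          # step 5: .Rd files and NAMESPACE
  devtools::check(pkg_path)             # step 7: look for ERRORS, WARNINGS, NOTES
  tarball <- devtools::build(pkg_path)  # step 9: create the .tar.gz file
  # Install from the local file rather than from a repository
  install.packages(tarball, repos = NULL, type = "source")
  invisible(tarball)
}

# Usage (not run here):
# build_and_install("path/to/package/pkgname")
```

Running the steps one at a time is still the better option while you are fixing problems, since the check output needs reading before you build.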

  10. Distribute (if you want to)

You can submit to CRAN manually here, or use devtools::release(). Think carefully before submitting to CRAN: read the policies and make sure your package check is 100% clean. Some other options are to release your software on GitHub or share it with a few people by sending them the .tar.gz file.

Extra tips…

  • Try to keep your code neat! It will be easier to find errors and the whole process will run a lot more smoothly.
  • Keep packages as self-contained as you can. Fewer dependencies/imports make for lighter use.
  • Keep account of changes you make: consider including a ChangeLog.txt file in your package for this.

This quickstart guide has covered the basics but R packages can include much more and there’s lots not covered.

  • Consider including data, which gives people something to work with when they are getting to know your package, and makes working with open data in R easier.
  • Add some demos or vignettes to help walk people through using the package. This can help maximise the uptake of your package and minimise the queries that you get from users.
  • Create your own classes and methods to further your package’s functionality.

Extra things often require extra documentation and indexing, but now you’re prepared to venture into this particular unknown. Hopefully this beginner’s guide will help you to make someone’s life easier, help conserve biodiversity, contribute indirectly to scientific knowledge and, like me, feel a bit less like an imposter (even if it does mean that Glasgow is deprived of yet another café-cum-cocktail bar).
