3 R features relevant for debugging

3.1 Interpreter

R is an interpreted language: code is parsed and executed by the R interpreter at runtime. As a result, syntax errors surface only when the code is run. An IDE such as RStudio makes coding easier because it flags such errors beforehand.

When R exits, it can also save the current environment (also known as the workspace) into an .RData file. Make sure that an existing workspace does not interfere with your code.

For non-interactive usage one usually uses Rscript or R CMD BATCH. Rscript is usually the better choice. However, when the interpreter is launched using Rscript, the list of default packages is minimal:

Rscript --help

## Usage: /path/to/Rscript [--options] [-e expr [-e expr2 ...] | file] [args]
## 
## --options accepted are
##   --help              Print usage and exit
##   --version           Print version and exit
##   --verbose           Print information on progress
##   --default-packages=list
##                       Where 'list' is a comma-separated set
##                         of package names, or 'NULL'
## or options to R, in addition to --slave --no-restore, such as
##   --save              Do save workspace at the end of the session
##   --no-environ        Don't read the site and user environment files
##   --no-site-file      Don't read the site-wide Rprofile
##   --no-init-file      Don't read the user R profile
##   --restore           Do restore previously saved objects at startup
##   --vanilla           Combine --no-save, --no-restore, --no-site-file
##                         --no-init-file and --no-environ
## 
## 'file' may contain spaces but not shell metacharacters
## Expressions (one or more '-e <expr>') may be used *instead* of 'file'
## See also  ?Rscript  from within R
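One way to avoid depending on the default package list is to attach the needed packages explicitly at the top of a script. The following is a minimal sketch, not a prescribed layout; the choice of packages is only an illustration:

```r
# Attach the packages the script relies on instead of assuming defaults.
library(methods)
library(stats)
library(utils)

# Arguments given after the script name on the Rscript command line.
args <- commandArgs(trailingOnly = TRUE)

# Report what is actually attached in this session.
attached <- search()
print(attached)

# The packages R would attach by default in this session.
print(getOption("defaultPackages"))
```

Printing search() at the start of a batch job also leaves a record in the log of which packages were attached when the script ran.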

3.2 Functions

3.2.1 R is a functional language

R is at its core a functional language. This means that coding is often done in terms of functions that operate on objects and return new objects. These function calls are then often chained together and mapped over vectors and lists with the apply family of functions.
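As a small illustration of this style, the sketch below chains two members of the apply family. The data and the helper to_fahrenheit() are made up for the example:

```r
# Hypothetical temperature readings in Celsius, grouped by day.
celsius <- list(
  monday  = c(18.5, 21.0),
  tuesday = c(15.0, 17.5, 19.0)
)

# A pure function: it returns a new vector and leaves its input untouched.
to_fahrenheit <- function(x) {
  x * 9 / 5 + 32
}

# lapply() converts each element of the list; vapply() then reduces each
# converted vector to a single number and checks that it is a length-1 double.
daily_mean_f <- vapply(lapply(celsius, to_fahrenheit), mean, numeric(1))
print(daily_mean_f)

# The original list has not been modified.
print(celsius$monday)
```

vapply() is used for the final step because it fails loudly if a result does not have the declared shape, which tends to make bugs easier to locate than with sapply().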

For more info, see this chapter of Advanced R.³

3.2.2 Lazy evaluation

R uses lazy evaluation for function arguments. This means that an argument is evaluated only when its value is actually used inside the function.

In the following example the default argument y = g(x) is not evaluated for positive values of x, and thus the undefined variable z is not encountered until f(x) is called with a negative value:

g <- function(x) {
  return(x + z)
}

f <- function(x,y = g(x)) {
  if (x>0) {
    return(x)
  } else {
    return(y)
  }
}
f(1)
## [1] 1
f(-1)
## Error in g(x): object 'z' not found

Lazy evaluation can make it easy to miss bugs in alternative execution paths.
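One possible way to surface such bugs earlier is to evaluate the argument eagerly with force(). The sketch below uses hypothetical names (f_strict, undefined_offset) and is only one option among several:

```r
# g() refers to a name that is deliberately left undefined.
g <- function(x) {
  x + undefined_offset
}

# force() evaluates the promise for y immediately, before any branching.
f_strict <- function(x, y = g(x)) {
  force(y)
  if (x > 0) {
    x
  } else {
    y
  }
}

# The error now appears even on the path that never uses y.
result <- tryCatch(
  f_strict(1),
  error = function(e) conditionMessage(e)
)
print(result)
```

The trade-off is that force() gives up the benefit of lazy evaluation: y is computed even when it is not needed.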

See this r-bloggers’ post for more information on lazy evaluation.

3.2.3 R functions are evaluated in an environment

Each R function has an environment that maps names to values (e.g. function names, variables etc.).

The global environment is the top-level environment that contains everything done by the user. Function calls etc. get their own environment, which inherits from its parent environment.

When running R code interactively the global environment can often become filled with various variables and definitions. Thus it is a good idea to clean up the environment every now and then and verify that your code works from a clean slate.

Otherwise one can easily run into situations such as this, where a function works only because a variable happens to exist in the global environment:

  f(-2)
## Error in g(x): object 'z' not found
  z <- 2
  f(-2)
## [1] 0
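A common way to avoid this kind of hidden dependency is to pass the value in as an argument. The functions g_explicit() and f_explicit() below are hypothetical variants of the earlier g() and f(), written only to illustrate the idea:

```r
# z is now an explicit argument instead of a global variable.
g_explicit <- function(x, z) {
  x + z
}

f_explicit <- function(x, z) {
  if (x > 0) {
    x
  } else {
    g_explicit(x, z)
  }
}

# The result no longer depends on the state of the global environment.
value <- f_explicit(-2, z = 2)
print(value)

# Forgetting z is an error at the call, whatever the global environment holds.
missing_z <- tryCatch(
  f_explicit(-2),
  error = function(e) "error: z was not supplied"
)
print(missing_z)
```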

Functions can also write to their parent environment using the <<- operator:

h <- function(x) {
  x <<- x
}
x <- 1
x
## [1] 1
h(2)
x
## [1] 2

This can obviously cause problems and should be avoided in most cases.
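A safer pattern, sketched here with a hypothetical function set_value(), is to keep the assignment local and return the new value, so that the caller decides what to overwrite:

```r
# The assignment inside the function creates a local binding only.
set_value <- function(value) {
  x <- value
  x
}

x <- 1
x_new <- set_value(2)

# The global x is untouched; the new value is available under its own name.
print(x)
print(x_new)
```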

For more info, see this chapter of Advanced R.

3.3 Objects

3.3.1 All objects in R have a base type

There are 25 base types in R, such as integer, double, list and closure (function). One can check the type with the typeof() function.

a <- 1:10
b <- list(a=1:10)
c <- data.frame(b)
typeof(a)
## [1] "integer"
typeof(b)
## [1] "list"
typeof(c)
## [1] "list"
typeof(f)
## [1] "closure"

R is strongly but dynamically typed. This means that every value has a definite type, but a value can be converted (coerced) to another type automatically when an operation requires it. As an example, consider the following function that does a simple division:

d <- function(x,y) {
  print(is.integer(x))
  print(is.integer(y))
  print(is.integer(x/y))
  return(x/y)
}
d(10,3)
## [1] FALSE
## [1] FALSE
## [1] FALSE
## [1] 3.333333
d(10L,3L)
## [1] TRUE
## [1] TRUE
## [1] FALSE
## [1] 3.333333
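The same automatic conversion happens in other operations as well. The short sketch below inspects the resulting types of a few expressions; the variable names are arbitrary:

```r
# Combining an integer with a double gives a double vector.
mixed_numbers <- typeof(c(1L, 2.5))

# Combining a number with a string gives a character vector.
mixed_text <- typeof(c(1, "a"))

# Logical values are treated as 0/1 integers in arithmetic.
logical_sum <- typeof(TRUE + 1L)

# Integer division keeps the integer type; ordinary division does not.
int_division <- typeof(10L %/% 3L)
real_division <- typeof(10L / 3L)

print(c(mixed_numbers, mixed_text, logical_sum, int_division, real_division))
```

Checking typeof() on intermediate results like this is a quick way to find the step where a value silently changed type.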

For more info, see this chapter of Advanced R.

3.3.2 Vectors are everywhere and they are immutable

R is a heavily vectorized language. Most mathematical operations in R are sped up by doing them to the vector as a whole.
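To make the idea concrete, the following sketch computes the same squares with an explicit loop and with a single vectorized expression. The variable names are illustrative:

```r
values <- c(1, 2, 3, 4)

# Element-by-element version with an explicit loop.
squares_loop <- numeric(length(values))
for (i in seq_along(values)) {
  squares_loop[i] <- values[i]^2
}

# Vectorized version: the operator is applied to the whole vector at once.
squares_vec <- values^2

# Both approaches give the same result.
print(identical(squares_loop, squares_vec))
```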

You can think of a vector as a collection of things with a given length. There are two types of vectors in R: atomic vectors and lists.

Atomic vectors are your typical logical, numeric (integer or double) and character vectors.

x_vector <- 1:6
is.atomic(x_vector)
## [1] TRUE

Each vector has a type and a length:

typeof(x_vector)
## [1] "integer"
length(x_vector)
## [1] 6

Lists are vectors that can contain other vectors. They are not atomic. Elements in a list can have names.

x_list <- list(x_vector=x_vector)
x_list
## $x_vector
## [1] 1 2 3 4 5 6
names(x_list)
## [1] "x_vector"
is.vector(x_list)
## [1] TRUE
is.atomic(x_list)
## [1] FALSE

R uses copy-on-modify semantics: two names can point to the same data until one of them is modified, at which point R copies the data to a new location:

x_vector2 <- x_vector
tracemem(x_vector) == tracemem(x_vector2)
## [1] TRUE
x_vector2 <- x_vector2 + 1
tracemem(x_vector) == tracemem(x_vector2)
## [1] FALSE

However, when you modify a list, only the list itself is copied; the vectors it points to are not. This makes lists well suited for storing data:

x_list2 <- x_list
tracemem(x_list) == tracemem(x_list2)
## [1] TRUE
x_list2$x_vector2 <- x_vector2
## tracemem[0x562c3131a340 -> 0x562c31a59400]: eval eval eval_with_user_handlers withVisible withCallingHandlers handle timing_fn evaluate_call <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group.block process_group withCallingHandlers process_file <Anonymous> <Anonymous> render_cur_session <Anonymous>
tracemem(x_list) == tracemem(x_list2)
## [1] FALSE
tracemem(x_list$x_vector) == tracemem(x_list2$x_vector)
## [1] TRUE

Most R objects are built on top of vectors or lists by giving them attributes. When they get attributes, they cease to be pure vectors.

print(x_vector)
## [1] 1 2 3 4 5 6
print(attributes(x_vector))
## NULL
print(is.vector(x_vector))
## [1] TRUE
x_array <- array(x_vector, dim=c(3,2))
print(x_array)
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
print(attributes(x_array))
## $dim
## [1] 3 2
print(is.vector(x_array))
## [1] FALSE
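Factors are another familiar example of an object built from a vector plus attributes. The sketch below, with a made-up variable sizes, shows that a factor is an integer vector carrying levels and class attributes:

```r
sizes <- factor(c("small", "large", "small"))

# The underlying storage is an integer vector.
print(typeof(sizes))

# The levels and the class are stored as attributes.
print(attributes(sizes))

# Because of these attributes, a factor is not a pure vector.
print(is.vector(sizes))

# Removing the class reveals the integer codes and the levels attribute.
print(unclass(sizes))
```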

data.frame is an extremely important object in R built on top of a list. It has additional constraints:

All vectors in a data frame need to have the same length.
It has rownames() and colnames(). names() of the data frame are the column names.
A data frame has nrow() rows and ncol() columns. The length() of a data frame gives the number of columns.
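These constraints can be checked directly. The following sketch uses a small made-up data frame called scores:

```r
scores <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))

# A data frame is a list of columns underneath.
print(typeof(scores))

# length() and ncol() both count columns; nrow() counts rows.
print(c(length = length(scores), ncol = ncol(scores), nrow = nrow(scores)))

# names() and colnames() return the same column names.
print(identical(names(scores), colnames(scores)))

# Columns whose lengths cannot be reconciled are rejected.
mismatch <- tryCatch(
  data.frame(a = 1:3, b = 1:2),
  error = function(e) "error: columns differ in length"
)
print(mismatch)
```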

tibble is a modern take on data.frame that is highly recommended for its usability. For more information on tibbles, see the package's web page.⁴

data.table is another popular alternative to data.frame. It has its own usability enhancements and it scales better to big data than a normal data frame. For more information on data.table, see the package's web page.⁵

For more info on vectors, lists and data frames, see this chapter of Advanced R.

3.3.3 Base types are extended using object-oriented (OO) programming

Examples in this section use the sloop package to find more information on objects. You can install it with:

install.packages("sloop")

In R nomenclature, everything is an object, but not everything is an OO-style object. OO objects have a class attribute. One can also use the is.object() function to check whether an object is an OO object and sloop::s3_class() to get the class in a reliable way:

a <- 1:10
b <- list(a=1:10)
c <- data.frame(b)
attr(a, "class")
## NULL
attr(b, "class")
## NULL
attr(c, "class")
## [1] "data.frame"
is.object(a)
## [1] FALSE
is.object(b)
## [1] FALSE
is.object(c)
## [1] TRUE
sloop::s3_class(a)
## [1] "integer" "numeric"
sloop::s3_class(b)
## [1] "list"
sloop::s3_class(c)
## [1] "data.frame"

One can strip an object back to its base type with the unclass() function:

unclass(c)
## $a
##  [1]  1  2  3  4  5  6  7  8  9 10
## 
## attr(,"row.names")
##  [1]  1  2  3  4  5  6  7  8  9 10

There are multiple OO paradigms in R:

- S3: The oldest and simplest system. Most OO objects in R are S3 objects.
- S4: A more advanced version of S3. Heavily used by the Bioconductor project.
- R6: More like the OO systems of other languages. An improved version of R’s built-in reference classes. See the R6 manual for more information.⁶
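To give a feel for how lightweight S3 is, here is a minimal sketch of a hand-made S3 class. The class name temperature and its functions are invented for this illustration:

```r
# Constructor: an S3 object is a base object with a class attribute.
new_temperature <- function(celsius) {
  stopifnot(is.numeric(celsius))
  structure(list(celsius = celsius), class = "temperature")
}

# Methods are ordinary functions named <generic>.<class>.
format.temperature <- function(x, ...) {
  paste0(x$celsius, " C")
}

print.temperature <- function(x, ...) {
  cat("<temperature>", format(x), "\n")
  invisible(x)
}

reading <- new_temperature(21.5)

# print() dispatches to print.temperature() because of the class attribute.
print(reading)

# The object is an OO object, and unclass() reveals the underlying list.
print(is.object(reading))
print(unclass(reading))
```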

3.3.4 Method dispatch

Many R functions are S3 generic functions that choose between the available methods based on the class of their argument. One can check whether a function is a generic function or a method with sloop::ftype().

For example, paste() is an internal function that works directly with base types, but print() is an S3 generic:

sloop::ftype(paste)
## [1] "internal"
sloop::ftype(print)
## [1] "S3"      "generic"

To see which function is used for a given object, one can use sloop::s3_dispatch():

sloop::s3_dispatch(print(c))
## => print.data.frame
##  * print.default

Now one can check the difference between these methods:

sloop::ftype(print.data.frame)
## [1] "S3"     "method"
sloop::ftype(print.default)
## [1] "internal"
print.data.frame(c)
##     a
## 1   1
## 2   2
## 3   3
## 4   4
## 5   5
## 6   6
## 7   7
## 8   8
## 9   9
## 10 10
print.default(c)
## $a
##  [1]  1  2  3  4  5  6  7  8  9 10
## 
## attr(,"class")
## [1] "data.frame"
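The same mechanism is available for your own functions. The sketch below defines a hypothetical generic describe() with a default method and a data frame method, to show how the method is chosen by class:

```r
# The generic only hands the call over to the dispatch mechanism.
describe <- function(x, ...) {
  UseMethod("describe")
}

# Fallback used when no class-specific method exists.
describe.default <- function(x, ...) {
  paste("an object of type", typeof(x))
}

# Method chosen for objects whose class includes "data.frame".
describe.data.frame <- function(x, ...) {
  paste("a data frame with", nrow(x), "rows")
}

from_vector <- describe(1:3)
from_frame <- describe(data.frame(a = 1:2))
print(from_vector)
print(from_frame)

# Without the class attribute the default method is used again.
from_unclassed <- describe(unclass(data.frame(a = 1:2)))
print(from_unclassed)
```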

Knowing about method dispatch is especially important when dealing with numerical data. In R, "numeric" can mean either that something is a double or that something behaves like a number (an integer or a double). Numbers typed without the L suffix are stored as doubles.

x_i <- 1L
x_n <- 1
is.integer(x_i)
## [1] TRUE
is.numeric(x_i)
## [1] TRUE
is.integer(x_n)
## [1] FALSE
is.numeric(x_n)
## [1] TRUE
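The distinction matters in comparisons. The following sketch, using arbitrary variable names, shows that an integer and a double can be equal in value without being identical objects:

```r
int_one <- 1L
dbl_one <- 1

# == compares values after coercion, so the two are equal.
values_equal <- int_one == dbl_one

# identical() also compares the type, so the two are not identical.
objects_identical <- identical(int_one, dbl_one)

# class() reports "integer" for one and "numeric" for the other.
classes <- c(class(int_one), class(dbl_one))

print(values_equal)
print(objects_identical)
print(classes)
```

Tests that rely on identical() can therefore fail on results that look the same when printed.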

Another problem that can happen is that one can easily remove the class functionality from an S3 object. Let’s consider a simple linear model. After fitting a linear model, one might try to store the coefficients into a variable:


x <- 1:10
err <- rnorm(10, 0, 0.5)
y <- 10 * x - 10 + err
d <- data.frame(x=x, y=y)

my_model <- lm(y ~ x, data=d)

my_coefs <- my_model$coefficients

my_coefs
## (Intercept)           x 
##   -9.606322    9.977472

After this, one might do a predictive fit:

y_pred <- my_coefs[2] * x + my_coefs[1]
y_pred
##  [1]  0.37115 10.34862 20.32609 30.30356 40.28104 50.25851 60.23598 70.21345
##  [9] 80.19092 90.16839

However, coefficients do not contain all of the information that the model has. Instead of ripping the coefficients out from the object, one should utilize the S3 generic function predict for models that support it. This will allow the model to stay as an S3 object:

attributes(my_model)
## $names
##  [1] "coefficients"  "residuals"     "effects"       "rank"         
##  [5] "fitted.values" "assign"        "qr"            "df.residual"  
##  [9] "xlevels"       "call"          "terms"         "model"        
## 
## $class
## [1] "lm"

sloop::ftype(predict)
## [1] "S3"      "generic"
predict(my_model)
##        1        2        3        4        5        6        7        8 
##  0.37115 10.34862 20.32609 30.30356 40.28104 50.25851 60.23598 70.21345 
##        9       10 
## 80.19092 90.16839
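Another advantage of predict() is that it accepts new data and can return uncertainty estimates. The sketch below is self-contained; the seed, the data and the names toy_data and new_points are assumptions made for the illustration:

```r
# Simulated data similar to the example above, with a fixed seed.
set.seed(1)
toy_data <- data.frame(x = 1:10)
toy_data$y <- 10 * toy_data$x - 10 + rnorm(10, 0, 0.5)

toy_model <- lm(y ~ x, data = toy_data)

# Predict at new x values and ask for confidence intervals.
new_points <- data.frame(x = c(11, 12))
pred <- predict(toy_model, newdata = new_points, interval = "confidence")
print(pred)

# The point estimates agree with the manual calculation from coefficients,
# but the intervals are only available through the model object.
manual <- coef(toy_model)[1] + coef(toy_model)[2] * new_points$x
print(all.equal(unname(pred[, "fit"]), unname(manual)))
```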

Utilizing these generic functions gives R great power, as one can use similar coding structures for various different models. For example, predict can operate on multiple different models with corresponding methods:

methods(predict)

##  [1] predict.ar*                predict.Arima*            
##  [3] predict.arima0*            predict.glm               
##  [5] predict.HoltWinters*       predict.lm                
##  [7] predict.loess*             predict.mlm*              
##  [9] predict.nls*               predict.poly*             
## [11] predict.ppr*               predict.prcomp*           
## [13] predict.princomp*          predict.smooth.spline*    
## [15] predict.smooth.spline.fit* predict.StructTS*         
## see '?methods' for accessing help and source code

3.4 Signals and error handling

R has a robust system of messages, warnings and errors that allows users to prevent erroneous behaviour.

R has three different signal types:

Messages: Messages are meant for informing the user that some action has been taken.
Warnings: Warnings are meant to signal that not everything went correctly, but the program execution will continue.
Errors: Errors indicate to R that something went wrong and the program execution should stop.

To raise these signals, one can use the following functions:

message('This is a message')
## This is a message
warning('This is a warning')
## Warning: This is a warning
stop('This is an error')
## Error in eval(expr, envir, enclos): This is an error

One can suppress these signals with try(), suppressWarnings() and suppressMessages():

signalsender <- function(signal_function, text) { signal_function(text) }
signalsender(message, 'This is a message')
## This is a message
suppressMessages(signalsender(message, 'This is a message'))

signalsender(warning, 'This is a warning')
## Warning in signalsender(warning, "This is a warning"): This is a warning
suppressWarnings(signalsender(warning, 'This is a warning'))

try(signalsender(stop, 'This is an error'))
## Error in signalsender(stop, "This is an error") : This is an error
try(signalsender(stop, 'This is an error'), silent=TRUE)
signalsender(stop, 'This is an error')
## Error in signalsender(stop, "This is an error"): This is an error

By default try() will still print that an error has occurred, but it can be silenced. However, silencing errors is risky and should only be done when the reason for the error is known and expected.
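When an error is silenced with try(), the return value can still be inspected so that the failure is not lost. A minimal sketch, with a hypothetical function risky_sqrt():

```r
# A function that fails for negative input.
risky_sqrt <- function(x) {
  if (x < 0) {
    stop("negative input")
  }
  sqrt(x)
}

# On failure, try() returns an object of class "try-error".
outcome <- try(risky_sqrt(-1), silent = TRUE)
failed <- inherits(outcome, "try-error")
print(failed)

# The original condition is kept as an attribute and can be logged.
if (failed) {
  reason <- conditionMessage(attr(outcome, "condition"))
  message("risky_sqrt() failed: ", reason)
}

# On success, try() simply returns the value.
print(try(risky_sqrt(4), silent = TRUE))
```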

All of the signals have handlers: functions that react when the signals are raised. These handlers can be overridden with the tryCatch() and withCallingHandlers() functions:

tryCatch(
  error = function(cond) {
    message(paste('Received an error\n', cond))
  },
  signalsender(stop, 'Raise an error')
)
## Received an error
##  Error in signalsender(stop, "Raise an error"): Raise an error
withCallingHandlers(
  message = function(cond) {
    stop(paste('I did not expect a message:\n', cond,'\nRaising an error!'), call.=FALSE)
  },
  signalsender(message, 'Send a message')
)
## Error: I did not expect a message:
##  simpleMessage in signal_function(text): Send a message
## 
##  
## Raising an error!

There’s a difference between these two functions. From Advanced R:

tryCatch() defines exiting handlers; after the condition is handled, control returns to the context where tryCatch() was called. This makes tryCatch() most suitable for working with errors and interrupts, as these have to exit anyway.

withCallingHandlers() defines calling handlers; after the condition is captured control returns to the context where the condition was signalled. This makes it most suitable for working with non-error conditions.

The difference is also well illustrated in the following quote from Advanced R:

An exiting handler handles a signal like you handle a problem; it makes the problem go away.

A calling handler handles a signal like you handle a car; the car still exists.
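The difference can also be seen in a small experiment. In the sketch below, the hypothetical function noisy_step() raises a warning and then returns a value; the two kinds of handlers lead to different results:

```r
# Raises a warning and, if allowed to continue, returns a value.
noisy_step <- function() {
  warning("something looks off")
  "finished"
}

# Exiting handler: execution of noisy_step() is abandoned, and the value
# of the handler becomes the value of tryCatch().
exiting_result <- tryCatch(
  noisy_step(),
  warning = function(w) "aborted"
)

# Calling handler: the warning is logged and muffled, and then
# noisy_step() continues from where the warning was signalled.
calling_result <- withCallingHandlers(
  noisy_step(),
  warning = function(w) {
    message("logged: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(exiting_result)
print(calling_result)
```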

The rlang package⁷ provides additional wrappers for these signals that can make raising and handling errors easier. You can easily create your own custom error types with it.
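Custom error types can also be sketched in base R alone, which may help in understanding what such wrappers do. The example below uses base R's errorCondition() rather than rlang, and the class name validation_error and the function check_positive() are made up for the illustration:

```r
# Signal an error that carries an extra class in addition to "error".
check_positive <- function(x) {
  if (x <= 0) {
    stop(errorCondition("x must be positive", class = "validation_error"))
  }
  x
}

# A handler registered for the custom class catches only that kind of error.
handled <- tryCatch(
  check_positive(-1),
  validation_error = function(e) {
    paste("caught validation_error:", conditionMessage(e))
  }
)
print(handled)

# The condition is still an ordinary error for generic handlers.
generic <- tryCatch(
  check_positive(-1),
  error = function(e) class(e)
)
print(generic)
```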

For more info on R’s error handling and on using rlang, see this chapter of Advanced R.

3.5 Libraries

3.5.1 Libraries are usually compiled

R libraries can be pure R or they can utilize other languages. Many of R’s internal functions utilize R’s API for writing extensions. Commonly used languages are C, C++ and Fortran. The reason for this is the speed provided by these lower-level languages.

However, this API can be very complicated, and thus most new packages use C/C++ with the Rcpp package.⁸

This means that during installation many R libraries need C and C++ compilers and external libraries, which can result in various errors if you lack said requirements.

Debugging C or C++ code requires one to utilize C/C++ debuggers. For more information one can look for example at this blog post⁹ or at the various links provided in Advanced R.

For more information on Rcpp, one can check Rcpp for Everyone¹⁰ or this chapter in Advanced R.

³ Hadley Wickham, Advanced R (CRC Press, 2019), https://adv-r.hadley.nz.
⁴ Kirill Müller and Hadley Wickham, tibble: Simple Data Frames, 2021, https://CRAN.R-project.org/package=tibble.
⁵ Matt Dowle and Arun Srinivasan, data.table: Extension of ‘data.frame’, 2021, https://CRAN.R-project.org/package=data.table.
⁶ Winston Chang, R6: Encapsulated Classes with Reference Semantics, 2022.
⁷ Lionel Henry and Hadley Wickham, rlang: Functions for Base Types and Core R and Tidyverse Features, 2022, https://CRAN.R-project.org/package=rlang.
⁸ Dirk Eddelbuettel et al., Rcpp: Seamless R and C++ Integration, 2022, https://CRAN.R-project.org/package=Rcpp.
⁹ Davis Vaughan, “Debugging an R Package with C++,” 2017, https://blog.davisvaughan.com/2019/04/05/debug-r-package-with-cpp/.
¹⁰ Masaki E. Tsuda, Rcpp for Everyone, 2020, https://teuder.github.io/rcpp4everyone_en/.