--- title: "Using valaddin" author: "Eugene Ha" date: "2017-08-10" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Using valaddin} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} library(valaddin) knitr::opts_chunk$set(collapse = TRUE, comment = "#>") ``` _valaddin_ is a lightweight R package that enables you to transform an existing function into a function with input validation checks. It does so without requiring you to modify the body of the function, in contrast to doing input validation using `stop` or `stopifnot`, and is therefore suitable for both programmatic and interactive use. This document illustrates the use of valaddin, by example. For usage details, see the main documentation page, `?firmly`. ## Use cases The workhorse of valaddin is the function `firmly`, which applies input validation to a function, _in situ_. It can be used to: ### Enforce types for arguments For example, to require that all arguments of the function ```{r} f <- function(x, h) (sin(x + h) - sin(x)) / h ``` are numerical, apply `firmly` with the check formula `~is.numeric`[^1]: ```{r} ff <- firmly(f, ~is.numeric) ``` `ff` behaves just like `f`, but with a constraint on the type of its arguments: ```{r, error = TRUE, purl = FALSE} ff(0.0, 0.1) ff("0.0", 0.1) ``` [^1]: The inspiration to use `~` as a quoting operator came from the vignette [Non-standard evaluation](https://cran.r-project.org/package=lazyeval/vignettes/lazyeval.html), by Hadley Wickham. ### Enforce constraints on argument values For example, use `firmly` to put a cap on potentially long-running computations: ```{r, error = TRUE, purl = FALSE} fib <- function(n) { if (n <= 1L) return(1L) Recall(n - 1L) + Recall(n - 2L) } capped_fib <- firmly(fib, list("n capped at 30" ~ ceiling(n)) ~ {. <= 30L}) capped_fib(10) capped_fib(50) ``` The role of each part of the value-constraining formula is evident: - The right-hand side `{. <= 30L}` is the constraint itself, expressed as a condition on `.`, a placeholder argument. - The left-hand side `list("n capped at 30" ~ ceiling(n))` specifies the expression for the placeholder, namely `ceiling(n)`, along with a message to produce if the constraint is violated. ### Warn about pitfalls If the default behavior of a function is problematic, or unexpected, you can use `firmly` to warn you. Consider the function `as.POSIXct`, which creates a date-time object: ```{r, include = FALSE} tz_original <- Sys.getenv("TZ", unset = NA) ``` ```{r} Sys.setenv(TZ = "CET") (d <- as.POSIXct("2017-01-01 09:30:00")) ``` The problem is that `d` is a potentially _ambiguous_ object (with hidden state), because it's not assigned a time zone, explicitly. If you compute the local hour of `d` using `as.POSIXlt`, you get an answer that interprets `d` according to your current time zone; another user—or you, in another country, in the future—may get a different result. - If you're in CET time zone: ```{r} as.POSIXlt(d, tz = "EST")$hour ``` - If you were to change to EST time zone and rerun the code: ```{r} Sys.setenv(TZ = "EST") d <- as.POSIXct("2017-01-01 09:30:00") as.POSIXlt(d, tz = "EST")$hour ``` ```{r, include = FALSE} if (isTRUE(is.na(tz_original))) { Sys.unsetenv("TZ") } else { Sys.setenv(TZ = tz_original) } ``` To warn yourself about this pitfall, you can modify `as.POSIXct` to complain when you've forgotten to specify a time zone: ```{r} as.POSIXct <- firmly(as.POSIXct, .warn_missing = "tz") ``` Now when you call `as.POSIXct`, you get a cautionary reminder: ```{r} as.POSIXct("2017-01-01 09:30:00") as.POSIXct("2017-01-01 09:30:00", tz = "CET") ``` **NB**: The missing-argument warning is implemented by wrapping functions. The underlying function `base::as.POSIXct` is called _unmodified_. #### Use `loosely` to access the original function Though reassigning `as.POSIXct` may seem risky, it is not, for the behavior is unchanged (aside from the extra precaution), and the original `as.POSIXct` remains accessible: - With a namespace prefix: `base::as.POSIXct` - By applying `loosely` to strip input validation: `loosely(as.POSIXct)` ```{r} loosely(as.POSIXct)("2017-01-01 09:30:00") identical(loosely(as.POSIXct), base::as.POSIXct) ``` ### Decline handouts R tries to help you express your ideas as concisely as possible. Suppose you want to truncate negative values of a vector `w`: ```{r} w <- {set.seed(1); rnorm(5)} ifelse(w > 0, w, 0) ``` `ifelse` assumes (correctly) that you intend the `0` to be repeated `r length(w)` times, and does that for you, automatically. Nonetheless, R's good intentions have a darker side: ```{r} z <- rep(1, 6) pos <- 1:5 neg <- -6:-1 ifelse(z > 0, pos, neg) ``` This smells like a coding error. Instead of complaining that `pos` is too short, `ifelse` recycles it to line it up with `z`. The result is probably not what you wanted. In this case, you don't need a helping hand, but rather a firm one: ```{r} chk_length_type <- list( "'yes', 'no' differ in length" ~ length(yes) == length(no), "'yes', 'no' differ in type" ~ typeof(yes) == typeof(no) ) ~ isTRUE ifelse_f <- firmly(ifelse, chk_length_type) ``` `ifelse_f` is more pedantic than `ifelse`. But it also spares you the consequences of invalid inputs: ```{r, error = TRUE, purl = FALSE} ifelse_f(w > 0, w, 0) ifelse_f(w > 0, w, rep(0, length(w))) ifelse(z > 0, pos, neg) ifelse_f(z > 0, pos, neg) ifelse(z > 0, as.character(pos), neg) ifelse_f(z > 0, as.character(pos), neg) ``` ### Reduce the risks of a lazy evaluation-style When R make a function call, say, `f(a)`, the _value_ of the argument `a` is not materialized in the body of `f` until it is actually needed. Usually, you can safely ignore this as a technicality of R's evaluation model; but in some situations, it can be problematic if you're not mindful of it. Consider a bank that waives fees for students. A function to make deposits might look like this[^2]: ```{r} deposit <- function(account, value) { if (is_student(account)) { account$fees <- 0 } account$balance <- account$balance + value account } is_student <- function(account) { if (isTRUE(account$is_student)) TRUE else FALSE } ``` Suppose Bob is an account holder, currently not in school: ```{r} bobs_acct <- list(balance = 10, fees = 3, is_student = FALSE) ``` If Bob were to deposit an amount to cover an future fee payment, his account balance would be updated to: ```{r} deposit(bobs_acct, bobs_acct$fees)$balance ``` Bob goes back to school and informs the bank, so that his fees will be waived: ```{r} bobs_acct$is_student <- TRUE ``` But now suppose that, somewhere in the bowels of the bank's software, the type of Bob's account object is converted from a list to an environment: ```{r} bobs_acct <- list2env(bobs_acct) ``` If Bob were to deposit an amount to cover an future fee payment, his account balance would now be updated to: ```{r} deposit(bobs_acct, bobs_acct$fees)$balance ``` Becoming a student has cost Bob money. What happened to the amount deposited? The culprit is lazy evaluation and the modify-in-place semantics of environments. In the call `deposit(account = bobs_acct, value = bobs_acct$fee)`, the value of the argument `value` is only set when it's used, which comes after the object `fee` in the environment `bobs_acct` has already been zeroed out. To minimize such risks, forbid `account` from being an environment: ```{r} err_msg <- "`acccount` should not be an environment" deposit <- firmly(deposit, list(err_msg ~ account) ~ Negate(is.environment)) ``` This makes Bob a happy customer, and reduces the bank's liability: ```{r, error = TRUE, purl = FALSE} bobs_acct <- list2env(list(balance = 10, fees = 3, is_student = TRUE)) deposit(bobs_acct, bobs_acct$fees)$balance deposit(as.list(bobs_acct), bobs_acct$fees)$balance ``` [^2]: Adapted from an example in Section 6.3 of Chambers, _Extending R_, CRC Press, 2016. For the sake of the example, ignore the fact that logic to handle fees does not belong in a function for deposits! ### Prevent self-inflicted wounds You don't mean to shoot yourself, but sometimes it happens, nonetheless: ```{r, eval = FALSE} x <- "An expensive object" save(x, file = "my-precious.rda") x <- "Oops! A bug or lapse has tarnished your expensive object" # Many computations later, you again save x, oblivious to the accident ... save(x, file = "my-precious.rda") ``` `firmly` can safeguard you from such mishaps: implement a safety procedure ```{r} # Argument `gear` is a list with components: # fun: Function name # ns : Namespace of `fun` # chk: Formula that specify input checks hardhat <- function(gear, env = .GlobalEnv) { for (. in gear) { safe_fun <- firmly(getFromNamespace(.$fun, .$ns), .$chk) assign(.$fun, safe_fun, envir = env) } } ``` gather your safety gear ```{r} protection <- list( list( fun = "save", ns = "base", chk = list("Won't overwrite `file`" ~ file) ~ Negate(file.exists) ), list( fun = "load", ns = "base", chk = list("Won't load objects into current environment" ~ envir) ~ {!identical(., parent.frame(2))} ) ) ``` then put it on ```{r} hardhat(protection) ``` Now `save` and `load` engage safety features that prevent you from inadvertently destroying your data: ```{r, eval = FALSE} x <- "An expensive object" save(x, file = "my-precious.rda") x <- "Oops! A bug or lapse has tarnished your expensive object" #> Error: save(x, file = "my-precious.rda") #> Won't overwrite `file` save(x, file = "my-precious.rda") # Inspecting x, you notice it's changed, so you try to retrieve the original ... x #> [1] "Oops! A bug or lapse has tarnished your expensive object" load("my-precious.rda") #> Error: load(file = "my-precious.rda") #> Won't load objects into current environment # Keep calm and carry on loosely(load)("my-precious.rda") x #> [1] "An expensive object" ``` **NB**: Input validation is implemented by wrapping functions; thus, if the arguments are valid, the underlying functions `base::save`, `base::load` are called _unmodified_. ## Toolbox of input checkers _valaddin_ provides a collection of over 50 pre-made input checkers to facilitate typical kinds of argument checks. These checkers are prefixed by `vld_`, for convenient browsing and look-up in editors and IDE's that support name completion. For example, to create a type-checked version of the function `upper.tri`, which returns an upper-triangular logical matrix, apply the checkers `vld_matrix`, `vld_boolean` (here "boolean" is shorthand for "logical vector of length 1"): ```{r, error = TRUE, purl = FALSE} upper_tri <- firmly(upper.tri, vld_matrix(~x), vld_boolean(~diag)) # upper.tri assumes you mean a vector to be a column matrix upper.tri(1:2) upper_tri(1:2) # But say you actually meant (1, 2) to be a diagonal matrix upper_tri(diag(1:2)) upper_tri(diag(1:2), diag = "true") upper_tri(diag(1:2), TRUE) ``` ### Check anything with `vld_true` Any input validation can be expressed as an assertion that "such and such must be true"; to apply it as such, use `vld_true` (or its complement, `vld_false`). For example, the above hardening of `ifelse` can be redone as: ```{r, error = TRUE, purl = FALSE} chk_length_type <- vld_true( "'yes', 'no' differ in length" ~ length(yes) == length(no), "'yes', 'no' differ in type" ~ typeof(yes) == typeof(no) ) ifelse_f <- firmly(ifelse, chk_length_type) z <- rep(1, 6) pos <- 1:5 neg <- -6:-1 ifelse_f(z > 0, as.character(pos), neg) ifelse_f(z > 0, c(pos, 6), neg) ifelse_f(z > 0, c(pos, 6L), neg) ``` ### Make your own input checker with `localize` A check formula such as `~ is.numeric` (or `"Not number" ~ is.numeric`, if you want a custom error message) imposes its condition "globally": ```{r, error = TRUE, purl = FALSE} difference <- firmly(function(x, y) x - y, ~ is.numeric) difference(3, 1) difference(as.POSIXct("2017-01-01", "UTC"), as.POSIXct("2016-01-01", "UTC")) ``` With `localize`, you can concentrate a globally applied check formula to specific expressions. The result is a _reusable_ custom checker: ```{r, error = TRUE, purl = FALSE} chk_numeric <- localize("Not numeric" ~ is.numeric) secant <- firmly(function(f, x, h) (f(x + h) - f(x)) / h, chk_numeric(~x, ~h)) secant(sin, 0, .1) secant(sin, "0", .1) ``` (In fact, `chk_numeric` is equivalent to the pre-built checker `vld_numeric`.) Conversely, apply `globalize` to impose your localized checker globally: ```{r, error = TRUE, purl = FALSE} difference <- firmly(function(x, y) x - y, globalize(chk_numeric)) difference(3, 1) difference(as.POSIXct("2017-01-01", "UTC"), as.POSIXct("2016-01-01", "UTC")) ``` ## Related packages ### Packages that enhance valaddin - [assertive](https://bitbucket.org/richierocks/assertive), [assertthat](https://github.com/hadley/assertthat), and [checkmate](https://github.com/mllg/checkmate) provide extensive collections of predicate functions that you can use in conjunction with `firmly` and `localize`. - [ensurer](https://github.com/smbache/ensurer) and [assertr](https://github.com/ropensci/assertr) provide ways to validate function _values_. ### Other approaches to input validation - [argufy](https://github.com/gaborcsardi/argufy) takes a different approach to input validation, using [roxygen](https://github.com/r-lib/roxygen2) comments to specify checks. - [ensurer](https://github.com/smbache/ensurer) provides an experimental replacement for `function` that builds functions with type-validated arguments. - [typeCheck](https://github.com/jimhester/typeCheck), together with [Types for R](https://github.com/jimhester/types), enable the creation of functions with type-validated arguments by means of special type annotations. This approach is orthogonal to that of valaddin: whereas valaddin specifies input checks as _predicate functions with scope_ (predicates are primary), typeCheck specifies input checks as _arguments with type_ (arguments are primary).