valaddin is a lightweight
R package that enables you to transform an existing function into a
function with input validation checks. It does so without requiring you
to modify the body of the function, in contrast to doing input
validation using `stop` or `stopifnot`, and is
therefore suitable for both programmatic and interactive use.
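For contrast, here is a minimal base-R sketch of the conventional approach, in which the checks are written into the function body. The function `add_checked` is a hypothetical example and is not part of valaddin.

```r
# Hypothetical function whose validation lives inside its body (base R only)
add_checked <- function(x, y) {
  stopifnot(is.numeric(x), is.numeric(y))
  x + y
}

add_checked(1, 2)

# Invalid input is rejected, but only because the body was edited to say so
outcome <- tryCatch(add_checked(1, "2"), error = function(e) "rejected")
outcome
```

Adding or removing such checks means editing the function itself, which is the step valaddin is meant to avoid.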
This document illustrates the use of valaddin, by example. For usage
details, see the main documentation page, `?firmly`.
The workhorse of valaddin is the function `firmly`, which
applies input validation to a function, in situ. It can be used
to enforce argument types, to enforce constraints on argument values,
and to warn about pitfalls.
For example, to require that all arguments of a function `f`
are numerical, apply `firmly` with the check formula
`~is.numeric` [1]. The resulting function, `ff`,
behaves just like `f`, but with a
constraint on the type of its arguments:
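A minimal sketch of this use case follows. The definition of `f` is an assumption made for illustration, since any function with numeric arguments would serve equally well.

```r
library(valaddin)

# Hypothetical function, assumed here for illustration
f <- function(x, y) x + sqrt(y)

# Require every argument of f to satisfy is.numeric
ff <- firmly(f, ~is.numeric)

# Valid inputs pass through to f unchanged
ff(0, 0)

# A non-numeric argument raises an error instead of reaching f
msg <- tryCatch(ff(1, "1"), error = function(e) conditionMessage(e))
msg
```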
For example, use `firmly` to put a cap on potentially
long-running computations:
fib <- function(n) {
if (n <= 1L) return(1L)
Recall(n - 1L) + Recall(n - 2L)
}
capped_fib <- firmly(fib, list("n capped at 30" ~ ceiling(n)) ~ {. <= 30L})
capped_fib(10)
#> [1] 89
capped_fib(50)
#> Error: capped_fib(n = 50)
#> n capped at 30
The role of each part of the value-constraining formula is evident:
- The right-hand side `{. <= 30L}` is the constraint itself, expressed as a condition on `.`, a placeholder argument.
- The left-hand side `list("n capped at 30" ~ ceiling(n))` specifies the expression for the placeholder, namely `ceiling(n)`, along with a message to produce if the constraint is violated.
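To make this division of labor concrete, the following base-R sketch spells out roughly what the check amounts to: evaluate the left-hand expression, test the condition, and signal the message on failure. The name `capped_fib_manual` is hypothetical, and the sketch only approximates the behavior; it is not how valaddin is implemented.

```r
fib <- function(n) {
  if (n <= 1L) return(1L)
  Recall(n - 1L) + Recall(n - 2L)
}

# Hand-written counterpart of the check formula (hypothetical helper)
capped_fib_manual <- function(n) {
  . <- ceiling(n)              # expression bound to the placeholder
  if (!isTRUE(. <= 30L)) {     # the constraint itself
    stop("n capped at 30")     # message for a violated constraint
  }
  fib(n)
}

capped_fib_manual(10)
#> [1] 89

outcome <- tryCatch(capped_fib_manual(50), error = function(e) conditionMessage(e))
outcome
#> [1] "n capped at 30"
```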
If the default behavior of a function is problematic, or unexpected,
you can use `firmly` to warn you. Consider the function
`as.POSIXct`, which creates a date-time object.
The problem is that an object `d` created without an explicit time zone,
for example by `as.POSIXct("2017-01-01 09:30:00")`, is potentially
ambiguous (with hidden state). If you compute the local hour of `d`
using `as.POSIXlt`, you get an answer that interprets
`d` according to your current time zone; another user (or you,
in another country, in the future) may get a different result.
If you’re in CET time zone:
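The CET case can be sketched by mirroring the EST code, with only the session time zone changed. Saving and restoring the previous `TZ` setting is an addition made here so that the sketch leaves the session as it found it.

```r
old_tz <- Sys.getenv("TZ", unset = NA)    # remember the current setting

Sys.setenv(TZ = "CET")
d <- as.POSIXct("2017-01-01 09:30:00")    # no tz given: read as CET
hour_in_est <- as.POSIXlt(d, tz = "EST")$hour
hour_in_est
#> [1] 3

# Restore the previous time zone setting
if (is.na(old_tz)) Sys.unsetenv("TZ") else Sys.setenv(TZ = old_tz)
```

The same string yields hour 3 here because 09:30 in CET (UTC+1) is 03:30 in EST (UTC-5).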
If you were to change to EST time zone and rerun the code:
``` r
Sys.setenv(TZ = "EST")
d <- as.POSIXct("2017-01-01 09:30:00")
as.POSIXlt(d, tz = "EST")$hour
#> [1] 9
```
To warn yourself about this pitfall, you can modify
`as.POSIXct` to complain when you’ve forgotten to specify a
time zone:
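One way to do this, assuming the `.warn_missing` argument of `firmly` described in `?firmly`, is sketched below. No check formulas are supplied, so the only change in behavior is the warning.

```r
library(valaddin)

# Warn whenever `tz` is not supplied in a call; the arguments themselves
# are not otherwise checked
as.POSIXct <- firmly(as.POSIXct, .warn_missing = "tz")
```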
Now when you call `as.POSIXct`, you get a cautionary
reminder:
as.POSIXct("2017-01-01 09:30:00")
#> Warning: Argument(s) expected but not specified in call as.POSIXct(x =
#> "2017-01-01 09:30:00"): `tz`
#> [1] "2017-01-01 09:30:00 UTC"
as.POSIXct("2017-01-01 09:30:00", tz = "CET")
#> [1] "2017-01-01 09:30:00 CET"
NB: The missing-argument warning is implemented by
wrapping functions. The underlying function
`base::as.POSIXct` is called unmodified.
Use `loosely` to access the original function
Though reassigning `as.POSIXct` may seem risky, it is not,
for the behavior is unchanged (aside from the extra precaution), and the
original `as.POSIXct` remains accessible, either under its qualified name,
`base::as.POSIXct`, or by applying `loosely` to strip input validation:
loosely(as.POSIXct)
R tries to help you express your ideas as concisely as possible.
Suppose you want to truncate negative values of a vector `w`:
w <- {set.seed(1); rnorm(5)}
ifelse(w > 0, w, 0)
#> [1] 0.0000000 0.1836433 0.0000000 1.5952808 0.3295078
`ifelse` assumes (correctly) that you intend the
`0` to be repeated 5 times, and does that for you,
automatically.
Nonetheless, R’s good intentions have a darker side:
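A short sketch of the problem, using the values of `z`, `pos` and `neg` that appear later in this document:

```r
z <- rep(1, 6)
pos <- 1:5      # one element shorter than z
neg <- -6:-1

# pos is silently recycled, so the sixth result is pos[1]
ifelse(z > 0, pos, neg)
#> [1] 1 2 3 4 5 1
```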
This smells like a coding error. Instead of complaining that
`pos` is too short, `ifelse` recycles it to line
it up with `z`. The result is probably not what you
wanted.
In this case, you don’t need a helping hand, but rather a firm one:
chk_length_type <- list(
"'yes', 'no' differ in length" ~ length(yes) == length(no),
"'yes', 'no' differ in type" ~ typeof(yes) == typeof(no)
) ~ isTRUE
ifelse_f <- firmly(ifelse, chk_length_type)
`ifelse_f` is more pedantic than `ifelse`. But
it also spares you the consequences of invalid inputs:
ifelse_f(w > 0, w, 0)
#> Error: ifelse_f(test = w > 0, yes = w, no = 0)
#> 'yes', 'no' differ in length
ifelse_f(w > 0, w, rep(0, length(w)))
#> [1] 0.0000000 0.1836433 0.0000000 1.5952808 0.3295078
ifelse(z > 0, pos, neg)
#> [1] 1 2 3 4 5 1
ifelse_f(z > 0, pos, neg)
#> Error: ifelse_f(test = z > 0, yes = pos, no = neg)
#> 'yes', 'no' differ in length
ifelse(z > 0, as.character(pos), neg)
#> [1] "1" "2" "3" "4" "5" "1"
ifelse_f(z > 0, as.character(pos), neg)
#> Error: ifelse_f(test = z > 0, yes = as.character(pos), no = neg)
#> 1) 'yes', 'no' differ in length
#> 2) 'yes', 'no' differ in type
When R makes a function call, say, `f(a)`, the
value of the argument `a` is not materialized in the
body of `f` until it is actually needed. Usually, you can
safely ignore this as a technicality of R’s evaluation model; but in
some situations, it can be problematic if you’re not mindful of it.
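The order of events can be observed directly in base R. In this sketch, the names `events` and `note` are hypothetical helpers used only to record what happens when.

```r
events <- character()
note <- function(msg) events <<- c(events, msg)   # append to the record

f <- function(a) {
  note("body of f starts")
  a                       # first use of `a`: the argument is evaluated here
  note("body of f ends")
  invisible(NULL)
}

# The argument expression has a visible side effect when it is evaluated
f({ note("argument evaluated"); 1 })

# The record shows the argument being evaluated inside the body,
# after "body of f starts" and before "body of f ends"
events
```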
Consider a bank that waives fees for students. A function to make deposits might look like this [2]:
deposit <- function(account, value) {
if (is_student(account)) {
account$fees <- 0
}
account$balance <- account$balance + value
account
}
is_student <- function(account) {
if (isTRUE(account$is_student)) TRUE else FALSE
}
Suppose Bob is an account holder, currently not in school:
If Bob were to deposit an amount to cover a future fee payment, his account balance would be updated to:
Bob goes back to school and informs the bank, so that his fees will be waived:
But now suppose that, somewhere in the bowels of the bank’s software, the type of Bob’s account object is converted from a list to an environment:
If Bob were to deposit an amount to cover a future fee payment, his account balance would now be updated to:
Becoming a student has cost Bob money. What happened to the amount deposited?
The culprit is lazy evaluation and the modify-in-place semantics of
environments. In the call
`deposit(account = bobs_acct, value = bobs_acct$fees)`, the
value of the argument `value` is only set when it’s used,
which comes after the object `fees` in the environment
`bobs_acct` has already been zeroed out.
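The effect can be reproduced with a small sketch. The account data below (a balance of 10 and fees of 3) are hypothetical values chosen for illustration, and `is_student` is written in a shorter, equivalent form.

```r
deposit <- function(account, value) {
  if (is_student(account)) {
    account$fees <- 0
  }
  account$balance <- account$balance + value
  account
}
is_student <- function(account) isTRUE(account$is_student)

# Hypothetical account data, assumed for illustration
bob_list <- list(balance = 10, fees = 3, is_student = TRUE)
bob_env <- list2env(bob_list)    # same data, stored in an environment

# List: only a local copy of the account is modified, so `value` is still 3
balance_list <- deposit(bob_list, bob_list$fees)$balance
balance_list    # 13

# Environment: `fees` is zeroed in place before `value` is evaluated
balance_env <- deposit(bob_env, bob_env$fees)$balance
balance_env     # 10
```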
To minimize such risks, forbid `account` from being an
environment:
err_msg <- "`account` should not be an environment"
deposit <- firmly(deposit, list(err_msg ~ account) ~ Negate(is.environment))
This makes Bob a happy customer, and reduces the bank’s liability:
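A sketch of the hardened function in use, again with hypothetical account data (a balance of 10 and fees of 3):

```r
library(valaddin)

deposit <- function(account, value) {
  if (isTRUE(account$is_student)) {
    account$fees <- 0
  }
  account$balance <- account$balance + value
  account
}

err_msg <- "`account` should not be an environment"
deposit <- firmly(deposit, list(err_msg ~ account) ~ Negate(is.environment))

# Hypothetical account stored as an environment: the call is refused
bob_env <- list2env(list(balance = 10, fees = 3, is_student = TRUE))
outcome <- tryCatch(deposit(bob_env, bob_env$fees), error = function(e) "rejected")
outcome

# Converted back to a list, the deposit goes through as intended
bob_list <- as.list(bob_env)
new_balance <- deposit(bob_list, bob_list$fees)$balance
new_balance    # 13
```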
You don’t mean to shoot yourself, but sometimes it happens, nonetheless:
x <- "An expensive object"
save(x, file = "my-precious.rda")
x <- "Oops! A bug or lapse has tarnished your expensive object"
# Many computations later, you again save x, oblivious to the accident ...
save(x, file = "my-precious.rda")
`firmly` can safeguard you from such mishaps: implement a
safety procedure
# Argument `gear` is a list with components:
# fun: Function name
# ns : Namespace of `fun`
# chk: Formula that specifies input checks
hardhat <- function(gear, env = .GlobalEnv) {
for (. in gear) {
safe_fun <- firmly(getFromNamespace(.$fun, .$ns), .$chk)
assign(.$fun, safe_fun, envir = env)
}
}
gather your safety gear
protection <- list(
list(
fun = "save",
ns = "base",
chk = list("Won't overwrite `file`" ~ file) ~ Negate(file.exists)
),
list(
fun = "load",
ns = "base",
chk = list("Won't load objects into current environment" ~ envir) ~
{!identical(., parent.frame(2))}
)
)
then put it on, by calling `hardhat(protection)`.
Now `save` and `load` engage safety features
that prevent you from inadvertently destroying your data:
x <- "An expensive object"
save(x, file = "my-precious.rda")
x <- "Oops! A bug or lapse has tarnished your expensive object"
save(x, file = "my-precious.rda")
#> Error: save(x, file = "my-precious.rda")
#> Won't overwrite `file`
# Inspecting x, you notice it's changed, so you try to retrieve the original ...
x
#> [1] "Oops! A bug or lapse has tarnished your expensive object"
load("my-precious.rda")
#> Error: load(file = "my-precious.rda")
#> Won't load objects into current environment
# Keep calm and carry on
loosely(load)("my-precious.rda")
x
#> [1] "An expensive object"
NB: Input validation is implemented by wrapping
functions; thus, if the arguments are valid, the underlying functions
`base::save` and `base::load` are called
unmodified.
valaddin provides a collection of over 50 pre-made input
checkers to facilitate typical kinds of argument checks. These checkers
are prefixed by `vld_`, for convenient browsing and look-up
in editors and IDEs that support name completion.
For example, to create a type-checked version of the function
`upper.tri`, which returns an upper-triangular logical
matrix, apply the checkers `vld_matrix` and
`vld_boolean` (here “boolean” is shorthand for “logical
vector of length 1”):
upper_tri <- firmly(upper.tri, vld_matrix(~x), vld_boolean(~diag))
# upper.tri assumes you mean a vector to be a column matrix
upper.tri(1:2)
#> [,1]
#> [1,] FALSE
#> [2,] FALSE
upper_tri(1:2)
#> Error: upper_tri(x = 1:2, diag = FALSE)
#> Not matrix: x
# But say you actually meant (1, 2) to be a diagonal matrix
upper_tri(diag(1:2))
#> [,1] [,2]
#> [1,] FALSE TRUE
#> [2,] FALSE FALSE
upper_tri(diag(1:2), diag = "true")
#> Error: upper_tri(x = diag(1:2), diag = "true")
#> Not boolean: diag
upper_tri(diag(1:2), TRUE)
#> [,1] [,2]
#> [1,] TRUE TRUE
#> [2,] FALSE TRUE
vld_true
Any input validation can be expressed as an assertion that “such and
such must be true”; to apply it as such, use `vld_true` (or
its complement, `vld_false`).
For example, the above hardening of `ifelse` can be redone
as:
chk_length_type <- vld_true(
"'yes', 'no' differ in length" ~ length(yes) == length(no),
"'yes', 'no' differ in type" ~ typeof(yes) == typeof(no)
)
ifelse_f <- firmly(ifelse, chk_length_type)
z <- rep(1, 6)
pos <- 1:5
neg <- -6:-1
ifelse_f(z > 0, as.character(pos), neg)
#> Error: ifelse_f(test = z > 0, yes = as.character(pos), no = neg)
#> 1) 'yes', 'no' differ in length
#> 2) 'yes', 'no' differ in type
ifelse_f(z > 0, c(pos, 6), neg)
#> Error: ifelse_f(test = z > 0, yes = c(pos, 6), no = neg)
#> 'yes', 'no' differ in type
ifelse_f(z > 0, c(pos, 6L), neg)
#> [1] 1 2 3 4 5 6
localize
A check formula such as `~ is.numeric` (or
`"Not number" ~ is.numeric`, if you want a custom error
message) imposes its condition “globally”:
difference <- firmly(function(x, y) x - y, ~ is.numeric)
difference(3, 1)
#> [1] 2
difference(as.POSIXct("2017-01-01", "UTC"), as.POSIXct("2016-01-01", "UTC"))
#> Error: difference(x = as.POSIXct("2017-01-01", "UTC"), y = as.POSIXct("2016-01-01", "UTC"))
#> 1) FALSE: is.numeric(x)
#> 2) FALSE: is.numeric(y)
With `localize`, you can concentrate a globally applied
check formula to specific expressions. The result is a reusable
custom checker:
chk_numeric <- localize("Not numeric" ~ is.numeric)
secant <- firmly(function(f, x, h) (f(x + h) - f(x)) / h, chk_numeric(~x, ~h))
secant(sin, 0, .1)
#> [1] 0.9983342
secant(sin, "0", .1)
#> Error: secant(f = sin, x = "0", h = 0.1)
#> Not numeric: x
(In fact, `chk_numeric` is equivalent to the pre-built
checker `vld_numeric`.)
Conversely, apply `globalize` to impose your localized
checker globally:
difference <- firmly(function(x, y) x - y, globalize(chk_numeric))
difference(3, 1)
#> [1] 2
difference(as.POSIXct("2017-01-01", "UTC"), as.POSIXct("2016-01-01", "UTC"))
#> Error: difference(x = as.POSIXct("2017-01-01", "UTC"), y = as.POSIXct("2016-01-01", "UTC"))
#> 1) Not numeric: `x`
#> 2) Not numeric: `y`
[1] The inspiration to use `~` as a quoting
operator came from the vignette Non-standard
evaluation, by Hadley Wickham.
[2] Adapted from an example in Section 6.3 of Chambers, Extending R, CRC Press, 2016. For the sake of the example, ignore the fact that logic to handle fees does not belong in a function for deposits!