Consider a package teeny that exports function
mean_and_variance()
for calculating the sample mean and the
sample variance in a single call:
#' Calculate the Mean and Variance
#'
#' @param x A numeric vector.
#'
#' @return
#' A named numeric vector of length two
#' with elements `mean` and `variance`.
#'
#' @importFrom stats var
#' @export
<- function(x) {
mean_and_variance <- mean(x)
mu <- var(x)
sigma2 c(mean = mu, variance = sigma2)
}
There are a few things to notice about the implements. The first is
that we declare that variance()
should be imported from the
stats package. Here we use a roxygen2
@importFrom stats var
statement to declare this import,
which is short for manually adding an
importFrom(stats, var)
entry in the package
NAMESPACE
file.
The second is that, because all objects exported by the
base package are always available, we do not have to
explicitly import them. It is actually not possible to import
base functions; if we would add an
@importFrom base mean
statement, the package fails to
install (see Appendix).
Continuing, when R evaluates
::mean_and_variance(1:100) teeny
## mean variance
## 50.5000 841.6667
it will eventually have to locate functions mean()
and
var()
. It turns out that these are effectively found
as:
<- getNamespace("teeny")
pkg_ns <- parent.env(parent.env(pkg_ns))[["mean"]]
mean <- parent.env( pkg_ns )[["var"]] var
Note the extra layer of parent.env()
for
mean()
. So, why are they not found in the same environment?
This is because we explicitly imported stats::var()
, but
not base::mean()
.
Let’s dig in deeper to see how this works, but before doing that, let’s look at what the first few parent environments of the teeny namespace are. We can du this using the environments package as:
<- getNamespace("teeny")
pkg_ns <- environments::parent_envs(pkg_ns, until = globalenv())
nss nss
## $teeny
## <environment: namespace:teeny>
##
## $`imports:teeny`
## <environment: 0x55cb372dd980>
## attr(,"name")
## [1] "imports:teeny"
##
## $base
## <environment: namespace:base>
##
## $R_GlobalEnv
## <environment: R_GlobalEnv>
This reveals (part of) the search path used by the functions in the
teeny package. When mean_and_variance()
is
called, R searches for mean()
and var()
in
these environments until found. So, in the case of mean()
,
it first looks nss[[1]]
using something like
exists("mean", mode = "function", envir = nss[[1]], inherits = FALSE)
,
and if not found, it continues to environment nss[[2]]
, and
so on until found. In this case, it locates mean()
in
nss[[3]]
, i.e. in the base namespace;
exists("mean", mode = "function", envir = nss[[3]], inherits = FALSE)
## [1] TRUE
It searches for var()
in a similar manner. Since
var()
was explicitly imported, and all package imports are
stored in the imports::teeny
environment, it finds
var()
in nss[[2]]
;
exists("var", mode = "function", envir = nss[[2]], inherits = FALSE)
## [1] TRUE
This begs the question, isn’t that imported var()
function a copy of the var()
in the stats
package? Yes, it is! Technically, we could, with some hacks, replace the
“imported” var()
with another function after the package
has been loaded.
Now, what would happen if we forget to import
stats::var()
? Let’s retry by dropping:
#' @importFrom stats var
First of all, R CMD check
would detect this and produce
the following NOTE:
* checking R code for possible problems ... NOTE
mean_and_variance: no visible global function definition for ‘var’
Undefined global functions or variables:
var
Consider adding
importFrom("stats", "var")
to your NAMESPACE file.
That’s comforting to know. Note also how it suggests how we might be able to resolve it. That is a useful service.
Next, let’s see what happens if we try to use the function;
::mean_and_variance(1:100) teeny
## mean variance
## 50.5000 841.6667
Hmm, how is that possible? The var()
function does not
exist in the package namespace, not in the “imports” environments, not
in the base namespace, and not in the global
environment. What we didn’t say above is that, if a function cannot be
found even in the global environment, it continues to search the parent
environments of the global environment too. The parent environments of
the global environment are all the environments that
search()
reports;
search()
## [1] ".GlobalEnv" "package:stats" "package:graphics"
## [4] "package:grDevices" "package:utils" "package:datasets"
## [7] "toolbox:default" "package:methods" "Autoloads"
## [10] "package:base"
If the object search for is not found in one of these environments, then it ends up at the very top parent environment, which is always the empty environment, and stops there (with an error).
To see all the environment searched, we can use:
<- getNamespace("teeny")
pkg_ns <- environments::parent_envs(pkg_ns, until = emptyenv())
nss str(nss)
## List of 13
## $ teeny :<environment: namespace:teeny>
## $ imports:teeny :<environment: 0x555840d33fc8>
## ..- attr(*, "name")= chr "imports:teeny"
## $ base :<environment: namespace:base>
## $ R_GlobalEnv :<environment: R_GlobalEnv>
## $ package:stats :<environment: package:stats>
## ..- attr(*, "name")= chr "package:stats"
## ..- attr(*, "path")= chr "/home/hb/shared/software/CBI/R-4.2.1-gcc9/lib/R/library/stats"
## $ package:graphics :<environment: package:graphics>
## ..- attr(*, "name")= chr "package:graphics"
## ..- attr(*, "path")= chr "/home/hb/shared/software/CBI/R-4.2.1-gcc9/lib/R/library/graphics"
## $ package:grDevices:<environment: package:grDevices>
## ..- attr(*, "name")= chr "package:grDevices"
## ..- attr(*, "path")= chr "/home/hb/shared/software/CBI/R-4.2.1-gcc9/lib/R/library/grDevices"
## $ package:utils :<environment: package:utils>
## ..- attr(*, "name")= chr "package:utils"
## ..- attr(*, "path")= chr "/home/hb/shared/software/CBI/R-4.2.1-gcc9/lib/R/library/utils"
## $ package:datasets :<environment: package:datasets>
## ..- attr(*, "name")= chr "package:datasets"
## ..- attr(*, "path")= chr "/home/hb/shared/software/CBI/R-4.2.1-gcc9/lib/R/library/datasets"
## $ package:methods :<environment: package:methods>
## ..- attr(*, "name")= chr "package:methods"
## ..- attr(*, "path")= chr "/home/hb/shared/software/CBI/R-4.2.1-gcc9/lib/R/library/methods"
## $ Autoloads :<environment: 0x55583eee63c0>
## ..- attr(*, "name")= chr "Autoloads"
## $ package:base :<environment: base>
## $ R_EmptyEnv :<environment: R_EmptyEnv>
Thus, the reason why teeny::mean_and_variance(1:100)
works, is that R eventually locates var()
in the
stats package, i.e. in the package:stats
environment. Using the environments package, we can see
where it finds var()
by calling:
::find_object(name = "var", from = teeny::mean_and_variance) environments
## $name
## [1] "var"
##
## $envir
## <environment: package:stats>
## attr(,"name")
## [1] "package:stats"
## attr(,"path")
## [1] "/home/hb/shared/software/CBI/R-4.2.1-gcc9/lib/R/library/stats"
So, why do we even bother to import var()
from the
stats package then? Recall how also the global
environment is part of the search path. This means that we can inject
another, possible false, implementation of var()
by adding
it to the global environment, e.g.
<- function(x) pi
var ::mean_and_variance(1:100) teeny
## mean variance
## 50.500000 3.141593
Whoops! That’s really worrying; the result of the function depends on what happens to be in the global environment of the current R session. Furthermore, the stats package may not even be attached, so we could end up with an error also a fresh R session, e.g.
$ R_DEFAULT_PACKAGES=base R --quiet --vanilla
> search()
1] ".GlobalEnv" "Autoloads" "package:base"
[> teeny::mean_and_variance(1:100)
in var(x) : could not find function "var" Error
This illustrates why it is important to declare all imports,
even those from base-R packages. So, do not ignore those NOTEs
on “no visible global function definition for …” that
R CMD check
reports on.
The parent environment of the base namespace is the global environment;
<- getNamespace("base")
base_ns ::parent_envs(base_ns) environments
## $base
## <environment: namespace:base>
##
## $R_GlobalEnv
## <environment: R_GlobalEnv>
##
## $`package:stats`
## <environment: package:stats>
## attr(,"name")
## [1] "package:stats"
## attr(,"path")
## [1] "/home/hb/shared/software/CBI/R-4.2.1-gcc9/lib/R/library/stats"
##
## $`package:graphics`
## <environment: package:graphics>
## attr(,"name")
## [1] "package:graphics"
## attr(,"path")
## [1] "/home/hb/shared/software/CBI/R-4.2.1-gcc9/lib/R/library/graphics"
##
## $`package:grDevices`
## <environment: package:grDevices>
## attr(,"name")
## [1] "package:grDevices"
## attr(,"path")
## [1] "/home/hb/shared/software/CBI/R-4.2.1-gcc9/lib/R/library/grDevices"
##
## $`package:utils`
## <environment: package:utils>
## attr(,"name")
## [1] "package:utils"
## attr(,"path")
## [1] "/home/hb/shared/software/CBI/R-4.2.1-gcc9/lib/R/library/utils"
##
## $`package:datasets`
## <environment: package:datasets>
## attr(,"name")
## [1] "package:datasets"
## attr(,"path")
## [1] "/home/hb/shared/software/CBI/R-4.2.1-gcc9/lib/R/library/datasets"
##
## $`toolbox:default`
## <environment: 0x555bee0d3868>
## attr(,"name")
## [1] "toolbox:default"
##
## $`package:methods`
## <environment: package:methods>
## attr(,"name")
## [1] "package:methods"
## attr(,"path")
## [1] "/home/hb/shared/software/CBI/R-4.2.1-gcc9/lib/R/library/methods"
##
## $Autoloads
## <environment: 0x555beb56b770>
## attr(,"name")
## [1] "Autoloads"
##
## $`package:base`
## <environment: base>
##
## $R_EmptyEnv
## <environment: R_EmptyEnv>
So, why isn’t it just the empty environment? Then we would avoid the
problem of picking up unwanted objects in the global environment, and
avoid being dependent on what packages happens to be attached at the
time we call our package functions.
The answer has to do with historical reasons. The design of having the
base namespace being followed by the global environment
stems from the old days when R did not have namespaces. At that time,
the R engine searched all attached packages to identify any variable or
function used to evaluate a function. That design had of course similar
problems to the ones illustrated above, e.g. functions could be
overridden by assign new ones in the global environment, or by a
third-party package attached on the search()
path. This is
why the concept of namespaces was introduced in R; as a package
developer you had full control of exactly which implementation of a
function was called.
However, due to how S3 method dispatching worked, and how some
packages attaches packages conditionally of being available,
e.g. if require(otherpkg)) somefcn()
, it was necessary to
keep searching also the global environment and attached packages as a
fallback. These are the main reasons why the base
namespace is still followed by the global environment. It is not
unreasonable to expect that this will change in some future R version.
For instance, in the R-devel thread ‘Time to drop globalenv() from
searches in package code?’ (2022-09-15; https://stat.ethz.ch/pipermail/r-devel/2022-September/081980.html),
one of the R Core developers explains the history and proposes a way
forward to make the empty environment the parent of the
base namespace.
If we look at the output from, for instance,
parent_envs(stats::median)
, we see that there are two
“base” environments. One early on following the “imports” environment
and one at the very end just before the empty environment. These two
environments are:
getNamespace("base")
## <environment: namespace:base>
and
baseenv()
## <environment: base>
respectively. They are very similar, except that their parent
environments differ; getNamespace("base")
is followed by
the global environment, and baseenv()
is followed
by the empty environment;
::parent_env(getNamespace("base")) environments
## <environment: R_GlobalEnv>
::parent_env(baseenv()) environments
## <environment: R_EmptyEnv>
Everything else is the same, e.g.
all.equal(getNamespace("base"), baseenv())
## [1] TRUE
and
identical(getNamespace("base")[["mean"]], baseenv()[["mean"]])
## [1] TRUE
By the way, .BaseNamespaceEnv
is the same as
getNamespace("base")
;
identical(.BaseNamespaceEnv, getNamespace("base"))
## [1] TRUE
As far as I understand it, the reason why there are two “base”
environment is that getNamespace("base")
is used in R as a
workaround in order to be able search for objects as explained in
Appendix ‘Why R searches also the global environment and attached
packages’ above. The design is that a package namespace should have all
exported base objects on the search path
immediately following the package’s imported objects. However,
because of the historical reasons of having to search also the global
environment and the attached packages, we cannot use
baseenv()
for this purpose. If we did, then the search path
would end there, and like breaks a few things as it stands right now.
Because of this, getNamespace("base")
was introduced as an
copy of baseenv()
, but with the parent environment being
the global environment in order to not break the full search
path. If, or rather when, R Core can sort out the last hurdles for
having the package search path end immediately after the “import” and
the base environment, then I imagine there is no longer
a need for the special getNamespace("base")
environment.
If we would add:
importFrom(base, mean)
to the NAMESPACE
file for teeny, the
package would build, but it would not install. We would get an
installation error:
$ R CMD INSTALL teeny_1.0.tar.gz
* installing to library ‘/path/to/R/x86_64-pc-linux-gnu-library/4.2’
* installing *source* package ‘teeny’ ...
** using staged installation
** R
** byte-compile and prepare package for lazy loading
Error in asNamespace(ns, base.OK = FALSE) :
operation not allowed on base namespace
Calls: <Anonymous> ... namespaceImportFrom -> importIntoEnv -> getNamespaceInfo -> asNamespace
ERROR: lazy loading failed for package ‘teeny’
* removing ‘/home/hb/R/x86_64-pc-linux-gnu-library/4.2-CBI-gcc9/teeny’
* restoring previous ‘/home/hb/R/x86_64-pc-linux-gnu-library/4.2-CBI-gcc9/teeny’