Title: | Simply Caching R Objects |
---|---|
Description: | Provides intuitive functions for caching R objects, encouraging reproducible, restartable, and distributed R analysis. The user selects a location to store caches, and then provides nothing more than a cache name and instructions (R code) for how to produce the R object. Also provides some advanced options like environment assignments, recreating or reloading caches, and cluster compute bindings (using the 'batchtools' package) making it flexible enough for use in large-scale data analysis projects. |
Authors: | VP Nagraj [aut], Nathan Sheffield [aut, cre] |
Maintainer: | Nathan Sheffield <[email protected]> |
License: | BSD_2_clause + file LICENSE |
Version: | 0.4.2 |
Built: | 2024-10-13 06:35:33 UTC |
Source: | https://github.com/databio/simplecache |
Nathan Sheffield
Useful links: https://github.com/databio/simpleCache
.tooOld accepts a maximum cache age and, if no such argument is passed, checks for an option with that setting under MAX.CACHE.AGE. If the indicated file exists and is older than the threshold passed or set as an option, the file is deemed "stale." If an age threshold is provided, no check for an option is performed. If the file does not exist, or no age threshold is passed directly or set as an option, the result is FALSE.
.tooOld(pathCacheFile, lifespan = NULL)
pathCacheFile | Path to file to ask about staleness. |
lifespan | Maximum file age before it's "stale." |
TRUE if the file exists and its age exceeds lifespan (if given) or getOption("MAX.CACHE.AGE") (if no age threshold is passed and that option exists); FALSE otherwise.
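The staleness rule described above can be sketched in base R. This is only an illustration, not the package's implementation: `isStale` is a hypothetical helper, and measuring the age in days is an assumption borrowed from the `lifespan` argument of simpleCache.

```r
# Hypothetical helper (not part of simpleCache): a file is stale when it
# exists and is older than the threshold. Assumes the age is measured in days.
isStale <- function(path, lifespan = NULL) {
  # Consult the option only when no threshold is passed directly
  if (is.null(lifespan)) {
    lifespan <- getOption("MAX.CACHE.AGE")
  }
  # No threshold from either source, or no file: never stale
  if (is.null(lifespan) || !file.exists(path)) {
    return(FALSE)
  }
  ageInDays <- as.numeric(difftime(Sys.time(), file.mtime(path), units = "days"))
  ageInDays > lifespan
}

cacheFile <- tempfile(fileext = ".rds")
saveRDS(1:3, cacheFile)

# Pretend the file was written two days ago
Sys.setFileTime(cacheFile, Sys.time() - 2 * 24 * 60 * 60)

staleAfterOneDay <- isStale(cacheFile, lifespan = 1)    # age exceeds threshold
staleAfterFiveDays <- isStale(cacheFile, lifespan = 5)  # age is within threshold
missingFile <- isStale(tempfile(), lifespan = 1)        # file does not exist
```

Passing `lifespan` directly bypasses the option lookup entirely, which mirrors the precedence described in the text.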
Appends a new environment name (a character string) to a global option that holds a vector of such names. simpleCache searches all of these environments to check whether a cache has already been loaded before reloading it.
addCacheSearchEnvironment(addEnv)
addEnv | Environment to append to the shared cache search list |
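As a rough sketch of the idea, the following base R code keeps a vector of environment names in a global option and searches those environments for an already loaded object. The option name `DEMO.CACHE.ENV` and the helpers `addSearchEnv` and `findLoaded` are made up for this illustration; the package's own option name and lookup logic may differ.

```r
# "DEMO.CACHE.ENV" is a made-up option name used only for this illustration
addSearchEnv <- function(envName) {
  # Append the new name to whatever is already stored in the option
  options(DEMO.CACHE.ENV = c(getOption("DEMO.CACHE.ENV"), envName))
  invisible(getOption("DEMO.CACHE.ENV"))
}

# Return the name of the first registered environment that holds objName,
# or NULL if none of them does.
findLoaded <- function(objName, lookupFrom = parent.frame()) {
  for (envName in getOption("DEMO.CACHE.ENV")) {
    env <- get(envName, envir = lookupFrom)
    if (exists(objName, envir = env, inherits = FALSE)) {
      return(envName)
    }
  }
  NULL
}

options(DEMO.CACHE.ENV = NULL)  # start from a clean state
projectEnv <- new.env()
assign("normSample", rnorm(5), envir = projectEnv)

addSearchEnv("projectEnv")
foundIn <- findLoaded("normSample")     # name of the environment holding it
notFound <- findLoaded("missingObject") # NULL: not loaded anywhere
```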
Given one or more cache names, this function attempts to delete the caches of those names from disk.
deleteCaches(cacheNames, cacheDir = getCacheDir(), force = FALSE)
cacheNames | Name(s) of the cache to delete |
cacheDir | Directory where caches are kept |
force | Force deletion without user prompt |
# choose location to store caches
cacheDir = tempdir()
cacheDir
setCacheDir(cacheDir)
# build some caches
simpleCache("normSample", { rnorm(5e3, 0,1) }, recreate=TRUE, timer=TRUE)
simpleCache("normSample", { rnorm(5e3, 0,1) })
simpleCache("normSample", { rnorm(5e3, 0,1) }, reload=TRUE)
# storing a cache after-the-fact
normSample2 = rnorm(10, 0, 1)
storeCache("normSample2")
# what's available?
listCaches()
# load a cache
simpleCache("normSample")
# load multiple caches
loadCaches(c("normSample", "normSample2"), reload=TRUE)
getCacheDir retrieves the value of the option that stores the currently set cache directory path.
getCacheDir()
If the option is set, the path to the currently set cache directory; otherwise, NULL.
Lists any cache files in the cache directory.
listCaches(cacheSubDir = "")
cacheSubDir | Optional parameter to specify a subdirectory of the cache folder. |
A character vector in which each element is the path to a file that represents an available cache (within getOption("RCACHE.DIR")).
# choose location to store caches
cacheDir = tempdir()
cacheDir
setCacheDir(cacheDir)
# build some caches
simpleCache("normSample", { rnorm(5e3, 0,1) }, recreate=TRUE, timer=TRUE)
simpleCache("normSample", { rnorm(5e3, 0,1) })
simpleCache("normSample", { rnorm(5e3, 0,1) }, reload=TRUE)
# storing a cache after-the-fact
normSample2 = rnorm(10, 0, 1)
storeCache("normSample2")
# what's available?
listCaches()
# load a cache
simpleCache("normSample")
# load multiple caches
loadCaches(c("normSample", "normSample2"), reload=TRUE)
This function takes a vector of cache names and loads them. It is designed for objects you have already cached, so it will not build any caches.
loadCaches(cacheNames, loadEnvir = NULL, ...)
cacheNames | Vector of caches to load. |
loadEnvir | Environment into which to load each cache. |
... | Additional parameters passed to simpleCache. |
# choose location to store caches
cacheDir = tempdir()
cacheDir
setCacheDir(cacheDir)
# build some caches
simpleCache("normSample", { rnorm(5e3, 0,1) }, recreate=TRUE, timer=TRUE)
simpleCache("normSample", { rnorm(5e3, 0,1) })
simpleCache("normSample", { rnorm(5e3, 0,1) }, reload=TRUE)
# storing a cache after-the-fact
normSample2 = rnorm(10, 0, 1)
storeCache("normSample2")
# what's available?
listCaches()
# load a cache
simpleCache("normSample")
# load multiple caches
loadCaches(c("normSample", "normSample2"), reload=TRUE)
Sets the global option of cache search environments to NULL.
resetCacheSearchEnvironment()
This function takes a time in seconds and converts it to a more human-readable format, showing hours, minutes, or seconds, depending on how long the time is. Used by my implementation of tic()/toc().
secToTime(timeInSec)
timeInSec | Numeric value of time measured in seconds. |
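A conversion of this kind could look roughly like the sketch below. `formatDuration` is a hypothetical helper, and the output format shown here is an assumption; the strings that secToTime actually produces may differ.

```r
# Hypothetical helper; the exact output format of secToTime may differ
formatDuration <- function(timeInSec) {
  hours <- timeInSec %/% 3600
  minutes <- (timeInSec %% 3600) %/% 60
  seconds <- timeInSec %% 60
  # Show only the units that are needed for the given duration
  if (hours > 0) {
    sprintf("%dh %dm %.1fs", as.integer(hours), as.integer(minutes), seconds)
  } else if (minutes > 0) {
    sprintf("%dm %.1fs", as.integer(minutes), seconds)
  } else {
    sprintf("%.1fs", seconds)
  }
}

formatDuration(42)      # "42.0s"
formatDuration(125)     # "2m 5.0s"
formatDuration(3725.5)  # "1h 2m 5.5s"
```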
Sets local cache build directory with scripts for building files.
setCacheBuildDir(cacheBuildDir = NULL)
cacheBuildDir | Directory where build scripts are stored. |
Sets a global variable specifying the default cache directory for simpleCache calls.
setCacheDir(cacheDir = NULL)
cacheDir | Directory where caches should be stored |
# choose location to store caches
cacheDir = tempdir()
cacheDir
setCacheDir(cacheDir)
# build some caches
simpleCache("normSample", { rnorm(5e3, 0,1) }, recreate=TRUE, timer=TRUE)
simpleCache("normSample", { rnorm(5e3, 0,1) })
simpleCache("normSample", { rnorm(5e3, 0,1) }, reload=TRUE)
# storing a cache after-the-fact
normSample2 = rnorm(10, 0, 1)
storeCache("normSample2")
# what's available?
listCaches()
# load a cache
simpleCache("normSample")
# load multiple caches
loadCaches(c("normSample", "normSample2"), reload=TRUE)
Given a unique name for an R object and instructions for how to make that object, use the simpleCache function to create and cache the object, or to load it if it is already cached. This should be used for computations that take a long time and generate a table or something else used repeatedly (in other scripts, for example). Because the cache is tied to the object name, there is some danger of causing trouble if you misuse the caching system. The object should be considered static.
simpleCache(
  cacheName,
  instruction = NULL,
  buildEnvir = NULL,
  reload = FALSE,
  recreate = FALSE,
  noload = FALSE,
  cacheDir = getCacheDir(),
  cacheSubDir = NULL,
  timer = FALSE,
  buildDir = getOption("RBUILD.DIR"),
  assignToVariable = NULL,
  loadEnvir = parent.frame(),
  searchEnvir = getOption("SIMPLECACHE.ENV"),
  nofail = FALSE,
  batchRegistry = NULL,
  batchResources = NULL,
  pepSettings = NULL,
  ignoreLock = FALSE,
  lifespan = NULL
)
cacheName | A character vector for a unique name for the cache. Be careful. |
instruction | R expression (in braces) to be evaluated. The returned value of this code is what will be cached under the cacheName. |
buildEnvir | An environment (or list) providing additional variables necessary for evaluating the code in instruction. |
reload | Logical indicating whether to force re-loading the cache, even if it exists in the env. |
recreate | Logical indicating whether to force reconstruction of the cache. |
noload | Logical indicating whether to create but not load the cache. noload is useful when you want to create the caches but not load them (as in a cache creation loop). |
cacheDir | Character vector specifying the directory where caches are saved (and loaded from). Defaults to the variable set by |
cacheSubDir | Character vector specifying a subdirectory within the |
timer | Logical indicating whether to report how long it took to create the cache. |
buildDir | Location of build files (files with instructions for use if the instruction argument is not provided). Defaults to the RBUILD.DIR global option. |
assignToVariable | Character vector for a variable name to load the cache into. By default, |
loadEnvir | An environment. Into which environment would you like to load the variable? Defaults to |
searchEnvir | A vector of environments to search for the already loaded cache. |
nofail | By default, simpleCache throws an error if the instructions fail. Use this option to convert this error into a warning. No cache will be created, but simpleCache will not then hard-stop your processing. This is useful, for example, if you are creating a bunch of caches (for example using |
batchRegistry | A |
batchResources | A list of variables to provide to batchtools for cluster resource managers. Used as the |
pepSettings | Experimental untested feature. |
ignoreLock | Internal parameter used for batch job submission; don't touch. |
lifespan | Numeric specifying the maximum age of cache, in days, to allow before automatically triggering |
You should pass an R code snippet in braces, like { rnorm(500) }, as the instruction, and simpleCache will create the object. Alternatively, if the code to create the cache is large, you can put an R script called object.R in the RBUILD.DIR (the name of the file *must* match the name of the object it creates *exactly*). If you don't provide an instruction, the function sources RBUILD.DIR/object.R and caches the result as the object. This source file *must* create an object with the same name as the object. If you already have an object with the name of the object to load in your current environment, this function will not try to reload the object; instead, it returns the local object. In essence, it assumes that this is a static object, which you will not change. You can force it to load the cached version instead with "reload".
Because R uses lexical scope and not dynamic scope, you may need to pass in some of the variables from your environment that your instruction code uses. You can do this with the buildEnvir parameter (just provide a list of named variables).
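The load-or-build behavior and the idea of supplying variables as a named list can be illustrated with a small base R sketch. `cacheOrBuild` and its `buildVars` argument are hypothetical names, and the use of `saveRDS()` is an assumption chosen for brevity; none of this is claimed to match how simpleCache stores or evaluates caches internally.

```r
# Hypothetical helper illustrating load-or-build caching; it uses saveRDS()
# for brevity, which is not necessarily the format simpleCache uses.
cacheOrBuild <- function(cacheName, instruction, cacheDir,
                         buildVars = list(), recreate = FALSE) {
  cacheFile <- file.path(cacheDir, paste0(cacheName, ".rds"))
  # Reuse the stored object unless a rebuild is requested
  if (file.exists(cacheFile) && !recreate) {
    return(readRDS(cacheFile))
  }
  # Capture the unevaluated code, then evaluate it with the supplied variables
  # visible first and the caller's environment as the fallback.
  code <- substitute(instruction)
  result <- eval(code, envir = buildVars, enclos = parent.frame())
  saveRDS(result, cacheFile)
  result
}

demoDir <- file.path(tempdir(), "cacheOrBuildDemo")
dir.create(demoDir, showWarnings = FALSE)

# First call builds the object using variables supplied as a named list
first <- cacheOrBuild("scaled", { rnorm(n) * scale }, demoDir,
                      buildVars = list(n = 5, scale = 10), recreate = TRUE)

# Second call finds the file on disk, so the instruction is never evaluated
second <- cacheOrBuild("scaled", { stop("never evaluated") }, demoDir)
sameObject <- identical(first, second)
```

Because the instruction is only evaluated when no cache file exists, the second call returns the stored object even though its instruction would fail if run.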
# choose location to store caches
cacheDir = tempdir()
cacheDir
setCacheDir(cacheDir)
# build some caches
simpleCache("normSample", { rnorm(5e3, 0,1) }, recreate=TRUE, timer=TRUE)
simpleCache("normSample", { rnorm(5e3, 0,1) })
simpleCache("normSample", { rnorm(5e3, 0,1) }, reload=TRUE)
# storing a cache after-the-fact
normSample2 = rnorm(10, 0, 1)
storeCache("normSample2")
# what's available?
listCaches()
# load a cache
simpleCache("normSample")
# load multiple caches
loadCaches(c("normSample", "normSample2"), reload=TRUE)
Helper alias for loading caches into the global environment. simpleCache normally loads variables into the calling environment; this ensures that the variables are loaded in the global environment.
simpleCacheGlobal(...)
... | Parameters passed to |
Views simpleCache global variables
simpleCacheOptions()
Sometimes you use significant computational power to create an object, but you didn't cache it with simpleCache. Oops, maybe you wish you had, after the fact. This function lets you store an object in the environment so it could be loaded by future calls to simpleCache.
storeCache(
  cacheName,
  cacheDir = getCacheDir(),
  cacheSubDir = NULL,
  recreate = FALSE
)
cacheName | Unique name for the cache (and R object to be cached). |
cacheDir | The directory where caches are saved (and loaded from). Defaults to the global |
cacheSubDir | You can specify a subdirectory within the cacheDir variable. Defaults to |
recreate | Forces reconstruction of the cache |
This can be used in interactive sessions, but could also be used for another use case: you have a complicated set of instructions (too much to pass as the instruction argument to simpleCache), so you could just stick a call to storeCache at the end.
# choose location to store caches
cacheDir = tempdir()
cacheDir
setCacheDir(cacheDir)
# build some caches
simpleCache("normSample", { rnorm(5e3, 0,1) }, recreate=TRUE, timer=TRUE)
simpleCache("normSample", { rnorm(5e3, 0,1) })
simpleCache("normSample", { rnorm(5e3, 0,1) }, reload=TRUE)
# storing a cache after-the-fact
normSample2 = rnorm(10, 0, 1)
storeCache("normSample2")
# what's available?
listCaches()
# load a cache
simpleCache("normSample")
# load multiple caches
loadCaches(c("normSample", "normSample2"), reload=TRUE)
Start a timer
tic(gcFirst = TRUE, type = c("elapsed", "user.self", "sys.self"))
gcFirst | Garbage collect before starting the timer? |
type | Type of time to return; can be 'elapsed', 'user.self', or 'sys.self' |
Check the time since the current timer was started with tic()
toc()
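A start/read timer pair of this kind can be sketched with base R's `proc.time()`. The names `startTimer`, `readTimer`, and `timerState` are hypothetical, and the package's own tic() and toc() may be implemented differently.

```r
# Hypothetical names; simpleCache's own tic()/toc() may work differently
timerState <- new.env()

startTimer <- function(gcFirst = TRUE,
                       type = c("elapsed", "user.self", "sys.self")) {
  type <- match.arg(type)
  # Optionally collect garbage first so it does not distort the measurement
  if (gcFirst) gc(verbose = FALSE)
  timerState$type <- type
  timerState$start <- proc.time()[[type]]
  invisible(timerState$start)
}

readTimer <- function() {
  if (is.null(timerState$start)) stop("call startTimer() first")
  # Seconds since the timer was started, for the chosen type of time
  proc.time()[[timerState$type]] - timerState$start
}

startTimer()
Sys.sleep(0.2)
elapsed <- readTimer()
```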