customizable cache (closes #2176) by atusy · Pull Request #2340 · yihui/knitr

atusy · 2024-04-26T15:19:43Z

This PR allows implementing knit_cache_hook methods which may preprocess objects (e.g., save to an external file) and define custom loaders.

I will add a NEWS item after we agree on the design.

refactor(cache): use saveRDS/readRDS instead of makeLazyLoadDB/lazyload
- For migration, cache_save replaces rdb/rdx files with rds file
- For backward compatibility, cache_load() attempts lazyload() if rdb/rdx files are available
~~feat(cache): allow pre/postprocessing cache objects~~
- knit_cache_preprocess preprocesses objects being saved
- knit_cache_postprocess postprocesses objects being loaded
feat!(cache): implement knit_cache_hook instead of pre/post-processors
- Call knit_cache_hook methods on saving cache
  - Methods may save extra files under ${cache_path(h)}__extra directory
  - Methods may return custom loader functions, which are saved to ${cache_path(h)}.rds
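
The migration and fallback described in the first commit could look roughly like the sketch below. The function names `cache_save_sketch()` and `cache_load_sketch()` are hypothetical and this is not the code in the PR; it only assumes that a cache is a named list of objects stored at a path without extension.

```r
# Hypothetical helpers; not knitr's actual cache_save()/cache_load().
cache_save_sketch = function(objs, path) {
  # objs: a named list of objects to cache; path: file path without extension
  saveRDS(objs, paste0(path, '.rds'))
  # migration: drop legacy lazy-load database files, if any
  unlink(paste0(path, c('.rdb', '.rdx')))
  invisible(path)
}

cache_load_sketch = function(path, envir = new.env()) {
  legacy = paste0(path, c('.rdb', '.rdx'))
  rds = paste0(path, '.rds')
  if (all(file.exists(legacy))) {
    # backward compatibility: an old cache created as a lazy-load database
    lazyLoad(path, envir = envir)
  } else if (file.exists(rds)) {
    list2env(readRDS(rds), envir = envir)
  }
  invisible(envir)
}
```

Because saving removes the rdb/rdx pair, a cache that has been rewritten once is always read through the rds branch afterwards.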

With this PR, we can add some hooks on objects to be cached.
For example, we can use writeLines to save character objects.

```{r}
library(knitr)
registerS3method(
  "knit_cache_hook",
  "character",
  function(x, nm, path) {
    # Cache x as is unless its class is exactly "character" (e.g., a subclass)
    if (!identical(class(x), "character")) {
      return(x)
    }

    # Preprocess data (e.g., save data to an external file)
    # Create external files under the directory of `paste0(path, "__extra")`
    # if knitr should cleanup them on refreshing/cleaning cache
    d <- paste0(path, "__extra")
    dir.create(d, showWarnings = FALSE, recursive = TRUE)
    f <- file.path(d, paste0(nm, '.txt'))
    writeLines(x, f)

    # Return loader function
    # that accepts `...` for future extensions and has the knit_cache_loader class
    structure(function(...) readLines(f), class = 'knit_cache_loader')
  },
  envir = asNamespace("knitr")
)
```

```{r, cache=TRUE}
x <- 'foo bar'
print(x)
```

```{r}
print(x)
```
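
To make the round trip concrete, here is a minimal, self-contained sketch of how a hook and its loader might interact on save and load. All names (`hook_sketch`, `make_loader`, `save_sketch`, `load_sketch`) are hypothetical, and the code is an illustration of the idea rather than the implementation in this PR.

```r
# Hypothetical names throughout; a sketch, not knitr's implementation.
hook_sketch = function(x, nm, path) UseMethod('hook_sketch')
hook_sketch.default = function(x, nm, path) x  # cache the object as is

# Build the loader in its own function so that the closure captures only
# the file path, not the object being cached.
make_loader = function(f) {
  force(f)
  structure(function(...) readLines(f), class = 'knit_cache_loader')
}

hook_sketch.character = function(x, nm, path) {
  d = paste0(path, '__extra')
  dir.create(d, showWarnings = FALSE, recursive = TRUE)
  f = file.path(d, paste0(nm, '.txt'))
  writeLines(x, f)
  make_loader(f)
}

# On save: run the hook on every object, then store the results in one rds file
save_sketch = function(env, path) {
  nms = ls(env, all.names = TRUE)
  objs = lapply(setNames(nms, nms), function(nm) {
    hook_sketch(get(nm, envir = env), nm, path)
  })
  saveRDS(objs, paste0(path, '.rds'))
}

# On load: call loaders, and assign everything else unchanged
load_sketch = function(path, envir = new.env()) {
  objs = readRDS(paste0(path, '.rds'))
  for (nm in names(objs)) {
    x = objs[[nm]]
    if (inherits(x, 'knit_cache_loader')) x = x()
    assign(nm, x, envir = envir)
  }
  invisible(envir)
}
```

A closure is serialized together with its enclosing environment, which is why the sketch creates the loader in `make_loader()` instead of inside the method body where `x` would also be captured.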

atusy · 2024-04-26T15:25:00Z

maybe preprocess and postprocess are not good names... 🤔

atusy · 2024-04-30T01:16:12Z

R/cache.R

  cache_purge = function(hash) {
-    for (h in hash) unlink(paste(cache_path(h), c('rdb', 'rdx', 'RData'), sep = '.'))
+    for (h in hash) unlink(paste(cache_path(h), c('rds', 'rdb', 'rdx', 'RData'), sep = '.'))
  }


cache_purge() and clean_cache() only take a limited set of file types/names into account.
There may be cases where knitr should remove more files.

The example in the description saved an extra file as a part of cache, which will not be removed by knitr.

    registerS3method("knit_cache_preprocess", "data.frame", function(x) {
      write.csv(x, "cache.csv")
      # NOTE: this file is not removed by `cache$purge()` or `clean_cache()`
      structure("cache.csv", class = "knit_cache_csv")
    }, envir = asNamespace("knitr"))
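
One possible way to cover such files, sketched under the assumption that extra files live in a sibling directory named `<cache path>__extra` (the convention used in the PR description), is to remove that directory together with the cache files. `purge_sketch()` is a hypothetical stand-in, not knitr's `cache_purge()`.

```r
# Hypothetical purge that also removes the `__extra` directory of each cache
purge_sketch = function(paths) {
  for (p in paths) {
    # the cache files themselves (new and legacy formats)
    unlink(paste(p, c('rds', 'rdb', 'rdx', 'RData'), sep = '.'))
    # any extra files written by hooks for this cache
    unlink(paste0(p, '__extra'), recursive = TRUE)
  }
}
```

A file written to an arbitrary location such as `cache.csv` in the working directory would still be out of reach, which is the reason for a fixed directory convention.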

atusy · 2024-04-30T01:49:02Z

R/cache.R

+knit_cache_postprocess = function(x, ...) UseMethod('knit_cache_postprocess')
+knit_cache_postprocess.default = function(x, ...) x


Postprocessing is skipped if the package that provides the method is not loaded.
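
A small illustration of why this happens, assuming the postprocessor is dispatched as an ordinary S3 generic: when no method for the cached object's class is visible, the default method runs and returns the object untouched. `postprocess_sketch` is a hypothetical stand-in for the generic, and defining the method at top level only simulates loading the package that would provide it.

```r
postprocess_sketch = function(x, ...) UseMethod('postprocess_sketch')
postprocess_sketch.default = function(x, ...) x

cached = structure('cache.csv', class = 'knit_cache_csv')

# No method for 'knit_cache_csv' is visible yet, so the default is used
# and the placeholder object comes back unchanged
before = postprocess_sketch(cached)

# Simulate loading the package that provides the method
postprocess_sketch.knit_cache_csv = function(x, ...) {
  paste('would read', unclass(x))
}
after = postprocess_sketch(cached)
```

In the first call the user silently gets the placeholder instead of the data, with no error to signal that a package is missing.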

atusy · 2024-04-30T03:21:40Z

To solve the above problems, I implemented the knit_cache_hook generic function in place of knit_cache_preprocess and knit_cache_postprocess. See the updated description for details.

yihui

I feel this implementation is too complicated, and I'd like to propose a different way: in the latest version of xfun, I added two functions lazy_save() and lazy_load() as a different and simple implementation of base R's lazyLoad() and tools:::makeLazyLoadDB(). By default, xfun::lazy_save()/lazy_load() use the rds format, ~~which should solve the problems #2176 and #2339 (I've only briefly tested #2176).~~

For backward compatibility, we can first test if *.rdb/*.rdx exist. If they do, we use the old approach (base R), otherwise we switch to xfun's lazy loading.

What do you think?

Correction: I don't remember how I tested #2176 now. I was expecting that this would just work:

```{r}
library(terra)
```

```{r, cache=TRUE}
r = rast(matrix(1:12,3,4))
```

```{r}
r
```

but it doesn't (even if we save r to *.rds). The object r will fail to load in a new R session, and we still have to do wrap()/unwrap():

```{r}
library(terra)
```

```{r, cache=TRUE}
r = rast(matrix(1:12,3,4))
p = wrap(r)
```

```{r}
unwrap(p)
```

That said, I had this issue in mind when designing the new cache system for litedown. With litedown, it's possible to customize the read/write methods for cache via the chunk option cache.rw, e.g.,

---
title: "Caching terra objects with litedown"
knit: litedown:::knit
---

```{r}
library(terra)
rw_terra = list(
  name = 'terra',
  save = function(x, file) {
    if (inherits(x, 'SpatRaster')) x = wrap(x)
    saveRDS(x, file)
  },
  load = function(...) {
    x = readRDS(...)
    if (inherits(x, 'PackedSpatRaster')) x = unwrap(x)
    x
  }
)
```

```{r, cache=TRUE, cache.rw=rw_terra}
r = rast(matrix(1:12,3,4))
```

```{r}
r
```
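
The contract of such a read/write pair can be tried without terra. The sketch below uses a toy `handle` class with `pack()`/`unpack()` as stand-ins for `wrap()`/`unwrap()`; these names, and the `write_cache()`/`read_cache()` helpers, are hypothetical and do not reflect how litedown calls `cache.rw` internally.

```r
# Toy stand-ins for terra::wrap()/unwrap(); purely illustrative
pack = function(x) structure(list(data = x$data), class = 'packed_handle')
unpack = function(p) structure(list(data = p$data), class = 'handle')

rw_handle = list(
  name = 'handle',
  save = function(x, file) {
    # convert to a serializable form before writing
    if (inherits(x, 'handle')) x = pack(x)
    saveRDS(x, file)
  },
  load = function(...) {
    x = readRDS(...)
    # restore the original form after reading
    if (inherits(x, 'packed_handle')) x = unpack(x)
    x
  }
)

# Hypothetical callers showing how a cache engine might use the pair
write_cache = function(x, file, rw) rw$save(x, file)
read_cache = function(file, rw) rw$load(file)
```

The file on disk holds the packed form, while the caller only ever sees the original class.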

atusy · 2024-08-27T01:59:41Z

Thanks for the comment.
I do not have a strong opinion, but let me leave some comments below.

I accepted the complexity for the following reasons:

- the feature is mainly for package developers and not for end users
- usage is limited (I guess)

With my implementation, users do not have to care about what happens when caches are saved or loaded.

For developers, I agree that a chunk option is a good idea.
The implementation becomes simple.
However, this requires end users to understand tricks for edge cases.
Can we expect end users to read the documentation carefully before they run into trouble with cache behavior?

yihui · 2024-08-27T02:05:59Z

Good points, and I agree. Let me think more about it. Thanks!

atusy linked an issue Apr 26, 2024 that may be closed by this pull request

Caching reference objects #2176

Open

3 tasks

atusy mentioned this pull request Apr 26, 2024

Caching reference objects #2176

Open

3 tasks

atusy marked this pull request as draft April 26, 2024 15:28

atusy force-pushed the cache-hook branch 3 times, most recently from 259fdd5 to ccc14d7 on April 27, 2024 15:20

cderv assigned yihui Apr 29, 2024

atusy added 2 commits April 29, 2024 22:34

refactor(cache): use saveRDS/readRDS instead of makeLazyLoadDB/lazyload

f2d52f7

feat(cache): allow pre/postprocessing cache objects

94cca33

atusy force-pushed the cache-hook branch from ccc14d7 to 94cca33 on April 29, 2024 13:34

atusy changed the title ~~customizible cache (closes #2176)~~ customizable cache (closes #2176) Apr 30, 2024

atusy commented Apr 30, 2024

atusy marked this pull request as ready for review April 30, 2024 03:22

atusy force-pushed the cache-hook branch from da48583 to 229b70b on April 30, 2024 04:26

atusy added 2 commits April 30, 2024 13:36

feat!(cache): implement knit_cache_hook instead of pre/post-processors

4dddaba

chore(docs): build docs

78435c3

atusy force-pushed the cache-hook branch from 229b70b to 78435c3 on April 30, 2024 04:36

cderv requested a review from yihui August 26, 2024 10:10

cderv mentioned this pull request Aug 26, 2024

Problem caching instances of torch modules and datasets #2339

Open

yihui reviewed Aug 26, 2024

gavril0 mentioned this pull request Oct 17, 2024

Errors when caching torch objects in rmarkdown mlverse/torch#1199

Open

atusy wants to merge 4 commits into yihui:master from atusy:cache-hook

2 participants
