@pieh
Contributor

@pieh pieh commented Dec 13, 2020

Description

The approach in #28351 turned out to be unfeasible: we actually do need the entire page object to preserve backward compatibility for the DELETE_PAGE action.

To handle potential OOM issues when persisting state, I expanded state sharding during persistence from nodes only to both nodes and the newly persisted pages. Right now it's a copy-paste of the sharding done for nodes. I could have DRYed it (and I still can if reviewers request it :) ), but I didn't think DRYing it was worth the mental overhead of the added abstractions. Thoughts?
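For comparison, a DRYed version could route both nodes and pages through one generic helper. The TypeScript below is a hypothetical sketch, not this PR's code: `guessSafeChunkSize`, `shardEntries`, the 11-entry sample, and the ~1.5 GiB per-shard ceiling are all assumptions, and `sizeOf` stands in for a real measurement such as the byte length of Node's `v8.serialize(entry)`.

```typescript
// Hypothetical shared sharding helper for persisted state (nodes and pages alike).
type Entry<T> = [string, T]

// Assumed target ceiling for the serialized size of one shard (~1.5 GiB).
const DEFAULT_SHARD_CEILING_BYTES = 1.5 * 1024 * 1024 * 1024

// Estimate entries per shard by sampling a few entries and assuming every
// entry is as large as the largest sample (a worst-case guess).
function guessSafeChunkSize<T>(
  entries: Array<Entry<T>>,
  sizeOf: (entry: Entry<T>) => number,
  ceilingBytes: number = DEFAULT_SHARD_CEILING_BYTES,
  samples: number = 11
): number {
  const step = Math.max(1, Math.ceil(entries.length / samples))
  let maxSize = 0
  for (let i = 0; i < entries.length; i += step) {
    maxSize = Math.max(maxSize, sizeOf(entries[i]))
  }
  // Guard against division by zero and always allow at least one entry per shard.
  return Math.max(1, Math.floor(ceilingBytes / Math.max(1, maxSize)))
}

// Split entries into consecutive shards of at most `chunkSize` entries each.
function shardEntries<T>(entries: Array<Entry<T>>, chunkSize: number): Array<Array<Entry<T>>> {
  const shards: Array<Array<Entry<T>>> = []
  for (let i = 0; i < entries.length; i += chunkSize) {
    shards.push(entries.slice(i, i + chunkSize))
  }
  return shards
}

// The same two calls would serve both state slices, e.g.:
//   shardEntries(nodeEntries, guessSafeChunkSize(nodeEntries, measure))
//   shardEntries(pageEntries, guessSafeChunkSize(pageEntries, measure))
```

The abstraction cost is small here because the two call sites differ only in which slice of state they pass in; the trade-off is one more indirection when reading the persistence code.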

This PR builds on work already done by @redabacha (in #28316), borrowing the test change (unskip) already made in #28351.

TODO:

  • garbage collect stateful pages during bootstrap (see comment)

Related Issues

fixes #28281
fixes #26520
[ch19791]

@gatsbot gatsbot bot added the status: triage needed Issue or pull request that need to be triaged and assigned to a reviewer label Dec 13, 2020
@pieh pieh added topic: query invalidation* and removed status: triage needed Issue or pull request that need to be triaged and assigned to a reviewer labels Dec 13, 2020
@pieh pieh force-pushed the fix/no-rebuild-bug branch from f6c1a86 to aa421fe December 13, 2020 10:59
Comment on lines 186 to 233
const nodeShardsScenarios = [
  {
    numberOfNodes: 50,
    simulatedNodeObjectSize: 5 * 1024 * 1024,
    expectedNumberOfNodeShards: 1,
  },
  {
    numberOfNodes: 5,
    simulatedNodeObjectSize: 0.6 * 1024 * 1024 * 1024,
    expectedNumberOfNodeShards: 3,
  },
]
const pageShardsScenarios = [
  {
    numberOfPages: 50,
    simulatedPageObjectSize: 10 * 1024 * 1024,
    expectedNumberOfPageShards: 1,
  },
  {
    numberOfPages: 5,
    simulatedPageObjectSize: 0.9 * 1024 * 1024 * 1024,
    expectedNumberOfPageShards: 5,
  },
]
Contributor Author

This is to construct a scenario matrix, and we check each combination. I didn't go overboard: the primary purpose is to verify that sharding for pages and nodes is separate and that one doesn't affect the other.
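A minimal sketch of that matrix, assuming equally sized entries and a ~1.5 GiB per-shard ceiling (an assumption, though one consistent with the expected shard counts in the scenarios above). `shardCount` and `MatrixResult` are hypothetical names, and the closed-form count is only a stand-in for whatever the real test measures.

```typescript
// Hypothetical sketch: build every (node scenario, page scenario) pair and check
// that node and page shard counts are derived independently of each other.
interface NodeScenario {
  numberOfNodes: number
  simulatedNodeObjectSize: number
  expectedNumberOfNodeShards: number
}

interface PageScenario {
  numberOfPages: number
  simulatedPageObjectSize: number
  expectedNumberOfPageShards: number
}

const MiB = 1024 * 1024
const GiB = 1024 * MiB
// Assumed per-shard ceiling; consistent with the expected shard counts below.
const SHARD_CEILING_BYTES = 1.5 * GiB

const nodeShardsScenarios: NodeScenario[] = [
  { numberOfNodes: 50, simulatedNodeObjectSize: 5 * MiB, expectedNumberOfNodeShards: 1 },
  { numberOfNodes: 5, simulatedNodeObjectSize: 0.6 * GiB, expectedNumberOfNodeShards: 3 },
]

const pageShardsScenarios: PageScenario[] = [
  { numberOfPages: 50, simulatedPageObjectSize: 10 * MiB, expectedNumberOfPageShards: 1 },
  { numberOfPages: 5, simulatedPageObjectSize: 0.9 * GiB, expectedNumberOfPageShards: 5 },
]

// Closed form for equally sized entries: fit as many as possible under the ceiling.
function shardCount(count: number, entrySize: number): number {
  const perShard = Math.max(1, Math.floor(SHARD_CEILING_BYTES / entrySize))
  return Math.ceil(count / perShard)
}

interface MatrixResult {
  label: string
  nodeShards: number
  pageShards: number
  ok: boolean
}

const results: MatrixResult[] = []
for (const node of nodeShardsScenarios) {
  for (const page of pageShardsScenarios) {
    // Each side only sees its own inputs, so one cannot affect the other.
    const nodeShards = shardCount(node.numberOfNodes, node.simulatedNodeObjectSize)
    const pageShards = shardCount(page.numberOfPages, page.simulatedPageObjectSize)
    results.push({
      label: `${node.numberOfNodes} nodes x ${page.numberOfPages} pages`,
      nodeShards,
      pageShards,
      ok:
        nodeShards === node.expectedNumberOfNodeShards &&
        pageShards === page.expectedNumberOfPageShards,
    })
  }
}

for (const r of results) {
  console.log(`${r.label}: ${r.nodeShards} node shard(s), ${r.pageShards} page shard(s), ok=${r.ok}`)
}
```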

Something to think about is whether we want to add some edge cases here, like when the number of nodes or pages equals 0. For now, the implementation discards the persisted state when either is 0. That makes a lot of sense for nodes, as core creates some nodes of its own even if the user doesn't use any source plugins. It's a bit different for pages: Gatsby doesn't create pages itself (well, except for gatsby develop creating dev-404-page ...). I'm not sure it's worth handling this case more gracefully. Do we want to optimize for scenarios with 0 pages?


const pages: Array<[string, IGatsbyPage]> = [].concat(...pagesChunks)

if (!pagesChunks.length) {
Contributor Author

Potentially problematic: this will discard the state if there are 0 pages. Do we want to optimize for those cases? (TBH, I don't even know if Gatsby actually "works" with 0 pages.)

Contributor Author

One way to go about this is to add pagesCount to the redux.rest part and compare against it, I guess? The cache will get invalidated anyway after the upgrade because pagesChunk.length will be 0, so I think the cache shape "version" could be handled gracefully.
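The `pagesCount` idea might look roughly like this: write the page count into the non-sharded `rest` state, and on read accept the shards only when the flattened pages match that count, so a site with 0 pages is no longer indistinguishable from a missing cache. This is a hypothetical sketch; `PersistedRest`, `PersistedPage`, and `readPersistedPages` are made-up names, and returning `undefined` stands for "discard the persisted state".

```typescript
// Hypothetical minimal page shape; the real IGatsbyPage has many more fields.
interface PersistedPage {
  path: string
  component: string
}
type PageEntry = [string, PersistedPage]

// Hypothetical: pagesCount is written alongside the other non-sharded redux state.
interface PersistedRest {
  pagesCount?: number
}

// Returns the restored pages, or undefined if the persisted state should be discarded.
function readPersistedPages(pagesChunks: PageEntry[][], rest: PersistedRest): PageEntry[] | undefined {
  // Caches written before pagesCount existed are discarded, as happens after the upgrade anyway.
  if (typeof rest.pagesCount !== "number") return undefined
  // Flatten all shards into a single list of [path, page] entries.
  const pages = pagesChunks.reduce<PageEntry[]>((all, chunk) => all.concat(chunk), [])
  // A mismatch means shards are missing or stale; 0 pages with pagesCount 0 is a valid, empty cache.
  return pages.length === rest.pagesCount ? pages : undefined
}
```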

Contributor

If there were no pages, it's probably no big deal to bust the cache anyway.

Contributor

As per the off-thread discussion, I think you can merge this PR in a patch without this check, if you want to, and then add the check in a minor bump later.

@pieh pieh marked this pull request as ready for review December 14, 2020 09:04

@pieh
Contributor Author

pieh commented Dec 20, 2020

Additional note on this: it needs extra handling for createPagesStatefully. createPages does "garbage collection" and works as-is with this code, but createPagesStatefully doesn't have garbage collection, so we will never clean those pages up. This means that if a user runs gatsby build -> removes a page from src/pages/ -> runs gatsby build again, with this code as-is we will crash with (currently):

 ERROR #85913  GRAPHQL

There was a problem reading the file: /Users/misiek/dev/gatsby-starter-blog/src/pages/multiple-exports.js

File: src/pages/multiple-exports.js



  Error: ENOENT: no such file or directory, open '/Users/misiek/dev/gatsby-starter-blog/src/pages/multiple-exports.js'

failed extract queries from components - 0.324s

TODO item: garbage collect stateful pages in bootstrap
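One possible shape for that GC, sketched under assumptions: after page creation, sweep every page that was not (re)created in this run, but treat pages from createPagesStatefully as eligible only when the stateful step actually ran. `PageRecord`, `collectStalePages`, and the `touchedThisRun` list are hypothetical (only the `isCreatedByStatefulCreatePages` flag name comes from this thread), and a real implementation would delete pages through the DELETE_PAGE action rather than return paths.

```typescript
// Hypothetical sketch of a post-bootstrap GC sweep (not the code that landed).
interface PageRecord {
  path: string
  isCreatedByStatefulCreatePages: boolean
}

// Returns the paths a sweep would delete.
//   touchedThisRun:  paths (re)created by createPages/createPagesStatefully in this run
//   statefulStepRan: whether createPagesStatefully ran during this bootstrap
function collectStalePages(
  currentPages: PageRecord[],
  touchedThisRun: string[],
  statefulStepRan: boolean
): string[] {
  const stale: string[] = []
  for (const page of currentPages) {
    // A stateful page can only be judged stale if its creator had a chance to re-create it.
    const eligible = !page.isCreatedByStatefulCreatePages || statefulStepRan
    // Linear lookup keeps the sketch simple; a Set would scale better.
    const touched = touchedThisRun.indexOf(page.path) !== -1
    if (eligible && !touched) {
      stale.push(page.path)
    }
  }
  return stale
}
```

In the scenario above (build, delete src/pages/multiple-exports.js, build again), the stateful step runs but no longer re-creates that page, so its path would be returned for deletion instead of lingering and pointing at a missing component file.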

@pieh
Contributor Author

pieh commented Feb 10, 2021

If merged, we should revert #29431

@pieh pieh force-pushed the fix/no-rebuild-bug branch from eb6689b to 619041c April 13, 2021 09:32
@pieh pieh changed the title fix(gatsby): invalidate queries if page context changes between runs fix(gatsby): persist pages between runs Apr 13, 2021
@pieh pieh marked this pull request as ready for review April 14, 2021 11:33
LekoArts
LekoArts previously approved these changes Apr 21, 2021
Contributor

@LekoArts LekoArts left a comment

LGTM

if (maxSize > 500000) {
if (showMaxSizeWarning && maxSize > 500000) {
report.warn(
`The size of at least one page context chunk exceeded 500kb, which could lead to degraded performance. Consider putting less data in the page context.`
Contributor

We've talked about adding information about which page exceeds this limit, but that's for a follow-up (not in this PR).
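A possible shape for that follow-up: track which sampled entry was largest so the warning can name it. This is a hypothetical TypeScript sketch; `largestSample`, `pageContextWarning`, and the 11-entry sample are assumptions, while the 500kb threshold (as 500000 bytes) and the advice text come from the warning above.

```typescript
// Hypothetical sketch: remember which sampled entry was largest so the
// warning can name the offending page instead of saying "at least one".
interface SizeSample {
  key: string
  bytes: number
}

function largestSample<T>(
  entries: Array<[string, T]>,
  sizeOf: (value: T) => number,
  samples: number = 11
): SizeSample | undefined {
  const step = Math.max(1, Math.ceil(entries.length / samples))
  let largest: SizeSample | undefined = undefined
  for (let i = 0; i < entries.length; i += step) {
    const key = entries[i][0]
    const bytes = sizeOf(entries[i][1])
    if (!largest || bytes > largest.bytes) largest = { key, bytes }
  }
  return largest
}

// Only sampled entries are measured, so this names the largest *sampled* page.
function pageContextWarning(largest: SizeSample | undefined, limitBytes: number = 500000): string | undefined {
  if (!largest || largest.bytes <= limitBytes) return undefined
  return (
    `The page context for "${largest.key}" is about ${Math.round(largest.bytes / 1000)}kb, ` +
    `exceeding ${limitBytes / 1000}kb, which could lead to degraded performance. ` +
    `Consider putting less data in the page context.`
  )
}
```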

* add stateful page to artifacts tests

* fix(gatsby): garbage collect stateful pages

* Revert "fix: clear tracked queries when deleting stale page-data files (#29431)" (#30848)

This reverts commit 478cf68.
Contributor

@LekoArts LekoArts left a comment

🌮

@pieh pieh merged commit a0b31bc into master Apr 22, 2021
@pieh pieh deleted the fix/no-rebuild-bug branch April 22, 2021 08:48
currentPages.forEach(page => {
if (
!page.isCreatedByStatefulCreatePages &&
(shouldRunCreatePagesStatefully ||

@pierluigizagaria pierluigizagaria Apr 20, 2025

Sorry for commenting here, but I'm trying to understand how createPagesStatefully works, as no detailed docs are provided.
What I understood is that GC is always done on pages, so we need to call createPage in every incremental build.
As the name suggests, createPagesStatefully should prevent that: the "stateful" part means that you have to manage the state of the page yourself.

I've tried using this API in my project, but GC is always done. Pages built by calling createPage in createPagesStatefully get deleted in incremental builds if I don't regularly recreate them in createPagesStatefully.

After looking at the code, I figured out that shouldRunCreatePagesStatefully is always true in "gatsby build", so even if !page.isCreatedByStatefulCreatePages is false, the OR condition lets this code always delete pages that were not updated.

Is this correct? What is the real purpose of createPagesStatefully then?

Thank you for your time.


Development

Successfully merging this pull request may close these issues.

  • Cache is not invalidated when programatically build pages and the data is changed
  • createPages uses stale cached data for "previous" and "next" pages

5 participants