Needs
Syncing archive nodes from scratch takes a long time. We also run into sync issues from time to time, such as the earlier incoming network slots issue or the current paritydb issue. We already have the --sync warp feature, but it can't be used for archive nodes and it doesn't resolve every sync problem (network issues, disaster recovery when a network is down, etc.).
At the same time, networks need archive nodes, which means users need a way to spin up their nodes in a reasonable time. This is why binary DB snapshots are still very important for now.
How it works now
rocksdb stores its data in a large number of small files, while paritydb uses a small number of very big files. For archive paritydb nodes, individual table files can exceed 160-200 GB.
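To make the difference concrete, here is a minimal sketch that reports the file count and the largest files in a node database directory. The path is an assumption — point it at your own node's paritydb (or rocksdb) directory.

```python
#!/usr/bin/env python3
"""Report file count and the largest files in a node database directory."""
from pathlib import Path

# Assumed path; adjust to <base-path>/chains/<chain>/paritydb/full on your node.
DB_DIR = Path("/data/node/chains/polkadot/paritydb/full")

files = sorted(
    (p for p in DB_DIR.rglob("*") if p.is_file()),
    key=lambda p: p.stat().st_size,
    reverse=True,
)

print(f"{len(files)} files in {DB_DIR}")
for p in files[:10]:  # ten largest files
    size_gb = p.stat().st_size / 1024**3
    print(f"{size_gb:8.1f} GB  {p.name}")
```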
Big files lead to some issues:
- It is not trivial to move really big files to or from backup storage: any network interruption forces a retry, and every retry of a ~200 GB file costs a lot of time and expensive traffic.
- All files change very often, so we cannot use diff backups based on the modification time of individual files.
- CDN cache services do not cache files larger than 500-1000 MB, which makes providing public snapshots more expensive.
These issues force us to run additional backup tooling that splits files into chunks, maintains its own database of chunks, and syncs them based on checksums. That takes a lot of extra CPU time, and it makes the snapshots harder for users to consume. A minimal sketch of that kind of chunk-and-checksum sync is shown below.
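For illustration only, here is a minimal Python sketch of the chunk-and-checksum sync such tooling has to perform. The chunk size, file names, and manifest format are assumptions, not what any specific backup tool actually does.

```python
#!/usr/bin/env python3
"""Sketch: split a huge table file into fixed-size chunks, checksum each
chunk, and report only the chunks whose checksum changed since last run."""
import hashlib
import json
from pathlib import Path

CHUNK_SIZE = 512 * 1024 * 1024  # 512 MB, small enough for CDN caching (assumed)


def chunk_checksums(path: Path) -> list[str]:
    """Return a SHA-256 checksum for every fixed-size chunk of the file."""
    sums = []
    with path.open("rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            sums.append(hashlib.sha256(chunk).hexdigest())
    return sums


def changed_chunks(path: Path, manifest_path: Path) -> list[int]:
    """Compare current checksums against the previous manifest and return
    the indexes of chunks that would need to be re-uploaded."""
    current = chunk_checksums(path)
    previous = []
    if manifest_path.exists():
        previous = json.loads(manifest_path.read_text())
    manifest_path.write_text(json.dumps(current))
    return [i for i, s in enumerate(current)
            if i >= len(previous) or s != previous[i]]


if __name__ == "__main__":
    table = Path("table_00.db")  # hypothetical paritydb table file name
    if table.exists():
        manifest = table.with_suffix(".manifest.json")
        print("chunks to upload:", changed_chunks(table, manifest))
```

Even a small change to the table file forces the whole ~200 GB file to be re-read and re-hashed on every backup run, which is where the extra CPU time goes.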
The request
Would it be possible to add a new CLI flag that limits the size of paritydb table files for newly synchronized nodes?