feat(fdb): add FoundationDB forkless metadata backend#844
Conversation
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces an experimental 'FORKLESS' metadata backend using FoundationDB, enabling metadata persistence without forking the master process. The changes include a batched metadata writer, a bootstrapper for migrating existing data, and signal hooks for tracking node and xattr updates. Feedback identifies a critical configuration key mismatch that prevents backend activation, significant code duplication in the initialization logic, and potential performance issues due to heap allocations in xattr signals. Additionally, placeholder methods should use consistent logging levels to avoid misleading error reports.
There was a problem hiding this comment.
Pull request overview
Adds an experimental FoundationDB-backed, forkless metadata backend to persist master metadata in a KV store (instead of metadata.sfs), enabling synchronous dumps without forking and introducing bootstrapping from the legacy file when needed.
Changes:
- Introduce
MetadataBackendForklesswith FDB transactions + periodic/batched persistence viaMetadataWriterFDB. - Add an FDB bootstrap loader for several metadata sections (NODE/EDGE/FREE/XATR/CHNK) from
metadata.sfs. - Update master init/config and test tooling to support selecting and running the forkless/FDB-backed mode.
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/tools/saunafs.sh | Extends test harness to handle FORKLESS backend setup and FDB cluster configuration. |
| tests/tools/foundationdb.sh | Improves FDB test cluster startup variable scoping and disowns background fdbmonitor. |
| src/master/metadata_writer_fdb.h | Defines update-event types and the batched FDB metadata writer API. |
| src/master/metadata_writer_fdb.cc | Implements event serialization and batched flush/commit logic to the KV store. |
| src/master/metadata_section_bootstrap_fdb.h | Declares bootstrapper that loads missing FDB sections from metadata.sfs. |
| src/master/metadata_section_bootstrap_fdb.cc | Implements bootstrapping for NODE/EDGE/FREE/XATR/CHNK sections into FDB. |
| src/master/metadata_backend_interface.h | Extends backend interface with getHeaderSignature() for signature validation. |
| src/master/metadata_backend_forkless.h | Declares new forkless backend, loader/saver, and signal→event wiring. |
| src/master/metadata_backend_forkless.cc | Implements forkless backend lifecycle, loading, saving, flushing, and signal hookups. |
| src/master/metadata_backend_file.h | Adds stub getHeaderSignature() for file backend implementation. |
| src/master/matoclserv.cc | Adjusts admin “save metadata” reply behavior to handle synchronous forkless dumps. |
| src/master/masterconn.cc | Skips downloading metadata.sfs when using the forkless backend and continues with changelogs. |
| src/master/kv_common_keys.h | Adds META_HEADER key constant and documents META_NEXT_CHUNK_ID. |
| src/master/init.h | Adds config-driven selection between FILE and FORKLESS metadata backends. |
| src/master/filesystem_xattr.h | Introduces xattr change/removal signals for KV-backed persistence. |
| src/master/filesystem_xattr.cc | Emits xattr change/removal signals on updates/removals. |
| src/master/filesystem_checksum.cc | Emits nodeChangedSignal from checksum updates to drive KV persistence. |
| src/master/chunks.h | Exposes chunk_get_next_id() for persisting restore-critical next-chunk-id key. |
| src/master/chunks.cc | Implements chunk_get_next_id() wrapper. |
| src/master/CMakeLists.txt | Links fdb into master library when ENABLE_FOUNDATIONDB is enabled. |
| src/data/sfsmaster.cfg.in | Documents FDB_CLUSTER_FILE and new metadata backend selection option. |
| while (true) { | ||
| inode_t inode{}; | ||
| getINode(&ptr, inode); | ||
| uint8_t attributeNameLength = get8bit(&ptr); | ||
|
|
||
| uint32_t attributeValueLength{}; | ||
| get32bit(&ptr, attributeValueLength); | ||
| marker->offset = metadataFile_->offset(ptr); | ||
|
|
There was a problem hiding this comment.
loadXAttrSection() reads inode/name/value lengths in a while(true) loop without bounding reads by the XATR section end. If metadata.sfs is truncated or corrupt, this can advance ptr past the section (or even past the mapped file) and crash. Please add a sectionEnd calculation (marker->offset + marker->length) and validate that ptr + attributeNameLength + attributeValueLength stays within bounds before constructing vectors/advancing ptr.
There was a problem hiding this comment.
Very nice suggestion 👍
I will consider to apply it for the rest of the sections.
| // All nodes changes call fsnodes_update_checksum(), so triggering nodeChangedSignal here | ||
| // ensures that any node change is tracked, which is necessary for keeping FDB nodes's state up | ||
| // to date and consistent with the in-memory state of the nodes. | ||
| gMetadata->nodeChangedSignal.emit(node); |
There was a problem hiding this comment.
Emitting nodeChangedSignal from fsnodes_update_checksum() changes semantics globally: this function is called very frequently (including inside transactions and checksum recalculation paths), and existing code deliberately suppresses nodeChangedSignal in some transactional contexts. This will likely cause excessive/duplicated persistence events for the forkless backend and may break ordering/transaction assumptions. Consider keeping signal emission at the existing mutation points (or adding explicit hooks where needed) rather than unconditionally emitting here.
There was a problem hiding this comment.
I agree, this is a pending suggestion I need to address.
There was a problem hiding this comment.
Code Review
This pull request introduces an experimental "FORKLESS" metadata backend that stores filesystem metadata in FoundationDB, aiming to eliminate the need for forking the master process during metadata dumps. Key additions include the MetadataBackendForkless implementation, a bootstrapping tool for migrating data from metadata.sfs, and a batched MetadataWriterFDB for handling updates. Review feedback identifies critical issues regarding FoundationDB's 5-second transaction limit during full range scans, potential logic conflicts between backend initialization and bootstrapping, and several opportunities for performance optimization, such as using std::deque for the update queue and read-only transactions where appropriate.
This commit introduces a new FDB-based backend to progressively migrate metadata storage from file-based dumps to FoundationDB. This lays the foundation for eliminating the fork-based dump process that currently blocks multi-threading improvements in the master server. Key changes: - Add MetadataBackendForkless class implementing IMetadataBackend. - Add chunk metadata persistence to FDB via gChunkChangedSignal. Current limitations: - Only CHNK 1.0 section is migrated to FDB. - No batching mechanism for chunk updates yet. Migration strategy: Filesystem metadata sections will be migrated incrementally until all metadata resides in FDB. Co-authored-by: guillex <guillex@leil.io> Signed-off-by: Crash <crash@leil.io>
This commit completes the migration of chunk metadata persistence
to FDB forkless backend.
Key changes:
- Move addFilesToChunks() to MetadataBackendForkless::loadall().
- Move checksum calculation to MetadataBackendForkless::loadall().
- Remove the fdbmonitor background process from the job control table
to prevent the wait command from blocking during tests.
Test configuration:
Setup the testing framework to recognize ${metadata_backend}="FORKLESS"
to enable forkless backend in tests and start one FDB cluster.
Co-authored-by: guillex <guillex@leil.io>
Signed-off-by: Crash <crash@leil.io>
This commit implements MetadataWriterFDB class to batch and flush metadata updates to FoundationDB, improving write performance and reducing transaction overhead. Key changes: - Add MetadataWriterFDB class with event queue for batched writes. - Implement IMetadataUpdateEvent interface for different metadata update types. - Add ChunkUpdateEvent for handling chunk metadata updates. - Register periodic flush callback (1s interval) using event loop. - Replace direct FDB writes in chunk signal handler with enqueue of ChunkUpdateEvent operations. The writer accumulates updates in memory and flushes them periodically or on-demand, preserving changelog ordering while reducing the number of FDB transactions performed to persist the filesystem metadata. Co-authored-by: guillex <guillex@leil.io> Signed-off-by: Crash <crash@leil.io>
This commit implements metadata header signature validation and initialization for the FDB forkless backend to ensure compatibility with different metadata versions. Key changes: - Add META_HEADER key constant for storing header signatures in FDB. - Extend IMetadataBackend interface with getHeaderSignature() function and implement getHeaderSignature() in both file and forkless backends. - Add checkMetadataSignature() function to validate metadata headers in forkless backend. - Add initializeNewMetadataHeader() method to set proper metadata header signature when new empty metadata is detected. - Use signal-based approach to trigger metadata header initialization. This change enables the FDB backend to properly validate and initialize metadata headers. Co-authored-by: guillex <guillex@leil.io> Signed-off-by: Crash <crash@leil.io>
Ensure consistent chunk ID allocation and restore behavior in the FoundationDB backend by persisting the next chunk ID and improving metadata flush guarantees. Key changes: - Persist META_NEXT_CHUNK_ID to track the next chunk ID across restarts - Restore chunk ID generator state during metadata load - Ensure full flush of pending updates before saving restore-critical keys - Introduce flushAll() to guarantee complete persistence when required - Batch metadata writer flushes to avoid long blocking operations - Move chunk update signal registration to init() to cover changelog replay Additional updates: - Add configurable FDB_CLUSTER_FILE path to sfsmaster.cfg - Minor cleanup and logging improvements - Use safer key encoding helpers for chunk metadata - Fix variable scoping in FoundationDB test helper script Signed-off-by: Crash <crash@leil.io>
This commit introduces a metadata bootstrap mechanism for the forkless backend that initializes required metadata sections directly from metadata.sfs when they are missing. This change was motivated by upgrade-related integration tests (e.g. overwrite-file and undergoal-chunks scenarios from Longs suite) that start from legacy SaunaFS versions without an FDB backend, create files and chunks, and then upgrade to a version using an FDB backend. In those cases, the legacy versions do not initialize any backend state, causing chunk metadata to be missing after the upgrade and leading to test failures. A new MetadataSectionBootstrapFDB component scans the metadata file during startup, detects which sections require bootstrapping, and populates FDB accordingly. The initial implementation bootstraps the CHNK 1.0 section, restoring chunk state into the backend. Bootstrapping runs only on the master node and is skipped once the required keys already exist in FDB. If bootstrapping succeeds, metadata loading proceeds even when the on-disk metadata signature is not yet valid, allowing seamless upgrades from legacy versions. Signed-off-by: Crash <crash@leil.io>
This commit adds support for the Nodes metadata section to the FDB metadata backend. This change introduces persistence of filesystem nodes in FDB, including: - NODE_ keys storing the latest serialized FSNode state. - Bootstrap logic to initialize the Nodes section when missing. Co-authored-by: guillex <guillex@leil.io> Signed-off-by: Crash <crash@leil.io>
This commit adds support for the Free metadata section in the FoundationDB metadata backend. This change introduces persistence of the inode free list into FDB using FREE_<inode> keys, where the value stores the associated timestamp. Key changes: - Introduce loadFree() function to load FREE entries from FDB into inodePool during startup. - Signal integration with inodePool to persist free list changes in real time via FreeNodeUpdateEvent. - FREE section bootstrap support to import entries from metadata.sfs file when initializing a new FDB backend. - FreeNodeUpdateEvent implementation for adding and removing FREE_<inode> keys. FREE entries are written when an inode is freed and removed when it is allocated, keeping the FDB state consistent with the in-memory inode free list. Co-authored-by: guillex <guillex@leil.io> Signed-off-by: Crash <crash@leil.io>
This commit adds support for the EDGE metadata section in the FoundationDB metadata backend. Edges represent directory relationships and are stored as: EDGE_<parentId><name>: <childId> This change introduces full EDGE section support, including: - loadEdges() to reconstruct the in-memory directory tree from EDGE_ keys at startup. - loadEdge() to attach children to their parent directories, including trash/reserved handling and stats propagation. - EdgeUpdateEvent and EdgeRemoveEvent to persist directory entry creations, renames, and removals in real time. - Signal wiring (edgeChangedSignal / edgeRemovedSignal) to propagate metadata changes to FDB. - EDGE section bootstrap support to import entries from metadata.sfs file when initializing a new FDB backend. - Add missing documentation for existing functions. Co-authored-by: guillex <guillex@leil.io> Signed-off-by: Crash <crash@leil.io>
This commit adds support for persisting node deletions and metadata global properties in the FDB backend. Node removal support: - Introduce nodeRemovedSignal in gMetadata. - Emit nodeChangedSignal on checksum updates to ensure modifications are persisted. - Implement NodeRemoveEvent to remove NODE_<inode> keys - Wire nodeRemovedSignal to MetadataBackendForkless. Metadata global properties: - Persist maxInodeId, metadataVersion, and nextSessionId as dedicated META_* keys in FDB. - Load these values during startup before section loading. - Integrate persistence into fs_storeall(). fs_storeall() improvements: - Flush pending updates before sealing restore keys. - Persist metadata keys atomically. - Rotate changelog and broadcast metadata_saved status. Additional improvements: - Handle metadata download flow correctly for forkless backend. - Adjust synchronous dump behavior for forkless backend. - Improve test setup for master/shadow configurations. Co-authored-by: guillex <guillex@leil.io> Signed-off-by: Crash <crash@leil.io>
Add support for the XATR metadata section in the FoundationDB backend, including persistence, loading, and bootstrap from metadata files. This introduces XAttr update and removal events in the metadata writer, mapping in-memory xattr operations to KV entries using the `XATR_<inode><name>` key format. Hook xattr lifecycle events into the metadata backend via signals: - emit change and removal events on xattr updates in the filesystem layer - enqueue corresponding XAttrUpdateEvent, XAttrRemoveEvent, and XAttrInodeRemoveEvent operations in the FDB writer Implement full loading support from FoundationDB by scanning the XATR_ keyspace and reconstructing in-memory xattr structures, with validation against protocol limits. Add bootstrap support to import XATR metadata from filesystem dumps into FoundationDB during initialization. Overall, this enables full round-trip support for xattr metadata in the FoundationDB backend. Co-authored-by: guillex <guillex@leil.io> Signed-off-by: Crash <crash@leil.io>
bfa9236 to
e1ba985
Compare
Key changes: - Gate forkless backend sources and initialization behind ENABLE_FOUNDATIONDB, and correct the metadata backend config key. - Emit node, chunk, and xattr change notifications across filesystem operations so FDB-backed metadata stays in sync with in-memory state. - Harden FDB bootstrap and load paths by persisting metadata header fields, detecting partially initialized stores, and using safer read-only transactions during loads. - Rework metadata writer flushing to support ordered batched drains, restore failed batches, and handle bootstrap writes more safely. Signed-off-by: Crash <crash@leil.io>
e1ba985 to
34a9b46
Compare
Add a new FoundationDB-based metadata backend (
MetadataBackendForkless) that stores filesystem metadata in FDB instead of themetadata.sfsfile, removing the need to fork the master process during metadata dumps.Motivation
The current file-based backend forks the master process to snapshot metadata, which blocks multi-threading improvements and adds latency on large instances. This PR introduces an alternative backend that persists metadata directly to FoundationDB, enabling synchronous dumps without forking.
What's included
This PR implements the core infrastructure and five of the eight metadata sections. The remaining three sections (ACLS, QUOT, FLCK) will follow in a separate PR to keep this one reviewable.
Core infrastructure
IMetadataBackendimplementation that reads and writes metadata via FDB transactions.metadata.sfswhen sections are missing (e.g. upgrades from legacy versions).META_HEADERkey in FDB for signature checks, matching the file backend's header handling.METADATA_BACKEND_TYPEconfig option: newsfsmaster.cfgsetting (FILE|FORKLESS) under theEXPERIMENTALsection.Metadata sections
CHNK_<chunkId>gChunkChangedSignalNODEL_<inodeId>nodeChangedSignal/nodeRemovedSignalEDGE_<parentId><name>edgeChangedSignal/edgeRemovedSignalFREE_<inodeId>inodePoolfree/acquire signalsXATR_<inodeId><name>gXAttrChangedSignal/gXAttrRemovedSignal/gXAttrInodeRemovedSignalPersistence flow
IMetadataUpdateEventobjects into the writer.fs_storeall()flushes all pending updates, then persists restore-critical keys (META_NEXT_CHUNK_ID,META_VERSION, etc.) in a single FDB transaction.What's NOT included (next PR)
Testing
Tested with the existing shadow synchronization and short system tests using
METADATA_BACKEND_TYPE=FORKLESS. The FDB test infrastructure starts a local FoundationDB cluster for each test run.Not all tests pass yet, for two reasons:
metadata.sfsfile presence or fork-based dump behavior (e.g. reading dump files directly). These are structurally incompatible with the FORKLESS backend and will require separate test adaptations.Notes
FORKLESSbackend is experimental and not suitable for production use.