
Update libp2p_kad::store::RecordStore trait to be amenable to a persistent implementation #4817

@nathanielc

Description


The current design of the libp2p_kad::store::RecordStore trait makes it difficult to implement a persistent backend, for three reasons:

  • An Instant is used in the record types, which prevents serialization/deserialization to a persistent store.
  • The API assumes that all data fits easily in memory, i.e. it has no way to batch or otherwise partition the set of records.
  • The API is not async: it follows neither a poll model nor an async/await model that would allow efficient system IO against a persistent store.

Instant Serialization

Specifically, the ProviderRecord and Record types contain an Instant, which by design cannot be serialized or deserialized.

I suggest we change the time type from Instant to SystemTime. The trade-off is that SystemTime is not guaranteed to be monotonic: the system clock can be modified, so a time that was expected to be in the future may no longer be. However, a SystemTime can be serialized/deserialized (e.g. as seconds since the Unix epoch). The time scales involved in record expiration are typically hours; at that scale it is uncommon to see a SystemTime move non-monotonically.
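As a sketch of why SystemTime is amenable to persistence, the round-trip through seconds since the Unix epoch is straightforward. The helper names below are hypothetical, not part of libp2p_kad:

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical helper: encode an expiry time as seconds since the
/// Unix epoch so it can be written to a persistent store.
fn expiry_to_secs(expiry: SystemTime) -> u64 {
    expiry
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        // An expiry before the epoch is clamped to 0; records that old
        // are expired in any case.
        .unwrap_or(0)
}

/// Hypothetical helper: reconstruct the expiry time read back from the store.
fn expiry_from_secs(secs: u64) -> SystemTime {
    UNIX_EPOCH + Duration::from_secs(secs)
}

fn main() {
    let expiry = UNIX_EPOCH + Duration::from_secs(1_700_000_000);
    let stored = expiry_to_secs(expiry);
    assert_eq!(expiry_from_secs(stored), expiry);
}
```

No equivalent round-trip exists for Instant, whose epoch is process-local by design.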

Memory Pressure

The provided method produces an iterator over all entries in the store. Without a mechanism to paginate or resume from a cursor, the iterator may block other concurrent requests to the underlying store (e.g. SQLite).
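A cursor-based variant could hold the underlying store only long enough to copy out one page. The shape below is purely illustrative; neither `provided_page` nor `Page` exists in libp2p_kad, and a String stands in for ProviderRecord:

```rust
/// One page of records plus an opaque cursor to resume from.
struct Page {
    records: Vec<String>, // stand-in for Vec<ProviderRecord>
    next: Option<usize>,  // None once the store is exhausted
}

struct Store {
    records: Vec<String>,
}

impl Store {
    /// Hypothetical paginated alternative to `provided`: each call
    /// touches the backing store briefly instead of pinning it for
    /// the lifetime of a full-scan iterator.
    fn provided_page(&self, cursor: usize, limit: usize) -> Page {
        let end = (cursor + limit).min(self.records.len());
        Page {
            records: self.records[cursor..end].to_vec(),
            next: (end < self.records.len()).then_some(end),
        }
    }
}

fn main() {
    let store = Store {
        records: (0..5).map(|i| format!("r{i}")).collect(),
    };
    // A consumer drives the cursor until the store reports exhaustion.
    let mut cursor = Some(0);
    let mut seen = Vec::new();
    while let Some(c) = cursor {
        let page = store.provided_page(c, 2);
        seen.extend(page.records);
        cursor = page.next;
    }
    assert_eq!(seen.len(), 5);
}
```

With a real persistent backend the cursor would map onto something like an indexed key range, so each page is an independent short-lived query.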

Async API

It is not clear from the trait API or docs how, or whether, async system IO can be performed efficiently by an implementer. Are methods called concurrently? If system IO blocks the current thread, can that deadlock the calling code? A persistent implementation needs answers to these questions.

Motivation

We have a use case where we store on the order of 100K - 10M provider records. Holding all of this data in memory is inefficient. Additionally, we need the set of records to persist across restarts of the process.

Based on the design of the trait we have a few choices:

  • Persist the data in a separate store and populate the memory store on startup. This has the challenge of keeping the two stores in sync and does not address the memory pressure.
  • Implement the RecordStore trait on top of a persistent store, hacking around the Instant serialization problem and blocking the current thread on system IO. It is not clear what the performance impact of such a design would be.

Current Implementation

The current implementation has one other limitation. While the records and provided methods return an iterator over the data, the iterator is immediately cloned/collected into a heap-allocated vector. This means we would need to update not only the trait API but also the consuming code to make it memory efficient.

Are you planning to do it yourself in a pull request?

Maybe
