
Conversation

Contributor

@ziruiliu ziruiliu commented Nov 2, 2025

Why

  • Compared to the existing S3 connector based on the awscrt library, this nixl_object_backend enables self-signed certificates via the ca_bundle option for private S3-compatible endpoints, which are difficult to support with the awscrt library.
  • The Nixl object backend leverages the nixl transfer API to transfer data in batches, as the nixl_storage_backend already does. But instead of the in-memory dictionary used by the original Nixl storage backend, each cache line is addressed by a deterministic key derived from the CacheEngineKey, letting us reuse objects after a reboot without rebuilding metadata. The new nixl_object_backend can therefore share KV cache data between multiple nodes and survive a process restart.

See #1557: the original Nixl storage backend needs nixl_pool_size to pre-allocate a certain number of files or object names with random names at initialization. With the new backend there is no nixl_pool_size value, because objects are named by the key's hash value (a naming sketch follows the config below). The extra config looks like:

extra_config:
  enable_nixl_object: true
  nixl_object_backend: OBJ
  nixl_object_presence_cache: False
  nixl_object_async_put: False
  nixl_object_backend_params:
    access_key: ACCESS_KEY
    secret_key: SECRET
    bucket: nixl-test-bucket
    region: any
    endpoint_override: https://url-to-object-storage
    ca_bundle: path to self-signed certificate
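
For illustration, a deterministic object name can be derived purely from the key's hash, so the same cache line maps to the same object across nodes and restarts. The helper below is a hypothetical sketch of such a scheme, not the backend's actual naming code:

import hashlib

def object_name_for(chunk_hash: int, prefix: str = "lmcache") -> str:
    """Map a chunk hash to a stable object key (hypothetical scheme)."""
    digest = hashlib.sha256(str(chunk_hash).encode()).hexdigest()
    # Same hash -> same object name: cache lines survive restarts and
    # can be shared by any node that derives the same key.
    return f"{prefix}/{digest}"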

What

  • Implement a NixlObjectBackend built on top of NixlAgent helpers for memory registration and transfer so all reads/writes go through Nixl-provided batched descriptors.
  • Implement contains() via Nixl's query_mem() API (equivalent to HeadObject), and implement batched_async_contains by calling contains() in parallel over multiple small batches, failing fast on any failure within a batch instead of issuing all contains() calls at once (see the sketch after this list).
  • Introduce an optional presence cache to short-circuit remote existence checks and avoid repeated query_memory calls during lookups. The cache can return false positives, which would cause the subsequent retrieval to fail; however, retrieval failures can be recovered by async KV loading in LMCache or vLLM's #19330.
  • Introduce an option that enables async put on the write path to leverage Nixl's natively asynchronous API.
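
A minimal sketch of the fail-fast batched existence check described above, assuming a per-key contains callable; the batch and worker sizes are illustrative, not the values used in the backend:

from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

def batched_contains(
    keys: Sequence,
    contains: Callable[[object], bool],
    batch_size: int = 32,
    max_workers: int = 8,
) -> int:
    """Run contains() in parallel over small batches, failing fast.

    Returns the number of leading keys that exist; stops at the first
    miss instead of issuing checks for every remaining key at once.
    """
    hits = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for start in range(0, len(keys), batch_size):
            batch = keys[start : start + batch_size]
            for found in pool.map(contains, batch):
                if not found:
                    return hits  # fail fast within the small batch
                hits += 1
    return hits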

@gemini-code-assist
Contributor

Summary of Changes

Hello @ziruiliu, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new 'NixlObjectBackend' for LMCache, designed to provide a robust and flexible object storage solution. It addresses limitations of the existing S3 connector by supporting self-signed certificates for private S3-compatible endpoints. A key innovation is the use of deterministic keys for cache lines, which allows the KV cache to be shared across nodes and persist through process restarts. The backend also includes optimizations for existence checks with a new presence cache and supports asynchronous write operations, enhancing overall performance and resilience.

Highlights

  • New Nixl Object Storage Backend: Introduces a dedicated Nixl backend for object storage, offering an alternative to the existing S3 connector and enhancing flexibility for various storage solutions.
  • Self-Signed Certificate Support: Enables the use of self-signed certificates as a 'ca_bundle' option for private S3-compatible endpoints, addressing a limitation with the existing 'awscrt' library.
  • Persistent & Shared KV Cache: Leverages deterministic keys derived from 'CacheEngineKey' to address cache lines, allowing KV cache data to be shared across multiple nodes and survive process restarts without rebuilding metadata.
  • Optimized Existence Checks: Implements 'contains()' using Nixl's 'query_mem()' API (equivalent to 'HeadObject') and 'batched_async_contains' for parallel, fail-fast existence checks across multiple keys.
  • Optional Presence Cache: Adds an optional in-memory presence cache to short-circuit remote existence checks, reducing repeated 'query_memory' calls and improving performance for frequently accessed keys.
  • Asynchronous Put Operations: Introduces an option to enable asynchronous 'put' operations in the write path, leveraging Nixl's native asynchronous API for potentially faster data transfers.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new NixlObjectBackend for object storage, which is a significant and valuable addition. The implementation correctly leverages nixl for batched transfers and introduces features like a presence cache and asynchronous writes. The code is well-structured.

My review focuses on several key areas:

  • Thread Safety: I've identified potential race conditions in the presence cache and statistics counters that could lead to incorrect behavior under concurrent access.
  • Efficiency: There's an opportunity to improve the efficiency of blocking operations by avoiding busy-waiting.
  • Correctness: A few methods, such as remove and batched_get_non_blocking, have implementations that don't fully match their intent, which could lead to bugs or unexpected behavior.
  • Maintainability: Some method names could be clearer to better reflect their functionality.

I've provided specific suggestions to address these points. Overall, this is a great contribution, and with these refinements, it will be even more robust and efficient.

Comment on lines +659 to +667
def remove(self, key: CacheEngineKey, force: bool = True) -> bool:
    """
    Remove the key from the storage backend.

    :param key: The key to remove.
    :param force: Whether to force removal (not used in this implementation)
    """
    self._cache_discard(key.chunk_hash)
    return True

critical

The remove method only discards the key from the local presence cache (self._cache_discard). It does not actually remove the object from the remote object storage. This will lead to orphaned data in the storage backend, which can cause unexpected behavior and unnecessary storage costs. The implementation should be updated to issue a delete request to the object storage via the nixl agent. If object deletion is not yet supported, this method should raise NotImplementedError.
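
A hedged sketch of the shape being requested, assuming a hypothetical _delete_object helper (nixl's actual deletion API is not shown in this PR):

def remove(self, key: CacheEngineKey, force: bool = True) -> bool:
    """Remove the key from the presence cache and the remote store."""
    self._cache_discard(key.chunk_hash)
    # _delete_object is a hypothetical helper issuing the remote delete
    # (a DeleteObject equivalent); if deletion is unsupported, raise
    # NotImplementedError rather than silently orphaning data.
    return self._delete_object(key)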

Comment on lines +51 to +64
class SetPresenceCache:
    """Default presence cache using a thread-safe Python set."""

    def __init__(self) -> None:
        self._keys: set[int] = set()

    def add(self, key: int) -> None:
        self._keys.add(key)

    def discard(self, key: int) -> None:
        self._keys.discard(key)

    def contains(self, key: int) -> bool:
        return key in self._keys

high

The SetPresenceCache class is not thread-safe, despite its docstring. The add and discard methods modify self._keys without any synchronization, which can lead to race conditions when called from multiple threads (e.g., in batched_async_contains). To ensure thread safety, a threading.Lock should be used to protect access to self._keys.

Suggested change
class SetPresenceCache:
    """Default presence cache using a thread-safe Python set."""

    def __init__(self) -> None:
        self._keys: set[int] = set()
        self._lock = threading.Lock()

    def add(self, key: int) -> None:
        with self._lock:
            self._keys.add(key)

    def discard(self, key: int) -> None:
        with self._lock:
            self._keys.discard(key)

    def contains(self, key: int) -> bool:
        with self._lock:
            return key in self._keys

Comment on lines +209 to +213
while state != "DONE" and state != "ERR":
    try:
        state = self.nixl_agent.check_xfer_state(handle)
    except nixlBind.nixlBackendError:
        raise

high

The post_blocking method implements a busy-wait loop to check the transfer state, which can lead to high CPU usage and is inefficient. Consider adding a short sleep inside the loop to yield the CPU, similar to what's done in _wait_for_transfer.

Suggested change
while state != "DONE" and state != "ERR":
    try:
        time.sleep(0.001)  # Avoid busy-waiting
        state = self.nixl_agent.check_xfer_state(handle)
    except nixlBind.nixlBackendError:
        raise

Comment on lines +644 to +657
async def batched_get_non_blocking(
    self,
    lookup_id: str,
    keys: list[CacheEngineKey],
    transfer_spec: Any = None,
) -> list[MemoryObj]:
    """
    Non blocking interface to get the kv cache from the storage backend.
    :param List[CacheEngineKey] keys: The keys of the MemoryObjs.
    :return: a list of memory objects.
    """
    obj_list = self.storage_to_mem(keys, False)
    assert None not in obj_list
    return cast(list[MemoryObj], obj_list)

high

The async method batched_get_non_blocking directly calls self.storage_to_mem(keys, False), which is a blocking function. This will block the entire asyncio event loop, defeating the purpose of it being non-blocking. To make it truly asynchronous, you should run the blocking call in a separate thread using asyncio.to_thread.

    async def batched_get_non_blocking(
        self,
        lookup_id: str,
        keys: list[CacheEngineKey],
        transfer_spec: Any = None,
    ) -> list[MemoryObj]:
        """
        Non blocking interface to get the kv cache from the storage backend.
        :param List[CacheEngineKey] keys: The keys of the MemoryObjs.
        :return: a list of memory objects.
        """
        obj_list = await asyncio.to_thread(self.storage_to_mem, keys, False)
        assert None not in obj_list
        return cast(list[MemoryObj], obj_list)

Comment on lines +218 to +221
def post_blocking_async(self, handle: NixlXferHandle):
    """Non-blocking async post for WRITE operations."""
    state = self.nixl_agent.transfer(handle)
    return state

medium

The method name post_blocking_async is confusing as it suggests both blocking and asynchronous behavior. Based on its implementation and docstring ("Non-blocking async post"), a more descriptive name like post_non_blocking or post_async would improve clarity. Remember to update the call site in mem_to_storage as well.

Suggested change
def post_async(self, handle: NixlXferHandle):
    """Non-blocking async post for WRITE operations."""
    state = self.nixl_agent.transfer(handle)
    return state

        if resp[0] is None:
            return False
        return True
    except Exception as exc:

medium

Catching a broad Exception can hide bugs or unexpected errors. It's better to catch more specific exceptions that you expect self.nixl_agent.query_memory to raise, such as nixlBind.nixlBackendError which is caught elsewhere in this file. This makes error handling more robust and predictable.

Suggested change
except nixlBind.nixlBackendError as exc:

Comment on lines +307 to +315
def _cache_contains(self, chunk_hash: int) -> bool:
    if not self.enable_presence_cache or self.key_presence_cache is None:
        return False
    found = self.key_presence_cache.contains(chunk_hash)
    self.hit_counter += 1 if found else 0
    self.total_counter += 1
    if self.total_counter % 100 == 0:
        logger.debug(f"Cache hit: {self.hit_counter} vs {self.total_counter}")
    return found

medium

The counters self.hit_counter and self.total_counter are updated in _cache_contains without any locking. Since _cache_contains can be called from multiple threads via batched_async_contains, this can lead to race conditions and inaccurate statistics. These counters should be protected by a lock. You could reuse self.progress_lock for this purpose.

Suggested change
def _cache_contains(self, chunk_hash: int) -> bool:
    if not self.enable_presence_cache or self.key_presence_cache is None:
        return False
    found = self.key_presence_cache.contains(chunk_hash)
    with self.progress_lock:
        self.hit_counter += 1 if found else 0
        self.total_counter += 1
        if self.total_counter % 100 == 0:
            logger.debug(f"Cache hit: {self.hit_counter} vs {self.total_counter}")
    return found

Comment on lines 576 to 580
self._wait_for_transfer(
    handle, initial_state, keys,
    storage_reg_descs, storage_xfer_handler,
    mem_objs
)

medium

The arguments to self._wait_for_transfer are formatted in a way that is hard to read. For better readability and to adhere to common Python style guides, consider formatting the arguments on separate lines, aligned with the opening parenthesis.

                self._wait_for_transfer(
                    handle,
                    initial_state,
                    keys,
                    storage_reg_descs,
                    storage_xfer_handler,
                    mem_objs,
                )

Signed-off-by: Zirui Liu <[email protected]>

mmgaggle commented Nov 7, 2025

I love that you're doing this. We will give it a spin with our all-flash Ceph cluster.

The CRT client does have upsides, namely distributing connections in the client connection pool across s3 endpoints returned via DNS. We played with a variation of nixl/obj connector that was modified to use S3CrtClient (to get multipathing), but there were enough other issues that we abandoned it in favor of the native s3 connector in lmcache. Making the nixl object content addressable is a huge step forward, and I also appreciate the lookup caching!

@mmgaggle

We are seeing significant improvement using this branch:

[Screenshot, 2025-11-12: benchmark results showing the improvement]

@ziruiliu
Contributor Author

> I love that you're doing this. We will give it a spin with our all-flash Ceph cluster.
>
> The CRT client does have upsides, namely distributing connections in the client connection pool across s3 endpoints returned via DNS. We played with a variation of nixl/obj connector that was modified to use S3CrtClient (to get multipathing), but there were enough other issues that we abandoned it in favor of the native s3 connector in lmcache. Making the nixl object content addressable is a huge step forward, and I also appreciate the lookup caching!

Thank you, Kyle, for testing this branch. I am now working with the Nvidia nixl team on a unified version of the nixl storage backend. Please stay tuned.


leelavg commented Nov 18, 2025

If it helps in any way, we also tested this branch with kserve/llminferenceservice, observing a good performance difference and a more streamlined config compared to lmcache+s3 (shm keeps filling up) and lmcache+nixl[storage]+s3 (intermittently inconsistent numbers).

This is a small test I've added to test the basic flow https://github.com/leelavg/LMCache/blob/nixl-test/tests/test_nixl_object.py

Thanks.

tshmilnvidia added a commit to tshmilnvidia/LMCache that referenced this pull request Nov 18, 2025
This commit incorporates PR LMCache#1939 into NixlStorageBackend, under the
name of NixlDynamicStorageBackend, in order to prevent having 2 nixl
storage backends as well as lessen code duplication.

In order to use this new dynamic storage mode, which creates object keys
when needed instead of a limited pre-allocated amount,
extra_config.nixl_pool_size should be set to 0.

This mode is currently only supported for nixl OBJ backend.

Signed-off-by: Tomer Shmilovich <[email protected]>
@ziruiliu
Contributor Author

Closing this one. This scalable nixl object backend has been integrated into a unified nixl storage backend; see PR #2024.
Setting nixl_pool_size to 0 in the nixl storage backend enables dynamic mode, which behaves the same as this backend.
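
For reference, a hedged config sketch (only nixl_pool_size is confirmed by the commit message above; any other keys would follow the earlier example):

extra_config:
  nixl_pool_size: 0   # 0 enables dynamic mode (objects named on demand)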

@ziruiliu ziruiliu closed this Nov 20, 2025