
Conversation

Contributor

@ziruiliu ziruiliu commented Nov 2, 2025

Why

  • Compared to the existing S3 connector based on the awscrt library, this nixl_object_backend enables self-signed certificates via the ca_bundle option for private S3-compatible endpoints, which are difficult to support with the awscrt library.
  • The Nixl object backend leverages the nixl transfer API to transfer data in batches, as the nixl_storage_backend already does. But instead of the in-memory dictionary used by the original Nixl storage backend, each cache line is addressed by a deterministic key derived from the CacheEngineKey, letting us reuse objects after a reboot without rebuilding metadata. The new nixl_object_backend can therefore share KV cache data between multiple nodes and survive a process restart.

See #1557: the original Nixl storage backend needs nixl_pool_size to pre-allocate a certain number of files or object names with random names at initialization. With the new backend there is no nixl_pool_size value, because objects are named by the key's hash value (a naming sketch follows the config below). The extra config looks like:

extra_config:
  enable_nixl_object: true
  nixl_object_backend: OBJ
  nixl_object_presence_cache: False
  nixl_object_async_put: False
  nixl_object_backend_params:
    access_key: ACCESS_KEY
    secret_key: SECRET
    bucket: nixl-test-bucket
    region: any
    endpoint_override: https://url-to-object-storage
    ca_bundle: path to self-signed certificate
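
For illustration, a deterministic object name can be derived purely from the key's hash, so the same cache line maps to the same object across nodes and restarts. The helper below is a hypothetical sketch of such a scheme, not the backend's actual naming code:

import hashlib

def object_name_for(chunk_hash: int, prefix: str = "lmcache") -> str:
    """Map a chunk hash to a stable object key (hypothetical scheme)."""
    digest = hashlib.sha256(str(chunk_hash).encode()).hexdigest()
    # Same hash -> same object name: cache lines survive restarts and
    # can be shared by any node that derives the same key.
    return f"{prefix}/{digest}"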

What

  • Implement a NixlObjectBackend built on top of NixlAgent helpers for memory registration and transfer so all reads/writes go through Nixl-provided batched descriptors.
  • Implement contains() via Nixl's query_mem() API (equivalent to HeadObject), and implement batched_async_contains by calling contains() in parallel over multiple small batches, failing fast on any failure within a batch instead of issuing all contains() calls at once (see the sketch after this list).
  • Introduce an optional presence cache to short-circuit remote existence checks and avoid repeated query_memory calls during lookups. The cache can return false positives, which would cause the subsequent retrieval to fail; however, retrieval failures can be recovered by async KV loading in LMCache or vLLM's #19330.
  • Introduce an option that enables async put on the write path to leverage Nixl's natively asynchronous API.
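
A minimal sketch of the fail-fast batched existence check described above, assuming a per-key contains callable; the batch and worker sizes are illustrative, not the values used in the backend:

from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

def batched_contains(
    keys: Sequence,
    contains: Callable[[object], bool],
    batch_size: int = 32,
    max_workers: int = 8,
) -> int:
    """Run contains() in parallel over small batches, failing fast.

    Returns the number of leading keys that exist; stops at the first
    miss instead of issuing checks for every remaining key at once.
    """
    hits = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for start in range(0, len(keys), batch_size):
            batch = keys[start : start + batch_size]
            for found in pool.map(contains, batch):
                if not found:
                    return hits  # fail fast within the small batch
                hits += 1
    return hits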

@gemini-code-assist
Contributor

Summary of Changes

Hello @ziruiliu, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new 'NixlObjectBackend' for LMCache, designed to provide a robust and flexible object storage solution. It addresses limitations of the existing S3 connector by supporting self-signed certificates for private S3-compatible endpoints. A key innovation is the use of deterministic keys for cache lines, which allows the KV cache to be shared across nodes and persist through process restarts. The backend also includes optimizations for existence checks with a new presence cache and supports asynchronous write operations, enhancing overall performance and resilience.

Highlights

  • New Nixl Object Storage Backend: Introduces a dedicated Nixl backend for object storage, offering an alternative to the existing S3 connector and enhancing flexibility for various storage solutions.
  • Self-Signed Certificate Support: Enables the use of self-signed certificates as a 'ca_bundle' option for private S3-compatible endpoints, addressing a limitation with the existing 'awscrt' library.
  • Persistent & Shared KV Cache: Leverages deterministic keys derived from 'CacheEngineKey' to address cache lines, allowing KV cache data to be shared across multiple nodes and survive process restarts without rebuilding metadata.
  • Optimized Existence Checks: Implements 'contains()' using Nixl's 'query_mem()' API (equivalent to 'HeadObject') and 'batched_async_contains' for parallel, fail-fast existence checks across multiple keys.
  • Optional Presence Cache: Adds an optional in-memory presence cache to short-circuit remote existence checks, reducing repeated 'query_memory' calls and improving performance for frequently accessed keys.
  • Asynchronous Put Operations: Introduces an option to enable asynchronous 'put' operations in the write path, leveraging Nixl's native asynchronous API for potentially faster data transfers.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new NixlObjectBackend for object storage, which is a significant and valuable addition. The implementation correctly leverages nixl for batched transfers and introduces features like a presence cache and asynchronous writes. The code is well-structured.

My review focuses on several key areas:

  • Thread Safety: I've identified potential race conditions in the presence cache and statistics counters that could lead to incorrect behavior under concurrent access.
  • Efficiency: There's an opportunity to improve the efficiency of blocking operations by avoiding busy-waiting.
  • Correctness: A few methods, such as remove and batched_get_non_blocking, have implementations that don't fully match their intent, which could lead to bugs or unexpected behavior.
  • Maintainability: Some method names could be clearer to better reflect their functionality.

I've provided specific suggestions to address these points. Overall, this is a great contribution, and with these refinements, it will be even more robust and efficient.

Comment on lines +659 to +667
def remove(self, key: CacheEngineKey, force: bool = True) -> bool:
    """
    Remove the key from the storage backend.

    :param key: The key to remove.
    :param force: Whether to force removal (not used in this implementation)
    """
    self._cache_discard(key.chunk_hash)
    return True

critical

The remove method only discards the key from the local presence cache (self._cache_discard). It does not actually remove the object from the remote object storage. This will lead to orphaned data in the storage backend, which can cause unexpected behavior and unnecessary storage costs. The implementation should be updated to issue a delete request to the object storage via the nixl agent. If object deletion is not yet supported, this method should raise NotImplementedError.
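
A hedged sketch of the shape being requested, assuming a hypothetical _delete_object helper (nixl's actual deletion API is not shown in this PR):

def remove(self, key: CacheEngineKey, force: bool = True) -> bool:
    """Remove the key from the presence cache and the remote store."""
    self._cache_discard(key.chunk_hash)
    # _delete_object is a hypothetical helper issuing the remote delete
    # (a DeleteObject equivalent); if deletion is unsupported, raise
    # NotImplementedError rather than silently orphaning data.
    return self._delete_object(key)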

Comment on lines +51 to +64
class SetPresenceCache:
    """Default presence cache using a thread-safe Python set."""

    def __init__(self) -> None:
        self._keys: set[int] = set()

    def add(self, key: int) -> None:
        self._keys.add(key)

    def discard(self, key: int) -> None:
        self._keys.discard(key)

    def contains(self, key: int) -> bool:
        return key in self._keys

high

The SetPresenceCache class is not thread-safe, despite its docstring. The add and discard methods modify self._keys without any synchronization, which can lead to race conditions when called from multiple threads (e.g., in batched_async_contains). To ensure thread safety, a threading.Lock should be used to protect access to self._keys.

Suggested change
class SetPresenceCache:
    """Default presence cache using a thread-safe Python set."""

    def __init__(self) -> None:
        self._keys: set[int] = set()
        self._lock = threading.Lock()

    def add(self, key: int) -> None:
        with self._lock:
            self._keys.add(key)

    def discard(self, key: int) -> None:
        with self._lock:
            self._keys.discard(key)

    def contains(self, key: int) -> bool:
        with self._lock:
            return key in self._keys

Comment on lines +209 to +213
while state != "DONE" and state != "ERR":
    try:
        state = self.nixl_agent.check_xfer_state(handle)
    except nixlBind.nixlBackendError:
        raise

high

The post_blocking method implements a busy-wait loop to check the transfer state, which can lead to high CPU usage and is inefficient. Consider adding a short sleep inside the loop to yield the CPU, similar to what's done in _wait_for_transfer.

Suggested change
while state != "DONE" and state != "ERR":
    try:
        time.sleep(0.001)  # Avoid busy-waiting
        state = self.nixl_agent.check_xfer_state(handle)
    except nixlBind.nixlBackendError:
        raise

Comment on lines +644 to +657
async def batched_get_non_blocking(
    self,
    lookup_id: str,
    keys: list[CacheEngineKey],
    transfer_spec: Any = None,
) -> list[MemoryObj]:
    """
    Non blocking interface to get the kv cache from the storage backend.
    :param List[CacheEngineKey] keys: The keys of the MemoryObjs.
    :return: a list of memory objects.
    """
    obj_list = self.storage_to_mem(keys, False)
    assert None not in obj_list
    return cast(list[MemoryObj], obj_list)

high

The async method batched_get_non_blocking directly calls self.storage_to_mem(keys, False), which is a blocking function. This will block the entire asyncio event loop, defeating the purpose of it being non-blocking. To make it truly asynchronous, you should run the blocking call in a separate thread using asyncio.to_thread.

    async def batched_get_non_blocking(
        self,
        lookup_id: str,
        keys: list[CacheEngineKey],
        transfer_spec: Any = None,
    ) -> list[MemoryObj]:
        """
        Non blocking interface to get the kv cache from the storage backend.
        :param List[CacheEngineKey] keys: The keys of the MemoryObjs.
        :return: a list of memory objects.
        """
        obj_list = await asyncio.to_thread(self.storage_to_mem, keys, False)
        assert None not in obj_list
        return cast(list[MemoryObj], obj_list)

Comment on lines +218 to +221
def post_blocking_async(self, handle: NixlXferHandle):
    """Non-blocking async post for WRITE operations."""
    state = self.nixl_agent.transfer(handle)
    return state

medium

The method name post_blocking_async is confusing as it suggests both blocking and asynchronous behavior. Based on its implementation and docstring ("Non-blocking async post"), a more descriptive name like post_non_blocking or post_async would improve clarity. Remember to update the call site in mem_to_storage as well.

Suggested change
def post_async(self, handle: NixlXferHandle):
    """Non-blocking async post for WRITE operations."""
    state = self.nixl_agent.transfer(handle)
    return state

        if resp[0] is None:
            return False
        return True
    except Exception as exc:

medium

Catching a broad Exception can hide bugs or unexpected errors. It's better to catch more specific exceptions that you expect self.nixl_agent.query_memory to raise, such as nixlBind.nixlBackendError which is caught elsewhere in this file. This makes error handling more robust and predictable.

Suggested change
except nixlBind.nixlBackendError as exc:

Comment on lines +307 to +315
def _cache_contains(self, chunk_hash: int) -> bool:
    if not self.enable_presence_cache or self.key_presence_cache is None:
        return False
    found = self.key_presence_cache.contains(chunk_hash)
    self.hit_counter += 1 if found else 0
    self.total_counter += 1
    if self.total_counter % 100 == 0:
        logger.debug(f"Cache hit: {self.hit_counter} vs {self.total_counter}")
    return found

medium

The counters self.hit_counter and self.total_counter are updated in _cache_contains without any locking. Since _cache_contains can be called from multiple threads via batched_async_contains, this can lead to race conditions and inaccurate statistics. These counters should be protected by a lock. You could reuse self.progress_lock for this purpose.

Suggested change
def _cache_contains(self, chunk_hash: int) -> bool:
    if not self.enable_presence_cache or self.key_presence_cache is None:
        return False
    found = self.key_presence_cache.contains(chunk_hash)
    with self.progress_lock:
        self.hit_counter += 1 if found else 0
        self.total_counter += 1
        if self.total_counter % 100 == 0:
            logger.debug(f"Cache hit: {self.hit_counter} vs {self.total_counter}")
    return found

Comment on lines 576 to 580
self._wait_for_transfer(
    handle, initial_state, keys,
    storage_reg_descs, storage_xfer_handler,
    mem_objs
)

medium

The arguments to self._wait_for_transfer are formatted in a way that is hard to read. For better readability and to adhere to common Python style guides, consider formatting the arguments on separate lines, aligned with the opening parenthesis.

                self._wait_for_transfer(
                    handle,
                    initial_state,
                    keys,
                    storage_reg_descs,
                    storage_xfer_handler,
                    mem_objs,
                )

Signed-off-by: Zirui Liu <[email protected]>

mmgaggle commented Nov 7, 2025

I love that you're doing this. We will give it a spin with our all-flash Ceph cluster.

The CRT client does have upsides, namely distributing connections in the client connection pool across s3 endpoints returned via DNS. We played with a variation of nixl/obj connector that was modified to use S3CrtClient (to get multipathing), but there were enough other issues that we abandoned it in favor of the native s3 connector in lmcache. Making the nixl object content addressable is a huge step forward, and I also appreciate the lookup caching!

@mmgaggle

We are seeing significant improvement using this branch:

[Screenshot, 2025-11-12: benchmark results showing the improvement]

@ziruiliu
Contributor Author

> I love that you're doing this. We will give it a spin with our all-flash Ceph cluster.
>
> The CRT client does have upsides, namely distributing connections in the client connection pool across s3 endpoints returned via DNS. We played with a variation of nixl/obj connector that was modified to use S3CrtClient (to get multipathing), but there were enough other issues that we abandoned it in favor of the native s3 connector in lmcache. Making the nixl object content addressable is a huge step forward, and I also appreciate the lookup caching!

Thank you, Kyle, for testing this branch. I am now working with the Nvidia nixl team on a unified version of the nixl storage backend. Please stay tuned.


leelavg commented Nov 18, 2025

If it helps in any way, we also tested this branch with kserve/llminferenceservice, observing a good performance difference and a more streamlined config compared to lmcache+s3 (shm keeps filling up) and lmcache+nixl[storage]+s3 (intermittently inconsistent numbers).

This is a small test I've added to test the basic flow https://github.com/leelavg/LMCache/blob/nixl-test/tests/test_nixl_object.py

Thanks.

tshmilnvidia added a commit to tshmilnvidia/LMCache that referenced this pull request Nov 18, 2025
This commit incorporates PR LMCache#1939 into NixlStorageBackend, under the
name of NixlDynamicStorageBackend, in order to prevent having 2 nixl
storage backends as well as lessen code duplication.

In order to use this new dynamic storage mode, which creates object keys
when needed instead of a limited pre-allocated amount,
extra_config.nixl_pool_size should be set to 0.

This mode is currently only supported for nixl OBJ backend.

Signed-off-by: Tomer Shmilovich <[email protected]>
@ziruiliu
Contributor Author

Closing this one. This scalable nixl object backend has been integrated into a unified nixl storage backend; see PR #2024.
Setting nixl_pool_size to 0 in the nixl storage backend enables dynamic mode, which behaves the same as this backend.
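
For reference, a hedged config sketch (only nixl_pool_size is confirmed by the commit message above; any other keys would follow the earlier example):

extra_config:
  nixl_pool_size: 0   # 0 enables dynamic mode (objects named on demand)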

@ziruiliu ziruiliu closed this Nov 20, 2025