Skip to content

Conversation

@sanketkedia
Copy link
Contributor

@sanketkedia sanketkedia commented Oct 28, 2025

Description of changes

Summarize the changes made by this PR.

  • Improvements & Bug fixes
    • Errors if source_key is set but ef is not supplied. Both client and server side
  • New functionality
    • ...

Test plan

How are these changes tested?
Added unit-test

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Migration plan

None

Observability plan

None

Documentation Changes

None

Copy link
Contributor Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

@sanketkedia sanketkedia requested a review from jairad26 October 28, 2025 18:10
@sanketkedia sanketkedia marked this pull request as ready for review October 28, 2025 18:10
@propel-code-bot
Copy link
Contributor

propel-code-bot bot commented Oct 28, 2025

Validate SparseVectorIndexConfig requires embedding_function or bm25 when source_key is set

Adds client-side (Python) and server-side (Rust) guards so that a sparse-vector index cannot be created with a non-null source_key unless an embedding_function is supplied or bm25=True. Unit tests were updated/extended to reflect the new rule and to keep existing tests green (they now pass the bm25=True flag where no embedding function is present).

Key Changes

• Introduced _validate_sparse_vector_config() inside chromadb/api/types.py and call it from _set_index_for_key() and _enable_all_indexes_for_key().
• Parallel validation added in Rust at rust/types/src/validators.rs to keep server and thin-client behaviour aligned.
• Updated many test fixtures in chromadb/test/api/test_schema.py and chromadb/test/api/test_schema_e2e.py to add bm25=True where required.
• New unit test test_sparse_vector_config_requires_ef_with_source_key() asserts that a ValueError is raised when rule is violated.

Affected Areas

chromadb/api/types.py index-creation path
• Rust schema validator rust/types/src/validators.rs
• Python schema tests under chromadb/test/api/

This summary was automatically generated by @propel-code-bot

@github-actions
Copy link

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of a unexpectedly high quality (Readability, Modularity, Intuitiveness)

.into(),
));
}
if svit.config.source_key.is_some() && svit.config.embedding_function.is_none() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[CriticalError]

Validation logic discrepancy between Python and Rust implementations:

Python validation (line 2053-2055):

if (config.source_key is not None
        and config.embedding_function is None
        and config.bm25 is not True):

Rust validation (line 2284):

if svit.config.source_key.is_some() && svit.config.embedding_function.is_none() {

The Rust validation is missing the bm25 check entirely. This means:

  • Python: SparseVectorIndexConfig(source_key="text", bm25=True) ✅ passes validation
  • Rust: Same config ❌ fails validation

This will cause runtime failures when the Python client sends valid configs to a Rust server.

Fix the Rust validation:

if svit.config.source_key.is_some() 
    && svit.config.embedding_function.is_none() 
    && !svit.config.bm25.unwrap_or(false) {
Context for Agents
[**CriticalError**]

Validation logic discrepancy between Python and Rust implementations:

**Python validation (line 2053-2055):**
```python
if (config.source_key is not None
        and config.embedding_function is None
        and config.bm25 is not True):
```

**Rust validation (line 2284):**
```rust
if svit.config.source_key.is_some() && svit.config.embedding_function.is_none() {
```

The Rust validation is missing the `bm25` check entirely. This means:
- Python: `SparseVectorIndexConfig(source_key="text", bm25=True)` ✅ passes validation
- Rust: Same config ❌ fails validation

This will cause runtime failures when the Python client sends valid configs to a Rust server.

**Fix the Rust validation:**
```rust
if svit.config.source_key.is_some() 
    && svit.config.embedding_function.is_none() 
    && !svit.config.bm25.unwrap_or(false) {
```

File: rust/types/src/validators.rs
Line: 284

@sanketkedia sanketkedia changed the title Add validation [ENH]: Error if source_key set but no ef Oct 28, 2025
@sanketkedia sanketkedia merged commit 0d3bd2a into main Oct 28, 2025
62 of 63 checks passed
chroma-droid pushed a commit that referenced this pull request Oct 28, 2025
## Description of changes

_Summarize the changes made by this PR._

- Improvements & Bug fixes
- Errors if source_key is set but ef is not supplied. Both client and
server side
- New functionality
  - ...

## Test plan

_How are these changes tested?_
Added unit-test
- [x] Tests pass locally with `pytest` for python, `yarn test` for js,
`cargo test` for rust

## Migration plan
None

## Observability plan
None

## Documentation Changes
None
sanketkedia added a commit that referenced this pull request Oct 28, 2025
sanketkedia added a commit that referenced this pull request Oct 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants