Skip to content

Conversation

@chroma-droid
Copy link

This PR cherry-picks the commit 0d3bd2a onto rc/2025-10-24. If there are unresolved conflicts, please resolve them manually.

## Description of changes

_Summarize the changes made by this PR._

- Improvements & Bug fixes
- Errors if source_key is set but ef is not supplied. Both client and
server side
- New functionality
  - ...

## Test plan

_How are these changes tested?_
Added unit-test
- [x] Tests pass locally with `pytest` for python, `yarn test` for js,
`cargo test` for rust

## Migration plan
None

## Observability plan
None

## Documentation Changes
None
@propel-code-bot
Copy link
Contributor

Add strict validation when source_key is set on SparseVectorIndexConfig without embedding function

This hot-fix cherry-picks PR #5751 onto the release branch and introduces a guard that disallows creating a sparse-vector index with source_key specified but no embedding_function (unless bm25=True). Validation is added in both Python (chromadb/api/types.py) and Rust (rust/types/src/validators.rs) code paths, and unit-tests are updated/added to reflect the new rule. Minor whitespace clean-up was also applied to affected test files.

Key Changes

• Created _validate_sparse_vector_config in chromadb/api/types.py and wired it into _set_index_for_key and _enable_all_indexes_for_key.
• Extended Rust validator in rust/types/src/validators.rs to return a schema error on the same condition.
• Added pytest test_sparse_vector_config_requires_ef_with_source_key and patched many existing tests to pass bm25=True where no embedding function is supplied.
• Bumped doc-string comments and removed/added small whitespace in test files.

Affected Areas

chromadb/api/types.py (schema validation logic)
rust/types/src/validators.rs (server-side schema validation)
• Python unit tests in chromadb/test/api/test_schema.py and test_schema_e2e.py

This summary was automatically generated by @propel-code-bot

Comment on lines +284 to +287
if svit.config.source_key.is_some() && svit.config.embedding_function.is_none() {
return Err(ValidationError::new("schema").with_message(
"If source_key is provided then embedding_function must also be provided since there is no default embedding function.".into(),
));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[CriticalError]

The Rust validation logic is inconsistent with the Python implementation. The Rust code at line 284 only checks for embedding_function, but the Python validation on line 2055 also allows bm25=True as an alternative:

if (config.source_key is not None
        and config.embedding_function is None
        and config.bm25 is not True):

The Rust validation should be updated to match:

Suggested Change
Suggested change
if svit.config.source_key.is_some() && svit.config.embedding_function.is_none() {
return Err(ValidationError::new("schema").with_message(
"If source_key is provided then embedding_function must also be provided since there is no default embedding function.".into(),
));
if svit.config.source_key.is_some() && svit.config.embedding_function.is_none() && !svit.config.bm25.unwrap_or(false) {
return Err(ValidationError::new("schema").with_message(
"If source_key is provided then either embedding_function or bm25 must be provided since there is no default embedding function.".into(),
));
}

Committable suggestion

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Context for Agents
[**CriticalError**]

The Rust validation logic is inconsistent with the Python implementation. The Rust code at line 284 only checks for `embedding_function`, but the Python validation on line 2055 also allows `bm25=True` as an alternative:

```python
if (config.source_key is not None
        and config.embedding_function is None
        and config.bm25 is not True):
```

The Rust validation should be updated to match:

<details>
<summary>Suggested Change</summary>

```suggestion
if svit.config.source_key.is_some() && svit.config.embedding_function.is_none() && !svit.config.bm25.unwrap_or(false) {
    return Err(ValidationError::new("schema").with_message(
        "If source_key is provided then either embedding_function or bm25 must be provided since there is no default embedding function.".into(),
    ));
}
```

⚡ **Committable suggestion**

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

</details>

File: rust/types/src/validators.rs
Line: 287

@github-actions
Copy link

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of a unexpectedly high quality (Readability, Modularity, Intuitiveness)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants