Skip to content

Conversation

@rescrv
Copy link
Contributor

@rescrv rescrv commented Oct 30, 2025

Description of changes

We need to put both tokens in both numeric and string form into the
outputs for statistics functions.

Test plan

CI

Migration plan

N/A

Observability plan

N/A

Documentation Changes

N/A

@github-actions
Copy link

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of a unexpectedly high quality (Readability, Modularity, Intuitiveness)

@propel-code-bot
Copy link
Contributor

propel-code-bot bot commented Oct 30, 2025

Expose token id + string in sparse-vector statistics output

StatisticsValue::SparseVector is expanded from u32 to (u32, Option<String>) so both the numeric index and its corresponding token string (when available) propagate through the statistics pipeline. Supporting code is updated to convert, format, hash, compare, and emit the new compound value, and the statistics executor now writes an additional metadata field value2 that contains the token string.

Key Changes

• Changed enum variant StatisticsValue::SparseVector to SparseVector(u32, Option<String>)
• Added helper methods stable_value1(), stable_value2() and stable_string1() used for numeric and string representations
• Updated stable_type, hashing, equality, and Display impls to accommodate the two-part sparse vector value
• Reworked conversion from MetadataValue::SparseVector to pair indices with optional tokens; adds zip logic when tokens list present
• Executor now stores value (numeric id) and optional value2 (token string) in output metadata; record id & document strings now use new stable format

Affected Areas

rust/worker/src/execution/functions/statistics.rs (all statistic logic, executor, unit tests)

This summary was automatically generated by @propel-code-bot


/// A stable representation of the statistics's value.
fn stable_value(&self) -> String {
fn stable_value1(&self) -> String {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BestPractice]

Function naming inconsistency: The new methods stable_value1() and stable_value2() use numeric suffixes which is not a clear naming convention. Consider using descriptive names like stable_numeric_value() and stable_token_value() to clarify their purpose:

Suggested change
fn stable_value1(&self) -> String {
/// A stable representation of the statistics's numeric value.
fn stable_numeric_value(&self) -> String {

This would make the API more self-documenting and easier to understand for future developers.

Committable suggestion

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Context for Agents
[**BestPractice**]

**Function naming inconsistency**: The new methods `stable_value1()` and `stable_value2()` use numeric suffixes which is not a clear naming convention. Consider using descriptive names like `stable_numeric_value()` and `stable_token_value()` to clarify their purpose:

```suggestion
    /// A stable representation of the statistics's numeric value.
    fn stable_numeric_value(&self) -> String {
```

This would make the API more self-documenting and easier to understand for future developers.

⚡ **Committable suggestion**

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

File: rust/worker/src/execution/functions/statistics.rs
Line: 95

/// Separate so display repr can change.
fn stable_string(&self) -> String {
format!("{}:{}", self.type_prefix(), self.stable_value())
fn stable_string1(&self) -> String {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BestPractice]

Inconsistent method naming pattern: stable_string1() follows the same problematic numeric suffix pattern. Consider renaming to stable_string_representation() or similar:

Suggested change
fn stable_string1(&self) -> String {
fn stable_string_representation(&self) -> String {

Committable suggestion

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Context for Agents
[**BestPractice**]

**Inconsistent method naming pattern**: `stable_string1()` follows the same problematic numeric suffix pattern. Consider renaming to `stable_string_representation()` or similar:

```suggestion
    fn stable_string_representation(&self) -> String {
```

⚡ **Committable suggestion**

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

File: rust/worker/src/execution/functions/statistics.rs
Line: 124

impl std::fmt::Display for StatisticsValue {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "{}", self.stable_string())
write!(f, "{}", self.stable_string1())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BestPractice]

Method call inconsistency: This Display implementation calls stable_string1() but the original called stable_string(). This creates inconsistent behavior for existing users of the Display trait. Consider maintaining backward compatibility or clearly documenting this breaking change:

Suggested change
write!(f, "{}", self.stable_string1())
write!(f, "{}", self.stable_string_representation())

Committable suggestion

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Context for Agents
[**BestPractice**]

**Method call inconsistency**: This Display implementation calls `stable_string1()` but the original called `stable_string()`. This creates inconsistent behavior for existing users of the Display trait. Consider maintaining backward compatibility or clearly documenting this breaking change:

```suggestion
        write!(f, "{}", self.stable_string_representation())
```

⚡ **Committable suggestion**

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

File: rust/worker/src/execution/functions/statistics.rs
Line: 160

let stable_string = stats_value.stable_string();
let record_id = format!("{key}::{stable_string}");
let document = format!("statistics about {key} for {stable_string}");
let stable_value1 = stats_value.stable_value1();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BestPractice]

Variable naming inconsistency: Using stable_value1 and stable_string1 with numeric suffixes makes the code less readable. Consider using descriptive names that align with the new method names:

Suggested change
let stable_value1 = stats_value.stable_value1();
let stable_numeric_value = stats_value.stable_numeric_value();
let stable_string_repr = stats_value.stable_string_representation();
let record_id = format!("{key}::{stable_string_repr}");
let document = format!("statistics about {key} for {stable_string_repr}");

Committable suggestion

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Context for Agents
[**BestPractice**]

**Variable naming inconsistency**: Using `stable_value1` and `stable_string1` with numeric suffixes makes the code less readable. Consider using descriptive names that align with the new method names:

```suggestion
                let stable_numeric_value = stats_value.stable_numeric_value();
                let stable_string_repr = stats_value.stable_string_representation();
                let record_id = format!("{key}::{stable_string_repr}");
                let document = format!("statistics about {key} for {stable_string_repr}");
```

⚡ **Committable suggestion**

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

File: rust/worker/src/execution/functions/statistics.rs
Line: 234

UpdateMetadataValue::Str(stats_value.stable_type().to_string()),
);
metadata.insert("value".to_string(), UpdateMetadataValue::Str(stable_value));
metadata.insert("value".to_string(), UpdateMetadataValue::Str(stable_value1));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BestPractice]

Variable naming continues the pattern: Same issue with variable naming consistency. The numeric suffix pattern should be replaced with descriptive names:

Suggested change
metadata.insert("value".to_string(), UpdateMetadataValue::Str(stable_value1));
metadata.insert("value".to_string(), UpdateMetadataValue::Str(stable_numeric_value));
if let Some(stable_token_value) = stats_value.stable_token_value() {

Committable suggestion

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Context for Agents
[**BestPractice**]

**Variable naming continues the pattern**: Same issue with variable naming consistency. The numeric suffix pattern should be replaced with descriptive names:

```suggestion
                metadata.insert("value".to_string(), UpdateMetadataValue::Str(stable_numeric_value));
                if let Some(stable_token_value) = stats_value.stable_token_value() {
```

⚡ **Committable suggestion**

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

File: rust/worker/src/execution/functions/statistics.rs
Line: 246

Comment on lines +136 to +153
MetadataValue::SparseVector(sparse) => {
if let Some(tokens) = sparse.tokens.as_ref() {
sparse
.indices
.iter()
.zip(tokens.iter())
.map(|(index, token)| {
StatisticsValue::SparseVector(*index, Some(token.clone()))
})
.collect()
} else {
sparse
.indices
.iter()
.map(|index| StatisticsValue::SparseVector(*index, None))
.collect()
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BestPractice]

This block can be simplified to avoid code duplication by using enumerate and map on the tokens option. This makes the logic more concise and easier to maintain.

Suggested Change
Suggested change
MetadataValue::SparseVector(sparse) => {
if let Some(tokens) = sparse.tokens.as_ref() {
sparse
.indices
.iter()
.zip(tokens.iter())
.map(|(index, token)| {
StatisticsValue::SparseVector(*index, Some(token.clone()))
})
.collect()
} else {
sparse
.indices
.iter()
.map(|index| StatisticsValue::SparseVector(*index, None))
.collect()
}
}
MetadataValue::SparseVector(sparse) => sparse
.indices
.iter()
.enumerate()
.map(|(i, index)| {
let token = sparse.tokens.as_ref().map(|tokens| tokens[i].clone());
StatisticsValue::SparseVector(*index, token)
})
.collect(),

Committable suggestion

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Context for Agents
[**BestPractice**]

This block can be simplified to avoid code duplication by using `enumerate` and `map` on the `tokens` option. This makes the logic more concise and easier to maintain.
<details>
<summary>Suggested Change</summary>

```suggestion
            MetadataValue::SparseVector(sparse) => sparse
                .indices
                .iter()
                .enumerate()
                .map(|(i, index)| {
                    let token = sparse.tokens.as_ref().map(|tokens| tokens[i].clone());
                    StatisticsValue::SparseVector(*index, token)
                })
                .collect(),
```

⚡ **Committable suggestion**

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

</details>

File: rust/worker/src/execution/functions/statistics.rs
Line: 153

@blacksmith-sh

This comment has been minimized.

@rescrv rescrv force-pushed the rescrv/use-sparse-vector-tokens branch from a2ec476 to a0b964a Compare October 31, 2025 19:44
Comment on lines +111 to +112
/// A stable representation of the statistics's value.
fn stable_value2(&self) -> Option<String> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BestPractice]

As part of the renaming for clarity, stable_value2 should also be renamed. This also includes an improved doc comment to explain what this value represents.

Context for Agents
[**BestPractice**]

As part of the renaming for clarity, `stable_value2` should also be renamed. This also includes an improved doc comment to explain what this value represents.

File: rust/worker/src/execution/functions/statistics.rs
Line: 112

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with these. How about stable_value_expanded?

@rescrv rescrv requested a review from tanujnay112 October 31, 2025 20:05
Copy link
Contributor

@tanujnay112 tanujnay112 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming should be improved

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants