-
Notifications
You must be signed in to change notification settings - Fork 1.9k
[ENH] Put both token id and token str in the statistics #5777
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: rescrv/sparse
Are you sure you want to change the base?
Conversation
Reviewer ChecklistPlease leverage this checklist to ensure your code review is thorough before approving Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
|
|
Expose token id + string in sparse-vector statistics output
Key Changes• Changed enum variant Affected Areas• This summary was automatically generated by @propel-code-bot |
|
|
||
| /// A stable representation of the statistics's value. | ||
| fn stable_value(&self) -> String { | ||
| fn stable_value1(&self) -> String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[BestPractice]
Function naming inconsistency: The new methods stable_value1() and stable_value2() use numeric suffixes which is not a clear naming convention. Consider using descriptive names like stable_numeric_value() and stable_token_value() to clarify their purpose:
| fn stable_value1(&self) -> String { | |
| /// A stable representation of the statistics's numeric value. | |
| fn stable_numeric_value(&self) -> String { |
This would make the API more self-documenting and easier to understand for future developers.
⚡ Committable suggestion
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
Context for Agents
[**BestPractice**]
**Function naming inconsistency**: The new methods `stable_value1()` and `stable_value2()` use numeric suffixes which is not a clear naming convention. Consider using descriptive names like `stable_numeric_value()` and `stable_token_value()` to clarify their purpose:
```suggestion
/// A stable representation of the statistics's numeric value.
fn stable_numeric_value(&self) -> String {
```
This would make the API more self-documenting and easier to understand for future developers.
⚡ **Committable suggestion**
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
File: rust/worker/src/execution/functions/statistics.rs
Line: 95| /// Separate so display repr can change. | ||
| fn stable_string(&self) -> String { | ||
| format!("{}:{}", self.type_prefix(), self.stable_value()) | ||
| fn stable_string1(&self) -> String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[BestPractice]
Inconsistent method naming pattern: stable_string1() follows the same problematic numeric suffix pattern. Consider renaming to stable_string_representation() or similar:
| fn stable_string1(&self) -> String { | |
| fn stable_string_representation(&self) -> String { |
⚡ Committable suggestion
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
Context for Agents
[**BestPractice**]
**Inconsistent method naming pattern**: `stable_string1()` follows the same problematic numeric suffix pattern. Consider renaming to `stable_string_representation()` or similar:
```suggestion
fn stable_string_representation(&self) -> String {
```
⚡ **Committable suggestion**
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
File: rust/worker/src/execution/functions/statistics.rs
Line: 124| impl std::fmt::Display for StatisticsValue { | ||
| fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { | ||
| write!(f, "{}", self.stable_string()) | ||
| write!(f, "{}", self.stable_string1()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[BestPractice]
Method call inconsistency: This Display implementation calls stable_string1() but the original called stable_string(). This creates inconsistent behavior for existing users of the Display trait. Consider maintaining backward compatibility or clearly documenting this breaking change:
| write!(f, "{}", self.stable_string1()) | |
| write!(f, "{}", self.stable_string_representation()) |
⚡ Committable suggestion
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
Context for Agents
[**BestPractice**]
**Method call inconsistency**: This Display implementation calls `stable_string1()` but the original called `stable_string()`. This creates inconsistent behavior for existing users of the Display trait. Consider maintaining backward compatibility or clearly documenting this breaking change:
```suggestion
write!(f, "{}", self.stable_string_representation())
```
⚡ **Committable suggestion**
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
File: rust/worker/src/execution/functions/statistics.rs
Line: 160| let stable_string = stats_value.stable_string(); | ||
| let record_id = format!("{key}::{stable_string}"); | ||
| let document = format!("statistics about {key} for {stable_string}"); | ||
| let stable_value1 = stats_value.stable_value1(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[BestPractice]
Variable naming inconsistency: Using stable_value1 and stable_string1 with numeric suffixes makes the code less readable. Consider using descriptive names that align with the new method names:
| let stable_value1 = stats_value.stable_value1(); | |
| let stable_numeric_value = stats_value.stable_numeric_value(); | |
| let stable_string_repr = stats_value.stable_string_representation(); | |
| let record_id = format!("{key}::{stable_string_repr}"); | |
| let document = format!("statistics about {key} for {stable_string_repr}"); |
⚡ Committable suggestion
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
Context for Agents
[**BestPractice**]
**Variable naming inconsistency**: Using `stable_value1` and `stable_string1` with numeric suffixes makes the code less readable. Consider using descriptive names that align with the new method names:
```suggestion
let stable_numeric_value = stats_value.stable_numeric_value();
let stable_string_repr = stats_value.stable_string_representation();
let record_id = format!("{key}::{stable_string_repr}");
let document = format!("statistics about {key} for {stable_string_repr}");
```
⚡ **Committable suggestion**
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
File: rust/worker/src/execution/functions/statistics.rs
Line: 234| UpdateMetadataValue::Str(stats_value.stable_type().to_string()), | ||
| ); | ||
| metadata.insert("value".to_string(), UpdateMetadataValue::Str(stable_value)); | ||
| metadata.insert("value".to_string(), UpdateMetadataValue::Str(stable_value1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[BestPractice]
Variable naming continues the pattern: Same issue with variable naming consistency. The numeric suffix pattern should be replaced with descriptive names:
| metadata.insert("value".to_string(), UpdateMetadataValue::Str(stable_value1)); | |
| metadata.insert("value".to_string(), UpdateMetadataValue::Str(stable_numeric_value)); | |
| if let Some(stable_token_value) = stats_value.stable_token_value() { |
⚡ Committable suggestion
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
Context for Agents
[**BestPractice**]
**Variable naming continues the pattern**: Same issue with variable naming consistency. The numeric suffix pattern should be replaced with descriptive names:
```suggestion
metadata.insert("value".to_string(), UpdateMetadataValue::Str(stable_numeric_value));
if let Some(stable_token_value) = stats_value.stable_token_value() {
```
⚡ **Committable suggestion**
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
File: rust/worker/src/execution/functions/statistics.rs
Line: 246| MetadataValue::SparseVector(sparse) => { | ||
| if let Some(tokens) = sparse.tokens.as_ref() { | ||
| sparse | ||
| .indices | ||
| .iter() | ||
| .zip(tokens.iter()) | ||
| .map(|(index, token)| { | ||
| StatisticsValue::SparseVector(*index, Some(token.clone())) | ||
| }) | ||
| .collect() | ||
| } else { | ||
| sparse | ||
| .indices | ||
| .iter() | ||
| .map(|index| StatisticsValue::SparseVector(*index, None)) | ||
| .collect() | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[BestPractice]
This block can be simplified to avoid code duplication by using enumerate and map on the tokens option. This makes the logic more concise and easier to maintain.
Suggested Change
| MetadataValue::SparseVector(sparse) => { | |
| if let Some(tokens) = sparse.tokens.as_ref() { | |
| sparse | |
| .indices | |
| .iter() | |
| .zip(tokens.iter()) | |
| .map(|(index, token)| { | |
| StatisticsValue::SparseVector(*index, Some(token.clone())) | |
| }) | |
| .collect() | |
| } else { | |
| sparse | |
| .indices | |
| .iter() | |
| .map(|index| StatisticsValue::SparseVector(*index, None)) | |
| .collect() | |
| } | |
| } | |
| MetadataValue::SparseVector(sparse) => sparse | |
| .indices | |
| .iter() | |
| .enumerate() | |
| .map(|(i, index)| { | |
| let token = sparse.tokens.as_ref().map(|tokens| tokens[i].clone()); | |
| StatisticsValue::SparseVector(*index, token) | |
| }) | |
| .collect(), |
⚡ Committable suggestion
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
Context for Agents
[**BestPractice**]
This block can be simplified to avoid code duplication by using `enumerate` and `map` on the `tokens` option. This makes the logic more concise and easier to maintain.
<details>
<summary>Suggested Change</summary>
```suggestion
MetadataValue::SparseVector(sparse) => sparse
.indices
.iter()
.enumerate()
.map(|(i, index)| {
let token = sparse.tokens.as_ref().map(|tokens| tokens[i].clone());
StatisticsValue::SparseVector(*index, token)
})
.collect(),
```
⚡ **Committable suggestion**
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
</details>
File: rust/worker/src/execution/functions/statistics.rs
Line: 153
This comment has been minimized.
This comment has been minimized.
a2ec476 to
a0b964a
Compare
| /// A stable representation of the statistics's value. | ||
| fn stable_value2(&self) -> Option<String> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[BestPractice]
As part of the renaming for clarity, stable_value2 should also be renamed. This also includes an improved doc comment to explain what this value represents.
Context for Agents
[**BestPractice**]
As part of the renaming for clarity, `stable_value2` should also be renamed. This also includes an improved doc comment to explain what this value represents.
File: rust/worker/src/execution/functions/statistics.rs
Line: 112There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with these. How about stable_value_expanded?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naming should be improved
Description of changes
We need to put both tokens in both numeric and string form into the
outputs for statistics functions.
Test plan
CI
Migration plan
N/A
Observability plan
N/A
Documentation Changes
N/A