75 changes: 55 additions & 20 deletions rust/worker/src/execution/functions/statistics.rs
@@ -65,7 +65,7 @@ enum StatisticsValue {
/// String metadata value associated with a record.
Str(String),
/// Sparse vector index observed in metadata.
- SparseVector(u32),
+ SparseVector(u32, Option<String>),
}

impl StatisticsValue {
@@ -76,7 +76,7 @@ impl StatisticsValue {
Self::Int(_) => "int",
Self::Float(_) => "float",
Self::Str(_) => "str",
- Self::SparseVector(_) => "sparse",
+ Self::SparseVector(_, _) => "sparse",
}
}

@@ -87,12 +87,12 @@ impl StatisticsValue {
Self::Int(_) => "i",
Self::Float(_) => "f",
Self::Str(_) => "s",
- Self::SparseVector(_) => "sv",
+ Self::SparseVector(_, _) => "sv",
}
}

/// A stable representation of the statistics's value.
- fn stable_value(&self) -> String {
+ fn stable_value1(&self) -> String {
Contributor review comment:

[**BestPractice**]

**Function naming inconsistency**: The new methods `stable_value1()` and `stable_value2()` use numeric suffixes, which do not say what each method returns. Consider descriptive names such as `stable_numeric_value()` and `stable_token_value()` to clarify their purpose:

```suggestion
    /// A stable representation of the statistics's numeric value.
    fn stable_numeric_value(&self) -> String {
```

This would make the API more self-documenting and easier to understand for future developers.

File: rust/worker/src/execution/functions/statistics.rs
Line: 95

match self {
Self::Bool(b) => {
format!("{b}")
@@ -102,16 +102,27 @@ impl StatisticsValue {
}
Self::Str(s) => s.clone(),
Self::Float(f) => format!("{f:.16e}"),
- Self::SparseVector(index) => {
+ Self::SparseVector(index, _) => {
format!("{index}")
}
}
}

+ /// A stable representation of the statistics's value.
+ fn stable_value2(&self) -> Option<String> {
Comment on lines +111 to +112
Contributor review comment:

[**BestPractice**]

As part of the renaming for clarity, `stable_value2` should also be renamed. This also includes an improved doc comment to explain what this value represents.

File: rust/worker/src/execution/functions/statistics.rs
Line: 112

Contributor reply:

Agree with these. How about `stable_value_expanded`?

+ match self {
+ Self::Bool(_) => None,
+ Self::Int(_) => None,
+ Self::Str(_) => None,
+ Self::Float(_) => None,
+ Self::SparseVector(_, token) => token.clone(),
+ }
+ }

/// A stable string representation of a statistics value with type tag.
/// Separate so display repr can change.
- fn stable_string(&self) -> String {
- format!("{}:{}", self.type_prefix(), self.stable_value())
+ fn stable_string1(&self) -> String {
Contributor review comment:

[**BestPractice**]

**Inconsistent method naming pattern**: `stable_string1()` follows the same problematic numeric suffix pattern. Consider renaming to `stable_string_representation()` or similar:

```suggestion
    fn stable_string_representation(&self) -> String {
```

File: rust/worker/src/execution/functions/statistics.rs
Line: 124

+ format!("{}:{}", self.type_prefix(), self.stable_value1())
}

/// Convert MetadataValue to a vector of StatisticsValue.
@@ -122,18 +133,31 @@ impl StatisticsValue {
MetadataValue::Int(i) => vec![StatisticsValue::Int(*i)],
MetadataValue::Float(f) => vec![StatisticsValue::Float(*f)],
MetadataValue::Str(s) => vec![StatisticsValue::Str(s.clone())],
- MetadataValue::SparseVector(sparse) => sparse
- .indices
- .iter()
- .map(|index| StatisticsValue::SparseVector(*index))
- .collect(),
+ MetadataValue::SparseVector(sparse) => {
+ if let Some(tokens) = sparse.tokens.as_ref() {
+ sparse
+ .indices
+ .iter()
+ .zip(tokens.iter())
+ .map(|(index, token)| {
+ StatisticsValue::SparseVector(*index, Some(token.clone()))
+ })
+ .collect()
+ } else {
+ sparse
+ .indices
+ .iter()
+ .map(|index| StatisticsValue::SparseVector(*index, None))
+ .collect()
+ }
+ }
Comment on lines +136 to +153
Contributor review comment:

[**BestPractice**]

This block can be simplified to avoid code duplication by using `enumerate` and `map` on the `tokens` option. This makes the logic more concise and easier to maintain.
<details>
<summary>Suggested Change</summary>

```suggestion
            MetadataValue::SparseVector(sparse) => sparse
                .indices
                .iter()
                .enumerate()
                .map(|(i, index)| {
                    let token = sparse.tokens.as_ref().map(|tokens| tokens[i].clone());
                    StatisticsValue::SparseVector(*index, token)
                })
                .collect(),
```

</details>

File: rust/worker/src/execution/functions/statistics.rs
Line: 153
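
The two forms are not equivalent when the lengths differ: `zip` stops at the shorter of `indices` and `tokens`, while `tokens[i]` panics if `tokens` is shorter than `indices`. The diff does not say whether the lengths are guaranteed to match. Below is a minimal sketch of a third option that keeps every index and falls back to `None`. The `SparseVector` struct and the `pair_indices_with_tokens` helper are hypothetical stand-ins, not code from the PR, and plain tuples stand in for `StatisticsValue::SparseVector`.

```rust
/// Hypothetical stand-in for the sparse vector carried by
/// `MetadataValue::SparseVector`; only the two fields the diff touches are modeled.
pub struct SparseVector {
    pub indices: Vec<u32>,
    pub tokens: Option<Vec<String>>,
}

/// Pair each index with its token, if one exists at the same position.
/// `get(i)` returns `None` instead of panicking when `tokens` is shorter
/// than `indices`, so no index is dropped and no out-of-bounds access occurs.
pub fn pair_indices_with_tokens(sparse: &SparseVector) -> Vec<(u32, Option<String>)> {
    sparse
        .indices
        .iter()
        .enumerate()
        .map(|(i, index)| {
            let token = sparse
                .tokens
                .as_ref()
                .and_then(|tokens| tokens.get(i))
                .cloned();
            (*index, token)
        })
        .collect()
}
```

If the lengths are in fact always equal, all three forms produce the same output and the choice is purely stylistic.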

}
}
}

impl std::fmt::Display for StatisticsValue {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
- write!(f, "{}", self.stable_string())
+ write!(f, "{}", self.stable_string1())
Contributor review comment:

[**BestPractice**]

**Method call inconsistency**: This Display implementation calls `stable_string1()` but the original called `stable_string()`. This creates inconsistent behavior for existing users of the Display trait. Consider maintaining backward compatibility or clearly documenting this breaking change:

```suggestion
        write!(f, "{}", self.stable_string_representation())
```

File: rust/worker/src/execution/functions/statistics.rs
Line: 160

}
}
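
To make the `Display` delegation concrete, here is a minimal sketch of the `prefix:value` format built from the arms shown in the diff. The `Value` enum is a simplified stand-in with only two variants, and its method names are illustrative rather than the names the PR settles on.

```rust
use std::fmt;

/// Simplified stand-in for the PR's `StatisticsValue`; only two variants are modeled.
pub enum Value {
    Float(f64),
    SparseVector(u32, Option<String>),
}

impl Value {
    /// Short type tag used as the prefix of the stable string.
    fn type_prefix(&self) -> &'static str {
        match self {
            Self::Float(_) => "f",
            Self::SparseVector(_, _) => "sv",
        }
    }

    /// Value part of the stable string; the sparse token is ignored here,
    /// as in the diff's `SparseVector(index, _)` arm.
    fn stable_value(&self) -> String {
        match self {
            Self::Float(f) => format!("{f:.16e}"),
            Self::SparseVector(index, _) => format!("{index}"),
        }
    }

    /// Type tag and value joined by a colon.
    pub fn stable_string(&self) -> String {
        format!("{}:{}", self.type_prefix(), self.stable_value())
    }
}

impl fmt::Display for Value {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Display simply forwards to the stable string.
        write!(f, "{}", self.stable_string())
    }
}
```

In this sketch, `Value::SparseVector(42, Some("cat".to_string())).to_string()` yields `sv:42`, with or without a token. The printed form depends only on what the delegated method returns, not on what that method is called.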

@@ -144,7 +168,9 @@ impl PartialEq for StatisticsValue {
(Self::Int(lhs), Self::Int(rhs)) => lhs == rhs,
(Self::Float(lhs), Self::Float(rhs)) => lhs.to_bits() == rhs.to_bits(),
(Self::Str(lhs), Self::Str(rhs)) => lhs == rhs,
- (Self::SparseVector(lhs), Self::SparseVector(rhs)) => lhs == rhs,
+ (Self::SparseVector(lhs1, lhs2), Self::SparseVector(rhs1, rhs2)) => {
+ lhs1 == rhs1 && lhs2 == rhs2
+ }
_ => false,
}
}
@@ -160,7 +186,10 @@ impl Hash for StatisticsValue {
StatisticsValue::Int(value) => value.hash(state),
StatisticsValue::Float(value) => value.to_bits().hash(state),
StatisticsValue::Str(value) => value.hash(state),
- StatisticsValue::SparseVector(value) => value.hash(state),
+ StatisticsValue::SparseVector(value, token) => {
+ value.hash(state);
+ token.hash(state);
+ }
}
}
}
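
The `PartialEq` and `Hash` arms above both go through `to_bits()` for floats, which keeps the two traits consistent with each other and lets a float-bearing value act as a hash map key. The sketch below isolates that pattern in a hypothetical `BitwiseF64` wrapper with a hypothetical `count_values` helper (neither is in the PR). Comparing bit patterns means identical `NaN` values compare equal, while `0.0` and `-0.0` are distinct.

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical wrapper (not part of the PR) that compares and hashes an
/// `f64` by its raw bit pattern.
#[derive(Debug, Clone, Copy)]
pub struct BitwiseF64(pub f64);

impl PartialEq for BitwiseF64 {
    fn eq(&self, other: &Self) -> bool {
        self.0.to_bits() == other.0.to_bits()
    }
}

// Bitwise equality is reflexive even for NaN, so `Eq` is sound here.
impl Eq for BitwiseF64 {}

impl Hash for BitwiseF64 {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hash the same bits that `eq` compares, so equal values hash equally.
        self.0.to_bits().hash(state);
    }
}

/// Count how often each distinct bit pattern occurs.
pub fn count_values(values: &[f64]) -> HashMap<BitwiseF64, usize> {
    let mut counts = HashMap::new();
    for v in values {
        *counts.entry(BitwiseF64(*v)).or_insert(0) += 1;
    }
    counts
}
```

The same consistency rule applies to the new sparse arm: the token is part of equality, so it also has to be fed into the hasher, which the diff does.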
@@ -202,10 +231,10 @@ impl AttachedFunctionExecutor for StatisticsFunctionExecutor {
let mut records = Vec::with_capacity(counts.len());
for (key, inner_map) in counts.into_iter() {
for (stats_value, count) in inner_map.into_iter() {
- let stable_value = stats_value.stable_value();
- let stable_string = stats_value.stable_string();
- let record_id = format!("{key}::{stable_string}");
- let document = format!("statistics about {key} for {stable_string}");
+ let stable_value1 = stats_value.stable_value1();
Contributor review comment:

[**BestPractice**]

**Variable naming inconsistency**: Using `stable_value1` and `stable_string1` with numeric suffixes makes the code less readable. Consider using descriptive names that align with the new method names:

```suggestion
                let stable_numeric_value = stats_value.stable_numeric_value();
                let stable_string_repr = stats_value.stable_string_representation();
                let record_id = format!("{key}::{stable_string_repr}");
                let document = format!("statistics about {key} for {stable_string_repr}");
```

File: rust/worker/src/execution/functions/statistics.rs
Line: 234

+ let stable_string1 = stats_value.stable_string1();
+ let record_id = format!("{key}::{stable_string1}");
+ let document = format!("statistics about {key} for {stable_string1}");

let mut metadata = HashMap::with_capacity(4);
metadata.insert("count".to_string(), count.output());
@@ -214,7 +243,13 @@ impl AttachedFunctionExecutor for StatisticsFunctionExecutor {
"type".to_string(),
UpdateMetadataValue::Str(stats_value.stable_type().to_string()),
);
- metadata.insert("value".to_string(), UpdateMetadataValue::Str(stable_value));
+ metadata.insert("value".to_string(), UpdateMetadataValue::Str(stable_value1));
Contributor review comment:

[**BestPractice**]

**Variable naming continues the pattern**: Same issue with variable naming consistency. The numeric suffix pattern should be replaced with descriptive names:

```suggestion
                metadata.insert("value".to_string(), UpdateMetadataValue::Str(stable_numeric_value));
                if let Some(stable_token_value) = stats_value.stable_token_value() {
```

File: rust/worker/src/execution/functions/statistics.rs
Line: 246

+ if let Some(stable_value2) = stats_value.stable_value2() {
+ metadata.insert(
+ "value2".to_string(),
+ UpdateMetadataValue::Str(stable_value2),
+ );
+ }
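
The hunk above emits `"value2"` only when a token is present, while the record id is still built from the stable string, which ignores the token. A minimal sketch of both pieces follows, using a hypothetical `SparseStat` struct and plain `String` metadata values in place of `UpdateMetadataValue` (both are simplifications, not the PR's types).

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for a sparse-vector statistics value.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct SparseStat {
    pub index: u32,
    pub token: Option<String>,
}

impl SparseStat {
    /// Mirrors the diff's `prefix:value` form; the token is not included.
    pub fn stable_string(&self) -> String {
        format!("sv:{}", self.index)
    }
}

/// Build string metadata for one record, adding "value2" only when a token exists.
pub fn build_metadata(stat: &SparseStat) -> HashMap<String, String> {
    let mut metadata = HashMap::with_capacity(3);
    metadata.insert("type".to_string(), "sparse".to_string());
    metadata.insert("value".to_string(), stat.index.to_string());
    if let Some(token) = &stat.token {
        metadata.insert("value2".to_string(), token.clone());
    }
    metadata
}

/// Record id in the `{key}::{stable_string}` form used by the diff.
pub fn record_id(key: &str, stat: &SparseStat) -> String {
    format!("{key}::{}", stat.stable_string())
}
```

In this sketch, two values with the same index but different tokens are unequal under `PartialEq` and `Hash` yet map to the same record id. Whether one index can actually appear with different tokens is not stated in the diff; if it can, the id format may need to account for the token.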

keys.insert(record_id.clone());
records.push(LogRecord {