refactor(clp-s): Simplify dictionary search for variable string predicates. #1112

gibber9809 · 2025-07-18T21:11:15Z

Description

The existing clp-s code strangely handles search for variable strings containing wildcards by populating a SubQuery using EncodedVariableInterpreter::wildcard_search_dictionary_and_get_encoded_matches and checking the precise and imprecise variables in the resulting SubQuery to populate the set of matching variables for the predicate.

This PR simplifies this logic by directly using the VariableDictionary::get_entries_matching_wildcard_string member function and populating the set of matching variables directly based on the return value.

We also add some tests that exercise the wildcard and non-wildcard search paths for variable strings.

Checklist

The PR satisfies the contribution guidelines.
This is a breaking change and that has been indicated in the PR title, OR this isn't a
breaking change.
Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Added tests to exercise wildcard and non-wildcard variable string matching

Summary by CodeRabbit

New Features
- Enhanced search functionality to better handle wildcard and escaped wildcard queries for variable strings.
Tests
- Added new test cases to validate search behaviour with ambiguous and pattern-like string values.
- Expanded test data to include additional entries for comprehensive wildcard search testing.

coderabbitai · 2025-07-18T21:11:21Z

Walkthrough

The changes update the wildcard variable string search logic in QueryRunner.cpp by removing the use of EncodedVariableInterpreter and replacing it with direct dictionary queries. Corresponding test cases and test data are extended to cover ambiguous wildcard scenarios, specifically for variable strings containing asterisks.

Changes

File(s)	Change Summary
components/core/src/clp_s/search/QueryRunner.cpp	Replaces `EncodedVariableInterpreter` wildcard search with direct dictionary lookups; simplifies logic.
components/core/tests/test-clp_s-search.cpp	Adds test cases for ambiguous wildcard and escaped asterisk queries on `ambiguous_varstring` field.
components/core/tests/test_log_files/test_search.jsonl	Adds three new entries with `ambiguous_varstring` values to support new test cases.

Sequence Diagram(s)

sequenceDiagram
    participant QueryRunner
    participant VarDictionary

    QueryRunner->>VarDictionary: get_entries_matching_wildcard_string(query_string)
    VarDictionary-->>QueryRunner: matching_var_ids
    QueryRunner->>QueryRunner: Insert matching_var_ids into matching_vars set

Possibly related PRs

refactor(clp-s): Replace clp_s::StringUtils function calls with clp::string_utils equivalents and remove redundant implementations. #1103: Refactors string utility usage in QueryRunner.cpp, which is related at the utility usage level to this PR's changes in wildcard matching logic.

Suggested reviewers

wraymo

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Cppcheck (2.17.1)

components/core/src/clp_s/search/QueryRunner.cpp

/bin/bash: line 1: /usr/bin/cppcheck: No such file or directory

components/core/tests/test-clp_s-search.cpp

/bin/bash: line 1: /usr/bin/cppcheck: No such file or directory

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8fc0dd7 and 0ec4bc6.

📒 Files selected for processing (3)

components/core/src/clp_s/search/QueryRunner.cpp (2 hunks)
components/core/tests/test-clp_s-search.cpp (1 hunks)
components/core/tests/test_log_files/test_search.jsonl (1 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

Instructions used from:

Sources:
⚙️ CodeRabbit Configuration File

🧠 Learnings (4)

📓 Common learnings

Learnt from: LinZhihao-723
PR: y-scope/clp#882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:378-402
Timestamp: 2025-05-07T16:56:35.687Z
Learning: In the CLP search component, the `evaluate_wildcard_filter` function should return `AstEvaluationResult::Pruned` when `node_id_value_pairs` is empty, not `AstEvaluationResult::False`. Empty node sets should be treated as "undetermined" rather than definitive non-matches.

Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp:320-324
Timestamp: 2025-05-05T01:12:18.561Z
Learning: In the CLP codebase, the `m_case_sensitive_search` flag is used only for actual string value comparisons during query evaluation, not for schema key name matching. Schema keys are always compared case-sensitively regardless of this flag's setting.

Learnt from: LinZhihao-723
PR: y-scope/clp#882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp:164-166
Timestamp: 2025-05-11T22:22:49.286Z
Learning: For the `QueryHandlerImpl` class, reusing member variables like `m_ast_dfs_stack` is preferred over making methods `const` and moving state to function scope to avoid allocation/deallocation overhead, especially for frequently called methods like `evaluate_kv_pair_log_event`.

components/core/tests/test_log_files/test_search.jsonl (1)

Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:14-14
Timestamp: 2024-10-14T03:42:10.355Z
Learning: In the file `components/core/tests/test-ffi_KeyValuePairLogEvent.cpp`, including `<json/single_include/nlohmann/json.hpp>` is consistent with the project's coding standards.

components/core/tests/test-clp_s-search.cpp (5)

Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.

Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-clp_s-end_to_end.cpp:109-110
Timestamp: 2024-11-29T22:50:17.206Z
Learning: In `components/core/tests/test-clp_s-end_to_end.cpp`, the success of `constructor.store()` is verified through `REQUIRE` statements and subsequent comparisons.

Learnt from: LinZhihao-723
PR: y-scope/clp#882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:378-402
Timestamp: 2025-05-07T16:56:35.687Z
Learning: In the CLP search component, the `evaluate_wildcard_filter` function should return `AstEvaluationResult::Pruned` when `node_id_value_pairs` is empty, not `AstEvaluationResult::False`. Empty node sets should be treated as "undetermined" rather than definitive non-matches.

Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp:320-324
Timestamp: 2025-05-05T01:12:18.561Z
Learning: In the CLP codebase, the `m_case_sensitive_search` flag is used only for actual string value comparisons during query evaluation, not for schema key name matching. Schema keys are always compared case-sensitively regardless of this flag's setting.

Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-clp_s-end_to_end.cpp:141-142
Timestamp: 2024-11-29T22:51:00.355Z
Learning: In 'components/core/tests/test-clp_s-end_to_end.cpp', using `WIFEXITED` and `WEXITSTATUS` with the return value of `std::system` is acceptable for checking the exit status of system commands, as per the C++ documentation.

components/core/src/clp_s/search/QueryRunner.cpp (3)

Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:148-157
Timestamp: 2025-05-02T22:27:59.347Z
Learning: In the `QueryHandlerImpl.cpp` file, the `unique_projected_columns` set (using `std::string_view`) is intentionally designed to only check for duplications within the local scope of the `create_projected_columns_and_projection_map` function. The team decided this is an acceptable use of `std::string_view` in a container since the referenced strings remain valid throughout the function's execution.

Learnt from: LinZhihao-723
PR: y-scope/clp#882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:378-402
Timestamp: 2025-05-07T16:56:35.687Z
Learning: In the CLP search component, the `evaluate_wildcard_filter` function should return `AstEvaluationResult::Pruned` when `node_id_value_pairs` is empty, not `AstEvaluationResult::False`. Empty node sets should be treated as "undetermined" rather than definitive non-matches.

Learnt from: LinZhihao-723
PR: y-scope/clp#856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:258-266
Timestamp: 2025-04-26T02:21:22.021Z
Learning: In the clp::ffi::ir_stream::search namespace, the design principle is that callers are responsible for type checking, not the called functions. If control flow reaches a function, input types should already be validated by the caller.

🔇 Additional comments (4)

components/core/tests/test_log_files/test_search.jsonl (1)

11-13: LGTM! Well-structured test data additions.

The new test entries provide comprehensive coverage for wildcard matching scenarios, including regular strings, shorter matches, and literal asterisk cases. This properly supports the new test cases for both wildcard ("a*e") and escaped wildcard ("a\*e") patterns.

components/core/tests/test-clp_s-search.cpp (1)

224-225: LGTM! Excellent test coverage for wildcard matching.

The new test cases effectively validate both wildcard and escaped wildcard functionality:

"a*e" correctly expects matches for all three patterns (indices 10, 11, 12)

"a\*e" correctly expects only the literal asterisk match (index 12)

This provides comprehensive coverage for the refactored wildcard matching logic in QueryRunner.cpp.

components/core/src/clp_s/search/QueryRunner.cpp (2)

870-870: LGTM! Removed unused variable.

The SubQuery sub_query; variable was declared but never used. Good cleanup.

902-910: LGTM! Clean simplification of wildcard matching logic.

The refactoring successfully simplifies the wildcard search by:

Replacing the indirect approach via EncodedVariableInterpreter::wildcard_search_dictionary_and_get_encoded_matches with direct dictionary access

Using m_var_dict->get_entries_matching_wildcard_string to directly retrieve matching entries

Streamlining the population of matching_vars by iterating through the dictionary results

This maintains the same functionality while reducing complexity and improving code clarity.

✨ Finishing Touches

📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Explain this complex logic.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai explain this code block.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and explain its main purpose.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR.
@coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

haiqi96

Looks reasonable. The only nitpick might be that new empty line but I think it's actually better to keep it in the change.

Did some simple unit-testing to convince myself that the change works

kirkrodrigues

Deferring to @haiqi96's review.

gibber9809 added 2 commits July 18, 2025 20:45

Simplify handling of VarString during search

ee391a2

Add variable string search tests.

0ec4bc6

gibber9809 requested review from a team and wraymo as code owners July 18, 2025 21:11

kirkrodrigues requested a review from haiqi96 July 21, 2025 15:05

haiqi96 approved these changes Jul 21, 2025

View reviewed changes

kirkrodrigues approved these changes Jul 21, 2025

View reviewed changes

gibber9809 merged commit 0bfaf5c into y-scope:main Jul 21, 2025
23 of 26 checks passed

This was referenced Jul 29, 2025

refactor(clp): Prepare parsing & search code for deduplication with clp-s: #1138

Merged

refactor(clp): Prepare Grep code for deduplication with clp-s: #1143

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

refactor(clp-s): Simplify dictionary search for variable string predicates. #1112

refactor(clp-s): Simplify dictionary search for variable string predicates. #1112

Uh oh!

gibber9809 commented Jul 18, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Jul 18, 2025 •

edited

Loading

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

Uh oh!

haiqi96 left a comment •

edited

Loading

Uh oh!

kirkrodrigues left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

refactor(clp-s): Simplify dictionary search for variable string predicates. #1112

refactor(clp-s): Simplify dictionary search for variable string predicates. #1112

Uh oh!

Conversation

gibber9809 commented Jul 18, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Checklist

Validation performed

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Jul 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Possibly related PRs

Suggested reviewers

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

Uh oh!

haiqi96 left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kirkrodrigues left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

gibber9809 commented Jul 18, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jul 18, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)

haiqi96 left a comment •

edited

Loading