Skip to content

Conversation

@gibber9809
Copy link
Contributor

@gibber9809 gibber9809 commented Jul 18, 2025

Description

The existing clp-s code strangely handles search for variable strings containing wildcards by populating a SubQuery using EncodedVariableInterpreter::wildcard_search_dictionary_and_get_encoded_matches and checking the precise and imprecise variables in the resulting SubQuery to populate the set of matching variables for the predicate.

This PR simplifies this logic by directly using the VariableDictionary::get_entries_matching_wildcard_string member function and populating the set of matching variables directly based on the return value.

We also add some tests that exercise the wildcard and non-wildcard search paths for variable strings.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Added tests to exercise wildcard and non-wildcard variable string matching

Summary by CodeRabbit

  • New Features

    • Enhanced search functionality to better handle wildcard and escaped wildcard queries for variable strings.
  • Tests

    • Added new test cases to validate search behaviour with ambiguous and pattern-like string values.
    • Expanded test data to include additional entries for comprehensive wildcard search testing.

@gibber9809 gibber9809 requested review from a team and wraymo as code owners July 18, 2025 21:11
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jul 18, 2025

Walkthrough

The changes update the wildcard variable string search logic in QueryRunner.cpp by removing the use of EncodedVariableInterpreter and replacing it with direct dictionary queries. Corresponding test cases and test data are extended to cover ambiguous wildcard scenarios, specifically for variable strings containing asterisks.

Changes

File(s) Change Summary
components/core/src/clp_s/search/QueryRunner.cpp Replaces EncodedVariableInterpreter wildcard search with direct dictionary lookups; simplifies logic.
components/core/tests/test-clp_s-search.cpp Adds test cases for ambiguous wildcard and escaped asterisk queries on ambiguous_varstring field.
components/core/tests/test_log_files/test_search.jsonl Adds three new entries with ambiguous_varstring values to support new test cases.

Sequence Diagram(s)

sequenceDiagram
    participant QueryRunner
    participant VarDictionary

    QueryRunner->>VarDictionary: get_entries_matching_wildcard_string(query_string)
    VarDictionary-->>QueryRunner: matching_var_ids
    QueryRunner->>QueryRunner: Insert matching_var_ids into matching_vars set
Loading

Possibly related PRs

Suggested reviewers

  • wraymo

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Cppcheck (2.17.1)
components/core/src/clp_s/search/QueryRunner.cpp

/bin/bash: line 1: /usr/bin/cppcheck: No such file or directory

components/core/tests/test-clp_s-search.cpp

/bin/bash: line 1: /usr/bin/cppcheck: No such file or directory


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8fc0dd7 and 0ec4bc6.

📒 Files selected for processing (3)
  • components/core/src/clp_s/search/QueryRunner.cpp (2 hunks)
  • components/core/tests/test-clp_s-search.cpp (1 hunks)
  • components/core/tests/test_log_files/test_search.jsonl (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

Instructions used from:

Sources:
⚙️ CodeRabbit Configuration File

🧠 Learnings (4)
📓 Common learnings
Learnt from: LinZhihao-723
PR: y-scope/clp#882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:378-402
Timestamp: 2025-05-07T16:56:35.687Z
Learning: In the CLP search component, the `evaluate_wildcard_filter` function should return `AstEvaluationResult::Pruned` when `node_id_value_pairs` is empty, not `AstEvaluationResult::False`. Empty node sets should be treated as "undetermined" rather than definitive non-matches.
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp:320-324
Timestamp: 2025-05-05T01:12:18.561Z
Learning: In the CLP codebase, the `m_case_sensitive_search` flag is used only for actual string value comparisons during query evaluation, not for schema key name matching. Schema keys are always compared case-sensitively regardless of this flag's setting.
Learnt from: LinZhihao-723
PR: y-scope/clp#882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp:164-166
Timestamp: 2025-05-11T22:22:49.286Z
Learning: For the `QueryHandlerImpl` class, reusing member variables like `m_ast_dfs_stack` is preferred over making methods `const` and moving state to function scope to avoid allocation/deallocation overhead, especially for frequently called methods like `evaluate_kv_pair_log_event`.
components/core/tests/test_log_files/test_search.jsonl (1)
Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:14-14
Timestamp: 2024-10-14T03:42:10.355Z
Learning: In the file `components/core/tests/test-ffi_KeyValuePairLogEvent.cpp`, including `<json/single_include/nlohmann/json.hpp>` is consistent with the project's coding standards.
components/core/tests/test-clp_s-search.cpp (5)
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-clp_s-end_to_end.cpp:109-110
Timestamp: 2024-11-29T22:50:17.206Z
Learning: In `components/core/tests/test-clp_s-end_to_end.cpp`, the success of `constructor.store()` is verified through `REQUIRE` statements and subsequent comparisons.
Learnt from: LinZhihao-723
PR: y-scope/clp#882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:378-402
Timestamp: 2025-05-07T16:56:35.687Z
Learning: In the CLP search component, the `evaluate_wildcard_filter` function should return `AstEvaluationResult::Pruned` when `node_id_value_pairs` is empty, not `AstEvaluationResult::False`. Empty node sets should be treated as "undetermined" rather than definitive non-matches.
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp:320-324
Timestamp: 2025-05-05T01:12:18.561Z
Learning: In the CLP codebase, the `m_case_sensitive_search` flag is used only for actual string value comparisons during query evaluation, not for schema key name matching. Schema keys are always compared case-sensitively regardless of this flag's setting.
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-clp_s-end_to_end.cpp:141-142
Timestamp: 2024-11-29T22:51:00.355Z
Learning: In 'components/core/tests/test-clp_s-end_to_end.cpp', using `WIFEXITED` and `WEXITSTATUS` with the return value of `std::system` is acceptable for checking the exit status of system commands, as per the C++ documentation.
components/core/src/clp_s/search/QueryRunner.cpp (3)
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:148-157
Timestamp: 2025-05-02T22:27:59.347Z
Learning: In the `QueryHandlerImpl.cpp` file, the `unique_projected_columns` set (using `std::string_view`) is intentionally designed to only check for duplications within the local scope of the `create_projected_columns_and_projection_map` function. The team decided this is an acceptable use of `std::string_view` in a container since the referenced strings remain valid throughout the function's execution.
Learnt from: LinZhihao-723
PR: y-scope/clp#882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:378-402
Timestamp: 2025-05-07T16:56:35.687Z
Learning: In the CLP search component, the `evaluate_wildcard_filter` function should return `AstEvaluationResult::Pruned` when `node_id_value_pairs` is empty, not `AstEvaluationResult::False`. Empty node sets should be treated as "undetermined" rather than definitive non-matches.
Learnt from: LinZhihao-723
PR: y-scope/clp#856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:258-266
Timestamp: 2025-04-26T02:21:22.021Z
Learning: In the clp::ffi::ir_stream::search namespace, the design principle is that callers are responsible for type checking, not the called functions. If control flow reaches a function, input types should already be validated by the caller.
🔇 Additional comments (4)
components/core/tests/test_log_files/test_search.jsonl (1)

11-13: LGTM! Well-structured test data additions.

The new test entries provide comprehensive coverage for wildcard matching scenarios, including regular strings, shorter matches, and literal asterisk cases. This properly supports the new test cases for both wildcard ("a*e") and escaped wildcard ("a\*e") patterns.

components/core/tests/test-clp_s-search.cpp (1)

224-225: LGTM! Excellent test coverage for wildcard matching.

The new test cases effectively validate both wildcard and escaped wildcard functionality:

  • "a*e" correctly expects matches for all three patterns (indices 10, 11, 12)
  • "a\*e" correctly expects only the literal asterisk match (index 12)

This provides comprehensive coverage for the refactored wildcard matching logic in QueryRunner.cpp.

components/core/src/clp_s/search/QueryRunner.cpp (2)

870-870: LGTM! Removed unused variable.

The SubQuery sub_query; variable was declared but never used. Good cleanup.


902-910: LGTM! Clean simplification of wildcard matching logic.

The refactoring successfully simplifies the wildcard search by:

  • Replacing the indirect approach via EncodedVariableInterpreter::wildcard_search_dictionary_and_get_encoded_matches with direct dictionary access
  • Using m_var_dict->get_entries_matching_wildcard_string to directly retrieve matching entries
  • Streamlining the population of matching_vars by iterating through the dictionary results

This maintains the same functionality while reducing complexity and improving code clarity.

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@kirkrodrigues kirkrodrigues requested a review from haiqi96 July 21, 2025 15:05
Copy link
Contributor

@haiqi96 haiqi96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable. The only nitpick might be that new empty line but I think it's actually better to keep it in the change.

Did some simple unit-testing to convince myself that the change works

Copy link
Member

@kirkrodrigues kirkrodrigues left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deferring to @haiqi96's review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants