Skip to content

Conversation

@gibber9809
Copy link
Contributor

@gibber9809 gibber9809 commented Jul 30, 2025

  • Move Grep::process_raw_query and associated utilities into a new GrepCore class.
  • Templatize dictionary-parameters in GrepCore.
  • Use dictionary IDs instead of pointers to dictionary entries in subquery generation.
  • Move segment matching out of GrepCore::process_raw_query flow.
  • Make callers of GrepCore::process_raw_query explicitly specify all wildcards in the query string.
  • Make callers of GrepCore::process_raw_query clean up the query string.

Description

This PR is the second in a series of PRs (see #1138 for the first) that is attempting to bring the core CLP parsing and search code to a state where it can be shared by both clp and clp-s. The full end to end integration can be seen here.

This PR focuses on splitting up Grep into the parts that will be shared (GrepCore) and the parts specific to CLP (left as Grep), and making the parts that will be shared independent of specific CLP implementations. The diff size is large since code is moved around, but the actual amount of changes is fairly small. In particular this PR:

  1. Moves process_raw_query and associated utility functions to GrepCore

  2. Modifies functions in GrepCore to use templates in order to remove dependencies on clp specific dictionary implementations -- this will be augmented with concepts in a follow-up PR in order to guarantee that the dictionaries follow a certain interface.

  3. Moves the QueryToken class into its own files QueryToken.{cpp,hpp}

  4. Refactors Query and Subquery classes to work in terms of dictionary ids (and segment ids) instead of pointers to dictionary entries in order to remove dependence on clp-specific dictionary implementations.

  5. Refactors process_raw_query flow in a few ways

  • Leaves adding implicit wildcards up to the caller. This makes the implementation better able to serve the needs of clp and clp-s, and makes it so that the interface shouldn't need to be changed if we interally switch to ffi-style subquery generation
  • Moves segment index calculation outside of process_raw_query -- now callers that are interested in finding matches based on the segment index must call calculate_ids_of_matching_segments on the resulting Query object from process_raw_query. This simplifies the process_raw_query flow and allows callers that don't care about the segment index (clp-s) to not have to worry about the segment index calculation flow.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Validated that all unit tests pass
  • Verified that after the entire sequence of planned PRs performance looks good

Summary by CodeRabbit

  • New Features

    • Introduced advanced query tokenization and variable identification, enabling more flexible and accurate search queries with support for wildcards and ambiguous tokens.
    • Added new query processing logic that efficiently matches log entries using structured sub-queries and segment filtering.
  • Refactor

    • Replaced pointer-based dictionary references with ID-based references in query logic for improved modularity and performance.
    • Split and reorganized core search and query tokenization logic into new components.
  • Bug Fixes

    • Enhanced handling of wildcards and ambiguous tokens in search queries, improving accuracy and reliability.
  • Chores

    • Updated build and lint configurations to include new source files and tests.
    • Adjusted test cases to use the new query processing components.

@gibber9809 gibber9809 requested a review from a team as a code owner July 30, 2025 19:50
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jul 30, 2025

Walkthrough

This change refactors and modularizes the query parsing and matching logic for log search functionality. It introduces new components (GrepCore, QueryToken) to encapsulate query tokenization and processing, updates interfaces to use dictionary IDs instead of pointers, and adapts build, test, and lint configurations. The main programs are updated to use the new APIs.

Changes

Cohort / File(s) Change Summary
Build System Updates
components/core/CMakeLists.txt, components/core/src/clp/clg/CMakeLists.txt, components/core/src/clp/clo/CMakeLists.txt
Added GrepCore.cpp, GrepCore.hpp, QueryToken.cpp, QueryToken.hpp to build sources for core, clg, and clo targets. Updated test source from test-Grep.cpp to test-GrepCore.cpp.
Lint Configuration
taskfiles/lint.yaml
Excluded GrepCore and QueryToken source/header files from clang-tidy. Updated test exclusion from test-Grep.cpp to test-GrepCore.cpp.
Test Update
components/core/tests/test-GrepCore.cpp
Updated tests to use GrepCore::get_bounds_of_next_potential_var and corresponding header.
Grep and Query Refactor
components/core/src/clp/Grep.cpp, components/core/src/clp/Grep.hpp, components/core/src/clp/Query.cpp, components/core/src/clp/Query.hpp
Removed query tokenization and variable boundary logic from Grep, migrated to GrepCore. Updated Query and SubQuery to use dictionary IDs instead of pointers. Changed interfaces for segment ID calculation.
New Query Tokenization Components
components/core/src/clp/GrepCore.cpp, components/core/src/clp/GrepCore.hpp, components/core/src/clp/QueryToken.cpp, components/core/src/clp/QueryToken.hpp
Added GrepCore class for query processing and token boundary logic. Added QueryToken class for representing and classifying query tokens.
Encoded Variable Interpreter Update
components/core/src/clp/EncodedVariableInterpreter.hpp
Removed template parameter for dictionary entry type. Standardized use of dictionary IDs instead of pointers in dictionary search and encoding methods.
Main Program Adaptation
components/core/src/clp/clg/clg.cpp, components/core/src/clp/clo/clo.cpp
Updated to use GrepCore::process_raw_query with explicit dictionary arguments. Added logic to calculate segment IDs for sub-queries. Added implicit wildcards and cleaned search strings before processing.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant MainProgram
    participant GrepCore
    participant QueryToken
    participant LogtypeDict
    participant VarDict

    User->>MainProgram: Provide search string
    MainProgram->>GrepCore: process_raw_query(logtype_dict, var_dict, search_string, ...)
    GrepCore->>QueryToken: Tokenize search string
    QueryToken-->>GrepCore: Return tokenized representation
    GrepCore->>LogtypeDict: Lookup logtype matches (if needed)
    GrepCore->>VarDict: Lookup variable matches (if needed)
    GrepCore-->>MainProgram: Return Query object with sub-queries
    MainProgram->>Query: calculate_ids_of_matching_segments(get_segments_by_logtype_id, get_segments_by_var_id)
    Query->>LogtypeDict: Get segments for logtype IDs
    Query->>VarDict: Get segments for variable IDs
    Query-->>MainProgram: Matching segment IDs
    MainProgram-->>User: Present search results
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • kirkrodrigues
  • haiqi96

Note

⚡️ Unit Test Generation is now available in beta!

Learn more here, or try it out under "Finishing Touches" below.


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0685c2a and 0808664.

📒 Files selected for processing (2)
  • components/core/src/clp/GrepCore.hpp (1 hunks)
  • components/core/src/clp/clg/clg.cpp (5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit Configuration File

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp/clg/clg.cpp
  • components/core/src/clp/GrepCore.hpp
🧠 Learnings (21)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: gibber9809
PR: y-scope/clp#955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/clp/FileCompressor.hpp:58-78
Timestamp: 2024-10-24T14:25:17.978Z
Learning: When reviewing legacy code refactors, avoid suggesting changes that would extend the scope of the PR.
Learnt from: gibber9809
PR: y-scope/clp#504
File: components/core/src/clp_s/search/kql/CMakeLists.txt:29-29
Timestamp: 2024-10-22T15:36:04.655Z
Learning: When reviewing pull requests, focus on the changes within the PR and avoid commenting on issues outside the scope of the PR.
📚 Learning: in the clp project, when reviewing cmakelists.txt changes that introduce new compression library dep...
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.

Applied to files:

  • components/core/src/clp/clg/clg.cpp
📚 Learning: in `components/core/src/clp_s/jsonparser.cpp`, validation and exception throwing are unnecessary in ...
Learnt from: gibber9809
PR: y-scope/clp#630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:13.322Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, validation and exception throwing are unnecessary in the `get_archive_node_id` method when processing nodes, and should not be added.

Applied to files:

  • components/core/src/clp/clg/clg.cpp
📚 Learning: in `components/core/src/clp/bufferedfilereader.cpp`, refactoring the nested error handling condition...
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.

Applied to files:

  • components/core/src/clp/clg/clg.cpp
  • components/core/src/clp/GrepCore.hpp
📚 Learning: in the clp project, antlr code generation at build time is being removed by another pr. when reviewi...
Learnt from: gibber9809
PR: y-scope/clp#955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.

Applied to files:

  • components/core/src/clp/clg/clg.cpp
📚 Learning: in 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floa...
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.

Applied to files:

  • components/core/src/clp/clg/clg.cpp
📚 Learning: in the clp codebase, standard headers like `` and `` in alphabetical order (as seen...
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/src/glt/SQLitePreparedStatement.hpp:4-4
Timestamp: 2025-07-01T14:49:07.333Z
Learning: In the CLP codebase, standard headers like `<cstdint>` and `<string>` in alphabetical order (as seen in components/core/src/glt/SQLitePreparedStatement.hpp) are correctly ordered and do not need adjustment.

Applied to files:

  • components/core/src/clp/clg/clg.cpp
📚 Learning: in the `queryhandlerimpl.cpp` file, the `unique_projected_columns` set (using `std::string_view`) is...
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:148-157
Timestamp: 2025-05-02T22:27:59.347Z
Learning: In the `QueryHandlerImpl.cpp` file, the `unique_projected_columns` set (using `std::string_view`) is intentionally designed to only check for duplications within the local scope of the `create_projected_columns_and_projection_map` function. The team decided this is an acceptable use of `std::string_view` in a container since the referenced strings remain valid throughout the function's execution.

Applied to files:

  • components/core/src/clp/clg/clg.cpp
📚 Learning: in the clp::ffi::ir_stream::search namespace, the design principle is that callers are responsible f...
Learnt from: LinZhihao-723
PR: y-scope/clp#856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:258-266
Timestamp: 2025-04-26T02:21:22.021Z
Learning: In the clp::ffi::ir_stream::search namespace, the design principle is that callers are responsible for type checking, not the called functions. If control flow reaches a function, input types should already be validated by the caller.

Applied to files:

  • components/core/src/clp/clg/clg.cpp
📚 Learning: in the clp project, c++20 is used, allowing the utilization of c++20 features such as class template...
Learnt from: LinZhihao-723
PR: y-scope/clp#486
File: components/core/tests/test-error_handling.cpp:37-38
Timestamp: 2024-11-26T19:12:43.982Z
Learning: In the CLP project, C++20 is used, allowing the utilization of C++20 features such as class template argument deduction (CTAD) with `std::array`.

Applied to files:

  • components/core/src/clp/clg/clg.cpp
📚 Learning: in `components/core/src/clp_s/jsonparser.cpp`, within the `get_archive_node_id` method, throwing a s...
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:581-627
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` method, throwing a string literal as an exception is acceptable practice.

Applied to files:

  • components/core/src/clp/clg/clg.cpp
📚 Learning: in `components/core/tests/test-error_handling.cpp` (c++), when specializing template methods of `err...
Learnt from: LinZhihao-723
PR: y-scope/clp#486
File: components/core/tests/test-error_handling.cpp:44-91
Timestamp: 2024-11-26T19:16:19.933Z
Learning: In `components/core/tests/test-error_handling.cpp` (C++), when specializing template methods of `ErrorCategory` via type aliases (e.g., `AlwaysSuccessErrorCategory`), it's acceptable to define these specializations in the current namespace, even though the primary template is in the `clp::error_handling` namespace, due to the use of type aliases and template instantiation.

Applied to files:

  • components/core/src/clp/clg/clg.cpp
📚 Learning: in `clp-s.cpp`, the `run_serializer` function interleaves serialization and writing of ir files, mak...
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/clp-s.cpp:196-265
Timestamp: 2024-10-07T20:10:08.254Z
Learning: In `clp-s.cpp`, the `run_serializer` function interleaves serialization and writing of IR files, making it difficult to restructure it into separate functions.

Applied to files:

  • components/core/src/clp/clg/clg.cpp
📚 Learning: in `components/core/src/clp_s/jsonparser.cpp`, when handling errors in `parse_from_ir`, prefer to ma...
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:769-779
Timestamp: 2024-10-07T21:16:41.660Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, when handling errors in `parse_from_ir`, prefer to maintain the current mix of try-catch and if-statements because specific messages are returned back up in some cases.

Applied to files:

  • components/core/src/clp/clg/clg.cpp
📚 Learning: for the `queryhandlerimpl` class, reusing member variables like `m_ast_dfs_stack` is preferred over ...
Learnt from: LinZhihao-723
PR: y-scope/clp#882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp:164-166
Timestamp: 2025-05-11T22:22:49.286Z
Learning: For the `QueryHandlerImpl` class, reusing member variables like `m_ast_dfs_stack` is preferred over making methods `const` and moving state to function scope to avoid allocation/deallocation overhead, especially for frequently called methods like `evaluate_kv_pair_log_event`.

Applied to files:

  • components/core/src/clp/clg/clg.cpp
  • components/core/src/clp/GrepCore.hpp
📚 Learning: in the clp search component, the `evaluate_wildcard_filter` function should return `astevaluationres...
Learnt from: LinZhihao-723
PR: y-scope/clp#882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:378-402
Timestamp: 2025-05-07T16:56:35.687Z
Learning: In the CLP search component, the `evaluate_wildcard_filter` function should return `AstEvaluationResult::Pruned` when `node_id_value_pairs` is empty, not `AstEvaluationResult::False`. Empty node sets should be treated as "undetermined" rather than definitive non-matches.

Applied to files:

  • components/core/src/clp/GrepCore.hpp
📚 Learning: double parentheses in the codebase are intentional and required for clang-tidy to pass. these should...
Learnt from: Bill-hbrhbr
PR: y-scope/clp#0
File: :0-0
Timestamp: 2025-03-18T07:27:54.738Z
Learning: Double parentheses in the codebase are intentional and required for clang-tidy to pass. These should not be flagged as style issues in code reviews.

Applied to files:

  • components/core/src/clp/GrepCore.hpp
📚 Learning: double parentheses in require statements (e.g., `require((expression))`) and in conditional expressi...
Learnt from: Bill-hbrhbr
PR: y-scope/clp#0
File: :0-0
Timestamp: 2025-03-18T07:27:54.738Z
Learning: Double parentheses in REQUIRE statements (e.g., `REQUIRE((expression))`) and in conditional expressions are intentional and required for clang-tidy to pass. These should not be flagged as style issues in code reviews.

Applied to files:

  • components/core/src/clp/GrepCore.hpp
📚 Learning: in log-surgeon, the `.` character is treated as a literal non-delimiter character rather than a rege...
Learnt from: SharafMohamed
PR: y-scope/clp#1033
File: components/core/config/schemas.txt:96-97
Timestamp: 2025-07-17T16:02:57.688Z
Learning: In log-surgeon, the `.` character is treated as a literal non-delimiter character rather than a regex wildcard that matches any character. This means patterns like `.*` in log-surgeon schemas match sequences of non-delimiter characters, not arbitrary characters like in standard regex.

Applied to files:

  • components/core/src/clp/GrepCore.hpp
📚 Learning: the `queryhandlerimpl` class in clp/ffi/ir_stream/search is not designed to be thread-safe and is as...
Learnt from: LinZhihao-723
PR: y-scope/clp#882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:428-450
Timestamp: 2025-05-07T16:56:53.901Z
Learning: The `QueryHandlerImpl` class in clp/ffi/ir_stream/search is not designed to be thread-safe and is assumed to not be used across multiple threads.

Applied to files:

  • components/core/src/clp/GrepCore.hpp
📚 Learning: in `components/core/src/clp_s/jsonparser.cpp`, within the `parse_from_ir` method, encountering error...
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:735-794
Timestamp: 2024-10-07T21:35:04.362Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir` method, encountering errors from `kv_log_event_result.error()` aside from `std::errc::no_message_available` and `std::errc::result_out_of_range` is anticipated behavior and does not require additional error handling or logging.

Applied to files:

  • components/core/src/clp/GrepCore.hpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: build-macos (macos-15, false)
🔇 Additional comments (9)
components/core/src/clp/clg/clg.cpp (4)

215-226: LGTM!

The refactoring correctly extracts the dictionaries from the archive and passes them to the new GrepCore::process_raw_query API, aligning with the goal of decoupling from clp-specific implementations.


242-258: LGTM!

The segment ID calculation logic correctly implements the new design pattern where callers explicitly invoke calculate_ids_of_matching_segments. The lambdas properly capture dictionaries by reference and provide the required interface.


505-507: LGTM!

The lambda correctly implements the addition of implicit wildcards as per the new design, wrapping search strings with asterisks and cleaning them up.


512-512: LGTM!

The add_implicit_wildcards lambda is consistently applied to all search strings, ensuring uniform wildcard handling.

Also applies to: 518-518

components/core/src/clp/GrepCore.hpp (5)

23-54: LGTM!

The class design as a utility with static methods is appropriate for the refactoring goals. The template approach provides good flexibility for different dictionary implementations, and the documentation clearly states preconditions.


165-176: LGTM!

The wildcard replacement logic correctly handles the conversion from '?' to '*' wildcards, with proper cleanup to handle edge cases like consecutive wildcards. The comment adequately explains the rationale.


148-265: LGTM!

The process_raw_query implementation follows a clear and logical flow for tokenizing search strings, handling ambiguous tokens, and generating sub-queries. The early return optimization for superseding queries is well-implemented.


267-326: LGTM!

The process_var_token method correctly handles all variable token types, with appropriate wildcard matching requirements and proper construction of logtype strings.


328-422: LGTM!

The generate_logtypes_and_vars_for_subquery method provides comprehensive handling of logtype generation with proper escape character processing, efficient handling of superseding queries, and correct dictionary lookups.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🔭 Outside diff range comments (3)
components/core/src/clp/clo/clo.cpp (1)

500-533: Well-structured refactoring that aligns with PR objectives.

The changes correctly:

  1. Extract dictionaries from the archive reader
  2. Add implicit wildcards and clean the search string
  3. Use the new GrepCore::process_raw_query with explicit dictionary references
  4. Calculate matching segments explicitly using lambda functions

Consider capturing the dictionaries by const reference in the lambdas to avoid potential copies:

-    auto get_segments_containing_log_dict_id
-            = [&log_dict](logtype_dictionary_id_t logtype_id) -> std::set<segment_id_t> const& {
+    auto get_segments_containing_log_dict_id
+            = [&log_dict = std::as_const(log_dict)](logtype_dictionary_id_t logtype_id) -> std::set<segment_id_t> const& {
components/core/src/clp/clg/clg.cpp (1)

215-258: Consistent refactoring that properly uses the new GrepCore interface.

The changes correctly implement the new query processing pattern and use the add_implicit_wildcards lambda for consistent wildcard handling.

As with clo.cpp, consider capturing dictionaries by const reference in the lambdas to ensure no accidental modifications:

-                auto get_segments_containing_log_dict_id
-                        = [&log_dict](
-                                  logtype_dictionary_id_t logtype_id
-                          ) -> std::set<segment_id_t> const& {
+                auto get_segments_containing_log_dict_id
+                        = [&log_dict = std::as_const(log_dict)](
+                                  logtype_dictionary_id_t logtype_id
+                          ) -> std::set<segment_id_t> const& {

Also applies to: 511-511, 517-517

components/core/src/clp/QueryToken.hpp (1)

1-67: Well-designed QueryToken class that supports the refactoring goals.

The class properly encapsulates token properties with a clean interface. The support for ambiguous tokens and type cycling aligns well with the PR objectives.

Consider adding documentation for the constructor parameters to clarify what is_var represents:

     // Constructors
+    /**
+     * @param query_string The full query string containing this token
+     * @param begin_pos Start position of the token in the query string
+     * @param end_pos End position of the token in the query string
+     * @param is_var Whether this token was identified as a potential variable
+     */
     QueryToken(std::string const& query_string, size_t begin_pos, size_t end_pos, bool is_var);
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a1fa32d and c8bc92b.

📒 Files selected for processing (17)
  • components/core/CMakeLists.txt (2 hunks)
  • components/core/src/clp/EncodedVariableInterpreter.hpp (4 hunks)
  • components/core/src/clp/Grep.cpp (1 hunks)
  • components/core/src/clp/Grep.hpp (0 hunks)
  • components/core/src/clp/GrepCore.cpp (1 hunks)
  • components/core/src/clp/GrepCore.hpp (1 hunks)
  • components/core/src/clp/Query.cpp (7 hunks)
  • components/core/src/clp/Query.hpp (8 hunks)
  • components/core/src/clp/QueryToken.cpp (1 hunks)
  • components/core/src/clp/QueryToken.hpp (1 hunks)
  • components/core/src/clp/StringReader.cpp (0 hunks)
  • components/core/src/clp/clg/CMakeLists.txt (2 hunks)
  • components/core/src/clp/clg/clg.cpp (5 hunks)
  • components/core/src/clp/clo/CMakeLists.txt (2 hunks)
  • components/core/src/clp/clo/clo.cpp (4 hunks)
  • components/core/tests/test-Grep.cpp (3 hunks)
  • taskfiles/lint.yaml (2 hunks)
💤 Files with no reviewable changes (2)
  • components/core/src/clp/StringReader.cpp
  • components/core/src/clp/Grep.hpp
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit Configuration File

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/tests/test-Grep.cpp
  • components/core/src/clp/clo/clo.cpp
  • components/core/src/clp/clg/clg.cpp
  • components/core/src/clp/EncodedVariableInterpreter.hpp
  • components/core/src/clp/Grep.cpp
  • components/core/src/clp/GrepCore.cpp
  • components/core/src/clp/GrepCore.hpp
  • components/core/src/clp/QueryToken.cpp
  • components/core/src/clp/QueryToken.hpp
  • components/core/src/clp/Query.hpp
  • components/core/src/clp/Query.cpp
🧠 Learnings (13)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: gibber9809
PR: y-scope/clp#955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/clp/FileCompressor.hpp:58-78
Timestamp: 2024-10-24T14:25:17.978Z
Learning: When reviewing legacy code refactors, avoid suggesting changes that would extend the scope of the PR.
taskfiles/lint.yaml (6)

Learnt from: Bill-hbrhbr
PR: #1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.

Learnt from: gibber9809
PR: #955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.

Learnt from: anlowee
PR: #925
File: taskfiles/deps/main.yaml:97-106
Timestamp: 2025-05-28T18:33:30.155Z
Learning: In the taskfiles dependency system (taskfiles/deps/main.yaml), echo commands are used to generate .cmake settings files that are consumed by the main CMake build process. These files set variables like ANTLR_RUNTIME_HEADER to point to dependency locations for use during compilation.

Learnt from: jackluo923
PR: #1054
File: components/core/src/glt/SQLitePreparedStatement.hpp:4-4
Timestamp: 2025-07-01T14:49:07.333Z
Learning: In the CLP codebase, standard headers like <cstdint> and <string> in alphabetical order (as seen in components/core/src/glt/SQLitePreparedStatement.hpp) are correctly ordered and do not need adjustment.

Learnt from: AVMatthews
PR: #595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.

Learnt from: anlowee
PR: #925
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-19
Timestamp: 2025-05-26T18:39:51.727Z
Learning: In CMakeLists.txt files, anlowee prefers to explicitly list source files one by one rather than using file(GLOB ...) or file(GLOB_RECURSE ...) to automatically include files, maintaining consistency with other CMake files in the project.

components/core/CMakeLists.txt (10)

Learnt from: anlowee
PR: #925
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-19
Timestamp: 2025-05-26T18:39:51.727Z
Learning: In CMakeLists.txt files, anlowee prefers to explicitly list source files one by one rather than using file(GLOB ...) or file(GLOB_RECURSE ...) to automatically include files, maintaining consistency with other CMake files in the project.

Learnt from: AVMatthews
PR: #595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.

Learnt from: gibber9809
PR: #955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.

Learnt from: Bill-hbrhbr
PR: #1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.

Learnt from: LinZhihao-723
PR: #873
File: components/core/src/clp/ffi/ir_stream/search/test/utils.hpp:15-18
Timestamp: 2025-05-02T22:11:55.711Z
Learning: The team plans to systematically improve include paths by adding ${PROJECT_SOURCE_DIR}/src to CMake targets and using direct include paths instead of deep relative ones in a future PR.

Learnt from: jackluo923
PR: #1054
File: components/core/src/glt/SQLitePreparedStatement.hpp:4-4
Timestamp: 2025-07-01T14:49:07.333Z
Learning: In the CLP codebase, standard headers like <cstdint> and <string> in alphabetical order (as seen in components/core/src/glt/SQLitePreparedStatement.hpp) are correctly ordered and do not need adjustment.

Learnt from: AVMatthews
PR: #543
File: components/core/src/clp_s/clp-s.cpp:196-265
Timestamp: 2024-10-07T20:10:08.254Z
Learning: In clp-s.cpp, the run_serializer function interleaves serialization and writing of IR files, making it difficult to restructure it into separate functions.

Learnt from: AVMatthews
PR: #543
File: components/core/src/clp_s/clp-s.cpp:196-265
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In clp-s.cpp, the run_serializer function interleaves serialization and writing of IR files, making it difficult to restructure it into separate functions.

Learnt from: haiqi96
PR: #523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In components/core/src/clp/BufferedFileReader.cpp, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.

Learnt from: LinZhihao-723
PR: #882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:428-450
Timestamp: 2025-05-07T16:56:53.901Z
Learning: The QueryHandlerImpl class in clp/ffi/ir_stream/search is not designed to be thread-safe and is assumed to not be used across multiple threads.

components/core/src/clp/clo/CMakeLists.txt (8)

Learnt from: gibber9809
PR: #955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.

Learnt from: anlowee
PR: #925
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-19
Timestamp: 2025-05-26T18:39:51.727Z
Learning: In CMakeLists.txt files, anlowee prefers to explicitly list source files one by one rather than using file(GLOB ...) or file(GLOB_RECURSE ...) to automatically include files, maintaining consistency with other CMake files in the project.

Learnt from: Bill-hbrhbr
PR: #1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.

Learnt from: AVMatthews
PR: #595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.

Learnt from: jackluo923
PR: #1054
File: components/core/src/glt/SQLitePreparedStatement.hpp:4-4
Timestamp: 2025-07-01T14:49:07.333Z
Learning: In the CLP codebase, standard headers like <cstdint> and <string> in alphabetical order (as seen in components/core/src/glt/SQLitePreparedStatement.hpp) are correctly ordered and do not need adjustment.

Learnt from: LinZhihao-723
PR: #873
File: components/core/src/clp/ffi/ir_stream/search/test/utils.hpp:15-18
Timestamp: 2025-05-02T22:11:55.711Z
Learning: The team plans to systematically improve include paths by adding ${PROJECT_SOURCE_DIR}/src to CMake targets and using direct include paths instead of deep relative ones in a future PR.

Learnt from: haiqi96
PR: #523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In components/core/src/clp/BufferedFileReader.cpp, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.

Learnt from: LinZhihao-723
PR: #558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:14-14
Timestamp: 2024-10-14T03:42:10.355Z
Learning: In the file components/core/tests/test-ffi_KeyValuePairLogEvent.cpp, including <json/single_include/nlohmann/json.hpp> is consistent with the project's coding standards.

components/core/tests/test-Grep.cpp (8)

Learnt from: AVMatthews
PR: #595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.

Learnt from: LinZhihao-723
PR: #570
File: components/core/tests/test-ir_encoding_methods.cpp:376-399
Timestamp: 2024-11-01T03:26:26.386Z
Learning: In the test code (components/core/tests/test-ir_encoding_methods.cpp), exception handling for msgpack::unpack can be omitted because the Catch2 testing framework captures exceptions if they occur.

Learnt from: haiqi96
PR: #523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In components/core/src/clp/BufferedFileReader.cpp, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.

Learnt from: LinZhihao-723
PR: #557
File: components/core/tests/test-ir_encoding_methods.cpp:1216-1286
Timestamp: 2024-10-13T09:27:43.408Z
Learning: In the unit test case ffi_ir_stream_serialize_schema_tree_node_id in test-ir_encoding_methods.cpp, suppressing the readability-function-cognitive-complexity warning is acceptable due to the expansion of Catch2 macros in C++ tests, and such test cases may not have readability issues.

Learnt from: LinZhihao-723
PR: #558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:14-14
Timestamp: 2024-10-14T03:42:10.355Z
Learning: In the file components/core/tests/test-ffi_KeyValuePairLogEvent.cpp, including <json/single_include/nlohmann/json.hpp> is consistent with the project's coding standards.

Learnt from: AVMatthews
PR: #543
File: components/core/src/clp_s/JsonParser.cpp:769-779
Timestamp: 2024-10-07T21:16:41.660Z
Learning: In components/core/src/clp_s/JsonParser.cpp, when handling errors in parse_from_ir, prefer to maintain the current mix of try-catch and if-statements because specific messages are returned back up in some cases.

Learnt from: AVMatthews
PR: #595
File: components/core/tests/test-clp_s-end_to_end.cpp:109-110
Timestamp: 2024-11-29T22:50:17.206Z
Learning: In components/core/tests/test-clp_s-end_to_end.cpp, the success of constructor.store() is verified through REQUIRE statements and subsequent comparisons.

Learnt from: Bill-hbrhbr
PR: y-scope/clp#0
File: :0-0
Timestamp: 2025-03-18T07:27:54.738Z
Learning: Double parentheses in REQUIRE statements (e.g., REQUIRE((expression))) and in conditional expressions are intentional and required for clang-tidy to pass. These should not be flagged as style issues in code reviews.

components/core/src/clp/clo/clo.cpp (18)

Learnt from: LinZhihao-723
PR: #873
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp:320-324
Timestamp: 2025-05-05T01:12:18.561Z
Learning: In the CLP codebase, the m_case_sensitive_search flag is used only for actual string value comparisons during query evaluation, not for schema key name matching. Schema keys are always compared case-sensitively regardless of this flag's setting.

Learnt from: LinZhihao-723
PR: #882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:378-402
Timestamp: 2025-05-07T16:56:35.687Z
Learning: In the CLP search component, the evaluate_wildcard_filter function should return AstEvaluationResult::Pruned when node_id_value_pairs is empty, not AstEvaluationResult::False. Empty node sets should be treated as "undetermined" rather than definitive non-matches.

Learnt from: haiqi96
PR: #523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In components/core/src/clp/BufferedFileReader.cpp, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.

Learnt from: LinZhihao-723
PR: #882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:428-450
Timestamp: 2025-05-07T16:56:53.901Z
Learning: The QueryHandlerImpl class in clp/ffi/ir_stream/search is not designed to be thread-safe and is assumed to not be used across multiple threads.

Learnt from: Bill-hbrhbr
PR: #1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.

Learnt from: AVMatthews
PR: #595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.

Learnt from: LinZhihao-723
PR: #558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:14-14
Timestamp: 2024-10-14T03:42:10.355Z
Learning: In the file components/core/tests/test-ffi_KeyValuePairLogEvent.cpp, including <json/single_include/nlohmann/json.hpp> is consistent with the project's coding standards.

Learnt from: haiqi96
PR: #619
File: components/core/src/clp/clp/decompression.cpp:313-313
Timestamp: 2024-12-05T16:32:21.507Z
Learning: In the C++ FileDecompressor implementations at components/core/src/clp/clp/FileDecompressor.cpp and components/core/src/glt/glt/FileDecompressor.cpp, the temp_output_path variable and associated logic are used to handle multiple compressed files with the same name, and should be kept. This logic is separate from the temporary output directory code removed in PR #619 and is necessary for proper file handling.

Learnt from: AVMatthews
PR: #543
File: components/core/src/clp_s/JsonParser.cpp:581-627
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In components/core/src/clp_s/JsonParser.cpp, within the get_archive_node_id method, throwing a string literal as an exception is acceptable practice.

Learnt from: AVMatthews
PR: #543
File: components/core/src/clp_s/JsonParser.cpp:581-627
Timestamp: 2024-10-07T21:38:35.979Z
Learning: In components/core/src/clp_s/JsonParser.cpp, within the get_archive_node_id method, throwing a string literal as an exception is acceptable practice.

Learnt from: gibber9809
PR: #630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:08.691Z
Learning: In components/core/src/clp_s/JsonParser.cpp, within the get_archive_node_id function, validation and exception throwing for UTF-8 compliance of curr_node.get_key_name() are unnecessary and should be omitted.

Learnt from: gibber9809
PR: #630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:13.322Z
Learning: In components/core/src/clp_s/JsonParser.cpp, validation and exception throwing are unnecessary in the get_archive_node_id method when processing nodes, and should not be added.

Learnt from: LinZhihao-723
PR: #856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:258-266
Timestamp: 2025-04-26T02:21:22.021Z
Learning: In the clp::ffi::ir_stream::search namespace, the design principle is that callers are responsible for type checking, not the called functions. If control flow reaches a function, input types should already be validated by the caller.

Learnt from: LinZhihao-723
PR: #486
File: components/core/tests/test-error_handling.cpp:44-91
Timestamp: 2024-11-26T19:16:19.933Z
Learning: In components/core/tests/test-error_handling.cpp (C++), when specializing template methods of ErrorCategory via type aliases (e.g., AlwaysSuccessErrorCategory), it's acceptable to define these specializations in the current namespace, even though the primary template is in the clp::error_handling namespace, due to the use of type aliases and template instantiation.

Learnt from: AVMatthews
PR: #543
File: components/core/src/clp_s/JsonParser.cpp:769-779
Timestamp: 2024-10-07T21:16:41.660Z
Learning: In components/core/src/clp_s/JsonParser.cpp, when handling errors in parse_from_ir, prefer to maintain the current mix of try-catch and if-statements because specific messages are returned back up in some cases.

Learnt from: AVMatthews
PR: #595
File: components/core/tests/test-clp_s-end_to_end.cpp:141-142
Timestamp: 2024-11-29T22:51:00.355Z
Learning: In 'components/core/tests/test-clp_s-end_to_end.cpp', using WIFEXITED and WEXITSTATUS with the return value of std::system is acceptable for checking the exit status of system commands, as per the C++ documentation.

Learnt from: LinZhihao-723
PR: #486
File: components/core/tests/test-error_handling.cpp:37-38
Timestamp: 2024-11-26T19:12:43.982Z
Learning: In the CLP project, C++20 is used, allowing the utilization of C++20 features such as class template argument deduction (CTAD) with std::array.

Learnt from: AVMatthews
PR: #543
File: components/core/src/clp_s/clp-s.cpp:196-265
Timestamp: 2024-10-07T20:10:08.254Z
Learning: In clp-s.cpp, the run_serializer function interleaves serialization and writing of IR files, making it difficult to restructure it into separate functions.

components/core/src/clp/clg/clg.cpp (14)

Learnt from: Bill-hbrhbr
PR: #1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.

Learnt from: gibber9809
PR: #630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:13.322Z
Learning: In components/core/src/clp_s/JsonParser.cpp, validation and exception throwing are unnecessary in the get_archive_node_id method when processing nodes, and should not be added.

Learnt from: haiqi96
PR: #523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In components/core/src/clp/BufferedFileReader.cpp, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.

Learnt from: AVMatthews
PR: #595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.

Learnt from: jackluo923
PR: #1054
File: components/core/src/glt/SQLitePreparedStatement.hpp:4-4
Timestamp: 2025-07-01T14:49:07.333Z
Learning: In the CLP codebase, standard headers like <cstdint> and <string> in alphabetical order (as seen in components/core/src/glt/SQLitePreparedStatement.hpp) are correctly ordered and do not need adjustment.

Learnt from: LinZhihao-723
PR: #873
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:148-157
Timestamp: 2025-05-02T22:27:59.347Z
Learning: In the QueryHandlerImpl.cpp file, the unique_projected_columns set (using std::string_view) is intentionally designed to only check for duplications within the local scope of the create_projected_columns_and_projection_map function. The team decided this is an acceptable use of std::string_view in a container since the referenced strings remain valid throughout the function's execution.

Learnt from: LinZhihao-723
PR: #856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:258-266
Timestamp: 2025-04-26T02:21:22.021Z
Learning: In the clp::ffi::ir_stream::search namespace, the design principle is that callers are responsible for type checking, not the called functions. If control flow reaches a function, input types should already be validated by the caller.

Learnt from: LinZhihao-723
PR: #486
File: components/core/tests/test-error_handling.cpp:37-38
Timestamp: 2024-11-26T19:12:43.982Z
Learning: In the CLP project, C++20 is used, allowing the utilization of C++20 features such as class template argument deduction (CTAD) with std::array.

Learnt from: AVMatthews
PR: #543
File: components/core/src/clp_s/JsonParser.cpp:581-627
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In components/core/src/clp_s/JsonParser.cpp, within the get_archive_node_id method, throwing a string literal as an exception is acceptable practice.

Learnt from: AVMatthews
PR: #543
File: components/core/src/clp_s/JsonParser.cpp:581-627
Timestamp: 2024-10-07T21:38:35.979Z
Learning: In components/core/src/clp_s/JsonParser.cpp, within the get_archive_node_id method, throwing a string literal as an exception is acceptable practice.

Learnt from: LinZhihao-723
PR: #486
File: components/core/tests/test-error_handling.cpp:44-91
Timestamp: 2024-11-26T19:16:19.933Z
Learning: In components/core/tests/test-error_handling.cpp (C++), when specializing template methods of ErrorCategory via type aliases (e.g., AlwaysSuccessErrorCategory), it's acceptable to define these specializations in the current namespace, even though the primary template is in the clp::error_handling namespace, due to the use of type aliases and template instantiation.

Learnt from: AVMatthews
PR: #543
File: components/core/src/clp_s/clp-s.cpp:196-265
Timestamp: 2024-10-07T20:10:08.254Z
Learning: In clp-s.cpp, the run_serializer function interleaves serialization and writing of IR files, making it difficult to restructure it into separate functions.

Learnt from: AVMatthews
PR: #543
File: components/core/src/clp_s/clp-s.cpp:196-265
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In clp-s.cpp, the run_serializer function interleaves serialization and writing of IR files, making it difficult to restructure it into separate functions.

Learnt from: AVMatthews
PR: #543
File: components/core/src/clp_s/JsonParser.cpp:769-779
Timestamp: 2024-10-07T21:16:41.660Z
Learning: In components/core/src/clp_s/JsonParser.cpp, when handling errors in parse_from_ir, prefer to maintain the current mix of try-catch and if-statements because specific messages are returned back up in some cases.

components/core/src/clp/clg/CMakeLists.txt (8)

Learnt from: gibber9809
PR: #955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.

Learnt from: Bill-hbrhbr
PR: #1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.

Learnt from: anlowee
PR: #925
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-19
Timestamp: 2025-05-26T18:39:51.727Z
Learning: In CMakeLists.txt files, anlowee prefers to explicitly list source files one by one rather than using file(GLOB ...) or file(GLOB_RECURSE ...) to automatically include files, maintaining consistency with other CMake files in the project.

Learnt from: jackluo923
PR: #1054
File: components/core/src/glt/SQLitePreparedStatement.hpp:4-4
Timestamp: 2025-07-01T14:49:07.333Z
Learning: In the CLP codebase, standard headers like <cstdint> and <string> in alphabetical order (as seen in components/core/src/glt/SQLitePreparedStatement.hpp) are correctly ordered and do not need adjustment.

Learnt from: AVMatthews
PR: #595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.

Learnt from: Bill-hbrhbr
PR: #1126
File: .gitignore:5-5
Timestamp: 2025-07-25T21:29:48.947Z
Learning: In the CLP project, the .clang-format file is maintained in the yscope-dev-utils submodule and copied over to the main CLP repository, so it should be ignored in .gitignore to prevent accidental commits of the copied file and maintain the single source of truth in the submodule.

Learnt from: anlowee
PR: #925
File: taskfiles/deps/main.yaml:97-106
Timestamp: 2025-05-28T18:33:30.155Z
Learning: In the taskfiles dependency system (taskfiles/deps/main.yaml), echo commands are used to generate .cmake settings files that are consumed by the main CMake build process. These files set variables like ANTLR_RUNTIME_HEADER to point to dependency locations for use during compilation.

Learnt from: LinZhihao-723
PR: #873
File: components/core/src/clp/ffi/ir_stream/search/test/utils.hpp:15-18
Timestamp: 2025-05-02T22:11:55.711Z
Learning: The team plans to systematically improve include paths by adding ${PROJECT_SOURCE_DIR}/src to CMake targets and using direct include paths instead of deep relative ones in a future PR.

components/core/src/clp/Grep.cpp (5)

Learnt from: gibber9809
PR: #955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.

Learnt from: AVMatthews
PR: #595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.

Learnt from: haiqi96
PR: #523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In components/core/src/clp/BufferedFileReader.cpp, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.

Learnt from: LinZhihao-723
PR: #882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:428-450
Timestamp: 2025-05-07T16:56:53.901Z
Learning: The QueryHandlerImpl class in clp/ffi/ir_stream/search is not designed to be thread-safe and is assumed to not be used across multiple threads.

Learnt from: LinZhihao-723
PR: #882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp:164-166
Timestamp: 2025-05-11T22:22:49.286Z
Learning: For the QueryHandlerImpl class, reusing member variables like m_ast_dfs_stack is preferred over making methods const and moving state to function scope to avoid allocation/deallocation overhead, especially for frequently called methods like evaluate_kv_pair_log_event.

components/core/src/clp/GrepCore.cpp (3)

Learnt from: haiqi96
PR: #523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In components/core/src/clp/BufferedFileReader.cpp, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.

Learnt from: haiqi96
PR: #619
File: components/core/src/clp/clp/decompression.cpp:313-313
Timestamp: 2024-12-05T16:32:21.507Z
Learning: In the C++ FileDecompressor implementations at components/core/src/clp/clp/FileDecompressor.cpp and components/core/src/glt/glt/FileDecompressor.cpp, the temp_output_path variable and associated logic are used to handle multiple compressed files with the same name, and should be kept. This logic is separate from the temporary output directory code removed in PR #619 and is necessary for proper file handling.

Learnt from: LinZhihao-723
PR: #856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:258-266
Timestamp: 2025-04-26T02:21:22.021Z
Learning: In the clp::ffi::ir_stream::search namespace, the design principle is that callers are responsible for type checking, not the called functions. If control flow reaches a function, input types should already be validated by the caller.

components/core/src/clp/GrepCore.hpp (2)

Learnt from: LinZhihao-723
PR: #882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp:164-166
Timestamp: 2025-05-11T22:22:49.286Z
Learning: For the QueryHandlerImpl class, reusing member variables like m_ast_dfs_stack is preferred over making methods const and moving state to function scope to avoid allocation/deallocation overhead, especially for frequently called methods like evaluate_kv_pair_log_event.

Learnt from: LinZhihao-723
PR: #882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:428-450
Timestamp: 2025-05-07T16:56:53.901Z
Learning: The QueryHandlerImpl class in clp/ffi/ir_stream/search is not designed to be thread-safe and is assumed to not be used across multiple threads.

components/core/src/clp/Query.hpp (7)

Learnt from: LinZhihao-723
PR: #882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp:164-166
Timestamp: 2025-05-11T22:22:49.286Z
Learning: For the QueryHandlerImpl class, reusing member variables like m_ast_dfs_stack is preferred over making methods const and moving state to function scope to avoid allocation/deallocation overhead, especially for frequently called methods like evaluate_kv_pair_log_event.

Learnt from: LinZhihao-723
PR: #558
File: components/core/src/clp/ffi/ir_stream/Deserializer.hpp:119-120
Timestamp: 2024-10-14T03:39:35.585Z
Learning: KeyValuePairLogEvent::create requires std::shared_ptr<SchemaTree>, so m_auto_generated_schema_tree and m_user_generated_schema_tree need to be std::shared_ptr.

Learnt from: LinZhihao-723
PR: #873
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:148-157
Timestamp: 2025-05-02T22:27:59.347Z
Learning: In the QueryHandlerImpl.cpp file, the unique_projected_columns set (using std::string_view) is intentionally designed to only check for duplications within the local scope of the create_projected_columns_and_projection_map function. The team decided this is an acceptable use of std::string_view in a container since the referenced strings remain valid throughout the function's execution.

Learnt from: gibber9809
PR: #584
File: components/core/src/clp_s/SchemaTree.hpp:91-94
Timestamp: 2024-11-12T18:47:03.828Z
Learning: In components/core/src/clp_s/SchemaTree.hpp, the SchemaNode class uses std::unique_ptr<char[]> m_key_buf and std::string_view m_key_name to ensure that references to m_key_name remain valid even after SchemaNode is move-constructed.

Learnt from: gibber9809
PR: #584
File: components/core/src/clp_s/SchemaTree.hpp:171-171
Timestamp: 2024-11-12T18:46:20.933Z
Learning: In components/core/src/clp_s/SchemaTree.hpp, it's acceptable to use std::string_view as keys in m_node_map because SchemaNode's m_key_name remains valid even after move operations or reallocations, preventing dangling references.

Learnt from: jackluo923
PR: #1054
File: components/core/src/glt/SQLitePreparedStatement.hpp:4-4
Timestamp: 2025-07-01T14:49:07.333Z
Learning: In the CLP codebase, standard headers like <cstdint> and <string> in alphabetical order (as seen in components/core/src/glt/SQLitePreparedStatement.hpp) are correctly ordered and do not need adjustment.

Learnt from: LinZhihao-723
PR: #882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:428-450
Timestamp: 2025-05-07T16:56:53.901Z
Learning: The QueryHandlerImpl class in clp/ffi/ir_stream/search is not designed to be thread-safe and is assumed to not be used across multiple threads.

components/core/src/clp/Query.cpp (4)

Learnt from: LinZhihao-723
PR: #882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp:164-166
Timestamp: 2025-05-11T22:22:49.286Z
Learning: For the QueryHandlerImpl class, reusing member variables like m_ast_dfs_stack is preferred over making methods const and moving state to function scope to avoid allocation/deallocation overhead, especially for frequently called methods like evaluate_kv_pair_log_event.

Learnt from: LinZhihao-723
PR: #873
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:148-157
Timestamp: 2025-05-02T22:27:59.347Z
Learning: In the QueryHandlerImpl.cpp file, the unique_projected_columns set (using std::string_view) is intentionally designed to only check for duplications within the local scope of the create_projected_columns_and_projection_map function. The team decided this is an acceptable use of std::string_view in a container since the referenced strings remain valid throughout the function's execution.

Learnt from: gibber9809
PR: #584
File: components/core/src/clp_s/SchemaTree.hpp:171-171
Timestamp: 2024-11-12T18:46:20.933Z
Learning: In components/core/src/clp_s/SchemaTree.hpp, it's acceptable to use std::string_view as keys in m_node_map because SchemaNode's m_key_name remains valid even after move operations or reallocations, preventing dangling references.

Learnt from: gibber9809
PR: #584
File: components/core/src/clp_s/SchemaTree.hpp:91-94
Timestamp: 2024-11-12T18:47:03.828Z
Learning: In components/core/src/clp_s/SchemaTree.hpp, the SchemaNode class uses std::unique_ptr<char[]> m_key_buf and std::string_view m_key_name to ensure that references to m_key_name remain valid even after SchemaNode is move-constructed.

🧬 Code Graph Analysis (6)
components/core/src/clp/clg/clg.cpp (3)
components/core/src/clp/Utils.hpp (1)
  • load_lexer_from_file (52-55)
components/core/src/clp/Query.hpp (3)
  • Query (167-171)
  • get_segments_containing_log_dict_id (112-117)
  • get_segments_containing_log_dict_id (219-224)
components/core/src/clp/GrepCore.hpp (3)
  • process_raw_query (45-54)
  • process_raw_query (148-265)
  • process_raw_query (148-157)
components/core/src/clp/GrepCore.cpp (2)
components/core/src/clp/QueryToken.cpp (4)
  • is_wildcard (145-147)
  • is_wildcard (145-145)
  • is_var (135-143)
  • is_var (135-135)
components/core/src/clp/GrepCore.hpp (2)
  • value (65-70)
  • value (82-88)
components/core/src/clp/GrepCore.hpp (3)
components/core/src/clp/QueryToken.cpp (4)
  • is_var (135-143)
  • is_var (135-135)
  • is_wildcard (145-147)
  • is_wildcard (145-145)
components/core/src/clp/GrepCore.cpp (4)
  • get_bounds_of_next_potential_var (33-151)
  • get_bounds_of_next_potential_var (33-38)
  • get_bounds_of_next_potential_var (153-272)
  • get_bounds_of_next_potential_var (153-159)
components/core/src/clp_s/search/clp_search/Grep.cpp (1)
  • clp_s (17-630)
components/core/src/clp/QueryToken.cpp (2)
components/core/src/clp/QueryToken.hpp (1)
  • QueryToken (13-13)
components/core/src/clp_s/SchemaTree.hpp (1)
  • m_type (92-92)
components/core/src/clp/QueryToken.hpp (1)
components/core/src/clp/QueryToken.cpp (2)
  • is_var (135-143)
  • is_var (135-135)
components/core/src/clp/Query.hpp (2)
components/core/src/clp_s/search/clp_search/Query.hpp (11)
  • precise_dict_var (82-85)
  • possible_dict_vars (92-95)
  • m_is_precise_var (39-39)
  • m_is_dict_var (41-41)
  • m_var_dict_entry (43-43)
  • m_possible_var_dict_entries (45-48)
  • m_wildcard_match_required (113-113)
  • m_possible_logtype_ids (115-115)
  • vars (141-141)
  • m_vars (121-121)
  • m_vars (123-123)
components/core/src/glt/Query.hpp (15)
  • precise_dict_var (85-88)
  • possible_dict_vars (95-98)
  • segment_ids (40-40)
  • m_is_precise_var (42-42)
  • m_is_dict_var (44-44)
  • m_var_dict_entry (46-46)
  • m_possible_var_dict_entries (48-51)
  • m_wildcard_match_required (116-116)
  • m_wildcard_match_required (246-246)
  • m_possible_logtype_ids (118-118)
  • m_possible_logtype_ids (124-126)
  • vars (148-148)
  • vars (244-244)
  • m_vars (128-128)
  • m_vars (130-130)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: lint-check (macos-15)
🔇 Additional comments (15)
components/core/src/clp/clg/CMakeLists.txt (1)

36-37: Potential duplicate compilation units – verify that Grep and GrepCore coexist cleanly

Grep.cpp/Grep.hpp remain in the source list while their functional successor (GrepCore.*) is now compiled into the same target. If any non-inline symbols were moved (rather than renamed) from Grep into GrepCore, you may get duplicate‐definition link errors or, worse, ODR violations that slip past the linker.

Please confirm that

  1. the old class still provides distinct symbols, or
  2. the obsolete objects are no longer needed and can be removed from CLG_SOURCES.

Failing to do so can yield hard-to-trace runtime issues.

Also applies to: 64-65

components/core/src/clp/clo/CMakeLists.txt (1)

37-38: Consistency of include paths

QueryToken.* is added only under CLO_SOURCES, whereas GrepCore.* is added above. Keeping the new tokenisation logic co-located with its consumer makes sense, but ensure the public header is reachable through the target’s target_include_directories; otherwise downstream libraries (tests, utils) will need relative includes.

Also applies to: 62-63

components/core/tests/test-Grep.cpp (1)

90-91: Mixed overload usage may hide test gaps

One call to get_bounds_of_next_potential_var omits the lexer argument while all previous ones include it. If this overload behaves differently (e.g., falls back to a default wildcard policy) the test no longer exercises the lexer-aware branch for the final token; confirm this is intentional.

components/core/src/clp/clo/clo.cpp (2)

4-4: LGTM! Required includes for the refactoring.

The added includes support the new functionality:

  • <set> for segment ID collections
  • string_utils for wildcard string cleaning
  • GrepCore.hpp for the refactored query processing logic

Also applies to: 10-10, 16-16


41-41: LGTM! Using statements support the refactored code.

The added using statements properly import the new types and functions needed for the refactoring.

Also applies to: 44-44, 47-47, 51-51, 53-53

components/core/src/clp/clg/clg.cpp (2)

5-5: LGTM! Consistent with the refactoring pattern.

The include and using statement additions mirror those in clo.cpp and properly support the refactored query processing.

Also applies to: 9-9, 15-15, 31-31, 33-33, 36-36, 41-41, 43-43


504-507: Good encapsulation of wildcard addition logic.

The lambda function cleanly encapsulates the wildcard wrapping and cleaning logic, improving code readability.

components/core/src/clp/Grep.cpp (1)

1-246: Clean extraction of query processing logic to GrepCore.

The removal of query tokenization and processing logic from this file aligns perfectly with the PR objectives. The remaining search and output functionality is appropriately separated.

components/core/src/clp/EncodedVariableInterpreter.hpp (2)

196-196: Good simplification using dictionary IDs instead of pointers.

The removal of the VariableDictionaryEntryType template parameter and the switch to using dictionary IDs aligns well with the PR's decoupling objectives.

Also applies to: 437-437, 468-469, 473-480


502-509: Consistent refactoring in wildcard search method.

The changes properly track dictionary IDs for wildcard matches, maintaining consistency with the encode_and_search_dictionary method.

components/core/src/clp/GrepCore.cpp (1)

46-47: Correctly follows coding guidelines for boolean expressions.

The code properly uses false == <expression> pattern instead of !<expression> throughout the method.

Also applies to: 58-58, 72-72, 144-144

components/core/src/clp/QueryToken.cpp (1)

91-169: Well-structured implementation of QueryToken methods.

The getter and type-checking methods are clean and consistently handle ambiguous tokens. The change_to_next_possible_type method correctly cycles through possible types.

components/core/src/clp/GrepCore.hpp (1)

281-281: Correctly follows coding guidelines for boolean expressions.

The code properly uses false == <expression> pattern for negation checks.

Also applies to: 288-288, 308-308

components/core/src/clp/Query.hpp (1)

21-21: Good refactoring from pointers to dictionary IDs.

The consistent change from dictionary entry pointers to IDs improves decoupling and modularity. The addition of function parameters for external segment lookups is a clean design choice. Default member initializers enhance safety.

Also applies to: 24-24, 42-44, 50-50, 52-52, 58-59, 63-63, 67-67, 88-88, 97-97, 103-103, 113-117

components/core/src/clp/Query.cpp (1)

191-203: Clean implementation of segment ID calculation.

The method correctly delegates to each sub-query's calculate_ids_of_matching_segments method, maintaining good separation of concerns.

Comment on lines +582 to 585
src/clp/GrepCore.cpp
src/clp/GrepCore.hpp
src/clp/hash_utils.cpp
src/clp/hash_utils.hpp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Unit-test build lists growing unmanageably

The manual source lists have grown past 700 entries. Every new pair of files must be added in three places (clg, clo, and unit tests). Consider moving to:

target_sources(unitTest PRIVATE
    $<TARGET_OBJECTS:clp_core_objects>)

or a file(GLOB ...) guarded behind an opt-in cache flag. This keeps explicit listing for production targets while sparing tests from constant churn.

Also applies to: 632-633

🤖 Prompt for AI Agents
In components/core/CMakeLists.txt around lines 582 to 585, the unit-test source
file lists are manually maintained and have grown excessively large, requiring
updates in multiple places. To fix this, replace the explicit file listings for
the unit tests with a target_sources call that references the object files from
the clp_core_objects target, or alternatively use a file(GLOB ...) command
controlled by an opt-in cache flag. This change will reduce maintenance overhead
by avoiding repeated manual updates while keeping production target listings
explicit. Apply the same approach to lines 632-633 as well.

Comment on lines +244 to +246
// TODO: creating a string reader, setting it equal to a string, to read it into
// the ParserInputBuffer, seems like a convoluted way to set a string equal to a
// string, should be improved when adding a SearchParser to log_surgeon
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Address the TODO comment about improving string handling.

The comment indicates that the current approach of creating a StringReader and ParserInputBuffer is convoluted and should be improved when adding a SearchParser to log_surgeon.

Would you like me to create an issue to track this technical debt for future improvement?

🤖 Prompt for AI Agents
In components/core/src/clp/GrepCore.cpp around lines 244 to 246, the current
code uses a StringReader and ParserInputBuffer to set a string equal to another
string, which is unnecessarily complex. To fix this, simplify the string
handling by directly assigning or copying the string without involving these
intermediate objects. Since this is a known technical debt, create a clear issue
or task to track this improvement for when the SearchParser is added to
log_surgeon, and reference that issue in the TODO comment.


namespace clp {
QueryToken::QueryToken(
std::string const& query_string,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you can use using std::string in this file since std::string is used multiple times.

@gibber9809 gibber9809 requested a review from haiqi96 July 31, 2025 17:30
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
components/core/src/clp/GrepCore.cpp (2)

228-228: Fix copy-paste error in suffix wildcard check.

Line 228 incorrectly uses begin_pos instead of end_pos - 1 when checking for suffix wildcards.

-            bool has_suffix_wildcard = ('*' == value[end_pos - 1]) || ('?' == value[begin_pos]);
+            bool has_suffix_wildcard = ('*' == value[end_pos - 1]) || ('?' == value[end_pos - 1]);

244-246: Address the TODO comment about improving string handling.

The comment indicates that the current approach of creating a StringReader and ParserInputBuffer is convoluted and should be improved when adding a SearchParser to log_surgeon.

Would you like me to create an issue to track this technical debt for future improvement?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 255fad3 and 7d635df.

📒 Files selected for processing (2)
  • components/core/src/clp/GrepCore.cpp (1 hunks)
  • components/core/src/clp/QueryToken.cpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit Configuration File

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp/GrepCore.cpp
  • components/core/src/clp/QueryToken.cpp
🧠 Learnings (16)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: gibber9809
PR: y-scope/clp#955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/clp/FileCompressor.hpp:58-78
Timestamp: 2024-10-24T14:25:17.978Z
Learning: When reviewing legacy code refactors, avoid suggesting changes that would extend the scope of the PR.
📚 Learning: in `components/core/src/clp/bufferedfilereader.cpp`, refactoring the nested error handling condition...
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.

Applied to files:

  • components/core/src/clp/GrepCore.cpp
  • components/core/src/clp/QueryToken.cpp
📚 Learning: in the c++ `filedecompressor` implementations at `components/core/src/clp/clp/filedecompressor.cpp` ...
Learnt from: haiqi96
PR: y-scope/clp#619
File: components/core/src/clp/clp/decompression.cpp:313-313
Timestamp: 2024-12-05T16:32:21.507Z
Learning: In the C++ `FileDecompressor` implementations at `components/core/src/clp/clp/FileDecompressor.cpp` and `components/core/src/glt/glt/FileDecompressor.cpp`, the `temp_output_path` variable and associated logic are used to handle multiple compressed files with the same name, and should be kept. This logic is separate from the temporary output directory code removed in PR #619 and is necessary for proper file handling.

Applied to files:

  • components/core/src/clp/GrepCore.cpp
📚 Learning: in the clp search component, the `evaluate_wildcard_filter` function should return `astevaluationres...
Learnt from: LinZhihao-723
PR: y-scope/clp#882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:378-402
Timestamp: 2025-05-07T16:56:35.687Z
Learning: In the CLP search component, the `evaluate_wildcard_filter` function should return `AstEvaluationResult::Pruned` when `node_id_value_pairs` is empty, not `AstEvaluationResult::False`. Empty node sets should be treated as "undetermined" rather than definitive non-matches.

Applied to files:

  • components/core/src/clp/GrepCore.cpp
📚 Learning: in `components/core/src/clp_s/jsonparser.cpp`, when handling errors in `parse_from_ir`, prefer to ma...
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:769-779
Timestamp: 2024-10-07T21:16:41.660Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, when handling errors in `parse_from_ir`, prefer to maintain the current mix of try-catch and if-statements because specific messages are returned back up in some cases.

Applied to files:

  • components/core/src/clp/GrepCore.cpp
📚 Learning: in `components/core/src/clp_s/jsonparser.cpp`, within the `get_archive_node_id` function, validation...
Learnt from: gibber9809
PR: y-scope/clp#630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:08.691Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` function, validation and exception throwing for UTF-8 compliance of `curr_node.get_key_name()` are unnecessary and should be omitted.

Applied to files:

  • components/core/src/clp/GrepCore.cpp
📚 Learning: double parentheses in the codebase are intentional and required for clang-tidy to pass. these should...
Learnt from: Bill-hbrhbr
PR: y-scope/clp#0
File: :0-0
Timestamp: 2025-03-18T07:27:54.738Z
Learning: Double parentheses in the codebase are intentional and required for clang-tidy to pass. These should not be flagged as style issues in code reviews.

Applied to files:

  • components/core/src/clp/GrepCore.cpp
📚 Learning: in the unit test case `ffi_ir_stream_serialize_schema_tree_node_id` in `test-ir_encoding_methods.cpp...
Learnt from: LinZhihao-723
PR: y-scope/clp#557
File: components/core/tests/test-ir_encoding_methods.cpp:1216-1286
Timestamp: 2024-10-13T09:27:43.408Z
Learning: In the unit test case `ffi_ir_stream_serialize_schema_tree_node_id` in `test-ir_encoding_methods.cpp`, suppressing the `readability-function-cognitive-complexity` warning is acceptable due to the expansion of Catch2 macros in C++ tests, and such test cases may not have readability issues.

Applied to files:

  • components/core/src/clp/GrepCore.cpp
📚 Learning: in the `network_reader_with_valid_http_header_kv_pairs` test case in `components/core/tests/test-net...
Learnt from: LinZhihao-723
PR: y-scope/clp#593
File: components/core/tests/test-NetworkReader.cpp:216-219
Timestamp: 2024-11-15T03:15:45.919Z
Learning: In the `network_reader_with_valid_http_header_kv_pairs` test case in `components/core/tests/test-NetworkReader.cpp`, additional error handling for JSON parsing failures is not necessary, as the current error message is considered sufficient.

Applied to files:

  • components/core/src/clp/GrepCore.cpp
📚 Learning: user haiqi96 requested creating a github issue to document a bug fix from pr #1136, which addressed ...
Learnt from: haiqi96
PR: y-scope/clp#0
File: :0-0
Timestamp: 2025-07-29T14:04:13.769Z
Learning: User haiqi96 requested creating a GitHub issue to document a bug fix from PR #1136, which addressed MySQL compatibility issues with invalid SQL CAST operations in the WebUI component.

Applied to files:

  • components/core/src/clp/GrepCore.cpp
📚 Learning: in the clp::ffi::ir_stream::search namespace, the design principle is that callers are responsible f...
Learnt from: LinZhihao-723
PR: y-scope/clp#856
File: components/core/src/clp/ffi/ir_stream/search/utils.cpp:258-266
Timestamp: 2025-04-26T02:21:22.021Z
Learning: In the clp::ffi::ir_stream::search namespace, the design principle is that callers are responsible for type checking, not the called functions. If control flow reaches a function, input types should already be validated by the caller.

Applied to files:

  • components/core/src/clp/GrepCore.cpp
📚 Learning: when working with constexpr strings (string literals with static storage duration), std::string_view...
Learnt from: davemarco
PR: y-scope/clp#700
File: components/core/src/clp/streaming_archive/ArchiveMetadata.hpp:153-155
Timestamp: 2025-01-30T19:26:33.869Z
Learning: When working with constexpr strings (string literals with static storage duration), std::string_view is the preferred choice for member variables as it's more efficient and safe, avoiding unnecessary memory allocations.

Applied to files:

  • components/core/src/clp/QueryToken.cpp
📚 Learning: in the file `components/core/tests/test-ffi_keyvaluepairlogevent.cpp`, including `
Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:14-14
Timestamp: 2024-10-14T03:42:10.355Z
Learning: In the file `components/core/tests/test-ffi_KeyValuePairLogEvent.cpp`, including `<json/single_include/nlohmann/json.hpp>` is consistent with the project's coding standards.

Applied to files:

  • components/core/src/clp/QueryToken.cpp
📚 Learning: member variables in c++ classes should be explicitly initialized in the constructor to prevent undef...
Learnt from: haiqi96
PR: y-scope/clp#646
File: components/core/src/clp/streaming_archive/writer/Archive.hpp:354-354
Timestamp: 2025-01-14T16:06:54.692Z
Learning: Member variables in C++ classes should be explicitly initialized in the constructor to prevent undefined behavior, as demonstrated in the Archive class where `m_use_single_file_archive` is initialized to `false`.

Applied to files:

  • components/core/src/clp/QueryToken.cpp
📚 Learning: in `components/core/src/clp_s/jsonparser.cpp`, within the `get_archive_node_id` method, throwing a s...
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:581-627
Timestamp: 2024-10-07T21:38:35.979Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` method, throwing a string literal as an exception is acceptable practice.

Applied to files:

  • components/core/src/clp/QueryToken.cpp
📚 Learning: only trivially destructible global objects are allowed in the coding standard, so avoid suggesting t...
Learnt from: LinZhihao-723
PR: y-scope/clp#544
File: components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp:170-186
Timestamp: 2024-09-24T22:33:59.488Z
Learning: Only trivially destructible global objects are allowed in the coding standard, so avoid suggesting the use of non-trivially destructible global objects like `static std::unordered_map`.

Applied to files:

  • components/core/src/clp/QueryToken.cpp
🧬 Code Graph Analysis (1)
components/core/src/clp/QueryToken.cpp (2)
components/core/src/clp/QueryToken.hpp (1)
  • QueryToken (13-13)
components/core/src/clp_s/SchemaTree.hpp (1)
  • m_type (92-92)
🔇 Additional comments (4)
components/core/src/clp/GrepCore.cpp (2)

1-20: Well-organized includes and declarations.

The includes are properly grouped and the using declarations improve code readability.


33-151: Solid implementation of variable boundary detection.

The method correctly handles escape characters, wildcards, and special cases like hex values and equals-sign prefixed tokens. The logic flow is clear and maintains proper state throughout.

components/core/src/clp/QueryToken.cpp (2)

1-9: Clean and minimal includes.

The headers are appropriate and the using std::string declaration improves readability as suggested.


93-171: Well-structured getter methods with proper const-correctness.

The getter methods are appropriately const-qualified and handle ambiguous token types consistently. The change_to_next_possible_type method correctly implements circular iteration through possible types.

Comment on lines +27 to +30
class SearchToken : public log_surgeon::Token {
public:
std::set<int> m_type_ids_set;
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Consider encapsulating the public member variable.

While acceptable for a simple wrapper in an anonymous namespace, consider making m_type_ids_set private with appropriate accessors for better encapsulation.

🤖 Prompt for AI Agents
In components/core/src/clp/GrepCore.cpp around lines 27 to 30, the member
variable m_type_ids_set in the SearchToken class is public, which reduces
encapsulation. Change m_type_ids_set to private and add public getter and setter
methods or other appropriate accessors to control access to this set, improving
encapsulation and maintaining class invariants.

Comment on lines +11 to +21
QueryToken::QueryToken(
string const& query_string,
size_t const begin_pos,
size_t const end_pos,
bool const is_var
)
: m_current_possible_type_ix(0) {
m_begin_pos = begin_pos;
m_end_pos = end_pos;
m_value.assign(query_string, m_begin_pos, m_end_pos - m_begin_pos);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Consider using member initializer list for all member variables.

While m_current_possible_type_ix is initialized in the member initializer list, m_begin_pos and m_end_pos could also be initialized there for consistency and potential performance benefits.

-        : m_current_possible_type_ix(0) {
-    m_begin_pos = begin_pos;
-    m_end_pos = end_pos;
+        : m_current_possible_type_ix(0),
+          m_begin_pos(begin_pos),
+          m_end_pos(end_pos) {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
QueryToken::QueryToken(
string const& query_string,
size_t const begin_pos,
size_t const end_pos,
bool const is_var
)
: m_current_possible_type_ix(0) {
m_begin_pos = begin_pos;
m_end_pos = end_pos;
m_value.assign(query_string, m_begin_pos, m_end_pos - m_begin_pos);
QueryToken::QueryToken(
string const& query_string,
size_t const begin_pos,
size_t const end_pos,
bool const is_var
)
: m_current_possible_type_ix(0),
m_begin_pos(begin_pos),
m_end_pos(end_pos) {
m_value.assign(query_string, m_begin_pos, m_end_pos - m_begin_pos);
// ... rest of constructor ...
}
🤖 Prompt for AI Agents
In components/core/src/clp/QueryToken.cpp around lines 11 to 21, the constructor
initializes m_begin_pos and m_end_pos inside the constructor body instead of the
member initializer list. Move the initialization of m_begin_pos and m_end_pos to
the member initializer list along with m_current_possible_type_ix for
consistency and better performance.

encoded_variable_t m_precise_var{};
// Only used if the precise variable is a dictionary variable
VariableDictionaryEntry const* m_var_dict_entry;
variable_dictionary_id_t m_var_dict_entry{};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we don't store pointer to entries anymore, how about renaming them tom something like m_var_dict_id

@haiqi96
Copy link
Contributor

haiqi96 commented Jul 31, 2025

I feel that moving segment index calculation outside the procress_raw_query is not the best design, especially know that developer needs to explicitly call calculate_ids_of_matching_segments, but I see it is because we want to refactor only the common logic into GrepCore.

In addition, I believe that our long-term goal is to drop segment support in CLP as well, so l would say the change proposed in this PR is fine.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7d635df and 56a5655.

📒 Files selected for processing (3)
  • components/core/src/clp/Query.cpp (6 hunks)
  • components/core/src/clp/Query.hpp (7 hunks)
  • taskfiles/lint.yaml (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit Configuration File

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp/Query.cpp
  • components/core/src/clp/Query.hpp
🧠 Learnings (24)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: gibber9809
PR: y-scope/clp#955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/clp/FileCompressor.hpp:58-78
Timestamp: 2024-10-24T14:25:17.978Z
Learning: When reviewing legacy code refactors, avoid suggesting changes that would extend the scope of the PR.
Learnt from: gibber9809
PR: y-scope/clp#504
File: components/core/src/clp_s/search/kql/CMakeLists.txt:29-29
Timestamp: 2024-10-22T15:36:04.655Z
Learning: When reviewing pull requests, focus on the changes within the PR and avoid commenting on issues outside the scope of the PR.
📚 Learning: in the clp project, when reviewing cmakelists.txt changes that introduce new compression library dep...
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.

Applied to files:

  • taskfiles/lint.yaml
📚 Learning: in the clp project, the .clang-format file is maintained in the yscope-dev-utils submodule and copie...
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1126
File: .gitignore:5-5
Timestamp: 2025-07-25T21:29:48.947Z
Learning: In the CLP project, the .clang-format file is maintained in the yscope-dev-utils submodule and copied over to the main CLP repository, so it should be ignored in .gitignore to prevent accidental commits of the copied file and maintain the single source of truth in the submodule.

Applied to files:

  • taskfiles/lint.yaml
📚 Learning: in the clp project, antlr code generation at build time is being removed by another pr. when reviewi...
Learnt from: gibber9809
PR: y-scope/clp#955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.

Applied to files:

  • taskfiles/lint.yaml
📚 Learning: in cmakelists.txt files, anlowee prefers to explicitly list source files one by one rather than usin...
Learnt from: anlowee
PR: y-scope/clp#925
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-19
Timestamp: 2025-05-26T18:39:51.727Z
Learning: In CMakeLists.txt files, anlowee prefers to explicitly list source files one by one rather than using file(GLOB ...) or file(GLOB_RECURSE ...) to automatically include files, maintaining consistency with other CMake files in the project.

Applied to files:

  • taskfiles/lint.yaml
📚 Learning: antlr-generated files under `antlr_generated/` directories should not be subject to coding style che...
Learnt from: anlowee
PR: y-scope/clp#925
File: components/core/src/clp_s/search/kql/antlr_generated/KqlParser.cpp:0-0
Timestamp: 2025-05-27T20:06:21.263Z
Learning: ANTLR-generated files under `antlr_generated/` directories should not be subject to coding style checks or manual review comments, as they are auto-generated code that would be overwritten on regeneration.

Applied to files:

  • taskfiles/lint.yaml
📚 Learning: the team plans to systematically improve include paths by adding `${project_source_dir}/src` to cmak...
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/test/utils.hpp:15-18
Timestamp: 2025-05-02T22:11:55.711Z
Learning: The team plans to systematically improve include paths by adding `${PROJECT_SOURCE_DIR}/src` to CMake targets and using direct include paths instead of deep relative ones in a future PR.

Applied to files:

  • taskfiles/lint.yaml
📚 Learning: double parentheses in the codebase are intentional and required for clang-tidy to pass. these should...
Learnt from: Bill-hbrhbr
PR: y-scope/clp#0
File: :0-0
Timestamp: 2025-03-18T07:27:54.738Z
Learning: Double parentheses in the codebase are intentional and required for clang-tidy to pass. These should not be flagged as style issues in code reviews.

Applied to files:

  • taskfiles/lint.yaml
📚 Learning: double parentheses in require statements (e.g., `require((expression))`) and in conditional expressi...
Learnt from: Bill-hbrhbr
PR: y-scope/clp#0
File: :0-0
Timestamp: 2025-03-18T07:27:54.738Z
Learning: Double parentheses in REQUIRE statements (e.g., `REQUIRE((expression))`) and in conditional expressions are intentional and required for clang-tidy to pass. These should not be flagged as style issues in code reviews.

Applied to files:

  • taskfiles/lint.yaml
📚 Learning: in clp installation scripts, consistency across platform scripts is prioritized over defensive progr...
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in `install-prebuilt-packages.sh`, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like `--strip-components=1` to handle potential tarball layout changes.

Applied to files:

  • taskfiles/lint.yaml
📚 Learning: in the clp search component, the `evaluate_wildcard_filter` function should return `astevaluationres...
Learnt from: LinZhihao-723
PR: y-scope/clp#882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:378-402
Timestamp: 2025-05-07T16:56:35.687Z
Learning: In the CLP search component, the `evaluate_wildcard_filter` function should return `AstEvaluationResult::Pruned` when `node_id_value_pairs` is empty, not `AstEvaluationResult::False`. Empty node sets should be treated as "undetermined" rather than definitive non-matches.

Applied to files:

  • taskfiles/lint.yaml
📚 Learning: in the clp codebase, standard headers like `` and `` in alphabetical order (as seen...
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/src/glt/SQLitePreparedStatement.hpp:4-4
Timestamp: 2025-07-01T14:49:07.333Z
Learning: In the CLP codebase, standard headers like `<cstdint>` and `<string>` in alphabetical order (as seen in components/core/src/glt/SQLitePreparedStatement.hpp) are correctly ordered and do not need adjustment.

Applied to files:

  • taskfiles/lint.yaml
  • components/core/src/clp/Query.hpp
📚 Learning: in 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floa...
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.

Applied to files:

  • taskfiles/lint.yaml
📚 Learning: in `components/core/src/clp/bufferedfilereader.cpp`, refactoring the nested error handling condition...
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.

Applied to files:

  • taskfiles/lint.yaml
📚 Learning: in `components/core/src/clp_s/jsonparser.cpp`, when handling errors in `parse_from_ir`, prefer to ma...
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:769-779
Timestamp: 2024-10-07T21:16:41.660Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, when handling errors in `parse_from_ir`, prefer to maintain the current mix of try-catch and if-statements because specific messages are returned back up in some cases.

Applied to files:

  • taskfiles/lint.yaml
📚 Learning: in `clp-s.cpp`, the `run_serializer` function interleaves serialization and writing of ir files, mak...
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/clp-s.cpp:196-265
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `clp-s.cpp`, the `run_serializer` function interleaves serialization and writing of IR files, making it difficult to restructure it into separate functions.

Applied to files:

  • taskfiles/lint.yaml
📚 Learning: in the clp project, c++20 is used, allowing the utilization of c++20 features such as class template...
Learnt from: LinZhihao-723
PR: y-scope/clp#486
File: components/core/tests/test-error_handling.cpp:37-38
Timestamp: 2024-11-26T19:12:43.982Z
Learning: In the CLP project, C++20 is used, allowing the utilization of C++20 features such as class template argument deduction (CTAD) with `std::array`.

Applied to files:

  • taskfiles/lint.yaml
📚 Learning: for the `queryhandlerimpl` class, reusing member variables like `m_ast_dfs_stack` is preferred over ...
Learnt from: LinZhihao-723
PR: y-scope/clp#882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp:164-166
Timestamp: 2025-05-11T22:22:49.286Z
Learning: For the `QueryHandlerImpl` class, reusing member variables like `m_ast_dfs_stack` is preferred over making methods `const` and moving state to function scope to avoid allocation/deallocation overhead, especially for frequently called methods like `evaluate_kv_pair_log_event`.

Applied to files:

  • components/core/src/clp/Query.cpp
  • components/core/src/clp/Query.hpp
📚 Learning: `keyvaluepairlogevent::create` requires `std::shared_ptr`, so `m_auto_generated_schema_t...
Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/src/clp/ffi/ir_stream/Deserializer.hpp:119-120
Timestamp: 2024-10-14T03:39:35.585Z
Learning: `KeyValuePairLogEvent::create` requires `std::shared_ptr<SchemaTree>`, so `m_auto_generated_schema_tree` and `m_user_generated_schema_tree` need to be `std::shared_ptr`.

Applied to files:

  • components/core/src/clp/Query.cpp
  • components/core/src/clp/Query.hpp
📚 Learning: in the `queryhandlerimpl.cpp` file, the `unique_projected_columns` set (using `std::string_view`) is...
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:148-157
Timestamp: 2025-05-02T22:27:59.347Z
Learning: In the `QueryHandlerImpl.cpp` file, the `unique_projected_columns` set (using `std::string_view`) is intentionally designed to only check for duplications within the local scope of the `create_projected_columns_and_projection_map` function. The team decided this is an acceptable use of `std::string_view` in a container since the referenced strings remain valid throughout the function's execution.

Applied to files:

  • components/core/src/clp/Query.cpp
  • components/core/src/clp/Query.hpp
📚 Learning: in `components/core/src/clp_s/schematree.hpp`, it's acceptable to use `std::string_view` as keys in ...
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:171-171
Timestamp: 2024-11-12T18:46:20.933Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, it's acceptable to use `std::string_view` as keys in `m_node_map` because `SchemaNode`'s `m_key_name` remains valid even after move operations or reallocations, preventing dangling references.

Applied to files:

  • components/core/src/clp/Query.cpp
  • components/core/src/clp/Query.hpp
📚 Learning: in `components/core/src/clp_s/schematree.hpp`, the `schemanode` class uses `std::unique_ptr ...
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:91-94
Timestamp: 2024-11-12T18:47:03.828Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, the `SchemaNode` class uses `std::unique_ptr<char[]> m_key_buf` and `std::string_view m_key_name` to ensure that references to `m_key_name` remain valid even after `SchemaNode` is move-constructed.

Applied to files:

  • components/core/src/clp/Query.cpp
  • components/core/src/clp/Query.hpp
📚 Learning: the variable `ir_node_to_archive_node_unordered_map` is intentionally named to reflect its type chan...
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:581-627
Timestamp: 2024-10-07T21:38:35.979Z
Learning: The variable `ir_node_to_archive_node_unordered_map` is intentionally named to reflect its type change, even if the actual variable type has been altered.

Applied to files:

  • components/core/src/clp/Query.hpp
📚 Learning: the `queryhandlerimpl` class in clp/ffi/ir_stream/search is not designed to be thread-safe and is as...
Learnt from: LinZhihao-723
PR: y-scope/clp#882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:428-450
Timestamp: 2025-05-07T16:56:53.901Z
Learning: The `QueryHandlerImpl` class in clp/ffi/ir_stream/search is not designed to be thread-safe and is assumed to not be used across multiple threads.

Applied to files:

  • components/core/src/clp/Query.hpp
🧬 Code Graph Analysis (1)
components/core/src/clp/Query.hpp (2)
components/core/src/clp_s/search/clp_search/Query.hpp (9)
  • precise_dict_var (82-85)
  • possible_dict_vars (92-95)
  • m_is_precise_var (39-39)
  • m_is_dict_var (41-41)
  • m_wildcard_match_required (113-113)
  • m_possible_logtype_ids (115-115)
  • vars (141-141)
  • m_vars (121-121)
  • m_vars (123-123)
components/core/src/glt/Query.hpp (13)
  • precise_dict_var (85-88)
  • possible_dict_vars (95-98)
  • segment_ids (40-40)
  • m_is_precise_var (42-42)
  • m_is_dict_var (44-44)
  • m_wildcard_match_required (116-116)
  • m_wildcard_match_required (246-246)
  • m_possible_logtype_ids (118-118)
  • m_possible_logtype_ids (124-126)
  • vars (148-148)
  • vars (244-244)
  • m_vars (128-128)
  • m_vars (130-130)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: build (macos-15)
  • GitHub Check: build-macos (macos-15, true)
🔇 Additional comments (4)
taskfiles/lint.yaml (2)

285-287: Addition keeps the new GrepCore files in step with existing Grep exclusions

clp/GrepCore.cpp and clp/GrepCore.hpp are now explicitly listed alongside the legacy Grep.* files. This mirrors the current policy of silencing clang-tidy on the heavy query-parsing code paths and resolves the earlier omission.


329-331: QueryToken exclusions added for consistency

clp/QueryToken.cpp and clp/QueryToken.hpp follow the same exclusion pattern as other search-token classes. No further action required.

components/core/src/clp/Query.cpp (2)

189-201: Well-structured addition for explicit segment calculation.

The new method properly delegates segment calculation to sub-queries, aligning with the PR's goal of making segment index calculation explicit rather than implicit in query processing.


87-87: Add explicit type to the range-based for loop variable.

The loop variable entry is missing its type specification, which could lead to compilation issues or reduced code clarity.

-        for (auto entry : m_possible_var_dict_ids) {
+        for (auto const& entry : m_possible_var_dict_ids) {
⛔ Skipped due to learnings
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:148-157
Timestamp: 2025-05-02T22:27:59.347Z
Learning: In the `QueryHandlerImpl.cpp` file, the `unique_projected_columns` set (using `std::string_view`) is intentionally designed to only check for duplications within the local scope of the `create_projected_columns_and_projection_map` function. The team decided this is an acceptable use of `std::string_view` in a container since the referenced strings remain valid throughout the function's execution.
Learnt from: LinZhihao-723
PR: y-scope/clp#882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp:164-166
Timestamp: 2025-05-11T22:22:49.286Z
Learning: For the `QueryHandlerImpl` class, reusing member variables like `m_ast_dfs_stack` is preferred over making methods `const` and moving state to function scope to avoid allocation/deallocation overhead, especially for frequently called methods like `evaluate_kv_pair_log_event`.
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.

@gibber9809 gibber9809 requested a review from haiqi96 July 31, 2025 19:28
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we rename this to be test-GrepCore.cpp?

@gibber9809 gibber9809 requested a review from haiqi96 July 31, 2025 20:09
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
taskfiles/lint.yaml (1)

285-287: New sources still exempted from clang-tidy

GrepCore.* and QueryToken.* are now part of the core search logic but are added to EXCLUDE_PATTERNS, meaning they will never be statically analysed. The earlier review already raised this for the same files; the concern still stands.

Also applies to: 329-331

components/core/CMakeLists.txt (1)

582-584: Manual unit-test source list keeps growing

Yet another trio of files (GrepCore.*, QueryToken.*, and test-GrepCore.cpp) is appended to the already 700-line list. This reproduces the maintenance burden highlighted in previous reviews; consider the target_sources(unitTest PRIVATE $<TARGET_OBJECTS:clp_core_objects>) or opt-in file(GLOB …) approach suggested earlier.

Also applies to: 632-633, 723-723

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f996556 and 7f73b23.

📒 Files selected for processing (3)
  • components/core/CMakeLists.txt (3 hunks)
  • components/core/tests/test-GrepCore.cpp (3 hunks)
  • taskfiles/lint.yaml (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit Configuration File

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/tests/test-GrepCore.cpp
🧠 Learnings (27)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: gibber9809
PR: y-scope/clp#955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/clp/FileCompressor.hpp:58-78
Timestamp: 2024-10-24T14:25:17.978Z
Learning: When reviewing legacy code refactors, avoid suggesting changes that would extend the scope of the PR.
Learnt from: gibber9809
PR: y-scope/clp#504
File: components/core/src/clp_s/search/kql/CMakeLists.txt:29-29
Timestamp: 2024-10-22T15:36:04.655Z
Learning: When reviewing pull requests, focus on the changes within the PR and avoid commenting on issues outside the scope of the PR.
📚 Learning: in the clp project, when reviewing cmakelists.txt changes that introduce new compression library dep...
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.

Applied to files:

  • taskfiles/lint.yaml
  • components/core/CMakeLists.txt
📚 Learning: in the clp project, the .clang-format file is maintained in the yscope-dev-utils submodule and copie...
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1126
File: .gitignore:5-5
Timestamp: 2025-07-25T21:29:48.947Z
Learning: In the CLP project, the .clang-format file is maintained in the yscope-dev-utils submodule and copied over to the main CLP repository, so it should be ignored in .gitignore to prevent accidental commits of the copied file and maintain the single source of truth in the submodule.

Applied to files:

  • taskfiles/lint.yaml
📚 Learning: in cmakelists.txt files, anlowee prefers to explicitly list source files one by one rather than usin...
Learnt from: anlowee
PR: y-scope/clp#925
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-19
Timestamp: 2025-05-26T18:39:51.727Z
Learning: In CMakeLists.txt files, anlowee prefers to explicitly list source files one by one rather than using file(GLOB ...) or file(GLOB_RECURSE ...) to automatically include files, maintaining consistency with other CMake files in the project.

Applied to files:

  • taskfiles/lint.yaml
  • components/core/CMakeLists.txt
📚 Learning: in the clp project, antlr code generation at build time is being removed by another pr. when reviewi...
Learnt from: gibber9809
PR: y-scope/clp#955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.

Applied to files:

  • taskfiles/lint.yaml
  • components/core/CMakeLists.txt
📚 Learning: antlr-generated files under `antlr_generated/` directories should not be subject to coding style che...
Learnt from: anlowee
PR: y-scope/clp#925
File: components/core/src/clp_s/search/kql/antlr_generated/KqlParser.cpp:0-0
Timestamp: 2025-05-27T20:06:21.263Z
Learning: ANTLR-generated files under `antlr_generated/` directories should not be subject to coding style checks or manual review comments, as they are auto-generated code that would be overwritten on regeneration.

Applied to files:

  • taskfiles/lint.yaml
📚 Learning: in 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floa...
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.

Applied to files:

  • taskfiles/lint.yaml
  • components/core/CMakeLists.txt
  • components/core/tests/test-GrepCore.cpp
📚 Learning: double parentheses in the codebase are intentional and required for clang-tidy to pass. these should...
Learnt from: Bill-hbrhbr
PR: y-scope/clp#0
File: :0-0
Timestamp: 2025-03-18T07:27:54.738Z
Learning: Double parentheses in the codebase are intentional and required for clang-tidy to pass. These should not be flagged as style issues in code reviews.

Applied to files:

  • taskfiles/lint.yaml
📚 Learning: double parentheses in require statements (e.g., `require((expression))`) and in conditional expressi...
Learnt from: Bill-hbrhbr
PR: y-scope/clp#0
File: :0-0
Timestamp: 2025-03-18T07:27:54.738Z
Learning: Double parentheses in REQUIRE statements (e.g., `REQUIRE((expression))`) and in conditional expressions are intentional and required for clang-tidy to pass. These should not be flagged as style issues in code reviews.

Applied to files:

  • taskfiles/lint.yaml
  • components/core/tests/test-GrepCore.cpp
📚 Learning: in clp installation scripts, consistency across platform scripts is prioritized over defensive progr...
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in `install-prebuilt-packages.sh`, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like `--strip-components=1` to handle potential tarball layout changes.

Applied to files:

  • taskfiles/lint.yaml
  • components/core/CMakeLists.txt
📚 Learning: in the clp search component, the `evaluate_wildcard_filter` function should return `astevaluationres...
Learnt from: LinZhihao-723
PR: y-scope/clp#882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:378-402
Timestamp: 2025-05-07T16:56:35.687Z
Learning: In the CLP search component, the `evaluate_wildcard_filter` function should return `AstEvaluationResult::Pruned` when `node_id_value_pairs` is empty, not `AstEvaluationResult::False`. Empty node sets should be treated as "undetermined" rather than definitive non-matches.

Applied to files:

  • taskfiles/lint.yaml
📚 Learning: in the clp codebase, standard headers like `` and `` in alphabetical order (as seen...
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/src/glt/SQLitePreparedStatement.hpp:4-4
Timestamp: 2025-07-01T14:49:07.333Z
Learning: In the CLP codebase, standard headers like `<cstdint>` and `<string>` in alphabetical order (as seen in components/core/src/glt/SQLitePreparedStatement.hpp) are correctly ordered and do not need adjustment.

Applied to files:

  • taskfiles/lint.yaml
  • components/core/CMakeLists.txt
📚 Learning: in `components/core/src/clp/bufferedfilereader.cpp`, refactoring the nested error handling condition...
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.

Applied to files:

  • taskfiles/lint.yaml
  • components/core/CMakeLists.txt
  • components/core/tests/test-GrepCore.cpp
📚 Learning: in `components/core/src/clp_s/jsonparser.cpp`, when handling errors in `parse_from_ir`, prefer to ma...
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:769-779
Timestamp: 2024-10-07T21:16:41.660Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, when handling errors in `parse_from_ir`, prefer to maintain the current mix of try-catch and if-statements because specific messages are returned back up in some cases.

Applied to files:

  • taskfiles/lint.yaml
  • components/core/tests/test-GrepCore.cpp
📚 Learning: in `clp-s.cpp`, the `run_serializer` function interleaves serialization and writing of ir files, mak...
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/clp-s.cpp:196-265
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `clp-s.cpp`, the `run_serializer` function interleaves serialization and writing of IR files, making it difficult to restructure it into separate functions.

Applied to files:

  • taskfiles/lint.yaml
  • components/core/CMakeLists.txt
📚 Learning: in the clp project, c++20 is used, allowing the utilization of c++20 features such as class template...
Learnt from: LinZhihao-723
PR: y-scope/clp#486
File: components/core/tests/test-error_handling.cpp:37-38
Timestamp: 2024-11-26T19:12:43.982Z
Learning: In the CLP project, C++20 is used, allowing the utilization of C++20 features such as class template argument deduction (CTAD) with `std::array`.

Applied to files:

  • taskfiles/lint.yaml
📚 Learning: in the test code (`components/core/tests/test-ir_encoding_methods.cpp`), exception handling for `msg...
Learnt from: LinZhihao-723
PR: y-scope/clp#570
File: components/core/tests/test-ir_encoding_methods.cpp:376-399
Timestamp: 2024-11-01T03:26:26.386Z
Learning: In the test code (`components/core/tests/test-ir_encoding_methods.cpp`), exception handling for `msgpack::unpack` can be omitted because the Catch2 testing framework captures exceptions if they occur.

Applied to files:

  • taskfiles/lint.yaml
  • components/core/CMakeLists.txt
  • components/core/tests/test-GrepCore.cpp
📚 Learning: in the file `components/core/tests/test-ffi_keyvaluepairlogevent.cpp`, including `
Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:14-14
Timestamp: 2024-10-14T03:42:10.355Z
Learning: In the file `components/core/tests/test-ffi_KeyValuePairLogEvent.cpp`, including `<json/single_include/nlohmann/json.hpp>` is consistent with the project's coding standards.

Applied to files:

  • taskfiles/lint.yaml
  • components/core/CMakeLists.txt
📚 Learning: in the unit test case `ffi_ir_stream_serialize_schema_tree_node_id` in `test-ir_encoding_methods.cpp...
Learnt from: LinZhihao-723
PR: y-scope/clp#557
File: components/core/tests/test-ir_encoding_methods.cpp:1216-1286
Timestamp: 2024-10-13T09:27:43.408Z
Learning: In the unit test case `ffi_ir_stream_serialize_schema_tree_node_id` in `test-ir_encoding_methods.cpp`, suppressing the `readability-function-cognitive-complexity` warning is acceptable due to the expansion of Catch2 macros in C++ tests, and such test cases may not have readability issues.

Applied to files:

  • taskfiles/lint.yaml
  • components/core/CMakeLists.txt
  • components/core/tests/test-GrepCore.cpp
📚 Learning: the team plans to systematically improve include paths by adding `${project_source_dir}/src` to cmak...
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/test/utils.hpp:15-18
Timestamp: 2025-05-02T22:11:55.711Z
Learning: The team plans to systematically improve include paths by adding `${PROJECT_SOURCE_DIR}/src` to CMake targets and using direct include paths instead of deep relative ones in a future PR.

Applied to files:

  • components/core/CMakeLists.txt
📚 Learning: when reviewing wrapper scripts in `components/clp-package-utils/clp_package_utils/scripts/`, note th...
Learnt from: haiqi96
PR: y-scope/clp#594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in `components/clp-package-utils/clp_package_utils/scripts/`, note that it's preferred to keep error handling simple without adding extra complexity.

Applied to files:

  • components/core/CMakeLists.txt
📚 Learning: in clp installation scripts within `components/core/tools/scripts/lib_install/`, maintain consistenc...
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within `components/core/tools/scripts/lib_install/`, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like `readonly` declarations.

Applied to files:

  • components/core/CMakeLists.txt
📚 Learning: the `queryhandlerimpl` class in clp/ffi/ir_stream/search is not designed to be thread-safe and is as...
Learnt from: LinZhihao-723
PR: y-scope/clp#882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:428-450
Timestamp: 2025-05-07T16:56:53.901Z
Learning: The `QueryHandlerImpl` class in clp/ffi/ir_stream/search is not designed to be thread-safe and is assumed to not be used across multiple threads.

Applied to files:

  • components/core/CMakeLists.txt
📚 Learning: in `components/core/tests/test-ffi_keyvaluepairlogevent.cpp`, helper functions like `assert_kv_pair_...
Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:228-236
Timestamp: 2024-10-14T03:43:40.364Z
Learning: In `components/core/tests/test-ffi_KeyValuePairLogEvent.cpp`, helper functions like `assert_kv_pair_log_event_creation_failure` return booleans to the caller, and the caller asserts using `REQUIRE` to provide clearer failure information.

Applied to files:

  • components/core/CMakeLists.txt
📚 Learning: the function `assert_kv_pair_log_event_creation_failure` is correctly placed within the anonymous na...
Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:85-103
Timestamp: 2024-10-14T03:42:53.145Z
Learning: The function `assert_kv_pair_log_event_creation_failure` is correctly placed within the anonymous namespace in `test-ffi_KeyValuePairLogEvent.cpp`.

Applied to files:

  • components/core/CMakeLists.txt
📚 Learning: in the `network_reader_with_valid_http_header_kv_pairs` test case in `components/core/tests/test-net...
Learnt from: LinZhihao-723
PR: y-scope/clp#593
File: components/core/tests/test-NetworkReader.cpp:216-219
Timestamp: 2024-11-15T03:15:45.919Z
Learning: In the `network_reader_with_valid_http_header_kv_pairs` test case in `components/core/tests/test-NetworkReader.cpp`, additional error handling for JSON parsing failures is not necessary, as the current error message is considered sufficient.

Applied to files:

  • components/core/CMakeLists.txt
📚 Learning: in `components/core/tests/test-clp_s-end_to_end.cpp`, the success of `constructor.store()` is verifi...
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-clp_s-end_to_end.cpp:109-110
Timestamp: 2024-11-29T22:50:17.206Z
Learning: In `components/core/tests/test-clp_s-end_to_end.cpp`, the success of `constructor.store()` is verified through `REQUIRE` statements and subsequent comparisons.

Applied to files:

  • components/core/tests/test-GrepCore.cpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: lint-check (macos-15)
🔇 Additional comments (2)
components/core/tests/test-GrepCore.cpp (1)

90-92: Verify the overload without a lexer argument

All preceding calls supply the lexer parameter, but this one omits it:

GrepCore::get_bounds_of_next_potential_var(str, begin_pos, end_pos, is_var)

Confirm that GrepCore really provides an overload with the signature (std::string const&, size_t&, size_t&, bool&). If not, the test won’t compile.

taskfiles/lint.yaml (1)

735-735: Test file exclusion is okay

Excluding the Catch2 test file from clang-tidy is consistent with the rest of the configuration.

haiqi96
haiqi96 previously approved these changes Aug 1, 2025
Copy link
Contributor

@haiqi96 haiqi96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did my best effort review and the change looks good as most of them copying and paste with adding template parameters.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (4)
components/core/src/clp/Query.hpp (3)

50-50: Consider renaming member variables for clarity.

Based on the previous review comment, since we don't store pointers to entries anymore, consider renaming m_var_dict_id to m_var_dict_id (which is already done) and ensure consistency throughout the codebase.

Also applies to: 63-63


146-147: Improve template parameter documentation.

The documentation should be more descriptive about the requirements for the template parameter.

-     * @tparam EncodedVariableContainerType A random access list of `clp::encoded_variable_t`.
+     * @tparam EncodedVariableContainerType A random access container of `clp::encoded_variable_t`.

126-126: Consider renaming for consistency.

As suggested in the previous review, rename m_possible_logtype_ids to m_possible_logtypes to maintain consistency.

components/core/src/clp/GrepCore.hpp (1)

29-30: Use consistent documentation style.

Change "NOTE:" to maintain consistency with other documentation markers like "TODO:".

-     * NOTE: Callers are responsible for ensuring that the search string is "clean", where clean is
+     * NOTE: Callers are responsible for ensuring that the search string is "clean", where clean is
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7f73b23 and 5a04728.

📒 Files selected for processing (2)
  • components/core/src/clp/GrepCore.hpp (1 hunks)
  • components/core/src/clp/Query.hpp (7 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit Configuration File

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp/GrepCore.hpp
  • components/core/src/clp/Query.hpp
🧠 Learnings (13)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: gibber9809
PR: y-scope/clp#955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/clp/FileCompressor.hpp:58-78
Timestamp: 2024-10-24T14:25:17.978Z
Learning: When reviewing legacy code refactors, avoid suggesting changes that would extend the scope of the PR.
Learnt from: gibber9809
PR: y-scope/clp#504
File: components/core/src/clp_s/search/kql/CMakeLists.txt:29-29
Timestamp: 2024-10-22T15:36:04.655Z
Learning: When reviewing pull requests, focus on the changes within the PR and avoid commenting on issues outside the scope of the PR.
📚 Learning: for the `queryhandlerimpl` class, reusing member variables like `m_ast_dfs_stack` is preferred over ...
Learnt from: LinZhihao-723
PR: y-scope/clp#882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.hpp:164-166
Timestamp: 2025-05-11T22:22:49.286Z
Learning: For the `QueryHandlerImpl` class, reusing member variables like `m_ast_dfs_stack` is preferred over making methods `const` and moving state to function scope to avoid allocation/deallocation overhead, especially for frequently called methods like `evaluate_kv_pair_log_event`.

Applied to files:

  • components/core/src/clp/GrepCore.hpp
  • components/core/src/clp/Query.hpp
📚 Learning: in `components/core/src/clp_s/jsonparser.cpp`, within the `parse_from_ir` method, encountering error...
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:735-794
Timestamp: 2024-10-07T21:35:04.362Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir` method, encountering errors from `kv_log_event_result.error()` aside from `std::errc::no_message_available` and `std::errc::result_out_of_range` is anticipated behavior and does not require additional error handling or logging.

Applied to files:

  • components/core/src/clp/GrepCore.hpp
📚 Learning: the `queryhandlerimpl` class in clp/ffi/ir_stream/search is not designed to be thread-safe and is as...
Learnt from: LinZhihao-723
PR: y-scope/clp#882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:428-450
Timestamp: 2025-05-07T16:56:53.901Z
Learning: The `QueryHandlerImpl` class in clp/ffi/ir_stream/search is not designed to be thread-safe and is assumed to not be used across multiple threads.

Applied to files:

  • components/core/src/clp/GrepCore.hpp
  • components/core/src/clp/Query.hpp
📚 Learning: the variable `ir_node_to_archive_node_unordered_map` is intentionally named to reflect its type chan...
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:581-627
Timestamp: 2024-10-07T21:38:35.979Z
Learning: The variable `ir_node_to_archive_node_unordered_map` is intentionally named to reflect its type change, even if the actual variable type has been altered.

Applied to files:

  • components/core/src/clp/Query.hpp
📚 Learning: in `components/core/src/clp/ffi/ir_stream/utils.hpp`, the function `size_dependent_encode_and_serial...
Learnt from: LinZhihao-723
PR: y-scope/clp#557
File: components/core/src/clp/ffi/ir_stream/utils.hpp:0-0
Timestamp: 2024-10-18T02:31:18.595Z
Learning: In `components/core/src/clp/ffi/ir_stream/utils.hpp`, the function `size_dependent_encode_and_serialize_schema_tree_node_id` assumes that the caller checks that `node_id` fits within the range of `encoded_node_id_t` before casting.

Applied to files:

  • components/core/src/clp/Query.hpp
📚 Learning: in `components/core/tests/test-clp_s-end_to_end.cpp`, the success of `constructor.store()` is verifi...
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-clp_s-end_to_end.cpp:109-110
Timestamp: 2024-11-29T22:50:17.206Z
Learning: In `components/core/tests/test-clp_s-end_to_end.cpp`, the success of `constructor.store()` is verified through `REQUIRE` statements and subsequent comparisons.

Applied to files:

  • components/core/src/clp/Query.hpp
📚 Learning: in the unit test case `ffi_ir_stream_serialize_schema_tree_node_id` in `test-ir_encoding_methods.cpp...
Learnt from: LinZhihao-723
PR: y-scope/clp#557
File: components/core/tests/test-ir_encoding_methods.cpp:1216-1286
Timestamp: 2024-10-13T09:27:43.408Z
Learning: In the unit test case `ffi_ir_stream_serialize_schema_tree_node_id` in `test-ir_encoding_methods.cpp`, suppressing the `readability-function-cognitive-complexity` warning is acceptable due to the expansion of Catch2 macros in C++ tests, and such test cases may not have readability issues.

Applied to files:

  • components/core/src/clp/Query.hpp
📚 Learning: double parentheses in require statements (e.g., `require((expression))`) and in conditional expressi...
Learnt from: Bill-hbrhbr
PR: y-scope/clp#0
File: :0-0
Timestamp: 2025-03-18T07:27:54.738Z
Learning: Double parentheses in REQUIRE statements (e.g., `REQUIRE((expression))`) and in conditional expressions are intentional and required for clang-tidy to pass. These should not be flagged as style issues in code reviews.

Applied to files:

  • components/core/src/clp/Query.hpp
📚 Learning: in the `queryhandlerimpl.cpp` file, the `unique_projected_columns` set (using `std::string_view`) is...
Learnt from: LinZhihao-723
PR: y-scope/clp#873
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:148-157
Timestamp: 2025-05-02T22:27:59.347Z
Learning: In the `QueryHandlerImpl.cpp` file, the `unique_projected_columns` set (using `std::string_view`) is intentionally designed to only check for duplications within the local scope of the `create_projected_columns_and_projection_map` function. The team decided this is an acceptable use of `std::string_view` in a container since the referenced strings remain valid throughout the function's execution.

Applied to files:

  • components/core/src/clp/Query.hpp
📚 Learning: in `components/core/src/clp_s/schematree.hpp`, the `schemanode` class uses `std::unique_ptr ...
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:91-94
Timestamp: 2024-11-12T18:47:03.828Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, the `SchemaNode` class uses `std::unique_ptr<char[]> m_key_buf` and `std::string_view m_key_name` to ensure that references to `m_key_name` remain valid even after `SchemaNode` is move-constructed.

Applied to files:

  • components/core/src/clp/Query.hpp
📚 Learning: in `components/core/src/clp_s/schematree.hpp`, it's acceptable to use `std::string_view` as keys in ...
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:171-171
Timestamp: 2024-11-12T18:46:20.933Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, it's acceptable to use `std::string_view` as keys in `m_node_map` because `SchemaNode`'s `m_key_name` remains valid even after move operations or reallocations, preventing dangling references.

Applied to files:

  • components/core/src/clp/Query.hpp
📚 Learning: in the clp codebase, standard headers like `` and `` in alphabetical order (as seen...
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/src/glt/SQLitePreparedStatement.hpp:4-4
Timestamp: 2025-07-01T14:49:07.333Z
Learning: In the CLP codebase, standard headers like `<cstdint>` and `<string>` in alphabetical order (as seen in components/core/src/glt/SQLitePreparedStatement.hpp) are correctly ordered and do not need adjustment.

Applied to files:

  • components/core/src/clp/Query.hpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: build-macos (macos-15, true)
🔇 Additional comments (7)
components/core/src/clp/Query.hpp (4)

4-4: LGTM!

The addition of <functional> header is necessary for the new std::function parameters in the refactored methods.


40-44: LGTM! Well-designed interface change.

The use of std::function to retrieve segments containing dictionary IDs is a clean abstraction that decouples the Query logic from the dictionary implementation details.


213-225: LGTM! Clean separation of concerns.

The new calculate_ids_of_matching_segments method properly separates the segment calculation from query processing, allowing callers to opt-in to this computation as needed.


240-264: LGTM! Correct subsequence matching implementation.

The templated matches_vars method correctly implements subsequence matching, allowing variables to appear in order but not necessarily contiguously. The algorithm is efficient with O(n) complexity.

components/core/src/clp/GrepCore.hpp (3)

1-21: LGTM! Well-organized header structure.

The includes are properly organized and all necessary dependencies are included for the GrepCore functionality.


342-356: LGTM! Thorough escape handling implementation.

The escape handler lambda correctly handles the complex cases of escaping wildcards and variable placeholders. The logic properly avoids double-escaping already escaped wildcards.


408-418: LGTM! Efficient logtype ID collection.

The code properly converts logtype entries to their IDs for storage in the sub-query. This aligns well with the overall refactoring goal of using IDs instead of pointers.

Comment on lines +168 to +173
std::replace(
search_string_for_sub_queries.begin(),
search_string_for_sub_queries.end(),
'?',
'*'
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Consider edge case handling for wildcard replacement.

The replacement of '?' with '*' wildcards and subsequent cleanup could potentially create unexpected patterns. For example, "a?b" becomes "a**b" then "ab", which changes the semantics slightly. Consider documenting this behaviour or handling it more explicitly.

🤖 Prompt for AI Agents
In components/core/src/clp/GrepCore.hpp around lines 168 to 173, the code
replaces '?' with '*' in search strings, which can lead to unintended patterns
like consecutive '*' characters altering the original semantics. To fix this,
explicitly handle or document the behavior of such replacements, for example by
collapsing multiple '*' into a single '*' after replacement or by adding
comments explaining this transformation and its impact on pattern matching.

Copy link
Member

@kirkrodrigues kirkrodrigues left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the body of the PR going to be? (Usually I put that as the first lines of the PR body (before the description).

@gibber9809
Copy link
Contributor Author

What's the body of the PR going to be? (Usually I put that as the first lines of the PR body (before the description).

I updated the PR description with what I'd put as the PR body @kirkrodrigues.

@kirkrodrigues
Copy link
Member

Edited the PR body directly (you can check the edit history for details).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants