Conversation

@SharafMohamed
Contributor

@SharafMohamed SharafMohamed commented Nov 19, 2025

Reference

Addresses #180.

Description

The maps that go from rule -> captures -> tags -> registers previously relied on capture names as unique keys. To support non-unique capture names, a different key is needed:

  • We were already using a pointer to the capture object (built during AST construction) to refer to the capture everywhere (in the AST/NFA).
  • We now use the pointer value as the unique key in all relevant maps.
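
As a minimal sketch of the pointer-as-key idea, assuming a simplified `Capture` and a hypothetical `CaptureRegistry` (neither is the real log-surgeon class; only the map name `m_capture_to_tag_id_pair` and the accessor name come from this PR), two captures with the same name still get distinct entries because they live at distinct addresses:

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using tag_id_t = std::uint32_t;

// Hypothetical stand-in for finite_automata::Capture; only the name is modelled.
class Capture {
public:
    explicit Capture(std::string name) : m_name{std::move(name)} {}

    [[nodiscard]] auto get_name() const -> std::string const& { return m_name; }

private:
    std::string m_name;
};

// Illustrative registry (not the real Lexer) that keys tag pairs by capture address.
class CaptureRegistry {
public:
    // Heap-allocates the capture so its address stays stable while the registry lives.
    auto add_capture(std::string name) -> Capture const* {
        m_owned.push_back(std::make_unique<Capture>(std::move(name)));
        Capture const* capture{m_owned.back().get()};
        auto const start_tag{m_next_tag_id++};
        auto const end_tag{m_next_tag_id++};
        [[maybe_unused]] auto const [it, inserted]
                = m_capture_to_tag_id_pair.try_emplace(capture, start_tag, end_tag);
        assert(inserted);  // Two live captures never share an address.
        return capture;
    }

    [[nodiscard]] auto get_tag_id_pair_from_capture(Capture const* capture) const
            -> std::optional<std::pair<tag_id_t, tag_id_t>> {
        auto const it{m_capture_to_tag_id_pair.find(capture)};
        if (m_capture_to_tag_id_pair.end() == it) {
            return std::nullopt;
        }
        return it->second;
    }

private:
    std::vector<std::unique_ptr<Capture>> m_owned;
    std::unordered_map<Capture const*, std::pair<tag_id_t, tag_id_t>> m_capture_to_tag_id_pair;
    tag_id_t m_next_tag_id{0};
};
```

The sketch owns the captures itself only to keep the example self-contained; in the PR the capture objects are built during AST construction, and the maps merely refer to them.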

We also need to ensure a canonical interpretation of the regex pattern, specifically with respect to the order of captures. Without a canonical interpretation, there is no way to search a parsed log. Previously, this wasn't a concern because the names were unique identifiers. Since pointer values are not canonical, we must instead rely on the ordering in the m_rule_id_to_capture map. Specifically, we want this order to match the order in which the capture groups appear in the literal regex pattern (left to right):

  • The order in m_rule_id_to_capture depends on the AST matching the literal regex pattern.
    • This means the AST must be built leftmost, bottom up, without any optimizations that break this ordering.
    • This also means the AST must not be flattened, optimized, or altered in any way prior to NFA construction.
  • The order in m_rule_id_to_capture also depends on the NFA's traversal of the AST: it must be depth-first and leftmost, so that it visits the capture groups in the same order they appear in the literal regex pattern.
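
The ordering requirement above can be sketched with a toy tree. `AstNode` and `collect_captures` are hypothetical names, and the sketch assumes that a nested capture is ordered after its enclosing capture (that is, by opening parenthesis); the PR text does not spell out the nested case:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified AST node; real regex AST nodes carry far more state.
struct AstNode {
    explicit AstNode(std::string node_label, bool capture = false)
            : label{std::move(node_label)},
              is_capture{capture} {}

    // Appends a child on the right, mirroring left-to-right construction from the pattern.
    auto add_child(std::string child_label, bool capture = false) -> AstNode& {
        children.push_back(std::make_unique<AstNode>(std::move(child_label), capture));
        return *children.back();
    }

    std::string label;
    bool is_capture;
    std::vector<std::unique_ptr<AstNode>> children;
};

// Depth-first, leftmost (pre-order) walk: a capture node is recorded before its children,
// and children are visited left to right, so the output follows the pattern text.
auto collect_captures(AstNode const& node, std::vector<AstNode const*>& out) -> void {
    if (node.is_capture) {
        out.push_back(&node);
    }
    for (auto const& child : node.children) {
        assert(nullptr != child);
        collect_captures(*child, out);
    }
}
```

For a pattern shaped like `(?<a>x(?<b>y))(?<a>z)`, this walk yields the captures in the order a, b, a. The two `a` captures are distinguishable only by address, which is why the traversal order, not the pointer value, has to be the canonical ordering.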

This PR also includes some small fixes to unit tests that were not caught in the previous PR after a merge:

  • Special characters are no longer escaped in the delimiter string (i.e., \[ now causes an error; [ is correct).

Validation Performed

Added a new unit test with non-unique capture names.

Summary by CodeRabbit

  • Refactor

    • Overhauled capture handling to return capture objects (not numeric IDs) and store captures per rule, simplifying retrieval and iteration.
    • Capture name accessor now returns a const reference to the internal string.
  • Breaking API Change

    • Removed the numeric capture ID alias; related public method signatures were updated to use capture objects.
  • Tests

    • Updated and added tests to validate the new capture semantics and non‑unique capture name handling.


@SharafMohamed SharafMohamed requested a review from a team as a code owner November 19, 2025 09:42
@coderabbitai

coderabbitai bot commented Nov 19, 2025

Walkthrough

Replaces ID-based capture APIs with pointer-based Capture const* APIs across Lexer and tests, removes the public capture_id_t alias, updates internal mappings to use Capture pointers, changes Capture::get_name() return type, and adapts tests to the new capture-pointer semantics.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Core alias removal (src/log_surgeon/types.hpp) | Removed public alias capture_id_t. |
| Lexer API & internals (src/log_surgeon/Lexer.hpp, src/log_surgeon/Lexer.tpp) | Replaced ID-based APIs with pointer-based ones: get_capture_ids_from_rule_id() → get_captures_from_rule_id() returning std::optional<std::vector<finite_automata::Capture const*>>; get_tag_id_pair_from_capture_id() → get_tag_id_pair_from_capture(Capture const*); get_reg_ids_from_capture_id() → get_reg_ids_from_capture(Capture const*). Internal maps renamed to use Capture const* keys (m_rule_id_to_capture, m_capture_to_tag_id_pair); added Capture.hpp include; removed global capture-id symbol bookkeeping in favour of per-rule capture aggregation. |
| Capture type change (src/log_surgeon/finite_automata/Capture.hpp) | get_name() now returns std::string const& instead of std::string_view. Include and API adjusted accordingly. |
| LogEvent usage update (src/log_surgeon/LogEvent.cpp) | Switched from capture IDs to Capture pointers: uses get_captures_from_rule_id() and get_reg_ids_from_capture(), and reads capture names via capture->get_name(). |
| Tests: buffer & reader parser (tests/test-buffer-parser.cpp, tests/test-reader-parser.cpp) | Replaced capture_id_t usages and calls to get_capture_ids_from_rule_id()/get_reg_ids_from_capture_id() with pointer-based APIs; updated test data structures (map→vector for captures in buffer tests) and iteration to use Capture pointers; added/adjusted tests for non-unique capture names. |
| Tests: schema (tests/test-schema.cpp) | Added a non_unique_capture_names test and minor std::vector include/using changes. |
| Minor header removal (src/log_surgeon/finite_automata/Nfa.hpp) | Removed an unused #include <cstddef>; no API changes. |

Sequence Diagram(s)

sequenceDiagram
    participant Caller as Consumer (code/tests)
    participant Lexer
    participant Capture as Capture*
    participant TagMap as m_capture_to_tag_id_pair
    participant RegResolver as reg-id resolution

    Caller->>Lexer: get_captures_from_rule_id(rule_id)
    alt captures exist
        Lexer-->>Caller: optional<vector<Capture const*>>
        loop per Capture*
            Caller->>Lexer: get_tag_id_pair_from_capture(capture*)
            Lexer->>TagMap: lookup(capture*) 
            TagMap-->>Lexer: optional<pair<tag_id,tag_id>>
            Lexer-->>Caller: optional<pair<tag_id,tag_id>>

            Caller->>Lexer: get_reg_ids_from_capture(capture*)
            Lexer->>Lexer: get_tag_id_pair_from_capture(capture*)
            Lexer->>RegResolver: get_reg_id_from_tag_id(tag_id)
            RegResolver-->>Lexer: reg_id(s)
            Lexer-->>Caller: optional<pair<reg_id,reg_id>>

            Caller->>Capture: get_name()
            Capture-->>Caller: string const&
        end
    else no captures
        Lexer-->>Caller: std::nullopt
    end
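
The call flow in the diagram can be approximated from the caller's side as follows. This is only a sketch: `MockLexer` is a hypothetical stand-in that reuses the method names from the walkthrough, and the tag-to-register mapping (register id = tag id + 100) is an arbitrary assumption made so the example has something to look up.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using rule_id_t = std::uint32_t;
using tag_id_t = std::uint32_t;
using reg_id_t = std::uint32_t;

// Hypothetical stand-in for finite_automata::Capture.
struct Capture {
    std::string name;
    [[nodiscard]] auto get_name() const -> std::string const& { return name; }
};

// Hypothetical lexer exposing the pointer-based accessors named in the walkthrough.
class MockLexer {
public:
    auto add_capture(rule_id_t rule_id, Capture const* capture,
                     std::pair<tag_id_t, tag_id_t> tags) -> void {
        assert(nullptr != capture);
        m_rule_id_to_capture[rule_id].push_back(capture);
        m_capture_to_tag_id_pair.emplace(capture, tags);
        // Assumption for this sketch only: register id = tag id + 100.
        m_tag_to_reg_id.emplace(tags.first, tags.first + 100);
        m_tag_to_reg_id.emplace(tags.second, tags.second + 100);
    }

    [[nodiscard]] auto get_captures_from_rule_id(rule_id_t rule_id) const
            -> std::optional<std::vector<Capture const*>> {
        auto const it{m_rule_id_to_capture.find(rule_id)};
        if (m_rule_id_to_capture.end() == it) {
            return std::nullopt;
        }
        return it->second;
    }

    [[nodiscard]] auto get_tag_id_pair_from_capture(Capture const* capture) const
            -> std::optional<std::pair<tag_id_t, tag_id_t>> {
        auto const it{m_capture_to_tag_id_pair.find(capture)};
        if (m_capture_to_tag_id_pair.end() == it) {
            return std::nullopt;
        }
        return it->second;
    }

    [[nodiscard]] auto get_reg_ids_from_capture(Capture const* capture) const
            -> std::optional<std::pair<reg_id_t, reg_id_t>> {
        auto const tags{get_tag_id_pair_from_capture(capture)};
        if (false == tags.has_value()) {
            return std::nullopt;
        }
        auto const start{m_tag_to_reg_id.find(tags->first)};
        auto const end{m_tag_to_reg_id.find(tags->second)};
        if (m_tag_to_reg_id.end() == start || m_tag_to_reg_id.end() == end) {
            return std::nullopt;
        }
        return std::make_pair(start->second, end->second);
    }

private:
    std::unordered_map<rule_id_t, std::vector<Capture const*>> m_rule_id_to_capture;
    std::unordered_map<Capture const*, std::pair<tag_id_t, tag_id_t>> m_capture_to_tag_id_pair;
    std::unordered_map<tag_id_t, reg_id_t> m_tag_to_reg_id;
};

// Caller-side loop: one "name:start_reg-end_reg" entry per capture, in stored order.
auto describe_captures(MockLexer const& lexer, rule_id_t rule_id) -> std::vector<std::string> {
    std::vector<std::string> result;
    auto const captures{lexer.get_captures_from_rule_id(rule_id)};
    if (false == captures.has_value()) {
        return result;
    }
    for (auto const* capture : *captures) {
        auto const reg_ids{lexer.get_reg_ids_from_capture(capture)};
        if (false == reg_ids.has_value()) {
            continue;
        }
        result.push_back(capture->get_name() + ":" + std::to_string(reg_ids->first) + "-"
                         + std::to_string(reg_ids->second));
    }
    return result;
}
```

Because the per-rule container is a vector, two captures with the same name in one rule produce two entries, in insertion order.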

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Review pointer lifetimes and ownership for stored Capture const* in m_rule_id_to_capture and m_capture_to_tag_id_pair.
  • Verify Capture::get_name() (returns std::string const&) cannot dangle; ensure backing string outlives uses.
  • Check tests that moved from map→vector preserve semantics (ordering and duplicate-name behavior).
  • Confirm removal of capture_id_t doesn't leave stray usages or mismatched declarations.

Possibly related PRs

Suggested reviewers

  • LinZhihao-723

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title clearly summarizes the main change: adding support for non-unique capture names and references the related issue (#180). |

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6ec9356 and ec32233.

📒 Files selected for processing (6)
  • src/log_surgeon/Lexer.tpp (1 hunks)
  • src/log_surgeon/LogEvent.cpp (2 hunks)
  • src/log_surgeon/finite_automata/Capture.hpp (1 hunks)
  • src/log_surgeon/finite_automata/Nfa.hpp (0 hunks)
  • tests/test-buffer-parser.cpp (5 hunks)
  • tests/test-schema.cpp (4 hunks)
💤 Files with no reviewable changes (1)
  • src/log_surgeon/finite_automata/Nfa.hpp
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,h,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • src/log_surgeon/Lexer.tpp
  • src/log_surgeon/finite_automata/Capture.hpp
  • tests/test-schema.cpp
  • src/log_surgeon/LogEvent.cpp
  • tests/test-buffer-parser.cpp
🧠 Learnings (29)
📓 Common learnings
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 152
File: src/log_surgeon/wildcard_query_parser/Query.cpp:159-177
Timestamp: 2025-08-25T20:44:48.955Z
Learning: In the log-surgeon codebase, thread safety issues with global state like NonTerminal::m_next_children_start should be addressed comprehensively in dedicated PRs rather than fixed piecemeal in individual feature PRs. The user SharafMohamed prefers to defer such systemic architectural issues to separate PRs.
Learnt from: davidlion
Repo: y-scope/log-surgeon PR: 165
File: src/log_surgeon/LogEvent.cpp:53-55
Timestamp: 2025-10-22T15:40:29.992Z
Learning: In `src/log_surgeon/LogEvent.cpp`, the `get_logtype()` method has two independent mechanisms for adding `<timestamp>` to the logtype string: (1) A prefix added when `has_timestamp()` returns true, indicating a standalone timestamp token from timestamp rules (lines 53-55), and (2) Named capture group processing that adds `<timestamp>` tags for `(?<timestamp>...)` patterns in any variable rule (lines 61-95). These mechanisms do not interfere with each other. Capture groups named `timestamp` do not affect `has_timestamp()` status, as using captures as actual timestamps is not supported.
📚 Learning: 2024-11-13T20:02:13.737Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.

Applied to files:

  • src/log_surgeon/Lexer.tpp
  • src/log_surgeon/finite_automata/Capture.hpp
  • tests/test-schema.cpp
  • src/log_surgeon/LogEvent.cpp
  • tests/test-buffer-parser.cpp
📚 Learning: 2024-11-18T16:45:46.073Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 50
File: src/log_surgeon/finite_automata/Tag.hpp:0-0
Timestamp: 2024-11-18T16:45:46.073Z
Learning: The class `TagPositions` was removed from `src/log_surgeon/finite_automata/Tag.hpp` as it is no longer needed.

Applied to files:

  • src/log_surgeon/Lexer.tpp
  • src/log_surgeon/finite_automata/Capture.hpp
  • tests/test-buffer-parser.cpp
📚 Learning: 2025-10-22T15:40:29.992Z
Learnt from: davidlion
Repo: y-scope/log-surgeon PR: 165
File: src/log_surgeon/LogEvent.cpp:53-55
Timestamp: 2025-10-22T15:40:29.992Z
Learning: In `src/log_surgeon/LogEvent.cpp`, the `get_logtype()` method has two independent mechanisms for adding `<timestamp>` to the logtype string: (1) A prefix added when `has_timestamp()` returns true, indicating a standalone timestamp token from timestamp rules (lines 53-55), and (2) Named capture group processing that adds `<timestamp>` tags for `(?<timestamp>...)` patterns in any variable rule (lines 61-95). These mechanisms do not interfere with each other. Capture groups named `timestamp` do not affect `has_timestamp()` status, as using captures as actual timestamps is not supported.

Applied to files:

  • src/log_surgeon/Lexer.tpp
  • src/log_surgeon/LogEvent.cpp
  • tests/test-buffer-parser.cpp
📚 Learning: 2025-09-03T16:45:58.451Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 152
File: src/log_surgeon/wildcard_query_parser/Query.hpp:119-132
Timestamp: 2025-09-03T16:45:58.451Z
Learning: In log-surgeon's wildcard_query_parser, the DFA intersection process used in Query::get_matching_variable_types loses track of variable type priorities from the original ByteLexer. Priority information cannot be preserved by simply changing the return type from std::set to std::vector - it would require post-processing after the DFA intersection.

Applied to files:

  • src/log_surgeon/Lexer.tpp
  • tests/test-schema.cpp
  • tests/test-buffer-parser.cpp
📚 Learning: 2024-10-24T15:54:19.228Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 42
File: src/log_surgeon/finite_automata/RegexNFA.hpp:99-105
Timestamp: 2024-10-24T15:54:19.228Z
Learning: In `src/log_surgeon/finite_automata/RegexNFA.hpp`, it's acceptable to have constructors without the `explicit` specifier. Do not suggest adding `explicit` to constructors in this file.

Applied to files:

  • src/log_surgeon/Lexer.tpp
  • tests/test-schema.cpp
  • src/log_surgeon/LogEvent.cpp
📚 Learning: 2024-10-24T15:54:35.193Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 42
File: src/log_surgeon/finite_automata/RegexNFA.hpp:442-456
Timestamp: 2024-10-24T15:54:35.193Z
Learning: In the C++ file `src/log_surgeon/finite_automata/RegexNFA.hpp`, for the `RegexNFA::serialize()` function, prioritize code clarity over efficiency when handling string operations.

Applied to files:

  • src/log_surgeon/Lexer.tpp
  • src/log_surgeon/finite_automata/Capture.hpp
  • tests/test-schema.cpp
📚 Learning: 2025-08-26T10:06:22.914Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 152
File: src/log_surgeon/wildcard_query_parser/Query.hpp:9-11
Timestamp: 2025-08-26T10:06:22.914Z
Learning: In y-scope/log-surgeon project, it's acceptable to include headers like log_surgeon/Lexer.hpp directly in header files rather than using forward declarations, even when only references are used in the interface. The project prefers the simplicity of direct includes over header coupling optimization through forward declarations.

Applied to files:

  • src/log_surgeon/Lexer.tpp
📚 Learning: 2024-11-13T22:38:19.472Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 48
File: src/log_surgeon/finite_automata/RegexAST.hpp:700-700
Timestamp: 2024-11-13T22:38:19.472Z
Learning: In `RegexASTCapture`, `m_tag` must always be non-null.

Applied to files:

  • src/log_surgeon/Lexer.tpp
  • tests/test-schema.cpp
📚 Learning: 2025-08-25T20:44:48.955Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 152
File: src/log_surgeon/wildcard_query_parser/Query.cpp:159-177
Timestamp: 2025-08-25T20:44:48.955Z
Learning: In the log-surgeon codebase, thread safety issues with global state like NonTerminal::m_next_children_start should be addressed comprehensively in dedicated PRs rather than fixed piecemeal in individual feature PRs. The user SharafMohamed prefers to defer such systemic architectural issues to separate PRs.

Applied to files:

  • src/log_surgeon/Lexer.tpp
  • src/log_surgeon/LogEvent.cpp
  • tests/test-buffer-parser.cpp
📚 Learning: 2025-08-08T13:18:39.895Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 144
File: src/log_surgeon/wildcard_query_parser/QueryInterpretation.hpp:42-48
Timestamp: 2025-08-08T13:18:39.895Z
Learning: In y-scope/log-surgeon (C++), it is acceptable/preferred to keep `const` qualifiers on by-value function parameters to signal intent (e.g., in src/log_surgeon/wildcard_query_parser/QueryInterpretation.hpp). Do not suggest removing `const` from by-value parameters in future reviews.

Applied to files:

  • src/log_surgeon/Lexer.tpp
  • tests/test-schema.cpp
  • src/log_surgeon/LogEvent.cpp
  • tests/test-buffer-parser.cpp
📚 Learning: 2025-08-08T10:22:26.739Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 144
File: src/log_surgeon/wildcard_query_parser/QueryInterpretation.cpp:98-120
Timestamp: 2025-08-08T10:22:26.739Z
Learning: In C++ file src/log_surgeon/wildcard_query_parser/QueryInterpretation.cpp, for QueryInterpretation::serialize(), prefer explicit std::holds_alternative/std::get branching over std::visit for readability; do not suggest refactoring to std::visit in future reviews.

Applied to files:

  • src/log_surgeon/Lexer.tpp
  • tests/test-schema.cpp
  • src/log_surgeon/LogEvent.cpp
  • tests/test-buffer-parser.cpp
📚 Learning: 2025-08-08T10:23:06.281Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 144
File: src/log_surgeon/wildcard_query_parser/QueryInterpretation.hpp:80-88
Timestamp: 2025-08-08T10:23:06.281Z
Learning: In y-scope/log-surgeon (C++), small function definitions are allowed to remain inline in headers. For src/log_surgeon/wildcard_query_parser/QueryInterpretation.hpp, do not suggest moving small methods like QueryInterpretation::append_variable_token to the .cpp for “consistency” in future reviews.

Applied to files:

  • src/log_surgeon/Lexer.tpp
  • src/log_surgeon/LogEvent.cpp
  • tests/test-buffer-parser.cpp
📚 Learning: 2025-08-08T10:21:56.571Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 144
File: src/log_surgeon/wildcard_query_parser/QueryInterpretation.cpp:57-77
Timestamp: 2025-08-08T10:21:56.571Z
Learning: In src/log_surgeon/wildcard_query_parser/QueryInterpretation.hpp/.cpp (C++), for QueryInterpretation::append_query_interpretation, prefer a single overload taking QueryInterpretation const&; do not suggest adding an rvalue/move overload in future reviews for this method.

Applied to files:

  • src/log_surgeon/Lexer.tpp
  • src/log_surgeon/LogEvent.cpp
  • tests/test-buffer-parser.cpp
📚 Learning: 2025-08-13T12:06:36.584Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 150
File: src/log_surgeon/wildcard_query_parser/WildcardExpressionView.cpp:58-77
Timestamp: 2025-08-13T12:06:36.584Z
Learning: In src/log_surgeon/wildcard_query_parser/WildcardExpressionView.cpp, caching SchemaParser::get_special_regex_characters() with a const& reference provides no performance benefit, indicating the method likely returns a reference to an existing container rather than performing expensive computations.

Applied to files:

  • src/log_surgeon/Lexer.tpp
  • tests/test-schema.cpp
📚 Learning: 2025-08-08T10:00:20.963Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 144
File: src/log_surgeon/wildcard_query_parser/VariableQueryToken.hpp:28-31
Timestamp: 2025-08-08T10:00:20.963Z
Learning: In src/log_surgeon/wildcard_query_parser/QueryInterpretation.hpp (C++), do not default the comparison operators: the class stores std::vector<std::variant<StaticQueryToken, VariableQueryToken>>, which yields only a weak ordering. A custom operator<=> that maps the variant’s weak ordering to std::strong_ordering is required; avoid suggesting defaulting there in future reviews.

Applied to files:

  • src/log_surgeon/Lexer.tpp
  • src/log_surgeon/LogEvent.cpp
  • tests/test-buffer-parser.cpp
📚 Learning: 2024-11-27T22:25:35.608Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 56
File: src/log_surgeon/finite_automata/RegisterHandler.hpp:0-0
Timestamp: 2024-11-27T22:25:35.608Z
Learning: In the `RegisterHandler` class in `src/log_surgeon/finite_automata/RegisterHandler.hpp`, the methods `add_register` and `append_position` rely on `emplace_back` and `m_prefix_tree.insert` to handle exceptions correctly and maintain consistent state without requiring additional exception handling.

Applied to files:

  • src/log_surgeon/finite_automata/Capture.hpp
  • tests/test-schema.cpp
  • tests/test-buffer-parser.cpp
📚 Learning: 2025-08-08T13:30:25.172Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 144
File: tests/comparison_test_utils.hpp:0-0
Timestamp: 2025-08-08T13:30:25.172Z
Learning: In tests/comparison_test_utils.hpp (C++), all comparison helper templates (test_equal, test_greater_than, test_less_than, pairwise_comparison_of_strictly_ascending_vector) must be constrained with the StronglyThreeWayComparable concept on both declarations and definitions. Maintain this constraint in future changes.

Applied to files:

  • tests/test-schema.cpp
📚 Learning: 2025-08-15T12:07:58.626Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 150
File: tests/test-expression-view.cpp:7-7
Timestamp: 2025-08-15T12:07:58.626Z
Learning: In tests/test-expression-view.cpp, the `<catch2/catch_message.hpp>` header is required for clang-tidy to pass, even though it may not be directly used in the visible code.

Applied to files:

  • tests/test-schema.cpp
  • tests/test-buffer-parser.cpp
📚 Learning: 2025-08-13T12:05:00.245Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 150
File: src/log_surgeon/wildcard_query_parser/WildcardExpressionView.cpp:58-77
Timestamp: 2025-08-13T12:05:00.245Z
Learning: In src/log_surgeon/wildcard_query_parser/WildcardExpressionView.cpp, the generate_regex_string() method should not enforce well-formedness checks via assertions. The is_well_formed() method is intended to be used by callers at their discretion, allowing flexibility to generate regex strings from malformed views if desired.

Applied to files:

  • tests/test-schema.cpp
📚 Learning: 2024-10-11T16:16:02.866Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 38
File: src/log_surgeon/finite_automata/RegexAST.hpp:663-669
Timestamp: 2024-10-11T16:16:02.866Z
Learning: In `RegexASTLiteral::serialize()`, to properly handle Unicode characters beyond the ASCII range, cast `m_character` to `char32_t` and use `U"{}{}"` in `fmt::format`.

Applied to files:

  • tests/test-schema.cpp
📚 Learning: 2025-08-08T10:17:43.495Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 144
File: src/log_surgeon/wildcard_query_parser/VariableQueryToken.hpp:28-31
Timestamp: 2025-08-08T10:17:43.495Z
Learning: In src/log_surgeon/wildcard_query_parser/VariableQueryToken.hpp/.cpp (C++), do not suggest defaulting operator<=>. The project prefers a custom out-of-line comparator that explicitly handles the bool member (via explicit cast) to avoid implicit conversions; keep the current manual implementation.

Applied to files:

  • tests/test-schema.cpp
  • tests/test-buffer-parser.cpp
📚 Learning: 2025-08-26T10:13:00.368Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 152
File: tests/test-query.cpp:61-67
Timestamp: 2025-08-26T10:13:00.368Z
Learning: In Catch2 unit tests, the REQUIRE macro already provides detailed debugging output when container comparisons (like std::set equality) fail, showing both expected and actual values. Additional CAPTURE statements are typically unnecessary for such comparisons.

Applied to files:

  • tests/test-schema.cpp
  • tests/test-buffer-parser.cpp
📚 Learning: 2024-11-13T22:25:54.168Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 48
File: tests/test-tag.cpp:10-10
Timestamp: 2024-11-13T22:25:54.168Z
Learning: In the log-surgeon codebase (C++), particularly in the finite automata components involving the `Tag` class (`src/log_surgeon/finite_automata/Tag.hpp`), it's important to ensure that `Tag*` pointers in other objects cannot be `nullptr`. Test cases should focus on validating that these `Tag*` pointers are not null where they are used, and handle `nullptr` appropriately.

Applied to files:

  • src/log_surgeon/LogEvent.cpp
  • tests/test-buffer-parser.cpp
📚 Learning: 2025-08-18T12:05:55.905Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 151
File: tests/test-buffer-parser.cpp:0-0
Timestamp: 2025-08-18T12:05:55.905Z
Learning: In y-scope/log-surgeon (C++), prefer uniform initialization with braces {} when possible instead of assignment or parentheses initialization. For example, prefer `string const var{value};` over `string const var = value;`.

Applied to files:

  • src/log_surgeon/LogEvent.cpp
📚 Learning: 2025-11-06T11:16:52.917Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 184
File: tests/test-reader-parser.cpp:150-154
Timestamp: 2025-11-06T11:16:52.917Z
Learning: In Catch2 test files for the y-scope/log-surgeon repository, an explicit `if (false == optional.has_value()) { return; }` check may be required after `REQUIRE(optional.has_value())` to prevent clang-tidy errors, even though the code is unreachable at runtime. clang-tidy's static analyzer doesn't recognize that REQUIRE aborts execution and may flag unsafe optional access without the explicit check.

Applied to files:

  • tests/test-buffer-parser.cpp
📚 Learning: 2025-08-15T00:13:46.717Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 150
File: tests/test-wildcard-expression.cpp:0-0
Timestamp: 2025-08-15T00:13:46.717Z
Learning: In src/log_surgeon/wildcard_query_parser/WildcardExpression.cpp, the escaping logic treats consecutive backslashes as escape-literal pairs, not as individual escape characters. For input like R"(a\*b\?c\\)", position 7 is the escape character and position 8 is the literal backslash being escaped, so only position 7 should have is_escape() return true.

Applied to files:

  • tests/test-buffer-parser.cpp
📚 Learning: 2025-05-05T14:55:34.455Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 106
File: src/log_surgeon/Lalr1Parser.tpp:661-665
Timestamp: 2025-05-05T14:55:34.455Z
Learning: The log-surgeon codebase follows a design approach where function contracts (like `ErrorCode::Success` guaranteeing a valid token) are trusted, and contract violations are allowed to throw exceptions rather than being explicitly checked at every call site.

Applied to files:

  • tests/test-buffer-parser.cpp
📚 Learning: 2025-05-01T14:47:57.016Z
Learnt from: davidlion
Repo: y-scope/log-surgeon PR: 106
File: src/log_surgeon/Lexer.hpp:114-114
Timestamp: 2025-05-01T14:47:57.016Z
Learning: When handling error cases in log-surgeon, prefer using the `Result<T, ErrorCode>` type from ystdlib-cpp (https://github.com/y-scope/ystdlib-cpp/blob/main/src/ystdlib/error_handling/Result.hpp) instead of `std::pair<ErrorCode, T>` for better type safety and clearer semantics.

Applied to files:

  • tests/test-buffer-parser.cpp
🧬 Code graph analysis (4)
src/log_surgeon/finite_automata/Capture.hpp (1)
src/log_surgeon/Lexer.hpp (8)
  • nodiscard (153-153)
  • nodiscard (155-157)
  • nodiscard (159-161)
  • nodiscard (163-166)
  • nodiscard (175-181)
  • nodiscard (188-194)
  • nodiscard (202-209)
  • nodiscard (217-236)
tests/test-schema.cpp (3)
src/log_surgeon/finite_automata/Nfa.hpp (1)
  • captures (59-63)
src/log_surgeon/Schema.hpp (1)
  • var_schema (40-40)
src/log_surgeon/LogParser.hpp (1)
  • schema_ast (158-158)
src/log_surgeon/LogEvent.cpp (1)
src/log_surgeon/Lexer.hpp (5)
  • rule_id (65-66)
  • rule_id (73-74)
  • rule_id (175-176)
  • capture (188-189)
  • capture (217-218)
tests/test-buffer-parser.cpp (2)
src/log_surgeon/Lexer.hpp (2)
  • capture (188-189)
  • capture (217-218)
tests/test-reader-parser.cpp (3)
  • parse_and_validate (66-70)
  • parse_and_validate (78-178)
  • parse_and_validate (78-82)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: build (macos-15, debug)
  • GitHub Check: build (ubuntu-22.04, release)
  • GitHub Check: build (ubuntu-24.04, release)
  • GitHub Check: build (ubuntu-24.04, debug)
  • GitHub Check: lint (macos-15)
  • GitHub Check: lint (ubuntu-24.04)
🔇 Additional comments (12)
src/log_surgeon/finite_automata/Capture.hpp (1)

12-12: LGTM! Return type change is appropriate.

The change from std::string_view to std::string const& is correct and aligns with the new pointer-based API design. Returning a const reference to the internal string is appropriate since callers access captures via const pointers with managed lifetimes.
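
A small sketch of what the accessor change means for callers, assuming a simplified `Capture` (not the real class) and a hypothetical helper `make_logtype_placeholder`; the `<name>` placeholder form is only loosely modelled on the logtype tags mentioned in the learnings above:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for finite_automata::Capture.
class Capture {
public:
    explicit Capture(std::string name) : m_name{std::move(name)} {}

    // Returns a reference to the stored string; valid only while this Capture is alive.
    [[nodiscard]] auto get_name() const -> std::string const& { return m_name; }

private:
    std::string m_name;
};

// Hypothetical helper: builds "<name>" while binding the name by const reference,
// so the only string allocated here is the result.
auto make_logtype_placeholder(Capture const* capture) -> std::string {
    assert(nullptr != capture);
    auto const& name{capture->get_name()};
    std::string placeholder;
    placeholder.reserve(name.size() + 2);
    placeholder += '<';
    placeholder += name;
    placeholder += '>';
    return placeholder;
}
```

The reference is safe only as long as the `Capture` it came from outlives the use, which is the lifetime point raised in the review-effort notes.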

tests/test-schema.cpp (3)

3-3: LGTM! Necessary includes for the new test case.

The vector include and using declaration are appropriately added to support the new non_unique_capture_names test case.

Also applies to: 22-22


54-56: LGTM! Lambda improvements align with coding standards.

The explicit void return type and use of std::ignore for the dynamic_cast result improve code clarity and align with past review feedback.


189-218: LGTM! Comprehensive test coverage for non-unique capture names.

The new test case thoroughly validates the behaviour when multiple variables share the same capture group name. The test:

  • Creates multiple variables with the same capture name "cap_name"
  • Validates schema parsing for each variable individually
  • Confirms capture name consistency across variables
  • Provides good coverage for the non-unique capture names feature
src/log_surgeon/Lexer.tpp (2)

440-446: LGTM! Efficient capture aggregation with pointer-based mapping.

The refactored logic correctly builds per-rule capture collections using the pointer as the unique key. The use of try_emplace with a cached reference avoids repeated hash lookups, as addressed in previous reviews.
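
The aggregation pattern described here might look roughly like the sketch below. `Rule` and `group_captures_by_rule` are hypothetical names, and the assumption that rules without captures get no map entry is inferred from the walkthrough's `std::nullopt` case rather than from the diff itself:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

using rule_id_t = std::uint32_t;

// Hypothetical stand-ins; the real types live in log_surgeon::finite_automata.
struct Capture {
    std::string name;
};

struct Rule {
    rule_id_t id;
    std::vector<Capture const*> captures_in_pattern_order;
};

auto group_captures_by_rule(std::vector<Rule> const& rules)
        -> std::unordered_map<rule_id_t, std::vector<Capture const*>> {
    std::unordered_map<rule_id_t, std::vector<Capture const*>> rule_id_to_capture;
    for (auto const& rule : rules) {
        if (rule.captures_in_pattern_order.empty()) {
            continue;  // Assumed: rules without captures get no entry.
        }
        // One hash lookup per rule: try_emplace creates the vector if needed and hands
        // back an iterator to it, which is cached as a reference for the inner loop.
        auto& captures{rule_id_to_capture.try_emplace(rule.id).first->second};
        for (auto const* capture : rule.captures_in_pattern_order) {
            assert(nullptr != capture);
            captures.push_back(capture);
        }
    }
    return rule_id_to_capture;
}
```

Indexing with `operator[]` inside the inner loop would give the same result but hash `rule.id` once per capture; caching the reference does it once per rule.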


451-451: LGTM! Correct transition to pointer-based capture mapping.

The change to use the Capture pointer directly as the map key (instead of a capture ID) aligns with the broader refactor to support non-unique capture names.

src/log_surgeon/LogEvent.cpp (2)

68-74: LGTM! Correct transition to pointer-based capture retrieval.

The refactored logic correctly uses get_captures_from_rule_id to retrieve captures and iterates over the returned Capture const* pointers, aligning with the new API design.


77-77: LGTM! Efficient capture data access.

The use of get_reg_ids_from_capture(capture) and capture->get_name() with const reference binding correctly implements the pointer-based API while avoiding unnecessary copies, as addressed in previous reviews.

Also applies to: 89-89

tests/test-buffer-parser.cpp (4)

25-25: LGTM! Type updates align with the new data model.

The addition of using std::pair; and the updates to struct field types (using vector, string_view, pair, etc.) correctly align with the refactored capture handling that uses pointer-based access.

Also applies to: 33-40, 44-46


61-62: LGTM! Function signatures modernized.

The use of string_view parameters is consistent with modern C++ practices and improves efficiency by avoiding unnecessary string copies.

Also applies to: 73-74


124-149: LGTM! Refactored capture validation correctly implements pointer-based API.

The refactored logic correctly:

  • Uses get_captures_from_rule_id to retrieve captures
  • Validates capture count before index access (Line 131, addressing past review feedback)
  • Iterates captures by index and validates names and positions
  • Applies the documented truncation workaround for the known bug in start positions (Lines 147-149, tracked in issue #194, "Buffer parser: Start positions incorrectly include failed match starts", per past review discussion)

1073-1145: LGTM! Comprehensive test coverage for non-unique capture names.

The new test case thoroughly validates the feature:

  • Tests two variables sharing the same capture name "capture"
  • Variable var1 contains two captures with identical names
  • Variable var2 contains one capture with the same name
  • Expected positions correctly track each capture's location
  • Validates the core functionality of non-unique capture name support


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/log_surgeon/finite_automata/Nfa.hpp (1)

62-69: Update stale NOTE about unique capture names

The class comment still says “It is assumed that all capture groups have unique names, even across different rules.”, but the new design uses Capture const* as the identity key (see m_capture_to_tag_id_pair) specifically to support non‑unique capture names.

This NOTE is now misleading; please either remove it or update it to describe the current pointer‑based identity model so future readers are not confused about whether non‑unique names are allowed.
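To make the pointer-based identity model concrete, here is a minimal sketch, not the project's actual code. The `Capture` class and the `CaptureRegistry` wrapper are hypothetical stand-ins; only the idea of keying a map such as `m_capture_to_tag_id_pair` by `Capture const*` comes from the review above.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using tag_id_t = std::uint32_t;

// Hypothetical stand-in for log_surgeon's Capture; only the name matters here.
class Capture {
public:
    explicit Capture(std::string name) : m_name{std::move(name)} {}

    [[nodiscard]] auto get_name() const -> std::string const& { return m_name; }

private:
    std::string m_name;
};

// Hypothetical registry (assumption): a capture's identity is its address, not its name.
class CaptureRegistry {
public:
    // Creates a capture and assigns it a fresh (start, end) tag ID pair.
    auto add_capture(std::string name) -> Capture const* {
        m_owned.push_back(std::make_unique<Capture>(std::move(name)));
        auto const* capture{m_owned.back().get()};
        tag_id_t const start_tag{m_next_tag_id++};
        tag_id_t const end_tag{m_next_tag_id++};
        m_capture_to_tag_id_pair.emplace(capture, std::make_pair(start_tag, end_tag));
        return capture;
    }

    // Returns std::nullopt for a pointer that was never registered.
    [[nodiscard]] auto get_tag_id_pair(Capture const* capture) const
            -> std::optional<std::pair<tag_id_t, tag_id_t>> {
        auto const it{m_capture_to_tag_id_pair.find(capture)};
        if (m_capture_to_tag_id_pair.end() == it) {
            return std::nullopt;
        }
        return it->second;
    }

private:
    std::vector<std::unique_ptr<Capture>> m_owned;
    std::map<Capture const*, std::pair<tag_id_t, tag_id_t>> m_capture_to_tag_id_pair;
    tag_id_t m_next_tag_id{0};
};
```

Under these assumptions, two captures that are both named "capture" receive distinct tag ID pairs because their addresses differ, which is the property the stale NOTE contradicts.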

src/log_surgeon/Lexer.hpp (1)

76-80: Adjust generate() documentation to reflect allowed non‑unique capture names

The comment for generate() still documents:

@throw std::invalid_argument If multiple captures with the same name are found in the rules.

With the move to Capture* identity and the new per‑rule / per‑capture maps, non‑unique capture names across rules are now supported and no such exception is thrown here anymore.

Please update or remove this @throw clause so that the public contract matches the current behaviour (and, if duplicates are still disallowed in some narrower case, clarify exactly which scenario remains invalid).

src/log_surgeon/finite_automata/Capture.hpp (1)

4-7: Add <cstdint> and align the signature of set_context

Capture.hpp uses uint32_t on line 13 without including <cstdint>, so the header is not self-contained and may fail to compile when included on its own. In addition, set_context() omits the trailing return type (-> void), which is inconsistent with the project convention (see get_name() on line 18, and other files such as Lexer.hpp).

Required fixes:

  • Add #include <cstdint> at lines 4-7
  • Replace auto set_context(std::string rule_name, uint32_t pos) { with auto set_context(std::string rule_name, uint32_t pos) -> void {
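
As an illustration of the two fixes, here is a sketch of what a self-contained header could look like. The member names `m_rule_name` and `m_pos` and the getters `get_rule_name()` and `get_pos()` are assumptions made for this example; only `set_context`, `get_name`, and the `<cstdint>` include are taken from the comment above. The sketch spells the type as `std::uint32_t`, which `<cstdint>` guarantees, whereas the original header uses the unqualified `uint32_t`.

```cpp
#include <cstdint>
#include <string>
#include <utility>

// Sketch only: the real Capture class may hold different members.
class Capture {
public:
    explicit Capture(std::string name) : m_name{std::move(name)} {}

    // Trailing return type written explicitly, matching the project convention.
    auto set_context(std::string rule_name, std::uint32_t pos) -> void {
        m_rule_name = std::move(rule_name);
        m_pos = pos;
    }

    [[nodiscard]] auto get_name() const -> std::string const& { return m_name; }

    // Hypothetical getters, added so the stored context can be inspected.
    [[nodiscard]] auto get_rule_name() const -> std::string const& { return m_rule_name; }

    [[nodiscard]] auto get_pos() const -> std::uint32_t { return m_pos; }

private:
    std::string m_name;
    std::string m_rule_name;  // Assumed member name.
    std::uint32_t m_pos{0};   // Assumed member name.
};
```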
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1135c2e and 8e3b649.

📒 Files selected for processing (9)
  • src/log_surgeon/Lexer.hpp (4 hunks)
  • src/log_surgeon/Lexer.tpp (1 hunks)
  • src/log_surgeon/LogEvent.cpp (2 hunks)
  • src/log_surgeon/finite_automata/Capture.hpp (1 hunks)
  • src/log_surgeon/finite_automata/Nfa.hpp (2 hunks)
  • src/log_surgeon/types.hpp (0 hunks)
  • tests/test-buffer-parser.cpp (5 hunks)
  • tests/test-reader-parser.cpp (4 hunks)
  • tests/test-schema.cpp (3 hunks)
💤 Files with no reviewable changes (1)
  • src/log_surgeon/types.hpp
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,h,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • src/log_surgeon/Lexer.tpp
  • src/log_surgeon/finite_automata/Nfa.hpp
  • tests/test-schema.cpp
  • src/log_surgeon/LogEvent.cpp
  • src/log_surgeon/finite_automata/Capture.hpp
  • tests/test-buffer-parser.cpp
  • src/log_surgeon/Lexer.hpp
  • tests/test-reader-parser.cpp
🧠 Learnings (24)
📓 Common learnings
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 152
File: src/log_surgeon/wildcard_query_parser/Query.cpp:159-177
Timestamp: 2025-08-25T20:44:48.955Z
Learning: In the log-surgeon codebase, thread safety issues with global state like NonTerminal::m_next_children_start should be addressed comprehensively in dedicated PRs rather than fixed piecemeal in individual feature PRs. The user SharafMohamed prefers to defer such systemic architectural issues to separate PRs.
Learnt from: davidlion
Repo: y-scope/log-surgeon PR: 165
File: src/log_surgeon/LogEvent.cpp:53-55
Timestamp: 2025-10-22T15:40:29.992Z
Learning: In `src/log_surgeon/LogEvent.cpp`, the `get_logtype()` method has two independent mechanisms for adding `<timestamp>` to the logtype string: (1) A prefix added when `has_timestamp()` returns true, indicating a standalone timestamp token from timestamp rules (lines 53-55), and (2) Named capture group processing that adds `<timestamp>` tags for `(?<timestamp>...)` patterns in any variable rule (lines 61-95). These mechanisms do not interfere with each other. Capture groups named `timestamp` do not affect `has_timestamp()` status, as using captures as actual timestamps is not supported.
📚 Learning: 2024-11-13T20:02:13.737Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.

Applied to files:

  • src/log_surgeon/Lexer.tpp
  • src/log_surgeon/finite_automata/Nfa.hpp
  • tests/test-schema.cpp
  • src/log_surgeon/finite_automata/Capture.hpp
  • tests/test-buffer-parser.cpp
  • src/log_surgeon/Lexer.hpp
  • tests/test-reader-parser.cpp
📚 Learning: 2024-11-18T16:45:46.073Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 50
File: src/log_surgeon/finite_automata/Tag.hpp:0-0
Timestamp: 2024-11-18T16:45:46.073Z
Learning: The class `TagPositions` was removed from `src/log_surgeon/finite_automata/Tag.hpp` as it is no longer needed.

Applied to files:

  • src/log_surgeon/Lexer.tpp
  • src/log_surgeon/finite_automata/Nfa.hpp
  • src/log_surgeon/finite_automata/Capture.hpp
  • tests/test-buffer-parser.cpp
  • src/log_surgeon/Lexer.hpp
📚 Learning: 2025-10-22T15:40:29.992Z
Learnt from: davidlion
Repo: y-scope/log-surgeon PR: 165
File: src/log_surgeon/LogEvent.cpp:53-55
Timestamp: 2025-10-22T15:40:29.992Z
Learning: In `src/log_surgeon/LogEvent.cpp`, the `get_logtype()` method has two independent mechanisms for adding `<timestamp>` to the logtype string: (1) A prefix added when `has_timestamp()` returns true, indicating a standalone timestamp token from timestamp rules (lines 53-55), and (2) Named capture group processing that adds `<timestamp>` tags for `(?<timestamp>...)` patterns in any variable rule (lines 61-95). These mechanisms do not interfere with each other. Capture groups named `timestamp` do not affect `has_timestamp()` status, as using captures as actual timestamps is not supported.

Applied to files:

  • src/log_surgeon/Lexer.tpp
  • src/log_surgeon/LogEvent.cpp
  • src/log_surgeon/finite_automata/Capture.hpp
  • tests/test-buffer-parser.cpp
  • src/log_surgeon/Lexer.hpp
  • tests/test-reader-parser.cpp
📚 Learning: 2025-09-03T16:45:58.451Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 152
File: src/log_surgeon/wildcard_query_parser/Query.hpp:119-132
Timestamp: 2025-09-03T16:45:58.451Z
Learning: In log-surgeon's wildcard_query_parser, the DFA intersection process used in Query::get_matching_variable_types loses track of variable type priorities from the original ByteLexer. Priority information cannot be preserved by simply changing the return type from std::set to std::vector - it would require post-processing after the DFA intersection.

Applied to files:

  • src/log_surgeon/Lexer.tpp
  • tests/test-schema.cpp
  • tests/test-buffer-parser.cpp
  • src/log_surgeon/Lexer.hpp
📚 Learning: 2024-10-24T15:54:35.193Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 42
File: src/log_surgeon/finite_automata/RegexNFA.hpp:442-456
Timestamp: 2024-10-24T15:54:35.193Z
Learning: In the C++ file `src/log_surgeon/finite_automata/RegexNFA.hpp`, for the `RegexNFA::serialize()` function, prioritize code clarity over efficiency when handling string operations.

Applied to files:

  • src/log_surgeon/Lexer.tpp
  • src/log_surgeon/finite_automata/Nfa.hpp
  • tests/test-schema.cpp
  • src/log_surgeon/Lexer.hpp
📚 Learning: 2024-10-24T15:54:19.228Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 42
File: src/log_surgeon/finite_automata/RegexNFA.hpp:99-105
Timestamp: 2024-10-24T15:54:19.228Z
Learning: In `src/log_surgeon/finite_automata/RegexNFA.hpp`, it's acceptable to have constructors without the `explicit` specifier. Do not suggest adding `explicit` to constructors in this file.

Applied to files:

  • src/log_surgeon/Lexer.tpp
  • src/log_surgeon/finite_automata/Nfa.hpp
  • tests/test-schema.cpp
  • src/log_surgeon/Lexer.hpp
📚 Learning: 2025-08-26T10:06:22.914Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 152
File: src/log_surgeon/wildcard_query_parser/Query.hpp:9-11
Timestamp: 2025-08-26T10:06:22.914Z
Learning: In y-scope/log-surgeon project, it's acceptable to include headers like log_surgeon/Lexer.hpp directly in header files rather than using forward declarations, even when only references are used in the interface. The project prefers the simplicity of direct includes over header coupling optimization through forward declarations.

Applied to files:

  • src/log_surgeon/Lexer.tpp
  • src/log_surgeon/Lexer.hpp
📚 Learning: 2024-11-02T09:18:31.046Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 47
File: src/log_surgeon/finite_automata/TaggedTransition.hpp:16-37
Timestamp: 2024-11-02T09:18:31.046Z
Learning: In `src/log_surgeon/finite_automata/TaggedTransition.hpp`, the classes `PositiveTaggedTransition` and `NegativeTaggedTransition` currently do not share enough functionality to justify refactoring into a common base class.

Applied to files:

  • src/log_surgeon/finite_automata/Nfa.hpp
  • src/log_surgeon/Lexer.hpp
📚 Learning: 2024-11-27T22:25:35.608Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 56
File: src/log_surgeon/finite_automata/RegisterHandler.hpp:0-0
Timestamp: 2024-11-27T22:25:35.608Z
Learning: In the `RegisterHandler` class in `src/log_surgeon/finite_automata/RegisterHandler.hpp`, the methods `add_register` and `append_position` rely on `emplace_back` and `m_prefix_tree.insert` to handle exceptions correctly and maintain consistent state without requiring additional exception handling.

Applied to files:

  • src/log_surgeon/finite_automata/Nfa.hpp
  • tests/test-schema.cpp
  • src/log_surgeon/finite_automata/Capture.hpp
  • tests/test-buffer-parser.cpp
  • src/log_surgeon/Lexer.hpp
  • tests/test-reader-parser.cpp
📚 Learning: 2024-11-27T21:56:13.425Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 56
File: src/log_surgeon/finite_automata/RegisterHandler.hpp:0-0
Timestamp: 2024-11-27T21:56:13.425Z
Learning: In the `log_surgeon` project, header guards in C++ header files should include `_HPP` at the end to match the filename. For example, in `RegisterHandler.hpp`, the header guard should be `LOG_SURGEON_FINITE_AUTOMATA_REGISTER_HANDLER_HPP`.

Applied to files:

  • src/log_surgeon/finite_automata/Nfa.hpp
  • src/log_surgeon/Lexer.hpp
📚 Learning: 2024-11-02T09:13:56.755Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 47
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:127-128
Timestamp: 2024-11-02T09:13:56.755Z
Learning: `RegexNFAUTF8State` is defined as a type alias for `RegexNFAState<RegexNFAStateType::UTF8>`.

Applied to files:

  • src/log_surgeon/finite_automata/Nfa.hpp
  • src/log_surgeon/Lexer.hpp
📚 Learning: 2025-08-15T12:07:58.626Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 150
File: tests/test-expression-view.cpp:7-7
Timestamp: 2025-08-15T12:07:58.626Z
Learning: In tests/test-expression-view.cpp, the `<catch2/catch_message.hpp>` header is required for clang-tidy to pass, even though it may not be directly used in the visible code.

Applied to files:

  • tests/test-schema.cpp
  • tests/test-buffer-parser.cpp
  • tests/test-reader-parser.cpp
📚 Learning: 2025-08-08T13:30:25.172Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 144
File: tests/comparison_test_utils.hpp:0-0
Timestamp: 2025-08-08T13:30:25.172Z
Learning: In tests/comparison_test_utils.hpp (C++), all comparison helper templates (test_equal, test_greater_than, test_less_than, pairwise_comparison_of_strictly_ascending_vector) must be constrained with the StronglyThreeWayComparable concept on both declarations and definitions. Maintain this constraint in future changes.

Applied to files:

  • tests/test-schema.cpp
📚 Learning: 2025-08-13T12:05:00.245Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 150
File: src/log_surgeon/wildcard_query_parser/WildcardExpressionView.cpp:58-77
Timestamp: 2025-08-13T12:05:00.245Z
Learning: In src/log_surgeon/wildcard_query_parser/WildcardExpressionView.cpp, the generate_regex_string() method should not enforce well-formedness checks via assertions. The is_well_formed() method is intended to be used by callers at their discretion, allowing flexibility to generate regex strings from malformed views if desired.

Applied to files:

  • tests/test-schema.cpp
📚 Learning: 2025-08-13T12:06:36.584Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 150
File: src/log_surgeon/wildcard_query_parser/WildcardExpressionView.cpp:58-77
Timestamp: 2025-08-13T12:06:36.584Z
Learning: In src/log_surgeon/wildcard_query_parser/WildcardExpressionView.cpp, caching SchemaParser::get_special_regex_characters() with a const& reference provides no performance benefit, indicating the method likely returns a reference to an existing container rather than performing expensive computations.

Applied to files:

  • tests/test-schema.cpp
📚 Learning: 2025-08-26T10:13:00.368Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 152
File: tests/test-query.cpp:61-67
Timestamp: 2025-08-26T10:13:00.368Z
Learning: In Catch2 unit tests, the REQUIRE macro already provides detailed debugging output when container comparisons (like std::set equality) fail, showing both expected and actual values. Additional CAPTURE statements are typically unnecessary for such comparisons.

Applied to files:

  • tests/test-schema.cpp
  • tests/test-buffer-parser.cpp
  • tests/test-reader-parser.cpp
📚 Learning: 2024-11-13T22:38:19.472Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 48
File: src/log_surgeon/finite_automata/RegexAST.hpp:700-700
Timestamp: 2024-11-13T22:38:19.472Z
Learning: In `RegexASTCapture`, `m_tag` must always be non-null.

Applied to files:

  • tests/test-schema.cpp
  • src/log_surgeon/Lexer.hpp
📚 Learning: 2025-08-08T10:00:20.963Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 144
File: src/log_surgeon/wildcard_query_parser/VariableQueryToken.hpp:28-31
Timestamp: 2025-08-08T10:00:20.963Z
Learning: In src/log_surgeon/wildcard_query_parser/QueryInterpretation.hpp (C++), do not default the comparison operators: the class stores std::vector<std::variant<StaticQueryToken, VariableQueryToken>>, which yields only a weak ordering. A custom operator<=> that maps the variant’s weak ordering to std::strong_ordering is required; avoid suggesting defaulting there in future reviews.

Applied to files:

  • tests/test-buffer-parser.cpp
📚 Learning: 2025-08-08T10:17:43.495Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 144
File: src/log_surgeon/wildcard_query_parser/VariableQueryToken.hpp:28-31
Timestamp: 2025-08-08T10:17:43.495Z
Learning: In src/log_surgeon/wildcard_query_parser/VariableQueryToken.hpp/.cpp (C++), do not suggest defaulting operator<=>. The project prefers a custom out-of-line comparator that explicitly handles the bool member (via explicit cast) to avoid implicit conversions; keep the current manual implementation.

Applied to files:

  • tests/test-buffer-parser.cpp
📚 Learning: 2025-05-05T14:55:34.455Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 106
File: src/log_surgeon/Lalr1Parser.tpp:661-665
Timestamp: 2025-05-05T14:55:34.455Z
Learning: The log-surgeon codebase follows a design approach where function contracts (like `ErrorCode::Success` guaranteeing a valid token) are trusted, and contract violations are allowed to throw exceptions rather than being explicitly checked at every call site.

Applied to files:

  • tests/test-buffer-parser.cpp
  • tests/test-reader-parser.cpp
📚 Learning: 2025-05-01T14:47:57.016Z
Learnt from: davidlion
Repo: y-scope/log-surgeon PR: 106
File: src/log_surgeon/Lexer.hpp:114-114
Timestamp: 2025-05-01T14:47:57.016Z
Learning: When handling error cases in log-surgeon, prefer using the `Result<T, ErrorCode>` type from ystdlib-cpp (https://github.com/y-scope/ystdlib-cpp/blob/main/src/ystdlib/error_handling/Result.hpp) instead of `std::pair<ErrorCode, T>` for better type safety and clearer semantics.

Applied to files:

  • tests/test-buffer-parser.cpp
  • tests/test-reader-parser.cpp
📚 Learning: 2025-11-06T11:16:52.917Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 184
File: tests/test-reader-parser.cpp:150-154
Timestamp: 2025-11-06T11:16:52.917Z
Learning: In Catch2 test files for the y-scope/log-surgeon repository, an explicit `if (false == optional.has_value()) { return; }` check may be required after `REQUIRE(optional.has_value())` to prevent clang-tidy errors, even though the code is unreachable at runtime. clang-tidy's static analyzer doesn't recognize that REQUIRE aborts execution and may flag unsafe optional access without the explicit check.

Applied to files:

  • tests/test-buffer-parser.cpp
  • tests/test-reader-parser.cpp
📚 Learning: 2024-11-13T22:25:54.168Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 48
File: tests/test-tag.cpp:10-10
Timestamp: 2024-11-13T22:25:54.168Z
Learning: In the log-surgeon codebase (C++), particularly in the finite automata components involving the `Tag` class (`src/log_surgeon/finite_automata/Tag.hpp`), it's important to ensure that `Tag*` pointers in other objects cannot be `nullptr`. Test cases should focus on validating that these `Tag*` pointers are not null where they are used, and handle `nullptr` appropriately.

Applied to files:

  • src/log_surgeon/Lexer.hpp
  • tests/test-reader-parser.cpp
🧬 Code graph analysis (7)
src/log_surgeon/finite_automata/Nfa.hpp (3)
src/log_surgeon/Buffer.hpp (2)
  • curr_pos (43-43)
  • curr_pos (43-43)
src/log_surgeon/finite_automata/NfaState.hpp (6)
  • nodiscard (108-108)
  • nodiscard (110-110)
  • nodiscard (112-114)
  • nodiscard (116-119)
  • nodiscard (121-123)
  • nodiscard (125-125)
src/log_surgeon/LexicalRule.hpp (3)
  • nodiscard (28-30)
  • nodiscard (32-32)
  • nodiscard (34-37)
tests/test-schema.cpp (1)
src/log_surgeon/Schema.hpp (1)
  • var_schema (40-40)
src/log_surgeon/LogEvent.cpp (2)
src/log_surgeon/Lexer.hpp (5)
  • rule_id (65-66)
  • rule_id (73-74)
  • rule_id (175-176)
  • capture (188-189)
  • capture (217-218)
src/log_surgeon/finite_automata/Nfa.hpp (3)
  • captures (95-99)
  • capture (111-115)
  • capture (152-153)
src/log_surgeon/finite_automata/Capture.hpp (2)
src/log_surgeon/Lexer.hpp (8)
  • nodiscard (153-153)
  • nodiscard (155-157)
  • nodiscard (159-161)
  • nodiscard (163-166)
  • nodiscard (175-181)
  • nodiscard (188-194)
  • nodiscard (202-209)
  • nodiscard (217-236)
src/log_surgeon/finite_automata/Nfa.hpp (4)
  • nodiscard (49-51)
  • nodiscard (53-55)
  • nodiscard (137-137)
  • nodiscard (139-142)
tests/test-buffer-parser.cpp (2)
src/log_surgeon/Lexer.hpp (2)
  • capture (188-189)
  • capture (217-218)
tests/test-reader-parser.cpp (3)
  • parse_and_validate (66-70)
  • parse_and_validate (78-178)
  • parse_and_validate (78-82)
src/log_surgeon/Lexer.hpp (1)
src/log_surgeon/finite_automata/Nfa.hpp (6)
  • nodiscard (49-51)
  • nodiscard (53-55)
  • nodiscard (137-137)
  • nodiscard (139-142)
  • capture (111-115)
  • capture (152-153)
tests/test-reader-parser.cpp (1)
src/log_surgeon/Lexer.hpp (2)
  • capture (188-189)
  • capture (217-218)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: lint (macos-15)
  • GitHub Check: build (macos-15, debug)
  • GitHub Check: lint (ubuntu-24.04)
  • GitHub Check: build (ubuntu-24.04, release)
  • GitHub Check: build (macos-15, release)
  • GitHub Check: build (ubuntu-22.04, debug)
🔇 Additional comments (7)
tests/test-schema.cpp (1)

3-3: Non‑unique capture names test covers the intended AST behaviour

The new non_unique_capture_names test, together with the added <vector> include and using std::vector, cleanly exercises the scenario where several variables reuse the same capture name. It mirrors the existing add_underscore_name pattern and verifies both the variable name and that get_subtree_positive_captures() returns a single capture with the expected name.

No issues from a correctness or style point of view.

Also applies to: 22-22, 187-216

tests/test-reader-parser.cpp (2)

146-173: Updated capture validation correctly follows the new Lexer API

The capture‑validation block in parse_and_validate has been cleanly migrated:

  • lexer.get_captures_from_rule_id(token_type) replaces the old capture‑ID lookup and is guarded by both REQUIRE(optional_captures.has_value()) and an explicit early return to keep clang‑tidy happy.
  • Iteration is now over Capture const*, with names obtained via capture->get_name() and checked against the expected_captures map.
  • lexer.get_reg_ids_from_capture(capture) is used to fetch the start/end register IDs, again with both a REQUIRE and a defensive early return before using .value().

This preserves the previous semantics while tying the test to the new pointer‑based capture bookkeeping.


204-207: Delimiter schema updates align tests with new delimiter parsing

The schema examples and constants have been updated from escaped \[ to raw [ in:

  • The documentation block’s delimiters: \n\r[:, example.
  • cDelimitersSchema in both single_line_without_capture_reader_parser and reader_parser_wrap_around.

This matches the stated change that special characters are no longer escaped in the delimiter string and keeps tests in sync with the parser behaviour.

Also applies to: 230-231, 274-275

src/log_surgeon/Lexer.hpp (1)

168-181: Capture pointer APIs and backing maps are consistent and usable

The new capture‑oriented APIs on Lexer:

  • get_captures_from_rule_id(rule_id_t) -> optional<vector<Capture const*>>
  • get_tag_id_pair_from_capture(Capture const*) -> optional<pair<tag_id_t, tag_id_t>>
  • get_reg_ids_from_capture(Capture const*) -> optional<pair<reg_id_t, reg_id_t>>

together with the private maps

  • m_rule_id_to_capture
  • m_capture_to_tag_id_pair

provide a coherent pointer‑based view over captures, tags, and registers. The implementations correctly:

  • Distinguish “no captures for this rule” via std::nullopt instead of an empty vector.
  • Use Capture const* as the unique key for both tag ID and reg ID derivation, matching the new NFA mapping.
  • Respect the existing false == ... style for optionals.

This design should make it straightforward for consumers (e.g., LogEvent, tests) to work with multiple captures sharing the same name while still getting deterministic start/end register IDs.

Also applies to: 188-236, 268-271
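
A consumer of these accessors might look roughly like the sketch below. `MockLexer`, its `add_capture` helper, the simplified `Capture` struct, and `collect_capture_regs` are all hypothetical and exist only for this example. The two accessor names and their optional-returning shapes follow the list above, but the real Lexer derives register IDs from tags rather than storing them directly.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using rule_id_t = std::uint32_t;
using reg_id_t = std::uint32_t;

// Hypothetical stand-in for log_surgeon's Capture.
struct Capture {
    std::string m_name;
};

// Hypothetical mock exposing two accessors with the shapes named in the review.
class MockLexer {
public:
    auto add_capture(rule_id_t rule_id, Capture const* capture, reg_id_t start, reg_id_t end)
            -> void {
        m_rule_id_to_capture[rule_id].push_back(capture);
        m_capture_to_reg_ids.emplace(capture, std::make_pair(start, end));
    }

    [[nodiscard]] auto get_captures_from_rule_id(rule_id_t rule_id) const
            -> std::optional<std::vector<Capture const*>> {
        auto const it{m_rule_id_to_capture.find(rule_id)};
        if (m_rule_id_to_capture.end() == it) {
            return std::nullopt;  // "No captures for this rule" is not an empty vector.
        }
        return it->second;
    }

    [[nodiscard]] auto get_reg_ids_from_capture(Capture const* capture) const
            -> std::optional<std::pair<reg_id_t, reg_id_t>> {
        auto const it{m_capture_to_reg_ids.find(capture)};
        if (m_capture_to_reg_ids.end() == it) {
            return std::nullopt;
        }
        return it->second;
    }

private:
    std::map<rule_id_t, std::vector<Capture const*>> m_rule_id_to_capture;
    std::map<Capture const*, std::pair<reg_id_t, reg_id_t>> m_capture_to_reg_ids;
};

struct CaptureRegs {
    std::string name;
    reg_id_t start;
    reg_id_t end;
};

// Collects the name and register IDs of each capture of a rule, in stored order.
auto collect_capture_regs(MockLexer const& lexer, rule_id_t rule_id) -> std::vector<CaptureRegs> {
    std::vector<CaptureRegs> result;
    auto const optional_captures{lexer.get_captures_from_rule_id(rule_id)};
    if (false == optional_captures.has_value()) {
        return result;
    }
    for (auto const* capture : optional_captures.value()) {
        auto const optional_reg_ids{lexer.get_reg_ids_from_capture(capture)};
        if (false == optional_reg_ids.has_value()) {
            continue;
        }
        auto const [start, end]{optional_reg_ids.value()};
        result.push_back({capture->m_name, start, end});
    }
    return result;
}
```

Because the per-rule vector preserves insertion order, two captures with the same name stay distinguishable by position in the result, even though their names compare equal.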

tests/test-buffer-parser.cpp (3)

25-25: LGTM: Type refactoring aligns with ordered capture semantics.

The switch from std::map<string, CapturePositions> to vector<pair<string, CapturePositions>> for ExpectedToken.m_captures correctly reflects the requirement to maintain canonical capture ordering as described in the PR objectives.

Also applies to: 33-34, 38-40, 44-46, 61-62, 73-74
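
The reason a name-keyed map no longer works for expected captures can be shown with a small sketch. The `CapturePositions` layout and the helper `to_name_keyed_map` are assumptions for illustration, not the test file's actual definitions.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the test helper's CapturePositions.
struct CapturePositions {
    std::vector<std::int32_t> m_start_positions;
    std::vector<std::int32_t> m_end_positions;
};

using NamedCapture = std::pair<std::string, CapturePositions>;

// Converts an ordered capture list into a name-keyed map, as the old representation would
// have stored it. A capture whose name is already present cannot be stored a second time.
auto to_name_keyed_map(std::vector<NamedCapture> const& captures)
        -> std::map<std::string, CapturePositions> {
    std::map<std::string, CapturePositions> result;
    for (auto const& [name, positions] : captures) {
        // emplace keeps the first entry for a key and ignores later duplicates.
        result.emplace(name, positions);
    }
    return result;
}
```

With three expected captures named "capture", "capture", and "other", the vector keeps all three in pattern order, while the map built from it holds only two entries and iterates by key rather than by position in the pattern.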


127-129: LGTM: Defensive check for clang-tidy.

The explicit has_value() check and early return after REQUIRE follows the established pattern for clang-tidy compliance in this codebase.

Based on learnings.


1111-1144: LGTM: Well-structured test for non-unique capture names.

The new test case effectively validates the core feature of this PR. The test data correctly represents multiple captures with the same name "capture" across two variable rules, and the expected positions are properly specified as ordered pairs.
