-
Notifications
You must be signed in to change notification settings - Fork 10
feat: Support non-unique capture names (fixes #180). #193
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Support non-unique capture names (fixes #180). #193
Conversation
WalkthroughReplaces ID-based capture APIs with pointer-based Capture const* APIs across Lexer and tests, removes the public Changes
Sequence Diagram(s)sequenceDiagram
participant Caller as Consumer (code/tests)
participant Lexer
participant Capture as Capture*
participant TagMap as m_capture_to_tag_id_pair
participant RegResolver as reg-id resolution
Caller->>Lexer: get_captures_from_rule_id(rule_id)
alt captures exist
Lexer-->>Caller: optional<vector<Capture const*>>
loop per Capture*
Caller->>Lexer: get_tag_id_pair_from_capture(capture*)
Lexer->>TagMap: lookup(capture*)
TagMap-->>Lexer: optional<pair<tag_id,tag_id>>
Lexer-->>Caller: optional<pair<tag_id,tag_id>>
Caller->>Lexer: get_reg_ids_from_capture(capture*)
Lexer->>Lexer: get_tag_id_pair_from_capture(capture*)
Lexer->>RegResolver: get_reg_id_from_tag_id(tag_id)
RegResolver-->>Lexer: reg_id(s)
Lexer-->>Caller: optional<pair<reg_id,reg_id>>
Caller->>Capture: get_name()
Capture-->>Caller: string const&
end
else no captures
Lexer-->>Caller: std::nullopt
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro 📒 Files selected for processing (6)
💤 Files with no reviewable changes (1)
🧰 Additional context used📓 Path-based instructions (1)**/*.{cpp,h,hpp,java,js,jsx,tpp,ts,tsx}⚙️ CodeRabbit configuration file
Files:
🧠 Learnings (29)📓 Common learnings📚 Learning: 2024-11-13T20:02:13.737ZApplied to files:
📚 Learning: 2024-11-18T16:45:46.073ZApplied to files:
📚 Learning: 2025-10-22T15:40:29.992ZApplied to files:
📚 Learning: 2025-09-03T16:45:58.451ZApplied to files:
📚 Learning: 2024-10-24T15:54:19.228ZApplied to files:
📚 Learning: 2024-10-24T15:54:35.193ZApplied to files:
📚 Learning: 2025-08-26T10:06:22.914ZApplied to files:
📚 Learning: 2024-11-13T22:38:19.472ZApplied to files:
📚 Learning: 2025-08-25T20:44:48.955ZApplied to files:
📚 Learning: 2025-08-08T13:18:39.895ZApplied to files:
📚 Learning: 2025-08-08T10:22:26.739ZApplied to files:
📚 Learning: 2025-08-08T10:23:06.281ZApplied to files:
📚 Learning: 2025-08-08T10:21:56.571ZApplied to files:
📚 Learning: 2025-08-13T12:06:36.584ZApplied to files:
📚 Learning: 2025-08-08T10:00:20.963ZApplied to files:
📚 Learning: 2024-11-27T22:25:35.608ZApplied to files:
📚 Learning: 2025-08-08T13:30:25.172ZApplied to files:
📚 Learning: 2025-08-15T12:07:58.626ZApplied to files:
📚 Learning: 2025-08-13T12:05:00.245ZApplied to files:
📚 Learning: 2024-10-11T16:16:02.866ZApplied to files:
📚 Learning: 2025-08-08T10:17:43.495ZApplied to files:
📚 Learning: 2025-08-26T10:13:00.368ZApplied to files:
📚 Learning: 2024-11-13T22:25:54.168ZApplied to files:
📚 Learning: 2025-08-18T12:05:55.905ZApplied to files:
📚 Learning: 2025-11-06T11:16:52.917ZApplied to files:
📚 Learning: 2025-08-15T00:13:46.717ZApplied to files:
📚 Learning: 2025-05-05T14:55:34.455ZApplied to files:
📚 Learning: 2025-05-01T14:47:57.016ZApplied to files:
🧬 Code graph analysis (4)src/log_surgeon/finite_automata/Capture.hpp (1)
tests/test-schema.cpp (3)
src/log_surgeon/LogEvent.cpp (1)
tests/test-buffer-parser.cpp (2)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
🔇 Additional comments (12)
Tip 📝 Customizable high-level summaries are now available in beta!You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.
Example instruction:
Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/log_surgeon/finite_automata/Nfa.hpp (1)
62-69: Update stale NOTE about unique capture namesThe class comment still says “It is assumed that all capture groups have unique names, even across different rules.”, but the new design uses
Capture const*as the identity key (seem_capture_to_tag_id_pair) specifically to support non‑unique capture names.This NOTE is now misleading; please either remove it or update it to describe the current pointer‑based identity model so future readers are not confused about whether non‑unique names are allowed.
src/log_surgeon/Lexer.hpp (1)
76-80: Adjustgenerate()documentation to reflect allowed non‑unique capture namesThe comment for
generate()still documents:
@throw std::invalid_argument If multiple captures with the same name are found in the rules.With the move to
Capture*identity and the new per‑rule / per‑capture maps, non‑unique capture names across rules are now supported and no such exception is thrown here anymore.Please update or remove this
@throwclause so that the public contract matches the current behaviour (and, if duplicates are still disallowed in some narrower case, clarify exactly which scenario remains invalid).src/log_surgeon/finite_automata/Capture.hpp (1)
4-7: Ajouter<cstdint>et harmoniser la signature deset_context
Capture.hpputiliseuint32_tà la ligne 13 sans inclure<cstdint>, ce qui rend l'en-tête non autonome et risque de causer des erreurs de compilation s'il est inclus seul. De plus,set_context()omet le type de retour avec flèche (-> void), ce qui est incompatible avec la convention du projet (voirget_name()ligne 18, et d'autres fichiers commeLexer.hpp).Corrections requises :
- Ajouter
#include <cstdint>aux lignes 4-7- Remplacer
auto set_context(std::string rule_name, uint32_t pos) {parauto set_context(std::string rule_name, uint32_t pos) -> void {
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (9)
src/log_surgeon/Lexer.hpp(4 hunks)src/log_surgeon/Lexer.tpp(1 hunks)src/log_surgeon/LogEvent.cpp(2 hunks)src/log_surgeon/finite_automata/Capture.hpp(1 hunks)src/log_surgeon/finite_automata/Nfa.hpp(2 hunks)src/log_surgeon/types.hpp(0 hunks)tests/test-buffer-parser.cpp(5 hunks)tests/test-reader-parser.cpp(4 hunks)tests/test-schema.cpp(3 hunks)
💤 Files with no reviewable changes (1)
- src/log_surgeon/types.hpp
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,h,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>rather than!<expression>.
Files:
src/log_surgeon/Lexer.tppsrc/log_surgeon/finite_automata/Nfa.hpptests/test-schema.cppsrc/log_surgeon/LogEvent.cppsrc/log_surgeon/finite_automata/Capture.hpptests/test-buffer-parser.cppsrc/log_surgeon/Lexer.hpptests/test-reader-parser.cpp
🧠 Learnings (24)
📓 Common learnings
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 152
File: src/log_surgeon/wildcard_query_parser/Query.cpp:159-177
Timestamp: 2025-08-25T20:44:48.955Z
Learning: In the log-surgeon codebase, thread safety issues with global state like NonTerminal::m_next_children_start should be addressed comprehensively in dedicated PRs rather than fixed piecemeal in individual feature PRs. The user SharafMohamed prefers to defer such systemic architectural issues to separate PRs.
Learnt from: davidlion
Repo: y-scope/log-surgeon PR: 165
File: src/log_surgeon/LogEvent.cpp:53-55
Timestamp: 2025-10-22T15:40:29.992Z
Learning: In `src/log_surgeon/LogEvent.cpp`, the `get_logtype()` method has two independent mechanisms for adding `<timestamp>` to the logtype string: (1) A prefix added when `has_timestamp()` returns true, indicating a standalone timestamp token from timestamp rules (lines 53-55), and (2) Named capture group processing that adds `<timestamp>` tags for `(?<timestamp>...)` patterns in any variable rule (lines 61-95). These mechanisms do not interfere with each other. Capture groups named `timestamp` do not affect `has_timestamp()` status, as using captures as actual timestamps is not supported.
<!-- [add_learning]
In the log-surgeon codebase, capture groups (e.g., `(?<timestamp>...)`) in schema variable rules are added to the logtype string through the existing capture group processing logic, but they do not create actual timestamp tokens and do not cause `has_timestamp()` to return true. Using capture groups as actual timestamps is not supported; standalone timestamp tokens come from `timestamp:...
📚 Learning: 2024-11-13T20:02:13.737Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
Applied to files:
src/log_surgeon/Lexer.tppsrc/log_surgeon/finite_automata/Nfa.hpptests/test-schema.cppsrc/log_surgeon/finite_automata/Capture.hpptests/test-buffer-parser.cppsrc/log_surgeon/Lexer.hpptests/test-reader-parser.cpp
📚 Learning: 2024-11-18T16:45:46.073Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 50
File: src/log_surgeon/finite_automata/Tag.hpp:0-0
Timestamp: 2024-11-18T16:45:46.073Z
Learning: The class `TagPositions` was removed from `src/log_surgeon/finite_automata/Tag.hpp` as it is no longer needed.
Applied to files:
src/log_surgeon/Lexer.tppsrc/log_surgeon/finite_automata/Nfa.hppsrc/log_surgeon/finite_automata/Capture.hpptests/test-buffer-parser.cppsrc/log_surgeon/Lexer.hpp
📚 Learning: 2025-10-22T15:40:29.992Z
Learnt from: davidlion
Repo: y-scope/log-surgeon PR: 165
File: src/log_surgeon/LogEvent.cpp:53-55
Timestamp: 2025-10-22T15:40:29.992Z
Learning: In `src/log_surgeon/LogEvent.cpp`, the `get_logtype()` method has two independent mechanisms for adding `<timestamp>` to the logtype string: (1) A prefix added when `has_timestamp()` returns true, indicating a standalone timestamp token from timestamp rules (lines 53-55), and (2) Named capture group processing that adds `<timestamp>` tags for `(?<timestamp>...)` patterns in any variable rule (lines 61-95). These mechanisms do not interfere with each other. Capture groups named `timestamp` do not affect `has_timestamp()` status, as using captures as actual timestamps is not supported.
<!-- [add_learning]
In the log-surgeon codebase, capture groups (e.g., `(?<timestamp>...)`) in schema variable rules are added to the logtype string through the existing capture group processing logic, but they do not create actual timestamp tokens and do not cause `has_timestamp()` to return true. Using capture groups as actual timestamps is not supported; standalone timestamp tokens come from `timestamp:...
Applied to files:
src/log_surgeon/Lexer.tppsrc/log_surgeon/LogEvent.cppsrc/log_surgeon/finite_automata/Capture.hpptests/test-buffer-parser.cppsrc/log_surgeon/Lexer.hpptests/test-reader-parser.cpp
📚 Learning: 2025-09-03T16:45:58.451Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 152
File: src/log_surgeon/wildcard_query_parser/Query.hpp:119-132
Timestamp: 2025-09-03T16:45:58.451Z
Learning: In log-surgeon's wildcard_query_parser, the DFA intersection process used in Query::get_matching_variable_types loses track of variable type priorities from the original ByteLexer. Priority information cannot be preserved by simply changing the return type from std::set to std::vector - it would require post-processing after the DFA intersection.
Applied to files:
src/log_surgeon/Lexer.tpptests/test-schema.cpptests/test-buffer-parser.cppsrc/log_surgeon/Lexer.hpp
📚 Learning: 2024-10-24T15:54:35.193Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 42
File: src/log_surgeon/finite_automata/RegexNFA.hpp:442-456
Timestamp: 2024-10-24T15:54:35.193Z
Learning: In the C++ file `src/log_surgeon/finite_automata/RegexNFA.hpp`, for the `RegexNFA::serialize()` function, prioritize code clarity over efficiency when handling string operations.
Applied to files:
src/log_surgeon/Lexer.tppsrc/log_surgeon/finite_automata/Nfa.hpptests/test-schema.cppsrc/log_surgeon/Lexer.hpp
📚 Learning: 2024-10-24T15:54:19.228Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 42
File: src/log_surgeon/finite_automata/RegexNFA.hpp:99-105
Timestamp: 2024-10-24T15:54:19.228Z
Learning: In `src/log_surgeon/finite_automata/RegexNFA.hpp`, it's acceptable to have constructors without the `explicit` specifier. Do not suggest adding `explicit` to constructors in this file.
Applied to files:
src/log_surgeon/Lexer.tppsrc/log_surgeon/finite_automata/Nfa.hpptests/test-schema.cppsrc/log_surgeon/Lexer.hpp
📚 Learning: 2025-08-26T10:06:22.914Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 152
File: src/log_surgeon/wildcard_query_parser/Query.hpp:9-11
Timestamp: 2025-08-26T10:06:22.914Z
Learning: In y-scope/log-surgeon project, it's acceptable to include headers like log_surgeon/Lexer.hpp directly in header files rather than using forward declarations, even when only references are used in the interface. The project prefers the simplicity of direct includes over header coupling optimization through forward declarations.
Applied to files:
src/log_surgeon/Lexer.tppsrc/log_surgeon/Lexer.hpp
📚 Learning: 2024-11-02T09:18:31.046Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 47
File: src/log_surgeon/finite_automata/TaggedTransition.hpp:16-37
Timestamp: 2024-11-02T09:18:31.046Z
Learning: In `src/log_surgeon/finite_automata/TaggedTransition.hpp`, the classes `PositiveTaggedTransition` and `NegativeTaggedTransition` currently do not share enough functionality to justify refactoring into a common base class.
Applied to files:
src/log_surgeon/finite_automata/Nfa.hppsrc/log_surgeon/Lexer.hpp
📚 Learning: 2024-11-27T22:25:35.608Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 56
File: src/log_surgeon/finite_automata/RegisterHandler.hpp:0-0
Timestamp: 2024-11-27T22:25:35.608Z
Learning: In the `RegisterHandler` class in `src/log_surgeon/finite_automata/RegisterHandler.hpp`, the methods `add_register` and `append_position` rely on `emplace_back` and `m_prefix_tree.insert` to handle exceptions correctly and maintain consistent state without requiring additional exception handling.
Applied to files:
src/log_surgeon/finite_automata/Nfa.hpptests/test-schema.cppsrc/log_surgeon/finite_automata/Capture.hpptests/test-buffer-parser.cppsrc/log_surgeon/Lexer.hpptests/test-reader-parser.cpp
📚 Learning: 2024-11-27T21:56:13.425Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 56
File: src/log_surgeon/finite_automata/RegisterHandler.hpp:0-0
Timestamp: 2024-11-27T21:56:13.425Z
Learning: In the `log_surgeon` project, header guards in C++ header files should include `_HPP` at the end to match the filename. For example, in `RegisterHandler.hpp`, the header guard should be `LOG_SURGEON_FINITE_AUTOMATA_REGISTER_HANDLER_HPP`.
Applied to files:
src/log_surgeon/finite_automata/Nfa.hppsrc/log_surgeon/Lexer.hpp
📚 Learning: 2024-11-02T09:13:56.755Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 47
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:127-128
Timestamp: 2024-11-02T09:13:56.755Z
Learning: `RegexNFAUTF8State` is defined as a type alias for `RegexNFAState<RegexNFAStateType::UTF8>`.
Applied to files:
src/log_surgeon/finite_automata/Nfa.hppsrc/log_surgeon/Lexer.hpp
📚 Learning: 2025-08-15T12:07:58.626Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 150
File: tests/test-expression-view.cpp:7-7
Timestamp: 2025-08-15T12:07:58.626Z
Learning: In tests/test-expression-view.cpp, the `<catch2/catch_message.hpp>` header is required for clang-tidy to pass, even though it may not be directly used in the visible code.
Applied to files:
tests/test-schema.cpptests/test-buffer-parser.cpptests/test-reader-parser.cpp
📚 Learning: 2025-08-08T13:30:25.172Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 144
File: tests/comparison_test_utils.hpp:0-0
Timestamp: 2025-08-08T13:30:25.172Z
Learning: In tests/comparison_test_utils.hpp (C++), all comparison helper templates (test_equal, test_greater_than, test_less_than, pairwise_comparison_of_strictly_ascending_vector) must be constrained with the StronglyThreeWayComparable concept on both declarations and definitions. Maintain this constraint in future changes.
Applied to files:
tests/test-schema.cpp
📚 Learning: 2025-08-13T12:05:00.245Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 150
File: src/log_surgeon/wildcard_query_parser/WildcardExpressionView.cpp:58-77
Timestamp: 2025-08-13T12:05:00.245Z
Learning: In src/log_surgeon/wildcard_query_parser/WildcardExpressionView.cpp, the generate_regex_string() method should not enforce well-formedness checks via assertions. The is_well_formed() method is intended to be used by callers at their discretion, allowing flexibility to generate regex strings from malformed views if desired.
Applied to files:
tests/test-schema.cpp
📚 Learning: 2025-08-13T12:06:36.584Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 150
File: src/log_surgeon/wildcard_query_parser/WildcardExpressionView.cpp:58-77
Timestamp: 2025-08-13T12:06:36.584Z
Learning: In src/log_surgeon/wildcard_query_parser/WildcardExpressionView.cpp, caching SchemaParser::get_special_regex_characters() with a const& reference provides no performance benefit, indicating the method likely returns a reference to an existing container rather than performing expensive computations.
Applied to files:
tests/test-schema.cpp
📚 Learning: 2025-08-26T10:13:00.368Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 152
File: tests/test-query.cpp:61-67
Timestamp: 2025-08-26T10:13:00.368Z
Learning: In Catch2 unit tests, the REQUIRE macro already provides detailed debugging output when container comparisons (like std::set equality) fail, showing both expected and actual values. Additional CAPTURE statements are typically unnecessary for such comparisons.
Applied to files:
tests/test-schema.cpptests/test-buffer-parser.cpptests/test-reader-parser.cpp
📚 Learning: 2024-11-13T22:38:19.472Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 48
File: src/log_surgeon/finite_automata/RegexAST.hpp:700-700
Timestamp: 2024-11-13T22:38:19.472Z
Learning: In `RegexASTCapture`, `m_tag` must always be non-null.
Applied to files:
tests/test-schema.cppsrc/log_surgeon/Lexer.hpp
📚 Learning: 2025-08-08T10:00:20.963Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 144
File: src/log_surgeon/wildcard_query_parser/VariableQueryToken.hpp:28-31
Timestamp: 2025-08-08T10:00:20.963Z
Learning: In src/log_surgeon/wildcard_query_parser/QueryInterpretation.hpp (C++), do not default the comparison operators: the class stores std::vector<std::variant<StaticQueryToken, VariableQueryToken>>, which yields only a weak ordering. A custom operator<=> that maps the variant’s weak ordering to std::strong_ordering is required; avoid suggesting defaulting there in future reviews.
Applied to files:
tests/test-buffer-parser.cpp
📚 Learning: 2025-08-08T10:17:43.495Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 144
File: src/log_surgeon/wildcard_query_parser/VariableQueryToken.hpp:28-31
Timestamp: 2025-08-08T10:17:43.495Z
Learning: In src/log_surgeon/wildcard_query_parser/VariableQueryToken.hpp/.cpp (C++), do not suggest defaulting operator<=>. The project prefers a custom out-of-line comparator that explicitly handles the bool member (via explicit cast) to avoid implicit conversions; keep the current manual implementation.
Applied to files:
tests/test-buffer-parser.cpp
📚 Learning: 2025-05-05T14:55:34.455Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 106
File: src/log_surgeon/Lalr1Parser.tpp:661-665
Timestamp: 2025-05-05T14:55:34.455Z
Learning: The log-surgeon codebase follows a design approach where function contracts (like `ErrorCode::Success` guaranteeing a valid token) are trusted, and contract violations are allowed to throw exceptions rather than being explicitly checked at every call site.
Applied to files:
tests/test-buffer-parser.cpptests/test-reader-parser.cpp
📚 Learning: 2025-05-01T14:47:57.016Z
Learnt from: davidlion
Repo: y-scope/log-surgeon PR: 106
File: src/log_surgeon/Lexer.hpp:114-114
Timestamp: 2025-05-01T14:47:57.016Z
Learning: When handling error cases in log-surgeon, prefer using the `Result<T, ErrorCode>` type from ystdlib-cpp (https://github.com/y-scope/ystdlib-cpp/blob/main/src/ystdlib/error_handling/Result.hpp) instead of `std::pair<ErrorCode, T>` for better type safety and clearer semantics.
Applied to files:
tests/test-buffer-parser.cpptests/test-reader-parser.cpp
📚 Learning: 2025-11-06T11:16:52.917Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 184
File: tests/test-reader-parser.cpp:150-154
Timestamp: 2025-11-06T11:16:52.917Z
Learning: In Catch2 test files for the y-scope/log-surgeon repository, an explicit `if (false == optional.has_value()) { return; }` check may be required after `REQUIRE(optional.has_value())` to prevent clang-tidy errors, even though the code is unreachable at runtime. clang-tidy's static analyzer doesn't recognize that REQUIRE aborts execution and may flag unsafe optional access without the explicit check.
Applied to files:
tests/test-buffer-parser.cpptests/test-reader-parser.cpp
📚 Learning: 2024-11-13T22:25:54.168Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 48
File: tests/test-tag.cpp:10-10
Timestamp: 2024-11-13T22:25:54.168Z
Learning: In the log-surgeon codebase (C++), particularly in the finite automata components involving the `Tag` class (`src/log_surgeon/finite_automata/Tag.hpp`), it's important to ensure that `Tag*` pointers in other objects cannot be `nullptr`. Test cases should focus on validating that these `Tag*` pointers are not null where they are used, and handle `nullptr` appropriately.
Applied to files:
src/log_surgeon/Lexer.hpptests/test-reader-parser.cpp
🧬 Code graph analysis (7)
src/log_surgeon/finite_automata/Nfa.hpp (3)
src/log_surgeon/Buffer.hpp (2)
curr_pos(43-43)curr_pos(43-43)src/log_surgeon/finite_automata/NfaState.hpp (6)
nodiscard(108-108)nodiscard(110-110)nodiscard(112-114)nodiscard(116-119)nodiscard(121-123)nodiscard(125-125)src/log_surgeon/LexicalRule.hpp (3)
nodiscard(28-30)nodiscard(32-32)nodiscard(34-37)
tests/test-schema.cpp (1)
src/log_surgeon/Schema.hpp (1)
var_schema(40-40)
src/log_surgeon/LogEvent.cpp (2)
src/log_surgeon/Lexer.hpp (5)
rule_id(65-66)rule_id(73-74)rule_id(175-176)capture(188-189)capture(217-218)src/log_surgeon/finite_automata/Nfa.hpp (3)
captures(95-99)capture(111-115)capture(152-153)
src/log_surgeon/finite_automata/Capture.hpp (2)
src/log_surgeon/Lexer.hpp (8)
nodiscard(153-153)nodiscard(155-157)nodiscard(159-161)nodiscard(163-166)nodiscard(175-181)nodiscard(188-194)nodiscard(202-209)nodiscard(217-236)src/log_surgeon/finite_automata/Nfa.hpp (4)
nodiscard(49-51)nodiscard(53-55)nodiscard(137-137)nodiscard(139-142)
tests/test-buffer-parser.cpp (2)
src/log_surgeon/Lexer.hpp (2)
capture(188-189)capture(217-218)tests/test-reader-parser.cpp (3)
parse_and_validate(66-70)parse_and_validate(78-178)parse_and_validate(78-82)
src/log_surgeon/Lexer.hpp (1)
src/log_surgeon/finite_automata/Nfa.hpp (6)
nodiscard(49-51)nodiscard(53-55)nodiscard(137-137)nodiscard(139-142)capture(111-115)capture(152-153)
tests/test-reader-parser.cpp (1)
src/log_surgeon/Lexer.hpp (2)
capture(188-189)capture(217-218)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: lint (macos-15)
- GitHub Check: build (macos-15, debug)
- GitHub Check: lint (ubuntu-24.04)
- GitHub Check: build (ubuntu-24.04, release)
- GitHub Check: build (macos-15, release)
- GitHub Check: build (ubuntu-22.04, debug)
🔇 Additional comments (7)
tests/test-schema.cpp (1)
3-3: Non‑unique capture names test covers the intended AST behaviourThe new
non_unique_capture_namestest, together with the added<vector>include andusing std::vector, cleanly exercises the scenario where several variables reuse the same capture name. It mirrors the existingadd_underscore_namepattern and verifies both the variable name and thatget_subtree_positive_captures()returns a single capture with the expected name.No issues from a correctness or style point of view.
Also applies to: 22-22, 187-216
tests/test-reader-parser.cpp (2)
146-173: Updated capture validation correctly follows the new Lexer APIThe capture‑validation block in
parse_and_validatehas been cleanly migrated:
lexer.get_captures_from_rule_id(token_type)replaces the old capture‑ID lookup and is guarded by bothREQUIRE(optional_captures.has_value())and an explicit early return to keep clang‑tidy happy.- Iteration is now over
Capture const*, with names obtained viacapture->get_name()and checked against theexpected_capturesmap.lexer.get_reg_ids_from_capture(capture)is used to fetch the start/end register IDs, again with both aREQUIREand a defensive early return before using.value().This preserves the previous semantics while tying the test to the new pointer‑based capture bookkeeping.
204-207: Delimiter schema updates align tests with new delimiter parsingThe schema examples and constants have been updated from escaped
\[to raw[in:
- The documentation block’s
delimiters: \n\r[:,example.cDelimitersSchemain bothsingle_line_without_capture_reader_parserandreader_parser_wrap_around.This matches the stated change that special characters are no longer escaped in the delimiter string and keeps tests in sync with the parser behaviour.
Also applies to: 230-231, 274-275
src/log_surgeon/Lexer.hpp (1)
168-181: Capture pointer APIs and backing maps are consistent and usableThe new capture‑oriented APIs on
Lexer:
get_captures_from_rule_id(rule_id_t)→optional<vector<Capture const*>>get_tag_id_pair_from_capture(Capture const*)→optional<pair<tag_id_t, tag_id_t>>get_reg_ids_from_capture(Capture const*)→optional<pair<reg_id_t, reg_id_t>>together with the private maps
m_rule_id_to_capturem_capture_to_tag_id_pairprovide a coherent pointer‑based view over captures, tags, and registers. The implementations correctly:
- Distinguish “no captures for this rule” via
std::nulloptinstead of an empty vector.- Use
Capture const*as the unique key for both tag ID and reg ID derivation, matching the new NFA mapping.- Respect the existing
false == ...style for optionals.This design should make it straightforward for consumers (e.g.,
LogEvent, tests) to work with multiple captures sharing the same name while still getting deterministic start/end register IDs.Also applies to: 188-236, 268-271
tests/test-buffer-parser.cpp (3)
25-25: LGTM: Type refactoring aligns with ordered capture semantics.The switch from
std::map<string, CapturePositions>tovector<pair<string, CapturePositions>>forExpectedToken.m_capturescorrectly reflects the requirement to maintain canonical capture ordering as described in the PR objectives.Also applies to: 33-34, 38-40, 44-46, 61-62, 73-74
127-129: LGTM: Defensive check for clang-tidy.The explicit
has_value()check and early return afterREQUIREfollows the established pattern for clang-tidy compliance in this codebase.Based on learnings.
1111-1144: LGTM: Well-structured test for non-unique capture names.The new test case effectively validates the core feature of this PR. The test data correctly represents multiple captures with the same name "capture" across two variable rules, and the expected positions are properly specified as ordered pairs.
Reference
Addresses #180.
Description
The maps to go from
rule -> captures -> tags -> registerspreviously relied on the capture names as unique keys. To support non-unique capture names an alternative key needs to be used:Consideration needs to be made toward ensuring a canonical interpretation of the regex pattern, specifically when considering the order of captures. Without a canonical interpretation, there will be no way to search a parsed log. Previously, this wasn't a concern as the names were unique identifiers. As pointer values themselves are not canonical, we must instead rely on the ordering in the
m_rule_id_to_capturemap. Specifically, we desire this order to match the order in which the capture groups appear in the literal regex pattern (left to right):m_rule_id_to_capturedepends on the AST matching the literal regex pattern.m_rule_id_to_capturealso depends on the NFA traversal of the AST, it must be depth first leftmost, such that it traverses the capture groups in the same order they appear in the literal regex pattern.Also some small fixes to unit-tests that were not caught in the previous PR after a merge:
\[will error,[is correct).Validation Preformed
New unit-test added with non-unique capture names.
Summary by CodeRabbit
Refactor
Breaking API Change
Tests
✏️ Tip: You can customize this high-level summary in your review settings.