-
Notifications
You must be signed in to change notification settings - Fork 10
feat: Allow for non-escaped hyphens outside of regex ranges (fixes #200). #201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Warning Rate limit exceeded@SharafMohamed has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 25 minutes and 2 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (1)
WalkthroughChanges update regex patterns in documentation and source code to replace escaped hyphens with unescaped equivalents in timestamp and numeric patterns. The SchemaParser is refactored to consolidate backslash-based literal cancellation handling through dynamic rule application rather than explicit productions, while retaining functional behaviour. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (19)
docs/schema.md(5 hunks)examples/common.cpp(1 hunks)examples/schema.txt(1 hunks)src/log_surgeon/Constants.hpp(2 hunks)src/log_surgeon/Lalr1Parser.tpp(1 hunks)src/log_surgeon/Lexer.hpp(2 hunks)src/log_surgeon/LogEvent.cpp(4 hunks)src/log_surgeon/LogEvent.hpp(1 hunks)src/log_surgeon/LogParser.cpp(7 hunks)src/log_surgeon/LogParserOutputBuffer.hpp(3 hunks)src/log_surgeon/Parser.tpp(1 hunks)src/log_surgeon/SchemaParser.cpp(1 hunks)src/log_surgeon/Token.cpp(2 hunks)src/log_surgeon/Token.hpp(2 hunks)tests/test-buffer-parser.cpp(24 hunks)tests/test-dfa.cpp(1 hunks)tests/test-nfa.cpp(1 hunks)tests/test-reader-parser.cpp(2 hunks)tests/test-schema.cpp(3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,h,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>rather than!<expression>.
Files:
src/log_surgeon/LogEvent.hppsrc/log_surgeon/Parser.tpptests/test-schema.cppsrc/log_surgeon/Constants.hppsrc/log_surgeon/SchemaParser.cppsrc/log_surgeon/Token.cppsrc/log_surgeon/Token.hppexamples/common.cppsrc/log_surgeon/Lalr1Parser.tppsrc/log_surgeon/LogEvent.cppsrc/log_surgeon/LogParser.cppsrc/log_surgeon/Lexer.hpptests/test-reader-parser.cppsrc/log_surgeon/LogParserOutputBuffer.hpptests/test-nfa.cpptests/test-buffer-parser.cpptests/test-dfa.cpp
🧠 Learnings (27)
📓 Common learnings
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 151
File: tests/test-buffer-parser.cpp:959-959
Timestamp: 2025-08-18T16:46:05.687Z
Learning: In the Log-Surgeon codebase, hyphens need to be escaped with backslashes in regular expressions even outside character classes, which differs from standard regex behavior where hyphens don't require escaping outside character classes.
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 199
File: src/log_surgeon/LogParser.cpp:195-195
Timestamp: 2025-12-03T16:22:52.313Z
Learning: In y-scope/log-surgeon, when reviewing code, do not suggest fixing style issues (such as C-style casts vs. static_cast) in lines that were not modified by the PR. The maintainer prefers to keep PRs focused on their intended changes and defer broader style refactoring to dedicated refactor PRs.
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 152
File: src/log_surgeon/wildcard_query_parser/Query.cpp:159-177
Timestamp: 2025-08-25T20:44:48.955Z
Learning: In the log-surgeon codebase, thread safety issues with global state like NonTerminal::m_next_children_start should be addressed comprehensively in dedicated PRs rather than fixed piecemeal in individual feature PRs. The user SharafMohamed prefers to defer such systemic architectural issues to separate PRs.
Learnt from: davidlion
Repo: y-scope/log-surgeon PR: 165
File: src/log_surgeon/LogEvent.cpp:53-55
Timestamp: 2025-10-22T15:40:29.992Z
Learning: In `src/log_surgeon/LogEvent.cpp`, the `get_logtype()` method has two independent mechanisms for adding `<timestamp>` to the logtype string: (1) A prefix added when `has_timestamp()` returns true, indicating a standalone timestamp token from timestamp rules (lines 53-55), and (2) Named capture group processing that adds `<timestamp>` tags for `(?<timestamp>...)` patterns in any variable rule (lines 61-95). These mechanisms do not interfere with each other. Capture groups named `timestamp` do not affect `has_timestamp()` status, as using captures as actual timestamps is not supported.
<!-- [add_learning]
In the log-surgeon codebase, capture groups (e.g., `(?<timestamp>...)`) in schema variable rules are added to the logtype string through the existing capture group processing logic, but they do not create actual timestamp tokens and do not cause `has_timestamp()` to return true. Using capture groups as actual timestamps is not supported; standalone timestamp tokens come from `timestamp:...
📚 Learning: 2025-10-22T15:40:29.992Z
Learnt from: davidlion
Repo: y-scope/log-surgeon PR: 165
File: src/log_surgeon/LogEvent.cpp:53-55
Timestamp: 2025-10-22T15:40:29.992Z
Learning: In `src/log_surgeon/LogEvent.cpp`, the `get_logtype()` method has two independent mechanisms for adding `<timestamp>` to the logtype string: (1) A prefix added when `has_timestamp()` returns true, indicating a standalone timestamp token from timestamp rules (lines 53-55), and (2) Named capture group processing that adds `<timestamp>` tags for `(?<timestamp>...)` patterns in any variable rule (lines 61-95). These mechanisms do not interfere with each other. Capture groups named `timestamp` do not affect `has_timestamp()` status, as using captures as actual timestamps is not supported.
<!-- [add_learning]
In the log-surgeon codebase, capture groups (e.g., `(?<timestamp>...)`) in schema variable rules are added to the logtype string through the existing capture group processing logic, but they do not create actual timestamp tokens and do not cause `has_timestamp()` to return true. Using capture groups as actual timestamps is not supported; standalone timestamp tokens come from `timestamp:...
Applied to files:
src/log_surgeon/LogEvent.hppexamples/schema.txtsrc/log_surgeon/Constants.hppdocs/schema.mdexamples/common.cppsrc/log_surgeon/LogEvent.cppsrc/log_surgeon/LogParser.cpptests/test-reader-parser.cppsrc/log_surgeon/LogParserOutputBuffer.hpptests/test-buffer-parser.cpp
📚 Learning: 2025-12-03T16:38:15.354Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 199
File: src/log_surgeon/LogParser.cpp:154-155
Timestamp: 2025-12-03T16:38:15.354Z
Learning: In `src/log_surgeon/LogParser.cpp` and related parsing code, when `get_reg_ids_from_capture(capture)` returns an optional, calling `.value()` directly without checking is intentional for captures that should always have register IDs (e.g., `"timestamp"` captures). If the optional is nullopt, it indicates an internal programming error/invariant violation, and throwing `std::bad_optional_access` is the desired behavior rather than defensive error handling with continue/return.
Applied to files:
src/log_surgeon/LogEvent.hppexamples/common.cppsrc/log_surgeon/LogEvent.cppsrc/log_surgeon/LogParser.cppsrc/log_surgeon/Lexer.hpptests/test-reader-parser.cppsrc/log_surgeon/LogParserOutputBuffer.hpptests/test-buffer-parser.cpp
📚 Learning: 2025-12-09T21:17:27.546Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 197
File: src/log_surgeon/finite_automata/RegexAST.hpp:1011-1019
Timestamp: 2025-12-09T21:17:27.546Z
Learning: In C++ files under src/log_surgeon in the y-scope/log-surgeon repository, for const pointer parameters in null checks, Yoda conditions (nullptr == ptr) are not required; standard comparison order (ptr == nullptr) is acceptable since const prevents accidental assignment. Apply this pattern to header and related source files within src/log_surgeon.
Applied to files:
src/log_surgeon/LogEvent.hppsrc/log_surgeon/Constants.hppsrc/log_surgeon/Token.hppsrc/log_surgeon/Lexer.hppsrc/log_surgeon/LogParserOutputBuffer.hpp
📚 Learning: 2024-11-18T16:45:46.073Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 50
File: src/log_surgeon/finite_automata/Tag.hpp:0-0
Timestamp: 2024-11-18T16:45:46.073Z
Learning: The class `TagPositions` was removed from `src/log_surgeon/finite_automata/Tag.hpp` as it is no longer needed.
Applied to files:
src/log_surgeon/Parser.tppsrc/log_surgeon/Constants.hppsrc/log_surgeon/Token.cppsrc/log_surgeon/Token.hppsrc/log_surgeon/Lalr1Parser.tppsrc/log_surgeon/Lexer.hpp
📚 Learning: 2025-08-08T10:23:06.281Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 144
File: src/log_surgeon/wildcard_query_parser/QueryInterpretation.hpp:80-88
Timestamp: 2025-08-08T10:23:06.281Z
Learning: In y-scope/log-surgeon (C++), small function definitions are allowed to remain inline in headers. For src/log_surgeon/wildcard_query_parser/QueryInterpretation.hpp, do not suggest moving small methods like QueryInterpretation::append_variable_token to the .cpp for “consistency” in future reviews.
Applied to files:
src/log_surgeon/Parser.tpp
📚 Learning: 2025-09-03T16:45:58.451Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 152
File: src/log_surgeon/wildcard_query_parser/Query.hpp:119-132
Timestamp: 2025-09-03T16:45:58.451Z
Learning: In log-surgeon's wildcard_query_parser, the DFA intersection process used in Query::get_matching_variable_types loses track of variable type priorities from the original ByteLexer. Priority information cannot be preserved by simply changing the return type from std::set to std::vector - it would require post-processing after the DFA intersection.
Applied to files:
tests/test-schema.cpptests/test-dfa.cpp
📚 Learning: 2025-08-13T12:06:36.584Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 150
File: src/log_surgeon/wildcard_query_parser/WildcardExpressionView.cpp:58-77
Timestamp: 2025-08-13T12:06:36.584Z
Learning: In src/log_surgeon/wildcard_query_parser/WildcardExpressionView.cpp, caching SchemaParser::get_special_regex_characters() with a const& reference provides no performance benefit, indicating the method likely returns a reference to an existing container rather than performing expensive computations.
Applied to files:
tests/test-schema.cppsrc/log_surgeon/SchemaParser.cpp
📚 Learning: 2025-08-13T12:05:00.245Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 150
File: src/log_surgeon/wildcard_query_parser/WildcardExpressionView.cpp:58-77
Timestamp: 2025-08-13T12:05:00.245Z
Learning: In src/log_surgeon/wildcard_query_parser/WildcardExpressionView.cpp, the generate_regex_string() method should not enforce well-formedness checks via assertions. The is_well_formed() method is intended to be used by callers at their discretion, allowing flexibility to generate regex strings from malformed views if desired.
Applied to files:
tests/test-schema.cpp
📚 Learning: 2024-11-13T20:02:13.737Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
Applied to files:
tests/test-schema.cppsrc/log_surgeon/Constants.hppsrc/log_surgeon/Lalr1Parser.tppsrc/log_surgeon/Lexer.hpptests/test-nfa.cpptests/test-dfa.cpp
📚 Learning: 2025-08-08T10:17:43.495Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 144
File: src/log_surgeon/wildcard_query_parser/VariableQueryToken.hpp:28-31
Timestamp: 2025-08-08T10:17:43.495Z
Learning: In src/log_surgeon/wildcard_query_parser/VariableQueryToken.hpp/.cpp (C++), do not suggest defaulting operator<=>. The project prefers a custom out-of-line comparator that explicitly handles the bool member (via explicit cast) to avoid implicit conversions; keep the current manual implementation.
Applied to files:
tests/test-schema.cpp
📚 Learning: 2025-08-08T10:00:20.963Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 144
File: src/log_surgeon/wildcard_query_parser/VariableQueryToken.hpp:28-31
Timestamp: 2025-08-08T10:00:20.963Z
Learning: In src/log_surgeon/wildcard_query_parser/QueryInterpretation.hpp (C++), do not default the comparison operators: the class stores std::vector<std::variant<StaticQueryToken, VariableQueryToken>>, which yields only a weak ordering. A custom operator<=> that maps the variant’s weak ordering to std::strong_ordering is required; avoid suggesting defaulting there in future reviews.
Applied to files:
tests/test-schema.cpptests/test-reader-parser.cpp
📚 Learning: 2025-08-26T10:13:00.368Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 152
File: tests/test-query.cpp:61-67
Timestamp: 2025-08-26T10:13:00.368Z
Learning: In Catch2 unit tests, the REQUIRE macro already provides detailed debugging output when container comparisons (like std::set equality) fail, showing both expected and actual values. Additional CAPTURE statements are typically unnecessary for such comparisons.
Applied to files:
tests/test-schema.cpptests/test-reader-parser.cpptests/test-buffer-parser.cpp
📚 Learning: 2025-08-18T16:46:05.687Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 151
File: tests/test-buffer-parser.cpp:959-959
Timestamp: 2025-08-18T16:46:05.687Z
Learning: In the Log-Surgeon codebase, hyphens need to be escaped with backslashes in regular expressions even outside character classes, which differs from standard regex behavior where hyphens don't require escaping outside character classes.
Applied to files:
tests/test-schema.cppsrc/log_surgeon/SchemaParser.cpp
📚 Learning: 2025-08-26T10:06:22.914Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 152
File: src/log_surgeon/wildcard_query_parser/Query.hpp:9-11
Timestamp: 2025-08-26T10:06:22.914Z
Learning: In y-scope/log-surgeon project, it's acceptable to include headers like log_surgeon/Lexer.hpp directly in header files rather than using forward declarations, even when only references are used in the interface. The project prefers the simplicity of direct includes over header coupling optimization through forward declarations.
Applied to files:
src/log_surgeon/Constants.hppsrc/log_surgeon/Lalr1Parser.tppsrc/log_surgeon/Lexer.hppsrc/log_surgeon/LogParserOutputBuffer.hpp
📚 Learning: 2025-08-18T12:00:40.545Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 151
File: tests/test-buffer-parser.cpp:867-870
Timestamp: 2025-08-18T12:00:40.545Z
Learning: For schema documentation in code comments, multi-line rules should be supported and are expected since complex regex patterns may not fit comfortably on a single line within linting constraints. The "..." notation or backslash continuation can be used to indicate pattern continuation in documentation blocks.
Applied to files:
docs/schema.mdsrc/log_surgeon/LogParser.cpp
📚 Learning: 2025-08-15T00:13:46.717Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 150
File: tests/test-wildcard-expression.cpp:0-0
Timestamp: 2025-08-15T00:13:46.717Z
Learning: In src/log_surgeon/wildcard_query_parser/WildcardExpression.cpp, the escaping logic treats consecutive backslashes as escape-literal pairs, not as individual escape characters. For input like R"(a\*b\?c\\)", position 7 is the escape character and position 8 is the literal backslash being escaped, so only position 7 should have is_escape() return true.
Applied to files:
src/log_surgeon/SchemaParser.cpp
📚 Learning: 2025-08-26T09:31:28.567Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 152
File: src/log_surgeon/wildcard_query_parser/ExpressionView.hpp:46-70
Timestamp: 2025-08-26T09:31:28.567Z
Learning: In src/log_surgeon/wildcard_query_parser, escaped wildcards (\* or \?) lose their wildcard nature entirely and become literal characters, while escaped delimiters retain their delimiter property in the delimiter table regardless of being escaped. This means boundary detection logic correctly treats escaped wildcards as non-wildcards but escaped delimiters as still being delimiters.
Applied to files:
src/log_surgeon/SchemaParser.cpp
📚 Learning: 2025-08-15T01:04:03.681Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 150
File: src/log_surgeon/wildcard_query_parser/ExpressionView.cpp:11-11
Timestamp: 2025-08-15T01:04:03.681Z
Learning: In y-scope/log-surgeon C++ codebase, prefer explicit includes in implementation files (.cpp) for types that are directly used, even when those headers are already transitively included through the corresponding header file. This makes dependencies explicit and improves maintainability.
Applied to files:
src/log_surgeon/LogEvent.cppsrc/log_surgeon/LogParserOutputBuffer.hpp
📚 Learning: 2024-10-24T15:54:19.228Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 42
File: src/log_surgeon/finite_automata/RegexNFA.hpp:99-105
Timestamp: 2024-10-24T15:54:19.228Z
Learning: In `src/log_surgeon/finite_automata/RegexNFA.hpp`, it's acceptable to have constructors without the `explicit` specifier. Do not suggest adding `explicit` to constructors in this file.
Applied to files:
src/log_surgeon/LogParser.cppsrc/log_surgeon/Lexer.hpptests/test-nfa.cpp
📚 Learning: 2024-10-11T16:16:02.866Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 38
File: src/log_surgeon/finite_automata/RegexAST.hpp:663-669
Timestamp: 2024-10-11T16:16:02.866Z
Learning: In `RegexASTLiteral::serialize()`, to properly handle Unicode characters beyond the ASCII range, cast `m_character` to `char32_t` and use `U"{}{}"` in `fmt::format`.
Applied to files:
src/log_surgeon/LogParser.cpp
📚 Learning: 2024-10-24T15:54:35.193Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 42
File: src/log_surgeon/finite_automata/RegexNFA.hpp:442-456
Timestamp: 2024-10-24T15:54:35.193Z
Learning: In the C++ file `src/log_surgeon/finite_automata/RegexNFA.hpp`, for the `RegexNFA::serialize()` function, prioritize code clarity over efficiency when handling string operations.
Applied to files:
src/log_surgeon/LogParser.cpptests/test-nfa.cpp
📚 Learning: 2024-11-27T22:25:35.608Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 56
File: src/log_surgeon/finite_automata/RegisterHandler.hpp:0-0
Timestamp: 2024-11-27T22:25:35.608Z
Learning: In the `RegisterHandler` class in `src/log_surgeon/finite_automata/RegisterHandler.hpp`, the methods `add_register` and `append_position` rely on `emplace_back` and `m_prefix_tree.insert` to handle exceptions correctly and maintain consistent state without requiring additional exception handling.
Applied to files:
src/log_surgeon/Lexer.hpp
📚 Learning: 2024-11-27T21:56:13.425Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 56
File: src/log_surgeon/finite_automata/RegisterHandler.hpp:0-0
Timestamp: 2024-11-27T21:56:13.425Z
Learning: In the `log_surgeon` project, header guards in C++ header files should include `_HPP` at the end to match the filename. For example, in `RegisterHandler.hpp`, the header guard should be `LOG_SURGEON_FINITE_AUTOMATA_REGISTER_HANDLER_HPP`.
Applied to files:
src/log_surgeon/Lexer.hppsrc/log_surgeon/LogParserOutputBuffer.hpp
📚 Learning: 2024-11-13T22:25:54.168Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 48
File: tests/test-tag.cpp:10-10
Timestamp: 2024-11-13T22:25:54.168Z
Learning: In the log-surgeon codebase (C++), particularly in the finite automata components involving the `Tag` class (`src/log_surgeon/finite_automata/Tag.hpp`), it's important to ensure that `Tag*` pointers in other objects cannot be `nullptr`. Test cases should focus on validating that these `Tag*` pointers are not null where they are used, and handle `nullptr` appropriately.
Applied to files:
tests/test-reader-parser.cpp
📚 Learning: 2025-11-06T11:16:52.917Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 184
File: tests/test-reader-parser.cpp:150-154
Timestamp: 2025-11-06T11:16:52.917Z
Learning: In Catch2 test files for the y-scope/log-surgeon repository, an explicit `if (false == optional.has_value()) { return; }` check may be required after `REQUIRE(optional.has_value())` to prevent clang-tidy errors, even though the code is unreachable at runtime. clang-tidy's static analyzer doesn't recognize that REQUIRE aborts execution and may flag unsafe optional access without the explicit check.
Applied to files:
tests/test-reader-parser.cpptests/test-buffer-parser.cpp
📚 Learning: 2025-08-15T12:07:58.626Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 150
File: tests/test-expression-view.cpp:7-7
Timestamp: 2025-08-15T12:07:58.626Z
Learning: In tests/test-expression-view.cpp, the `<catch2/catch_message.hpp>` header is required for clang-tidy to pass, even though it may not be directly used in the visible code.
Applied to files:
tests/test-reader-parser.cpp
🧬 Code graph analysis (8)
src/log_surgeon/LogEvent.hpp (2)
src/log_surgeon/LogEvent.cpp (2)
nodiscard(33-35)nodiscard(37-48)src/log_surgeon/LogParserOutputBuffer.hpp (8)
nodiscard(37-37)nodiscard(43-45)nodiscard(49-49)nodiscard(53-55)nodiscard(57-59)nodiscard(63-63)nodiscard(67-67)nodiscard(69-69)
tests/test-schema.cpp (1)
src/log_surgeon/LogParser.hpp (1)
schema_ast(158-158)
src/log_surgeon/Token.cpp (1)
src/log_surgeon/Token.hpp (1)
start_pos(63-66)
tests/test-reader-parser.cpp (1)
src/log_surgeon/Lexer.hpp (2)
capture(189-190)capture(218-219)
src/log_surgeon/LogParserOutputBuffer.hpp (2)
src/log_surgeon/LogEvent.cpp (2)
nodiscard(33-35)nodiscard(37-48)src/log_surgeon/LogEvent.hpp (4)
nodiscard(53-55)nodiscard(60-60)nodiscard(66-69)nodiscard(85-85)
tests/test-nfa.cpp (1)
src/log_surgeon/Schema.hpp (1)
var_schema(40-40)
tests/test-buffer-parser.cpp (2)
src/log_surgeon/Lexer.hpp (2)
capture(189-190)capture(218-219)tests/test-reader-parser.cpp (3)
parse_and_validate(66-70)parse_and_validate(78-173)parse_and_validate(78-82)
tests/test-dfa.cpp (1)
src/log_surgeon/Schema.hpp (1)
var_schema(40-40)
🪛 Cppcheck (2.18.0)
src/log_surgeon/Parser.tpp
[information] 18-18: Include file
(missingIncludeSystem)
[information] 19-19: Include file
(missingIncludeSystem)
[information] 20-20: Include file
(missingIncludeSystem)
[information] 21-21: Include file
(missingIncludeSystem)
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
tests/test-schema.cpp
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
[style] 220-220: The function 'scan_with_wildcard' is never used.
(unusedFunction)
src/log_surgeon/SchemaParser.cpp
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
src/log_surgeon/Token.cpp
[information] 4-4: Include file
(missingIncludeSystem)
[information] 5-5: Include file
(missingIncludeSystem)
[information] 4-4: Include file
(missingIncludeSystem)
[information] 5-5: Include file
(missingIncludeSystem)
[information] 4-4: Include file
(missingIncludeSystem)
[information] 5-5: Include file
(missingIncludeSystem)
[information] 4-4: Include file
(missingIncludeSystem)
[information] 4-4: Include file
(missingIncludeSystem)
[information] 5-5: Include file
(missingIncludeSystem)
[style] 39-39: The function 'get_sub_token' is never used.
(unusedFunction)
[style] 46-46: The function 'set' is never used.
(unusedFunction)
[style] 48-48: The function 'get_num_regs' is never used.
(unusedFunction)
examples/common.cpp
[style] 35-35: The function 'set_has_header' is never used.
(unusedFunction)
[style] 37-37: The function 'has_header' is never used.
(unusedFunction)
[style] 39-39: The function 'set_timestamp' is never used.
(unusedFunction)
[style] 38-38: The function 'append_position' is never used.
(unusedFunction)
src/log_surgeon/Lalr1Parser.tpp
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
src/log_surgeon/LogEvent.cpp
[information] 5-5: Include file
(missingIncludeSystem)
[information] 5-5: Include file
(missingIncludeSystem)
[information] 5-5: Include file
(missingIncludeSystem)
[information] 5-5: Include file
(missingIncludeSystem)
[information] 5-5: Include file
(missingIncludeSystem)
[information] 5-5: Include file
(missingIncludeSystem)
[information] 5-5: Include file
(missingIncludeSystem)
[information] 5-5: Include file
(missingIncludeSystem)
[information] 5-5: Include file
(missingIncludeSystem)
[information] 5-5: Include file
(missingIncludeSystem)
[information] 5-5: Include file
(missingIncludeSystem)
[information] 5-5: Include file
(missingIncludeSystem)
[information] 5-5: Include file
(missingIncludeSystem)
[information] 5-5: Include file
(missingIncludeSystem)
[information] 5-5: Include file
(missingIncludeSystem)
[information] 5-5: Include file
(missingIncludeSystem)
[information] 5-5: Include file
(missingIncludeSystem)
[information] 5-5: Include file
(missingIncludeSystem)
[information] 5-5: Include file
(missingIncludeSystem)
[information] 5-5: Include file
(missingIncludeSystem)
[information] 5-5: Include file
(missingIncludeSystem)
[information] 5-5: Include file
(missingIncludeSystem)
[information] 5-5: Include file
(missingIncludeSystem)
[information] 5-5: Include file
(missingIncludeSystem)
[information] 5-5: Include file
(missingIncludeSystem)
[information] 5-5: Include file
(missingIncludeSystem)
[information] 5-5: Include file
(missingIncludeSystem)
[information] 5-5: Include file
(missingIncludeSystem)
[information] 5-5: Include file
(missingIncludeSystem)
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
[performance] 35-35: Searching before insertion is not necessary. Instead of 'm_lexer.m_symbol_id[name]=m_lexer.m_symbol_id.size()' consider using 'm_lexer.m_symbol_id.try_emplace(name, m_lexer.m_symbol_id.size());'.
(stlFindInsert)
[style] 37-37: The function 'is_multi_valued' is never used.
(unusedFunction)
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
[style] 53-53: The function 'static_size' is never used.
(unusedFunction)
[style] 64-64: The function 'set_active_buffer' is never used.
(unusedFunction)
[style] 73-73: The function 'get_mutable_active_buffer' is never used.
(unusedFunction)
[style] 76-76: The function 'add_token_group' is never used.
(unusedFunction)
[style] 84-84: The function 'add_token_chain' is never used.
(unusedFunction)
[style] 53-53: The function 'get_variables' is never used.
(unusedFunction)
[style] 63-63: The function 'get_curr_token' is never used.
(unusedFunction)
[style] 68-68: The function 'add_delimiter' is never used.
(unusedFunction)
[style] 69-69: The function 'get_first_char' is never used.
(unusedFunction)
[style] 67-67: The function 'is_root' is never used.
(unusedFunction)
[style] 74-74: The function 'get_parent_id_unsafe' is never used.
(unusedFunction)
[style] 81-81: The function 'get_position' is never used.
(unusedFunction)
[style] 73-73: The function 'token_cast' is never used.
(unusedFunction)
[style] 81-81: The function 'non_terminal_cast' is never used.
(unusedFunction)
[style] 85-85: The function 'get_parser_ast' is never used.
(unusedFunction)
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
[style] 110-110: The function 'add_production' is never used.
(unusedFunction)
src/log_surgeon/LogParser.cpp
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
[style] 73-73: The function 'get_mutable_active_buffer' is never used.
(unusedFunction)
[style] 76-76: The function 'add_token_group' is never used.
(unusedFunction)
[style] 84-84: The function 'add_token_chain' is never used.
(unusedFunction)
[style] 68-68: The function 'add_delimiter' is never used.
(unusedFunction)
[style] 67-67: The function 'is_root' is never used.
(unusedFunction)
[style] 74-74: The function 'get_parent_id_unsafe' is never used.
(unusedFunction)
[style] 81-81: The function 'get_position' is never used.
(unusedFunction)
[style] 73-73: The function 'token_cast' is never used.
(unusedFunction)
[style] 81-81: The function 'non_terminal_cast' is never used.
(unusedFunction)
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
[style] 120-120: The function 'get_log_event_view' is never used.
(unusedFunction)
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
[style] 156-156: The function 'get_delim_table' is never used.
(unusedFunction)
[style] 164-164: The function 'get_dfa' is never used.
(unusedFunction)
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
tests/test-reader-parser.cpp
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
[style] 119-119: The function 'set_type_ids' is never used.
(unusedFunction)
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
tests/test-nfa.cpp
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
tests/test-buffer-parser.cpp
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
[style] 94-94: The function 'get_input_pos' is never used.
(unusedFunction)
[style] 97-97: The function 'release_parser_ast' is never used.
(unusedFunction)
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
[style] 428-428: The function 'negate' is never used.
(unusedFunction)
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
tests/test-dfa.cpp
[information] Limiting analysis of branches. Use --check-level=exhaustive to analyze all branches.
(normalCheckLevelMaxBranches)
🪛 LanguageTool
examples/schema.txt
[uncategorized] ~47-~47: Possible missing comma found.
Context: ...{2} \d{2} \d{2}:\d{2}:\d{2}) // E.g. 01-21 11:56:42.392 header:(?\d{2}...
(AI_HYDRA_LEO_MISSING_COMMA)
docs/schema.md
[uncategorized] ~91-~91: Loose punctuation mark.
Context: ...pture within a header rule. * $PREFIX$, $SUFFIX$, and $TIMESTAMP-PATTERN$ a...
(UNLIKELY_OPENING_PUNCTUATION)
🔇 Additional comments (44)
src/log_surgeon/Constants.hpp (1)
38-38: LGTM! Token vocabulary updated to header-based terminology.The replacement of timestamp-related tokens with
TokenHeaderandcTokenHeaderis consistent with the broader architectural shift to header-based log event detection.Also applies to: 47-47
src/log_surgeon/Parser.tpp (1)
17-17: LGTM! Symbol mappings correctly updated.The bidirectional mappings for
TokenHeaderare correctly established, consistent with the pattern used for other token types.Also applies to: 25-25
src/log_surgeon/Lalr1Parser.tpp (1)
62-62: LGTM! Terminal symbol set updated correctly.The addition of
TokenHeaderto the terminals set aligns with the vocabulary changes in Constants.hpp and Parser.tpp.src/log_surgeon/SchemaParser.cpp (2)
621-621: LGTM! Core fix for non-escaped hyphens outside character ranges.This production allows hyphens to be treated as literals when they appear outside character classes (outside
[...]). Inside character classes, hyphens are still handled by theLiteralRangeproduction at line 641. This change aligns log-surgeon's behavior with standard regex semantics.Based on learnings, the previous behavior required escaping hyphens everywhere, which differed from standard regex behavior.
666-669: LGTM! Refactored to consolidate escape literal productions.The loop-based approach dynamically generates escape literal productions for all special regex characters (including Dash), ensuring backward compatibility—users can still escape hyphens if desired.
tests/test-dfa.cpp (1)
192-192: LGTM! Test updated to demonstrate non-escaped hyphen syntax.The removal of the backslash escape before the hyphen demonstrates the new behavior. The hyphen outside the character class no longer requires escaping, aligning with standard regex conventions. The expected DFA serialization remains unchanged, confirming the behavioral equivalence.
tests/test-nfa.cpp (1)
192-192: LGTM! Test updated consistently with DFA test.The removal of the backslash escape demonstrates the new hyphen syntax, consistent with the corresponding change in test-dfa.cpp.
examples/schema.txt (2)
1-1: LGTM! Schema updated to header-based timestamp capture.The migration from
timestamp:rules toheader:(?<timestamp>...)patterns is consistent with the architectural shift to header-based log event detection. The(?<timestamp>...)named capture provides the timestamp value for each event.Also applies to: 3-3, 5-5, 7-7, 9-9, 12-12, 14-14, 16-16, 20-20, 22-22, 24-24, 26-26, 28-28, 30-30, 32-32, 34-34, 38-38, 40-40, 42-42, 44-44, 46-46, 48-48, 50-50
59-59: LGTM! Demonstrates non-escaped hyphen syntax.The removal of the backslash escape before the hyphen in the integer pattern demonstrates the new behavior, consistent with the test updates.
src/log_surgeon/LogEvent.hpp (1)
72-74: LGTM! API updated to return timestamp as optional string.The change from
Token*tostd::optional<std::string> const&aligns with the header-based timestamp representation. The implementation correctly delegates tom_log_output_buffer->get_timestamp()(as shown in the relevant code snippets).Note: This is a breaking API change for existing users of
get_timestamp().src/log_surgeon/Token.hpp (1)
57-71: LGTM!The new
get_sub_token()andget_first_char()methods are well-documented and correctly implemented. The exception documentation forget_sub_tokenclearly states thestd::out_of_rangebehavior.examples/common.cpp (1)
35-41: LGTM!The code correctly adapts to the new optional-based timestamp API. The
has_value()check and subsequentvalue()access follow proper optional usage patterns.src/log_surgeon/Token.cpp (1)
39-59: LGTM!The
get_sub_token()implementation correctly validates bounds:
- Checks for negative positions (appropriate since
position_tis signed).- Validates against buffer size with correct semantics (
end_pos > m_buffer.size()allows one-past-the-end).- Returns a new Token with the sub-span while preserving relevant metadata.
tests/test-schema.cpp (3)
148-149: LGTM!These changes correctly demonstrate the PR's objective: hyphens outside character ranges no longer require escaping. Based on learnings, this was previously required but is now optional.
198-199: LGTM!Good improvement to test structure: moving
Schema schemainside the loop ensures clean state per iteration, andCAPTURE(var_name)improves diagnostic output if a test fails.
220-251: Good test coverage for the hyphen feature.This test case thoroughly validates the new non-escaped hyphen behaviour:
ID-(?<id>\d{4})— hyphen outside character range (non-escaped)\d{4}-\d{4}-\d{4}— multiple hyphens outside ranges (non-escaped)[a\-z]— hyphen inside character class (still requires escaping)The AST type verification (
RegexASTCatBytevsRegexASTGroupByte) appropriately confirms correct parsing.tests/test-reader-parser.cpp (3)
116-120: LGTM!The timestamp validation correctly uses the new optional-based API. The
false == event.get_timestamp().has_value()pattern follows the coding guidelines.
122-125: LGTM!The header presence check correctly uses the new
has_header()method, and thefalse == ...has_header()pattern follows the coding guidelines.
158-158: LGTM!The simplified destructuring reflects the updated
get_reg_ids_from_capture()API that now returnsstd::pair<reg_id_t, reg_id_t>directly instead of an optional.src/log_surgeon/Lexer.hpp (2)
9-9: LGTM! Correct include for the new exception usage.The addition of
<stdexcept>is appropriate for thestd::runtime_errorexceptions thrown inget_reg_ids_from_capture.
216-237: LGTM! Improved error handling with explicit exceptions.The change from returning
std::optionalto throwingstd::runtime_errorwith descriptive messages aligns well with the codebase's intent that missing register IDs for captures indicates an internal programming error. Based on learnings, this is the desired behaviour rather than defensive error handling. The error messages clearly identify the capture name and the specific missing element (tag ids, start reg id, or end reg id).src/log_surgeon/LogEvent.cpp (5)
5-5: LGTM! Explicit include for the new return type.The addition of
<optional>is appropriate sinceget_timestamp()now returnsstd::optional<std::string> const&.
33-35: LGTM! Return type change aligns with the new header-based architecture.The return type change from
Token*tostd::optional<std::string> const&provides a safer, value-based API that avoids raw pointer semantics.
40-67: LGTM! Consistent migration to header-based logic.The changes correctly replace timestamp-based checks with
has_header()checks for determining buffer boundaries and first-token calculations. The logic is consistent with the header-based parsing model.
72-85: LGTM! Simplified capture register ID retrieval.The code now uses structured bindings to directly destructure the pair returned by
get_reg_ids_from_capture, which is cleaner since the function now throws on error rather than returning an optional.
107-115: LGTM! Constructor updated for header-based semantics.The
LogEventcopy constructor correctly propagates the header flag and timestamp using the new setter methods.docs/schema.md (3)
19-22: LGTM! Clear documentation of the header-based parsing model.The introduction of headers as log event delimiters with optional timestamp capture is well explained.
76-106: LGTM! Comprehensive Headers section.The Headers section clearly documents:
- Multiple header support
- Optional timestamp captures
- Behaviour when timestamps are parsed or omitted
- Reserved naming for timestamp captures
- Event boundary detection logic
115-119: Hyphen examples in schema documentation are actually escaped, not unescaped.The examples shown in lines 115-119 use
\-{0,1}(escaped hyphens), not-{0,1}as claimed in the review. Theintandfloatrules both properly escape hyphens:int:\-{0,1}\d+andfloat:\-{0,1}\d+\.\d+. This aligns with Log-Surgeon's regex requirements, which mandate escaped hyphens even outside character classes. The documentation examples are correct as-is.Likely an incorrect or invalid review comment.
src/log_surgeon/LogParserOutputBuffer.hpp (3)
4-5: LGTM! Include for optional support.The addition of
<optional>is necessary for the newm_timestampmember.
28-45: LGTM! Clean API for header and timestamp management.The new accessors provide a clear interface for managing the header flag and timestamp value. The
set_timestampacceptingstd::optional<std::string_view>and storing asstd::optional<std::string>allows for efficient setting while maintaining ownership.
72-73: LGTM! Private members for header-based parsing.The replacement of
m_has_timestampwithm_has_headerandm_timestampprovides more flexibility in the header-based parsing model where a header may or may not contain a timestamp.tests/test-buffer-parser.cpp (5)
92-101: LGTM! Updated timestamp and header validation.The test validation correctly uses the new optional-based timestamp API and
has_header()for event offset calculation.
136-136: LGTM! Simplified capture register retrieval.Using structured bindings to directly destructure the pair is cleaner now that the function throws on error.
426-431: LGTM! Schema updated for header-based parsing with non-escaped hyphens.The schema correctly uses
header:with timestamp capture and non-escaped hyphens inint:andfloat:patterns, aligning with the PR objective to allow non-escaped hyphens outside of regex ranges.
519-519: LGTM! Non-escaped hyphen in int pattern.The change from
\-{0,1}to-{0,1}aligns with the PR objective. Based on learnings, hyphens previously required escaping even outside character classes in log-surgeon, but this PR removes that requirement.
1140-1410: LGTM! Comprehensive multiple_headers test case.This test case thoroughly validates:
- Multiple header patterns with different capture configurations
- Optional timestamp captures within headers
- Alternation patterns
(hex|timestamp)with captures- Various combinations of captured and uncaptured timestamps
- Correct position tracking across nine different events
The test provides good coverage for the new header-based parsing functionality.
src/log_surgeon/LogParser.cpp (7)
67-84: Rule preprocessing logic is well-structured.The differentiation between header rules (which can start at file beginning or after newline) and other rules (which require a delimiter prefix) is correct. The use of move semantics and RAII is appropriate.
103-103: LGTM: Cleaner function signature.Removing the redundant
LogParser::qualification improves readability.
183-188: Message boundary detection logic is correct.The boolean condition correctly identifies the start of the next message in both header-based and non-header-based modes. The condition properly follows the coding guideline using
false == output_buffer->has_header()rather than negation.
193-200: LGTM: Correct newline handling for non-header mode.The check correctly uses
false == output_buffer->has_header()following the coding guideline, and the logic appropriately handles newline tokens in non-header mode.
232-252: LGTM: Metadata generation logic is sound.The start offset calculation, token iteration, and multiline detection logic are all correct. The code properly follows the coding guideline using
false == m_log_event_view->m_log_output_buffer->has_header().
67-247: Inconsistency between PR objectives and file changes.The PR objectives state this PR is about allowing "non-escaped hyphens outside of regex ranges," but all visible changes in this file relate to header-based parsing architecture (replacing timestamp-based parsing with header-based parsing). Since the PR description mentions it's "built on top of PR #199," these changes likely originate from the base PR rather than the hyphen-handling feature.
This inconsistency makes it difficult to distinguish which changes are intended for this PR versus inherited from the base. If the hyphen-handling changes are in other files, this file may just contain prerequisite changes from PR #199.
147-164: Use the boolean negation style consistent with the codebase guidelines.Lines 156-157 correctly check for negative positions—
position_tisint32_t, and negative values are valid per the PrefixTree documentation to indicate unmatched captures. However, align the style with the coding guidelines and LogEvent.cpp: usefalse == starts.empty() && false == ends.empty()instead of.empty() || .empty()checks, and prefer-1 < starts[0]overstarts[0] < 0for consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comment about the doc, but otherwise lgtm. I'll look again after we merge since I might have missed something. Title looks fine to me too.
(I copied the code rabbit comments to the other PR since they should be applied there.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/schema.md (1)
163-165: Clarify updated hyphen escaping rules.The note about special characters states that hyphens must always be escaped, but this contradicts the PR's feature allowing non-escaped hyphens outside character ranges. The documentation should clarify that hyphens only require escaping when used within character ranges (e.g.,
[a-z]), not when appearing outside them.🔎 Proposed clarification for hyphen escaping rules
* Special characters include: ( ) * + - . [ \ ] ^ { | } < > ? - Escape these with '\' when used literally (e.g., \., \(, \\). - - Special characters must be escaped even in ranges. + - When used within character ranges (e.g., [a-z]), hyphens must be escaped or positioned at the start/end of the range. + - Hyphens outside character ranges do not require escaping (e.g., `-{0,1}` is valid).
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
docs/schema.md(4 hunks)src/log_surgeon/Token.cpp(2 hunks)src/log_surgeon/Token.hpp(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,h,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>rather than!<expression>.
Files:
src/log_surgeon/Token.cppsrc/log_surgeon/Token.hpp
🧠 Learnings (17)
📓 Common learnings
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 151
File: tests/test-buffer-parser.cpp:959-959
Timestamp: 2025-08-18T16:46:05.687Z
Learning: In the Log-Surgeon codebase, hyphens need to be escaped with backslashes in regular expressions even outside character classes, which differs from standard regex behavior where hyphens don't require escaping outside character classes.
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 152
File: src/log_surgeon/wildcard_query_parser/Query.cpp:159-177
Timestamp: 2025-08-25T20:44:48.955Z
Learning: In the log-surgeon codebase, thread safety issues with global state like NonTerminal::m_next_children_start should be addressed comprehensively in dedicated PRs rather than fixed piecemeal in individual feature PRs. The user SharafMohamed prefers to defer such systemic architectural issues to separate PRs.
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 199
File: src/log_surgeon/LogParser.cpp:195-195
Timestamp: 2025-12-03T16:22:52.313Z
Learning: In y-scope/log-surgeon, when reviewing code, do not suggest fixing style issues (such as C-style casts vs. static_cast) in lines that were not modified by the PR. The maintainer prefers to keep PRs focused on their intended changes and defer broader style refactoring to dedicated refactor PRs.
📚 Learning: 2025-10-22T15:40:29.992Z
Learnt from: davidlion
Repo: y-scope/log-surgeon PR: 165
File: src/log_surgeon/LogEvent.cpp:53-55
Timestamp: 2025-10-22T15:40:29.992Z
Learning: In `src/log_surgeon/LogEvent.cpp`, the `get_logtype()` method has two independent mechanisms for adding `<timestamp>` to the logtype string: (1) A prefix added when `has_timestamp()` returns true, indicating a standalone timestamp token from timestamp rules (lines 53-55), and (2) Named capture group processing that adds `<timestamp>` tags for `(?<timestamp>...)` patterns in any variable rule (lines 61-95). These mechanisms do not interfere with each other. Capture groups named `timestamp` do not affect `has_timestamp()` status, as using captures as actual timestamps is not supported.
<!-- [add_learning]
In the log-surgeon codebase, capture groups (e.g., `(?<timestamp>...)`) in schema variable rules are added to the logtype string through the existing capture group processing logic, but they do not create actual timestamp tokens and do not cause `has_timestamp()` to return true. Using capture groups as actual timestamps is not supported; standalone timestamp tokens come from `timestamp:...
Applied to files:
docs/schema.md
📚 Learning: 2025-08-18T12:00:40.545Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 151
File: tests/test-buffer-parser.cpp:867-870
Timestamp: 2025-08-18T12:00:40.545Z
Learning: For schema documentation in code comments, multi-line rules should be supported and are expected since complex regex patterns may not fit comfortably on a single line within linting constraints. The "..." notation or backslash continuation can be used to indicate pattern continuation in documentation blocks.
Applied to files:
docs/schema.md
📚 Learning: 2025-12-03T16:38:15.354Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 199
File: src/log_surgeon/LogParser.cpp:154-155
Timestamp: 2025-12-03T16:38:15.354Z
Learning: In `src/log_surgeon/LogParser.cpp` and related parsing code, when `get_reg_ids_from_capture(capture)` returns an optional, calling `.value()` directly without checking is intentional for captures that should always have register IDs (e.g., `"timestamp"` captures). If the optional is nullopt, it indicates an internal programming error/invariant violation, and throwing `std::bad_optional_access` is the desired behavior rather than defensive error handling with continue/return.
Applied to files:
src/log_surgeon/Token.cppsrc/log_surgeon/Token.hpp
📚 Learning: 2025-11-06T11:16:52.917Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 184
File: tests/test-reader-parser.cpp:150-154
Timestamp: 2025-11-06T11:16:52.917Z
Learning: In Catch2 test files for the y-scope/log-surgeon repository, an explicit `if (false == optional.has_value()) { return; }` check may be required after `REQUIRE(optional.has_value())` to prevent clang-tidy errors, even though the code is unreachable at runtime. clang-tidy's static analyzer doesn't recognize that REQUIRE aborts execution and may flag unsafe optional access without the explicit check.
Applied to files:
src/log_surgeon/Token.cppsrc/log_surgeon/Token.hpp
📚 Learning: 2025-08-15T12:07:58.626Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 150
File: tests/test-expression-view.cpp:7-7
Timestamp: 2025-08-15T12:07:58.626Z
Learning: In tests/test-expression-view.cpp, the `<catch2/catch_message.hpp>` header is required for clang-tidy to pass, even though it may not be directly used in the visible code.
Applied to files:
src/log_surgeon/Token.cppsrc/log_surgeon/Token.hpp
📚 Learning: 2025-08-08T10:17:43.495Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 144
File: src/log_surgeon/wildcard_query_parser/VariableQueryToken.hpp:28-31
Timestamp: 2025-08-08T10:17:43.495Z
Learning: In src/log_surgeon/wildcard_query_parser/VariableQueryToken.hpp/.cpp (C++), do not suggest defaulting operator<=>. The project prefers a custom out-of-line comparator that explicitly handles the bool member (via explicit cast) to avoid implicit conversions; keep the current manual implementation.
Applied to files:
src/log_surgeon/Token.cppsrc/log_surgeon/Token.hpp
📚 Learning: 2025-08-08T10:23:06.281Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 144
File: src/log_surgeon/wildcard_query_parser/QueryInterpretation.hpp:80-88
Timestamp: 2025-08-08T10:23:06.281Z
Learning: In y-scope/log-surgeon (C++), small function definitions are allowed to remain inline in headers. For src/log_surgeon/wildcard_query_parser/QueryInterpretation.hpp, do not suggest moving small methods like QueryInterpretation::append_variable_token to the .cpp for “consistency” in future reviews.
Applied to files:
src/log_surgeon/Token.cppsrc/log_surgeon/Token.hpp
📚 Learning: 2025-08-08T10:00:20.963Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 144
File: src/log_surgeon/wildcard_query_parser/VariableQueryToken.hpp:28-31
Timestamp: 2025-08-08T10:00:20.963Z
Learning: In src/log_surgeon/wildcard_query_parser/QueryInterpretation.hpp (C++), do not default the comparison operators: the class stores std::vector<std::variant<StaticQueryToken, VariableQueryToken>>, which yields only a weak ordering. A custom operator<=> that maps the variant’s weak ordering to std::strong_ordering is required; avoid suggesting defaulting there in future reviews.
Applied to files:
src/log_surgeon/Token.cppsrc/log_surgeon/Token.hpp
📚 Learning: 2024-11-13T22:25:54.168Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 48
File: tests/test-tag.cpp:10-10
Timestamp: 2024-11-13T22:25:54.168Z
Learning: In the log-surgeon codebase (C++), particularly in the finite automata components involving the `Tag` class (`src/log_surgeon/finite_automata/Tag.hpp`), it's important to ensure that `Tag*` pointers in other objects cannot be `nullptr`. Test cases should focus on validating that these `Tag*` pointers are not null where they are used, and handle `nullptr` appropriately.
Applied to files:
src/log_surgeon/Token.cppsrc/log_surgeon/Token.hpp
📚 Learning: 2025-08-15T01:04:03.681Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 150
File: src/log_surgeon/wildcard_query_parser/ExpressionView.cpp:11-11
Timestamp: 2025-08-15T01:04:03.681Z
Learning: In y-scope/log-surgeon C++ codebase, prefer explicit includes in implementation files (.cpp) for types that are directly used, even when those headers are already transitively included through the corresponding header file. This makes dependencies explicit and improves maintainability.
Applied to files:
src/log_surgeon/Token.cpp
📚 Learning: 2024-11-27T21:56:13.425Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 56
File: src/log_surgeon/finite_automata/RegisterHandler.hpp:0-0
Timestamp: 2024-11-27T21:56:13.425Z
Learning: In the `log_surgeon` project, header guards in C++ header files should include `_HPP` at the end to match the filename. For example, in `RegisterHandler.hpp`, the header guard should be `LOG_SURGEON_FINITE_AUTOMATA_REGISTER_HANDLER_HPP`.
Applied to files:
src/log_surgeon/Token.cppsrc/log_surgeon/Token.hpp
📚 Learning: 2025-05-05T14:55:34.455Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 106
File: src/log_surgeon/Lalr1Parser.tpp:661-665
Timestamp: 2025-05-05T14:55:34.455Z
Learning: The log-surgeon codebase follows a design approach where function contracts (like `ErrorCode::Success` guaranteeing a valid token) are trusted, and contract violations are allowed to throw exceptions rather than being explicitly checked at every call site.
Applied to files:
src/log_surgeon/Token.cpp
📚 Learning: 2024-11-18T16:45:46.073Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 50
File: src/log_surgeon/finite_automata/Tag.hpp:0-0
Timestamp: 2024-11-18T16:45:46.073Z
Learning: The class `TagPositions` was removed from `src/log_surgeon/finite_automata/Tag.hpp` as it is no longer needed.
Applied to files:
src/log_surgeon/Token.hpp
📚 Learning: 2025-08-26T10:06:22.914Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 152
File: src/log_surgeon/wildcard_query_parser/Query.hpp:9-11
Timestamp: 2025-08-26T10:06:22.914Z
Learning: In y-scope/log-surgeon project, it's acceptable to include headers like log_surgeon/Lexer.hpp directly in header files rather than using forward declarations, even when only references are used in the interface. The project prefers the simplicity of direct includes over header coupling optimization through forward declarations.
Applied to files:
src/log_surgeon/Token.hpp
📚 Learning: 2024-11-02T09:18:31.046Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 47
File: src/log_surgeon/finite_automata/TaggedTransition.hpp:16-37
Timestamp: 2024-11-02T09:18:31.046Z
Learning: In `src/log_surgeon/finite_automata/TaggedTransition.hpp`, the classes `PositiveTaggedTransition` and `NegativeTaggedTransition` currently do not share enough functionality to justify refactoring into a common base class.
Applied to files:
src/log_surgeon/Token.hpp
📚 Learning: 2025-12-09T21:17:27.546Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 197
File: src/log_surgeon/finite_automata/RegexAST.hpp:1011-1019
Timestamp: 2025-12-09T21:17:27.546Z
Learning: In C++ files under src/log_surgeon in the y-scope/log-surgeon repository, for const pointer parameters in null checks, Yoda conditions (nullptr == ptr) are not required; standard comparison order (ptr == nullptr) is acceptable since const prevents accidental assignment. Apply this pattern to header and related source files within src/log_surgeon.
Applied to files:
src/log_surgeon/Token.hpp
🧬 Code graph analysis (1)
src/log_surgeon/Token.cpp (1)
src/log_surgeon/Token.hpp (1)
start_pos(62-65)
🪛 Cppcheck (2.18.0)
src/log_surgeon/Token.cpp
[information] 4-4: Include file
(missingIncludeSystem)
[information] 4-4: Include file
(missingIncludeSystem)
[information] 4-4: Include file
(missingIncludeSystem)
[information] 4-4: Include file
(missingIncludeSystem)
[information] 4-4: Include file
(missingIncludeSystem)
[style] 38-38: The function 'get_sub_token' is never used.
(unusedFunction)
[style] 46-46: The function 'set' is never used.
(unusedFunction)
[style] 38-38: The function 'append_position' is never used.
(unusedFunction)
[style] 48-48: The function 'get_num_regs' is never used.
(unusedFunction)
🪛 LanguageTool
docs/schema.md
[uncategorized] ~91-~91: Loose punctuation mark.
Context: ...pture within a header rule. * $PREFIX$, $SUFFIX$, and $TIMESTAMP-PATTERN$ a...
(UNLIKELY_OPENING_PUNCTUATION)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: lint (macos-15)
- GitHub Check: lint (ubuntu-24.04)
- GitHub Check: build (ubuntu-22.04, debug)
- GitHub Check: build (macos-15, release)
- GitHub Check: build (ubuntu-24.04, release)
- GitHub Check: build (ubuntu-22.04, release)
- GitHub Check: build (ubuntu-24.04, debug)
🔇 Additional comments (2)
src/log_surgeon/Token.hpp (1)
56-65: LGTM!The
get_sub_tokenmethod declaration is well-documented with clear parameter descriptions and exception specification.src/log_surgeon/Token.cpp (1)
4-4: LGTM!Correct include for
std::out_of_range.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
examples/CMakeLists.txt(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 151
File: tests/test-buffer-parser.cpp:959-959
Timestamp: 2025-08-18T16:46:05.687Z
Learning: In the Log-Surgeon codebase, hyphens need to be escaped with backslashes in regular expressions even outside character classes, which differs from standard regex behavior where hyphens don't require escaping outside character classes.
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 152
File: src/log_surgeon/wildcard_query_parser/Query.cpp:159-177
Timestamp: 2025-08-25T20:44:48.955Z
Learning: In the log-surgeon codebase, thread safety issues with global state like NonTerminal::m_next_children_start should be addressed comprehensively in dedicated PRs rather than fixed piecemeal in individual feature PRs. The user SharafMohamed prefers to defer such systemic architectural issues to separate PRs.
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 199
File: src/log_surgeon/LogParser.cpp:195-195
Timestamp: 2025-12-03T16:22:52.313Z
Learning: In y-scope/log-surgeon, when reviewing code, do not suggest fixing style issues (such as C-style casts vs. static_cast) in lines that were not modified by the PR. The maintainer prefers to keep PRs focused on their intended changes and defer broader style refactoring to dedicated refactor PRs.
Learnt from: davidlion
Repo: y-scope/log-surgeon PR: 165
File: src/log_surgeon/LogEvent.cpp:53-55
Timestamp: 2025-10-22T15:40:29.992Z
Learning: In `src/log_surgeon/LogEvent.cpp`, the `get_logtype()` method has two independent mechanisms for adding `<timestamp>` to the logtype string: (1) A prefix added when `has_timestamp()` returns true, indicating a standalone timestamp token from timestamp rules (lines 53-55), and (2) Named capture group processing that adds `<timestamp>` tags for `(?<timestamp>...)` patterns in any variable rule (lines 61-95). These mechanisms do not interfere with each other. Capture groups named `timestamp` do not affect `has_timestamp()` status, as using captures as actual timestamps is not supported.
<!-- [add_learning]
In the log-surgeon codebase, capture groups (e.g., `(?<timestamp>...)`) in schema variable rules are added to the logtype string through the existing capture group processing logic, but they do not create actual timestamp tokens and do not cause `has_timestamp()` to return true. Using capture groups as actual timestamps is not supported; standalone timestamp tokens come from `timestamp:...
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: build (ubuntu-22.04, release)
- GitHub Check: build (ubuntu-22.04, debug)
- GitHub Check: build (macos-15, release)
- GitHub Check: build (macos-15, debug)
- GitHub Check: build (ubuntu-24.04, release)
- GitHub Check: build (ubuntu-24.04, debug)
- GitHub Check: lint (ubuntu-24.04)
- GitHub Check: lint (macos-15)
…amed/log-surgeon into log-event-boundary-new
Reference
Built on top of #199 so we can directly apply the hyphen fix to new test cases from #199.
Description
Primary Change
Cleanup
Validation Performed
Summary by CodeRabbit
Release Notes
Refactor
Tests
✏️ Tip: You can customize this high-level summary in your review settings.