refactor: Rename the `DfaState` method `next` to `get_dest_state`. #86
davidlion merged 6 commits into y-scope:main
Walkthrough
This pull request introduces modifications across several components to improve the handling of lexer state transitions, capture groups, and tag/register retrieval. The updates include new header inclusions, added methods in the Lexer for managing captures and tags, and systematic changes replacing legacy state transition calls with a new method. Additionally, the LexicalRule state creation process is simplified, SchemaParser now uses a Reader with enhanced error handling, and minor API adjustments are made in finite automata classes. A test case has been updated to enforce immutability.
Sequence Diagram(s)
sequenceDiagram
participant L as Lexer
participant DS as DfaState
participant C as Control Flow
L->>DS: get_dest_state(next_char)
DS-->>L: Return destination state (const)
L->>C: Process state (scan, wildcard, or file start handling)
sequenceDiagram
participant SP as SchemaParser
participant R as Reader
participant EH as Error Handler
SP->>R: Request input (schema file/string)
R-->>SP: Return data or error (via errno)
SP->>EH: Process error (if any) and update reading position
c866da0 to 2ff9f3c
Actionable comments posted: 1
🧹 Nitpick comments (5)
src/log_surgeon/finite_automata/NfaState.hpp (2)
88-88: Doc string alignment.
The new comment references `SpontaneousTransition::serialize`. Double-check that the intended usage matches the updated method names and that the doc lines remain accurate.
92-92: Returning a reference to `bool` is risky.
Returning `bool const&` can be confusing, as copying a `bool` is typically preferred. Consider returning by value instead to avoid possible misuse.
    - [[nodiscard]] auto is_accepting() const -> bool const& { return m_accepting; }
    + [[nodiscard]] auto is_accepting() const -> bool { return m_accepting; }
src/log_surgeon/Lexer.hpp (1)
172-191: Consider simplifying the register ID retrieval logic.
While the implementation is correct, it could be more concise by combining the optional checks:
     auto get_reg_ids_from_capture_id(capture_id_t const capture_id
     ) const -> std::optional<std::pair<reg_id_t, reg_id_t>> {
         auto const optional_tag_id_pair{get_tag_id_pair_from_capture_id(capture_id)};
    -    if (false == optional_tag_id_pair.has_value()) {
    -        return std::nullopt;
    -    }
    -    auto const [start_tag_id, end_tag_id]{optional_tag_id_pair.value()};
    -
    -    auto const optional_start_reg_id{get_reg_id_from_tag_id(start_tag_id)};
    -    if (false == optional_start_reg_id.has_value()) {
    -        return std::nullopt;
    -    }
    -
    -    auto const optional_end_reg_id{get_reg_id_from_tag_id(end_tag_id)};
    -    if (false == optional_end_reg_id.has_value()) {
    -        return std::nullopt;
    -    }
    -
    -    return {optional_start_reg_id.value(), optional_end_reg_id.value()};
    +    if (optional_tag_id_pair) {
    +        auto const [start_tag_id, end_tag_id]{optional_tag_id_pair.value()};
    +        auto const optional_start_reg_id{get_reg_id_from_tag_id(start_tag_id)};
    +        auto const optional_end_reg_id{get_reg_id_from_tag_id(end_tag_id)};
    +        if (optional_start_reg_id && optional_end_reg_id) {
    +            return {optional_start_reg_id.value(), optional_end_reg_id.value()};
    +        }
    +    }
    +    return std::nullopt;
     }
tests/CMakeLists.txt (1)
12-14: Good architectural improvement!
The replacement of `TaggedTransition.hpp` with `SpontaneousTransition.hpp` and `TagOperation.hpp` suggests a better separation of concerns, aligning with previous feedback about tagged transitions not sharing enough functionality.
CMakeLists.txt (1)
81-81: LGTM! The build system changes reflect a good architectural improvement.
The replacement of `TaggedTransition.hpp` with `SpontaneousTransition.hpp` and `TagOperation.hpp` suggests a better separation of concerns, which should make the codebase more maintainable.
This change aligns well with the Single Responsibility Principle by separating transition logic from tag operations.
Also applies to: 83-83
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (16)
- CMakeLists.txt (1 hunks)
- src/log_surgeon/Lexer.hpp (2 hunks)
- src/log_surgeon/Lexer.tpp (8 hunks)
- src/log_surgeon/LexicalRule.hpp (2 hunks)
- src/log_surgeon/SchemaParser.cpp (1 hunks)
- src/log_surgeon/finite_automata/Dfa.hpp (1 hunks)
- src/log_surgeon/finite_automata/DfaState.hpp (2 hunks)
- src/log_surgeon/finite_automata/DfaStatePair.hpp (1 hunks)
- src/log_surgeon/finite_automata/Nfa.hpp (7 hunks)
- src/log_surgeon/finite_automata/NfaState.hpp (6 hunks)
- src/log_surgeon/finite_automata/RegexAST.hpp (4 hunks)
- src/log_surgeon/finite_automata/SpontaneousTransition.hpp (1 hunks)
- src/log_surgeon/finite_automata/TagOperation.hpp (1 hunks)
- tests/CMakeLists.txt (1 hunks)
- tests/test-nfa.cpp (2 hunks)
- tests/test-register-handler.cpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: Prefer `false == <expression>` rather than `!<expression>`.
src/log_surgeon/finite_automata/Dfa.hpp
src/log_surgeon/finite_automata/DfaStatePair.hpp
src/log_surgeon/finite_automata/DfaState.hpp
tests/test-register-handler.cpp
tests/test-nfa.cpp
src/log_surgeon/finite_automata/SpontaneousTransition.hpp
src/log_surgeon/Lexer.hpp
src/log_surgeon/LexicalRule.hpp
src/log_surgeon/finite_automata/TagOperation.hpp
src/log_surgeon/finite_automata/RegexAST.hpp
src/log_surgeon/finite_automata/NfaState.hpp
src/log_surgeon/SchemaParser.cpp
src/log_surgeon/finite_automata/Nfa.hpp
🧠 Learnings (5)
tests/CMakeLists.txt (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/TaggedTransition.hpp:16-37
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In `src/log_surgeon/finite_automata/TaggedTransition.hpp`, the classes `PositiveTaggedTransition` and `NegativeTaggedTransition` currently do not share enough functionality to justify refactoring into a common base class.
CMakeLists.txt (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/TaggedTransition.hpp:16-37
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In `src/log_surgeon/finite_automata/TaggedTransition.hpp`, the classes `PositiveTaggedTransition` and `NegativeTaggedTransition` currently do not share enough functionality to justify refactoring into a common base class.
src/log_surgeon/Lexer.hpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#42
File: src/log_surgeon/finite_automata/RegexNFA.hpp:37-90
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In this codebase, prefer code clarity over efficiency optimizations unless efficiency is a critical concern.
src/log_surgeon/finite_automata/RegexAST.hpp (2)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexAST.hpp:700-700
Timestamp: 2024-11-13T22:38:19.472Z
Learning: In `RegexASTCapture`, `m_tag` must always be non-null.
src/log_surgeon/finite_automata/NfaState.hpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: build (ubuntu-22.04, Release)
- GitHub Check: build (ubuntu-22.04, Debug)
- GitHub Check: build (macos-latest, Release)
🔇 Additional comments (48)
src/log_surgeon/finite_automata/NfaState.hpp (12)
4-24: Header inclusions updated.
The new headers support spontaneous transitions and tagging logic. There are no obvious issues with ordering or redundant includes.
40-44: Constructor sets `m_accepting`.
This constructor explicitly marks the state as accepting. Ensure that any usage aligns with your design intent (i.e., that all states created with this constructor should indeed be accepting).
46-52: Chained constructor usage.
Calling `add_spontaneous_transition` within the constructor is clear and reduces duplication. This matches the new spontaneous transition model well.
54-65: Validate pointer lifetime.
`add_spontaneous_transition` stores `dest_state` in a vector. Confirm that the lifetime of `dest_state` is at least as long as the lifetime of this `NfaState` to avoid dangling references.
98-101: Providing read-only access to spontaneous transitions.
Returning a const reference is appropriate if the caller only needs inspection. This design is consistent with the rest of the interface.
103-105: Consistent style for byte transitions.
The getter mirrors the spontaneous transitions approach and returns a reference for outside inspection. This appears coherent with the new code structure.
107-107: Getter for tree transitions.
Exposing `m_tree_transitions` via a const reference is consistent with the pattern used for other transitions.
112-112: New member `m_spontaneous_transitions`.
This vector neatly encapsulates the transitions. No concerns regarding the container choice.
178-179: Epsilon closure with spontaneous transitions.
This replaces older tagged or epsilon transitions effectively. The logic of pushing `get_dest_state()` onto the stack is straightforward and consistent.
188-190: Conditional acceptance string.
Building `accepting_tag_string` if `m_accepting` is true is a convenient approach, ensuring minimal overhead when the state is not accepting.
201-208: Serializing spontaneous transitions.
All transitions are collected in `serialized_spontaneous_transitions`. Returning `std::nullopt` upon failure is consistent with the rest of the design.
211-215: Overall serial format.
Including both byte transitions and spontaneous transitions in the same format string is clear and conforms to the new approach.
src/log_surgeon/finite_automata/Nfa.hpp (12)
4-21: Refreshed includes and doc clarifications.
The updated includes and doc lines align with the new spontaneous transitions model and usage of optional return types.
43-44: New doc block for `new_state`.
The explanation that the returned state has no spontaneous transitions is helpful.
47-52: `new_accepting_state` creation
Creating an accepting state with a dedicated method is more explicit, improving readability. Confirm that all references to your older approach are removed.
54-62: Handling negative captures.
The doc string describing how tags are negated is consistent, and `new_state_from_negative_captures` helps clarify usage.
65-76: Positive capture method documentation.
Renamed to reflect “start and end states from positive capture.” The new name is clearer for future maintainers.
85-88: Serialization with `std::optional<std::string>`.
Returning an optional indicates possible failure, which is a good practice. Double-check that all call sites handle the empty case gracefully.
96-96: Retrieving `m_root`.
The direct getter is concise. No issues found so far.
148-152: Implementation of `new_accepting_state`.
The approach of constructing the `TypedNfaState` with `matching_variable_id` inside `m_states` is consistent and well contained.
155-170: `new_state_from_negative_captures` implementation details.
Moves all relevant tag IDs into the new state. This logic looks correct for negating multiple captures.
173-185: `new_start_and_end_states_from_positive_capture` usage.
Attaching transitions for start and end tags in a single step is a clean approach, avoiding confusion from older tagged transitions.
188-216: BFS traversal including spontaneous transitions.
Expanding BFS to queue spontaneous transitions ensures no states remain inaccessible. This approach closely mirrors the old epsilon concept.
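A breadth-first traversal that follows both kinds of edges can be sketched as below. This is only an illustration under simplifying assumptions: `SketchNfaState` and `bfs_order` are hypothetical names, the state holds nothing but destination pointers, and the real `Nfa` traversal in log-surgeon may differ in its details.

```cpp
#include <queue>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified NFA state: only the outgoing edges matter for this sketch.
struct SketchNfaState {
    std::vector<SketchNfaState const*> byte_dest_states;
    std::vector<SketchNfaState const*> spontaneous_dest_states;
};

// Returns every state reachable from `root` in breadth-first order, following byte transitions
// and spontaneous transitions alike. Each state is visited at most once.
[[nodiscard]] inline auto bfs_order(SketchNfaState const* root)
        -> std::vector<SketchNfaState const*> {
    std::vector<SketchNfaState const*> order;
    std::queue<SketchNfaState const*> state_queue;
    std::unordered_set<SketchNfaState const*> visited;

    // Enqueues `state` only the first time it is seen.
    auto const enqueue_if_new = [&](SketchNfaState const* state) {
        if (nullptr != state && visited.insert(state).second) {
            state_queue.push(state);
        }
    };

    enqueue_if_new(root);
    while (false == state_queue.empty()) {
        auto const* state = state_queue.front();
        state_queue.pop();
        order.push_back(state);
        for (auto const* dest_state : state->byte_dest_states) {
            enqueue_if_new(dest_state);
        }
        for (auto const* dest_state : state->spontaneous_dest_states) {
            enqueue_if_new(dest_state);
        }
    }
    return order;
}
```

Without the second loop, a state reachable only through a spontaneous transition would never be queued, which is the gap the comment above refers to.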
221-238: Optional-based `serialize`.
Aborting on the first serialization failure is an appropriate strategy. The rest of the method remains straightforward.
src/log_surgeon/finite_automata/RegexAST.hpp (3)
24-24: Additional include for `TagOperation.hpp`.
All references to tag operations appear consolidated, which aids clarity in the capturing logic.
117-125: Refined negative captures approach.
The doc clarifies that negative captures generate a spontaneous transition to apply negation. This consistent terminology aligns well with the updated code.
911-942: Capture group NFA documentation.
The updated notes on spontaneous transitions clearly illustrate the new approach, from setting start tags to negating alternate tags, then ending with capturing final tags. This is an excellent update for maintainability.
src/log_surgeon/LexicalRule.hpp (2)
4-5: LGTM! The addition of the `<vector>` include is necessary as it's used in the `get_captures()` method.
45-48: LGTM! The refactoring improves code clarity by consolidating state creation and variable ID assignment into a single method call.
src/log_surgeon/finite_automata/TagOperation.hpp (2)
13-17: LGTM! The enum class is well-defined with clear, descriptive values.
25-31: LGTM! Efficient implementation of comparison operators using std::tie for member-wise comparison.
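For readers unfamiliar with the idiom, member-wise comparison through `std::tie` looks roughly like this. The names `SketchOpType` and `SketchTagOp`, their members, and the enumerator names are assumptions made for the example and are not taken from `TagOperation.hpp`.

```cpp
#include <cstdint>
#include <tuple>

// Hypothetical operation kinds, for illustration only.
enum class SketchOpType : std::uint8_t {
    Set,
    Negate
};

// Hypothetical tag operation: a tag ID plus the kind of operation applied to it.
class SketchTagOp {
public:
    SketchTagOp(std::uint32_t tag_id, SketchOpType type) : m_tag_id{tag_id}, m_type{type} {}

    // Lexicographic ordering: compare tag IDs first, then operation types.
    [[nodiscard]] auto operator<(SketchTagOp const& rhs) const -> bool {
        return std::tie(m_tag_id, m_type) < std::tie(rhs.m_tag_id, rhs.m_type);
    }

    // Equal only if every member is equal.
    [[nodiscard]] auto operator==(SketchTagOp const& rhs) const -> bool {
        return std::tie(m_tag_id, m_type) == std::tie(rhs.m_tag_id, rhs.m_type);
    }

private:
    std::uint32_t m_tag_id;
    SketchOpType m_type;
};
```

`std::tie` builds tuples of references, so no members are copied, and adding a member later only requires extending both `std::tie` calls.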
src/log_surgeon/finite_automata/SpontaneousTransition.hpp (2)
26-30: LGTM! Well-designed constructors with clear parameter handling and move semantics.
53-67: LGTM! The implementation is robust with:
- Proper error handling using std::optional
- Efficient use of C++20 ranges for transformation
- Clear string formatting using fmt
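The optional-based error handling can be pictured with the sketch below. It deliberately swaps in a plain loop and `std::string` concatenation for the C++20 ranges and `fmt` mentioned above, so that it depends only on the standard library. `serialize_op`, `serialize_transition`, the output format, and the rule that negative IDs fail are all invented for the example.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for serializing one tag operation. Negative IDs are treated as
// unserializable purely to exercise the failure path.
[[nodiscard]] inline auto serialize_op(int tag_id) -> std::optional<std::string> {
    if (tag_id < 0) {
        return std::nullopt;
    }
    return std::to_string(tag_id) + "p";
}

// Serializes a transition as "<dest_state_id>[<op>,<op>,...]". Returns `std::nullopt` as soon as
// any single operation fails to serialize.
[[nodiscard]] inline auto
serialize_transition(int dest_state_id, std::vector<int> const& tag_ids)
        -> std::optional<std::string> {
    std::string joined_ops;
    for (auto const tag_id : tag_ids) {
        auto const optional_op = serialize_op(tag_id);
        if (false == optional_op.has_value()) {
            return std::nullopt;
        }
        if (false == joined_ops.empty()) {
            joined_ops += ',';
        }
        joined_ops += optional_op.value();
    }
    return std::to_string(dest_state_id) + "[" + joined_ops + "]";
}
```

Callers then have to check `has_value()` before using the string, which is the same shape the updated test in `tests/test-nfa.cpp` is described as following.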
src/log_surgeon/finite_automata/DfaState.hpp (2)
49-49: LGTM! The method rename improves clarity and the const return type enhances const correctness.
60-75: LGTM! The implementation:
- Preserves the existing logic while improving const correctness
- Maintains efficient handling of both Byte and Utf8 state types
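As a rough illustration of the renamed accessor, the sketch below models a byte-only state. `ByteDfaState`, `add_byte_transition`, and `walk` are hypothetical names and do not mirror the actual `DfaState` template in log-surgeon; only the name `get_dest_state` and its const return come from the review above.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical, simplified stand-in for a byte-only DFA state.
class ByteDfaState {
public:
    static constexpr std::size_t cNumBytes{256};

    auto add_byte_transition(std::uint8_t byte, ByteDfaState const* dest_state) -> void {
        m_bytes_transition[byte] = dest_state;
    }

    // Formerly `next`. Returns the destination state for `character`, or `nullptr` if the
    // state has no transition on that byte. The pointee is const, so callers cannot mutate it.
    [[nodiscard]] auto get_dest_state(std::uint8_t character) const -> ByteDfaState const* {
        return m_bytes_transition[character];
    }

private:
    // Value-initialized, so every entry starts as `nullptr`.
    std::array<ByteDfaState const*, cNumBytes> m_bytes_transition{};
};

// Follows one transition per byte of `input`. Returns the state reached, or `nullptr` if a
// byte has no transition.
[[nodiscard]] inline auto walk(ByteDfaState const* state, std::string_view input)
        -> ByteDfaState const* {
    for (auto const c : input) {
        if (nullptr == state) {
            return nullptr;
        }
        state = state->get_dest_state(static_cast<std::uint8_t>(c));
    }
    return state;
}
```

A call site such as `state = state->get_dest_state(c)` reads as a lookup, whereas the old `state->next(c)` could be mistaken for something that advances or mutates the state.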
src/log_surgeon/finite_automata/DfaStatePair.hpp (2)
72-73: LGTM! Method renaming improves clarity.
The renaming from `next` to `get_dest_state` makes the method's purpose more explicit and self-documenting.
74-74: LGTM! Condition checks follow coding guidelines.
The conditions have been updated to follow the coding guidelines:
- Using `nullptr != x` pattern for pointer checks
- Using `false == x` pattern for boolean expressions
Also applies to: 76-76
tests/test-nfa.cpp (1)
65-67: LGTM! Improved error handling with optional type.
The change to use `std::optional` for the serialization result improves error handling by explicitly checking for success before using the value.
tests/test-register-handler.cpp (1)
38-38: LGTM! Added const qualifier for immutability.
Adding the `const` qualifier to `empty_handler` better expresses the intent that this test variable should not be modified.
src/log_surgeon/finite_automata/Dfa.hpp (2)
8-8: LGTM! Added required include for std::stack.
The addition of `<stack>` header is necessary for using `std::stack` in the implementation.
62-62: LGTM! Condition checks follow coding guidelines.
The conditions have been updated to follow the coding guidelines by using the `false == x` pattern for boolean expressions.
Also applies to: 117-117
src/log_surgeon/Lexer.hpp (2)
6-17: LGTM! The new header inclusions are appropriate.
The addition of `<map>` and `NfaState.hpp` headers supports the new functionality while maintaining proper header organization.
132-138: Well-implemented optional return with safe map access!
The method properly checks for key existence before access and returns an optional value, following best practices.
src/log_surgeon/Lexer.tpp (4)
87-88: LGTM! The state transition renaming improves clarity.
The change from `next` to `get_dest_state` better describes the method's purpose.
218-219: Good consistency in state transition handling!
The explicit type declaration and consistent method naming improve code clarity.
280-283: Well-implemented modern C++ practices!
The use of `contains` for set membership and consistent state transition naming improve code quality.
340-340: LGTM! Consistent method naming.
The change maintains consistency with the new state transition naming convention.
src/log_surgeon/SchemaParser.cpp (1)
3-23: ❓ Verification inconclusive
Verify the impact of replacing Lexer with Reader.
The switch from `Lexer.hpp` to `Reader.hpp` suggests a significant change in input handling. Please ensure all lexer functionality previously used is available through the Reader interface.
🏁 Script executed:
    #!/bin/bash
    # Search for any remaining Lexer usage that might need updating
    rg -l "Lexer" --type cpp --type hpp
Length of output: 63
Action Required: Re-verify the Lexer Replacement Impact
The recent switch from `Lexer.hpp` to `Reader.hpp` appears to be a significant refactoring of input handling. Our initial automated search for the term “Lexer” ran into issues with header file type recognition. Please re-run your search using the following command to accurately target both header and source files:
    #!/bin/bash
    # Re-run the search for "Lexer" in all .hpp and .cpp files
    rg -l Lexer -g '*.hpp' -g '*.cpp'
- Confirm that there are no lingering references to `Lexer` in the codebase.
- Verify that all functionality previously provided by the lexer is now properly supported by the `Reader` interface.
0a57542 to db7d1e8
6374be7 to 47525dd
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- src/log_surgeon/Lexer.hpp (1 hunks)
- src/log_surgeon/Lexer.tpp (8 hunks)
- src/log_surgeon/LexicalRule.hpp (1 hunks)
- src/log_surgeon/SchemaParser.cpp (1 hunks)
- src/log_surgeon/finite_automata/DfaState.hpp (2 hunks)
- src/log_surgeon/finite_automata/DfaStatePair.hpp (1 hunks)
- src/log_surgeon/finite_automata/Nfa.hpp (2 hunks)
- tests/test-register-handler.cpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
- src/log_surgeon/finite_automata/DfaStatePair.hpp
- tests/test-register-handler.cpp
- src/log_surgeon/LexicalRule.hpp
- src/log_surgeon/finite_automata/DfaState.hpp
- src/log_surgeon/finite_automata/Nfa.hpp
- src/log_surgeon/Lexer.tpp
- src/log_surgeon/SchemaParser.cpp
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: Prefer `false == <expression>` rather than `!<expression>`.
src/log_surgeon/Lexer.hpp
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build (ubuntu-22.04, Release)
- GitHub Check: build (ubuntu-22.04, Debug)
🔇 Additional comments (2)
src/log_surgeon/Lexer.hpp (2)
132-138: LGTM! Consistent and safe map access pattern.
The methods follow a consistent pattern of using `contains` before `at` for safe map access, with clear documentation and proper error handling.
Also applies to: 145-151, 159-165
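A lookup that returns `std::optional` instead of throwing can be sketched as follows. Note the substitution: this variant uses a single `find` so that it also compiles as C++17, whereas the reviewed code is described as calling `contains` and then `at`. The function name `lookup_reg_id`, the map parameter, and the underlying integer types are assumptions for the example.

```cpp
#include <cstdint>
#include <map>
#include <optional>

using tag_id_t = std::uint32_t;  // Assumed underlying type, for illustration only.
using reg_id_t = std::uint32_t;  // Assumed underlying type, for illustration only.

// Returns the register mapped to `tag_id`, or `std::nullopt` if no mapping exists. The map is
// never accessed with a missing key, so no exception can be thrown and no entry is inserted.
[[nodiscard]] inline auto
lookup_reg_id(std::map<tag_id_t, reg_id_t> const& tag_to_reg, tag_id_t tag_id)
        -> std::optional<reg_id_t> {
    auto const it = tag_to_reg.find(tag_id);
    if (tag_to_reg.end() == it) {
        return std::nullopt;
    }
    return it->second;
}
```

Either form avoids `operator[]`, which would silently insert a default value for a missing key and cannot be used on a const map.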
172-191: LGTM! Well-structured implementation with proper error handling.
The implementation:
- Follows coding guidelines using `false == expression`
- Uses structured bindings for clarity
- Properly handles all error cases
- Maintains a clear flow of operations
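The flow praised here (look up the tag pair, then each register, returning early on any miss) can be sketched as below. `SketchTables`, the three function names, and the integer type aliases are hypothetical; the real methods are members of the Lexer and their exact signatures are not reproduced here.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <utility>

using capture_id_t = std::uint32_t;  // Assumed underlying type, for illustration only.
using tag_id_t = std::uint32_t;      // Assumed underlying type, for illustration only.
using reg_id_t = std::uint32_t;      // Assumed underlying type, for illustration only.

// Hypothetical container for the two lookup tables used in this sketch.
struct SketchTables {
    std::map<capture_id_t, std::pair<tag_id_t, tag_id_t>> capture_to_tags;
    std::map<tag_id_t, reg_id_t> tag_to_reg;
};

[[nodiscard]] inline auto find_tag_id_pair(SketchTables const& tables, capture_id_t capture_id)
        -> std::optional<std::pair<tag_id_t, tag_id_t>> {
    auto const it = tables.capture_to_tags.find(capture_id);
    if (tables.capture_to_tags.end() == it) {
        return std::nullopt;
    }
    return it->second;
}

[[nodiscard]] inline auto find_reg_id(SketchTables const& tables, tag_id_t tag_id)
        -> std::optional<reg_id_t> {
    auto const it = tables.tag_to_reg.find(tag_id);
    if (tables.tag_to_reg.end() == it) {
        return std::nullopt;
    }
    return it->second;
}

// Resolves a capture to its (start, end) registers, returning early at the first failed lookup.
[[nodiscard]] inline auto find_reg_ids(SketchTables const& tables, capture_id_t capture_id)
        -> std::optional<std::pair<reg_id_t, reg_id_t>> {
    auto const optional_tag_id_pair = find_tag_id_pair(tables, capture_id);
    if (false == optional_tag_id_pair.has_value()) {
        return std::nullopt;
    }
    auto const [start_tag_id, end_tag_id] = optional_tag_id_pair.value();

    auto const optional_start_reg_id = find_reg_id(tables, start_tag_id);
    if (false == optional_start_reg_id.has_value()) {
        return std::nullopt;
    }
    auto const optional_end_reg_id = find_reg_id(tables, end_tag_id);
    if (false == optional_end_reg_id.has_value()) {
        return std::nullopt;
    }
    return std::make_pair(optional_start_reg_id.value(), optional_end_reg_id.value());
}
```

The sketch returns `std::make_pair(...)` explicitly so the conversion to `std::optional<std::pair<...>>` is unambiguous.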
next to dest_state and next() to get_dest_state().
dfc0ab0 to d69c450
next to dest_state and next() to get_dest_state().DfaState method next to get_dest_state.
References
Description
Rename `next` and `next()` to `dest_state` and `get_dest_state()`, respectively.
Validation performed
Summary by CodeRabbit
New Features
Refactor
Tests