refactor: Move epsilon_closure from Lexer to NfaState; Refactor nfa_to_dfa as Dfa's constructor.#71
Conversation
Co-authored-by: Lin Zhihao <[email protected]>
Co-authored-by: Lin Zhihao <[email protected]>
Co-authored-by: Lin Zhihao <[email protected]>
Co-authored-by: Lin Zhihao <[email protected]>
Co-authored-by: Lin Zhihao <[email protected]>
Co-authored-by: Lin Zhihao <[email protected]>
Co-authored-by: Lin Zhihao <[email protected]>
Co-authored-by: Lin Zhihao <[email protected]>
Co-authored-by: Lin Zhihao <[email protected]>
Co-authored-by: Lin Zhihao <[email protected]>
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (4)
src/log_surgeon/finite_automata/NfaState.hpp (1)
97-100: Documentation could be more descriptive.While the documentation indicates the return value, it would be beneficial to include:
- The method's purpose
- Time complexity
- Whether the method modifies the state
Consider expanding the documentation:
/** + * Computes the epsilon closure of the current state using depth-first traversal. + * The epsilon closure includes all states reachable through epsilon transitions and + * tagged transitions (as per TODO comment in implementation). + * + * Time complexity: O(V + E) where V is the number of states and E is the number + * of transitions. + * + * @note This method does not modify the current state. * @return The set of all states reachable from the current state via epsilon transitions. */src/log_surgeon/Lexer.tpp (1)
382-382: Consider documenting the DFA construction behaviour.The TODO comment indicates that the DFA ignores tags, which is an important implementation detail that should be documented more formally.
Consider adding a formal documentation comment:
+ /** + * Generates a DFA from the NFA, ignoring tags. + * @note Currently, tags are ignored in the DFA construction. For example, + * "capture:user=(?<user_id>\d+)" is treated as "capture:user=\d+" + */ m_dfa = std::make_unique<finite_automata::Dfa<TypedDfaState>>(std::move(nfa));src/log_surgeon/finite_automata/Dfa.hpp (2)
17-17: Consider acceptingnfaby rvalue reference to improve performanceCurrently, the constructor accepts
Nfa<NfaStateType> nfaby value, which may introduce unnecessary copying. If the intention is to transfer ownership ofnfa, consider accepting it by rvalue reference (Nfa<NfaStateType>&& nfa) and moving it to avoid unnecessary copies.
46-46: Useusinginstead oftypedeffor type aliasingTo adhere to modern C++ conventions, consider replacing
typedefwithusingfor type aliasing:- typedef std::set<TypedNfaState const*> StateSet; + using StateSet = std::set<TypedNfaState const*>;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
examples/intersect-test.cpp(3 hunks)src/log_surgeon/Buffer.hpp(1 hunks)src/log_surgeon/Lalr1Parser.tpp(26 hunks)src/log_surgeon/Lexer.hpp(0 hunks)src/log_surgeon/Lexer.tpp(10 hunks)src/log_surgeon/LogEvent.cpp(2 hunks)src/log_surgeon/LogParser.cpp(1 hunks)src/log_surgeon/Parser.tpp(2 hunks)src/log_surgeon/ParserInputBuffer.cpp(2 hunks)src/log_surgeon/Schema.cpp(1 hunks)src/log_surgeon/SchemaParser.cpp(2 hunks)src/log_surgeon/finite_automata/Dfa.hpp(2 hunks)src/log_surgeon/finite_automata/NfaState.hpp(3 hunks)
💤 Files with no reviewable changes (1)
- src/log_surgeon/Lexer.hpp
✅ Files skipped from review due to trivial changes (6)
- src/log_surgeon/LogEvent.cpp
- src/log_surgeon/Schema.cpp
- src/log_surgeon/SchemaParser.cpp
- src/log_surgeon/LogParser.cpp
- src/log_surgeon/Buffer.hpp
- src/log_surgeon/ParserInputBuffer.cpp
🧰 Additional context used
📓 Path-based instructions (3)
src/log_surgeon/finite_automata/NfaState.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.
src/log_surgeon/finite_automata/Dfa.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.
examples/intersect-test.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build (ubuntu-latest, Release)
- GitHub Check: build (ubuntu-latest, Debug)
🔇 Additional comments (7)
src/log_surgeon/finite_automata/NfaState.hpp (2)
8-8: LGTM!The addition of the
<stack>header is necessary for the newepsilon_closure()method implementation.
174-207: Review the TODO comment and tagged transitions handling.The implementation treats tagged transitions as epsilon transitions (line 189), but this is marked as a TODO. This design decision should be reviewed as it might affect the correctness of the epsilon closure computation.
Let's verify the usage of tagged transitions:
Additionally:
- The implementation correctly handles cycles using a visited set (
closure_set).- The use of
false == stack.empty()andfalse == closure_set.insert(current_state).secondfollows the coding guidelines for boolean comparisons.✅ Verification successful
The treatment of tagged transitions as epsilon transitions is intentional and correct.
The TODO comment appears to be a documentation note rather than a pending task. This behaviour is:
- Part of the regex capture group implementation design
- Well-documented in RegexAST.hpp
- Consistently implemented across the codebase
- Validated by tests
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for other implementations or tests that rely on this behaviour rg -A 5 "tagged.*transition.*epsilon" --type cppLength of output: 619
Script:
#!/bin/bash # Search for tests involving tagged transitions rg -l "TaggedTransition" test/ # Search for documentation about tagged transitions fd README.md | xargs rg "tagged.*transition" -A 3 # Search for related discussions in comments rg "tagged.*transition" --type cpp -g "!test/*" -A 3Length of output: 25622
src/log_surgeon/Lexer.tpp (2)
22-22: LGTM! Consistent use of trailing return types.The change to use
auto -> voidfor method return types is consistent with modern C++ practices and improves readability.Also applies to: 69-72, 78-81, 86-89, 153-153, 163-163, 213-214, 270-270, 302-302, 379-380, 416-416, 422-422, 428-428, 645-645
45-45: LGTM! Consistent use ofautofor variable declarations.The change to use
autofor variable declarations improves maintainability and follows modern C++ type inference practices.Also applies to: 69-71, 85-85, 177-177, 201-201, 241-241, 256-256, 277-278, 283-286, 303-304, 321-322, 335-338, 348-351, 366-368, 382-383, 390-393, 407-408, 429-432, 477-477, 519-519, 559-559, 575-579, 582-583, 594-594, 612-614, 629-629, 638-638, 661-661, 666-666, 691-692, 714-714, 735-736
src/log_surgeon/Lalr1Parser.tpp (1)
69-72: LGTM! Consistent modernization of C++ syntax.The changes to use trailing return types and
autofor variable declarations are consistent with modern C++ practices and improve code maintainability.Also applies to: 78-81, 86-89, 153-153, 163-166, 173-177, 182-184, 213-214, 220-220, 241-243, 270-270, 277-278, 283-286, 302-304, 308-308, 321-322, 335-338, 348-351, 366-368, 379-380, 383-383, 390-393, 407-408, 416-416, 422-422, 428-430, 432-432, 440-440, 477-477, 519-519, 530-530, 559-559, 575-579, 582-583, 594-594, 612-614, 629-629, 638-638, 645-645, 661-661, 666-666, 691-692, 714-714, 735-736
src/log_surgeon/Parser.tpp (1)
33-36: Use ofautowith trailing return type-> voidis acceptableWhile using
autowith a trailing return type for functions returningvoidmay be unconventional, it's acceptable if it aligns with the project's coding style. Consistency across the codebase can enhance readability.examples/intersect-test.cpp (1)
44-44: Confirm safety of movingnfaintodfa2At line 44,
nfais moved intodfa2usingstd::move(nfa). Please ensure thatnfais not used after this operation to prevent undefined behaviour.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
src/log_surgeon/finite_automata/Dfa.hpp (3)
50-56: Consider optimizing lambda captures.The lambda
create_dfa_statecapturesthis,dfa_states, andunmarked_setsby reference. Sincethisis only used for callingnew_state, consider capturing it by value for better clarity of ownership.- auto create_dfa_state - = [this, &dfa_states, &unmarked_sets](StateSet const& set) -> TypedDfaState* { + auto create_dfa_state + = [this = this, &dfa_states, &unmarked_sets](StateSet const& set) -> TypedDfaState* {
64-74: Consider extracting transition mapping logic.The nested loops for building the transitions map could be extracted into a separate method for better readability and maintainability, especially when UTF-8 support is added.
Consider creating a helper method:
template <typename TypedNfaState> auto build_transitions_map(StateSet const& set) -> std::map<uint32_t, StateSet> { std::map<uint32_t, StateSet> ascii_transitions_map; for (auto const* s0 : set) { for (uint32_t i = 0; i < cSizeOfByte; i++) { for (auto* const s1 : s0->get_byte_transitions(i)) { StateSet closure = s1->epsilon_closure(); ascii_transitions_map[i].insert(closure.begin(), closure.end()); } } } return ascii_transitions_map; }
73-73: Track UTF-8 support requirements.Multiple TODO comments indicate pending UTF-8 support implementation. These should be tracked in the issue tracker.
Would you like me to create GitHub issues to track the UTF-8 support implementation requirements?
Also applies to: 90-90
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/log_surgeon/finite_automata/Dfa.hpp(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/log_surgeon/finite_automata/Dfa.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.
🔇 Additional comments (1)
src/log_surgeon/finite_automata/Dfa.hpp (1)
10-10: LGTM! Well-structured constructor declaration.The constructor is properly templated and takes advantage of move semantics by accepting the NFA by value. The include statement is appropriately placed.
Also applies to: 16-17
LinZhihao-723
left a comment
There was a problem hiding this comment.
For the PR title, how about:
refactor: Move `epsilon_closure` from `Lexer` to `NfaState`; Refactor `nfa_to_dfa` as `Dfa`'s constructor.
epsilon_closure and nfa_to_dfa from Lexer class to Dfa class.epsilon_closure from Lexer to NfaState; Refactor nfa_to_dfa as Dfa's constructor.
References
Description
epsilon_closurefromLexerclass toNfaStateclass.nfa_to_dfainLexerclass toDfaclass constructor.Validation performed
Previously existing tests succeed.
Summary by CodeRabbit
Release Notes
Refactor
autotype inferenceNew Features
epsilon_closure()method to compute reachable states via epsilon transitionsPerformance