feat: Differentiate single and multi valued tag operations. by SharafMohamed · Pull Request #104 · y-scope/log-surgeon

SharafMohamed · 2025-04-09T21:53:47Z

Description

Previously, all tags were considered to be multi-valued. This has performance implications, as a fresh prefix tree needs to be created for every token during lexing.

Tags are now required to be multi-valued only when they appear inside a repetition in the regex; otherwise they are single-valued. This is tracked using a flag in the set of tag operations.
This is an intermediate step and has no direct performance or functionality implications. In the following PRs, this change will be propagated to the register creation during determinization and register use during simulation.

Also performed some cleanup of the existing docstrings and consolidated repeated code in the relevant unit-tests.

Validation performed

Added an NFA unit-test case that has a capture inside a repetition.

Summary by CodeRabbit

New Features
- Enhanced pattern matching to support more robust multi-capture scenarios and improved handling of repeated structures.
Refactor
- Streamlined internal logic for transitions and tagging operations to boost overall processing clarity and efficiency.
Tests
- Added new test cases covering both simple and complex repetition patterns to ensure reliable performance.
Chores
- Updated external dependency references to maintain seamless integration.

coderabbitai · 2025-04-09T21:53:54Z

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

Walkthrough

This pull request updates several components of the NFA construction and testing workflow. In particular, method signatures and constructor calls in the NFA-related classes are modified to include new boolean parameters—such as for handling multi-valued transitions and repetition contexts. Additionally, a previously existing file containing tagged transition definitions has been removed. Test cases are refactored to use a common testing function, and a subproject commit reference is updated.

Changes

| File(s) | Change Summary |
|---|---|
| `src/log_surgeon/LexicalRule.hpp` | Modified the call in `add_to_nfa` to pass an additional `false` boolean parameter to `add_to_nfa_with_negative_captures`. |
| `src/log_surgeon/finite_automata/Nfa.hpp`, `src/log_surgeon/finite_automata/NfaState.hpp` | Added a `multi_valued` boolean parameter to method signatures and constructors to support multi-valued transitions and state handling. |
| `src/log_surgeon/finite_automata/RegexAST.hpp` | Updated the `add_to_nfa` and `add_to_nfa_with_negative_captures` method signatures (and those in derived classes) to include a `descendent_of_repetition` flag. |
| `src/log_surgeon/finite_automata/TagOperation.hpp` | Changed the constructor to include a `multi_valued` flag, added an `is_multi_valued` accessor, and updated the serialization format. |
| `src/log_surgeon/finite_automata/TaggedTransition.hpp` | Removed the file containing definitions for `PositiveTaggedTransition` and `NegativeTaggedTransition`. |
| `tests/test-nfa.cpp` | Introduced the `test_nfa` function to encapsulate NFA generation and comparison logic, refactoring existing tests and adding new cases for repetition NFAs. |
| `tools/yscope-dev-utils` | Updated the subproject commit reference from an older commit to a new one. |

Sequence Diagram(s)

sequenceDiagram
  participant LR as LexicalRule
  participant RA as RegexAST
  participant N as Nfa
  participant NS as NfaState
  participant TO as TagOperation

  LR->>RA: add_to_nfa(nfa, end_state, false)
  Note right of RA: Determines repetition context via descendent_of_repetition flag
  RA->>N: add_to_nfa_with_negative_captures(nfa, end_state, flag)
  N->>NS: Create new state (using multi_valued flag)
  NS->>TO: add_spontaneous_transition(..., multi_valued)
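
The flow in the diagram can be sketched in C++ roughly as follows. This is a minimal illustration, not log-surgeon's code: the node classes, `TagRecord`, `collect_tags`, and `collect_tags_from_root` are hypothetical stand-ins for the `RegexAST` hierarchy. Only the flag name `descendent_of_repetition` and the fact that the top-level call passes `false` come from the PR; the assumption that a repetition node switches the flag to `true` for its subtree is inferred from the review comments.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record of one capture's tag and whether it is multi-valued.
struct TagRecord {
    std::string capture_name;
    bool multi_valued;
};

// Hypothetical, simplified AST base class (not log-surgeon's RegexAST).
class Node {
public:
    virtual ~Node() = default;
    virtual auto
    collect_tags(bool descendent_of_repetition, std::vector<TagRecord>& out) const -> void
            = 0;
};

class Literal : public Node {
public:
    // Literals create no tags, so the flag is ignored.
    auto collect_tags(bool, std::vector<TagRecord>&) const -> void override {}
};

class Capture : public Node {
public:
    Capture(std::string name, std::unique_ptr<Node> child)
            : m_name{std::move(name)},
              m_child{std::move(child)} {}

    auto
    collect_tags(bool descendent_of_repetition, std::vector<TagRecord>& out) const
            -> void override {
        // Assumption: a capture is multi-valued exactly when it sits under a repetition.
        out.push_back({m_name, descendent_of_repetition});
        m_child->collect_tags(descendent_of_repetition, out);
    }

private:
    std::string m_name;
    std::unique_ptr<Node> m_child;
};

class Repetition : public Node {
public:
    explicit Repetition(std::unique_ptr<Node> child) : m_child{std::move(child)} {}

    auto collect_tags(bool, std::vector<TagRecord>& out) const -> void override {
        // Everything below a repetition is a descendant of one.
        m_child->collect_tags(true, out);
    }

private:
    std::unique_ptr<Node> m_child;
};

class Concat : public Node {
public:
    Concat(std::unique_ptr<Node> left, std::unique_ptr<Node> right)
            : m_left{std::move(left)},
              m_right{std::move(right)} {}

    auto
    collect_tags(bool descendent_of_repetition, std::vector<TagRecord>& out) const
            -> void override {
        // Non-repetition nodes forward the flag unchanged.
        m_left->collect_tags(descendent_of_repetition, out);
        m_right->collect_tags(descendent_of_repetition, out);
    }

private:
    std::unique_ptr<Node> m_left;
    std::unique_ptr<Node> m_right;
};

// Mirrors the top-level call in the diagram, which starts with `false`.
inline auto collect_tags_from_root(Node const& root) -> std::vector<TagRecord> {
    std::vector<TagRecord> tags;
    root.collect_tags(false, tags);
    return tags;
}
```

Under these assumptions, a tree shaped like `(?<a>x)((?<b>y))*` would yield a single-valued record for `a` and a multi-valued record for `b`.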

Possibly related PRs

feat: Split NFA positive tags into start and end transitions to encapsulate a capture group. #50: Updates to the add_to_nfa_with_negative_captures method in the RegexAST class are strongly related to the current modifications.
refactor: Remove redundant this when referring to member variables. #60: Involves changes in the add_to_nfa method of the LexicalRule class that mirror the additional boolean parameter introduced here.
refactor: Unify all types of NFA transitions into NfaSpontaneousTransition. #76: Addresses similar updates to method signatures in both add_to_nfa and add_to_nfa_with_negative_captures, matching the current PR’s focus on parameter handling.

Suggested reviewers

LinZhihao-723

SharafMohamed · 2025-04-09T21:58:04Z

@coderabbitai review

coderabbitai · 2025-04-09T21:58:13Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (3)

src/log_surgeon/finite_automata/TagOperation.hpp (1)

43-52: Improved switch statement implementation

The code now directly assigns values to type_char instead of using a default case, which improves code clarity.

However, you should consider using false == m_negative_captures.empty() instead of !m_negative_captures.empty() to match the coding style used elsewhere in the codebase.
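
As a small illustration of the style being suggested, the two spellings below are equivalent in behaviour; only the form differs. The helper name `has_negative_captures` and its parameter type are hypothetical and chosen just for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper; the parameter stands in for a member such as m_negative_captures.
inline auto has_negative_captures(std::vector<std::uint32_t> const& negative_captures) -> bool {
    // Style preferred by the repository guideline: `false == <expression>`.
    // Equivalent to `!negative_captures.empty()`.
    return false == negative_captures.empty();
}
```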

src/log_surgeon/finite_automata/NfaState.hpp (1)

73-73: Updated TagOperation constructor call

The TagOperation constructor call now includes the multi_valued parameter, correctly propagating the multi-valued state to the tag operations.

Per your coding guidelines, consider using false == stack.empty() instead of !stack.empty() on line 187 to maintain consistency in the codebase.

src/log_surgeon/finite_automata/RegexAST.hpp (1)

74-85: Docstring clarifies repetition context
The expanded explanation for how repetition affects node processing is valuable. Consider using “descendant” (with an 'a') for maximum clarity.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6d69eb8 and 212cd5a.

📒 Files selected for processing (8)

src/log_surgeon/LexicalRule.hpp (1 hunks)
src/log_surgeon/finite_automata/Nfa.hpp (4 hunks)
src/log_surgeon/finite_automata/NfaState.hpp (2 hunks)
src/log_surgeon/finite_automata/RegexAST.hpp (18 hunks)
src/log_surgeon/finite_automata/TagOperation.hpp (2 hunks)
src/log_surgeon/finite_automata/TaggedTransition.hpp (0 hunks)
tests/test-nfa.cpp (3 hunks)
tools/yscope-dev-utils (1 hunks)

💤 Files with no reviewable changes (1)

src/log_surgeon/finite_automata/TaggedTransition.hpp

🧰 Additional context used

📓 Path-based instructions (1)

`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: Prefer `false == <expression>` rather than `!<expression>`.

tests/test-nfa.cpp
src/log_surgeon/LexicalRule.hpp
src/log_surgeon/finite_automata/TagOperation.hpp
src/log_surgeon/finite_automata/Nfa.hpp
src/log_surgeon/finite_automata/NfaState.hpp
src/log_surgeon/finite_automata/RegexAST.hpp

🧬 Code Graph Analysis (2)

src/log_surgeon/LexicalRule.hpp (1)

src/log_surgeon/finite_automata/RegexAST.hpp (10): nfa (86-90), nfa (133-150), nfa (133-137), nfa (254-258), nfa (305-309), nfa (411-415), nfa (496-500), nfa (557-561), nfa (619-623), nfa (715-719)

src/log_surgeon/finite_automata/Nfa.hpp (1)

src/log_surgeon/finite_automata/NfaState.hpp (2): dest_state (60-62), dest_state (60-60)

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: build (ubuntu-22.04, Release)

🔇 Additional comments (36)

tools/yscope-dev-utils (1)

1-1: Subproject Commit Identifier Updated

The commit identifier has been updated to a2a363490690cd3728009859828caa2f742c636f. This change is a straightforward update to the subproject reference and does not affect the functionality of the tool.

src/log_surgeon/LexicalRule.hpp (1)

47-47: Added missing parameter to function call

The addition of a third parameter (false) to the add_to_nfa_with_negative_captures function call correctly implements the differentiation between single-valued and multi-valued tags. This change indicates that tags created during this operation are not descendants of a repetition and should be treated as single-valued.

tests/test-nfa.cpp (5)

24-62: Good refactoring: created a test helper function

Great job extracting common test logic into a reusable function. This refactoring improves code maintainability by reducing duplication and making it easier to add new test cases in the future.
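
For readers unfamiliar with this kind of consolidation, a comparison helper might look roughly like the sketch below. It is not the PR's `test_nfa`: the function name `first_mismatched_line` is hypothetical, and the sketch assumes only that expected and actual serializations are multi-line strings compared line by line.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: returns the 1-based number of the first line where the two
// serializations differ, or std::nullopt when they match line for line.
inline auto first_mismatched_line(std::string const& actual, std::string const& expected)
        -> std::optional<std::size_t> {
    std::istringstream actual_stream{actual};
    std::istringstream expected_stream{expected};
    std::string actual_line;
    std::string expected_line;
    std::size_t line_number{0};
    while (true) {
        bool const has_actual{static_cast<bool>(std::getline(actual_stream, actual_line))};
        bool const has_expected{
                static_cast<bool>(std::getline(expected_stream, expected_line))
        };
        ++line_number;
        if (false == has_actual && false == has_expected) {
            // Both inputs ended at the same point without a difference.
            return std::nullopt;
        }
        if (has_actual != has_expected || actual_line != expected_line) {
            return line_number;
        }
    }
}
```

Each test case can then reduce to one call with its own input and expected output, which is the kind of deduplication this comment refers to.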

64-83: Test case refactored appropriately

The test case has been successfully refactored to use the new test_nfa helper function while maintaining the same test logic and expected outputs.

85-114: Complex test case refactored appropriately

The complex test case has been successfully refactored to use the new test_nfa helper function without changing the test's intent or validation logic.

116-131: New test case for simple repetition

Excellent addition of a test case for simple repetition in regex patterns. This test validates the new multi-valued tag functionality when dealing with repetition operators (* and +). Notice how the serialized output includes operations marked with + to indicate multi-valued tags.

133-149: New test case for complex repetition

Great addition of a test case for complex repetition (nested repetition) which further validates the multi-valued tag functionality in more complex scenarios. This test provides good coverage for the new feature.

src/log_surgeon/finite_automata/TagOperation.hpp (3)

20-23: Constructor updated for multi-valued support

The constructor has been appropriately updated to include the new multi_valued parameter, which aligns with the PR objectives of differentiating between single-valued and multi-valued tags.

37-37: Added accessor for multi-valued property

Good addition of a getter method to access the multi-valued state of the tag operation. This follows good encapsulation practices.

58-58: Added new member variable for multi-valued functionality

The new member variable m_multi_valued has been added to store the multi-valued state of the tag operation, which is required for the new functionality.
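
Taken together, the three changes above could look something like the following sketch. It is an approximation under stated assumptions, not the contents of `TagOperation.hpp`: the class name `TagOperationSketch`, the enum, the tag id type, and the `'p'`/`'n'` type characters are hypothetical. The `multi_valued` constructor parameter, the `is_multi_valued` accessor, the `m_multi_valued` member, and the `+` marker for multi-valued operations in the serialized form are taken from this review.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical operation kinds for the sketch.
enum class TagOperationType : std::uint8_t {
    Set,
    Negate
};

class TagOperationSketch {
public:
    TagOperationSketch(std::uint32_t tag_id, TagOperationType type, bool multi_valued)
            : m_tag_id{tag_id},
              m_type{type},
              m_multi_valued{multi_valued} {}

    [[nodiscard]] auto is_multi_valued() const -> bool { return m_multi_valued; }

    // Serializes as "<id><type char>" with a trailing '+' when multi-valued.
    [[nodiscard]] auto serialize() const -> std::string {
        char type_char{'?'};
        switch (m_type) {
            case TagOperationType::Set:
                type_char = 'p';  // assumed character for a set operation
                break;
            case TagOperationType::Negate:
                type_char = 'n';  // assumed character for a negate operation
                break;
        }
        auto serialized = std::to_string(m_tag_id) + type_char;
        if (m_multi_valued) {
            serialized += '+';
        }
        return serialized;
    }

private:
    std::uint32_t m_tag_id;
    TagOperationType m_type;
    bool m_multi_valued;
};
```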

src/log_surgeon/finite_automata/NfaState.hpp (3)

53-54: Constructor parameter added for multi-valued support

The constructor now accepts an additional parameter multi_valued to support the differentiation between single-valued and multi-valued tags. This aligns with the PR objectives.

57-57: Updated function call with multi-valued parameter

The call to add_spontaneous_transition has been correctly updated to pass the multi_valued parameter, ensuring consistent behavior throughout the class.

67-68: Method signature updated for multi-valued support

The add_spontaneous_transition method has been updated to include the multi_valued parameter. This change is consistent with the overall implementation of the multi-valued tag feature.

src/log_surgeon/finite_automata/Nfa.hpp (7)

56-56: Documentation update looks good
The explanation for the new parameter is concise and aligns well with the existing style.

62-64: Multi-valued parameter introduction
This signature addition is coherent. Please confirm that all call sites referencing this function are updated accordingly.

69-69: Straightforward parameter specification
The multi_valued doc comment is clear and consistent with the rest of the documentation.

167-168: Validate the multi_valued usage for negative captures
Ensuring negative captures can also be multi-valued is crucial. Please confirm that tests account for multi-valued negative capture scenarios.

Also applies to: 181-182

190-191: Extension of signature for positive captures
Introducing multi_valued for paralleling negative captures is logical.

193-193: Inclusion of multi_valued in spontaneous transition calls
Propagating multi_valued to transitions looks properly integrated. This enhances flexibility for capture tracking.

Also applies to: 195-200

205-206: Uniform parameter usage in second state
Applying multi_valued to the end state creation maintains consistency with the start state. Nicely handled.

src/log_surgeon/finite_automata/RegexAST.hpp (16)

86-90: Function signature updated for repetition context
The new boolean parameter is properly introduced. Please verify that all derived classes implement it consistently.

120-132: Enhanced documentation for negative captures under repetition
These clarifications help in understanding how negative captures behave when repeated.

133-137: Extended documentation on add_to_nfa_with_negative_captures
Explaining repetition usage with negative captures aids in maintenance and clarity.

141-145: Ensuring correctness in new_state_from_negative_captures usage
Passing descendent_of_repetition here matches the revised design. Looks good.

146-146: Fallback call to add_to_nfa
Continuing to the original add_to_nfa method when there are no negative captures is correct and does not break existing flow.

Also applies to: 148-148

212-214: add_to_nfa signature for RegexASTEmpty
No additional repetition-specific logic is needed here, which is consistent with an empty node’s behaviour.

254-258: RegexASTLiteral repetition parameter introduced
Handling a literal with the repetition flag does not pose issues given the current implementation.

305-309: Unsupported integer AST handling
Throwing an exception for integer AST nodes is in line with prior restrictions. No problems detected.

412-415: Extended signature for group node
Even though the code here does not overtly use the repetition flag, adding the parameter retains consistency.

497-500: Handling repetition in OR logic
The updated approach for negative captures under repetition is sound.

557-561: Concatenate node extended for repetition
Saving and restoring root while introducing an intermediate state is a well-structured approach.

619-623: Repeat operator logic with descendent_of_repetition set
The code consistently passes repetition context through the multiplication logic.

715-719: Capture node updated to handle repetition
Forwarding the repetition flag into the capture’s NFA logic aligns well with the rest of the codebase.

930-934: Signature adjustment for capture AST
Introducing the repetition parameter ensures uniformity throughout all AST nodes.

966-970: Invoking new_start_and_end_states_from_positive_capture with repetition
The usage of descendent_of_repetition for multi_valued capturing is consistent with the code design.

975-976: Ensuring repetition flag is passed to nested capture AST
Chaining the repetition context down to nested calls maintains correctness for deeper captures.

SharafMohamed added 6 commits April 2, 2025 13:22

Set up NFA to handle single-valued tags.

c7cd662

Merge branch 'main' into single-valued-nfa

c7523e8

Merge branch 'main' into single-valued-nfa

a1548c4

Forward multi_valued argument to tag creation functions.

1225886

Finish forwarding descendaent_of_repetition to tag creation functions…

247d6b6

…; Add TaggedTransition.hpp to cmake.

Add unit-test for multi-valued tag; Finish handling multi-valued tags…

61bc7ef

…; Delete TaggedTransition file thats not used anymore.

SharafMohamed requested a review from a team as a code owner April 9, 2025 21:53

SharafMohamed changed the title ~~feat: Implement single valued tag operations.~~ feat: Differentiate single and multi valued tag operations. Apr 9, 2025

Add another unit-test for capture repetition regex.

212cd5a

coderabbitai bot reviewed Apr 9, 2025

SharafMohamed added 2 commits April 13, 2025 20:10

Update yscope-dev-utils.

447491f

Revert yscope-dev-utils to mains commit.

aa6e3e6

davidlion approved these changes Apr 25, 2025

SharafMohamed merged commit f9b1eac into y-scope:main Apr 28, 2025
7 checks passed

SharafMohamed merged 9 commits into y-scope:main from SharafMohamed:single-valued-nfa
