Open
Labels
enhancement (New feature or request)
Description
Request
Currently, a variable's regex can contain capture groups inside a repetition. However, parsing returns only the last match of each capture instead of all matches.
- Essentially, for multi-valued tags (e.g. ([a]+=(?<val>[a0]+),){4}), we fail to track all instances of the tag.
- More generally, if a variable is ambiguous, the DFA only tracks the tags for one interpretation. If the user chooses a different interpretation after lexing, the tag positions will be incorrect.
- Additionally, if a variable is partially ambiguous, such that only a prefix is ambiguous, similar problems can occur: the DFA begins by tracking one interpretation's tags, and if that interpretation proves incorrect for the full variable, the tags needed by the correct interpretation were never tracked.
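The multi-valued case in the first bullet can be illustrated outside the parser. The sketch below uses `std::regex` purely as a stand-in (it is not the DFA discussed here and has no named groups, so group 2 plays the role of `val`). The helper names `to_span`, `last_capture_only`, and `all_captures` are hypothetical. It contrasts the current "last match only" result with the full list of positions this request asks for.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Half-open [begin, end) offsets of one capture within the input.
using Span = std::pair<std::size_t, std::size_t>;

// Converts a sub-match into offsets relative to the start of `input`.
inline Span to_span(std::string const& input, std::ssub_match const& sub) {
    auto const begin = static_cast<std::size_t>(sub.first - input.cbegin());
    return {begin, begin + static_cast<std::size_t>(sub.length())};
}

// Matches the whole repetition at once. A capture group inside a quantifier
// keeps only the value from its final iteration, so at most one span is
// returned.
inline std::vector<Span> last_capture_only(std::string const& input) {
    static std::regex const cRepeated{"([A-Za-z]+=([A-Za-z0-9]+),){4}"};
    std::vector<Span> spans;
    std::smatch match;
    if (std::regex_match(input, match, cRepeated)) {
        spans.push_back(to_span(input, match[2]));
    }
    return spans;
}

// Walks the input one key-value pair at a time and records every capture,
// which is the multi-valued result the request describes.
inline std::vector<Span> all_captures(std::string const& input) {
    static std::regex const cSingle{"[A-Za-z]+=([A-Za-z0-9]+),"};
    std::vector<Span> spans;
    std::sregex_iterator const end;
    for (std::sregex_iterator it(input.cbegin(), input.cend(), cSingle); it != end; ++it) {
        spans.push_back(to_span(input, (*it)[1]));
    }
    return spans;
}
```

For the input "userID=123,age=30,height=70,weight=100,", `last_capture_only` yields only the span of "100", while `all_captures` yields one span per value in left-to-right order.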
Possible implementation
Fix this test case:
/**
 * @ingroup test_buffer_parser_newline_vars
 *
 * @brief Test capture group repetition and backtracking.
 *
 * @details
 * This test checks `BufferParser`'s handling of a variable with a regex containing capture groups
 * repeated multiple times. It verifies the positions of captured subgroups within the parsed token
 * and ensures correct tokenization of the repeated pattern.
 *
 * @section schema Schema Definition
 * @code
 * delimiters: \n\r\[:,)
 * keyValuePairs:([A-Za-z]+=(?<val>[a-zA-Z0-9]+),){4}
 * @endcode
 *
 * @section input Test Input
 * @code
 * "userID=123,age=30,height=70,weight=100,"
 * @endcode
 *
 * @section expected_logtype Expected Logtype
 * @code
 * "userID=<val>,age=<val>,height=<val>,weight=<val>,"
 * @endcode
 *
 * @section expected_tokenization Expected Tokenization
 * @code
 * "userID=123,age=30,height=70,weight=100," -> "keyValuePairs" with:
 * "123" -> "val", "30" -> "val", "70" -> "val", "100" -> "val"
 * @endcode
 */
TEST_CASE("Test buffer parser with capture group repetition and backtracking", "[BufferParser]") {
    constexpr string_view cDelimitersSchema{R"(delimiters: \n\r\[:,)"};
    constexpr string_view cVarSchema{"keyValuePairs:([A-Za-z]+=(?<val>[a-zA-Z0-9]+),){4}"};
    constexpr string_view cInput{"userID=123,age=30,height=70,weight=100,"};
    ExpectedEvent const expected_event{
            .m_logtype{R"(userID=<val>,age=<val>,height=<val>,weight=<val>,)"},
            .m_timestamp_raw{""},
            .m_tokens{
                    {{"userID=123,age=30,height=70,weight=100,",
                      "keyValuePairs",
                      // Start and exclusive end offsets of each `val` capture, last match first.
                      // "100" spans [35, 38), so its end offset is 38.
                      {{{"val", {{35, 25, 15, 7}, {38, 27, 17, 10}}}}}}}
            }
    };
    Schema schema;
    schema.add_delimiters(cDelimitersSchema);
    schema.add_variable(cVarSchema, -1);
    BufferParser buffer_parser{std::move(schema.release_schema_ast_ptr())};
    parse_and_validate(buffer_parser, cInput, {expected_event});
    // TODO: add backtracking case
}