feat: Add Query class to compute QueryInterpretations from a user specified query.
#152
Merged
181 commits
40c99c0
Add QueryIntepretation to log-surgeon with major modification to impr…
SharafMohamed 43f9730
Remove is_encoded; Remove constexpr of unused vars; In append_logtype…
SharafMohamed 45978ce
Gaurd against empty strings; Rename for clarity; Lint.
SharafMohamed 33664c7
Add unit-tests; Rename m_logtype to m_tokens; Rename append_logtype t…
SharafMohamed 3854c27
Add unit-test for appending an empty query interpretation.
SharafMohamed 7aec173
rename qi to query_interpretation.
SharafMohamed beeb54e
Add initial WildcardExpression code.
SharafMohamed e8fe2e8
Split classes into seperate files.
SharafMohamed 8415c56
Merge branch 'main' into QueryInterpretation
SharafMohamed 8673254
Pass clang-tidy on new tests.
SharafMohamed 88eae06
Merge branch 'main' into QueryInterpretation
SharafMohamed eb1d85e
Merge branch 'QueryInterpretation' into WildcardExpression
SharafMohamed 35729d9
Merge branch 'main' into QueryInterpretation
davidlion da28956
Merge branch 'main' into WildcardExpression
SharafMohamed d99ce89
Rename query_parser to wildcard_query_parser.
SharafMohamed b4d3f1e
Use three-way comparitor in StaticQuery class.
SharafMohamed ebc51a1
Use three-way comparitor in VariableQueryToken class.
SharafMohamed b60f628
Use three-way comparitor in QueryInterpretation class.
SharafMohamed 5c34255
Fix format errors.
SharafMohamed 169c5af
Fix tidy errors.
SharafMohamed fad4702
Update test-static-query-token.cpp for three-way operator.
SharafMohamed e331aea
Move comparison helpers into its own file.
SharafMohamed 5d34f84
Use a duplicate of the last token so it looks nicer.
SharafMohamed ecd985f
Update test-variable-query-token.cpp for three-way operator.
SharafMohamed ea9c9cd
Fix format errors.
SharafMohamed 6574f80
Add operator== for VariableQueryToken.
SharafMohamed b6a3c3b
Add operator== for QueryInterpretation.
SharafMohamed 79a749d
Update docstring and name of comparison unit tests.
SharafMohamed cfef204
Rename token variables to end in the word token.
SharafMohamed bd297e2
Rename token variables to end in the word token for VariableQueryToke…
SharafMohamed 0cb9217
Update test-query-interpretation.cpp for three-way comparitor.
SharafMohamed d620be6
Fix tidy errors.
SharafMohamed db7c86a
Fix some naming.
SharafMohamed 410adba
Clean up variable query tokens unit-tests.
SharafMohamed 200b977
Fix interpretation initialization that accidentally used parentheis i…
SharafMohamed bc31dd7
Add missing include.
SharafMohamed 5d4fd82
Add const.
SharafMohamed f4e3b63
Clean up static query token's unit-tests.
SharafMohamed f13251e
Deduplicate test code by moving it into utils file.
SharafMohamed 0020744
Merge branch 'main' into QueryInterpretation
SharafMohamed 47a64f1
Fix docstrings to refer to the query instead of logs.
SharafMohamed ffe0171
Remove redundancy in descriptions.
SharafMohamed 6e3684f
Prevent accessing back of empty vector.
SharafMohamed 34043bf
Change has_wildcard to contains_wildcard for consistency.
SharafMohamed 1f75b0a
Change has_wildcard to contains_wildcard in more places for consistency.
SharafMohamed 5017b44
Define the operator first in the cpp.
SharafMohamed e7247b7
Move append_static_token into cpp.
SharafMohamed 2c376ea
Remove extra newline.
SharafMohamed 230f0dc
Added docstring for append_static_token.
SharafMohamed 6318ad3
Remove space.
SharafMohamed 267c26a
Since append_query_interpretations doesn't modify suffix, make it const.
SharafMohamed b4728e7
Fix tidy warnnings.
SharafMohamed 8efa83a
Fix tidy warnings.
SharafMohamed dc2e7c9
Move static_assert into the QueryInterpretations header.
SharafMohamed aaf52f7
Format.
SharafMohamed 8cdb975
Use std::three_way_comparable to simplify concept.
SharafMohamed 90ff8b9
Fix compiler error from previous commit.
SharafMohamed 0124d64
Remove unused headers and using declarations.
SharafMohamed 8e4b484
Use concept to enforce template type in comparison utils.
SharafMohamed 387c4fc
Merge branch 'QueryInterpretation' into WildcardExpression
SharafMohamed 4ac7e7d
Merge branch 'main' into WildcardExpression
SharafMohamed 21f77bf
Complete refactor of the WildcardExpression class.
SharafMohamed 59432d4
Format.
SharafMohamed c5ea9a8
Remove redundant docstring.
SharafMohamed be8d931
Add is_well_formed check.
SharafMohamed 7154798
Fix type in WildcardCharacter.hpp file name and add it to CMakeLists.…
SharafMohamed 0bcd93b
Move contains_wildcard into the view generate_regex_string method; Ad…
SharafMohamed db937eb
Add missing include.
SharafMohamed 12308df
Fix typo in add missing include.
SharafMohamed 56460c9
Fix spelling.
SharafMohamed b1838d4
Reserve regex string size.
SharafMohamed 248f746
Remove unused header; Remove reference to char.
SharafMohamed 5bb3fa5
Fix tidy warnings.
SharafMohamed ddd22da
Improve docstring.
SharafMohamed f2caabe
Remove doc comments from cpp and move them into hpp docstring.
SharafMohamed c105a29
Make CharType nested within WildcardCharacter.
SharafMohamed 9e67679
Rename CharType to Type; Move Type to be public; Use WildcardCharacte…
SharafMohamed 7a91cf6
Rename m_processed_search_string to m_search_string.
SharafMohamed 2d0fdd9
Rename get_string to get_search_string.
SharafMohamed e20cf92
Format.
SharafMohamed 7104a62
Add WildcardCharacter unit-tests.
SharafMohamed 0e4b768
Format.
SharafMohamed 9adf7e9
Improve naming of wildcard character test cases.
SharafMohamed 2a039a0
Rename WildcardCharacter to ExpressionCharacter.
SharafMohamed 109c198
Add unit-tests for WildcardExpression.
SharafMohamed bf4ad21
Format.
SharafMohamed 2f87ae1
Update extend_to_adjacent_wildcards method to run a success flag; Add…
SharafMohamed e631ec3
Improve consistency in expression unit-tests by checking values in no…
SharafMohamed 89b4bd4
Fix naming in header guards.
SharafMohamed aee3c62
Rename WildcardExpression to Expression.
SharafMohamed b8ee95b
Rename WildcardExpressionView to ExpressionView.
SharafMohamed 66b92a7
Format.
SharafMohamed 883b31e
Most of view tests are added now.
SharafMohamed 6fec901
Format.
SharafMohamed fe2ded3
Tidy.
SharafMohamed adacad7
Fix typo.
SharafMohamed 91e7049
Grammar.
SharafMohamed 34eefe0
Remove magic number in test.
SharafMohamed 1bdc456
Improve clarity of expression unit-test.
SharafMohamed 0473274
Fix logic error from previous commit in expression unit-test.
SharafMohamed 9fd521f
Add unit tests for a view that starts or ends with greedy wildcards.
SharafMohamed aeb416c
Add unit tests for extending a view.
SharafMohamed 7868ba6
Format.
SharafMohamed 49ffec7
Add unit-tests to test snapping.
SharafMohamed 61f6baf
Reword snapping to clamping.
SharafMohamed c9da3f5
Add unit tests for generating regex.
SharafMohamed 73c6336
Format.
SharafMohamed 16c5a33
Fix unit test name.
SharafMohamed d03f952
Update docstrings.
SharafMohamed 4bc0fa4
Add unit-test for regex meta characters.
SharafMohamed f86eade
Add test for multi-capture rule.
SharafMohamed 454cbba
Format.
SharafMohamed 1fd5422
Merge branch 'main' into new-log-test
SharafMohamed d9a99e8
Explicitly construct uncaught strings.
SharafMohamed 2699059
Use format for readability.
SharafMohamed 62cc586
Switch to backslash for multi-line continuation.
SharafMohamed 1b48c67
Add kube test case.
SharafMohamed 6335f1e
Format.
SharafMohamed 099693f
Fix case in docstring.
SharafMohamed 3719fa8
Add Query class.
SharafMohamed 5b6b477
Format.
SharafMohamed 502ade7
Format again.
SharafMohamed 3e88c66
Format again again.
SharafMohamed cf405e5
Tidy.
SharafMohamed f7729cf
Add unit tests.
SharafMohamed ed29b6d
Format.
SharafMohamed 2bc577d
Tidy.
SharafMohamed cb82695
Fixed unit-tests.
SharafMohamed 9d87248
Fix typo.
SharafMohamed ad58be5
Fix typo.
SharafMohamed fbe4e16
Fix typo.
SharafMohamed bb799d9
Fix docstring.
SharafMohamed 468ab31
Fix typos.
SharafMohamed af30e98
Fix docstring.
SharafMohamed da83377
Retype to unsigned char.
SharafMohamed b4dc1e9
Fix UB.
SharafMohamed 07f82d1
Reserve query string size.
SharafMohamed 9929151
Remove complexity claims.
SharafMohamed bf678ec
Return const reference to avoid copy.
SharafMohamed 270330b
Fix docstring.
SharafMohamed 9addd94
Remove accidental reference.
SharafMohamed aab4659
Add some checks that enforce test schema changes to be followed through.
SharafMohamed 8349c76
Remove unneeded braces in set initialization.
SharafMohamed a772ea2
Use front() in place of [0] and add () around if check.
SharafMohamed acdab72
Fix docstring.
SharafMohamed 3563e04
Fix docstring.
SharafMohamed c1a6466
Fix typo.
SharafMohamed 30e2ee3
Switch type back to char and cast when needed.
SharafMohamed db3b6ce
Merge branch 'main' into new-log-test
SharafMohamed cf86fdc
Fix docstring.
SharafMohamed fb9c2d0
Update docstrings to include log type.
SharafMohamed 326b242
Changed expected_event1 to expected_event.
SharafMohamed 7e50b75
Merge branch 'new-log-test' into Query
SharafMohamed ec9ce1c
Merge branch 'main' into Query
SharafMohamed 11bcee1
Rename m_query_string to m_processed_query_string.
SharafMohamed fbfa026
Fix escaped star test case.
SharafMohamed 46344b4
Format.
SharafMohamed 80bb255
Improve naming and docstring for is_surrounded_by_delimiters.
SharafMohamed 79d0114
Add multi-variable tests.
SharafMohamed 2091a02
Add missing header; Remove unused var.
SharafMohamed 9eebe55
Format.
SharafMohamed d094f45
Move TODOs into git issues.
SharafMohamed 9ca5b45
Improve docstring indentation and remove comma for consistency.
SharafMohamed 953e6bb
Move T(a,b) definition to relevent section; Indent equation for bette…
SharafMohamed f6a0121
Fix typo (0,1] to [0,1).
SharafMohamed b714676
Update docstring.
SharafMohamed 8057ede
Fix spacing.
SharafMohamed 55cad11
Remove the concept of an escaped wildcard from the docstring.
SharafMohamed 797b376
Move short circuit to top of method.
SharafMohamed b82aea5
Clarify in docstring interpretation length refers to tokens, query le…
SharafMohamed 73845f8
Discuss empty string case in docstring.
SharafMohamed 9261531
Add docstring for caching.
SharafMohamed a3bcc6b
Check if escaped character is a delim only, remove check for wildcard…
SharafMohamed de71fc7
Add bounds check.
SharafMohamed 7365572
Dynamic cast to ptr and check its not null to avoid throwing bad_cast…
SharafMohamed f739972
Fix typos.
SharafMohamed f8db5da
Update docstring for clarity.
SharafMohamed 1fd4067
Specify greedy wildcards to be accurate.
SharafMohamed 9e9fb7c
Fix grammar in docstring. Fix consistency of docstring.
SharafMohamed fe0fe3b
Merge remote-tracking branch 'upstream/main' into pr-152
davidlion 27d87b6
Tweak has_delim comment.
davidlion
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,176 @@ | ||
| #include "Query.hpp" | ||
| | ||
| #include <cstddef> | ||
| #include <cstdint> | ||
| #include <iterator> | ||
| #include <set> | ||
| #include <string> | ||
| #include <utility> | ||
| #include <vector> | ||
| | ||
| #include <log_surgeon/finite_automata/Dfa.hpp> | ||
| #include <log_surgeon/finite_automata/DfaState.hpp> | ||
| #include <log_surgeon/finite_automata/Nfa.hpp> | ||
| #include <log_surgeon/finite_automata/NfaState.hpp> | ||
| #include <log_surgeon/Lexer.hpp> | ||
| #include <log_surgeon/LexicalRule.hpp> | ||
| #include <log_surgeon/parser_types.hpp> | ||
| #include <log_surgeon/Schema.hpp> | ||
| #include <log_surgeon/SchemaParser.hpp> | ||
| #include <log_surgeon/wildcard_query_parser/Expression.hpp> | ||
| #include <log_surgeon/wildcard_query_parser/ExpressionView.hpp> | ||
| #include <log_surgeon/wildcard_query_parser/QueryInterpretation.hpp> | ||
| | ||
| using log_surgeon::finite_automata::ByteDfaState; | ||
| using log_surgeon::finite_automata::ByteNfaState; | ||
| using log_surgeon::lexers::ByteLexer; | ||
| using std::set; | ||
| using std::string; | ||
| using std::vector; | ||
| | ||
| using ByteDfa = log_surgeon::finite_automata::Dfa<ByteDfaState, ByteNfaState>; | ||
| using ByteLexicalRule = log_surgeon::LexicalRule<ByteNfaState>; | ||
| using ByteNfa = log_surgeon::finite_automata::Nfa<ByteNfaState>; | ||
| | ||
| namespace log_surgeon::wildcard_query_parser { | ||
| Query::Query(string const& query_string) { | ||
| m_processed_query_string.reserve(query_string.size()); | ||
| Expression const expression(query_string); | ||
| | ||
| bool prev_is_escape{false}; | ||
| string unhandled_wildcard_sequence; | ||
| bool unhandled_wildcard_sequence_contains_greedy_wildcard{false}; | ||
| for (auto c : expression.get_chars()) { | ||
| if (false == unhandled_wildcard_sequence.empty() && false == c.is_wildcard()) { | ||
| if (unhandled_wildcard_sequence_contains_greedy_wildcard) { | ||
| m_processed_query_string.push_back('*'); | ||
| } else { | ||
| m_processed_query_string += unhandled_wildcard_sequence; | ||
| } | ||
| unhandled_wildcard_sequence.clear(); | ||
| unhandled_wildcard_sequence_contains_greedy_wildcard = false; | ||
| } | ||
| | ||
| if (prev_is_escape) { | ||
| m_processed_query_string.push_back(c.value()); | ||
| prev_is_escape = false; | ||
| } else if (c.is_escape()) { | ||
| prev_is_escape = true; | ||
| m_processed_query_string.push_back(c.value()); | ||
| } else if (c.is_greedy_wildcard()) { | ||
| unhandled_wildcard_sequence.push_back(c.value()); | ||
| unhandled_wildcard_sequence_contains_greedy_wildcard = true; | ||
| } else if (c.is_non_greedy_wildcard()) { | ||
| unhandled_wildcard_sequence.push_back(c.value()); | ||
| } else { | ||
| m_processed_query_string.push_back(c.value()); | ||
| } | ||
| } | ||
| if (false == unhandled_wildcard_sequence.empty()) { | ||
| if (unhandled_wildcard_sequence_contains_greedy_wildcard) { | ||
| m_processed_query_string.push_back('*'); | ||
| } else { | ||
| m_processed_query_string += unhandled_wildcard_sequence; | ||
| } | ||
| } | ||
| } | ||
| | ||
| auto Query::get_all_multi_token_interpretations(ByteLexer const& lexer) const | ||
| -> std::set<QueryInterpretation> { | ||
| if (m_processed_query_string.empty()) { | ||
| return {}; | ||
| } | ||
| | ||
| Expression const expression{m_processed_query_string}; | ||
| vector<set<QueryInterpretation>> query_interpretations(expression.length()); | ||
| for (size_t end_idx = 1; end_idx <= expression.length(); ++end_idx) { | ||
| for (size_t begin_idx = 0; begin_idx < end_idx; ++begin_idx) { | ||
| ExpressionView const expression_view{expression, begin_idx, end_idx}; | ||
| if ("*" != expression_view.get_search_string() | ||
| && expression_view.starts_or_ends_with_greedy_wildcard()) | ||
| { | ||
| continue; | ||
| } | ||
| | ||
| auto const extended_view{expression_view.extend_to_adjacent_greedy_wildcards().second}; | ||
| auto const single_token_interpretations{ | ||
| get_all_single_token_interpretations(extended_view, lexer) | ||
| }; | ||
| if (single_token_interpretations.empty()) { | ||
| continue; | ||
| } | ||
| | ||
| if (begin_idx == 0) { | ||
| query_interpretations[end_idx - 1].insert( | ||
| std::make_move_iterator(single_token_interpretations.begin()), | ||
| std::make_move_iterator(single_token_interpretations.end()) | ||
| ); | ||
| } else { | ||
| for (auto const& prefix : query_interpretations[begin_idx - 1]) { | ||
| for (auto const& suffix : single_token_interpretations) { | ||
| QueryInterpretation combined{prefix}; | ||
| combined.append_query_interpretation(suffix); | ||
| query_interpretations[end_idx - 1].insert(std::move(combined)); | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } | ||
| return query_interpretations.back(); | ||
| } | ||
| | ||
| auto Query::get_all_single_token_interpretations( | ||
| ExpressionView const& expression_view, | ||
| ByteLexer const& lexer | ||
| ) -> std::vector<QueryInterpretation> { | ||
| vector<QueryInterpretation> interpretations; | ||
| | ||
| if (false == expression_view.is_well_formed()) { | ||
| return interpretations; | ||
| } | ||
| if ("*" == expression_view.get_search_string()) { | ||
| interpretations.emplace_back("*"); | ||
| return interpretations; | ||
| } | ||
| if (false == expression_view.is_surrounded_by_delims(lexer.get_delim_table())) { | ||
| interpretations.emplace_back(string{expression_view.get_search_string()}); | ||
| return interpretations; | ||
| } | ||
| | ||
| auto const [regex_string, contains_wildcard]{expression_view.generate_regex_string()}; | ||
| | ||
| auto const matching_var_type_ids{get_matching_variable_types(regex_string, lexer)}; | ||
| if (matching_var_type_ids.empty() || contains_wildcard) { | ||
| interpretations.emplace_back(string{expression_view.get_search_string()}); | ||
| } | ||
| | ||
| for (auto const variable_type_id : matching_var_type_ids) { | ||
| interpretations.emplace_back( | ||
| variable_type_id, | ||
| string{expression_view.get_search_string()}, | ||
| contains_wildcard | ||
| ); | ||
| if (false == contains_wildcard) { | ||
| break; | ||
| } | ||
| } | ||
| return interpretations; | ||
| } | ||
| | ||
| auto Query::get_matching_variable_types(string const& regex_string, ByteLexer const& lexer) | ||
| -> set<uint32_t> { | ||
| NonTerminal::m_next_children_start = 0; | ||
| | ||
| Schema schema; | ||
| schema.add_variable("search:" + regex_string, -1); | ||
| auto const schema_ast = schema.release_schema_ast_ptr(); | ||
| auto& rule_ast = dynamic_cast<SchemaVarAST&>(*schema_ast->m_schema_vars[0]); | ||
| vector<ByteLexicalRule> rules; | ||
| rules.emplace_back(0, std::move(rule_ast.m_regex_ptr)); | ||
| ByteNfa const nfa{rules}; | ||
| ByteDfa const dfa{nfa}; | ||
| | ||
| auto var_types = lexer.get_dfa()->get_intersect(&dfa); | ||
| return var_types; | ||
| } | ||
| } // namespace log_surgeon::wildcard_query_parser | ||
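
For reviewers who want to try the wildcard preprocessing done in the `Query` constructor without building log-surgeon, here is a minimal standalone sketch. It is not the library's code: `normalize_wildcards` and `check_examples` are hypothetical names, and the sketch assumes `\` is the escape character, `*` is the greedy wildcard, and `?` is the non-greedy wildcard, which `Expression` handles in the real implementation. A run of unescaped wildcards that contains at least one `*` collapses to a single `*`; a run made only of `?` is kept as written. It needs C++17 for `std::string_view`.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical standalone helper, not part of log-surgeon. Assumes '\\' escapes
// the next character, '*' is a greedy wildcard, and '?' is a non-greedy one.
auto normalize_wildcards(std::string_view query) -> std::string {
    std::string result;
    result.reserve(query.size());

    std::string pending;             // current run of unescaped wildcards
    bool pending_has_greedy{false};  // true if the run contains at least one '*'
    bool prev_is_escape{false};

    // Emit the pending run: a single '*' if it had a greedy wildcard, else as-is.
    auto flush = [&]() {
        if (pending.empty()) {
            return;
        }
        if (pending_has_greedy) {
            result.push_back('*');
        } else {
            result += pending;
        }
        pending.clear();
        pending_has_greedy = false;
    };

    for (char const c : query) {
        bool const is_wildcard{false == prev_is_escape && ('*' == c || '?' == c)};
        if (false == is_wildcard) {
            flush();
        }

        if (prev_is_escape) {
            // Escaped characters are copied through literally.
            result.push_back(c);
            prev_is_escape = false;
        } else if ('\\' == c) {
            prev_is_escape = true;
            result.push_back(c);
        } else if (is_wildcard) {
            pending.push_back(c);
            pending_has_greedy = pending_has_greedy || '*' == c;
        } else {
            result.push_back(c);
        }
    }
    flush();
    return result;
}

// Small self-check of the behaviour described above.
inline void check_examples() {
    assert(normalize_wildcards("a**b") == "a*b");    // repeated '*' collapses
    assert(normalize_wildcards("a?*?b") == "a*b");   // mixed run with '*' collapses
    assert(normalize_wildcards("a??b") == "a??b");   // '?'-only run is preserved
    assert(normalize_wildcards("a\\**") == "a\\**"); // escaped '*' stays literal
}
```

Calling `check_examples()` from a `main` is enough to exercise it. The escaped-`*` case is the one worth checking when changing this logic, since an escaped wildcard must not be merged into the adjacent wildcard run.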