Merged
Changes from 176 commits
Commits
181 commits
40c99c0
Add QueryInterpretation to log-surgeon with major modification to impr…
SharafMohamed Jul 29, 2025
43f9730
Remove is_encoded; Remove constexpr of unused vars; In append_logtype…
SharafMohamed Jul 29, 2025
45978ce
Guard against empty strings; Rename for clarity; Lint.
SharafMohamed Jul 29, 2025
33664c7
Add unit-tests; Rename m_logtype to m_tokens; Rename append_logtype t…
SharafMohamed Jul 31, 2025
3854c27
Add unit-test for appending an empty query interpretation.
SharafMohamed Jul 31, 2025
7aec173
rename qi to query_interpretation.
SharafMohamed Jul 31, 2025
beeb54e
Add initial WildcardExpression code.
SharafMohamed Aug 4, 2025
e8fe2e8
Split classes into separate files.
SharafMohamed Aug 4, 2025
8415c56
Merge branch 'main' into QueryInterpretation
SharafMohamed Aug 4, 2025
8673254
Pass clang-tidy on new tests.
SharafMohamed Aug 4, 2025
88eae06
Merge branch 'main' into QueryInterpretation
SharafMohamed Aug 4, 2025
eb1d85e
Merge branch 'QueryInterpretation' into WildcardExpression
SharafMohamed Aug 4, 2025
35729d9
Merge branch 'main' into QueryInterpretation
davidlion Aug 5, 2025
da28956
Merge branch 'main' into WildcardExpression
SharafMohamed Aug 5, 2025
d99ce89
Rename query_parser to wildcard_query_parser.
SharafMohamed Aug 6, 2025
b4d3f1e
Use three-way comparator in StaticQuery class.
SharafMohamed Aug 6, 2025
ebc51a1
Use three-way comparator in VariableQueryToken class.
SharafMohamed Aug 6, 2025
b60f628
Use three-way comparator in QueryInterpretation class.
SharafMohamed Aug 6, 2025
5c34255
Fix format errors.
SharafMohamed Aug 6, 2025
169c5af
Fix tidy errors.
SharafMohamed Aug 6, 2025
fad4702
Update test-static-query-token.cpp for three-way operator.
SharafMohamed Aug 7, 2025
e331aea
Move comparison helpers into its own file.
SharafMohamed Aug 7, 2025
5d34f84
Use a duplicate of the last token so it looks nicer.
SharafMohamed Aug 7, 2025
ecd985f
Update test-variable-query-token.cpp for three-way operator.
SharafMohamed Aug 7, 2025
ea9c9cd
Fix format errors.
SharafMohamed Aug 7, 2025
6574f80
Add operator== for VariableQueryToken.
SharafMohamed Aug 7, 2025
b6a3c3b
Add operator== for QueryInterpretation.
SharafMohamed Aug 7, 2025
79a749d
Update docstring and name of comparison unit tests.
SharafMohamed Aug 7, 2025
cfef204
Rename token variables to end in the word token.
SharafMohamed Aug 7, 2025
bd297e2
Rename token variables to end in the word token for VariableQueryToke…
SharafMohamed Aug 7, 2025
0cb9217
Update test-query-interpretation.cpp for three-way comparator.
SharafMohamed Aug 7, 2025
d620be6
Fix tidy errors.
SharafMohamed Aug 7, 2025
db7c86a
Fix some naming.
SharafMohamed Aug 7, 2025
410adba
Clean up variable query tokens unit-tests.
SharafMohamed Aug 7, 2025
200b977
Fix interpretation initialization that accidentally used parenthesis i…
SharafMohamed Aug 7, 2025
bc31dd7
Add missing include.
SharafMohamed Aug 7, 2025
5d4fd82
Add const.
SharafMohamed Aug 7, 2025
f4e3b63
Clean up static query token's unit-tests.
SharafMohamed Aug 7, 2025
f13251e
Deduplicate test code by moving it into utils file.
SharafMohamed Aug 8, 2025
0020744
Merge branch 'main' into QueryInterpretation
SharafMohamed Aug 8, 2025
47a64f1
Fix docstrings to refer to the query instead of logs.
SharafMohamed Aug 8, 2025
ffe0171
Remove redundancy in descriptions.
SharafMohamed Aug 8, 2025
6e3684f
Prevent accessing back of empty vector.
SharafMohamed Aug 8, 2025
34043bf
Change has_wildcard to contains_wildcard for consistency.
SharafMohamed Aug 8, 2025
1f75b0a
Change has_wildcard to contains_wildcard in more places for consistency.
SharafMohamed Aug 8, 2025
5017b44
Define the operator first in the cpp.
SharafMohamed Aug 8, 2025
e7247b7
Move append_static_token into cpp.
SharafMohamed Aug 8, 2025
2c376ea
Remove extra newline.
SharafMohamed Aug 8, 2025
230f0dc
Added docstring for append_static_token.
SharafMohamed Aug 8, 2025
6318ad3
Remove space.
SharafMohamed Aug 8, 2025
267c26a
Since append_query_interpretations doesn't modify suffix, make it const.
SharafMohamed Aug 8, 2025
b4728e7
Fix tidy warnings.
SharafMohamed Aug 8, 2025
8efa83a
Fix tidy warnings.
SharafMohamed Aug 8, 2025
dc2e7c9
Move static_assert into the QueryInterpretations header.
SharafMohamed Aug 8, 2025
aaf52f7
Format.
SharafMohamed Aug 8, 2025
8cdb975
Use std::three_way_comparable to simplify concept.
SharafMohamed Aug 8, 2025
90ff8b9
Fix compiler error from previous commit.
SharafMohamed Aug 8, 2025
0124d64
Remove unused headers and using declarations.
SharafMohamed Aug 8, 2025
8e4b484
Use concept to enforce template type in comparison utils.
SharafMohamed Aug 8, 2025
387c4fc
Merge branch 'QueryInterpretation' into WildcardExpression
SharafMohamed Aug 11, 2025
4ac7e7d
Merge branch 'main' into WildcardExpression
SharafMohamed Aug 11, 2025
21f77bf
Complete refactor of the WildcardExpression class.
SharafMohamed Aug 12, 2025
59432d4
Format.
SharafMohamed Aug 12, 2025
c5ea9a8
Remove redundant docstring.
SharafMohamed Aug 12, 2025
be8d931
Add is_well_formed check.
SharafMohamed Aug 12, 2025
7154798
Fix type in WildcardCharacter.hpp file name and add it to CMakeLists.…
SharafMohamed Aug 13, 2025
0bcd93b
Move contains_wildcard into the view generate_regex_string method; Ad…
SharafMohamed Aug 13, 2025
db937eb
Add missing include.
SharafMohamed Aug 13, 2025
12308df
Fix typo in add missing include.
SharafMohamed Aug 13, 2025
56460c9
Fix spelling.
SharafMohamed Aug 13, 2025
b1838d4
Reserve regex string size.
SharafMohamed Aug 13, 2025
248f746
Remove unused header; Remove reference to char.
SharafMohamed Aug 13, 2025
5bb3fa5
Fix tidy warnings.
SharafMohamed Aug 13, 2025
ddd22da
Improve docstring.
SharafMohamed Aug 14, 2025
f2caabe
Remove doc comments from cpp and move them into hpp docstring.
SharafMohamed Aug 14, 2025
c105a29
Make CharType nested within WildcardCharacter.
SharafMohamed Aug 14, 2025
9e67679
Rename CharType to Type; Move Type to be public; Use WildcardCharacte…
SharafMohamed Aug 14, 2025
7a91cf6
Rename m_processed_search_string to m_search_string.
SharafMohamed Aug 14, 2025
2d0fdd9
Rename get_string to get_search_string.
SharafMohamed Aug 14, 2025
e20cf92
Format.
SharafMohamed Aug 14, 2025
7104a62
Add WildcardCharacter unit-tests.
SharafMohamed Aug 14, 2025
0e4b768
Format.
SharafMohamed Aug 14, 2025
9adf7e9
Improve naming of wildcard character test cases.
SharafMohamed Aug 14, 2025
2a039a0
Rename WildcardCharacter to ExpressionCharacter.
SharafMohamed Aug 14, 2025
109c198
Add unit-tests for WildcardExpression.
SharafMohamed Aug 14, 2025
bf4ad21
Format.
SharafMohamed Aug 14, 2025
2f87ae1
Update extend_to_adjacent_wildcards method to run a success flag; Add…
SharafMohamed Aug 14, 2025
e631ec3
Improve consistency in expression unit-tests by checking values in no…
SharafMohamed Aug 14, 2025
89b4bd4
Fix naming in header guards.
SharafMohamed Aug 14, 2025
aee3c62
Rename WildcardExpression to Expression.
SharafMohamed Aug 15, 2025
b8ee95b
Rename WildcardExpressionView to ExpressionView.
SharafMohamed Aug 15, 2025
66b92a7
Format.
SharafMohamed Aug 15, 2025
883b31e
Most of view tests are added now.
SharafMohamed Aug 15, 2025
6fec901
Format.
SharafMohamed Aug 15, 2025
fe2ded3
Tidy.
SharafMohamed Aug 15, 2025
adacad7
Fix typo.
SharafMohamed Aug 15, 2025
91e7049
Grammar.
SharafMohamed Aug 15, 2025
34eefe0
Remove magic number in test.
SharafMohamed Aug 15, 2025
1bdc456
Improve clarity of expression unit-test.
SharafMohamed Aug 15, 2025
0473274
Fix logic error from previous commit in expression unit-test.
SharafMohamed Aug 15, 2025
9fd521f
Add unit tests for a view that starts or ends with greedy wildcards.
SharafMohamed Aug 15, 2025
aeb416c
Add unit tests for extending a view.
SharafMohamed Aug 15, 2025
7868ba6
Format.
SharafMohamed Aug 15, 2025
49ffec7
Add unit-tests to test snapping.
SharafMohamed Aug 15, 2025
61f6baf
Reword snapping to clamping.
SharafMohamed Aug 15, 2025
c9da3f5
Add unit tests for generating regex.
SharafMohamed Aug 15, 2025
73c6336
Format.
SharafMohamed Aug 15, 2025
16c5a33
Fix unit test name.
SharafMohamed Aug 15, 2025
d03f952
Update docstrings.
SharafMohamed Aug 15, 2025
4bc0fa4
Add unit-test for regex meta characters.
SharafMohamed Aug 15, 2025
f86eade
Add test for multi-capture rule.
SharafMohamed Aug 18, 2025
454cbba
Format.
SharafMohamed Aug 18, 2025
1fd5422
Merge branch 'main' into new-log-test
SharafMohamed Aug 18, 2025
d9a99e8
Explicitly construct uncaught strings.
SharafMohamed Aug 18, 2025
2699059
Use format for readability.
SharafMohamed Aug 18, 2025
62cc586
Switch to backslash for multi-line continuation.
SharafMohamed Aug 18, 2025
1b48c67
Add kube test case.
SharafMohamed Aug 18, 2025
6335f1e
Format.
SharafMohamed Aug 18, 2025
099693f
Fix case in docstring.
SharafMohamed Aug 18, 2025
3719fa8
Add Query class.
SharafMohamed Aug 24, 2025
5b6b477
Format.
SharafMohamed Aug 24, 2025
502ade7
Format again.
SharafMohamed Aug 24, 2025
3e88c66
Format again again.
SharafMohamed Aug 24, 2025
cf405e5
Tidy.
SharafMohamed Aug 24, 2025
f7729cf
Add unit tests.
SharafMohamed Aug 25, 2025
ed29b6d
Format.
SharafMohamed Aug 25, 2025
2bc577d
Tidy.
SharafMohamed Aug 25, 2025
cb82695
Fixed unit-tests.
SharafMohamed Aug 25, 2025
9d87248
Fix typo.
SharafMohamed Aug 25, 2025
ad58be5
Fix typo.
SharafMohamed Aug 25, 2025
fbe4e16
Fix typo.
SharafMohamed Aug 25, 2025
bb799d9
Fix docstring.
SharafMohamed Aug 25, 2025
468ab31
Fix typos.
SharafMohamed Aug 25, 2025
af30e98
Fix docstring.
SharafMohamed Aug 25, 2025
da83377
Retype to unsigned char.
SharafMohamed Aug 25, 2025
b4dc1e9
Fix UB.
SharafMohamed Aug 26, 2025
07f82d1
Reserve query string size.
SharafMohamed Aug 26, 2025
9929151
Remove complexity claims.
SharafMohamed Aug 26, 2025
bf678ec
Return const reference to avoid copy.
SharafMohamed Aug 26, 2025
270330b
Fix docstring.
SharafMohamed Aug 26, 2025
9addd94
Remove accidental reference.
SharafMohamed Aug 26, 2025
aab4659
Add some checks that enforce test schema changes to be followed through.
SharafMohamed Aug 26, 2025
8349c76
Remove unneeded braces in set initialization.
SharafMohamed Aug 26, 2025
a772ea2
Use front() in place of [0] and add () around if check.
SharafMohamed Aug 26, 2025
acdab72
Fix docstring.
SharafMohamed Aug 26, 2025
3563e04
Fix docstring.
SharafMohamed Aug 26, 2025
c1a6466
Fix typo.
SharafMohamed Aug 26, 2025
30e2ee3
Switch type back to char and cast when needed.
SharafMohamed Aug 26, 2025
db3b6ce
Merge branch 'main' into new-log-test
SharafMohamed Aug 27, 2025
cf86fdc
Fix docstring.
SharafMohamed Aug 27, 2025
fb9c2d0
Update docstrings to include log type.
SharafMohamed Aug 27, 2025
326b242
Changed expected_event1 to expected_event.
SharafMohamed Aug 27, 2025
7e50b75
Merge branch 'new-log-test' into Query
SharafMohamed Aug 27, 2025
ec9ce1c
Merge branch 'main' into Query
SharafMohamed Aug 27, 2025
11bcee1
Rename m_query_string to m_processed_query_string.
SharafMohamed Aug 27, 2025
fbfa026
Fix escaped star test case.
SharafMohamed Aug 27, 2025
46344b4
Format.
SharafMohamed Aug 27, 2025
80bb255
Improve naming and docstring for is_surrounded_by_delimiters.
SharafMohamed Aug 29, 2025
79d0114
Add multi-variable tests.
SharafMohamed Sep 2, 2025
2091a02
Add missing header; Remove unused var.
SharafMohamed Sep 2, 2025
9eebe55
Format.
SharafMohamed Sep 2, 2025
d094f45
Move TODOs into git issues.
SharafMohamed Sep 2, 2025
9ca5b45
Improve docstring indentation and remove comma for consistency.
SharafMohamed Sep 2, 2025
953e6bb
Move T(a,b) definition to relevant section; Indent equation for bette…
SharafMohamed Sep 2, 2025
f6a0121
Fix typo (0,1] to [0,1).
SharafMohamed Sep 2, 2025
b714676
Update docstring.
SharafMohamed Sep 2, 2025
8057ede
Fix spacing.
SharafMohamed Sep 2, 2025
55cad11
Remove the concept of an escaped wildcard from the docstring.
SharafMohamed Sep 2, 2025
797b376
Move short circuit to top of method.
SharafMohamed Sep 2, 2025
b82aea5
Clarify in docstring interpretation length refers to tokens, query le…
SharafMohamed Sep 3, 2025
73845f8
Discuss empty string case in docstring.
SharafMohamed Sep 3, 2025
9261531
Add docstring for caching.
SharafMohamed Sep 3, 2025
a3bcc6b
Check if escaped character is a delim only, remove check for wildcard…
SharafMohamed Sep 3, 2025
de71fc7
Add bounds check.
SharafMohamed Sep 3, 2025
7365572
Dynamic cast to ptr and check it's not null to avoid throwing bad_cast…
SharafMohamed Sep 3, 2025
f739972
Fix typos.
SharafMohamed Sep 3, 2025
f8db5da
Update docstring for clarity.
SharafMohamed Sep 3, 2025
1fd4067
Specify greedy wildcards to be accurate.
SharafMohamed Sep 3, 2025
9e9fb7c
Fix grammar in docstring. Fix consistency of docstring.
SharafMohamed Sep 3, 2025
fe0fe3b
Merge remote-tracking branch 'upstream/main' into pr-152
davidlion Sep 4, 2025
27d87b6
Tweak has_delim comment.
davidlion Sep 4, 2025
2 changes: 2 additions & 0 deletions CMakeLists.txt
@@ -88,6 +88,8 @@ set(SOURCE_FILES
src/log_surgeon/wildcard_query_parser/ExpressionCharacter.hpp
src/log_surgeon/wildcard_query_parser/ExpressionView.cpp
src/log_surgeon/wildcard_query_parser/ExpressionView.hpp
src/log_surgeon/wildcard_query_parser/Query.cpp
src/log_surgeon/wildcard_query_parser/Query.hpp
src/log_surgeon/wildcard_query_parser/QueryInterpretation.cpp
src/log_surgeon/wildcard_query_parser/QueryInterpretation.hpp
src/log_surgeon/wildcard_query_parser/StaticQueryToken.hpp
1 change: 1 addition & 0 deletions docs/doxygen/mainpage.dox
@@ -17,6 +17,7 @@
* - @ref unit_tests_expression_view "Expression View"
* - @ref unit_tests_nfa "NFA"
* - @ref unit_tests_prefix_tree "Prefix tree"
* - @ref unit_tests_query "Query"
* - @ref unit_tests_query_interpretation "Query Interpretation"
* - @ref unit_tests_regex_ast "Regex AST"
* - @ref unit_tests_register_handler "Register handler"
5 changes: 5 additions & 0 deletions src/log_surgeon/Lexer.hpp
@@ -152,6 +152,10 @@ class Lexer {

[[nodiscard]] auto get_has_delimiters() const -> bool const& { return m_has_delimiters; }

[[nodiscard]] auto get_delim_table() const -> std::array<bool, cSizeOfByte> const& {
return m_is_delimiter;
}

[[nodiscard]] auto is_delimiter(uint8_t byte) const -> bool const& {
return m_is_delimiter[byte];
}
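The delimiter table exposed by `get_delim_table()` is a per-byte lookup array. The following is a minimal sketch of how such a table can be built and queried, not the library's own code: the table size of 256 is an assumption standing in for `cSizeOfByte`, and `make_delim_table` and the free function `is_delim` are hypothetical helpers introduced only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Assumption: one entry per byte value, standing in for `cSizeOfByte`.
constexpr std::size_t cTableSize{256};
using DelimTable = std::array<bool, cTableSize>;

// Hypothetical helper: marks every character in `delimiters` as a delimiter.
auto make_delim_table(std::string_view delimiters) -> DelimTable {
    DelimTable table{};  // value-initialized: all entries start as false
    for (auto const c : delimiters) {
        // Cast to an unsigned byte first so that negative `char` values index correctly.
        table[static_cast<std::uint8_t>(c)] = true;
    }
    return table;
}

// Hypothetical helper: the same cast-then-index lookup the diff performs on the table.
auto is_delim(DelimTable const& table, char c) -> bool {
    return table.at(static_cast<std::uint8_t>(c));
}

// Small usage example.
auto delim_table_example() -> void {
    auto const table{make_delim_table(" \t:")};
    assert(is_delim(table, ' '));
    assert(is_delim(table, ':'));
    assert(false == is_delim(table, 'a'));
}
```

Returning the whole table by const reference lets callers that test many characters hold a single reference instead of going through the lexer for each byte.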
@@ -252,6 +256,7 @@ class Lexer {
std::array<bool, cSizeOfByte> m_is_first_char_of_a_variable{false};
std::vector<LexicalRule<TypedNfaState>> m_rules;
uint32_t m_line{0};
// `m_has_delimiters` is cached for performance
bool m_has_delimiters{false};
std::unique_ptr<finite_automata::Dfa<TypedDfaState, TypedNfaState>> m_dfa;
std::optional<uint32_t> m_first_delimiter_pos{std::nullopt};
3 changes: 3 additions & 0 deletions src/log_surgeon/wildcard_query_parser/Expression.hpp
@@ -1,6 +1,7 @@
#ifndef LOG_SURGEON_WILDCARD_QUERY_PARSER_EXPRESSION_HPP
#define LOG_SURGEON_WILDCARD_QUERY_PARSER_EXPRESSION_HPP

#include <cstddef>
#include <string>
#include <vector>

@@ -24,6 +25,8 @@ class Expression {

[[nodiscard]] auto get_search_string() const -> std::string const& { return m_search_string; }

[[nodiscard]] auto length() const -> size_t { return m_search_string.size(); }

private:
std::vector<ExpressionCharacter> m_chars;
std::string m_search_string;
16 changes: 16 additions & 0 deletions src/log_surgeon/wildcard_query_parser/ExpressionCharacter.hpp
@@ -1,8 +1,11 @@
#ifndef LOG_SURGEON_WILDCARD_QUERY_PARSER_EXPRESSION_CHARACTER_HPP
#define LOG_SURGEON_WILDCARD_QUERY_PARSER_EXPRESSION_CHARACTER_HPP

#include <array>
#include <cstdint>

#include <log_surgeon/Constants.hpp>

namespace log_surgeon::wildcard_query_parser {
class ExpressionCharacter {
public:
@@ -23,6 +26,19 @@ class ExpressionCharacter {
return Type::NonGreedyWildcard == m_type;
}

[[nodiscard]] auto is_wildcard() const -> bool {
return Type::GreedyWildcard == m_type || Type::NonGreedyWildcard == m_type;
}

[[nodiscard]] auto is_delim(std::array<bool, cSizeOfByte> const& delim_table) const -> bool {
return delim_table.at(static_cast<uint8_t>(m_value));
}

[[nodiscard]] auto is_delim_or_wildcard(std::array<bool, cSizeOfByte> const& delim_table) const
-> bool {
return is_delim(delim_table) || is_wildcard();
}

[[nodiscard]] auto is_escape() const -> bool { return Type::Escape == m_type; }

private:
35 changes: 35 additions & 0 deletions src/log_surgeon/wildcard_query_parser/ExpressionView.cpp
@@ -1,12 +1,14 @@
#include "ExpressionView.hpp"

#include <algorithm>
#include <array>
#include <cstddef>
#include <span>
#include <string>
#include <string_view>
#include <utility>

#include <log_surgeon/Constants.hpp>
#include <log_surgeon/SchemaParser.hpp>
#include <log_surgeon/wildcard_query_parser/Expression.hpp>

@@ -42,6 +44,39 @@ auto ExpressionView::extend_to_adjacent_greedy_wildcards() const
return {is_extended, wildcard_expression_view};
}

auto ExpressionView::is_surrounded_by_delims(std::array<bool, cSizeOfByte> const& delim_table) const
        -> bool {
    auto const [begin_idx, end_idx]{get_indices()};

    bool has_left_boundary{false};
    if (0 == begin_idx) {
        has_left_boundary = true;
    } else {
        auto const& preceding_char{m_expression->get_chars()[begin_idx - 1]};
        has_left_boundary = preceding_char.is_delim_or_wildcard(delim_table)
                            || (false == m_chars.empty() && m_chars.front().is_greedy_wildcard());
    }

    bool has_right_boundary{false};
    if (m_expression->length() == end_idx) {
        has_right_boundary = true;
    } else {
        auto const& succeeding_char{m_expression->get_chars()[end_idx]};
        if (succeeding_char.is_escape()) {
            if (m_expression->length() > end_idx + 1) {
                auto const& logical_succeeding_char{m_expression->get_chars()[end_idx + 1]};
                has_right_boundary = logical_succeeding_char.is_delim(delim_table);
            }
        } else {
            has_right_boundary = succeeding_char.is_delim_or_wildcard(delim_table);
        }
        has_right_boundary = has_right_boundary
                             || (false == m_chars.empty() && m_chars.back().is_greedy_wildcard());
    }

    return has_left_boundary && has_right_boundary;
}

auto ExpressionView::is_well_formed() const -> bool {
if (m_chars.empty()) {
return true;
29 changes: 29 additions & 0 deletions src/log_surgeon/wildcard_query_parser/ExpressionView.hpp
@@ -1,12 +1,14 @@
#ifndef LOG_SURGEON_WILDCARD_QUERY_PARSER_EXPRESSION_VIEW_HPP
#define LOG_SURGEON_WILDCARD_QUERY_PARSER_EXPRESSION_VIEW_HPP

#include <array>
#include <cstddef>
#include <span>
#include <string>
#include <string_view>
#include <utility>

#include <log_surgeon/Constants.hpp>
#include <log_surgeon/wildcard_query_parser/Expression.hpp>
#include <log_surgeon/wildcard_query_parser/ExpressionCharacter.hpp>

@@ -41,6 +43,33 @@ class ExpressionView {
&& (m_chars[0].is_greedy_wildcard() || m_chars.back().is_greedy_wildcard());
}

/**
* Checks whether the view is surrounded by delimiters. The start and end of an expression are
* always considered a delimiter. A greedy wildcard may represent a string that includes a
* flanking delimiter.
*
* A view is considered bounded if both its left and right boundary satisfy certain
* requirements.
*
* Left boundary:
* - The view is at the start of the expression, or
* - The first character is a greedy wildcard (if non-empty), or
* - Immediately left of the view is a delimiter or wildcard.
*
* Right boundary:
* - The view is at the end of the expression, or
* - The last character is a greedy wildcard (if non-empty), or
* - Immediately right of the view is a delimiter or wildcard, or
* - Immediately right of the view is an escape character and the character to its
* immediate right is a delimiter.
*
* @param delim_table Table indicating for each character whether or not it is a delimiter.
* @return true when both left and right boundaries qualify; false otherwise.
*/
[[nodiscard]] auto is_surrounded_by_delims(
std::array<bool, cSizeOfByte> const& delim_table
) const -> bool;
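The left and right boundary rules in the docstring above can be sketched on a plain string. This is a simplified illustration under stated assumptions, not the `ExpressionView` implementation: the view is the half-open index range `[begin, end)` with `begin <= end <= expr.size()`, `*` is the greedy wildcard, `?` the non-greedy wildcard, and `\` the escape character. Characters are classified on the fly, so the sketch does not track whether a character inside or to the left of the view is itself escaped. `DelimTable` and the free functions are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Assumption: one table entry per byte value, in the spirit of `cSizeOfByte`.
using DelimTable = std::array<bool, 256>;

auto is_delim(DelimTable const& table, char c) -> bool {
    return table[static_cast<std::uint8_t>(c)];
}

auto is_wildcard(char c) -> bool { return '*' == c || '?' == c; }

// Simplified stand-in for the documented check on the view [begin, end) of `expr`.
auto is_surrounded_by_delims(
        std::string_view expr,
        std::size_t begin,
        std::size_t end,
        DelimTable const& table
) -> bool {
    bool const non_empty{begin < end};

    // Left boundary: start of expression, delimiter/wildcard on the left, or leading `*`.
    bool left{0 == begin};
    if (false == left) {
        auto const prev{expr[begin - 1]};
        left = is_delim(table, prev) || is_wildcard(prev) || (non_empty && '*' == expr[begin]);
    }

    // Right boundary: end of expression, delimiter/wildcard on the right, an escaped
    // delimiter on the right, or trailing `*`.
    bool right{expr.size() == end};
    if (false == right) {
        auto const next{expr[end]};
        if ('\\' == next) {
            right = end + 1 < expr.size() && is_delim(table, expr[end + 1]);
        } else {
            right = is_delim(table, next) || is_wildcard(next);
        }
        right = right || (non_empty && '*' == expr[end - 1]);
    }
    return left && right;
}

auto boundary_examples() -> void {
    DelimTable table{};
    table[static_cast<std::uint8_t>(' ')] = true;

    assert(is_surrounded_by_delims("abc 123 def", 4, 7, table));           // "123"
    assert(false == is_surrounded_by_delims("abc 123 def", 5, 7, table));  // "23"
    assert(is_surrounded_by_delims("a*b", 1, 3, table));                   // "*b"
    assert(is_surrounded_by_delims("ab\\ c", 0, 2, table));                // escaped delimiter
    assert(false == is_surrounded_by_delims("ab\\*c", 0, 2, table));       // escaped wildcard
}
```

The last case shows the asymmetry the docstring describes: an escaped character to the right of the view only counts as a boundary when it is a delimiter, because an escaped wildcard is a literal character.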

/**
* Checks whether this `ExpressionView` represents a well-formed subrange.
*
176 changes: 176 additions & 0 deletions src/log_surgeon/wildcard_query_parser/Query.cpp
@@ -0,0 +1,176 @@
#include "Query.hpp"

#include <cstddef>
#include <cstdint>
#include <iterator>
#include <set>
#include <string>
#include <utility>
#include <vector>

#include <log_surgeon/finite_automata/Dfa.hpp>
#include <log_surgeon/finite_automata/DfaState.hpp>
#include <log_surgeon/finite_automata/Nfa.hpp>
#include <log_surgeon/finite_automata/NfaState.hpp>
#include <log_surgeon/Lexer.hpp>
#include <log_surgeon/LexicalRule.hpp>
#include <log_surgeon/parser_types.hpp>
#include <log_surgeon/Schema.hpp>
#include <log_surgeon/SchemaParser.hpp>
#include <log_surgeon/wildcard_query_parser/Expression.hpp>
#include <log_surgeon/wildcard_query_parser/ExpressionView.hpp>
#include <log_surgeon/wildcard_query_parser/QueryInterpretation.hpp>

using log_surgeon::finite_automata::ByteDfaState;
using log_surgeon::finite_automata::ByteNfaState;
using log_surgeon::lexers::ByteLexer;
using std::set;
using std::string;
using std::vector;

using ByteDfa = log_surgeon::finite_automata::Dfa<ByteDfaState, ByteNfaState>;
using ByteLexicalRule = log_surgeon::LexicalRule<ByteNfaState>;
using ByteNfa = log_surgeon::finite_automata::Nfa<ByteNfaState>;

namespace log_surgeon::wildcard_query_parser {
Query::Query(string const& query_string) {
    m_processed_query_string.reserve(query_string.size());
    Expression const expression(query_string);

    bool prev_is_escape{false};
    string unhandled_wildcard_sequence;
    bool unhandled_wildcard_sequence_contains_greedy_wildcard{false};
    for (auto c : expression.get_chars()) {
        if (false == unhandled_wildcard_sequence.empty() && false == c.is_wildcard()) {
            if (unhandled_wildcard_sequence_contains_greedy_wildcard) {
                m_processed_query_string.push_back('*');
            } else {
                m_processed_query_string += unhandled_wildcard_sequence;
            }
            unhandled_wildcard_sequence.clear();
            unhandled_wildcard_sequence_contains_greedy_wildcard = false;
        }

        if (prev_is_escape) {
            m_processed_query_string.push_back(c.value());
            prev_is_escape = false;
        } else if (c.is_escape()) {
            prev_is_escape = true;
            m_processed_query_string.push_back(c.value());
        } else if (c.is_greedy_wildcard()) {
            unhandled_wildcard_sequence.push_back(c.value());
            unhandled_wildcard_sequence_contains_greedy_wildcard = true;
        } else if (c.is_non_greedy_wildcard()) {
            unhandled_wildcard_sequence.push_back(c.value());
        } else {
            m_processed_query_string.push_back(c.value());
        }
    }
    if (false == unhandled_wildcard_sequence.empty()) {
        if (unhandled_wildcard_sequence_contains_greedy_wildcard) {
            m_processed_query_string.push_back('*');
        } else {
            m_processed_query_string += unhandled_wildcard_sequence;
        }
    }
}
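The constructor above normalizes runs of consecutive wildcards: a run that contains at least one greedy wildcard collapses to a single `*`, while a run of only non-greedy wildcards is kept as written. A minimal standalone sketch of that normalization follows. It assumes `*`, `?`, and `\` as the greedy wildcard, non-greedy wildcard, and escape characters, and it classifies raw characters directly instead of going through `Expression`; `collapse_wildcards` is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper mirroring the constructor's wildcard normalization.
auto collapse_wildcards(std::string_view query) -> std::string {
    std::string result;
    result.reserve(query.size());

    std::string pending;  // current run of consecutive unescaped wildcards
    bool pending_has_greedy{false};
    bool prev_is_escape{false};

    // Emits the pending run: a single `*` if it contained one, otherwise the run itself.
    auto flush = [&]() {
        if (pending.empty()) {
            return;
        }
        if (pending_has_greedy) {
            result.push_back('*');
        } else {
            result += pending;
        }
        pending.clear();
        pending_has_greedy = false;
    };

    for (auto const c : query) {
        bool const is_wildcard{false == prev_is_escape && ('*' == c || '?' == c)};
        if (is_wildcard) {
            pending.push_back(c);
            pending_has_greedy = pending_has_greedy || '*' == c;
            continue;
        }
        flush();
        result.push_back(c);
        // An escape only applies to the next character; an escaped escape is a literal.
        prev_is_escape = (false == prev_is_escape && '\\' == c);
    }
    flush();
    return result;
}

auto collapse_examples() -> void {
    assert("a*b" == collapse_wildcards("a**b"));
    assert("a*b" == collapse_wildcards("a?*?b"));
    assert("a??b" == collapse_wildcards("a??b"));
    assert("a\\**" == collapse_wildcards("a\\***"));  // escaped `*` is kept as a literal
}
```

Collapsing is safe because a greedy wildcard already matches anything an adjacent wildcard could match, so the shorter string describes the same set of matches with fewer substrings to enumerate later.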

auto Query::get_all_multi_token_interpretations(ByteLexer const& lexer) const
        -> std::set<QueryInterpretation> {
    if (m_processed_query_string.empty()) {
        return {};
    }

    Expression const expression{m_processed_query_string};
    vector<set<QueryInterpretation>> query_interpretations(expression.length());
    for (size_t end_idx = 1; end_idx <= expression.length(); ++end_idx) {
        for (size_t begin_idx = 0; begin_idx < end_idx; ++begin_idx) {
            ExpressionView const expression_view{expression, begin_idx, end_idx};
            if ("*" != expression_view.get_search_string()
                && expression_view.starts_or_ends_with_greedy_wildcard())
            {
                continue;
            }

            auto const extended_view{expression_view.extend_to_adjacent_greedy_wildcards().second};
            auto const single_token_interpretations{
                    get_all_single_token_interpretations(extended_view, lexer)
            };
            if (single_token_interpretations.empty()) {
                continue;
            }

            if (begin_idx == 0) {
                query_interpretations[end_idx - 1].insert(
                        std::make_move_iterator(single_token_interpretations.begin()),
                        std::make_move_iterator(single_token_interpretations.end())
                );
            } else {
                for (auto const& prefix : query_interpretations[begin_idx - 1]) {
                    for (auto const& suffix : single_token_interpretations) {
                        QueryInterpretation combined{prefix};
                        combined.append_query_interpretation(suffix);
                        query_interpretations[end_idx - 1].insert(std::move(combined));
                    }
                }
            }
        }
    }
    return query_interpretations.back();
}
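The nested loops above follow a dynamic-programming pattern: the interpretations of the prefix ending at `end_idx` are built by combining each interpretation of a shorter prefix with each interpretation of the substring that follows it. The sketch below shows the same pattern on a toy problem, under clearly different assumptions: substrings are "interpreted" by dictionary membership instead of by a lexer, and an interpretation is a plain vector of strings instead of a `QueryInterpretation` (so nothing is merged when tokens are appended). `all_segmentations` and `Interpretation` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Interpretation = std::vector<std::string>;

// Enumerates every way to split `text` into dictionary words.
auto all_segmentations(std::string const& text, std::set<std::string> const& dictionary)
        -> std::set<Interpretation> {
    if (text.empty()) {
        return {};
    }

    // by_prefix[i] holds all interpretations of text[0, i + 1).
    std::vector<std::set<Interpretation>> by_prefix(text.size());
    for (std::size_t end = 1; end <= text.size(); ++end) {
        for (std::size_t begin = 0; begin < end; ++begin) {
            auto const token{text.substr(begin, end - begin)};
            if (0 == dictionary.count(token)) {
                continue;
            }
            if (0 == begin) {
                // The substring is itself a prefix, so it starts a new interpretation.
                by_prefix[end - 1].insert(Interpretation{token});
            } else {
                // Extend every interpretation of the preceding prefix with this token.
                for (auto const& prefix : by_prefix[begin - 1]) {
                    auto combined{prefix};
                    combined.push_back(token);
                    by_prefix[end - 1].insert(std::move(combined));
                }
            }
        }
    }
    return by_prefix.back();
}

auto segmentation_example() -> void {
    auto const result{all_segmentations("abc", {"a", "b", "c", "ab", "bc"})};
    assert(3 == result.size());
    assert(1 == result.count({"a", "b", "c"}));
    assert(1 == result.count({"ab", "c"}));
    assert(1 == result.count({"a", "bc"}));
}
```

Storing the results in a `std::set` deduplicates interpretations that are reached through different split points, which is the same reason the method in the diff returns a set.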

auto Query::get_all_single_token_interpretations(
        ExpressionView const& expression_view,
        ByteLexer const& lexer
) -> std::vector<QueryInterpretation> {
    vector<QueryInterpretation> interpretations;

    if (false == expression_view.is_well_formed()) {
        return interpretations;
    }
    if ("*" == expression_view.get_search_string()) {
        interpretations.emplace_back("*");
        return interpretations;
    }
    if (false == expression_view.is_surrounded_by_delims(lexer.get_delim_table())) {
        interpretations.emplace_back(string{expression_view.get_search_string()});
        return interpretations;
    }

    auto const [regex_string, contains_wildcard]{expression_view.generate_regex_string()};

    auto const matching_var_type_ids{get_matching_variable_types(regex_string, lexer)};
    if (matching_var_type_ids.empty() || contains_wildcard) {
        interpretations.emplace_back(string{expression_view.get_search_string()});
    }

    for (auto const variable_type_id : matching_var_type_ids) {
        interpretations.emplace_back(
                variable_type_id,
                string{expression_view.get_search_string()},
                contains_wildcard
        );
        if (false == contains_wildcard) {
            break;
        }
    }
    return interpretations;
}
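The branch structure of the method above decides which tokens a single substring can produce. The sketch below restates those branches with the view and lexer replaced by precomputed facts, so it can run on its own. `Token`, `ViewFacts`, and `interpret_single_token` are hypothetical stand-ins, not types from the library, and the set of matching type IDs is supplied by the caller instead of being computed from a DFA intersection.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for one interpretation of a substring.
struct Token {
    std::optional<std::uint32_t> type_id;  // empty for static text
    std::string text;
};

// Hypothetical summary of what the method in the diff derives from the view and the lexer.
struct ViewFacts {
    bool well_formed{true};
    bool surrounded_by_delims{true};
    bool contains_wildcard{false};
    std::set<std::uint32_t> matching_type_ids;
};

auto interpret_single_token(std::string const& text, ViewFacts const& facts)
        -> std::vector<Token> {
    std::vector<Token> tokens;
    if (false == facts.well_formed) {
        return tokens;  // e.g. the view splits an escape sequence
    }
    if ("*" == text || false == facts.surrounded_by_delims) {
        // A lone greedy wildcard, or a substring that cannot be a whole variable.
        tokens.push_back(Token{std::nullopt, text});
        return tokens;
    }
    if (facts.matching_type_ids.empty() || facts.contains_wildcard) {
        // With a wildcard the substring may still match static text.
        tokens.push_back(Token{std::nullopt, text});
    }
    for (auto const id : facts.matching_type_ids) {
        tokens.push_back(Token{id, text});
        if (false == facts.contains_wildcard) {
            break;  // without wildcards only the first matching type is kept
        }
    }
    return tokens;
}

auto single_token_examples() -> void {
    ViewFacts exact;
    exact.matching_type_ids = {1, 2};
    auto const exact_tokens{interpret_single_token("123", exact)};
    assert(1 == exact_tokens.size());
    assert(1 == exact_tokens.front().type_id.value());

    ViewFacts wild{exact};
    wild.contains_wildcard = true;
    assert(3 == interpret_single_token("1*3", wild).size());  // static + two variable types
}
```

The early `break` reflects the code in the diff: a substring without wildcards yields at most one variable interpretation, while a substring with wildcards yields one per matching type plus a static-text interpretation.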

auto Query::get_matching_variable_types(string const& regex_string, ByteLexer const& lexer)
        -> set<uint32_t> {
    NonTerminal::m_next_children_start = 0;

    Schema schema;
    schema.add_variable("search:" + regex_string, -1);
    auto const schema_ast = schema.release_schema_ast_ptr();
    auto& rule_ast = dynamic_cast<SchemaVarAST&>(*schema_ast->m_schema_vars[0]);
    vector<ByteLexicalRule> rules;
    rules.emplace_back(0, std::move(rule_ast.m_regex_ptr));
    ByteNfa const nfa{rules};
    ByteDfa const dfa{nfa};

    auto var_types = lexer.get_dfa()->get_intersect(&dfa);
    return var_types;
}
} // namespace log_surgeon::wildcard_query_parser