feat(log-converter): Add log-converter binary which converts unstructured text logs into KV-IR; Update log-surgeon to yscope/log-surgeon@840f262. #1460
Merged
46 commits
402fda4 Add stub for log converter binary. (gibber9809)
5e06d36 Implement stub which uses log-surgeon to extract timestamps from ever… (gibber9809)
c21bdd8 Fix stub implementation. (gibber9809)
9b3badc Fix move constructor and move assignment operator in clp_s::FileWriter. (gibber9809)
818826e Apply lint fix to FileWriter (gibber9809)
46e2d21 Convert parsed logs into kv-ir. (gibber9809)
675bd44 Add output-dir command line argument. (gibber9809)
83d7102 Update timestamp schema to accept month names (gibber9809)
57714f3 Separate out classes into different files. (gibber9809)
68467a4 Add docstrings to classes and methods. (gibber9809)
46e5d92 Add log-converter to core build. (gibber9809)
413b1f0 Fix clang-tidy errors in command line arguments helper class. (gibber9809)
7f3a2e0 Fix almost all clang-tidy warnings in LogSerializer (gibber9809)
2e7ae46 Fix clang-tidy warnings in LogConverter (gibber9809)
99b66b3 Fix clang-tidy errors in log_converter (gibber9809)
3c0d369 Add missing newline (gibber9809)
26449d8 Fix bug where timestamp isn't parsed in first log message. (gibber9809)
49452ce Address rabbit comments. (gibber9809)
55fd3bf Apply suggestions from code review (gibber9809)
1a6f780 Update taskfile to include log-converter in package build. (gibber9809)
1729a75 Apply code review comments. (gibber9809)
871b16a Update test for clp core binaries. (gibber9809)
7171829 Merge remote-tracking branch 'upstream/main' into log-converter (gibber9809)
f0a5fb7 Fix up nodiscards (gibber9809)
e4cc3db Fix bugs introduced during refactoring. (gibber9809)
daa81ba Merge branch 'main' into log-converter (gibber9809)
b12ef31 Merge branch 'main' into log-converter (kirkrodrigues)
b3e7c4b Apply suggestions from code review (gibber9809)
bae33e0 Address review comments. (gibber9809)
3d3bc4c Merge remote-tracking branch 'upstream/main' into log-converter (gibber9809)
3c784c3 Remove unnecessary includes. (gibber9809)
babad69 Apply suggestions from code review (gibber9809)
1e1de58 Address review comments (gibber9809)
a3c899e Fix error introduced in rebase (gibber9809)
0cafbcf Address rabbit comments (gibber9809)
b598862 Merge remote-tracking branch 'upstream/main' into log-converter (gibber9809)
b1cfead Update log-surgeon dependency to 840f262 pull in fix. (gibber9809)
bc156d5 Update expected log-surgeon error message in unit tests. (gibber9809)
9d4b53c Apply suggestions from code review (gibber9809)
653e451 Address review comments. (gibber9809)
a3e28c9 Fix docstrings. (gibber9809)
2488622 Rename files-from argument to inputs-from. (gibber9809)
e5274db Improve comment in LogSerializer to indicate why we serialize timesta… (gibber9809)
1d17501 Merge branch 'main' into log-converter (gibber9809)
dc06942 Nit clang-tidy fix. (LinZhihao-723)
1778251 Small docstring fix. (LinZhihao-723)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,4 +1,5 @@ | ||
| add_subdirectory(indexer) | ||
| add_subdirectory(log_converter) | ||
| add_subdirectory(search) | ||
|
|
||
| set( | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,37 @@ | ||
| set( | ||
| CLP_S_LOG_CONVERTER_SOURCES | ||
| CommandLineArguments.cpp | ||
| CommandLineArguments.hpp | ||
| LogConverter.cpp | ||
| LogConverter.hpp | ||
| LogSerializer.cpp | ||
| LogSerializer.hpp | ||
| ) | ||
|
|
||
| if(CLP_BUILD_EXECUTABLES) | ||
| add_executable( | ||
| log-converter | ||
| log_converter.cpp | ||
| ${CLP_S_LOG_CONVERTER_SOURCES} | ||
| ) | ||
| target_compile_features(log-converter PRIVATE cxx_std_20) | ||
| target_link_libraries( | ||
| log-converter | ||
| PRIVATE | ||
| Boost::program_options | ||
| clp_s::clp_dependencies | ||
| clp_s::io | ||
| fmt::fmt | ||
| log_surgeon::log_surgeon | ||
| msgpack-cxx | ||
| nlohmann_json::nlohmann_json | ||
| spdlog::spdlog | ||
| ystdlib::containers | ||
| ystdlib::error_handling | ||
| ) | ||
| set_target_properties( | ||
| log-converter | ||
| PROPERTIES | ||
| RUNTIME_OUTPUT_DIRECTORY "${PROJECT_BINARY_DIR}" | ||
| ) | ||
| endif() |
204 changes: 204 additions & 0 deletions
components/core/src/clp_s/log_converter/CommandLineArguments.cpp
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,204 @@ | ||
| #include "CommandLineArguments.hpp" | ||
|
|
||
| #include <exception> | ||
| #include <iostream> | ||
| #include <stdexcept> | ||
| #include <string> | ||
| #include <string_view> | ||
| #include <vector> | ||
|
|
||
| #include <boost/program_options/options_description.hpp> | ||
| #include <boost/program_options/parsers.hpp> | ||
| #include <boost/program_options/positional_options.hpp> | ||
| #include <boost/program_options/value_semantic.hpp> | ||
| #include <boost/program_options/variables_map.hpp> | ||
| #include <fmt/format.h> | ||
| #include <spdlog/spdlog.h> | ||
|
|
||
| #include "../ErrorCode.hpp" | ||
| #include "../FileReader.hpp" | ||
| #include "../InputConfig.hpp" | ||
|
|
||
| namespace po = boost::program_options; | ||
|
|
||
| namespace clp_s::log_converter { | ||
| namespace { | ||
| // Authorization method constants | ||
| constexpr std::string_view cNoAuth{"none"}; | ||
| constexpr std::string_view cS3Auth{"s3"}; | ||
|
|
||
| /** | ||
| * Reads and returns a list of paths from a file containing newline-delimited paths. | ||
| * @param input_path_list_file_path Path to the file containing the list of paths. | ||
| * @param path_destination The vector that the paths are pushed into. | ||
| * @return Whether paths were read successfully or not. | ||
| */ | ||
| [[nodiscard]] auto read_paths_from_file( | ||
| std::string const& input_path_list_file_path, | ||
| std::vector<std::string>& path_destination | ||
| ) -> bool; | ||
|
|
||
| /** | ||
| * Validates and populates network authorization options. | ||
| * @param auth_method | ||
| * @param auth | ||
| * @throws std::invalid_argument if the authorization option is invalid | ||
| */ | ||
| void validate_network_auth(std::string_view auth_method, NetworkAuthOption& auth); | ||
|
|
||
| auto read_paths_from_file( | ||
| std::string const& input_path_list_file_path, | ||
| std::vector<std::string>& path_destination | ||
| ) -> bool { | ||
| FileReader reader; | ||
| auto error_code = reader.try_open(input_path_list_file_path); | ||
| if (ErrorCodeFileNotFound == error_code) { | ||
| SPDLOG_ERROR( | ||
| "Failed to open input path list file {} - file not found", | ||
| input_path_list_file_path | ||
| ); | ||
| return false; | ||
| } | ||
| if (ErrorCodeSuccess != error_code) { | ||
| SPDLOG_ERROR("Error opening input path list file {}", input_path_list_file_path); | ||
| return false; | ||
| } | ||
|
|
||
| std::string line; | ||
| while (true) { | ||
| error_code = reader.try_read_to_delimiter('\n', false, false, line); | ||
| if (ErrorCodeSuccess != error_code) { | ||
| break; | ||
| } | ||
| if (false == line.empty()) { | ||
| path_destination.push_back(line); | ||
| } | ||
| } | ||
|
|
||
| if (ErrorCodeEndOfFile != error_code) { | ||
| return false; | ||
| } | ||
| return true; | ||
| } | ||
|
|
||
| void validate_network_auth(std::string_view auth_method, NetworkAuthOption& auth) { | ||
| if (cS3Auth == auth_method) { | ||
| auth.method = AuthMethod::S3PresignedUrlV4; | ||
| } else if (cNoAuth != auth_method) { | ||
| throw std::invalid_argument(fmt::format("Invalid authentication type \"{}\"", auth_method)); | ||
| } | ||
| } | ||
| } // namespace | ||
|
|
||
| auto CommandLineArguments::parse_arguments(int argc, char const** argv) | ||
| -> CommandLineArguments::ParsingResult { | ||
| if (1 == argc) { | ||
| print_basic_usage(); | ||
| return ParsingResult::Failure; | ||
| } | ||
|
|
||
| try { | ||
| po::variables_map parsed_command_line_options; | ||
|
|
||
| po::options_description general_options("General options"); | ||
| general_options.add_options()("help,h", "Print help"); | ||
|
|
||
| po::options_description conversion_positional_options; | ||
| std::vector<std::string> input_paths; | ||
| // clang-format off | ||
| conversion_positional_options.add_options()( | ||
| "input-paths", | ||
| po::value<std::vector<std::string>>(&input_paths)->value_name("PATHS"), | ||
| "input paths" | ||
| ); | ||
| // clang-format on | ||
|
||
|
|
||
| po::options_description conversion_options("Conversion options"); | ||
| std::string input_path_list_file_path; | ||
| std::string auth{cNoAuth}; | ||
| // clang-format off | ||
| conversion_options.add_options()( | ||
| "inputs-from,f", | ||
| po::value<std::string>(&input_path_list_file_path) | ||
| ->value_name("INPUTS_FILE") | ||
| ->default_value(input_path_list_file_path), | ||
| "Convert inputs specified in INPUTS_FILE." | ||
| )( | ||
| "output-dir", | ||
| po::value<std::string>(&m_output_dir) | ||
| ->value_name("OUTPUT_DIR") | ||
| ->default_value(m_output_dir), | ||
| "Output directory for converted inputs." | ||
| )( | ||
| "auth", | ||
| po::value<std::string>(&auth) | ||
| ->value_name("AUTH_METHOD") | ||
| ->default_value(auth), | ||
| "Type of authentication required for network requests (s3 | none). Authentication" | ||
| " with s3 requires the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment" | ||
| " variables, and optionally the AWS_SESSION_TOKEN environment variable." | ||
| ); | ||
| // clang-format on | ||
|
|
||
| po::positional_options_description positional_options; | ||
| positional_options.add("input-paths", -1); | ||
|
|
||
| po::options_description all_conversion_options; | ||
| all_conversion_options.add(general_options); | ||
| all_conversion_options.add(conversion_options); | ||
| all_conversion_options.add(conversion_positional_options); | ||
|
|
||
| po::store( | ||
| po::command_line_parser(argc, argv) | ||
| .options(all_conversion_options) | ||
| .positional(positional_options) | ||
| .run(), | ||
| parsed_command_line_options | ||
| ); | ||
| po::notify(parsed_command_line_options); | ||
|
|
||
| if (parsed_command_line_options.contains("help")) { | ||
| if (argc > 2) { | ||
| SPDLOG_WARN("Ignoring all options besides --help."); | ||
| } | ||
|
|
||
| print_basic_usage(); | ||
| po::options_description visible_options; | ||
| visible_options.add(general_options); | ||
| visible_options.add(conversion_options); | ||
| std::cerr << visible_options << '\n'; | ||
| return ParsingResult::InfoCommand; | ||
| } | ||
|
|
||
| if (false == input_path_list_file_path.empty()) { | ||
| if (false == read_paths_from_file(input_path_list_file_path, input_paths)) { | ||
| SPDLOG_ERROR("Failed to read paths from {}", input_path_list_file_path); | ||
| return ParsingResult::Failure; | ||
| } | ||
| } | ||
|
|
||
| for (auto const& path : input_paths) { | ||
| if (false == get_input_files_for_raw_path(path, m_input_paths)) { | ||
| throw std::invalid_argument(fmt::format("Invalid input path \"{}\".", path)); | ||
| } | ||
| } | ||
|
|
||
| if (m_input_paths.empty()) { | ||
| throw std::invalid_argument("No input paths specified."); | ||
| } | ||
|
||
|
|
||
| validate_network_auth(auth, m_network_auth); | ||
| } catch (std::exception& e) { | ||
| SPDLOG_ERROR("{}", e.what()); | ||
| print_basic_usage(); | ||
| std::cerr << "Try " << get_program_name() << " --help for detailed usage instructions\n"; | ||
| return ParsingResult::Failure; | ||
| } | ||
|
|
||
| return ParsingResult::Success; | ||
| } | ||
|
|
||
| void CommandLineArguments::print_basic_usage() const { | ||
| std::cerr << "Usage: " << get_program_name() << " [INPUT_PATHS] [OPTIONS]\n"; | ||
| } | ||
| } // namespace clp_s::log_converter | ||
49 changes: 49 additions & 0 deletions
components/core/src/clp_s/log_converter/CommandLineArguments.hpp
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,49 @@ | ||
| #ifndef CLP_S_COMMANDLINEARGUMENTS_HPP | ||
| #define CLP_S_COMMANDLINEARGUMENTS_HPP | ||
|
|
||
| #include <cstdint> | ||
| #include <string> | ||
| #include <string_view> | ||
| #include <vector> | ||
|
|
||
| #include "../InputConfig.hpp" | ||
|
|
||
| namespace clp_s::log_converter { | ||
| class CommandLineArguments { | ||
| public: | ||
| // Types | ||
| enum class ParsingResult : uint8_t { | ||
| Success = 0, | ||
| InfoCommand, | ||
| Failure | ||
| }; | ||
|
|
||
| // Constructors | ||
| explicit CommandLineArguments(std::string_view program_name) : m_program_name{program_name} {} | ||
|
|
||
| // Methods | ||
| [[nodiscard]] auto parse_arguments(int argc, char const** argv) -> ParsingResult; | ||
|
|
||
| [[nodiscard]] auto get_program_name() const -> std::string const& { return m_program_name; } | ||
|
|
||
| [[nodiscard]] auto get_input_paths() const -> std::vector<Path> const& { return m_input_paths; } | ||
|
|
||
| [[nodiscard]] auto get_network_auth() const -> NetworkAuthOption const& { | ||
| return m_network_auth; | ||
| } | ||
|
|
||
| [[nodiscard]] auto get_output_dir() const -> std::string const& { return m_output_dir; } | ||
|
|
||
| private: | ||
| // Methods | ||
| void print_basic_usage() const; | ||
|
|
||
| // Variables | ||
| std::string m_program_name; | ||
| std::vector<Path> m_input_paths; | ||
| NetworkAuthOption m_network_auth{}; | ||
| std::string m_output_dir{"./"}; | ||
| }; | ||
| } // namespace clp_s::log_converter | ||
|
|
||
| #endif // CLP_S_COMMANDLINEARGUMENTS_HPP |
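
`parse_arguments` returns one of three `ParsingResult` values, so a caller has to distinguish "run the conversion", "help was printed, nothing to do", and "bad arguments". The sketch below shows one way a caller might dispatch on the result. The enum is copied from the header so that the block compiles on its own. The helpers `should_run_conversion` and `exit_code_for`, and the exit codes they use, are assumptions made for illustration; the PR's `log_converter.cpp` is not shown in this excerpt.

```cpp
#include <cstdint>

// Copy of CommandLineArguments::ParsingResult, repeated here to keep the sketch self-contained.
enum class ParsingResult : std::uint8_t {
    Success = 0,
    InfoCommand,
    Failure
};

// Hypothetical helper: only a successful parse should lead to actual conversion work.
[[nodiscard]] constexpr auto should_run_conversion(ParsingResult result) -> bool {
    return ParsingResult::Success == result;
}

// Hypothetical helper: maps a parsing result to a process exit code. Treating InfoCommand
// (e.g. --help) as a successful exit is an assumption, not something this excerpt confirms.
[[nodiscard]] constexpr auto exit_code_for(ParsingResult result) -> int {
    switch (result) {
        case ParsingResult::Success:
        case ParsingResult::InfoCommand:
            return 0;
        case ParsingResult::Failure:
            return 1;
    }
    return 1;
}
```

Because `parse_arguments` is marked `[[nodiscard]]`, a caller that ignores the result should get a compiler warning, which makes this kind of explicit dispatch the natural pattern.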