Skip to content

Conversation

@gibber9809
Copy link
Contributor

@gibber9809 gibber9809 commented Oct 21, 2025

Description

This PR adds a new log-converter binary which can be used to convert unstructured logtext into semi-structured KV-IR. This binary should allow us to offer some support for the use-case in #1308.

The implementation uses log-surgeon to separate the timestamp and logtext from unstructured text logs, and serialize these fields into KV-IR for each message.

The utility currently has several limitations:

  • It only accepts unstructured text inputs
  • It uses a predefined schema for timestamp patterns, which should cover most common cases but may not always work
  • It does not allow extracting other fields (e.g., log-level) from the unstructured text
  • It does not preserve the original filename

Currently the original file name is stored in the metadata of the generated KV-IR as "original_file". clp-s will ingest this metadata and store it in the range-index, but the _file_name field in the range index will still correspond to the file name of the generated KV-IR. Ideally we would add support for forcing clp-s to use a specific _file_name, e.g., by allowing metadata keys from KV-IR to override default properties inserted by clp-s such as _file_name.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Manually validated that log-converter can produce accurate KV-IR for unstructured log-text files
  • Manually validated that cli options for specifying inputs from the command line and input files work as expected
  • Manually validated that log-converter can convert content stored on S3

Summary by CodeRabbit

  • New Features

    • Added a log‑converter CLI with full command‑line parsing, support for file‑list inputs, network auth options, output directory selection, timestamp‑aware parsing and structured serialized output.
  • Chores

    • Integrated log‑converter into core build, packaging and runtime artifacts.
  • Maintenance

    • File writer and serializer made non‑copyable and move‑safe; added buffered, streaming conversion with robust error handling.
  • Tests

    • Integration tests updated to require the log‑converter.
  • Documentation

    • Updated log‑surgeon dependency reference.

@gibber9809 gibber9809 requested review from a team and wraymo as code owners October 21, 2025 17:32
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 21, 2025

Walkthrough

Adds a new log-converter executable and its sources (CLI, converter, serializer, runtime entry); makes FileWriter movable-only; updates CMake, packaging, dependency metadata, docs, and integration-test configuration to include the new binary and its build/runtime outputs.

Changes

Cohort / File(s) Summary
Build configuration
components/core/src/clp_s/CMakeLists.txt, components/core/src/clp_s/log_converter/CMakeLists.txt, taskfile.yaml
Adds log-converter subdirectory and a log-converter executable (guarded by CLP_BUILD_EXECUTABLES); defines CLP_S_LOG_CONVERTER_SOURCES, sets C++20, links required targets, sets runtime output dir to ${PROJECT_BINARY_DIR}, and includes the binary in packaging/core build targets.
FileWriter move semantics
components/core/src/clp_s/FileWriter.hpp
Deletes copy constructor and copy-assignment; adds move constructor and move-assignment using std::exchange and includes <utility>; public API otherwise unchanged.
Command-line parsing
components/core/src/clp_s/log_converter/CommandLineArguments.hpp, components/core/src/clp_s/log_converter/CommandLineArguments.cpp
Adds clp_s::log_converter::CommandLineArguments with ParsingResult enum and parse_arguments(...) implementation using Boost.Program_options; supports --inputs-from, --output-dir, --network-auth, positional inputs, --help, reading input lists from file, input validation, and error reporting.
Conversion engine
components/core/src/clp_s/log_converter/LogConverter.hpp, components/core/src/clp_s/log_converter/LogConverter.cpp
Adds LogConverter with resizable buffer (64 KiB default, up to 64 MiB); implements grow_buffer_if_full, compact_buffer, refill_buffer, and convert_file to read via ReaderInterface, parse events, and forward messages (with optional timestamps) to the serializer; propagates read/parse/serialize errors.
Serialization
components/core/src/clp_s/log_converter/LogSerializer.hpp, components/core/src/clp_s/log_converter/LogSerializer.cpp
Adds non-copyable, movable LogSerializer with static create(...) factory that generates UUID .clp filenames and JSON metadata; provides add_message overloads (with/without timestamp), internal IR buffer flush logic, close(), and buffer size constraints.
Executable entry & orchestration
components/core/src/clp_s/log_converter/log_converter.cpp
Adds main that initializes logging, parses CLI, creates output directory, opens/detects readers (local/network), logs CURL errors for network readers, iterates input paths and invokes LogConverter::convert_file, and returns exit codes for Success/InfoCommand/Failure.
Integration / tests config
integration-tests/tests/utils/config.py
Adds log-converter to required CLP core binaries list used by integration tests.
Dependency metadata & docs
taskfiles/deps/main.yaml, docs/src/dev-docs/components-core/index.md
Updates log_surgeon tarball URL and SHA256 to a new commit; updates docs to reflect the new dependency commit.
Tests
components/core/tests/test-ParserWithUserSchema.cpp
Adjusts an expected parser error message token name in the test assertion.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Main as log-converter (main)
    participant CLI as CommandLineArguments
    participant Converter as LogConverter
    participant Reader as ReaderInterface
    participant Serializer as LogSerializer
    participant Writer as FileWriter

    User->>Main: invoke log-converter
    Main->>CLI: parse_arguments(argc, argv)
    CLI-->>Main: ParsingResult (Success / InfoCommand / Failure)

    alt Success
        Main->>Converter: instantiate
        loop per input path
            Main->>Reader: open reader for path
            Reader-->>Main: reader instance / error
            alt reader OK
                Main->>Converter: convert_file(path, reader, out_dir)
                loop read/parse cycle
                    Converter->>Reader: read into buffer
                    Reader-->>Converter: bytes / EOF / error
                    Converter->>Converter: parse events
                    alt event has timestamp
                        Converter->>Serializer: add_message(timestamp, message)
                    else
                        Converter->>Serializer: add_message(message)
                    end
                    Serializer->>Writer: write / flush as needed
                end
                Converter->>Serializer: close()
            else reader error
                Main->>Main: log and continue/abort
            end
        end
    else InfoCommand
        Main->>User: print usage/info
    else Failure
        Main->>User: print error and exit non-zero
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Files that may need extra attention:
    • components/core/src/clp_s/log_converter/LogConverter.cpp — buffer lifecycle, resizing, parser offset handling, EOF and error propagation.
    • components/core/src/clp_s/log_converter/LogSerializer.cpp — UUID filename generation, metadata layout, IR buffer flush and FileWriter usage.
    • components/core/src/clp_s/log_converter/log_converter.cpp — reader type detection, CURL error detection/logging, exit-code semantics.
    • components/core/src/clp_s/FileWriter.hpp — move semantics to ensure single ownership and no double-close.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The pull request title "feat(log-converter): Add log-converter binary which converts unstructured text logs into KV-IR; Update log-surgeon to yscope/log-surgeon@840f262." is directly aligned with the changeset. The raw summary confirms the addition of a new log-converter binary with supporting components (CommandLineArguments, LogConverter, LogSerializer classes), CMakeLists.txt configuration, and executable integration, along with the log-surgeon dependency update from a722d07 to 840f262. The title is specific and clear enough that a teammate scanning history would understand the primary contributions: the new log-converter binary for converting unstructured logs to KV-IR format and the dependency update. While the title is somewhat lengthy, it accurately captures both major changes without using vague terminology.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dc06942 and 1778251.

📒 Files selected for processing (1)
  • components/core/src/clp_s/log_converter/LogSerializer.hpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp_s/log_converter/LogSerializer.hpp
🧬 Code graph analysis (1)
components/core/src/clp_s/log_converter/LogSerializer.hpp (1)
components/core/src/clp_s/FileWriter.hpp (2)
  • writer (42-49)
  • writer (42-42)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: package-image
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: rust-checks (ubuntu-22.04)
  • GitHub Check: rust-checks (ubuntu-24.04)
  • GitHub Check: rust-checks (macos-15)
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
taskfile.yaml (1)

155-164: Add log-converter to the rsync list to package the built binary.

The verification confirms the issue: log-converter is built (listed in the generates section at line 216) but is not included in the rsync command at lines 155-164. The suggested fix is correct—add the binary to the rsync list alongside other core components.

        rsync -a
         "{{.G_CORE_COMPONENT_BUILD_DIR}}/clg"
         "{{.G_CORE_COMPONENT_BUILD_DIR}}/clo"
         "{{.G_CORE_COMPONENT_BUILD_DIR}}/clp"
         "{{.G_CORE_COMPONENT_BUILD_DIR}}/clp-s"
         "{{.G_CORE_COMPONENT_BUILD_DIR}}/indexer"
+        "{{.G_CORE_COMPONENT_BUILD_DIR}}/log-converter"
         "{{.G_CORE_COMPONENT_BUILD_DIR}}/reducer-server"
         "{{.G_SPIDER_BUILD_DIR}}/spider-build/src/spider/spider_scheduler"
         "{{.G_SPIDER_BUILD_DIR}}/spider-build/src/spider/spider_worker"
         "{{.OUTPUT_DIR}}/bin/"
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 94b501b and 3c0d369.

📒 Files selected for processing (11)
  • components/core/src/clp_s/CMakeLists.txt (1 hunks)
  • components/core/src/clp_s/FileWriter.hpp (1 hunks)
  • components/core/src/clp_s/log_converter/CMakeLists.txt (1 hunks)
  • components/core/src/clp_s/log_converter/CommandLineArguments.cpp (1 hunks)
  • components/core/src/clp_s/log_converter/CommandLineArguments.hpp (1 hunks)
  • components/core/src/clp_s/log_converter/LogConverter.cpp (1 hunks)
  • components/core/src/clp_s/log_converter/LogConverter.hpp (1 hunks)
  • components/core/src/clp_s/log_converter/LogSerializer.cpp (1 hunks)
  • components/core/src/clp_s/log_converter/LogSerializer.hpp (1 hunks)
  • components/core/src/clp_s/log_converter/log_converter.cpp (1 hunks)
  • taskfile.yaml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp_s/log_converter/LogConverter.hpp
  • components/core/src/clp_s/log_converter/CommandLineArguments.hpp
  • components/core/src/clp_s/log_converter/log_converter.cpp
  • components/core/src/clp_s/log_converter/LogConverter.cpp
  • components/core/src/clp_s/FileWriter.hpp
  • components/core/src/clp_s/log_converter/LogSerializer.cpp
  • components/core/src/clp_s/log_converter/LogSerializer.hpp
  • components/core/src/clp_s/log_converter/CommandLineArguments.cpp
🧬 Code graph analysis (6)
components/core/src/clp_s/log_converter/LogConverter.hpp (1)
components/core/src/clp_s/log_converter/LogSerializer.hpp (1)
  • output_dir (40-41)
components/core/src/clp_s/log_converter/CommandLineArguments.hpp (2)
components/core/src/clp_s/log_converter/CommandLineArguments.cpp (1)
  • nodiscard (36-69)
components/core/src/clp_s/CommandLineArguments.hpp (1)
  • m_network_auth (51-51)
components/core/src/clp_s/log_converter/log_converter.cpp (2)
components/core/src/clp_s/log_converter/CommandLineArguments.hpp (2)
  • CommandLineArguments (22-22)
  • CommandLineArguments (22-22)
components/core/src/clp_s/log_converter/LogConverter.hpp (3)
  • LogConverter (21-21)
  • path (31-35)
  • reader (49-49)
components/core/src/clp_s/log_converter/LogConverter.cpp (3)
components/core/src/clp_s/log_converter/LogConverter.hpp (2)
  • reader (49-49)
  • path (31-35)
components/core/src/clp_s/log_converter/LogSerializer.hpp (3)
  • output_dir (40-41)
  • message (56-56)
  • timestamp (49-49)
components/core/src/clp_s/log_converter/LogSerializer.cpp (2)
  • create (22-45)
  • create (22-23)
components/core/src/clp_s/log_converter/LogSerializer.cpp (1)
components/core/src/clp_s/log_converter/LogSerializer.hpp (4)
  • output_dir (40-41)
  • timestamp (49-49)
  • message (56-56)
  • flush_buffer (61-65)
components/core/src/clp_s/log_converter/CommandLineArguments.cpp (1)
components/core/src/clp_s/log_converter/CommandLineArguments.hpp (5)
  • nodiscard (27-27)
  • nodiscard (29-29)
  • nodiscard (31-33)
  • nodiscard (35-35)
  • argc (25-25)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: rust-checks (ubuntu-22.04)
  • GitHub Check: rust-checks (macos-15)
  • GitHub Check: rust-checks (ubuntu-24.04)
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: lint-check (macos-15)
🔇 Additional comments (24)
taskfile.yaml (1)

216-224: Core build integration looks good.

Target added to generates and TARGETS. No further action here.

components/core/src/clp_s/CMakeLists.txt (1)

2-2: Add subdirectory: OK.

Subcomponent hook-up is correct.

components/core/src/clp_s/log_converter/CommandLineArguments.cpp (1)

180-189: Auth validation and CLI flow look solid.

Parsing, help handling, inputs merging, and auth validation are correct.

If S3 is frequently used, consider surfacing missing AWS env vars with a specific error message before conversion starts.

components/core/src/clp_s/log_converter/LogConverter.hpp (3)

1-13: LGTM!

The include guards and headers are properly structured and include the necessary dependencies for the class.


14-36: LGTM!

The class design and public API are well-structured with clear documentation and appropriate return types.


37-54: LGTM!

The private implementation details are well-designed with appropriate buffer sizing, error handling via std::optional, and proper member initialization.

components/core/src/clp_s/log_converter/LogSerializer.hpp (4)

1-14: LGTM!

Headers and include guards are properly structured for the serialization component.


19-32: LGTM!

The special member functions correctly implement move-only semantics, which is appropriate for a class managing file resources.


33-66: LGTM!

The public API is well-designed with a factory pattern for safe initialization and clear methods for message serialization. The inline close() implementation is appropriate for its simplicity.


67-97: LGTM!

The private implementation is well-structured with appropriate constants, move semantics, and buffer management.

components/core/src/clp_s/log_converter/LogSerializer.cpp (4)

1-20: LGTM!

All necessary headers are included for the implementation.


22-45: LGTM!

The factory method correctly handles initialization failures by returning std::nullopt. The UUID-based filename generation ensures unique output files, and the broad exception handling is appropriate for a factory pattern.


47-67: LGTM!

The method correctly constructs MsgPack objects, serializes them, and manages the buffer. The string_view parameters are safely used since serialization occurs immediately within the function scope.


69-83: LGTM!

The non-timestamped message serialization follows the same safe pattern as the timestamped version with appropriate buffer management.

components/core/src/clp_s/log_converter/CommandLineArguments.hpp (3)

1-10: LGTM!

Headers and include guards are properly structured.


11-36: LGTM!

The public API is well-designed with explicit constructor, [[nodiscard]] attributes for safety, and const-correct getters.


37-46: LGTM!

Private members are appropriately organized with sensible default values.

components/core/src/clp_s/log_converter/LogConverter.cpp (7)

1-19: LGTM!

All necessary headers are included for the implementation.


20-36: LGTM!

The timestamp schema is appropriately documented as non-exhaustive and covers common timestamp patterns. The TODO comment for future unicode support is noted.


38-79: LGTM!

The buffer management logic is well-implemented with proper handling of overlapping memory regions, exponential growth with an upper bound, and appropriate error handling.


81-96: LGTM!

The initialization sequence correctly sets up the schema, parser, and serializer with appropriate error handling.


141-144: LGTM!

Proper cleanup and success return handling.


117-119: Consider refactoring for consistency with coding guidelines.

While enum comparisons using != are acceptable, the coding guidelines prefer false == <expression> style. Consider refactoring for consistency:

As per coding guidelines.

-            if (log_surgeon::ErrorCode::Success != err) {
+            if (false == (log_surgeon::ErrorCode::Success == err)) {
                 return false;
             }

Alternatively, you could keep the current form as enum inequality checks are commonly accepted and readable.

Likely an incorrect or invalid review comment.


124-131: Timestamp position assumption is verified as correct—log-surgeon's schema documentation guarantees timestamps appear at message start.

The log-surgeon unstructured schema specification explicitly states "Timestamps must appear at the start of the message," which validates the code's assumption. The substr(timestamp.length()) operation at line 127 is safe and correct given this API contract.

The code requires no changes. Optionally, adding a comment referencing the schema guarantee would improve clarity for future maintainers.

Comment on lines +67 to +77
std::error_code ec{};
if (false == std::filesystem::create_directory(command_line_arguments.get_output_dir(), ec)
&& ec)
{
SPDLOG_ERROR(
"Can not create output directory {} - {}",
command_line_arguments.get_output_dir(),
ec.message()
);
return false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Create nested output dirs and validate path type.

create_directory fails for nested paths and doesn’t catch “path exists but not a dir”. Use create_directories and add an exists/is_directory check.

-    std::error_code ec{};
-    if (false == std::filesystem::create_directory(command_line_arguments.get_output_dir(), ec)
-        && ec)
-    {
-        SPDLOG_ERROR(
-                "Can not create output directory {} - {}",
-                command_line_arguments.get_output_dir(),
-                ec.message()
-        );
-        return false;
-    }
+    std::error_code ec{};
+    auto const& outdir = command_line_arguments.get_output_dir();
+    if (false == std::filesystem::exists(outdir, ec)) {
+        ec.clear();
+        if (false == std::filesystem::create_directories(outdir, ec) && ec) {
+            SPDLOG_ERROR("Cannot create output directory {} - {}", outdir, ec.message());
+            return false;
+        }
+    } else if (false == std::filesystem::is_directory(outdir, ec)) {
+        SPDLOG_ERROR("Output path exists but is not a directory: {}", outdir);
+        return false;
+    }
🤖 Prompt for AI Agents
In components/core/src/clp_s/log_converter/log_converter.cpp around lines 67 to
77, replace std::filesystem::create_directory with
std::filesystem::create_directories to handle nested paths, and before
attempting creation check if the path exists and is a directory (using exists()
and is_directory()); if the path exists but is not a directory log an error and
return false; when calling create_directories, still pass an error_code, check
for failure and log ec.message() as before. Ensure the logic covers: if path
exists and is_directory -> proceed, if path exists and not directory -> error,
else try create_directories and on error log and return false.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3c0d369 and 26449d8.

📒 Files selected for processing (1)
  • components/core/src/clp_s/log_converter/LogConverter.cpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp_s/log_converter/LogConverter.cpp
🧬 Code graph analysis (1)
components/core/src/clp_s/log_converter/LogConverter.cpp (3)
components/core/src/clp_s/log_converter/LogConverter.hpp (2)
  • reader (49-49)
  • path (31-35)
components/core/src/clp_s/log_converter/LogSerializer.hpp (3)
  • output_dir (40-41)
  • message (56-56)
  • timestamp (49-49)
components/core/src/clp_s/log_converter/LogSerializer.cpp (2)
  • create (22-45)
  • create (22-23)
🔇 Additional comments (3)
components/core/src/clp_s/log_converter/LogConverter.cpp (3)

38-79: Buffer management implementation looks solid.

The refill logic correctly uses memmove for overlapping memory when shifting consumed data, memcpy for the non-overlapping buffer growth scenario, and enforces a maximum buffer size with appropriate error handling.


86-96: Schema setup and serializer initialization are correct.

The log_surgeon schema configuration and serializer creation follow the expected pattern, with proper error handling using the codebase's style guidelines.


125-135: Verify timestamp extraction logic with log_surgeon's event representation.

The code extracts the timestamp string view from the event (line 126), then substrings the full message by the timestamp's length to obtain the remainder (lines 127-129). This assumes that:

  1. The timestamp always appears at the beginning of the message string returned by event.to_string()
  2. The length of the timestamp string view corresponds exactly to its position in the message

While this may be guaranteed by log_surgeon's parsing behaviour, please verify that event.to_string() and event.get_timestamp()->to_string_view() have this relationship. If the timestamp can appear elsewhere or have different formatting in the message string, this logic would produce incorrect results.

Consider generating a test case that validates this assumption, or add a runtime check:

if (nullptr != event.get_timestamp()) {
    auto timestamp{event.get_timestamp()->to_string_view()};
    std::string_view message_view{message};
    
    // Verify timestamp is at the beginning
    if (message_view.starts_with(timestamp)) {
        auto message_without_timestamp{message_view.substr(timestamp.length())};
        message_serialized_successfully
                = serializer.add_message(timestamp, message_without_timestamp);
    } else {
        // Fallback or error handling
        message_serialized_successfully = serializer.add_message(message);
    }
}

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (3)
components/core/src/clp_s/FileWriter.hpp (2)

38-40: Mark move constructor noexcept for container compatibility.

The move constructor should be marked noexcept to enable optimizations in standard containers (e.g., std::vector will only use move operations during reallocation if they are noexcept). This issue was flagged in the previous review but remains unaddressed.

Apply this diff:

-    FileWriter(FileWriter&& writer)
+    FileWriter(FileWriter&& writer) noexcept
             : m_file{std::exchange(writer.m_file, nullptr)},
               m_fd{std::exchange(writer.m_fd, -1)} {}

42-49: Critical resource leak: Current file handle not closed before move assignment.

The move assignment operator still has two issues from the previous review that remain unaddressed:

  1. Resource leak (CRITICAL): If m_file is already open when move assignment occurs, the current file handle is overwritten without being closed, causing a file descriptor and FILE* leak.
  2. Missing noexcept: The move assignment should be marked noexcept for container compatibility.

Apply this diff to fix both issues:

-    auto operator=(FileWriter&& writer) -> FileWriter& {
+    auto operator=(FileWriter&& writer) noexcept -> FileWriter& {
         if (this == &writer) {
             return *this;
         }
+        if (nullptr != m_file) {
+            // Best-effort close to avoid leak; ignore errors in move.
+            std::fclose(m_file);
+        }
         m_file = std::exchange(writer.m_file, nullptr);
         m_fd = std::exchange(writer.m_fd, -1);
         return *this;
     }
components/core/src/clp_s/log_converter/log_converter.cpp (1)

67-77: Create nested output dirs and validate path type.

Use create_directories and handle “exists but not a directory”. Also fix wording to “Cannot”. [duplicate of a prior review; still unresolved]

-    std::error_code ec{};
-    if (false == std::filesystem::create_directory(command_line_arguments.get_output_dir(), ec)
-        && ec)
-    {
-        SPDLOG_ERROR(
-                "Can not create output directory {} - {}",
-                command_line_arguments.get_output_dir(),
-                ec.message()
-        );
-        return false;
-    }
+    std::error_code ec{};
+    auto const& outdir = command_line_arguments.get_output_dir();
+    if (false == std::filesystem::exists(outdir, ec)) {
+        ec.clear();
+        if (false == std::filesystem::create_directories(outdir, ec) && ec) {
+            SPDLOG_ERROR("Cannot create output directory {} - {}", outdir, ec.message());
+            return false;
+        }
+    } else if (false == std::filesystem::is_directory(outdir, ec)) {
+        SPDLOG_ERROR("Output path exists but is not a directory: {}", outdir);
+        return false;
+    }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 26449d8 and 49452ce.

📒 Files selected for processing (4)
  • components/core/src/clp_s/FileWriter.hpp (2 hunks)
  • components/core/src/clp_s/log_converter/CMakeLists.txt (1 hunks)
  • components/core/src/clp_s/log_converter/LogConverter.cpp (1 hunks)
  • components/core/src/clp_s/log_converter/log_converter.cpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp_s/FileWriter.hpp
  • components/core/src/clp_s/log_converter/log_converter.cpp
  • components/core/src/clp_s/log_converter/LogConverter.cpp
🧬 Code graph analysis (2)
components/core/src/clp_s/log_converter/log_converter.cpp (2)
components/core/src/clp_s/log_converter/CommandLineArguments.hpp (3)
  • CommandLineArguments (22-22)
  • CommandLineArguments (22-22)
  • argc (25-25)
components/core/src/clp_s/log_converter/LogConverter.hpp (3)
  • LogConverter (21-21)
  • path (31-35)
  • reader (49-49)
components/core/src/clp_s/log_converter/LogConverter.cpp (4)
components/core/src/clp_s/log_converter/LogConverter.hpp (2)
  • reader (49-49)
  • path (31-35)
components/core/src/clp_s/FileWriter.hpp (1)
  • path (127-127)
components/core/src/clp_s/log_converter/LogSerializer.hpp (3)
  • output_dir (40-41)
  • message (56-56)
  • timestamp (49-49)
components/core/src/clp_s/log_converter/LogSerializer.cpp (2)
  • create (22-45)
  • create (22-23)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: rust-checks (ubuntu-22.04)
  • GitHub Check: build (ubuntu-24.04)
  • GitHub Check: rust-checks (ubuntu-24.04)
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (4)
components/core/src/clp_s/FileWriter.hpp (2)

8-8: LGTM: Required include for std::exchange.

The <utility> header is correctly added to support std::exchange used in the move semantics implementation.


34-35: LGTM: Correctly disables copy semantics.

Deleting the copy constructor and assignment operator is appropriate for a move-only resource-managing class.

components/core/src/clp_s/log_converter/CMakeLists.txt (1)

18-30: Link set looks correct; duplicate removed.

fmt and clp_s::clp_dependencies are present; target properties are sane. LGTM.

components/core/src/clp_s/log_converter/log_converter.cpp (1)

101-106: Review comment is incorrect — nested_readers is guaranteed to be non-empty for FileType::LogText.

The code analysis shows that nested_readers will always have at least one element when FileType::LogText is returned. In try_deduce_reader_type (InputConfig.cpp line 369), the readers vector is initialized with the original reader, then at line 380, buffered_reader is added to the vector before the switch statement evaluates the type. When the code reaches the FileType::LogText case (line 384), it immediately returns {std::move(readers), type} after readers has been populated with both the original reader and the buffered reader. Therefore, calling .back() on nested_readers at log_converter.cpp:104 is safe and will never access an empty container.

The suggested diff is unnecessary.

Likely an incorrect or invalid review comment.

Comment on lines 95 to 98
std::ignore = check_and_log_curl_error(path, reader);
SPDLOG_ERROR("Received input that was not unstructured logtext: {}.", path.path);
return false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Avoid assigning to std::ignore; drop the dependency.

Use a void cast to discard the result. If you keep std::ignore, include .

-                std::ignore = check_and_log_curl_error(path, reader);
+                (void)check_and_log_curl_error(path, reader);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
std::ignore = check_and_log_curl_error(path, reader);
SPDLOG_ERROR("Received input that was not unstructured logtext: {}.", path.path);
return false;
}
(void)check_and_log_curl_error(path, reader);
SPDLOG_ERROR("Received input that was not unstructured logtext: {}.", path.path);
return false;
}
🤖 Prompt for AI Agents
In components/core/src/clp_s/log_converter/log_converter.cpp around lines 95 to
98, the code assigns the result of check_and_log_curl_error to std::ignore which
is incorrect and forces a dependency on <tuple>; replace the assignment with a
void cast: (void)check_and_log_curl_error(path, reader); and remove the
unnecessary #include <tuple> (or alternatively, if you choose to keep
std::ignore, add #include <tuple>), ensuring no assignment to std::ignore
remains.

Comment on lines 115 to 121
if (log_surgeon::ErrorCode::BufferOutOfBounds == err) {
break;
}
if (log_surgeon::ErrorCode::Success != err) {
return false;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Add minimal error logging to aid diagnosis.

Log parser errors before returning to ease debugging in field runs.

-            if (log_surgeon::ErrorCode::Success != err) {
-                return false;
-            }
+            if (log_surgeon::ErrorCode::Success != err) {
+                SPDLOG_ERROR("Parser error {} while processing {}", static_cast<int>(err), path.path);
+                return false;
+            }

Want me to push the logging tweak across other early-return paths too?

🤖 Prompt for AI Agents
In components/core/src/clp_s/log_converter/LogConverter.cpp around lines
115-121, the code returns false on non-success parse errors without any logging;
add a minimal log statement that records the error code (and a brief context
like "parse error in LogConverter") immediately before returning false so field
diagnostics capture which ErrorCode occurred; use the existing logging facility
used elsewhere in the file (same logger and log level) and keep the message
concise; also apply the same minimal logging pattern to other early-return paths
in this function to aid debugging.

Copy link
Member

@LinZhihao-723 LinZhihao-723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a brief look. Still waiting for Bill to make a full review.

namespace clp_s::log_converter {
auto LogSerializer::create(std::string_view output_dir, std::string_view original_file_path)
-> std::optional<LogSerializer> {
nlohmann::json metadata{{cOriginalFileMetadataKey, original_file_path}};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should avoid {} initializer for nlohmann::json. It has many weird behavior...

Comment on lines +51 to +52
.key = msgpack::object{cTimestampKey},
.val = msgpack::object{timestamp}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kirkrodrigues shall we add a inner layer like what we did in clp-loglib-py? Just in case we need to add timezone offset support in future release...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While Devin was designing the new timestamp parsing for clp-s, he did mention that the inner layer thing has some issues with another feature we want to implement in clp-s.

Specifically, if the user compressed different datasets, each with a different authoritative timestamp key (e.g., ts_key_1 and ts_key_2), currently, the user can't reliably query for logs across both datasets in a certain time range. Yes, the user could do ts_key_1: ... OR ts_key_2: ..., but that's cumbersome and ts_key_2 might exist in both datasets, but might only be authoritative in one.

Ideally, we want to have some mapping from a special key to either of those timestamp keys. This would allow a user to query their logs with something like ": ..." regardless of what the authoritative timestamp key was in their original logs.

However, if the authoritative timestamp key was an object in the original logs, this becomes more difficult to do properly. In contrast, with the new parsed timestamp format in clp-s, we should be able to store the timezone as part of the format.

So tl;dr, we might want to rethink the way the clp-logging Python library writes timestamps to KV-IR streams. Sorry, I didn't realize the issues sooner.

* Creates an instance of LogSerializer.
* @param output_dir The destination directory for generated KV-IR.
* @param original_file_path The original path for the file being converted to KV-IR.
* @return An instance of LogSerializer on success, or std::nullopt on failure.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the current Serializer isn't designed to return error code with Result properly, but can we apply Result in this class instead as we will plan to integrate Result to the serializer? It will make this class naturally compatible after migration.

Copy link
Contributor

@Bill-hbrhbr Bill-hbrhbr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General comments:

  1. I think it's sensible to add [[nodiscard]] to any function that expects caller to check/use its return value
  2. Out-of-range fixes: log-converter should be added to the sources sections of task:package as well as its rsync cmds
  3. I feel like std::optional<T> should be replaced with ystdlib::error_handling::Result<T> and some bools should be replaced with Result<void>
  4. We could add integration tests for log-converter. shouldn't be too hard.

Due to time constraint, you can skip 3/4. Just writing them down now in case we forget in the future.

Comment on lines 40 to 50
if (m_cur_offset > 0) {
// NOLINTBEGIN(cppcoreguidelines-pro-bounds-pointer-arithmetic)
std::memmove(
m_buffer.data(),
m_buffer.data() + m_cur_offset,
m_bytes_occupied - m_cur_offset
);
// NOLINTEND(cppcoreguidelines-pro-bounds-pointer-arithmetic)
m_bytes_occupied -= m_cur_offset;
m_cur_offset = 0;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels like this could be its own helper function with a very self-explanatory title. I know this is compacting the buffer.

Comment on lines 52 to 61
if (m_buffer.size() == m_bytes_occupied) {
size_t const new_size{2 * m_buffer.size()};
if (new_size > cMaxBufferSize) {
return std::nullopt;
}
ystdlib::containers::Array<char> new_buffer(new_size);
std::memcpy(new_buffer.data(), m_buffer.data(), m_bytes_occupied);
m_buffer = std::move(new_buffer);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This too can be its own helper, although I question the likelihood of the buffer being full when we've just parsed a new log and wants to refill. Why don't we just set this buffer to its maximum size and never re-allocate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The case where this happens is when a single log event is larger than the buffer size. Growing in this way allows us to use less memory for datasets that don't have really long log messages, but adapt to datasets that do. That could potentially end up mattering if this ever needs to run in a resource-constrained environment.

But yeah, I can move it to a helper.

}
}

serializer.close();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be in serializer's destructor upon variable scope exit.

Bill-hbrhbr
Bill-hbrhbr previously approved these changes Oct 21, 2025
Copy link
Contributor

@Bill-hbrhbr Bill-hbrhbr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, approving because I won't be awake to re-approve

@Bill-hbrhbr
Copy link
Contributor

Also, please update

required_binaries = ["clg", "clo", "clp", "clp-s", "indexer", "reducer-server"]

Maybe do a grep throughout the repo to find the remaining places

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (5)
components/core/src/clp_s/log_converter/LogSerializer.hpp (3)

42-43: Add noexcept to move assignment for consistency.

The move constructor is marked noexcept (line 42), but the move assignment operator (line 43) is not. For consistency and to ensure optimal performance in STL containers, both should be noexcept.

Apply this diff:

 LogSerializer(LogSerializer&&) noexcept = default;
-[[nodiscard]] auto operator=(LogSerializer&&) -> LogSerializer& = default;
+[[nodiscard]] auto operator=(LogSerializer&&) noexcept -> LogSerializer& = default;

98-105: Skip zero-length writes in flush_buffer().

The method currently calls m_writer.write() even when the buffer is empty, resulting in an unnecessary system call.

Apply this diff:

     void flush_buffer() {
         auto const buffer{m_serializer.get_ir_buf_view()};
+        if (0U == buffer.size_bytes()) {
+            return;
+        }
         m_writer.write(
                 clp::size_checked_pointer_cast<char const>(buffer.data()),
                 buffer.size_bytes()
         );
         m_serializer.clear_ir_buf();
     }

73-77: Make close() idempotent to prevent double-close issues.

Calling close() multiple times will write the EOF marker repeatedly and attempt to close the writer multiple times, which could lead to errors. Add a guard flag to make the method idempotent.

Apply this diff to add a guard:

     void close() {
-        flush_buffer();
-        m_writer.write_numeric_value(clp::ffi::ir_stream::cProtocol::Eof);
-        m_writer.close();
+        if (false == m_closed) {
+            flush_buffer();
+            m_writer.write_numeric_value(clp::ffi::ir_stream::cProtocol::Eof);
+            m_writer.close();
+            m_closed = true;
+        }
     }

And add the member variable in the private section:

     clp::ffi::ir_stream::Serializer<clp::ir::eight_byte_encoded_variable_t> m_serializer;
     clp_s::FileWriter m_writer;
+    bool m_closed{false};
components/core/src/clp_s/log_converter/LogConverter.cpp (2)

56-58: Use semantically appropriate error code for I/O failures.

Returning std::errc::not_enough_memory for non-EOF read errors is misleading. The actual error could be a network failure, permission issue, disk error, or other I/O problem unrelated to memory.

Apply this diff:

     if (clp::ErrorCode_Success != rc) {
-        return std::errc::not_enough_memory;
+        return std::errc::io_error;
     }

128-132: Timestamp removal incorrectly assumes prefix position.

The code assumes the timestamp is at the beginning of the message by using substr(timestamp.length()). If the timestamp appears elsewhere in the message, this will incorrectly remove the wrong portion of the message.

Apply this diff to find and remove the timestamp correctly:

             if (nullptr != event.get_timestamp()) {
                 auto const timestamp{event.get_timestamp()->to_string_view()};
-                auto const message_without_timestamp{
-                        std::string_view{message}.substr(timestamp.length())
-                };
+                auto message_view{std::string_view{message}};
+                auto const pos{message_view.find(timestamp)};
+                std::string_view message_without_timestamp{};
+                if (std::string_view::npos != pos) {
+                    message_without_timestamp = message_view.substr(pos + timestamp.length());
+                } else {
+                    message_without_timestamp = message_view;
+                }
                 YSTDLIB_ERROR_HANDLING_TRYV(
                         serializer.add_message(timestamp, message_without_timestamp)
                 );
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3c784c3 and babad69.

📒 Files selected for processing (3)
  • components/core/src/clp_s/log_converter/LogConverter.cpp (1 hunks)
  • components/core/src/clp_s/log_converter/LogConverter.hpp (1 hunks)
  • components/core/src/clp_s/log_converter/LogSerializer.hpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp_s/log_converter/LogSerializer.hpp
  • components/core/src/clp_s/log_converter/LogConverter.cpp
  • components/core/src/clp_s/log_converter/LogConverter.hpp
🧬 Code graph analysis (3)
components/core/src/clp_s/log_converter/LogSerializer.hpp (1)
components/core/src/clp_s/FileWriter.hpp (2)
  • writer (42-49)
  • writer (42-42)
components/core/src/clp_s/log_converter/LogConverter.cpp (3)
components/core/src/clp_s/log_converter/LogConverter.hpp (2)
  • reader (56-57)
  • path (35-39)
components/core/src/clp_s/log_converter/LogSerializer.hpp (3)
  • output_dir (33-34)
  • message (67-68)
  • timestamp (57-58)
components/core/src/clp_s/log_converter/LogSerializer.cpp (2)
  • create (28-49)
  • create (28-29)
components/core/src/clp_s/log_converter/LogConverter.hpp (1)
components/core/src/clp_s/log_converter/LogSerializer.hpp (3)
  • nodiscard (39-39)
  • nodiscard (43-43)
  • output_dir (33-34)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)
  • GitHub Check: package-image
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: rust-checks (ubuntu-24.04)
  • GitHub Check: rust-checks (ubuntu-22.04)
  • GitHub Check: rust-checks (macos-15)
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: build-macos (macos-14, true)
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

♻️ Duplicate comments (4)
components/core/src/clp_s/log_converter/log_converter.cpp (2)

67-77: Create nested output dirs and validate path type.

Handle pre-existing paths and multi-level dirs; error if path exists but isn’t a dir.

-    std::error_code ec{};
-    if (false == std::filesystem::create_directory(command_line_arguments.get_output_dir(), ec)
-        && ec)
-    {
-        SPDLOG_ERROR(
-                "Can not create output directory {} - {}",
-                command_line_arguments.get_output_dir(),
-                ec.message()
-        );
-        return false;
-    }
+    std::error_code ec{};
+    auto const& outdir = command_line_arguments.get_output_dir();
+    if (false == std::filesystem::exists(outdir, ec)) {
+        ec.clear();
+        if (false == std::filesystem::create_directories(outdir, ec) && ec) {
+            SPDLOG_ERROR("Cannot create output directory {} - {}", outdir, ec.message());
+            return false;
+        }
+    } else if (false == std::filesystem::is_directory(outdir, ec)) {
+        SPDLOG_ERROR("Output path exists but is not a directory: {}", outdir);
+        return false;
+    }

95-98: Don’t assign to std::ignore; drop the dependency.

Discard the result with a void cast.

-                std::ignore = check_and_log_curl_error(path, reader);
+                (void)check_and_log_curl_error(path, reader);
components/core/src/clp_s/log_converter/LogConverter.cpp (2)

94-109: Map read failures to I/O, not memory.

Use a semantically correct errc for non-EOF failures.

-    m_bytes_occupied += num_bytes_read;
+    m_num_bytes_buffered += num_bytes_read;
     if (clp::ErrorCode_EndOfFile == rc) {
         return num_bytes_read;
     }
     if (clp::ErrorCode_Success != rc) {
-        return std::errc::not_enough_memory;
+        return std::errc::io_error;
     }

70-82: Timestamp removal assumes prefix; fix position + trim delimiters.

Find the timestamp within the message and trim leading separators from the remainder.

-            auto const& event{parser.get_log_parser().get_log_event_view()};
-            auto const message{event.to_string()};
-            if (nullptr != event.get_timestamp()) {
-                auto const timestamp{event.get_timestamp()->to_string_view()};
-                auto const message_without_timestamp{
-                        std::string_view{message}.substr(timestamp.length())
-                };
-                YSTDLIB_ERROR_HANDLING_TRYV(
-                        serializer.add_message(timestamp, message_without_timestamp)
-                );
-            } else {
-                YSTDLIB_ERROR_HANDLING_TRYV(serializer.add_message(message));
-            }
+            auto const& event{parser.get_log_parser().get_log_event_view()};
+            auto const message{event.to_string()};
+            if (nullptr != event.get_timestamp()) {
+                auto const timestamp{event.get_timestamp()->to_string_view()};
+                auto full{std::string_view{message}};
+                auto const pos{full.find(timestamp)};
+                std::string_view remainder{full};
+                if (std::string_view::npos != pos) {
+                    remainder = full.substr(pos + timestamp.length());
+                }
+                auto const is_delim = [](char c) {
+                    switch (c) {
+                        case ' ': case '\t': case '\r': case '\n': case '[': case '(': case ':':
+                            return true;
+                        default:
+                            return false;
+                    }
+                };
+                while (false == remainder.empty() && true == is_delim(remainder.front())) {
+                    remainder.remove_prefix(1);
+                }
+                YSTDLIB_ERROR_HANDLING_TRYV(serializer.add_message(timestamp, remainder));
+            } else {
+                YSTDLIB_ERROR_HANDLING_TRYV(serializer.add_message(message));
+            }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between babad69 and 1e1de58.

📒 Files selected for processing (3)
  • components/core/src/clp_s/log_converter/LogConverter.cpp (1 hunks)
  • components/core/src/clp_s/log_converter/LogConverter.hpp (1 hunks)
  • components/core/src/clp_s/log_converter/log_converter.cpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp_s/log_converter/log_converter.cpp
  • components/core/src/clp_s/log_converter/LogConverter.hpp
  • components/core/src/clp_s/log_converter/LogConverter.cpp
🧬 Code graph analysis (3)
components/core/src/clp_s/log_converter/log_converter.cpp (2)
components/core/src/clp_s/log_converter/CommandLineArguments.hpp (3)
  • CommandLineArguments (22-22)
  • CommandLineArguments (22-22)
  • argc (25-25)
components/core/src/clp_s/log_converter/LogConverter.hpp (3)
  • LogConverter (20-20)
  • path (35-36)
  • reader (53-54)
components/core/src/clp_s/log_converter/LogConverter.hpp (1)
components/core/src/clp_s/log_converter/LogSerializer.hpp (3)
  • nodiscard (39-39)
  • nodiscard (43-43)
  • output_dir (33-34)
components/core/src/clp_s/log_converter/LogConverter.cpp (4)
components/core/src/clp_s/log_converter/LogConverter.hpp (2)
  • path (35-36)
  • reader (53-54)
components/core/src/clp_s/FileWriter.hpp (1)
  • path (127-127)
components/core/src/clp_s/log_converter/LogSerializer.hpp (3)
  • output_dir (33-34)
  • message (67-68)
  • timestamp (57-58)
components/core/src/clp_s/log_converter/LogSerializer.cpp (2)
  • create (28-49)
  • create (28-29)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: package-image
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: build-macos (macos-14, true)
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: rust-checks (ubuntu-22.04)
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: rust-checks (macos-15)
  • GitHub Check: rust-checks (ubuntu-24.04)
  • GitHub Check: antlr-code-committed (macos-15)
🔇 Additional comments (1)
components/core/src/clp_s/log_converter/log_converter.cpp (1)

100-106: The review comment is incorrect.

The concern about undefined behaviour from calling nested_readers.back() on an empty stack does not exist. The implementation of try_deduce_reader_type establishes a contract: when it returns FileType::LogText, the readers vector is guaranteed to contain at least two elements (the original reader at line 369, plus the buffered reader added at line 380 before the return at line 385).

The calling code in log_converter.cpp only reaches the .back() call when file_type == LogText (the other cases return early at lines 90–99). Therefore, the precondition for .back() is always satisfied.

Likely an incorrect or invalid review comment.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (5)
components/core/src/clp_s/log_converter/LogConverter.cpp (5)

66-69: Log parser errors before returning (keeps field diagnostics actionable).

Add a minimal error log with the code and path, then return as you do now. Also add the spdlog include (see separate comment).

-            if (log_surgeon::ErrorCode::Success != err) {
-                return std::errc::no_message;
-            }
+            if (log_surgeon::ErrorCode::Success != err) {
+                SPDLOG_ERROR(
+                        "log-converter parse error {} while processing {}",
+                        static_cast<int>(err),
+                        path.path
+                );
+                return std::errc::no_message;
+            }

70-80: Do not assume the timestamp is at index 0; find and trim delimiters.

Current substr(timestamp.length()) corrupts messages when the timestamp isn’t a prefix. Find the timestamp within the message and trim leading separators before serializing.

-            if (nullptr != event.get_timestamp()) {
-                auto const timestamp{event.get_timestamp()->to_string_view()};
-                auto const message_without_timestamp{
-                        std::string_view{message}.substr(timestamp.length())
-                };
-                YSTDLIB_ERROR_HANDLING_TRYV(
-                        serializer.add_message(timestamp, message_without_timestamp)
-                );
-            } else {
+            if (nullptr != event.get_timestamp()) {
+                auto const timestamp{event.get_timestamp()->to_string_view()};
+                auto const full{std::string_view{message}};
+                auto const pos{full.find(timestamp)};
+                std::string_view remainder{full};
+                if (std::string_view::npos != pos) {
+                    remainder = full.substr(pos + timestamp.length());
+                }
+                // Trim leading delimiters used around timestamps.
+                auto const is_delim = [](char c) {
+                    switch (c) {
+                        case ' ': case '\t': case '\r': case '\n': case '[': case '(': case ':':
+                            return true;
+                        default:
+                            return false;
+                    }
+                };
+                while (false == remainder.empty()
+                       && true == is_delim(remainder.front())) {
+                    remainder.remove_prefix(1);
+                }
+                YSTDLIB_ERROR_HANDLING_TRYV(serializer.add_message(timestamp, remainder));
+            } else {
                 YSTDLIB_ERROR_HANDLING_TRYV(serializer.add_message(message));
             }

3-14: Add spdlog include if adopting error logs.

Required for SPDLOG_* macros used above.

 #include <string_view>
 #include <system_error>
 #include <utility>
 
+#include <spdlog/spdlog.h>

17-17: Remove unused include.

InputConfig.hpp isn’t referenced in this file.

-#include "../InputConfig.hpp"

106-108: Use an I/O error for non-EOF read failures.

Mapping read failures to not_enough_memory is misleading; use io_error.

-    if (clp::ErrorCode_Success != rc) {
-        return std::errc::not_enough_memory;
-    }
+    if (clp::ErrorCode_Success != rc) {
+        return std::errc::io_error;
+    }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1e1de58 and a3c899e.

📒 Files selected for processing (1)
  • components/core/src/clp_s/log_converter/LogConverter.cpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp_s/log_converter/LogConverter.cpp
🧠 Learnings (2)
📚 Learning: 2024-10-07T21:35:04.362Z
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:735-794
Timestamp: 2024-10-07T21:35:04.362Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir` method, encountering errors from `kv_log_event_result.error()` aside from `std::errc::no_message_available` and `std::errc::result_out_of_range` is anticipated behavior and does not require additional error handling or logging.

Applied to files:

  • components/core/src/clp_s/log_converter/LogConverter.cpp
📚 Learning: 2024-10-07T21:16:41.660Z
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:769-779
Timestamp: 2024-10-07T21:16:41.660Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, when handling errors in `parse_from_ir`, prefer to maintain the current mix of try-catch and if-statements because specific messages are returned back up in some cases.

Applied to files:

  • components/core/src/clp_s/log_converter/LogConverter.cpp
🧬 Code graph analysis (1)
components/core/src/clp_s/log_converter/LogConverter.cpp (3)
components/core/src/clp_s/log_converter/LogConverter.hpp (2)
  • path (35-36)
  • reader (53-54)
components/core/src/clp_s/log_converter/LogSerializer.hpp (3)
  • output_dir (33-34)
  • message (67-68)
  • timestamp (57-58)
components/core/src/clp_s/log_converter/LogSerializer.cpp (2)
  • create (28-49)
  • create (28-29)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: package-image
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: rust-checks (ubuntu-22.04)
  • GitHub Check: rust-checks (ubuntu-24.04)
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: build-macos (macos-14, true)
  • GitHub Check: build-macos (macos-15, true)
🔇 Additional comments (1)
components/core/src/clp_s/log_converter/LogConverter.cpp (1)

85-86: Handle close() errors (or switch to RAII).

Currently close()’s result is ignored; failures will be lost. At minimum, propagate it. Ideally, make LogSerializer’s destructor call close() idempotently and remove the explicit call to avoid leaks on early returns.

[ suggest_essential_refactor ]

-    serializer.close();
+    YSTDLIB_ERROR_HANDLING_TRYV(serializer.close());
     return ystdlib::error_handling::success();

@gibber9809 gibber9809 changed the title feat(log-converter): Add log-converter binary which converts unstructured text logs into KV-IR. feat(log-converter): Add log-converter binary which converts unstructured text logs into KV-IR; Update log-surgeon to yscope/log-surgeon@840f262. Oct 24, 2025
Copy link
Member

@LinZhihao-723 LinZhihao-723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed log_converter.

Copy link
Member

@LinZhihao-723 LinZhihao-723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed commandline args.

Comment on lines 30 to 39
/**
* Reads and returns a list of paths from a file containing newline-delimited paths.
* @param input_path_list_file_path Path to the file containing the list of paths.
* @param path_destination The vector that the paths are pushed into.
* @return Whether paths were read successfully or not.
*/
[[nodiscard]] auto read_paths_from_file(
std::string const& input_path_list_file_path,
std::vector<std::string>& path_destination
) -> bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should split the declaration and definition.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (3)
components/core/src/clp_s/log_converter/log_converter.cpp (2)

61-71: Create nested output dirs and validate path type.

Handle nested paths and “exists but not a directory”; switch to create_directories and add exists/is_directory checks. Also fix the log message grammar.

Apply this diff:

-    std::error_code ec{};
-    if (false == std::filesystem::create_directory(command_line_arguments.get_output_dir(), ec)
-        && ec)
-    {
-        SPDLOG_ERROR(
-                "Can not create output directory {} - {}",
-                command_line_arguments.get_output_dir(),
-                ec.message()
-        );
-        return false;
-    }
+    std::error_code ec{};
+    auto const& outdir = command_line_arguments.get_output_dir();
+    if (false == std::filesystem::exists(outdir, ec)) {
+        ec.clear();
+        if (false == std::filesystem::create_directories(outdir, ec) && ec) {
+            SPDLOG_ERROR("Cannot create output directory {} - {}", outdir, ec.message());
+            return false;
+        }
+    } else if (false == std::filesystem::is_directory(outdir, ec)) {
+        SPDLOG_ERROR("Output path exists but is not a directory: {}", outdir);
+        return false;
+    }

89-92: Don’t assign to std::ignore; discard via void cast.

Avoids relying on and is clearer when intentionally ignoring the result.

Apply this diff:

-                std::ignore = check_and_log_curl_error(path, reader.get());
+                (void)check_and_log_curl_error(path, reader.get());
components/core/src/clp_s/log_converter/CommandLineArguments.cpp (1)

59-69: Trim CRLF in files-from list.

Strip trailing ‘\r’ so Windows line endings don’t corrupt paths.

Apply this diff:

     std::string line;
     while (true) {
         error_code = reader.try_read_to_delimiter('\n', false, false, line);
         if (ErrorCodeSuccess != error_code) {
             break;
         }
-        if (false == line.empty()) {
-            path_destination.push_back(line);
-        }
+        if (false == line.empty()) {
+            if ('\r' == line.back()) {
+                line.pop_back();
+            }
+            if (false == line.empty()) {
+                path_destination.push_back(line);
+            }
+        }
     }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9d4b53c and 653e451.

📒 Files selected for processing (2)
  • components/core/src/clp_s/log_converter/CommandLineArguments.cpp (1 hunks)
  • components/core/src/clp_s/log_converter/log_converter.cpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp_s/log_converter/log_converter.cpp
  • components/core/src/clp_s/log_converter/CommandLineArguments.cpp
🧬 Code graph analysis (2)
components/core/src/clp_s/log_converter/log_converter.cpp (2)
components/core/src/clp_s/log_converter/CommandLineArguments.hpp (7)
  • CommandLineArguments (22-22)
  • CommandLineArguments (22-22)
  • nodiscard (27-27)
  • nodiscard (29-29)
  • nodiscard (31-33)
  • nodiscard (35-35)
  • argc (25-25)
components/core/src/clp_s/log_converter/LogConverter.hpp (3)
  • LogConverter (20-20)
  • path (35-36)
  • reader (53-54)
components/core/src/clp_s/log_converter/CommandLineArguments.cpp (1)
components/core/src/clp_s/log_converter/CommandLineArguments.hpp (5)
  • nodiscard (27-27)
  • nodiscard (29-29)
  • nodiscard (31-33)
  • nodiscard (35-35)
  • argc (25-25)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: package-image
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: build-macos (macos-14, true)
  • GitHub Check: rust-checks (ubuntu-24.04)
  • GitHub Check: rust-checks (ubuntu-22.04)
  • GitHub Check: rust-checks (macos-15)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (4)
components/core/src/clp_s/log_converter/LogSerializer.hpp (3)

42-43: Add noexcept to move assignment operator for consistency.

The move constructor is marked noexcept but the move assignment operator is not. For consistency and to ensure exception safety, both should have matching exception specifications. Since the move constructor is defaulted and noexcept, the move assignment should be too.

Apply this diff:

 LogSerializer(LogSerializer&&) noexcept = default;
-[[nodiscard]] auto operator=(LogSerializer&&) -> LogSerializer& = default;
+[[nodiscard]] auto operator=(LogSerializer&&) noexcept -> LogSerializer& = default;

77-81: Make close() idempotent to prevent double-close bugs.

Currently, calling close() multiple times will write multiple EOF markers and attempt to close the writer repeatedly, which can lead to corruption or runtime errors. Add a guard flag to ensure close operations happen only once.

Apply this diff:

 void close() {
-    flush_buffer();
-    m_writer.write_numeric_value(clp::ffi::ir_stream::cProtocol::Eof);
-    m_writer.close();
+    if (false == m_closed) {
+        flush_buffer();
+        m_writer.write_numeric_value(clp::ffi::ir_stream::cProtocol::Eof);
+        m_writer.close();
+        m_closed = true;
+    }
 }

And add the flag to private members:

 clp::ffi::ir_stream::Serializer<clp::ir::eight_byte_encoded_variable_t> m_serializer;
 clp_s::FileWriter m_writer;
+bool m_closed{false};

102-109: Skip zero-length writes to avoid unnecessary syscalls.

The current implementation always calls m_writer.write() even when the buffer is empty. Add an early return when there's no data to flush.

Apply this diff:

 void flush_buffer() {
     auto const buffer{m_serializer.get_ir_buf_view()};
+    if (0U == buffer.size_bytes()) {
+        return;
+    }
     m_writer.write(
             clp::size_checked_pointer_cast<char const>(buffer.data()),
             buffer.size_bytes()
     );
     m_serializer.clear_ir_buf();
 }
components/core/src/clp_s/log_converter/CommandLineArguments.cpp (1)

67-76: Handle CRLF line endings to prevent path corruption.

The existing past review comment remains unaddressed. When reading paths from files with Windows line endings (CRLF), the trailing \r character will corrupt the paths.

Apply this diff to trim trailing \r:

     std::string line;
     while (true) {
         error_code = reader.try_read_to_delimiter('\n', false, false, line);
         if (ErrorCodeSuccess != error_code) {
             break;
         }
         if (false == line.empty()) {
+            if ('\r' == line.back()) {
+                line.pop_back();
+            }
             path_destination.push_back(line);
         }
     }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a3e28c9 and 1d17501.

📒 Files selected for processing (2)
  • components/core/src/clp_s/log_converter/CommandLineArguments.cpp (1 hunks)
  • components/core/src/clp_s/log_converter/LogSerializer.hpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp_s/log_converter/LogSerializer.hpp
  • components/core/src/clp_s/log_converter/CommandLineArguments.cpp
🧬 Code graph analysis (2)
components/core/src/clp_s/log_converter/LogSerializer.hpp (1)
components/core/src/clp_s/FileWriter.hpp (2)
  • writer (42-49)
  • writer (42-42)
components/core/src/clp_s/log_converter/CommandLineArguments.cpp (1)
components/core/src/clp_s/log_converter/CommandLineArguments.hpp (5)
  • nodiscard (27-27)
  • nodiscard (29-29)
  • nodiscard (31-33)
  • nodiscard (35-35)
  • argc (25-25)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: package-image
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: rust-checks (macos-15)
  • GitHub Check: rust-checks (ubuntu-24.04)
  • GitHub Check: rust-checks (ubuntu-22.04)
🔇 Additional comments (1)
components/core/src/clp_s/log_converter/LogSerializer.hpp (1)

91-96: LGTM!

The private constructor correctly takes ownership of the serializer and writer via move semantics.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
components/core/src/clp_s/log_converter/log_converter.cpp (3)

61-71: Create nested output directories and validate existing path type.

Use exists/is_directory checks and create_directories; also fix message grammar.

-    std::error_code ec{};
-    if (false == std::filesystem::create_directory(command_line_arguments.get_output_dir(), ec)
-        && ec)
-    {
-        SPDLOG_ERROR(
-                "Can not create output directory {} - {}",
-                command_line_arguments.get_output_dir(),
-                ec.message()
-        );
-        return false;
-    }
+    std::error_code ec{};
+    auto const& outdir = command_line_arguments.get_output_dir();
+    if (false == std::filesystem::exists(outdir, ec)) {
+        ec.clear();
+        if (false == std::filesystem::create_directories(outdir, ec) && ec) {
+            SPDLOG_ERROR("Cannot create output directory {} - {}", outdir, ec.message());
+            return false;
+        }
+    } else if (false == std::filesystem::is_directory(outdir, ec)) {
+        SPDLOG_ERROR("Output path exists but is not a directory: {}", outdir);
+        return false;
+    }

89-92: Invalid use of std::ignore; this won’t compile. Consume or explicitly discard the result.

Assigning to std::ignore is ill-formed. Either cast to void or, better, use the boolean to avoid duplicate logs and satisfy [[nodiscard]].

Minimal fix:

-                std::ignore = check_and_log_curl_error(path, reader.get());
+                (void)check_and_log_curl_error(path, reader.get());

Recommended (use the result to prevent double-logging):

-                std::ignore = check_and_log_curl_error(path, reader.get());
-                SPDLOG_ERROR("Received input that was not unstructured logtext: {}.", path.path);
-                return false;
+                auto const had_curl_error = check_and_log_curl_error(path, reader.get());
+                if (false == had_curl_error) {
+                    SPDLOG_ERROR("Received input that was not unstructured logtext: {}.", path.path);
+                }
+                return false;

95-99: Guard nested_readers.back() to avoid UB when vector is empty.

Empty vector leads to undefined behaviour; also guard against null element.

-        auto const convert_result{log_converter.convert_file(
-                path,
-                nested_readers.back().get(),
-                command_line_arguments.get_output_dir()
-        )};
+        if (nested_readers.empty() || nullptr == nested_readers.back()) {
+            (void)check_and_log_curl_error(path, reader.get());
+            SPDLOG_ERROR("No reader available after type deduction for {}.", path.path);
+            return false;
+        }
+        auto const convert_result{log_converter.convert_file(
+                path,
+                nested_readers.back().get(),
+                command_line_arguments.get_output_dir()
+        )};
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1d17501 and dc06942.

📒 Files selected for processing (1)
  • components/core/src/clp_s/log_converter/log_converter.cpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp_s/log_converter/log_converter.cpp
🧬 Code graph analysis (1)
components/core/src/clp_s/log_converter/log_converter.cpp (2)
components/core/src/clp_s/log_converter/CommandLineArguments.hpp (7)
  • CommandLineArguments (22-22)
  • CommandLineArguments (22-22)
  • nodiscard (27-27)
  • nodiscard (29-29)
  • nodiscard (31-33)
  • nodiscard (35-35)
  • argc (25-25)
components/core/src/clp_s/log_converter/LogConverter.hpp (3)
  • LogConverter (20-20)
  • path (35-36)
  • reader (53-54)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: package-image
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: build-macos (macos-14, true)
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: rust-checks (ubuntu-24.04)
  • GitHub Check: rust-checks (ubuntu-22.04)
  • GitHub Check: rust-checks (macos-15)

Copy link
Member

@LinZhihao-723 LinZhihao-723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Pushed some nit fixes. Otherwise lgtm.
  • Tested by parsing and converting private users' Android log. Everything worked fine.
  • PR title lgtm.

@LinZhihao-723 LinZhihao-723 merged commit 929bd25 into y-scope:main Oct 25, 2025
31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants