Skip to content

Conversation

@gibber9809
Copy link
Contributor

@gibber9809 gibber9809 commented May 26, 2025

Description

This PR splits the clp-s build into several smaller library builds in order to make it easier to selectively use certain components of clp-s in other projects. Care has been taken to ensure that each library doesn't force users to include unnecessary transitive dependencies.

The following table outlines each library in the clp-s build its purpose:

library purpose
clp_s::clp_dependencies Encapsulate all dependencies from clp until clp has been split into library builds.
clp_s::reducer_dependencies Encapsulate all dependencies from the reducer until the reducer has been split into library builds.
clp_s::IO IO classes and utilities
clp_s::ArchiveWriter ArchiveWriter and all other classes needed to write an archive
clp_s::ArchiveReader ArchiveReader and all other classes needed to read an archive
clp_s::JsonConstructor JsonConstructor -- implements various decompression flows for clp-s binary
clp_s::TimestampPattern TimestampPattern and other classes needed to parse timestamps
clp_s::search Archive search
clp_s::search::ast Search AST
clp_s::search::kql KQL parsing
clp_s::search::sql SQL parsing stub

Note that not all of these libraries are new and that the kql and sql libraries have been renamed so that they are properly namespaced.

Also note that JsonParser is included in the clp_s::ArchiveWriter library (unlike JsonConstructor for the ArchiveReader library) because including it does not introduce any extra dependencies (unlike JsonConstructor which links against the mongodb c++ library).

Since many of these components depend on the temporary clp_s::clp_dependencies and clp_s::reducer_dependencies libraries we have chosen not to replace uses of clp-s components within the codebase with linkage against these libraries.

Decoupling these libraries did require some code changes including:

  • Separating OutputHandler into OutputHandler.hpp and OutputHandlerImpl.{cpp,hpp} in order to eliminate transitive depencies from various output handlers in clp_s::search
  • Moving FileType enum from CommandLineArguments to InputConfig to remove transitive dependency on CommandLineArguments in several components.
  • Moving try_create_reader from ReaderUtils to InputConfig so that it can be part of clp_s::IO without introducing ArchiveReader specific dependencies
  • Removing many unnecessary include directives throughout the codebase

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Validated that clp-s build succeeds after refactor.
  • Validated that simple test binary that includes headers from clp_s::search can build properly when linking against only clp_s::search.

Summary by CodeRabbit

  • New Features

    • Introduced multiple output handler classes for versatile search result handling, including standard output, network, results cache, count, count-by-time, and vector output handlers.
    • Added a unified reader creation mechanism supporting both file and network sources with optional AWS S3 presigned URL authentication.
  • Refactor

    • Modularized the build system into smaller, well-defined libraries for better maintainability.
    • Simplified the output handler interface by relocating concrete implementations to a dedicated module.
    • Updated namespace usage and enum references across the codebase to ensure consistency.
  • Style

    • Reduced dependencies and removed unnecessary includes to streamline code and limit external library exposure.
  • Chores

    • Updated build and lint configurations to align with the refactored file structure and new source files.
  • Tests

    • Modified test utilities and cases to accommodate new enums and updated output handler implementations.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented May 26, 2025

Walkthrough

This change modularizes the codebase by moving concrete OutputHandler implementations from search/OutputHandler.hpp into a new OutputHandlerImpl.hpp under the clp_s namespace, updates the build system to use smaller, well-defined libraries, and refactors reader creation logic from ReaderUtils to InputConfig. Related includes, type usages, and namespace references are updated accordingly.

Changes

File(s) / Path(s) Change Summary
components/core/src/clp_s/search/OutputHandler.hpp, OutputHandlerImpl.hpp, OutputHandlerImpl.cpp Removed all concrete OutputHandler subclasses from search/OutputHandler.hpp and moved them to new OutputHandlerImpl.hpp/.cpp under clp_s namespace. Updated includes, namespaces, and constructor initializations.
components/core/src/clp_s/search/OutputHandler.hpp Simplified to only declare the abstract OutputHandler interface with two boolean flags and no concrete subclasses.
components/core/src/clp_s/search/Output.cpp, components/core/src/clp_s/search/Output.hpp Added/removes includes for logging and simdjson; no logic changes.
components/core/src/clp_s/CommandLineArguments.hpp, InputConfig.hpp, InputConfig.cpp, kv_ir_search.cpp, ... Moved FileType enum from CommandLineArguments to InputConfig, updated all references and function signatures to use the new location. Added new try_create_reader logic to InputConfig.
components/core/src/clp_s/ReaderUtils.cpp, ReaderUtils.hpp, ArchiveReaderAdaptor.cpp, JsonParser.cpp, ... Removed reader creation logic from ReaderUtils, updated all code to use new try_create_reader function from InputConfig. Adjusted includes accordingly.
components/core/src/clp_s/JsonConstructor.cpp, JsonConstructor.hpp, ColumnWriter.hpp, VariableEncoder.hpp, ... Cleaned up includes, removed unnecessary dependencies (e.g., simdjson, spdlog) from headers.
components/core/src/clp_s/CMakeLists.txt, components/core/CMakeLists.txt, ... Refactored build system: split monolithic libraries into smaller, modular targets (e.g., clp_s::archive_reader, clp_s::search), updated dependencies and linkage.
components/core/src/clp_s/search/CMakeLists.txt, kql/CMakeLists.txt, sql/CMakeLists.txt, ast/CMakeLists.txt Added or updated CMake build files to reflect new modular structure, renamed targets for clarity, added aliases.
components/core/tests/clp_s_test_utils.cpp, clp_s_test_utils.hpp, test-clp_s-search.cpp, ... Updated includes and type references for FileType and output handlers to match new locations and namespaces.
components/core/src/clp_s/clp-s.cpp Updated includes and namespace qualification for output handlers and FileType checks.
components/core/.clang-format, taskfiles/lint.yaml Updated regex and lint sources to track new file locations for output handlers and libraries.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant MainApp as clp-s
    participant OutputHandlerImpl
    participant OutputHandler (Abstract)
    participant InputConfig
    participant ReaderInterface

    User->>MainApp: Run search/compress command
    MainApp->>InputConfig: try_create_reader(path, network_auth)
    InputConfig->>ReaderInterface: (Creates appropriate reader)
    InputConfig-->>MainApp: shared_ptr<ReaderInterface>
    MainApp->>OutputHandlerImpl: Instantiate concrete OutputHandler (e.g., StandardOutputHandler)
    OutputHandlerImpl->>OutputHandler: (Uses abstract interface)
    MainApp->>OutputHandlerImpl: write()/flush()/finish()
Loading

Possibly related PRs

  • y-scope/clp#668: Refactors and modularizes OutputHandler implementations, moving them out of search/OutputHandler.hpp—directly related to this PR's restructuring of output handlers.
  • y-scope/clp#639: Introduces InputConfig and refactors reader creation logic, matching this PR's changes to reader instantiation and input handling.
  • y-scope/clp#790: Modularizes clp_s libraries including clp_s::search and clp_s::TimestampPattern, related to this PR's build system refactoring.

Suggested reviewers

  • wraymo
  • LinZhihao-723

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dbf7231 and 416dbd2.

📒 Files selected for processing (1)
  • components/core/src/clp_s/search/ast/CMakeLists.txt (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (10)
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: build-macos (macos-13, false)
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: build-macos (macos-14, true)
  • GitHub Check: lint-check (ubuntu-latest)
  • GitHub Check: lint-check (macos-latest)
🔇 Additional comments (1)
components/core/src/clp_s/search/ast/CMakeLists.txt (1)

63-63: Link against the new timestamp_pattern alias
Updating the dependency to clp_s::timestamp_pattern aligns with the renamed modular library and preserves the public interface contract.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@gibber9809 gibber9809 marked this pull request as ready for review May 26, 2025 18:04
@gibber9809 gibber9809 requested review from a team and wraymo as code owners May 26, 2025 18:04
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🔭 Outside diff range comments (2)
components/core/src/clp_s/OutputHandlerImpl.cpp (2)

39-54: ⚠️ Potential issue

Handle partial sends in network write operation.

The send() call may not send all the data in one operation. You should loop until all bytes are sent to avoid data loss.

Apply this diff to handle partial sends properly:

 void NetworkOutputHandler::write(
         string_view message,
         epochtime_t timestamp,
         string_view archive_id,
         int64_t log_event_idx
 ) {
     static constexpr string_view cOrigFilePathPlaceholder{""};
     msgpack::type::tuple<epochtime_t, string, string, string, int64_t> const
             src(timestamp, message, cOrigFilePathPlaceholder, archive_id, log_event_idx);
     msgpack::sbuffer m;
     msgpack::pack(m, src);
 
-    if (-1 == send(m_socket_fd, m.data(), m.size(), 0)) {
-        throw OperationFailed(ErrorCode::ErrorCodeFailureNetwork, __FILE__, __LINE__);
+    size_t total_sent = 0;
+    while (total_sent < m.size()) {
+        ssize_t sent = send(m_socket_fd, m.data() + total_sent, m.size() - total_sent, 0);
+        if (-1 == sent) {
+            throw OperationFailed(ErrorCode::ErrorCodeFailureNetwork, __FILE__, __LINE__);
+        }
+        total_sent += sent;
     }
 }

71-73: 🛠️ Refactor suggestion

Preserve MongoDB exception details for better debugging.

The MongoDB exception messages contain valuable debugging information that is currently discarded. Consider logging the exception details before returning the error code.

Apply this pattern to all MongoDB exception handlers:

     } catch (mongocxx::exception const& e) {
+        SPDLOG_ERROR("MongoDB operation failed: {}", e.what());
         throw OperationFailed(ErrorCode::ErrorCodeBadParamDbUri, __FILENAME__, __LINE__);
     }

And for the flush method:

         } catch (mongocxx::exception const& e) {
+            SPDLOG_ERROR("MongoDB bulk write failed: {}", e.what());
             return ErrorCode::ErrorCodeFailureDbBulkWrite;
         }

Also applies to: 116-118, 126-128

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a17e654 and 7a4e2d2.

📒 Files selected for processing (38)
  • components/core/.clang-format (1 hunks)
  • components/core/CMakeLists.txt (2 hunks)
  • components/core/src/clp_s/ArchiveReaderAdaptor.cpp (1 hunks)
  • components/core/src/clp_s/ArchiveWriter.cpp (1 hunks)
  • components/core/src/clp_s/CMakeLists.txt (4 hunks)
  • components/core/src/clp_s/ColumnWriter.hpp (0 hunks)
  • components/core/src/clp_s/CommandLineArguments.hpp (0 hunks)
  • components/core/src/clp_s/DictionaryWriter.cpp (1 hunks)
  • components/core/src/clp_s/InputConfig.cpp (2 hunks)
  • components/core/src/clp_s/InputConfig.hpp (2 hunks)
  • components/core/src/clp_s/JsonConstructor.cpp (1 hunks)
  • components/core/src/clp_s/JsonConstructor.hpp (0 hunks)
  • components/core/src/clp_s/JsonParser.cpp (3 hunks)
  • components/core/src/clp_s/JsonParser.hpp (1 hunks)
  • components/core/src/clp_s/OutputHandlerImpl.cpp (4 hunks)
  • components/core/src/clp_s/OutputHandlerImpl.hpp (1 hunks)
  • components/core/src/clp_s/ReaderUtils.cpp (0 hunks)
  • components/core/src/clp_s/ReaderUtils.hpp (1 hunks)
  • components/core/src/clp_s/VariableDecoder.cpp (1 hunks)
  • components/core/src/clp_s/VariableEncoder.hpp (0 hunks)
  • components/core/src/clp_s/ZstdCompressor.cpp (1 hunks)
  • components/core/src/clp_s/ZstdCompressor.hpp (0 hunks)
  • components/core/src/clp_s/clp-s.cpp (4 hunks)
  • components/core/src/clp_s/indexer/indexer.cpp (0 hunks)
  • components/core/src/clp_s/kv_ir_search.cpp (1 hunks)
  • components/core/src/clp_s/search/CMakeLists.txt (1 hunks)
  • components/core/src/clp_s/search/Output.cpp (1 hunks)
  • components/core/src/clp_s/search/Output.hpp (0 hunks)
  • components/core/src/clp_s/search/OutputHandler.hpp (2 hunks)
  • components/core/src/clp_s/search/ast/CMakeLists.txt (1 hunks)
  • components/core/src/clp_s/search/kql/CMakeLists.txt (1 hunks)
  • components/core/src/clp_s/search/sql/CMakeLists.txt (1 hunks)
  • components/core/tests/clp_s_test_utils.cpp (2 hunks)
  • components/core/tests/clp_s_test_utils.hpp (2 hunks)
  • components/core/tests/test-clp_s-end_to_end.cpp (1 hunks)
  • components/core/tests/test-clp_s-range_index.cpp (1 hunks)
  • components/core/tests/test-clp_s-search.cpp (6 hunks)
  • taskfiles/lint.yaml (1 hunks)
💤 Files with no reviewable changes (8)
  • components/core/src/clp_s/indexer/indexer.cpp
  • components/core/src/clp_s/search/Output.hpp
  • components/core/src/clp_s/CommandLineArguments.hpp
  • components/core/src/clp_s/ColumnWriter.hpp
  • components/core/src/clp_s/VariableEncoder.hpp
  • components/core/src/clp_s/JsonConstructor.hpp
  • components/core/src/clp_s/ZstdCompressor.hpp
  • components/core/src/clp_s/ReaderUtils.cpp
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • components/core/src/clp_s/JsonConstructor.cpp
  • components/core/src/clp_s/ZstdCompressor.cpp
  • components/core/src/clp_s/VariableDecoder.cpp
  • components/core/src/clp_s/ArchiveWriter.cpp
  • components/core/src/clp_s/DictionaryWriter.cpp
  • components/core/src/clp_s/kv_ir_search.cpp
  • components/core/tests/clp_s_test_utils.hpp
  • components/core/src/clp_s/search/Output.cpp
  • components/core/tests/test-clp_s-end_to_end.cpp
  • components/core/tests/clp_s_test_utils.cpp
  • components/core/src/clp_s/JsonParser.cpp
  • components/core/tests/test-clp_s-range_index.cpp
  • components/core/src/clp_s/ReaderUtils.hpp
  • components/core/src/clp_s/clp-s.cpp
  • components/core/src/clp_s/ArchiveReaderAdaptor.cpp
  • components/core/src/clp_s/JsonParser.hpp
  • components/core/tests/test-clp_s-search.cpp
  • components/core/src/clp_s/OutputHandlerImpl.cpp
  • components/core/src/clp_s/InputConfig.cpp
  • components/core/src/clp_s/InputConfig.hpp
  • components/core/src/clp_s/search/OutputHandler.hpp
  • components/core/src/clp_s/OutputHandlerImpl.hpp
🧬 Code Graph Analysis (4)
components/core/src/clp_s/kv_ir_search.cpp (2)
components/core/src/clp_s/InputConfig.hpp (1)
  • try_create_reader (122-123)
components/core/src/clp_s/InputConfig.cpp (2)
  • try_create_reader (168-177)
  • try_create_reader (168-169)
components/core/src/clp_s/clp-s.cpp (1)
components/core/src/clp_s/kv_ir_search.cpp (1)
  • command_line_arguments (62-63)
components/core/src/clp_s/ArchiveReaderAdaptor.cpp (1)
components/core/src/clp_s/InputConfig.cpp (2)
  • try_create_reader (168-177)
  • try_create_reader (168-169)
components/core/src/clp_s/OutputHandlerImpl.cpp (1)
components/core/src/clp_s/OutputHandlerImpl.hpp (2)
  • NetworkOutputHandler (60-64)
  • NetworkOutputHandler (67-71)
🔇 Additional comments (52)
components/core/src/clp_s/JsonConstructor.cpp (1)

13-13: LGTM: Spdlog include addition is appropriate.

The addition of spdlog/spdlog.h include is necessary to support the existing SPDLOG_WARN usage on line 42. This change aligns with the broader modularization effort described in the PR.

components/core/src/clp_s/ArchiveWriter.cpp (1)

8-8: LGTM: Spdlog include addition is appropriate.

The addition of spdlog/spdlog.h include is necessary to support the existing SPDLOG_ERROR usage on lines 33 and 39. This change aligns with the broader modularization effort described in the PR.

components/core/src/clp_s/search/Output.cpp (1)

6-6: LGTM: Spdlog include addition is appropriate.

The addition of spdlog/spdlog.h include is necessary to support the existing SPDLOG_ERROR usage on lines 117 and 126 for error handling in output handler operations. This change aligns with the broader modularization effort described in the PR.

components/core/src/clp_s/ZstdCompressor.cpp (1)

4-4: LGTM: Spdlog include addition is appropriate.

The addition of spdlog/spdlog.h include is necessary to support the existing SPDLOG_ERROR usage on lines 13, 36, 79, and 108 for comprehensive error reporting in compression operations. This change aligns with the broader modularization effort described in the PR.

components/core/src/clp_s/search/ast/CMakeLists.txt (1)

66-66: LGTM! Proper CMake include directory configuration.

The addition of the public include directory correctly exposes the core headers needed for the AST library's public interface, supporting the modularization effort.

components/core/src/clp_s/DictionaryWriter.cpp (1)

5-5: LGTM! Necessary include for logging functionality.

The spdlog include is required for the SPDLOG_ERROR usage on line 18 and aligns with the consistent logging pattern being established across the codebase.

components/core/src/clp_s/VariableDecoder.cpp (1)

5-5: LGTM! Required include for error logging.

The spdlog include is necessary for the SPDLOG_ERROR usage in the variable count validation (lines 19-25) and maintains consistency with the logging approach across the codebase.

components/core/tests/clp_s_test_utils.cpp (2)

16-16: LGTM! Correct namespace refactoring.

The parameter type change from CommandLineArguments::FileType to clp_s::FileType properly reflects the enum's relocation to the clp_s namespace.


42-44: LGTM! Consistent enum usage after refactoring.

The enum value references have been correctly updated to use the new clp_s::FileType namespace, maintaining the same logic while simplifying the type qualification.

components/core/src/clp_s/kv_ir_search.cpp (1)

252-252: LGTM! Correct function call after refactoring.

The change from qualified ReaderUtils::try_create_reader to unqualified try_create_reader properly reflects the function's relocation to InputConfig.hpp (included on line 29). The function signature and behaviour remain unchanged.

components/core/tests/test-clp_s-range_index.cpp (1)

175-175: LGTM! Enum qualifier simplification is correct.

The changes correctly update the enum usage from clp_s::CommandLineArguments::FileType to clp_s::FileType, which aligns with the refactoring that moved the FileType enum from CommandLineArguments to the clp_s namespace in InputConfig.hpp. The required include is present on line 21.

Also applies to: 179-179

components/core/tests/test-clp_s-end_to_end.cpp (1)

110-110: LGTM! Consistent enum qualifier simplification.

The change correctly simplifies the enum qualifier from clp_s::CommandLineArguments::FileType::Json to clp_s::FileType::Json, maintaining consistency with the refactoring that moved the FileType enum to the clp_s namespace. The InputConfig.hpp include on line 12 ensures the enum is accessible.

components/core/src/clp_s/JsonParser.cpp (2)

31-31: LGTM! Proper include added for refactored functionality.

The addition of #include "InputConfig.hpp" is necessary to access the try_create_reader function that was moved from ReaderUtils to InputConfig as part of the modularization effort.


488-488: LGTM! Reader creation calls updated correctly.

The changes properly replace ReaderUtils::try_create_reader with the new try_create_reader function from InputConfig. The function signature and parameters remain the same, ensuring a seamless transition. The necessary include was added on line 31.

Also applies to: 951-951

components/core/.clang-format (1)

7-9: LGTM! Clang-format configuration updated for modularization.

The addition of "clp_s" to the library headers regex pattern is appropriate for the modularization effort. This ensures that headers from the newly introduced clp_s libraries (clp_s::IO, clp_s::ArchiveReader, clp_s::ArchiveWriter, etc.) are properly grouped during include sorting and formatting.

components/core/src/clp_s/JsonParser.hpp (1)

39-39: LGTM! Clean enum usage after refactoring.

The change properly uses the FileType enum from the InputConfig.hpp header, which aligns with the PR's objective to modularize the codebase by moving the enum from CommandLineArguments to InputConfig.

taskfiles/lint.yaml (1)

432-433: LGTM! Lint configuration updated to match file refactoring.

The exclusion list correctly reflects the file renaming from OutputHandler to OutputHandlerImpl files, ensuring the new implementation files are properly included in the linting process.

components/core/src/clp_s/clp-s.cpp (3)

25-25: LGTM! Necessary include for output handler implementations.

The include for OutputHandlerImpl.hpp is required to access the concrete output handler classes that were moved from the original OutputHandler.hpp as part of the modularization effort.


111-111: LGTM! Correct enum usage after refactoring.

The change to use clp_s::FileType::KeyValueIr is consistent with moving the FileType enum from CommandLineArguments to the clp_s namespace in InputConfig.hpp.


231-258: LGTM! Proper namespace prefixes for output handlers.

The explicit clp_s:: namespace prefixes are correct as the concrete output handler implementations were moved from the clp_s::search namespace to the clp_s namespace as part of the modularization effort. This ensures proper access to the output handler classes defined in OutputHandlerImpl.hpp.

components/core/tests/clp_s_test_utils.hpp (2)

5-5: LGTM! Updated include for FileType enum.

The include change from CommandLineArguments.hpp to InputConfig.hpp is correct as the FileType enum was moved to InputConfig as part of the modularization effort.


23-23: LGTM! Simplified FileType parameter type.

The parameter type change from clp_s::CommandLineArguments::FileType to clp_s::FileType correctly uses the simplified enum type after it was moved to the clp_s namespace in InputConfig.hpp.

components/core/src/clp_s/ArchiveReaderAdaptor.cpp (1)

263-263: LGTM! Correctly updated function call after refactoring.

The change from ReaderUtils::try_create_reader to the free function try_create_reader is consistent with the broader refactoring that moved this functionality from ReaderUtils to InputConfig. The function signature and usage remain unchanged.

components/core/src/clp_s/ReaderUtils.hpp (1)

4-5: LGTM! Header includes correctly updated after method removal.

The replacement of specific headers with standard library headers is appropriate since the try_create_reader method was moved out of this class. The remaining functionality still requires <map> and <memory> for the SchemaMap type alias and shared pointer usage.

components/core/src/clp_s/search/kql/CMakeLists.txt (1)

9-25: LGTM! Excellent target renaming and aliasing for modularization.

The change from kql to clp_s_search_kql with the alias clp_s::search::kql follows CMake best practices for:

  • Avoiding target name conflicts with more descriptive naming
  • Providing clean namespaced interfaces through aliases
  • Maintaining consistency across all target references

All build configuration and dependencies are correctly preserved.

components/core/src/clp_s/search/sql/CMakeLists.txt (1)

9-24: LGTM! Consistent target renaming following the same pattern as KQL.

The change from sql to clp_s_search_sql with the alias clp_s::search::sql maintains consistency with the KQL module changes and follows the same CMake best practices for modularization. All build configuration is correctly preserved and target references are consistently updated.

components/core/src/clp_s/InputConfig.hpp (3)

5-5: LGTM! Well-structured include additions.

The new includes are appropriate for the functionality being added - <memory> for std::shared_ptr and ReaderInterface.hpp for the return type of try_create_reader.

Also applies to: 11-11


19-25: LGTM! Well-designed enum class.

The FileType enum class follows best practices with:

  • Explicit underlying type (uint8_t)
  • Explicit value for the first member to ensure stability
  • Clear, descriptive member names

116-123: LGTM! Well-designed function signature.

The try_create_reader function declaration follows good practices with:

  • Appropriate const-correctness for parameters
  • Clear return type using std::shared_ptr
  • [[nodiscard]] attribute to prevent ignoring the return value
  • Descriptive parameter names
components/core/tests/test-clp_s-search.cpp (3)

19-19: LGTM! Include path correctly updated.

The include path change from "../src/clp_s/search/OutputHandler.hpp" to "../src/clp_s/OutputHandlerImpl.hpp" correctly reflects the file reorganization where concrete output handler implementations were moved to a separate header.


45-45: LGTM! Consistent namespace updates.

All references to VectorOutputHandler have been consistently updated from clp_s::search::VectorOutputHandler to clp_s::VectorOutputHandler, properly reflecting the namespace restructuring where concrete output handlers moved from the search namespace to the main clp_s namespace.

Also applies to: 60-60, 99-99, 121-121


163-163: LGTM! FileType enum reference correctly updated.

The change from clp_s::CommandLineArguments::FileType::Json to clp_s::FileType::Json correctly reflects the movement of the FileType enum from CommandLineArguments to InputConfig.

components/core/src/clp_s/search/CMakeLists.txt (3)

1-3: LGTM! Proper subdirectory structure.

The subdirectory additions for ast, kql, and sql are correctly organized to support the modular search library architecture.


5-29: LGTM! Well-organized source file list.

The source file organization is logical, grouping related functionality together and using appropriate relative paths. The inclusion of both implementation and header files ensures proper compilation dependencies.


31-47: LGTM! Modern CMake library configuration.

The library definition follows modern CMake best practices with:

  • Proper alias target (clp_s::search)
  • Appropriate PUBLIC/PRIVATE linkage distinctions
  • Correct dependency specification
  • C++20 standard requirement

The PUBLIC linkage for core dependencies and PRIVATE linkage for logging is well-designed.

components/core/CMakeLists.txt (2)

318-319: LGTM! Source files correctly updated.

The replacement of the old OutputHandler.cpp reference with the new OutputHandlerImpl.cpp and OutputHandlerImpl.hpp files correctly reflects the refactoring where concrete output handler implementations were moved to separate files.


689-690: LGTM! Library dependencies properly namespaced.

The update from kql and sql to clp_s::search::kql and clp_s::search::sql correctly uses the fully qualified library names from the new modular structure, ensuring proper dependency resolution.

components/core/src/clp_s/search/OutputHandler.hpp (1)

14-64: Well-designed abstract interface!

The refactoring to a pure abstract interface is clean and follows SOLID principles. The [[nodiscard]] attributes on error-returning methods and the trailing return type syntax are excellent modern C++ practices.

components/core/src/clp_s/OutputHandlerImpl.hpp (7)

1-22: LGTM! Well-organized header structure.

The includes are properly grouped by category (system, standard library, third-party, internal) and the header guard follows the standard naming convention.


24-44: LGTM! Simple and effective stdout handler.

The implementation correctly overrides both write methods and provides useful context in the output format.


46-87: LGTM! Proper RAII implementation for network handling.

The destructor correctly checks and closes the socket, and the exception class provides proper error handling infrastructure. The delegation pattern in the simpler write method avoids code duplication.


89-170: LGTM! Well-designed MongoDB cache handler.

The implementation properly handles batching, result limits, and priority-based result management. The QueryResult struct captures all necessary metadata, and the comparator correctly implements timestamp-based ordering for the priority queue.


172-200: LGTM! Purpose-built count aggregation handler.

The empty implementation of the metadata write method is appropriate for a count-only handler. The error code documentation for the finish method is helpful.


202-238: LGTM! Correct time-bucketing implementation.

The bucket calculation properly floors timestamps to bucket boundaries, and using std::map ensures chronological ordering of buckets.


240-286: LGTM! Efficient vector-based result collection.

The implementation efficiently uses emplace_back and properly handles both write method variants. Storing a reference to the output vector is appropriate for this use case.

components/core/src/clp_s/CMakeLists.txt (7)

1-2: LGTM! Proper subdirectory organization.

Adding the search subdirectory aligns with the modularization goals of the PR.


83-109: LGTM! Well-structured temporary library.

The library is properly documented as temporary and has correct dependency specifications with appropriate PUBLIC/PRIVATE visibility.


134-148: LGTM! Consistent temporary library pattern.

The reducer dependencies library follows the same pattern as clp_dependencies and has appropriate minimal dependencies.


170-185: LGTM! Well-defined IO library.

The library groups IO-related functionality with appropriate dependencies for file and network operations.


228-311: LGTM! Clear separation of archive reading and writing.

The ArchiveWriter and ArchiveReader libraries are well-structured with appropriate dependencies. The separation enables selective linking based on application needs.


321-336: LGTM! Smart separation of MongoDB-dependent functionality.

Isolating JsonConstructor with its MongoDB dependency from ArchiveReader allows other projects to use the reader functionality without unnecessary transitive dependencies.


362-406: LGTM! Executable properly uses the new modular structure.

The clp-s executable correctly links against all the new modular libraries, demonstrating the successful refactoring from monolithic to modular build structure.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7a4e2d2 and fe1fcc8.

📒 Files selected for processing (1)
  • components/core/tests/test-clp_s-search.cpp (6 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.

**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • components/core/tests/test-clp_s-search.cpp
⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: lint-check (ubuntu-latest)
🔇 Additional comments (3)
components/core/tests/test-clp_s-search.cpp (3)

56-58: Align validate_results signature with refactored namespace
The function signature now correctly references clp_s::VectorOutputHandler::QueryResult instead of the old clp_s::search namespace. This aligns with the new location of VectorOutputHandler.


152-152: Declare results vector using refactored type
The test now instantiates std::vector<clp_s::VectorOutputHandler::QueryResult> (instead of the search namespace variant), which matches the updated handler. Looks good.


216-216: Use relocated FileType enum
Swapping clp_s::CommandLineArguments::FileType::Json for clp_s::FileType::Json reflects the move of the enum into the core namespace. Ensure that the signature of compress_archive in clp_s_test_utils.hpp was updated accordingly.

Copy link
Contributor

@wraymo wraymo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great PR! I like the idea of splitting OutputHandler and moving try_create_reader to InputConfig

- Regex: "<(absl|antlr4|archive|boost|bsoncxx|catch2|curl|date|fmt|log_surgeon|lzma|mongocxx\
|msgpack|mysql|nlohmann|openssl|outcome|regex_utils|simdjson|spdlog|sqlite3|string_utils|yaml-cpp\
|ystdlib|zstd)"
- Regex: "<(absl|antlr4|archive|boost|bsoncxx|catch2|clp_s|curl|date|fmt|log_surgeon|lzma\
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if clp_s should be priority 3 or 4

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put priority 3 just because our string_utils library is also in this list.

spdlog::spdlog
ZStd::ZStd
)
target_include_directories(clp_s_IO PUBLIC ../)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this? It seems that we still use ../clp instead of clp to include headers. And if we want to apply it to all targets. Can we simply use include_directories?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are generally to make it so that users that include these libraries have the option to include them like #include <clp_s/...>. I didn't change how any of the existing includes work within clp though to minimize refactoring.

Probably makes sense to refactor the ir stream search code to use that include style for the AST components it pulls in in a follow-up PR though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think in the future we can use the relative path from components/core/src for the headers (to get rid of headers like ../, ../../)

${CLP_S_REDUCER_SOURCES}
)
add_library(clp_s::reducer_dependencies ALIAS clp_s_reducer_dependencies)
target_compile_features(clp_s_reducer_dependencies PRIVATE cxx_std_20)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use set(CMAKE_CXX_STANDARD 20) at the project level?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might make sense, but I'm sort of just following the existing convention throughout the project. Maybe @kirkrodrigues has some thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Afaik, target_compile_features is a requirement...

Specifies compiler features required when compiling a given target.

... whereas CMAKE_CXX_STANDARD (CXX_STANDARD's default value) is a request...

This property specifies the C++ standard whose features are requested to build this target. ... If the value requested does not result in a compile flag being added for the compiler in use, a previous standard flag will be added instead.

So in that sense, I think we should continue with target_compile_features since we do require C++20.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about combining with CXX_STANDARD_REQUIRED

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that could work then.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the conclusion of this discussion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think leave it for now.

Also I think a parent project might be able to override the CXX_STANDARD flag by setting it to a certain value as a cache variable, so might be better to set it at the target level in general.

Comment on lines 297 to 301
PUBLIC
absl::flat_hash_map
clp_s::IO
nlohmann_json::nlohmann_json
PRIVATE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you follow any rules that what should be public and what should be private?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My general approach was:

  • start with everything as PRIVATE and move dependencies into PUBLIC until other components that depended on the library could build
  • put any libraries that are definitely needed to properly use some library into PUBLIC

A lot of dependencies had to be made PUBLIC for each library just because there were a small number of header files from that library that directly included headers from one of the dependencies which made it so that users of the library also needed to link against that dependency in order to be able to find the headers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think there are any risks in making most of the libraries PUBLIC? I’ve noticed that in some cases, we include CLP-S files in another project, and those files in turn include headers from these libraries. If the libraries are not marked as PUBLIC, we might need to explicitly link them in the other project.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No real risks -- we'd be linking against them anyway. This just lets other projects that depend on us build correctly without having to explicitly link against every library that is transitively included by some header of ours.

- "{{.G_CORE_COMPONENT_DIR}}/src/clp_s/JsonParser.cpp"
- "{{.G_CORE_COMPONENT_DIR}}/src/clp_s/JsonParser.hpp"
- "{{.G_CORE_COMPONENT_DIR}}/src/clp_s/JsonSerializer.hpp"
- "{{.G_CORE_COMPONENT_DIR}}/src/clp_s/search/OutputHandlerImpl.cpp"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems OutputHandlerImpl.cpp is in {{.G_CORE_COMPONENT_DIR}}/src/clp_s/

- "{{.G_CORE_COMPONENT_DIR}}/src/clp_s/search/Output.cpp"
- "{{.G_CORE_COMPONENT_DIR}}/src/clp_s/search/Output.hpp"
- "{{.G_CORE_COMPONENT_DIR}}/src/clp_s/search/OutputHandler.cpp"
- "{{.G_CORE_COMPONENT_DIR}}/src/clp_s/search/OutputHandler.hpp"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still have hpp file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I ended up just trying to fix the clang-tidy warnings for this file since it is pretty small anyway after these changes.

* @param network_auth
* @return the opened clp::ReaderInterface or nullptr on error
*/
[[nodiscard]] auto try_create_reader(Path const& path, NetworkAuthOption const& network_auth)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may find a better place for this method, but it's fine to put it here for now.

@gibber9809 gibber9809 requested a review from wraymo May 28, 2025 15:00
wraymo
wraymo previously approved these changes May 30, 2025
PRIVATE
spdlog::spdlog
)
target_include_directories(clp_s_search PUBLIC ../../)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Can we reorder this before target_link_libraries for the sake of alphabetization? Ditto in other places. Unless we're purposely doing this for relative includes for some reason?

${CLP_S_REDUCER_SOURCES}
)
add_library(clp_s::reducer_dependencies ALIAS clp_s_reducer_dependencies)
target_compile_features(clp_s_reducer_dependencies PRIVATE cxx_std_20)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the conclusion of this discussion?

)

add_library(
clp_s_IO
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about clp_s_io? In the broader context, I think we should establish a naming convention for CMake libraries. We've been using snake_case in ystdlib as well as the sql & kql libraries, so how about carrying that over here? So ArchiveWriter would become archive_writer, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we roughly have a convention that subdirectory structure should be nested namespace structure should be library name that we don't completely follow right now.

I intentionally didn't mess with the directory structure or namespaces in this PR to avoid extra refactoring, but I can change the names to lowercase to fit the convention a bit better.

target_include_directories(clp_s_TimestampPattern PUBLIC ../)

set(
CLP_S_SOURCES
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about CLP_S_EXE_SOURCES?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🔭 Outside diff range comments (1)
components/core/src/clp_s/CMakeLists.txt (1)

347-354: 🧹 Nitpick (assertive)

Inconsistent alias naming for TimestampPattern
The alias clp_s::TimestampPattern uses PascalCase, whereas other libraries use lowercase names (e.g., clp_s::archive_reader). Consider renaming to clp_s::timestamp_pattern for consistency.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f7b6514 and 84f08dd.

📒 Files selected for processing (2)
  • components/core/src/clp_s/CMakeLists.txt (5 hunks)
  • components/core/src/clp_s/search/CMakeLists.txt (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: build-macos (macos-13, true)
  • GitHub Check: build-macos (macos-13, false)
  • GitHub Check: lint-check (ubuntu-latest)
🔇 Additional comments (27)
components/core/src/clp_s/search/CMakeLists.txt (5)

1-3: Subdirectory inclusion is correct
The add_subdirectory(ast), add_subdirectory(kql), and add_subdirectory(sql) calls are in the proper order and necessary for building the respective search submodules.


5-29: Explicit source listing is good
Listing each file in CLP_S_SEARCH_SOURCES avoids unpredictable globbing and keeps the build deterministic.


31-34: Library definition follows conventions
Defining clp_s_search and aliasing it to clp_s::search aligns with the project’s modular structure.


35-37: Compile features and include paths
Requiring C++20 and exposing ../../ for headers is appropriate for this target’s public interface.


38-47: Link dependencies scoped correctly
The PUBLIC/PRIVATE separation ensures that only necessary transitive dependencies are exposed.

components/core/src/clp_s/CMakeLists.txt (22)

2-2: search subdirectory ordering is fine
Including search after indexer maintains logical grouping and alphabetical order.


83-91: Core-dependencies library setup is consistent
The clp_s_clp_dependencies target and its alias are declared correctly, and C++20 is enforced.


95-109: clp_s_clp_dependencies link libraries look appropriate
PUBLIC vs PRIVATE scopes appear correctly assigned for transitive header usage.


111-120: Reducer-dependencies source list is clear
Explicit listing maintains build determinism.


136-144: Reducer library declaration is correct
clp_s_reducer_dependencies and its alias follow naming conventions.


142-148: Reducer link configuration is valid
Dependencies are scoped properly, only exposing JSON as PUBLIC.


150-158: I/O sources defined explicitly
Good to avoid globbing for I/O module.


170-176: I/O library target is well-formed
Alias clp_s::io and C++20 feature requirement are consistent with other modules.


175-185: I/O link dependencies look correct
BOOST and ZStd are private, core dependencies are linked properly.


187-195: Archive-writer source set is explicit
Listing all sources avoids non-determinism in the archive writer module.


228-236: Archive-writer library declaration is good
Alias clp_s::archive_writer and C++20 enforcement match project conventions.


233-250: Archive-writer link libraries scoped appropriately
Public and private dependencies are correctly separated to avoid polluting consumer link lines.


252-260: Archive-reader sources listed clearly
Deterministic source listing is maintained.


289-297: Archive-reader library setup is correct
Alias and C++20 feature usage align with other modules.


313-319: JSON-constructor source set is clear
Explicit listing avoids globbing.


321-328: JSON-constructor library declaration is correct
Alias clp_s::json_constructor and C++20 enforcement are consistent.


326-336: JSON-constructor link libraries defined properly
Dependencies scoped PUBLIC/PRIVATE appear accurate.


338-345: Timestamp-pattern sources listed explicitly
Good for deterministic builds.


354-360: Timestamp-pattern link configuration is fine
Dependencies are scoped privately as expected.


362-367: Executable source list is explicit
Good practice for reproducible builds.


374-381: Executable target and compile features
clp-s is defined correctly with C++20 enabled.


402-406: Runtime output directory is set appropriately
Redirecting the executable to PROJECT_BINARY_DIR ensures binaries land in the expected location.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🔭 Outside diff range comments (1)
components/core/src/clp_s/CMakeLists.txt (1)

346-352: 🧹 Nitpick (assertive)

TimestampPattern alias naming inconsistency
The clp_s_TimestampPattern target and alias clp_s::TimestampPattern use PascalCase, whereas others use snake_case (e.g., clp_s::io). Consider standardizing on one style for all aliases to improve consistency.

♻️ Duplicate comments (2)
components/core/src/clp_s/CMakeLists.txt (2)

5-5: 🧹 Nitpick (assertive)

Clarify source-set variable name
Consider renaming CLP_S_CLP_SOURCES to CLP_S_CLP_DEPENDENCIES_SOURCES to more clearly reflect that these sources back the clp_s_clp_dependencies library.


385-395: 🧹 Nitpick (assertive)

Remove redundant search subcomponent linking
You link both clp_s::search and its internal parts (clp_s::search::ast, clp_s::search::kql). Since clp_s::search should already PUBLICly depend on its sub-libraries, you can remove the explicit clp_s::search::ast and clp_s::search::kql entries.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 84f08dd and 4346bd7.

📒 Files selected for processing (1)
  • components/core/src/clp_s/CMakeLists.txt (5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: lint-check (ubuntu-latest)
🔇 Additional comments (20)
components/core/src/clp_s/CMakeLists.txt (20)

2-2: Modular build: Include search subdirectory
The new add_subdirectory(search) call correctly integrates the standalone search library. Ensure the search/CMakeLists.txt defines its targets and export aliases to maintain encapsulation.


111-113: Library sources grouping is clear
The set(CLP_S_REDUCER_SOURCES …) grouping cleanly organizes reducer sources for clp_s_reducer_dependencies.


136-140: Reducer dependencies library creation
The new clp_s_reducer_dependencies target and alias clp_s::reducer_dependencies correctly encapsulate reducer logic.


141-148: Reducer dependencies linkage
Linking nlohmann_json PUBLIC and clp_s::clp_dependencies PRIVATE aligns with the intended dependency flow.


151-158: I/O library sources grouping
Organizing CLP_S_IO_SOURCES under a dedicated clp_s_io target improves separation of concerns for IO utilities.


170-174: I/O library alias and features
The clp_s_io library with alias clp_s::io and C++20 feature requirement is well-structured.


187-192: Archive writer sources grouping
Defining CLP_S_ARCHIVE_WRITER_SOURCES clearly lists all writer-specific files.


228-233: Archive writer library alias
The clp_s_archive_writer target alias clp_s::archive_writer follows the pattern well.


234-238: Archive writer compile and include
C++20 requirement and include directory setup are correct for the writer.


239-243: Archive writer linkage
PUBLICly linking absl::flat_hash_map and clp_s::io while keeping other dependencies PRIVATE reflects proper encapsulation.


253-260: Archive reader sources grouping
The CLP_S_ARCHIVE_READER_SOURCES grouping sets up reader-specific files cleanly.


289-294: Archive reader library alias and features
clp_s_archive_reader with alias clp_s::archive_reader and C++20 requirement is consistent with other libraries.


295-302: Archive reader linkage
PUBLICly linking absl::flat_hash_map, clp_s::io, and nlohmann_json then privately linking other dependencies aligns with the intended API boundary.


313-319: JSON constructor sources grouping
Listing CLP_S_JSON_CONSTRUCTOR_SOURCES for the JSON constructor module is well organized.


320-326: JSON constructor library alias and features
The clp_s_json_constructor target with alias clp_s::json_constructor and C++20 requirement is well-defined.


327-333: JSON constructor linkage
PUBLICly depending on clp_s::archive_reader and privately on fmt, MongoCXX, and spdlog ensures minimal transitive dependencies.


338-344: TimestampPattern sources grouping
The CLP_S_TIMESTAMP_PATTERN_SOURCES set correctly lists all timestamp-pattern files.


361-370: Executable sources grouping
Defining CLP_S_EXE_SOURCES for downstream components promotes clarity in the clp-s target.


373-376: Executable target setup
The clp-s executable now cleanly includes the CLP_S_EXE_SOURCES set.


379-382: Executable include directories
Using the CLP_OUTCOME_INCLUDE_DIRECTORY is appropriate for outcome integration.

PRIVATE
simdjson::simdjson
)
target_include_directories(clp_s_search_ast PUBLIC ../../../)
Copy link
Member

@kirkrodrigues kirkrodrigues May 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Move above target_link_libraries.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4346bd7 and dbf7231.

📒 Files selected for processing (3)
  • components/core/CMakeLists.txt (2 hunks)
  • components/core/src/clp_s/CMakeLists.txt (4 hunks)
  • components/core/src/clp_s/search/ast/CMakeLists.txt (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: lint-check (ubuntu-latest)
🔇 Additional comments (13)
components/core/src/clp_s/search/ast/CMakeLists.txt (1)

62-64: Correct linkage of timestamp_pattern dependency
The clp_s_search_ast target now properly declares clp_s::timestamp_pattern as a PUBLIC dependency, matching the AST’s need for timestamp pattern parsing.

components/core/CMakeLists.txt (2)

318-319: Add new OutputHandlerImpl files to unit test sources
The new OutputHandlerImpl.cpp and OutputHandlerImpl.hpp are correctly included in the clp_s unit tests, ensuring the refactored output handler implementations are covered.


688-691: Link unit tests against new modular search and timestamp libraries
The unitTest target now links against clp_s::search::ast, clp_s::search::kql, clp_s::search::sql, and clp_s::timestamp_pattern, reflecting the intended modular build structure.

components/core/src/clp_s/CMakeLists.txt (10)

2-2: Register new search subdirectory
Adding add_subdirectory(search) integrates the consolidated search module (AST, KQL, SQL) into the clp_s build and aligns with the new modular namespace.


83-90: Verify inclusion of outcome headers in clp_s::clp_dependencies
target_include_directories(clp_s_clp_dependencies PUBLIC "${CLP_OUTCOME_INCLUDE_DIRECTORY}") exposes outcome headers externally. Ensure ${CLP_OUTCOME_INCLUDE_DIRECTORY} is defined and points to the correct location so downstream consumers can resolve these includes.


111-118: Modularize reducer dependencies
The clp_s::reducer_dependencies library cleanly encapsulates reducer sources, publicly linking only against nlohmann_json and privately on clp_s::clp_dependencies. This aligns with the goal of splitting the monolith into focused libraries.

Also applies to: 136-142


228-235: Review public vs private dependencies of archive_writer
clp_s::archive_writer publicly links absl::flat_hash_map and clp_s::io, while Boost::url and msgpack-cxx remain private. Verify that none of the public headers expose types from the private dependencies—if they do, those libraries should be promoted to PUBLIC.


289-296: Review public vs private dependencies of archive_reader
Similarly, confirm that ArchiveReader.hpp and related public headers don’t expose types from Boost::url or msgpack-cxx. If they do, adjust those linkages to PUBLIC for correct transitive propagation.


320-327: Validate json_constructor dependencies
The clp_s::json_constructor library publicly uses clp_s::archive_reader and privately fmt, mongocxx, and spdlog. Ensure JsonConstructor.hpp does not expose any mongocxx or formatting types; if it does, those should be PUBLIC.


347-353: Review timestamp_pattern API dependencies
If TimestampPattern.hpp publicly uses types from date::date (e.g., year_month_day), linking date::date as PRIVATE may be insufficient. Promote it to PUBLIC if its types appear in the public API.


362-369: Include new output handler implementations in executable
The OutputHandlerImpl.cpp and .hpp files are correctly added to the clp-s executable’s sources, replacing the old search variants and supporting the refactored namespace layout.


375-382: Ensure CLP_OUTCOME_INCLUDE_DIRECTORY is set
clp-s privately includes ${CLP_OUTCOME_INCLUDE_DIRECTORY}. Verify that this variable is always defined when building clp-s, otherwise the include path may be missing.


386-395:

❌ Incorrect review comment

Streamline clp-s linkage to clp_s::search
Since clp_s::search already PUBLICly depends on its sublibraries (ast, kql, and sql), you can remove the explicit clp_s::search::ast and clp_s::search::kql entries. Also verify whether you need to explicitly link clp_s::search::sql or rely on its transitive linkage.


🏁 Script executed:

#!/usr/bin/env bash
# Locate definitions of clp_s::search and its sub-libraries in CMakeLists
rg -n 'add_library.*search::ast' -C2
rg -n 'add_library.*search::kql' -C2
rg -n 'add_library.*search::sql' -C2
rg -n 'add_library.*clp_s::search' -C2
rg -n 'target_link_libraries.*clp_s::search' -C2

Length of output: 3516


🏁 Script executed:

#!/usr/bin/env bash
# Show all target_link_libraries calls in the clp_s::search CMakeLists
rg -n 'target_link_libraries' components/core/src/clp_s/search/CMakeLists.txt -C3

Length of output: 347


Incorrect assumption about transitive linkage in clp_s::search
The components/core/src/clp_s/search/CMakeLists.txt shows that clp_s::search only links against absl::flat_hash_map and does not publicly link its ast, kql or sql sub-libraries. Removing the explicit

clp_s::search::ast
clp_s::search::kql

entries from the parent target_link_libraries would break downstream consumers. You have two options:

  • Keep the explicit clp_s::search::ast/kql links where they’re currently used.
  • Or, modify clp_s::search’s own CMakeLists (around line 38) to:
    target_link_libraries(
        clp_s_search
        PUBLIC
            absl::flat_hash_map
    +       clp_s::search::ast
    +       clp_s::search::kql
    +       clp_s::search::sql
    )
    This would make those sub-libraries transitively available through clp_s::search.

Likely an incorrect or invalid review comment.

@gibber9809 gibber9809 requested a review from kirkrodrigues May 30, 2025 20:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants