feat(integration-tests): Add core binary tests for the `log-converter` and `clp-s` pipeline. #1591

Bill-hbrhbr · 2025-11-12T21:01:38Z

Description

This PR adds integration tests for #1460 by testing examples mentioned in #1308.

Checklist

The PR satisfies the contribution guidelines.
This is a breaking change and that has been indicated in the PR title, OR this isn't a breaking change.
Necessary docs have been updated, OR no docs need to be updated.

Validation performed

`uv run pytest -k test_log_converter_clp_s_identity_transform` passes.

Summary by CodeRabbit

Tests
- Added an identity-transformation integration test for the log-conversion pipeline.
- Added multiple unstructured log test datasets (simple, escaped-sequence, Hive 24hr, OpenStack 24hr, Hadoop multiline).
Chores
- Added test infrastructure support for the log-converter binary and a canonical-output filename constant.
- Improved test cleanup to remove prior outputs before re-running.

coderabbitai · 2025-11-12T21:01:48Z

Walkthrough

Adds multiple unstructured-log JSON fixtures, a new identity transformation integration test that invokes the log-converter and verifies CLP-S compress/decompress canonical output, and exposes a log_converter_binary_path property in test config.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| Unstructured log test fixtures: `integration-tests/tests/data/unstructured-logs/00-hello-world/converted.json`, `integration-tests/tests/data/unstructured-logs/01-escape-seq/converted.json`, `integration-tests/tests/data/unstructured-logs/02-hive-24hr/converted.json`, `integration-tests/tests/data/unstructured-logs/03-openstack-24hr/converted.json`, `integration-tests/tests/data/unstructured-logs/04-hadoop-multiline/converted.json` | Added JSON files containing expected converted unstructured log entries (single- and multi-line records) used as ground truth for identity transformation tests. |
| Identity transformation tests: `integration-tests/tests/test_identity_transformation.py` | Added constant `CLP_S_CANONICAL_OUTPUT_FILENAME`; introduced `test_log_converter_clp_s_identity_transform` to run the `log-converter` binary, clean previous outputs, produce KV-IR, run CLP-S compress/decompress, and compare canonical output to `converted.json`. Minor refactor to reuse the filename constant. |
| Test configuration utility: `integration-tests/tests/utils/config.py` | Added `log_converter_binary_path` property to `CoreConfig` returning the Path to the `log-converter` executable. |
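
For readers who want a concrete picture of the `log_converter_binary_path` property described in the table, here is a minimal sketch. It is not the PR's actual code: the class below is a hypothetical stand-in for `CoreConfig`, and the `clp_core_bins_path` field name and the `/opt/clp/bin` directory are assumptions made for illustration.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class CoreConfig:
    """Hypothetical stand-in for the test suite's core config class."""

    # Assumed field: the directory that holds the built CLP core binaries.
    clp_core_bins_path: Path

    @property
    def log_converter_binary_path(self) -> Path:
        """Return the path to the `log-converter` executable."""
        return self.clp_core_bins_path / "log-converter"


# Example directory only; a real run would point at the actual build output.
config = CoreConfig(clp_core_bins_path=Path("/opt/clp/bin"))
print(config.log_converter_binary_path)
```

Deriving the path from a single directory field keeps every binary lookup in one place, so a test only needs the config object rather than a hard-coded path.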

Sequence Diagram(s)

sequenceDiagram
    %% Styling: subtle coloured notes for key components
    participant Test as Test (pytest)
    participant FS as File System
    participant LogConv as log-converter
    participant CLPS as CLP-S pipeline
    participant Validator as Comparator

    Test->>FS: enumerate test case dirs
    Test->>FS: remove previous KV-IR outputs
    Test->>LogConv: invoke `log-converter` (via log_converter_binary_path)
    LogConv->>FS: write KV-IR output files
    Test->>CLPS: run compress then decompress (CLP-S)
    CLPS->>FS: produce canonical output file (`original`)
    Test->>FS: read `converted.json` (expected)
    Test->>Validator: compare canonical output vs expected
    Validator-->>Test: result (pass/fail)
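
The flow in the diagram can be sketched roughly as below. This is an illustrative outline under stated assumptions, not the test added by the PR: `run_step`, `outputs_match`, and `check_identity` are hypothetical helper names, and the command lines for `log-converter` and `clp-s` are passed in by the caller rather than guessed here.

```python
import filecmp
import subprocess
from pathlib import Path
from typing import Iterable, Sequence


def run_step(cmd: Sequence[str]) -> None:
    """Run one pipeline step; raise CalledProcessError on a non-zero exit."""
    subprocess.run(list(cmd), check=True)


def outputs_match(actual: Path, expected: Path) -> bool:
    """Compare two files by content (shallow=False skips the stat-only shortcut)."""
    return filecmp.cmp(actual, expected, shallow=False)


def check_identity(
    steps: Iterable[Sequence[str]], actual: Path, expected: Path
) -> bool:
    """Run every step in order, then compare the produced file to the expected one."""
    for cmd in steps:
        run_step(cmd)
    return outputs_match(actual, expected)
```

In a real run, `steps` would hold the `log-converter` invocation followed by the CLP-S compress and decompress commands, `actual` would be the canonical output file, and `expected` would be the test case's `converted.json`.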

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~20–30 minutes

New tests and multiple data fixtures increase review surface but follow consistent patterns.
Areas to pay extra attention:
- test_log_converter_clp_s_identity_transform — ensure subprocess invocation, cleanup, and paths are correct and portable.
- log_converter_binary_path in integration-tests/tests/utils/config.py — verify resolution logic and test environment expectations.
- The new converted.json fixtures for correct encoding/escaping of multiline and escaped sequences.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67%, which is below the required threshold of 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main changes: adding integration tests for the log-converter binary and clp-s pipeline, which is the core objective of the PR. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 89cfc6b and 54407a6.

⛔ Files ignored due to path filters (3)

integration-tests/tests/data/unstructured-logs/02-hive-24hr/raw.log is excluded by !**/*.log
integration-tests/tests/data/unstructured-logs/03-openstack-24hr/raw.log is excluded by !**/*.log
integration-tests/tests/data/unstructured-logs/04-hadoop-multiline/raw.log is excluded by !**/*.log

📒 Files selected for processing (3)

integration-tests/tests/data/unstructured-logs/02-hive-24hr/converted.json (1 hunks)
integration-tests/tests/data/unstructured-logs/03-openstack-24hr/converted.json (1 hunks)
integration-tests/tests/data/unstructured-logs/04-hadoop-multiline/converted.json (1 hunks)

🧰 Additional context used

🧠 Learnings (1)

📓 Common learnings

Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.

Learnt from: AVMatthews
Repo: y-scope/clp PR: 595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.
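
The cleanup-first pattern from the second learning can be sketched in Python as follows. This is a minimal illustration, not code from either PR; `reset_output_dir` is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def reset_output_dir(output_dir: Path) -> None:
    """Delete `output_dir` if it exists, then recreate it empty.

    Doing this at the start of a test keeps leftovers from an earlier
    run from influencing the result.
    """
    if output_dir.exists():
        shutil.rmtree(output_dir)
    output_dir.mkdir(parents=True)
```

Cleaning at the start rather than at the end also leaves the outputs of a failed run on disk for inspection.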

🪛 Biome (2.1.2)

integration-tests/tests/data/unstructured-logs/02-hive-24hr/converted.json

[error] 1-2: End of file expected
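
An "End of file expected" error at lines 1-2 is what a strict JSON parser reports when a second top-level value follows the first. That would be consistent with the fixture holding one JSON object per line, although the fixture contents are not shown in this thread, so treat that as an assumption. The sketch below uses made-up records (the `timestamp` and `message` field names are illustrative only) to show the difference between parsing such text as a single document and parsing it line by line.

```python
import json

# Hypothetical fixture text: one JSON object per line. The second record
# carries a multi-line message whose newline is escaped inside the string.
fixture_text = (
    '{"timestamp": 0, "message": "hello world"}\n'
    '{"timestamp": 1, "message": "line one\\nline two"}\n'
)

# Parsing the whole text as a single JSON document fails, because a second
# top-level value follows the first one.
try:
    json.loads(fixture_text)
    parsed_as_single_document = True
except json.JSONDecodeError:
    parsed_as_single_document = False

# Parsing line by line succeeds; the escaped newline is decoded into a real
# newline inside the message string.
records = [json.loads(line) for line in fixture_text.splitlines() if line.strip()]
```

If the fixtures are indeed line-delimited, the linter finding would be a format mismatch rather than a data defect.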