feat(integration-tests): Add core binary tests for the log-converter and clp-s pipeline.
#1591
base: main
Walkthrough
Adds multiple unstructured-log JSON fixtures, a new identity-transformation integration test that invokes the `log-converter` and verifies the CLP-S compress/decompress canonical output, and exposes a …
Changes
Sequence Diagram(s)
sequenceDiagram
%% Styling: subtle coloured notes for key components
participant Test as Test (pytest)
participant FS as File System
participant LogConv as log-converter
participant CLPS as CLP-S pipeline
participant Validator as Comparator
Test->>FS: enumerate test case dirs
Test->>FS: remove previous KV-IR outputs
Test->>LogConv: invoke `log-converter` (via log_converter_binary_path)
LogConv->>FS: write KV-IR output files
Test->>CLPS: run compress then decompress (CLP-S)
CLPS->>FS: produce canonical output file (`original`)
Test->>FS: read `converted.json` (expected)
Test->>Validator: compare canonical output vs expected
Validator-->>Test: result (pass/fail)
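As a rough illustration of the flow in the diagram above, the loop below runs each test case through the same steps. It is a sketch, not the PR's code: `run_identity_cases` and the `convert`, `round_trip` and `compare` callables are hypothetical stand-ins for the real `log-converter` invocation, the `clp-s` compress/decompress step and the JSON comparison. They are injected so the control flow can run without the binaries. Only the `converted.json` filename comes from the PR.

```python
import shutil
from pathlib import Path
from typing import Callable

# Hypothetical step signatures; in the real test these would wrap the binaries.
Convert = Callable[[Path, Path], None]  # (test case dir, KV-IR output dir)
RoundTrip = Callable[[Path], Path]      # (KV-IR output dir) -> decompressed output file
Compare = Callable[[Path, Path], bool]  # (actual file, expected file) -> match?


def run_identity_cases(
    cases_root: Path,
    out_root: Path,
    convert: Convert,
    round_trip: RoundTrip,
    compare: Compare,
) -> dict[str, bool]:
    """Run every test case directory under `cases_root`; return {case name: passed}."""
    results: dict[str, bool] = {}
    # Only directories are test cases; sort them because iterdir() order is arbitrary.
    for case_dir in sorted(p for p in cases_root.iterdir() if p.is_dir()):
        kv_ir_out = out_root / case_dir.name
        shutil.rmtree(kv_ir_out, ignore_errors=True)  # drop leftovers from earlier runs
        kv_ir_out.mkdir(parents=True)
        convert(case_dir, kv_ir_out)    # stand-in for the log-converter step
        actual = round_trip(kv_ir_out)  # stand-in for clp-s compress + decompress
        results[case_dir.name] = compare(actual, case_dir / "converted.json")
    return results
```

In the real test, `round_trip` would return something like `test_paths.decompression_dir / CLP_S_CANONICAL_OUTPUT_FILENAME`.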
Estimated Code Review Effort: 🎯 3 (Moderate) | ⏱️ ~20–30 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
📜 Recent review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (3)
📒 Files selected for processing (3)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
🪛 Biome (2.1.2)
integration-tests/tests/data/unstructured-logs/02-hive-24hr/converted.json
[error] 1-2, 2-3, …, 17-18 (one error at every line boundary): End of file expected. Use an array for a sequence of values: (parse)
integration-tests/tests/data/unstructured-logs/03-openstack-24hr/converted.json
[error] 1-2, 2-3, 3-4 (one error at every line boundary): End of file expected. Use an array for a sequence of values: (parse)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
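Biome reports an error at every line boundary because each `converted.json` appears to hold one JSON object per line (newline-delimited JSON) rather than a single JSON document. This is presumably also the shape of the output that `clp-s` decompresses to. If the fixtures are meant to stay in that form, a per-line check is a more useful sanity test than a whole-file JSON parser. Below is a minimal sketch. It assumes one record per line and that blank lines can be skipped; both helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Iterable


def find_invalid_ndjson_lines(lines: Iterable[str]) -> list[int]:
    """Return the 1-based numbers of non-blank lines that do not parse as JSON on their own."""
    invalid: list[int] = []
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines (e.g., a trailing newline) carry no record
        try:
            json.loads(line)
        except json.JSONDecodeError:
            invalid.append(line_no)
    return invalid


def find_invalid_ndjson_lines_in_file(path: Path) -> list[int]:
    """File-based wrapper; iterating a text file yields one line at a time."""
    with path.open(encoding="utf-8") as f:
        return find_invalid_ndjson_lines(f)
```

For example, `assert not find_invalid_ndjson_lines_in_file(Path("integration-tests/tests/data/unstructured-logs/02-hive-24hr/converted.json"))` would check one fixture.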
🔇 Additional comments (3)
@coderabbitai … where the pytest failure message may be truncated because the path is too long. I don't know what a good solution is.
@Bill-hbrhbr I've opened issue #1592 to track the pytest assertion message truncation problem with long file paths in JSON comparison. The issue documents:
The issue has been assigned to you. You can view it here: #1592
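Until #1592 is resolved, one option is to keep the paths in the failure message short. pytest truncates long assertion explanations by default (running with `-vv` shows them in full). Printing each path relative to a known base directory, on its own line, leaves more room for the part that matters. The sketch below uses hypothetical helper names (`short_path`, `mismatch_message`), and the choice of base directory is an assumption.

```python
from pathlib import Path


def short_path(path: Path, base: Path) -> str:
    """Render `path` relative to `base` when possible; otherwise fall back to the full path."""
    try:
        return str(path.relative_to(base))
    except ValueError:  # `path` is not under `base`
        return str(path)


def mismatch_message(actual: Path, expected: Path, base: Path) -> str:
    """Build a compact, one-path-per-line message for a failed output comparison."""
    return (
        "output mismatch:\n"
        f"  actual:   {short_path(actual, base)}\n"
        f"  expected: {short_path(expected, base)}"
    )
```

The message would be passed as the second operand of the comparison's `assert`.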
quinntaylormitchell left a comment
Looking pretty good; a few comments in the code, and some high-level comments as well:
- Fundamentally, I think that `test_log_converter_clp_s_identity_transform` is testing the functionality of the `log-converter` binary, but in some ways it seems like it's testing the `clp-s` binary's ability to compress/decompress KV-IR as well. If we're going to test the veracity of KV-IR compression/decompression, that should happen in a separate test, and we should make it clear that `test_log_converter_clp_s_identity_transform` is verifying the `log-converter` binary by way of compressing and decompressing the KV-IR that `log-converter` produces. To that end, I've suggested a new docstring; have a look.
- The concept of having the test cases enumerated by the directory structure in `integration-tests/tests/data` doesn't really sit right with me, although I do see how it simplifies the code. Feel free to disregard; I just wanted to bring it up.
- I think that overall, the tests should be logged better. The logging doesn't need to be verbose at all, but it would be good to have some small indications telling the dev what the test is doing while it's doing it.
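For the logging point, a lightweight option is to emit one start line and one finish line per step through the standard `logging` module. pytest captures these records and can stream them live with `--log-cli-level=INFO`. This is a sketch; the logger name and the `logged_step` helper are hypothetical.

```python
import logging
import time
from contextlib import contextmanager
from typing import Iterator

# Hypothetical logger name; pytest captures records from any logger.
logger = logging.getLogger("integration_tests.log_converter")


@contextmanager
def logged_step(case_name: str, step: str) -> Iterator[None]:
    """Log one line when a step starts and one when it finishes (or fails), with elapsed time."""
    logger.info("[%s] %s: started", case_name, step)
    start = time.monotonic()
    try:
        yield
    except Exception:
        logger.error("[%s] %s: failed after %.2fs", case_name, step, time.monotonic() - start)
        raise  # never swallow the test failure
    logger.info("[%s] %s: done in %.2fs", case_name, step, time.monotonic() - start)
```

Each step in the test body would then be wrapped, e.g. `with logged_step(test_case_dir.name, "log-converter"):`.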
Validate the end-to-end functionality of the `log-converter` and `clp-s` pipeline.
This test ensures that:
1. `log-converter` correctly transforms unstructured logs into key-value IR format.
2. The kv-IR output can be compressed and decompressed by `clp-s` without data loss.
Suggested change (replaces the docstring lines above):
Validates the end-to-end functionality of the `log-converter` binary by compressing and
decompressing the KV-IR it produces upon ingesting an unstructured log file.
See high-level comment.
log_converter_bin_path_str = str(core_config.log_converter_binary_path)

unstructured_logs_dir = Path(__file__).resolve().parent / "data" / "unstructured-logs"
for test_case_dir in unstructured_logs_dir.iterdir():
See my high-level comment about the dependence on directory structure to enumerate test cases.
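One alternative to discovering test cases by walking the directory is an explicit manifest, so that a missing or misnamed fixture directory fails loudly instead of silently shrinking the test set. This is a sketch: `TEST_CASES` and `resolve_test_cases` are hypothetical. The two names listed are the fixture directories that appear in this PR's Biome output, and a real manifest would need to match the full set of fixtures.

```python
from pathlib import Path
from typing import Sequence

# Hypothetical manifest; must be kept in sync with the fixture directories.
TEST_CASES: tuple[str, ...] = ("02-hive-24hr", "03-openstack-24hr")


def resolve_test_cases(data_dir: Path, names: Sequence[str] = TEST_CASES) -> list[Path]:
    """Map each listed case name to its directory, failing if any expected fixture is missing."""
    missing = [name for name in names if not (data_dir / name / "converted.json").is_file()]
    if missing:
        raise FileNotFoundError(f"missing test case fixtures under {data_dir}: {', '.join(missing)}")
    return [data_dir / name for name in names]
```

The resulting list could feed `@pytest.mark.parametrize`, using the directory names as `ids`, so each case is reported as its own test.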
log_converter_out_dir.mkdir(parents=True, exist_ok=True)
log_converter_bin_path_str = str(core_config.log_converter_binary_path)

unstructured_logs_dir = Path(__file__).resolve().parent / "data" / "unstructured-logs"
Suggested change (replaces the `unstructured_logs_dir` line above):
unstructured_logs_dir = (
    Path(get_env_var("INTEGRATION_TESTS_DIR")).expanduser().resolve()
    / "data"
    / "unstructured-logs"
)
How would you feel about having this depend on a new env variable in `.pytest.ini`? It clearly makes the code here longer, but I'm worried that a file-location-dependent path buried here in the code will be harder to notice (and to remember to change) if we ever need to.
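A sketch of the env-var approach. It uses `os.environ` directly because the exact behaviour of the project's `get_env_var` helper isn't shown in this thread. Failing with a clear error when the variable is unset is an assumption, and `resolve_unstructured_logs_dir` is a hypothetical name. How `.pytest.ini` would set the variable depends on the project's setup and is not shown here.

```python
import os
from pathlib import Path


def resolve_unstructured_logs_dir(env_var: str = "INTEGRATION_TESTS_DIR") -> Path:
    """Build the fixture directory from an environment variable instead of `__file__`."""
    raw = os.environ.get(env_var)
    if not raw:
        # Assumed behaviour: fail loudly rather than fall back to a guessed location.
        raise RuntimeError(f"environment variable {env_var!r} is not set")
    return Path(raw).expanduser().resolve() / "data" / "unstructured-logs"
```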
test_name = test_case_dir.name
kv_ir_out = log_converter_out_dir / test_name
unlink(kv_ir_out)
It feels odd to have this `unlink` here; I realize it isn't wrong per se, but is there a reason you've put it at the beginning of the test case rather than at the end with the `test_paths.clear_test_outputs()` line?
because there may be leftover files from a previous test that we'd like to remove
sure, but if the kv_ir_out directory is only ever used during this test, wouldn't it make sense to just unlink it when we're done, rather than unlinking it only when a dev runs the test again? kind of like a "clean up everything when you're done" policy?
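The two positions in this thread can be combined. The output can be cleared before the test, in case an earlier run was interrupted, and again afterwards in a `finally`, so the directory is removed even when the test fails. This is a sketch with hypothetical helpers (`remove_path`, `scratch_output`); the project's own `unlink` may behave differently.

```python
import shutil
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


def remove_path(path: Path) -> None:
    """Delete a file or directory tree; do nothing if it does not exist."""
    if path.is_dir() and not path.is_symlink():
        shutil.rmtree(path)
    else:
        path.unlink(missing_ok=True)  # files and symlinks; no error if absent


@contextmanager
def scratch_output(path: Path) -> Iterator[Path]:
    """Yield `path` guaranteed empty on entry, and remove it again on exit."""
    remove_path(path)  # clear leftovers from an interrupted earlier run
    try:
        yield path
    finally:
        remove_path(path)  # clean up even if the test body raised
```

One trade-off is that deleting on failure also removes the KV-IR a developer might want to inspect, so the post-test cleanup could be made conditional on success.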
_clp_s_compress_and_decompress(core_config, test_paths)

expected_out = test_case_dir / "converted.json"
actual_out = test_paths.decompression_dir / CLP_S_CANONICAL_OUTPUT_FILENAME
I have a bit of trouble with the value of CLP_S_CANONICAL_OUTPUT_FILENAME being "original", because it seems to me that the output of a decompression task is, by definition, not the original file (even though it should be structurally identical to the original). What do you think?
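For the comparison between `expected_out` (`converted.json`) and the decompressed `actual_out`, one option is to compare parsed records rather than raw bytes, which tolerates differences in key order and whitespace. This sketch also ignores record order, which is an assumption. If `clp-s` is expected to preserve record order, comparing lists instead of multisets is the stricter check. `ndjson_files_match` is a hypothetical helper, not the PR's comparator.

```python
import json
from collections import Counter
from pathlib import Path


def _canonical_records(path: Path) -> Counter:
    """Parse a newline-delimited JSON file into a multiset of key-sorted JSON strings."""
    records: Counter = Counter()
    with path.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip blank lines, e.g. a trailing newline
                records[json.dumps(json.loads(line), sort_keys=True)] += 1
    return records


def ndjson_files_match(actual: Path, expected: Path) -> bool:
    """True if both files hold the same records, ignoring key order and record order."""
    return _canonical_records(actual) == _canonical_records(expected)
```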
Description
This PR adds integration tests for #1460 by testing examples mentioned in #1308.
Checklist
breaking change.
Validation performed
`uv run pytest -k test_log_converter_clp_s_identity_transform` passes.
Summary by CodeRabbit
Tests
Chores