-
Notifications
You must be signed in to change notification settings - Fork 83
feat(clp-s): Add delta-encoding support for integer columns; Use it for the log_event_idx column.
#1021
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
## Walkthrough
Support for a new `DeltaInteger` node type was added across the CLP-S archive system. This includes new delta-encoded integer column reader and writer classes, schema and archive logic updates, and test coverage. The archive format patch version was incremented. A new test and input file validate delta encoding and log event index behaviour.
## Changes
| File(s) | Change Summary |
|---------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------|
| .../core/src/clp_s/SchemaTree.hpp<br>.../core/src/clp_s/SchemaTree.cpp | Added `DeltaInteger` to `NodeType` enum and handled its mapping to literal type. |
| .../core/src/clp_s/SingleFileArchiveDefs.hpp | Incremented archive patch version constant from 1 to 2. |
| .../core/src/clp_s/ColumnReader.hpp<br>.../core/src/clp_s/ColumnReader.cpp | Added `DeltaEncodedInt64ColumnReader` class for reading delta-encoded int64 columns. |
| .../core/src/clp_s/ColumnWriter.hpp<br>.../core/src/clp_s/ColumnWriter.cpp | Added `DeltaEncodedInt64ColumnWriter` class for writing delta-encoded int64 columns. |
| .../core/src/clp_s/ArchiveReader.cpp | Added handling for `DeltaInteger` in column reader creation and simplified log event index marking. |
| .../core/src/clp_s/ArchiveWriter.cpp | Added handling for `DeltaInteger` in schema writer initialization. |
| .../core/src/clp_s/SchemaReader.cpp | Supported `DeltaInteger` in timestamp extraction and JSON serialization logic. |
| .../core/src/clp_s/SchemaReader.hpp | Broadened log event index column type from `Int64ColumnReader*` to `BaseColumnReader*`. |
| .../core/src/clp_s/JsonParser.cpp | Changed `log_event_idx_node_id` metadata type from `Integer` to `DeltaInteger`. |
| .../core/CMakeLists.txt | Included new test source file for delta encoding in unit tests. |
| .../core/tests/test-clp_s-delta-encode-log-order.cpp | Added test for delta encoding and log event index, including custom filter and value checks. |
| .../core/tests/test_log_files/test_simple_order.jsonl | Added a simple ordered JSONL file for testing delta encoding and log event index. |
## Sequence Diagram(s)
```mermaid
sequenceDiagram
participant Test as Test Case
participant ArchiveWriter as ArchiveWriter
participant ArchiveReader as ArchiveReader
participant SchemaReader as SchemaReader
participant DeltaWriter as DeltaEncodedInt64ColumnWriter
participant DeltaReader as DeltaEncodedInt64ColumnReader
Test->>ArchiveWriter: Compress JSONL with DeltaInteger
ArchiveWriter->>DeltaWriter: Write delta-encoded int64 values
ArchiveWriter-->>Test: Archive created
Test->>ArchiveReader: Open archive
ArchiveReader->>SchemaReader: Read schema, identify DeltaInteger column
SchemaReader->>DeltaReader: Provide delta-encoded int64 column reader
Test->>DeltaReader: Seek and extract values at various indices
DeltaReader-->>Test: Return decoded int64 valuesPossibly related PRs
Suggested reviewers
Learnt from: davemarco Learnt from: AVMatthews Learnt from: AVMatthews Learnt from: LinZhihao-723 Learnt from: LinZhihao-723 Learnt from: LinZhihao-723 Learnt from: Bill-hbrhbr Learnt from: haiqi96 Learnt from: LinZhihao-723 Learnt from: gibber9809 Learnt from: gibber9809 Learnt from: LinZhihao-723 Learnt from: LinZhihao-723 Learnt from: haiqi96 Learnt from: AVMatthews Learnt from: gibber9809 Learnt from: gibber9809 Learnt from: gibber9809 Learnt from: LinZhihao-723 Learnt from: haiqi96 Learnt from: LinZhihao-723 Learnt from: AVMatthews Learnt from: LinZhihao-723 Learnt from: LinZhihao-723 Learnt from: AVMatthews |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (19)
components/core/CMakeLists.txt(1 hunks)components/core/src/clp_s/ArchiveReader.cpp(3 hunks)components/core/src/clp_s/ArchiveWriter.cpp(1 hunks)components/core/src/clp_s/ColumnReader.cpp(2 hunks)components/core/src/clp_s/ColumnReader.hpp(1 hunks)components/core/src/clp_s/ColumnWriter.cpp(1 hunks)components/core/src/clp_s/ColumnWriter.hpp(1 hunks)components/core/src/clp_s/CommandLineArguments.cpp(0 hunks)components/core/src/clp_s/CommandLineArguments.hpp(0 hunks)components/core/src/clp_s/JsonParser.cpp(4 hunks)components/core/src/clp_s/JsonParser.hpp(0 hunks)components/core/src/clp_s/SchemaReader.cpp(4 hunks)components/core/src/clp_s/SchemaReader.hpp(2 hunks)components/core/src/clp_s/SchemaTree.cpp(1 hunks)components/core/src/clp_s/SchemaTree.hpp(1 hunks)components/core/src/clp_s/SingleFileArchiveDefs.hpp(1 hunks)components/core/src/clp_s/clp-s.cpp(0 hunks)components/core/tests/test-clp_s-delta-encode-log-order.cpp(1 hunks)components/core/tests/test_log_files/test_simple_order.jsonl(1 hunks)
💤 Files with no reviewable changes (4)
- components/core/src/clp_s/clp-s.cpp
- components/core/src/clp_s/CommandLineArguments.cpp
- components/core/src/clp_s/CommandLineArguments.hpp
- components/core/src/clp_s/JsonParser.hpp
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Preferfalse == <expression>rather than!<expression>.
components/core/src/clp_s/SchemaTree.hppcomponents/core/src/clp_s/SchemaTree.cppcomponents/core/src/clp_s/SingleFileArchiveDefs.hppcomponents/core/src/clp_s/ColumnWriter.cppcomponents/core/src/clp_s/ArchiveWriter.cppcomponents/core/src/clp_s/ArchiveReader.cppcomponents/core/src/clp_s/ColumnWriter.hppcomponents/core/src/clp_s/SchemaReader.cppcomponents/core/src/clp_s/SchemaReader.hppcomponents/core/src/clp_s/ColumnReader.cppcomponents/core/src/clp_s/ColumnReader.hppcomponents/core/src/clp_s/JsonParser.cppcomponents/core/tests/test-clp_s-delta-encode-log-order.cpp
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: conventional-commits
🔇 Additional comments (20)
components/core/tests/test_log_files/test_simple_order.jsonl (1)
1-3: Correct test fixture for ordered indices.The JSONL file provides a simple, ordered sequence of idx values as intended for delta-encoding validation. No issues found.
components/core/src/clp_s/SchemaTree.hpp (1)
44-44: Enumeration extension follows guidelines.
DeltaIntegerhas been correctly inserted immediately beforeUnknown, preserving enum ordering constraints and ensuring backward compatibility of encoded values.components/core/src/clp_s/SchemaTree.cpp (1)
14-15: DeltaInteger mapped to Integer literal type.The added
case NodeType::DeltaIntegerreturningLiteralType::IntegerTis correct and consistent with integer handling.components/core/CMakeLists.txt (1)
634-634: New unit test added to build.Inclusion of
test-clp_s-delta-encode-log-order.cppin theunitTesttarget is correct, ensuring the delta-encoding behavior is verified.components/core/src/clp_s/SingleFileArchiveDefs.hpp (1)
14-14: Archive patch version incremented.Updating
cArchivePatchVersionfrom 1 to 2 accurately reflects the format change required for introducingDeltaInteger.components/core/src/clp_s/ArchiveWriter.cpp (1)
316-318: LGTM: Proper integration of delta integer support.The addition of the
DeltaIntegercase follows the established pattern for handling different node types in the schema writer initialization.components/core/src/clp_s/ColumnWriter.cpp (1)
26-29: LGTM: Storage implementation follows established pattern.The
storemethod correctly writes the delta-encoded values using the same approach as other column writers.components/core/src/clp_s/ArchiveReader.cpp (3)
191-193: LGTM: Proper DeltaInteger support in column reader factory.The addition of the
DeltaIntegercase follows the established pattern for creating column readers based on node type.
244-246: LGTM: Consistent DeltaInteger support for unordered columns.The unordered column handling correctly includes the same
DeltaIntegercase, maintaining consistency across ordered and unordered column creation paths.
333-335: LGTM: Simplified log event index column marking.The simplified logic for marking the log event index column is cleaner and works correctly with the generalized
BaseColumnReader*type.components/core/src/clp_s/ColumnWriter.hpp (1)
66-82: LGTM: Well-designed DeltaColumnWriter class declaration.The class follows the established pattern for column writers with proper inheritance, virtual destructor, and appropriate private members for delta encoding functionality.
components/core/src/clp_s/SchemaReader.hpp (2)
209-211: LGTM: Proper generalization for delta encoding support.Changing the parameter type from
Int64ColumnReader*toBaseColumnReader*appropriately supports the newDeltaColumnReaderwhile maintaining backwards compatibility.
324-324: LGTM: Consistent type generalization for log event index column.The member variable type change aligns with the method parameter change, providing consistent support for any column reader type as the log event index column.
components/core/src/clp_s/JsonParser.cpp (1)
513-514: LGTM: Clean transition to delta-encoded log event index.The changes correctly remove the conditional logic and consistently use
NodeType::DeltaIntegerfor the log event index field. The implementation properly adds the log event index value to each message.Also applies to: 583-587
components/core/src/clp_s/SchemaReader.cpp (2)
32-36: LGTM: Proper timestamp handling for delta-encoded integers.The implementation correctly adds support for
DeltaIntegerin timestamp extraction, following the same pattern as the existingIntegercase with appropriate casting toDeltaColumnReader.
436-436: LGTM: Consistent JSON serialization support for delta integers.The changes properly extend JSON serialization to handle
DeltaIntegernodes alongsideIntegernodes, ensuring both types are treated consistently during template generation.Also applies to: 521-521, 630-630
components/core/src/clp_s/ColumnReader.cpp (1)
19-25: LGTM: Proper initialization of delta decoding state.The load method correctly initializes the delta decoding state by setting the current index to 0 and current value to the first element when messages are present.
components/core/src/clp_s/ColumnReader.hpp (1)
94-119: LGTM: Well-designed delta column reader class.The
DeltaColumnReaderclass declaration follows established patterns with proper inheritance, virtual method overrides, and appropriate private members for maintaining delta decoding state.components/core/tests/test-clp_s-delta-encode-log-order.cpp (2)
63-113: LGTM: Comprehensive test for delta encoding functionality.The test effectively validates the delta encoding implementation by testing archive compression/decompression, column reader type verification, and delta decoding correctness with various access patterns. The use of parametrized testing with different starting indices is excellent for ensuring robustness.
24-46: LGTM: Clean test utility for accessing column readers.The
SimpleFilterClassprovides a clean way to access the underlying column readers from aSchemaReader, enabling effective testing of the delta column reader functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (1)
components/core/src/clp_s/ColumnReader.cpp (1)
27-41: Bounds checking issue still exists.The delta decoding logic lacks proper bounds checking that could lead to buffer overflows, as previously identified.
The method still accesses
m_values[m_cur_idx + 1]andm_values[m_cur_idx]without validating that these indices are within bounds. Apply the previously suggested fix to add bounds checking.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (8)
components/core/src/clp_s/ArchiveReader.cpp(3 hunks)components/core/src/clp_s/ArchiveWriter.cpp(1 hunks)components/core/src/clp_s/ColumnReader.cpp(2 hunks)components/core/src/clp_s/ColumnReader.hpp(1 hunks)components/core/src/clp_s/ColumnWriter.cpp(1 hunks)components/core/src/clp_s/ColumnWriter.hpp(1 hunks)components/core/src/clp_s/SchemaReader.cpp(4 hunks)components/core/tests/test-clp_s-delta-encode-log-order.cpp(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Preferfalse == <expression>rather than!<expression>.
components/core/src/clp_s/ArchiveWriter.cppcomponents/core/src/clp_s/ColumnWriter.cppcomponents/core/src/clp_s/ArchiveReader.cppcomponents/core/src/clp_s/ColumnWriter.hppcomponents/core/src/clp_s/ColumnReader.cppcomponents/core/src/clp_s/SchemaReader.cppcomponents/core/src/clp_s/ColumnReader.hppcomponents/core/tests/test-clp_s-delta-encode-log-order.cpp
⏰ Context from checks skipped due to timeout of 90000ms (11)
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: lint-check (ubuntu-latest)
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: build-macos (macos-13, false)
- GitHub Check: build-macos (macos-13, true)
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: build-macos (macos-14, false)
🔇 Additional comments (15)
components/core/src/clp_s/ArchiveWriter.cpp (1)
316-318: LGTM: Clean integration of DeltaInteger supportThe addition of the
DeltaIntegercase follows the established pattern in the switch statement and correctly instantiates aDeltaEncodedInt64ColumnWriterfor delta-encoded integer columns.components/core/src/clp_s/ColumnWriter.cpp (2)
14-24: LGTM: Correct delta encoding implementationThe delta encoding logic is implemented correctly:
- First value stored directly and
m_curis initialized- Subsequent values store the difference (
next - m_cur) and updatem_cur- Returns consistent
sizeof(int64_t)for memory accountingThis follows standard delta encoding principles and should provide the compression benefits mentioned in the PR objectives.
26-29: LGTM: Store method correctly handles delta-encoded dataThe store method appropriately writes the delta-encoded values as raw bytes to the compressor, following the same pattern as other column writers.
components/core/src/clp_s/ArchiveReader.cpp (3)
191-193: LGTM: Consistent DeltaInteger support in ordered columnsThe addition of the
DeltaIntegercase correctly creates aDeltaEncodedInt64ColumnReaderinstance, following the same pattern as other column types.
244-246: LGTM: Consistent DeltaInteger support in unordered columnsThe
DeltaIntegercase is correctly handled in unordered columns, maintaining consistency with the ordered column handling.
333-335: LGTM: Simplified log event index column markingThe removal of the dynamic cast check and direct use of
column_readeris cleaner and works correctly since the method signature was generalized to acceptBaseColumnReader*.components/core/src/clp_s/ColumnWriter.hpp (1)
66-82: LGTM: Well-structured DeltaEncodedInt64ColumnWriter classThe class declaration follows established patterns from other column writers and correctly includes:
- Proper inheritance from
BaseColumnWriter- Brace initialization for
m_cur{}ensuring it starts at 0- Appropriate member variables for delta encoding implementation
- Consistent virtual method overrides
The structure aligns well with the existing codebase conventions.
components/core/src/clp_s/SchemaReader.cpp (4)
32-36: LGTM: Correct timestamp extraction for DeltaInteger columnsThe implementation correctly casts to
DeltaEncodedInt64ColumnReader*and extracts the int64_t value for timestamp processing, following the same pattern as regular Integer columns.
436-441: LGTM: Consistent DeltaInteger handling in structured array templatesThe addition of
DeltaIntegeralongsideIntegerin the case statement correctly treats both types identically for JSON serialization purposes, since they both represent integer values with different storage encodings.
521-526: LGTM: Consistent DeltaInteger handling in structured object templatesThe
DeltaIntegercase correctly follows the same logic asIntegerfor JSON field generation, maintaining consistency across the serialization logic.
630-635: LGTM: Consistent DeltaInteger handling in JSON template generationThe handling of
DeltaIntegeralongsideIntegerensures consistent JSON serialization behavior across all template generation methods, correctly treating them as integer fields.components/core/src/clp_s/ColumnReader.cpp (3)
19-25: LGTM! Initialization logic is correct.The load method properly initializes the delta-encoded column reader with the first value as the base and sets up tracking variables.
43-47: LGTM! Delegation pattern is correctly implemented.The extract_value method properly delegates to the bounds-checked get_value_at_idx method.
58-63: LGTM! String conversion logic is correct.The extract_string_value_into_buffer method correctly converts the delta-decoded value to string format.
components/core/src/clp_s/ColumnReader.hpp (1)
94-125: Well-designed class with proper documentation.The DeltaEncodedInt64ColumnReader class follows the established pattern of other column readers with:
- Proper inheritance and virtual method overrides
- Clear documentation for the private helper method
- Appropriate member variables for delta decoding state
The class interface is consistent with the existing codebase architecture.
log_event_idx column.
wraymo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR looks good to me.
| } else if (m_timestamp_column->get_type() == NodeType::DeltaInteger) { | ||
| m_get_timestamp = [this]() { | ||
| return std::get<int64_t>(static_cast<DeltaEncodedInt64ColumnReader*>(m_timestamp_column) | ||
| ->extract_value(m_cur_message)); | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will we apply this encoding to an integer timestamp column in a future PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think that that will probably be worthwhile. We can experiment with that when we implement timestamp normalization.
|
For the title, what about feat(clp-s): Add delta-encoding support for integer columns; Use it for |
log_event_idx column.| std::vector<clp_s::Path> archive_paths; | ||
| REQUIRE(clp_s::get_input_archives_for_raw_path( | ||
| std::string{cTestDeltaEncodeOrderArchiveDirectory}, | ||
| archive_paths | ||
| )); | ||
| REQUIRE(1 == archive_paths.size()); | ||
| clp_s::ArchiveReader archive_reader; | ||
| REQUIRE_NOTHROW(archive_reader.open(archive_paths.back(), clp_s::NetworkAuthOption{})); | ||
| REQUIRE_NOTHROW(archive_reader.read_dictionaries_and_metadata()); | ||
| REQUIRE_NOTHROW(archive_reader.open_packed_streams()); | ||
| auto mpt = archive_reader.get_schema_tree(); | ||
| auto log_event_idx_node_id = mpt->get_metadata_field_id(clp_s::constants::cLogEventIdxName); | ||
| REQUIRE(-1 != log_event_idx_node_id); | ||
| std::vector<std::shared_ptr<clp_s::SchemaReader>> schema_readers; | ||
| REQUIRE_NOTHROW(schema_readers = archive_reader.read_all_tables()); | ||
| REQUIRE(1 == schema_readers.size()); | ||
| auto schema_reader = schema_readers.back(); | ||
| REQUIRE(cNumEntries == schema_reader->get_num_messages()); | ||
| SimpleFilterClass simple_filter_class; | ||
| schema_reader->initialize_filter(&simple_filter_class); | ||
| clp_s::BaseColumnReader* log_event_idx_reader{nullptr}; | ||
| for (auto* column_reader : simple_filter_class.get_column_readers()) { | ||
| if (log_event_idx_node_id == column_reader->get_id()) { | ||
| log_event_idx_reader = column_reader; | ||
| break; | ||
| } | ||
| } | ||
| REQUIRE(nullptr != log_event_idx_reader); | ||
| REQUIRE(clp_s::NodeType::DeltaInteger == log_event_idx_reader->get_type()); | ||
| REQUIRE(nullptr != dynamic_cast<clp_s::DeltaEncodedInt64ColumnReader*>(log_event_idx_reader)); | ||
| // Test forwards and backwards seeks on `DeltaEncodedInt64ColumnReader`. | ||
| size_t i{start_index}; | ||
| for (size_t num_iterations{0ULL}; num_iterations < cNumEntries; ++num_iterations) { | ||
| int64_t val{}; | ||
| REQUIRE_NOTHROW(val = std::get<int64_t>(log_event_idx_reader->extract_value(i))); | ||
| REQUIRE(val == static_cast<int64_t>(i)); | ||
| i = (i + 1) % cNumEntries; | ||
| } | ||
| REQUIRE_NOTHROW(archive_reader.close()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like it could be broken up into blocks for readability.
Co-authored-by: kirkrodrigues <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (3)
components/core/tests/test-clp_s-delta-encode-log-order.cpp (3)
22-22: Documentation needed for test constant.The constant
cNumEntries{3}should include a comment explaining its significance and relationship to the test data file.
25-27: Improve class documentation.The comment could be more concise by removing the redundant "This class is" phrase.
63-114: Consider enhancing test coverage and readability.The test provides good basic coverage but could benefit from:
- Breaking the test logic into smaller, more focused blocks for better readability
- Adding edge case testing (empty datasets, single entries, boundary conditions)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
components/core/tests/test-clp_s-delta-encode-log-order.cpp(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}: - Preferfalse == <expression>rather than!<expression>.
⚙️ Source: CodeRabbit Configuration File
List of files the instruction was applied to:
components/core/tests/test-clp_s-delta-encode-log-order.cpp
🧠 Learnings (2)
📓 Common learnings
Learnt from: davemarco
PR: y-scope/clp#698
File: components/core/src/clp/streaming_archive/Constants.hpp:9-9
Timestamp: 2025-01-28T03:02:30.542Z
Learning: In the CLP project's archive version format (semver), the patch version uses 16 bits (uint16_t), while major version uses 8 bits and minor version uses 16 bits.
components/core/tests/test-clp_s-delta-encode-log-order.cpp (19)
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-clp_s-end_to_end.cpp:109-110
Timestamp: 2024-11-29T22:50:17.206Z
Learning: In `components/core/tests/test-clp_s-end_to_end.cpp`, the success of `constructor.store()` is verified through `REQUIRE` statements and subsequent comparisons.
Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:14-14
Timestamp: 2024-10-14T03:42:10.355Z
Learning: In the file `components/core/tests/test-ffi_KeyValuePairLogEvent.cpp`, including `<json/single_include/nlohmann/json.hpp>` is consistent with the project's coding standards.
Learnt from: LinZhihao-723
PR: y-scope/clp#557
File: components/core/tests/test-ir_encoding_methods.cpp:1216-1286
Timestamp: 2024-10-13T09:27:43.408Z
Learning: In the unit test case `ffi_ir_stream_serialize_schema_tree_node_id` in `test-ir_encoding_methods.cpp`, suppressing the `readability-function-cognitive-complexity` warning is acceptable due to the expansion of Catch2 macros in C++ tests, and such test cases may not have readability issues.
Learnt from: LinZhihao-723
PR: y-scope/clp#570
File: components/core/tests/test-ir_encoding_methods.cpp:376-399
Timestamp: 2024-11-01T03:26:26.386Z
Learning: In the test code (`components/core/tests/test-ir_encoding_methods.cpp`), exception handling for `msgpack::unpack` can be omitted because the Catch2 testing framework captures exceptions if they occur.
Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:85-103
Timestamp: 2024-10-14T03:42:53.145Z
Learning: The function `assert_kv_pair_log_event_creation_failure` is correctly placed within the anonymous namespace in `test-ffi_KeyValuePairLogEvent.cpp`.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#614
File: components/core/tests/test-StreamingCompression.cpp:45-54
Timestamp: 2024-12-04T15:50:37.827Z
Learning: In `components/core/tests/test-StreamingCompression.cpp`, within the `compress` function, compressing the same data repeatedly by passing the same `src` pointer without advancing is intentional to test the compressor with the same data multiple times.
Learnt from: gibber9809
PR: y-scope/clp#630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:13.322Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, validation and exception throwing are unnecessary in the `get_archive_node_id` method when processing nodes, and should not be added.
Learnt from: haiqi96
PR: y-scope/clp#619
File: components/core/src/clp/clp/decompression.cpp:313-313
Timestamp: 2024-12-05T16:32:21.507Z
Learning: In the C++ `FileDecompressor` implementations at `components/core/src/clp/clp/FileDecompressor.cpp` and `components/core/src/glt/glt/FileDecompressor.cpp`, the `temp_output_path` variable and associated logic are used to handle multiple compressed files with the same name, and should be kept. This logic is separate from the temporary output directory code removed in PR #619 and is necessary for proper file handling.
Learnt from: LinZhihao-723
PR: y-scope/clp#593
File: components/core/tests/test-NetworkReader.cpp:216-219
Timestamp: 2024-11-15T03:15:45.919Z
Learning: In the `network_reader_with_valid_http_header_kv_pairs` test case in `components/core/tests/test-NetworkReader.cpp`, additional error handling for JSON parsing failures is not necessary, as the current error message is considered sufficient.
Learnt from: LinZhihao-723
PR: y-scope/clp#557
File: components/core/src/clp/ffi/ir_stream/utils.hpp:0-0
Timestamp: 2024-10-18T02:31:18.595Z
Learning: In `components/core/src/clp/ffi/ir_stream/utils.hpp`, the function `size_dependent_encode_and_serialize_schema_tree_node_id` assumes that the caller checks that `node_id` fits within the range of `encoded_node_id_t` before casting.
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.265Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:769-779
Timestamp: 2024-10-07T21:16:41.660Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, when handling errors in `parse_from_ir`, prefer to maintain the current mix of try-catch and if-statements because specific messages are returned back up in some cases.
Learnt from: gibber9809
PR: y-scope/clp#630
File: components/core/src/clp_s/JsonParser.cpp:702-703
Timestamp: 2024-12-10T16:03:08.691Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `get_archive_node_id` function, validation and exception throwing for UTF-8 compliance of `curr_node.get_key_name()` are unnecessary and should be omitted.
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:91-94
Timestamp: 2024-11-12T18:47:03.828Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, the `SchemaNode` class uses `std::unique_ptr<char[]> m_key_buf` and `std::string_view m_key_name` to ensure that references to `m_key_name` remain valid even after `SchemaNode` is move-constructed.
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:171-171
Timestamp: 2024-11-12T18:46:20.933Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, it's acceptable to use `std::string_view` as keys in `m_node_map` because `SchemaNode`'s `m_key_name` remains valid even after move operations or reallocations, preventing dangling references.
Learnt from: gibber9809
PR: y-scope/clp#584
File: components/core/src/clp_s/SchemaTree.hpp:40-55
Timestamp: 2024-11-12T18:56:31.067Z
Learning: In `components/core/src/clp_s/SchemaTree.hpp`, within the `SchemaNode` class, the use of `std::string_view` for `m_key_name` referencing `m_key_buf` is intentional to ensure that references to the key name remain valid even after move construction.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:756-765
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir()` function, reaching the end of log events in a given IR is not considered an error case. The errors `std::errc::no_message_available` and `std::errc::result_out_of_range` are expected signals to break the deserialization loop and proceed accordingly.
Learnt from: AVMatthews
PR: y-scope/clp#543
File: components/core/src/clp_s/JsonParser.cpp:735-794
Timestamp: 2024-10-07T21:35:04.362Z
Learning: In `components/core/src/clp_s/JsonParser.cpp`, within the `parse_from_ir` method, encountering errors from `kv_log_event_result.error()` aside from `std::errc::no_message_available` and `std::errc::result_out_of_range` is anticipated behavior and does not require additional error handling or logging.
🧬 Code Graph Analysis (1)
components/core/tests/test-clp_s-delta-encode-log-order.cpp (1)
components/core/src/clp_s/ColumnReader.hpp (25)
reader(35-35)reader(80-80)reader(103-103)reader(136-136)reader(159-159)reader(191-191)reader(241-241)reader(278-278)cur_message(46-46)cur_message(54-54)cur_message(62-65)cur_message(63-63)cur_message(84-86)cur_message(88-88)cur_message(107-109)cur_message(111-111)cur_message(140-142)cur_message(144-144)cur_message(163-165)cur_message(167-167)cur_message(197-199)cur_message(201-201)cur_message(204-204)cur_message(211-211)DeltaInteger(105-105)
⏰ Context from checks skipped due to timeout of 90000ms (11)
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: lint-check (ubuntu-latest)
- GitHub Check: lint-check (macos-latest)
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: build-macos (macos-14, false)
- GitHub Check: build-macos (macos-14, true)
🔇 Additional comments (1)
components/core/tests/test-clp_s-delta-encode-log-order.cpp (1)
106-112: LGTM: Seek testing logic is well-designed.The parameterised testing with different start indices combined with the modular arithmetic ensures comprehensive coverage of forward/backward seeking functionality in the delta-encoded column reader.
log_event_idx column.
kirkrodrigues
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving the changes I requested; deferring to @wraymo's approval for everything else.
…or the `log_event_idx` column. (y-scope#1021) Co-authored-by: kirkrodrigues <[email protected]>
Description
This PR adds support for delta-encoding integers and uses that support to delta-encode the log_event_idx column. This leads to a significant improvement in compression ratio when log event ordering is enabled for some datasets.
The following table shows the compression ratio improvements for the open source datasets, with the compression ratio without log ordering as a baseline:
Compression speed and ordered decompression speed seem to not significantly change compared to when not using delta-encoded log ordering.
Checklist
breaking change.
Validation performed
Summary by CodeRabbit
Summary by CodeRabbit