Collaborator

@faysou faysou commented Oct 13, 2025

Pull Request

NautilusTrader prioritizes correctness and reliability. Please follow existing patterns for validation and testing.

  • I have reviewed the CONTRIBUTING.md and followed the established practices

Summary

Refactor streaming writer to support per-bar-type persistence and improve catalog conversion

Overview

This PR refactors the streaming writer infrastructure to properly handle per-bar-type file organization and enhances the catalog's ability to convert streamed data back to parquet format. The changes enable better organization of streamed data and support for converting internal bars to external bars during catalog conversion.

Key changes

Streaming writer improvements:

  • Refactored StreamingFeatherWriter to support per-bar-type file organization for Bar data, similar to the existing per-instrument organization for ticks and order book data.
  • Consolidated file rotation logic into a unified _rotate_identifier_file() method that handles both per-instrument and per-bar-type cases.
  • Renamed _create_instrument_writer() to _create_identifier_writer() to better reflect its dual purpose (instrument_id and bar_type).
  • Fixed writer creation logic to properly handle custom data types with and without instrument_id attributes.
  • Improved file size tracking to use appropriate keys (tuples for per-identifier data, strings for regular data).
  • Changed logger attribute from self.logger to self.log for consistency with codebase conventions.
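
As a rough illustration of the keying and rotation described above, here is a minimal sketch using only the standard library. It is not the StreamingFeatherWriter implementation: the RotationTracker class, the max_file_size threshold, the directory layout and the file naming scheme are all assumptions made for the example.

```python
from pathlib import PurePosixPath
from typing import Optional


class RotationTracker:
    """Toy size bookkeeping for a rotating writer (not NautilusTrader code)."""

    def __init__(self, max_file_size: int) -> None:
        self.max_file_size = max_file_size
        self.sizes: dict = {}  # bytes written to the current file, per key
        self.file_index: dict = {}  # number of rotations so far, per key

    @staticmethod
    def key(cls_name: str, identifier: Optional[str] = None):
        # Tuple key for per-identifier data, plain string key otherwise.
        return (cls_name, identifier) if identifier is not None else cls_name

    def path_for(self, cls_name: str, identifier: Optional[str] = None) -> PurePosixPath:
        index = self.file_index.get(self.key(cls_name, identifier), 0)
        filename = f"{cls_name}_{index}.feather"  # hypothetical naming scheme
        if identifier is None:
            return PurePosixPath(filename)
        # Per-identifier data lives in its own subdirectory (assumed layout).
        return PurePosixPath(cls_name) / identifier / filename

    def record_write(self, n_bytes: int, cls_name: str, identifier: Optional[str] = None) -> bool:
        """Add n_bytes to the key's running size; return True if the file rotated."""
        key = self.key(cls_name, identifier)
        self.sizes[key] = self.sizes.get(key, 0) + n_bytes
        if self.sizes[key] < self.max_file_size:
            return False
        # Threshold reached: move to the next file index and reset the counter.
        self.file_index[key] = self.file_index.get(key, 0) + 1
        self.sizes[key] = 0
        return True
```

Because each (class, identifier) pair has its own counter, a busy bar type rotates its file without affecting the files of other bar types or of data that is not split by identifier.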

Catalog conversion enhancements:

  • Added an identifiers parameter to convert_stream_to_data() to allow filtering by specific instrument_ids or bar_types.
  • Added a convert_bar_type_to_external parameter to _handle_table_nautilus() to convert INTERNAL bar types to EXTERNAL during deserialization.
  • Refactored convert_stream_to_data() to handle both subdirectory-organized files (per-instrument/per-bar-type) and flat files.
  • Improved feather file discovery to properly handle nested directory structures.
  • Data is now written incrementally per feather file rather than loading all data into memory first.
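
The conversion flow above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the catalog code: discover_feather_files, matches, to_external and plan_conversion are hypothetical helpers, the substring matching rule for identifiers is assumed, and the sketch only plans which files to convert rather than reading or writing any data.

```python
import tempfile
from pathlib import Path
from typing import Iterator, List, Optional, Tuple


def discover_feather_files(class_dir: Path) -> Iterator[Tuple[Optional[str], Path]]:
    """Yield (identifier, path); identifier is None for flat (non-nested) files."""
    for path in sorted(class_dir.rglob("*.feather")):
        rel = path.relative_to(class_dir)
        yield (rel.parts[0] if len(rel.parts) > 1 else None), path


def matches(identifier: Optional[str], identifiers: Optional[List[str]]) -> bool:
    # Assumed rule: no filter keeps everything; otherwise substring match,
    # so an instrument ID also selects that instrument's bar types.
    if identifiers is None:
        return True
    return identifier is not None and any(w in identifier for w in identifiers)


def to_external(bar_type: str) -> str:
    """Rewrite a trailing -INTERNAL aggregation source to -EXTERNAL."""
    suffix = "-INTERNAL"
    if bar_type.endswith(suffix):
        return bar_type[: -len(suffix)] + "-EXTERNAL"
    return bar_type


def plan_conversion(
    class_dir: Path,
    identifiers: Optional[List[str]] = None,
    convert_bar_type_to_external: bool = False,
) -> Iterator[Tuple[Path, Optional[str]]]:
    """Lazily yield one (source file, target identifier) pair at a time."""
    for identifier, path in discover_feather_files(class_dir):
        if not matches(identifier, identifiers):
            continue
        if convert_bar_type_to_external and identifier is not None:
            identifier = to_external(identifier)
        yield path, identifier


def demo() -> list:
    with tempfile.TemporaryDirectory() as tmp:
        bar_dir = Path(tmp) / "bar"
        for sub in ("AAPL.XNAS-2-MINUTE-LAST-INTERNAL", "MSFT.XNAS-1-MINUTE-LAST-EXTERNAL"):
            (bar_dir / sub).mkdir(parents=True)
            (bar_dir / sub / "bar_0.feather").touch()
        (bar_dir / "bar_flat_0.feather").touch()  # flat file without a subdirectory
        plan = plan_conversion(bar_dir, identifiers=["AAPL.XNAS"], convert_bar_type_to_external=True)
        return [(path.name, target) for path, target in plan]
```

Since plan_conversion is a generator, a caller can convert and write each feather file before the next one is touched, which mirrors the incremental per-file writing described in the last bullet.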

Data engine fix:

  • Fixed bars() catalog query to use bar_types parameter (list) instead of bar_type (string).
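
The PR does not say how the old call misbehaved. One plausible failure mode, sketched below with a hypothetical query_bars stand-in (not the real bars() signature), is that a function accepting extra keyword arguments silently ignores the misnamed bar_type keyword and returns unfiltered results.

```python
from typing import List, Optional

# Hypothetical stored bar types, for illustration only.
STORED = ["AAPL.XNAS-2-MINUTE-LAST-EXTERNAL", "MSFT.XNAS-1-MINUTE-LAST-EXTERNAL"]


def query_bars(bar_types: Optional[List[str]] = None, **kwargs) -> List[str]:
    """Stand-in for a catalog bar query that filters by a list of bar types."""
    if bar_types is None:
        return list(STORED)  # no filter applied
    return [bt for bt in STORED if bt in bar_types]


wanted = "AAPL.XNAS-2-MINUTE-LAST-EXTERNAL"
unfiltered = query_bars(bar_type=wanted)   # misnamed keyword is swallowed by **kwargs
filtered = query_bars(bar_types=[wanted])  # list-valued keyword filters as intended
```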

Example updates:

  • Updated databento_option_greeks.py example to demonstrate streaming both GreeksData and Bar data.
  • Added subscription to 2-MINUTE bars with INTERNAL aggregation for demonstration.
  • Enhanced catalog conversion to include both greeks and bars with identifier filtering.

Minor improvements:

  • Cleaned up code formatting and removed debug print statements.
  • Simplified _extract_sql_safe_filename() implementation.
  • Added proper handling of custom data types in writer creation.

Testing

New tests added:

  • test_feather_writer_per_bar_type: Tests per-bar-type file organization with separate subdirectories for different bar types.
  • test_convert_stream_to_data_with_identifiers: Tests filtering by identifiers when converting stream data to catalog.
  • test_convert_stream_to_data_internal_to_external: Tests conversion of INTERNAL bar types to EXTERNAL during catalog conversion.

Test results:

  • All 14 tests in test_streaming.py pass successfully.
  • Example notebook demonstrates end-to-end workflow of streaming and converting bars and greeks data.

Related Issues/PRs

Type of change

  • Bug fix (non-breaking)
  • New feature (non-breaking)
  • Breaking change (impacts existing behavior)
  • Documentation update
  • Maintenance / chore

Breaking change details (if applicable)

Documentation

  • Documentation changes follow the style guide (docs/developer_guide/docs.md)

Release notes

  • I added a concise entry to RELEASES.md that follows the existing conventions (when applicable)

Testing

Ensure new or changed logic is covered by tests.

  • Affected code paths are already covered by the test suite
  • I added/updated tests to cover new or changed logic

Tested all functionality in the updated databento_option_greeks.py example.

@faysou faysou force-pushed the refine-writer branch 3 times, most recently from 401d297 to ccc67ba October 14, 2025 05:48
@faysou faysou changed the title from "Refine Streaming writer and converter to catalog for Bars" to "Refactor streaming writer to support per-bar-type persistence and improve catalog conversion" Oct 14, 2025
@cjdsellers cjdsellers changed the title from "Refactor streaming writer to support per-bar-type persistence and improve catalog conversion" to "Refactor streaming writer to support per-bar-type persistence" Oct 14, 2025
@cjdsellers
Member

Thanks @faysou, great stuff 👍

@cjdsellers cjdsellers merged commit 507f2ac into develop Oct 14, 2025
17 checks passed
@cjdsellers cjdsellers deleted the refine-writer branch October 14, 2025 19:24