WIP: Load method strategies #3402

Draft
edgarrmondragon wants to merge 1 commit into main from load-method-strategy

@edgarrmondragon (Collaborator) commented Dec 3, 2025

Summary by Sourcery

Introduce pluggable SQL load strategies and loaders to support append-only, overwrite, and upsert load methods with backward-compatible behavior.

New Features:

  • Add LoadMethodStrategy hierarchy and concrete append-only, overwrite, and upsert strategies for SQL sinks.
  • Introduce reusable Loader classes (simple insert, temp-table upsert, and merge-based upsert) for database-agnostic DML handling.
  • Expose load strategies and loaders from the sql package for easier customization by targets.

Enhancements:

  • Wire SQLConnector and SQLSink to select and use load strategies based on configuration while preserving legacy table preparation as a fallback.
  • Enable SQLSink to delegate table preparation and batch loading to the selected load strategy, including config validation and capability checks.
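
The legacy fallback mentioned above could be routed roughly as follows. This is a minimal sketch, not the PR's code: `ConnectorSketch` and `RecordingStrategy` are hypothetical stand-ins, and the rule that temp tables always take the legacy path is an assumption based on the reviewer's guide further down. Only `prepare_table`, `_prepare_table_legacy`, and `_load_strategy` are names taken from this pull request.

```python
from __future__ import annotations

from collections.abc import Sequence


class RecordingStrategy:
    """Hypothetical strategy that only records which tables it prepared."""

    def __init__(self) -> None:
        self.prepared: list[str] = []

    def prepare_table(
        self, full_table_name: str, schema: dict, primary_keys: Sequence[str]
    ) -> None:
        self.prepared.append(full_table_name)


class ConnectorSketch:
    """Hypothetical stand-in for SQLConnector; not the SDK class."""

    def __init__(self, load_strategy: RecordingStrategy | None = None) -> None:
        self._load_strategy = load_strategy
        self.legacy_calls: list[str] = []

    def prepare_table(
        self,
        full_table_name: str,
        schema: dict,
        primary_keys: Sequence[str],
        partition_keys: list[str] | None = None,
        as_temp_table: bool = False,
    ) -> None:
        # Assumption: a strategy handles regular tables, temp tables stay legacy.
        if self._load_strategy is not None and not as_temp_table:
            self._load_strategy.prepare_table(full_table_name, schema, primary_keys)
            return
        self._prepare_table_legacy(
            full_table_name, schema, primary_keys, partition_keys, as_temp_table
        )

    def _prepare_table_legacy(
        self, full_table_name, schema, primary_keys, partition_keys, as_temp_table
    ) -> None:
        # The pre-existing create/alter logic would live here.
        self.legacy_calls.append(full_table_name)


strategy = RecordingStrategy()
with_strategy = ConnectorSketch(strategy)
with_strategy.prepare_table("users", {}, ["id"])
with_strategy.prepare_table("users_tmp", {}, ["id"], as_temp_table=True)

without_strategy = ConnectorSketch()
without_strategy.prepare_table("orders", {}, ["id"])
```

Connectors that never configure a strategy keep their existing behavior, which is what makes the change backward compatible in this sketch.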

Tests:

  • Add comprehensive tests for load strategies, loaders, and factory selection behavior, including append-only, overwrite, upsert, and custom merge scenarios.
  • Update the dummy SQL connector used in sink tests to declare support for overwrite and merge-upsert load methods.

@sourcery-ai bot (Contributor) commented Dec 3, 2025

Reviewer's Guide

Introduces a strategy/loader abstraction for SQL load methods (append-only, overwrite, upsert), wires SQLConnector/SQLSink to use a per-sink LoadMethodStrategy instead of hardcoded logic, and adds extensive tests to validate behavior and configuration for each strategy and loader type.
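
As a rough illustration of what per-strategy configuration validation might check, consider the sketch below. The two rules shown (overwrite needs `allow_overwrite`, upsert needs non-empty `key_properties`) are assumptions for illustration; the PR's actual `validate_config()` implementations may check different things. The connector and sink objects are hypothetical `SimpleNamespace` stand-ins.

```python
from types import SimpleNamespace


def validate_overwrite(connector) -> None:
    # Assumed rule: the connector must advertise overwrite support.
    if not connector.allow_overwrite:
        raise ValueError("connector does not support load_method 'overwrite'")


def validate_upsert(sink) -> None:
    # Assumed rule: an upsert needs at least one key property to match on.
    if not sink.key_properties:
        raise ValueError("load_method 'upsert' requires key_properties")


def raises_value_error(check, arg) -> bool:
    """Return True if check(arg) raises ValueError, False otherwise."""
    try:
        check(arg)
    except ValueError:
        return True
    return False


overwrite_rejected = raises_value_error(
    validate_overwrite, SimpleNamespace(allow_overwrite=False)
)
upsert_rejected = raises_value_error(
    validate_upsert, SimpleNamespace(key_properties=[])
)
upsert_accepted = not raises_value_error(
    validate_upsert, SimpleNamespace(key_properties=["id"])
)
```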

Sequence diagram for SQLSink setup and batch loading using strategies

sequenceDiagram
    actor Target
    participant SQLTarget
    participant SQLSink
    participant SQLConnector
    participant LoadMethodStrategy
    participant Loader
    participant Database as DB

    Target->>SQLTarget: run_sync()
    SQLTarget->>SQLSink: setup()

    activate SQLSink
    SQLSink->>SQLConnector: prepare_schema(schema_name)
    deactivate SQLSink

    note over SQLSink,SQLConnector: Initialize and validate load strategy
    activate SQLSink
    SQLSink->>SQLSink: load_strategy (property access)
    alt strategy not yet created
        SQLSink->>SQLConnector: _create_load_strategy(sink)
        activate SQLConnector
        SQLConnector->>SQLConnector: read config.load_method
        SQLConnector-->>SQLSink: new AppendOnlyStrategy | OverwriteStrategy | UpsertStrategy
        deactivate SQLConnector
        SQLSink->>LoadMethodStrategy: validate_config()
        SQLSink->>SQLSink: cache _load_strategy
    else strategy already cached
        SQLSink-->>SQLSink: reuse existing _load_strategy
    end
    deactivate SQLSink

    note over SQLSink,LoadMethodStrategy: Table preparation delegated to strategy
    activate SQLSink
    SQLSink->>LoadMethodStrategy: prepare_table(full_table_name, schema, key_properties)
    activate LoadMethodStrategy
    alt AppendOnlyStrategy
        LoadMethodStrategy->>SQLConnector: table_exists(full_table_name)
        alt table missing
            SQLConnector->>SQLConnector: create_empty_table(...)
        else table exists
            loop each property
                LoadMethodStrategy->>SQLConnector: prepare_column(...)
            end
            LoadMethodStrategy->>SQLConnector: prepare_primary_key(...)
        end
    else OverwriteStrategy
        LoadMethodStrategy->>SQLConnector: table_exists(full_table_name)
        alt first time or table missing
            LoadMethodStrategy->>SQLConnector: parse_full_table_name(...)
            LoadMethodStrategy->>SQLConnector: drop existing table via SQLAlchemy
            LoadMethodStrategy->>SQLConnector: create_empty_table(...)
        else already prepared
            loop each property
                LoadMethodStrategy->>SQLConnector: prepare_column(...)
            end
            LoadMethodStrategy->>SQLConnector: prepare_primary_key(...)
        end
    else UpsertStrategy
        LoadMethodStrategy->>SQLConnector: table_exists(full_table_name)
        alt table missing
            SQLConnector->>SQLConnector: create_empty_table(...)
        else table exists
            loop each property
                LoadMethodStrategy->>SQLConnector: prepare_column(...)
            end
            LoadMethodStrategy->>SQLConnector: prepare_primary_key(...)
        end
    end
    deactivate LoadMethodStrategy
    deactivate SQLSink

    note over SQLSink,Loader: Batch processing via loader abstraction
    SQLTarget->>SQLSink: process_batch(context)
    activate SQLSink
    SQLSink->>LoadMethodStrategy: load_batch(full_table_name, schema, context.records)
    activate LoadMethodStrategy
    LoadMethodStrategy->>Loader: load_records(full_table_name, schema, records)
    activate Loader
    Loader->>DB: execute INSERT / DELETE+INSERT / MERGE
    DB-->>Loader: rowcount
    Loader-->>LoadMethodStrategy: records_loaded
    deactivate Loader
    LoadMethodStrategy-->>SQLSink: records_loaded
    deactivate LoadMethodStrategy
    SQLSink-->>SQLTarget: tally and continue
    deactivate SQLSink
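
The lazy initialization in the first part of the diagram can be sketched as follows. This is an illustration under assumptions, not the PR's code: `FakeConnector`, `FakeStrategy`, and `FakeSink` are hypothetical stand-ins, and only `load_strategy`, `_load_strategy`, `_create_load_strategy`, and `validate_config` are names from the diagram.

```python
from __future__ import annotations


class FakeStrategy:
    """Hypothetical stand-in for a LoadMethodStrategy."""

    def __init__(self) -> None:
        self.validated = 0

    def validate_config(self) -> None:
        # A real strategy would raise here on an invalid configuration.
        self.validated += 1


class FakeConnector:
    """Hypothetical stand-in for SQLConnector."""

    def __init__(self) -> None:
        self.created = 0

    def _create_load_strategy(self, sink: FakeSink) -> FakeStrategy:
        self.created += 1
        return FakeStrategy()


class FakeSink:
    """Hypothetical stand-in for SQLSink."""

    def __init__(self, connector: FakeConnector) -> None:
        self.connector = connector
        self._load_strategy: FakeStrategy | None = None

    @property
    def load_strategy(self) -> FakeStrategy:
        # Create and validate on first access, then reuse the cached instance.
        if self._load_strategy is None:
            strategy = self.connector._create_load_strategy(self)
            strategy.validate_config()
            self._load_strategy = strategy
        return self._load_strategy


sink = FakeSink(FakeConnector())
first = sink.load_strategy
second = sink.load_strategy
```

Validating before caching means a misconfigured sink fails on first use rather than partway through a batch.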

Class diagram for SQL load strategies and loaders

classDiagram
    class SQLConnector {
        - dict config
        - dict _tables_prepared
        - LoadMethodStrategy _load_strategy
        + jsonschema_to_sql() JSONSchemaToSQL
        + _create_load_strategy(sink: SQLSink) LoadMethodStrategy
        + prepare_table(full_table_name: str, schema: dict, primary_keys: Sequence~str~, partition_keys: list~str~, as_temp_table: bool) void
        + _prepare_table_legacy(full_table_name: str | FullyQualifiedName, schema: dict, primary_keys: Sequence~str~, partition_keys: list~str~ | None, as_temp_table: bool) void
        + table_exists(full_table_name: str) bool
        + create_empty_table(full_table_name: str, schema: dict, primary_keys: Sequence~str~, as_temp_table: bool) void
        + prepare_column(full_table_name: str, property_name: str, sql_type: str) void
        + prepare_primary_key(full_table_name: str, primary_keys: Sequence~str~) void
        + parse_full_table_name(full_table_name: str) tuple
        + allow_overwrite bool
        + allow_temp_tables bool
        + allow_merge_upsert bool
    }

    class SQLSink {
        - SQLConnector _connector
        - LoadMethodStrategy _load_strategy
        + connector() SQLConnector
        + load_strategy() LoadMethodStrategy
        + setup() void
        + process_batch(context: dict) void
        + schema dict
        + key_properties Sequence~str~
        + conform_name(name: str, object_type: str) str
        + merge_upsert_from_table(target_table_name: str, from_table_name: str, join_keys: list~str~) int
    }

    class LoadMethodStrategy {
        <<abstract>>
        - SQLConnector connector
        - SQLSink sink
        - Logger logger
        - Loader loader
        + _create_loader() Loader
        + prepare_table(full_table_name: str, schema: dict, primary_keys: Sequence~str~) void
        + load_batch(full_table_name: str, schema: dict, records: Iterable~dict~) int | None
        + validate_config() void
    }

    class AppendOnlyStrategy {
        + _create_loader() Loader
        + prepare_table(full_table_name: str, schema: dict, primary_keys: Sequence~str~) void
        + validate_config() void
    }

    class OverwriteStrategy {
        + _create_loader() Loader
        + prepare_table(full_table_name: str, schema: dict, primary_keys: Sequence~str~) void
        + validate_config() void
    }

    class UpsertStrategy {
        + _create_loader() Loader
        + prepare_table(full_table_name: str, schema: dict, primary_keys: Sequence~str~) void
        + validate_config() void
    }

    class Loader {
        <<abstract>>
        - Engine engine
        - dict schema
        - Sequence~str~ key_properties
        - Callable conform_name
        - Logger logger
        + load_records(full_table_name: str, schema: dict, records: Iterable~dict~) int | None
    }

    class SimpleInsertLoader {
        + load_records(full_table_name: str, schema: dict, records: Iterable~dict~) int | None
    }

    class TempTableUpsertLoader {
        - Callable temp_table_creator
        + load_records(full_table_name: str, schema: dict, records: Iterable~dict~) int | None
        + _create_temp_table_default(temp_table_name: str, schema: dict, engine: Engine) void
    }

    class MergeUpsertLoader {
        - Callable temp_table_creator
        - Callable merge_function
        + load_records(full_table_name: str, schema: dict, records: Iterable~dict~) int | None
        + _create_temp_table_default(temp_table_name: str, schema: dict, engine: Engine) void
    }

    SQLConnector o-- LoadMethodStrategy : _load_strategy
    SQLSink o-- LoadMethodStrategy : _load_strategy
    LoadMethodStrategy o-- Loader : loader
    LoadMethodStrategy --> SQLConnector : uses
    LoadMethodStrategy --> SQLSink : uses

    LoadMethodStrategy <|-- AppendOnlyStrategy
    LoadMethodStrategy <|-- OverwriteStrategy
    LoadMethodStrategy <|-- UpsertStrategy

    Loader <|-- SimpleInsertLoader
    Loader <|-- TempTableUpsertLoader
    Loader <|-- MergeUpsertLoader

    SQLSink --> SQLConnector : connector
    SQLConnector --> SQLSink : _create_load_strategy(sink)
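
A stripped-down version of the two hierarchies in the class diagram might look like this. It is a sketch under assumptions: the constructors omit the connector, sink, engine, and logger that the diagram lists, `InMemoryInsertLoader` and `AppendOnlySketch` are hypothetical names, and the no-op default for `validate_config()` is a guess.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from collections.abc import Iterable, Sequence


class Loader(ABC):
    """Owns DML: how records reach the table."""

    @abstractmethod
    def load_records(
        self, full_table_name: str, schema: dict, records: Iterable[dict]
    ) -> int | None: ...


class InMemoryInsertLoader(Loader):
    """Hypothetical loader that appends to a dict instead of running INSERTs."""

    def __init__(self) -> None:
        self.tables: dict[str, list[dict]] = {}

    def load_records(self, full_table_name, schema, records):
        rows = list(records)
        self.tables.setdefault(full_table_name, []).extend(rows)
        return len(rows)


class LoadMethodStrategy(ABC):
    """Owns DDL and validation; delegates DML to its loader."""

    def __init__(self) -> None:
        self.loader = self._create_loader()

    @abstractmethod
    def _create_loader(self) -> Loader: ...

    @abstractmethod
    def prepare_table(
        self, full_table_name: str, schema: dict, primary_keys: Sequence[str]
    ) -> None: ...

    def validate_config(self) -> None:
        """Assumed no-op default; subclasses may raise."""

    def load_batch(self, full_table_name, schema, records):
        return self.loader.load_records(full_table_name, schema, records)


class AppendOnlySketch(LoadMethodStrategy):
    """Hypothetical append-only strategy backed by the in-memory loader."""

    def _create_loader(self) -> Loader:
        return InMemoryInsertLoader()

    def prepare_table(self, full_table_name, schema, primary_keys) -> None:
        # Create the "table" if missing; never drop existing rows.
        self.loader.tables.setdefault(full_table_name, [])


strategy = AppendOnlySketch()
strategy.prepare_table("users", {}, ["id"])
loaded = strategy.load_batch("users", {}, [{"id": 1}, {"id": 2}])
```

Keeping `load_batch()` in the base class means a new strategy only has to decide how to prepare the table and which loader to build.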

Flow diagram for selecting load method strategy and upsert loader

flowchart TD
    A[Start: resolve load strategy] --> B[Read connector.config.load_method]
    B --> C{load_method value}
    C --> D[Use AppendOnlyStrategy]:::strategy_label
    C --> E[Use OverwriteStrategy]:::strategy_label
    C --> F[Use UpsertStrategy]:::strategy_label

    classDef strategy_label fill:#eef,stroke:#333,stroke-width:1px

    subgraph Strategy_factory_in_SQLConnector
        D --> G[Instantiate AppendOnlyStrategy with connector and sink]
        E --> H[Instantiate OverwriteStrategy with connector and sink]
        F --> I[Instantiate UpsertStrategy with connector and sink]
        G --> J[Return strategy to SQLSink]
        H --> J
        I --> J
    end

    J --> K[SQLSink.load_strategy caches strategy and calls validate_config]

    subgraph UpsertStrategy__create_loader
        I --> L{Custom merge_upsert_from_table implemented on sink?}
        L --> M[Yes: create temp_table_creator wrapper calling connector.create_empty_table with as_temp_table True]
        L --> N[No: create temp_table_creator wrapper calling connector.create_empty_table with as_temp_table True]
        M --> O[Create merge_function wrapper calling sink.merge_upsert_from_table]
        O --> P[Instantiate MergeUpsertLoader with engine, schema, key_properties, conform_name, logger, temp_table_creator, merge_function]
        N --> Q[Instantiate TempTableUpsertLoader with engine, schema, key_properties, conform_name, logger, temp_table_creator]
    end

    P --> R[Assign loader on UpsertStrategy]
    Q --> R
    R --> S[Strategy ready: load_batch delegates to loader.load_records]
    K --> S
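
The two decisions in the flowchart can be sketched as plain functions. Everything here is illustrative: the placeholder classes stand in for the real strategies, raising `ValueError` for an unknown `load_method` is an assumption, and detecting a custom `merge_upsert_from_table` by comparing against the base-class method is one possible approach, not necessarily the one the PR uses.

```python
from __future__ import annotations


class AppendOnlyPlaceholder: ...
class OverwritePlaceholder: ...
class UpsertPlaceholder: ...


# Config values come from the PR description; the classes are placeholders.
STRATEGIES: dict[str, type] = {
    "append-only": AppendOnlyPlaceholder,
    "overwrite": OverwritePlaceholder,
    "upsert": UpsertPlaceholder,
}


def select_strategy(config: dict) -> type:
    """Map config['load_method'] to a strategy class, defaulting to append-only."""
    load_method = config.get("load_method", "append-only")
    try:
        return STRATEGIES[load_method]
    except KeyError:
        raise ValueError(f"Unsupported load_method: {load_method!r}") from None


class BaseSink:
    """Hypothetical sink whose default merge method is unimplemented."""

    def merge_upsert_from_table(self, target_table_name, from_table_name, join_keys):
        raise NotImplementedError


class CustomMergeSink(BaseSink):
    """Hypothetical sink that supplies its own MERGE logic."""

    def merge_upsert_from_table(self, target_table_name, from_table_name, join_keys):
        return 0


def select_upsert_loader(sink: BaseSink) -> str:
    """Return the loader name the flowchart would pick for this sink."""
    overridden = (
        type(sink).merge_upsert_from_table is not BaseSink.merge_upsert_from_table
    )
    return "MergeUpsertLoader" if overridden else "TempTableUpsertLoader"


default_choice = select_strategy({})
upsert_choice = select_strategy({"load_method": "upsert"})
generic_loader = select_upsert_loader(BaseSink())
custom_loader = select_upsert_loader(CustomMergeSink())
```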

File-Level Changes

Change Details Files
Introduce LoadMethodStrategy abstraction and plug it into SQLConnector and SQLSink lifecycle.
  • Add LoadMethodStrategy base class plus concrete AppendOnlyStrategy, OverwriteStrategy, and UpsertStrategy implementations to orchestrate DDL, config validation, and loader selection.
  • Add SQLConnector._load_strategy attribute and a factory method _create_load_strategy() that maps load_method config values (append-only, upsert, overwrite) to concrete strategy classes and defaults to append-only with validation for unknown methods.
  • Update SQLConnector.prepare_table() to delegate to a configured load strategy when present, falling back to a new _prepare_table_legacy() method for backward compatibility and for temp-table scenarios.
  • Expose strategy-related classes from singer_sdk.sql.__init__ to support external customization and extension.
singer_sdk/sql/load_strategies.py
singer_sdk/sql/connector.py
singer_sdk/sql/__init__.py
Add loader layer to encapsulate DML behavior for inserts and upserts, decoupled from connectors/sinks/strategies.
  • Introduce an abstract Loader base plus SimpleInsertLoader, TempTableUpsertLoader, and MergeUpsertLoader implementations that handle bulk INSERT, temp-table-based upsert, and custom MERGE-based upsert respectively.
  • Implement generic temp-table creation, insert, delete+insert merge, and cleanup logic in TempTableUpsertLoader, with support for multi-column primary keys and pluggable temp table creators.
  • Implement MergeUpsertLoader to stage into temp tables then delegate to a caller-provided merge function (e.g., using database-specific MERGE/UPSERT SQL), enforcing that a merge function is supplied.
  • Provide constructor hooks for conform_name, custom loggers, and pluggable temp table creation functions so loaders can be reused and customized independently of strategies.
singer_sdk/sql/loaders.py
singer_sdk/sql/load_strategies.py
Wire SQLSink to use per-sink load strategies for setup and batch processing instead of direct connector calls.
  • Add a _load_strategy field and load_strategy property on SQLSink that lazily obtains and validates a strategy from its connector via _create_load_strategy().
  • Change SQLSink.setup() to initialize the connector’s _load_strategy and delegate table preparation to the strategy rather than directly calling connector.prepare_table().
  • Change SQLSink.process_batch() to delegate batch loading to LoadMethodStrategy.load_batch() instead of calling bulk_insert_records() with direct INSERT behavior.
singer_sdk/sql/sink.py
singer_sdk/sql/connector.py
Extend SQL test fixtures and add comprehensive tests for strategies, loaders, and factory behavior.
  • Expand DummySQLConnector test double to advertise overwrite and merge-upsert capabilities so new strategies can be exercised in tests.
  • Add a large test suite in test_load_strategies.py covering AppendOnlyStrategy, OverwriteStrategy, and UpsertStrategy table prep and load semantics, config validation failure modes, temp-table cleanup, loader SQL generation, factory selection, and custom merge_upsert_from_table integration.
  • Add targeted adjustments in existing sink tests to align with the new strategy-based setup path.
  • Introduce specialized DummyConnector/DummySink/DummyTarget test classes to isolate behavior and avoid dependencies on production connectors/targets.
tests/sql/test_sink.py
tests/sql/test_load_strategies.py
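
The temp-table upsert described in the loader changes above (stage, delete matching rows, insert, clean up) can be tried end to end with the standard library. This sketch uses `sqlite3` rather than the SQLAlchemy engine the loaders take, and `temp_table_upsert` is a hypothetical helper, so treat it as an illustration of the technique rather than the loader's implementation.

```python
import sqlite3


def temp_table_upsert(conn, table, key_properties, columns, records):
    """Upsert records into table via a staging temp table (delete + insert)."""
    # Identifiers are interpolated directly; this sketch assumes they are trusted.
    temp = f"temp_{table}"
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    # One equality per key column supports multi-column primary keys.
    match = " AND ".join(f"{table}.{k} = {temp}.{k}" for k in key_properties)
    with conn:  # commit on success, roll back on error
        conn.execute(
            f"CREATE TEMP TABLE {temp} AS SELECT {col_list} FROM {table} WHERE 0"
        )
        try:
            conn.executemany(
                f"INSERT INTO {temp} ({col_list}) VALUES ({placeholders})",
                [tuple(r[c] for c in columns) for r in records],
            )
            conn.execute(
                f"DELETE FROM {table} "
                f"WHERE EXISTS (SELECT 1 FROM {temp} WHERE {match})"
            )
            cur = conn.execute(
                f"INSERT INTO {table} ({col_list}) SELECT {col_list} FROM {temp}"
            )
            return cur.rowcount
        finally:
            # Always drop the staging table, even if a statement failed.
            conn.execute(f"DROP TABLE {temp}")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (tenant TEXT, id INTEGER, name TEXT, PRIMARY KEY (tenant, id))"
)
conn.execute("INSERT INTO users VALUES ('a', 1, 'old')")
conn.execute("INSERT INTO users VALUES ('b', 1, 'keep')")
conn.commit()

loaded = temp_table_upsert(
    conn,
    "users",
    ["tenant", "id"],
    ["tenant", "id", "name"],
    [
        {"tenant": "a", "id": 1, "name": "new"},
        {"tenant": "a", "id": 2, "name": "added"},
    ],
)
rows = conn.execute("SELECT tenant, id, name FROM users ORDER BY tenant, id").fetchall()
```

The row for `('a', 1)` is replaced, `('a', 2)` is added, and `('b', 1)` is untouched. A batch that repeats the same key would violate the primary key on insert here; deduplicating within a batch is left out of the sketch.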


@read-the-docs-community bot commented Dec 3, 2025

Documentation build overview

📚 Meltano SDK | 🛠️ Build #30560969 | 📁 Comparing 3c09e48 against latest (26b30fa)


🔍 Preview build

Show files changed (2 files in total): 📝 2 modified | ➕ 0 added | ➖ 0 deleted
File                                 Status
genindex.html                        📝 modified
classes/singer_sdk.sql.SQLSink.html  📝 modified

@codecov bot commented Dec 3, 2025

Codecov Report

❌ Patch coverage is 87.73234% with 33 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.42%. Comparing base (26b30fa) to head (3c09e48).
⚠️ Report is 11 commits behind head on main.

Files with missing lines           Patch %   Lines
singer_sdk/sql/loaders.py          84.90%    21 Missing and 3 partials ⚠️
singer_sdk/sql/connector.py        61.11%    7 Missing ⚠️
singer_sdk/sql/load_strategies.py  97.50%    1 Missing and 1 partial ⚠️

❌ Your patch check has failed because the patch coverage (87.73%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.
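
The figures in this report are mutually consistent if partial lines count as uncovered, which is an assumption here about how the percentage is computed. The short check below back-computes the patch size from the reported numbers; the 269-line total is inferred from them, not stated in the report.

```python
# Per-file counts copied from the table above (loaders, connector, load_strategies).
missing = 21 + 7 + 1
partials = 3 + 0 + 1
uncovered = missing + partials  # the "33 lines" in the headline

reported_patch_pct = 87.73234

# Inferred: total changed lines that would yield the reported percentage.
patch_lines = round(uncovered / (1 - reported_patch_pct / 100))
hits = patch_lines - uncovered
recomputed_pct = 100 * hits / patch_lines
```

Under that assumption, reaching the 100% patch target would mean covering all 33 of those lines, most of them in `singer_sdk/sql/loaders.py`.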

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3402      +/-   ##
==========================================
- Coverage   93.90%   93.42%   -0.48%     
==========================================
  Files          69       71       +2     
  Lines        5774     6041     +267     
  Branches      716      737      +21     
==========================================
+ Hits         5422     5644     +222     
- Misses        248      291      +43     
- Partials      104      106       +2     
Flag                 Coverage Δ
core                 81.74% <87.36%> (+0.01%) ⬆️
end-to-end           74.77% <45.72%> (-1.66%) ⬇️
optional-components  42.57% <22.67%> (-0.92%) ⬇️


@codspeed-hq bot commented Dec 3, 2025

CodSpeed Performance Report

Merging #3402 will not alter performance

Comparing load-method-strategy (3c09e48) with main (a876bda) [1]

Summary

✅ 8 untouched

Footnotes

  1. No successful run was found on main (26b30fa) during the generation of this report, so a876bda was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

@edgarrmondragon force-pushed the load-method-strategy branch 3 times, most recently from e13bf00 to 9c66c36 on December 4, 2025 04:23
Signed-off-by: Edgar Ramírez Mondragón <edgarrm358@gmail.com>
@edgarrmondragon self-assigned this Dec 10, 2025
@edgarrmondragon added this to the v0.54 milestone Dec 10, 2025
@edgarrmondragon added the SQL (Support for SQL taps and targets) and Type/Target (Singer targets) labels Dec 19, 2025