Skip to content

Support Redshift Unload Command #700

@ryanschulz46

Description

@ryanschulz46

Is your feature request related to a problem? Please describe.
Similar to how Redshift imports data using COPY (supported via Issue#164), UNLOAD is when Redshift exports data.

Using this SQL example from Redshift Docs:

unload ('select venueid, venuename, venueseats from venue')
to 's3://amzn-s3-demo-bucket/venue_tab_'
iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
delimiter as '\t';

(There are a bunch of options that can occur after the UNLOAD ({CTE}) TO s3://..., but I do not believe those would be relevant for Lineage)

Describe the solution you'd like

Statement #1: unload ('select venueid, venuename, venueseats from ve...
    table read: [Table: <default>.venue]
    table write: [Path: s3://amzn-s3-demo-bucket/]
    table cte: []
    table drop: []
    table rename: []
==========
Summary:
Statements(#): 1
Source Tables:
    <default>.venue
Target Tables:
    s3://amzn-s3-demo-bucket/

Describe alternatives you've considered
N/A

Additional context
Unload's output files written is non-deterministic and is dependent on Threads, filesize, etc. The path is a prefix-path. I think cutting to the path parent folder would be the most straight forward approach.

From the docs:
https://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD_command_examples.html#unload-examples-venue

By default, UNLOAD writes one or more files per slice.

unload ('select * from venue')
to 's3://amzn-s3-demo-bucket/unload/venue_pipe_'
iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole';

The result is these four files in the unload folder, again assuming four slices.

venue_pipe_0000_part_00
venue_pipe_0001_part_00
venue_pipe_0002_part_00
venue_pipe_0003_part_00

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions