-
Notifications
You must be signed in to change notification settings - Fork 272
Description
Is your feature request related to a problem? Please describe.
Similar to how Redshift imports data using COPY (supported via Issue#164), UNLOAD is when Redshift exports data.
Using this SQL example from Redshift Docs:
unload ('select venueid, venuename, venueseats from venue')
to 's3://amzn-s3-demo-bucket/venue_tab_'
iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
delimiter as '\t';
(There are a bunch of options that can occur after the UNLOAD ({CTE}) TO s3://..., but I do not believe those would be relevant for Lineage)
Describe the solution you'd like
Statement #1: unload ('select venueid, venuename, venueseats from ve...
table read: [Table: <default>.venue]
table write: [Path: s3://amzn-s3-demo-bucket/]
table cte: []
table drop: []
table rename: []
==========
Summary:
Statements(#): 1
Source Tables:
<default>.venue
Target Tables:
s3://amzn-s3-demo-bucket/
Describe alternatives you've considered
N/A
Additional context
Unload's output files written is non-deterministic and is dependent on Threads, filesize, etc. The path is a prefix-path. I think cutting to the path parent folder would be the most straight forward approach.
From the docs:
https://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD_command_examples.html#unload-examples-venue
By default, UNLOAD writes one or more files per slice.
unload ('select * from venue')
to 's3://amzn-s3-demo-bucket/unload/venue_pipe_'
iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole';
The result is these four files in the unload folder, again assuming four slices.
venue_pipe_0000_part_00
venue_pipe_0001_part_00
venue_pipe_0002_part_00
venue_pipe_0003_part_00