Conversation
anair123
left a comment
I left some notes on dlt_pipeline.py. Overall there are no problems, but a few tweaks would let users follow along with the course with less friction.
github_source = rest_api_source(config)

# pipeline = dlt.pipeline(
Let's uncomment this and put it under a main check (if __name__ == "__main__":). That way I (and the users) can run the pipeline locally.
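Something like the sketch below is what I have in mind. It is only an illustration, not the script's actual code: the repository URL, the "issues" resource, the dataset_name value, and the --run flag are all placeholders I made up. The dry-run default is also my own choice, so the file can be imported or executed without dlt credentials being set up.

```python
import sys


def pipeline_settings(destination="bigquery"):
    """Keyword arguments for dlt.pipeline().

    dataset_name is a hypothetical placeholder, not taken from the PR.
    """
    return {
        "pipeline_name": "github_repos_issues",
        "destination": destination,
        "dataset_name": "github_data",  # hypothetical
    }


def main(argv):
    settings = pipeline_settings()

    if "--run" not in argv:
        # Dry run (an assumption of this sketch): show the settings and stop,
        # so nothing touches the network or needs credentials.
        print(f"dry run, would create pipeline with {settings}")
        return 0

    # Imported lazily so importing this module never requires dlt itself.
    import dlt
    from dlt.sources.rest_api import rest_api_source

    # Hypothetical minimal config; the real script builds its own `config`.
    github_source = rest_api_source(
        {
            "client": {"base_url": "https://api.github.com/repos/dlt-hub/dlt/"},
            "resources": ["issues"],
        }
    )

    pipeline = dlt.pipeline(**settings)
    load_info = pipeline.run(github_source)
    print(load_info)
    return 0


if __name__ == "__main__":
    # Runs only when the file is executed directly, not when it is imported.
    main(sys.argv[1:])
```

With this layout, python dlt_pipeline.py prints the settings, and python dlt_pipeline.py --run performs the load. Importing the module from a notebook or a test does neither.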
@@ -0,0 +1,80 @@
import dlt
from dlt.sources.rest_api import RESTAPIConfig, rest_api_source
pendulum needs to be imported.
# pipeline = dlt.pipeline(
#     pipeline_name="github_repos_issues",
#     destination="duckdb",
We'll be using BigQuery as our destination instead of DuckDB.
@@ -0,0 +1,80 @@
import dlt
This isn't about the script itself, but I'm wondering whether there should also be a README with instructions for running it. We could give installation commands and link to the dlt docs where relevant.
The script looks and works great, but it might be hard for users to navigate without a guide.
Implemented backfill and incremental loading.