provide simple implementation of one-level lineage optimized for parent jobs #2657
…nt jobs Signed-off-by: Julien Le Dem <[email protected]>
lineageDao.getDirectLineageFromParent("foo", "bar");
oops, I will clean up that test
fixed
Signed-off-by: Julien Le Dem <[email protected]>
Codecov Report
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2657      +/-   ##
============================================
+ Coverage     83.35%   84.08%   +0.72%
+ Complexity     1295     1080     -215
============================================
  Files           244      203      -41
  Lines          5948     5052     -896
  Branches        279      244      -35
============================================
- Hits           4958     4248     -710
+ Misses          844      684     -160
+ Partials        146      120      -26
View full report in Codecov by Sentry.
Signed-off-by: Julien Le Dem <[email protected]>
Signed-off-by: Julien Le Dem <[email protected]>
pawel-big-lebowski left a comment
I like the feature. I left some questions in the comments, as I would like to better understand why we need separate DAO methods to support this API call.
api/src/main/java/marquez/db/mappers/SimpleLineageEdgeMapper.java
…ith name Signed-off-by: Julien Le Dem <[email protected]>
Signed-off-by: Julien Le Dem <[email protected]>
Signed-off-by: Julien Le Dem <[email protected]>
Signed-off-by: Julien Le Dem <[email protected]>
1354701 to 53ff595
Signed-off-by: Julien Le Dem <[email protected]>
pawel-big-lebowski left a comment
Two minor comments added.
I think the SQL query and tests are already fine.
public record DirectLineageEdge(
    JobId job1,
    String direction,
Why not use the existing IOType enum? It took me some time to understand what direction is.
Would it make sense to replace job1, job2 with job, upstreamJob?
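To make the suggestion above concrete, here is a minimal sketch of what a typed direction and the proposed field names could look like. It is illustrative only: `IoDirection` and `JobRef` are hypothetical stand-ins for the project's `IOType` and `JobId` (whose definitions are not shown in this thread), and the remaining fields of `DirectLineageEdge` are omitted because the diff above does not show them.

```java
class DirectLineageEdgeSketch {
  // Hypothetical stand-in for the existing IOType enum mentioned in the review.
  enum IoDirection { INPUT, OUTPUT }

  // Hypothetical stand-in for JobId: a namespace plus a job name.
  record JobRef(String namespace, String name) {}

  // Renamed fields as proposed in the review (job1 -> job, job2 -> upstreamJob).
  // Other fields of the real record are not shown in the diff and are left out.
  record Edge(JobRef job, IoDirection direction, JobRef upstreamJob) {}

  public static void main(String[] args) {
    Edge edge =
        new Edge(
            new JobRef("default", "order_analysis.report"),
            IoDirection.INPUT,
            new JobRef("default", "order_analysis.load"));
    // An enum value cannot be misspelled the way a free-form String can.
    System.out.println(edge.direction());
  }
}
```

With an enum, an invalid direction is rejected at compile time rather than surfacing later as an unexpected string.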
@Consumes(APPLICATION_JSON)
@Produces(APPLICATION_JSON)
@Path("/lineage/direct")
public Response getDirectLineage(@QueryParam("parentJobNodeId") @NotNull NodeId parentJobNodeId) {
Please remember to update openapi.yaml and the changelog.
Hi! Do you know if you will continue to work on this PR? This feature seems quite interesting. Thanks for your reply!
Problem
The main lineage graph API focuses on individual jobs and is not easy to use when one wants coverage of all the children of a parent job.
Solution
This new endpoint provides one non-recursive level of lineage for all the children of a given parent job.
This helps, for example, when someone wants to retrieve all the lineage of a given Airflow DAG.
It returns all of the parent's children (tasks) and all the datasets they consume or produce, as well as the other tasks and DAGs producing and consuming those datasets.
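The one-level behavior described above can be sketched in memory as follows. This is not the SQL query from the PR: the `Job` and `Edge` records and the `directLineage` method are hypothetical names chosen for illustration, and the model (each job has a parent name plus input and output dataset names) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class OneLevelLineageSketch {
  // Hypothetical in-memory job model; not the project's actual schema.
  record Job(String name, String parent, Set<String> inputs, Set<String> outputs) {}

  // One edge: a job reads (INPUT) or writes (OUTPUT) a dataset.
  record Edge(String job, String direction, String dataset) {}

  static List<Edge> directLineage(List<Job> jobs, String parent) {
    // Step 1: collect every dataset consumed or produced by the parent's children.
    Set<String> datasets = new TreeSet<>();
    for (Job job : jobs) {
      if (parent.equals(job.parent())) {
        datasets.addAll(job.inputs());
        datasets.addAll(job.outputs());
      }
    }
    // Step 2: keep every edge, from any job, that touches one of those datasets.
    // There is no recursion: datasets reached only through other jobs are ignored.
    List<Edge> edges = new ArrayList<>();
    for (Job job : jobs) {
      for (String dataset : job.inputs()) {
        if (datasets.contains(dataset)) {
          edges.add(new Edge(job.name(), "INPUT", dataset));
        }
      }
      for (String dataset : job.outputs()) {
        if (datasets.contains(dataset)) {
          edges.add(new Edge(job.name(), "OUTPUT", dataset));
        }
      }
    }
    return edges;
  }

  public static void main(String[] args) {
    List<Job> jobs =
        List.of(
            new Job("etl.load", "etl", Set.of("raw"), Set.of("orders")),
            new Job("report.build", "report", Set.of("orders"), Set.of("summary")),
            new Job("dash.render", "dash", Set.of("summary"), Set.of()));
    // Lineage for parent "etl": its child's datasets plus the other job reading "orders".
    directLineage(jobs, "etl").forEach(System.out::println);
  }
}
```

In this sketch, `report.build` appears only through its read of `orders`; its own output `summary` and the job consuming it are a second hop and are left out. A child with no datasets produces no edges here, which a real implementation would likely handle separately.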
Example:
GET /api/v1/lineage/simple?nodeId=job:default:order_analysis
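A small sketch of building that request URI from Java, using only the standard library. The base URL `http://localhost:5000` and the helper name `lineageUri` are assumptions for illustration. The path and query parameter are taken from the example above; the reviewed diff shows `/lineage/direct` with `parentJobNodeId`, so the final names may differ.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class LineageRequestSketch {
  // Builds the request URI for the example in the PR description.
  // The path and parameter name are copied from that example and may change.
  static URI lineageUri(String baseUrl, String nodeId) {
    // Node IDs contain ':' separators, so encode the value before appending it.
    String encoded = URLEncoder.encode(nodeId, StandardCharsets.UTF_8);
    return URI.create(baseUrl + "/api/v1/lineage/simple?nodeId=" + encoded);
  }

  public static void main(String[] args) {
    // Assumed local base URL; adjust to wherever the API is actually served.
    System.out.println(lineageUri("http://localhost:5000", "job:default:order_analysis"));
  }
}
```

The resulting URI can then be passed to any HTTP client, for example `java.net.http.HttpClient`, to issue the GET request.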
Checklist
CHANGELOG.md (Depending on the change, this may not be necessary).
.sql database schema migration according to Flyway's naming convention (if relevant)