@julienledem

Problem

The main lineage graph API focuses on individual jobs and is not easy to use when one wants coverage of all the children of a parent job.

Solution

This new endpoint returns one level of lineage (non-recursive) for all the children of a given parent job.
This helps, for example, when someone wants to retrieve all the lineage of a given Airflow DAG.
It returns all of the parent's children (tasks), all the datasets they consume or produce, and the other tasks and DAGs that produce or consume those datasets.

Example:
GET /api/v1/lineage/simple?nodeId=job:default:order_analysis

{
  "parent": {
    "namespace": "default",
    "name": "order_analysis"
  },
  "children": [
    {
      "job": {
        "namespace": "default",
        "name": "order_analysis.find_popular_products"
      },
      "inputs": [
        {
          "dataset": {
            "namespace": "postgres://host.docker.internal:5435",
            "name": "postgres.public.orders"
          },
          "consumers": null,
          "producers": [
            {
              "job": {
                "namespace": "default",
                "name": "order_analysis.import_orders"
              },
              "parent": {
                "namespace": "default",
                "name": "order_analysis"
              }
            }
          ]
        }
      ],
      "outputs": [
        ...
      ]
    },
    ...
  ]
}
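
For readers who want to consume this payload from Java, the shape above could be modeled roughly as follows. This is only a sketch: every type name here (`JobRef`, `DatasetRef`, `RelatedJob`, `DatasetLineage`, `ChildLineage`, `ParentLineage`) is hypothetical and is not taken from the PR's code. The `outputs` list is elided in the example, so it is left empty here rather than guessed.

```java
import java.util.List;

// Hypothetical model of the example response; none of these names come from Marquez.
class DirectLineageResponseSketch {
  record JobRef(String namespace, String name) {}

  record DatasetRef(String namespace, String name) {}

  // A job that touches a dataset, together with that job's parent (e.g. the DAG).
  record RelatedJob(JobRef job, JobRef parent) {}

  // consumers/producers may be null, as "consumers" is in the example payload.
  record DatasetLineage(DatasetRef dataset, List<RelatedJob> consumers, List<RelatedJob> producers) {}

  record ChildLineage(JobRef job, List<DatasetLineage> inputs, List<DatasetLineage> outputs) {}

  record ParentLineage(JobRef parent, List<ChildLineage> children) {}

  // Rebuilds the part of the example payload that is actually shown.
  static ParentLineage example() {
    JobRef parent = new JobRef("default", "order_analysis");
    DatasetLineage orders =
        new DatasetLineage(
            new DatasetRef("postgres://host.docker.internal:5435", "postgres.public.orders"),
            null,
            List.of(new RelatedJob(new JobRef("default", "order_analysis.import_orders"), parent)));
    ChildLineage child =
        new ChildLineage(
            new JobRef("default", "order_analysis.find_popular_products"),
            List.of(orders),
            List.of()); // outputs are elided in the example, so left empty here
    return new ParentLineage(parent, List.of(child));
  }

  public static void main(String[] args) {
    ParentLineage lineage = example();
    for (ChildLineage child : lineage.children()) {
      System.out.println(child.job().name() + " has " + child.inputs().size() + " input dataset(s)");
    }
  }
}
```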

Checklist

  • You've signed-off your work
  • Your changes are accompanied by tests (if relevant)
  • Your change contains a small diff and is self-contained
  • You've updated any relevant documentation (if relevant)
  • You've included a one-line summary of your change for the CHANGELOG.md (Depending on the change, this may not be necessary).
  • You've versioned your .sql database schema migration according to Flyway's naming convention (if relevant)
  • You've included a header in any source code files (if relevant)

@boring-cyborg boring-cyborg bot added the api API layer changes label Oct 23, 2023
netlify bot commented Oct 23, 2023

Deploy Preview for peppy-sprite-186812 ready!

Name Link
🔨 Latest commit e7af696
🔍 Latest deploy log https://app.netlify.com/sites/peppy-sprite-186812/deploys/6542e889e1c70e00082b2cc3
😎 Deploy Preview https://deploy-preview-2657--peppy-sprite-186812.netlify.app
}


lineageDao.getDirectLineageFromParent("foo", "bar");
Member Author

oops, I will clean up that test

Member Author

fixed

Signed-off-by: Julien Le Dem <[email protected]>
@codecov
codecov bot commented Oct 23, 2023

Codecov Report

❌ Patch coverage is 93.15068% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.08%. Comparing base (f6be002) to head (e7af696).
⚠️ Report is 264 commits behind head on main.

Files with missing lines | Patch % | Lines
...src/main/java/marquez/api/OpenLineageResource.java | 50.00% | 1 Missing and 1 partial ⚠️
.../src/main/java/marquez/service/LineageService.java | 95.00% | 0 Missing and 2 partials ⚠️
api/src/main/java/marquez/db/LineageDao.java | 50.00% | 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2657      +/-   ##
============================================
+ Coverage     83.35%   84.08%   +0.72%     
+ Complexity     1295     1080     -215     
============================================
  Files           244      203      -41     
  Lines          5948     5052     -896     
  Branches        279      244      -35     
============================================
- Hits           4958     4248     -710     
+ Misses          844      684     -160     
+ Partials        146      120      -26     

Signed-off-by: Julien Le Dem <[email protected]>
Signed-off-by: Julien Le Dem <[email protected]>
@pawel-big-lebowski (Collaborator) left a comment

I like the feature. I left some questions in the comments, as I would like to better understand why we need separate DAO methods to support this API call.

Signed-off-by: Julien Le Dem <[email protected]>
Signed-off-by: Julien Le Dem <[email protected]>
Signed-off-by: Julien Le Dem <[email protected]>
@pawel-big-lebowski (Collaborator) left a comment

Two minor comments added.
I think the SQL query and tests are already fine.


public record DirectLineageEdge(
JobId job1,
String direction,
Collaborator

Why not use the existing IOType enum? It took me some time to understand what direction is.
Would it make sense to replace job1, job2 with job, upstreamJob?
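
To illustrate the suggestion, a typed direction could look roughly like the sketch below. It is not the PR's code: the full field list of `DirectLineageEdge` is not shown in this conversation, so the `dataset` and `relatedJob` fields are assumptions, and the `IOType`, `JobId`, and `DatasetId` types defined here are local stand-ins rather than the real Marquez classes. The second job is given the neutral name `relatedJob` because the conversation does not confirm that it is always the upstream one.

```java
// Hypothetical sketch of the reviewer's suggestion; all types are local stand-ins.
class DirectLineageEdgeSketch {
  // Stand-in for the existing IOType enum mentioned in the review (values assumed).
  enum IOType {
    INPUT,
    OUTPUT
  }

  record JobId(String namespace, String name) {}

  record DatasetId(String namespace, String name) {}

  // An enum replaces the free-form String direction, so invalid values cannot be built
  // and the compiler checks that every case is handled.
  record DirectLineageEdge(JobId job, IOType ioType, DatasetId dataset, JobId relatedJob) {
    String describe() {
      String verb =
          switch (ioType) {
            case INPUT -> "reads";
            case OUTPUT -> "writes";
          };
      return job.name() + " " + verb + " " + dataset.name()
          + " (related job: " + relatedJob.name() + ")";
    }
  }

  public static void main(String[] args) {
    DirectLineageEdge edge =
        new DirectLineageEdge(
            new JobId("default", "order_analysis.find_popular_products"),
            IOType.INPUT,
            new DatasetId("postgres://host.docker.internal:5435", "postgres.public.orders"),
            new JobId("default", "order_analysis.import_orders"));
    System.out.println(edge.describe());
  }
}
```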

@Consumes(APPLICATION_JSON)
@Produces(APPLICATION_JSON)
@Path("/lineage/direct")
public Response getDirectLineage(@QueryParam("parentJobNodeId") @NotNull NodeId parentJobNodeId) {
Collaborator

Please remember to update openapi.yaml and the changelog.

@dkt-sophie-ly
dkt-sophie-ly commented Nov 27, 2024

Hi! Do you know if you will continue to work on this PR? This feature seems quite interesting.
I tried to update it, but it's not that easy given that this PR is a year old :/
Also, do you plan to add a front-end view for this?

Thanks for your reply!

Labels

api API layer changes client/java

4 participants