
[Feature][Core] Support monitoring the busyness level of Zeta engine tasks #10463

@zhangshenghang


Search before asking

  • I searched the existing feature requests and found no similar requirement.

Description

Background

Currently, when a job is running, it is hard to tell quickly where it is slowing down, where data is congested, or whether it is being held back by downstream stages. Troubleshooting often relies on logs or experience alone.

What needs to be achieved

Provide real-time runtime metrics and trends on the job running page that show three kinds of information:

  1. Whether a link is being slowed down by downstream stages (whether the upstream sender often has to wait)
  2. Whether the intermediate buffer is congested (whether it is nearly full, and whether it fills up repeatedly)
  3. Whether the operators in each stage are slowing down (busy/idle ratio for reading, processing, and writing, and time spent per record)
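
As a rough illustration of how these three signals could be combined into a diagnosis, here is a minimal Python sketch. It is not the implementation proposed in this issue: the function name `diagnose`, its parameters, and the `0.8` threshold are all assumptions made for the example.

```python
def diagnose(downstream_wait_ratio, buffer_occupancy, busy_ratio, threshold=0.8):
    """Map the three signals (each a ratio in [0, 1]) to findings.

    All names and the default threshold are hypothetical, chosen only
    to illustrate the idea described above.
    """
    findings = []
    # 1. Upstream spends most of its time waiting for downstream to accept data.
    if downstream_wait_ratio >= threshold:
        findings.append("slowed down by downstream")
    # 2. The intermediate buffer is nearly full.
    if buffer_occupancy >= threshold:
        findings.append("intermediate buffer congested")
    # 3. The operator itself is busy most of the time.
    if busy_ratio >= threshold:
        findings.append("operator is slow")
    return findings or ["no bottleneck detected"]


if __name__ == "__main__":
    print(diagnose(0.9, 0.95, 0.2))
```

In practice a single threshold is probably too coarse, which is why the issue asks for trends over time rather than a single snapshot.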

Which indicators to look at

  • Link side: proportion of time spent waiting on downstream; buffer occupancy (occupancy ratio plus occupied amount/capacity); auxiliary information such as wait duration
  • Node side:
      • Source: read ratio, idle ratio
      • Transform: busy ratio, average processing time per record, input volume/output volume
      • Sink: busy ratio, average write time per record, write volume, and time spent in commit-related stages
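
To make the indicators concrete, the following Python sketch shows one way such ratios could be derived from raw counters sampled over a time window. It is only an illustration under assumed inputs: the `StageSample` fields and all function names are hypothetical and do not reflect the actual Zeta engine metrics or API.

```python
from dataclasses import dataclass


@dataclass
class StageSample:
    """Hypothetical counters for one operator over one sampling window."""
    window_ms: float   # length of the sampling window
    busy_ms: float     # time spent reading/processing/writing
    wait_ms: float     # time blocked waiting for downstream to accept data
    records_in: int    # records received in the window


def busy_ratio(s: StageSample) -> float:
    return s.busy_ms / s.window_ms if s.window_ms > 0 else 0.0


def downstream_wait_ratio(s: StageSample) -> float:
    return s.wait_ms / s.window_ms if s.window_ms > 0 else 0.0


def idle_ratio(s: StageSample) -> float:
    # Whatever is neither busy nor waiting is counted as idle.
    return max(0.0, 1.0 - busy_ratio(s) - downstream_wait_ratio(s))


def avg_time_per_record_ms(s: StageSample) -> float:
    return s.busy_ms / s.records_in if s.records_in > 0 else 0.0


def buffer_occupancy(used: int, capacity: int) -> float:
    return used / capacity if capacity > 0 else 0.0


if __name__ == "__main__":
    sample = StageSample(window_ms=1000, busy_ms=600, wait_ms=300, records_in=1200)
    print(busy_ratio(sample), downstream_wait_ratio(sample), avg_time_per_record_ms(sample))
    print(buffer_occupancy(768, 1024))
```

Treating the remainder of the window as idle time is a simplifying assumption; a real implementation may measure idle time directly.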

Screenshot of the effect

  • The feature is already implemented, and I will contribute it in version 3.0. Below are several example screenshots showing how it pinpoints exactly where a job gets stuck.
  • The metrics will also be viewable via an API.
[Images: three example screenshots]

Usage Scenario

No response

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
