Search before asking
- I have searched the existing feature requests and found no similar one.
Description
Background
Currently, while a job is running, it is hard to tell quickly where it is slowing down, where data is congested, or whether it is being held back by downstream processing. Troubleshooting usually relies on logs or experience alone.
What needs to be achieved
Provide real-time runtime metrics and trends on the job running page that visualize three kinds of information:
- Whether the link is being slowed down by downstream processes (whether upstream sending often needs to wait)
- Whether the intermediate buffer is congested (whether it is nearly full, whether it is repeatedly full)
- Whether operators in each stage are slowing down (busyness/idleness of reading/processing/writing and time consumption per unit of data)
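
To illustrate how the three kinds of information could be read together, here is a minimal sketch. It is not the contributed implementation: the `StageSnapshot` fields, the `diagnose` function, and the three thresholds are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds, chosen only for illustration.
WAIT_RATIO_HIGH = 0.5
BUFFER_RATIO_HIGH = 0.9
BUSY_RATIO_HIGH = 0.8


@dataclass
class StageSnapshot:
    """One sampling window for a stage and its outgoing link (assumed field names)."""
    name: str
    downstream_wait_ratio: float   # share of the window the sender spent waiting on downstream
    buffer_occupancy_ratio: float  # occupied / capacity of the intermediate buffer
    busy_ratio: float              # share of the window the operator spent working


def diagnose(s: StageSnapshot) -> List[str]:
    """Map the three readings onto the three questions listed above."""
    findings = []
    if s.downstream_wait_ratio >= WAIT_RATIO_HIGH:
        findings.append("held back by downstream")
    if s.buffer_occupancy_ratio >= BUFFER_RATIO_HIGH:
        findings.append("buffer congested")
    if s.busy_ratio >= BUSY_RATIO_HIGH:
        findings.append("operator is slow")
    return findings


if __name__ == "__main__":
    print(diagnose(StageSnapshot("transform-1", 0.7, 0.95, 0.2)))
```

A stage whose sender waits often and whose buffer is nearly full, but whose own busy ratio is low, is probably not the bottleneck itself; the slowdown is more likely further downstream.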
Which indicators to look at
- Link side: proportion of downstream waiting time; buffer occupancy (occupancy ratio + occupancy/capacity); auxiliary information such as waiting duration
- Node side:
  - Source: reading ratio, idle ratio
  - Transform: busy ratio, average processing time (per record), input volume/output volume
  - Sink: busy ratio, average writing time (per record), writing volume, and time consumption of submission-related stages
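
As a rough illustration of how these indicators could be derived from raw counters over one sampling window, here is a minimal sketch. The counter names (`window_ns`, `wait_ns`, `busy_ns`, and so on) and both helper functions are assumptions made for the example, not the names used in the actual implementation.

```python
from dataclasses import dataclass


def ratio(part: float, whole: float) -> float:
    """Return part / whole, or 0.0 when the denominator is zero."""
    return part / whole if whole > 0 else 0.0


@dataclass
class LinkCounters:
    """Hypothetical raw counters for one link during a sampling window."""
    window_ns: int        # length of the sampling window
    wait_ns: int          # time the upstream sender spent waiting on downstream
    buffer_used: int      # entries currently held in the intermediate buffer
    buffer_capacity: int  # maximum entries the buffer can hold


@dataclass
class NodeCounters:
    """Hypothetical raw counters for one source, transform, or sink."""
    window_ns: int  # length of the sampling window
    busy_ns: int    # time spent reading, processing, or writing
    records: int    # records handled during the window


def link_metrics(c: LinkCounters) -> dict:
    return {
        "downstream_wait_ratio": ratio(c.wait_ns, c.window_ns),
        "buffer_occupancy_ratio": ratio(c.buffer_used, c.buffer_capacity),
        "buffer_occupancy": f"{c.buffer_used}/{c.buffer_capacity}",
    }


def node_metrics(c: NodeCounters) -> dict:
    busy = ratio(c.busy_ns, c.window_ns)
    return {
        "busy_ratio": busy,
        "idle_ratio": 1.0 - busy,
        "avg_time_per_record_ns": ratio(c.busy_ns, c.records),
    }


if __name__ == "__main__":
    print(link_metrics(LinkCounters(1_000_000_000, 250_000_000, 900, 1000)))
    print(node_metrics(NodeCounters(1_000_000_000, 800_000_000, 4000)))
```

Reporting both the occupancy ratio and the raw occupied/capacity pair keeps the figure readable for small and large buffers alike, and dividing busy time by record count gives the per-record cost without timing each record individually.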
Screenshot of the effect
- The feature is already implemented, and I plan to contribute it in version 3.0. The example diagrams below show how it pinpoints exactly where a job gets stuck.
- The metrics will also be viewable via API.
Usage Scenario
No response
Related issues
No response
Are you willing to submit a PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct