Workflow engine graceful shutdown #5463
Marking the PR as WIP until the dependent MR #5396 is merged.
Some work is still needed to make the mechanism more robust. When a workflow engine starts up, does it know to resume workflows that were paused due to a shutdown?
Added a feature to resume them.
@khushboobhatia01 The implementation here needs some more rework.
Here is some background on the workflow engine that may help you:
- a workflow execution can be processed by any workflow engine
- the `process` function that handles workflow execution or action execution messages is short-lived
- the workflow engine is multi-threaded, and many `process` functions could be running at any given time
- if the workflow engine needs to pause workflows because no other workflow engines are running, then it should query the database for the list of running workflows
Given the above,
- For each workflow engine, we just need to track how many messages it is still processing. We don't need to track the specific workflow execution or task execution. We just need to increment a counter when invoking `process` and decrement the same counter when exiting `process`. On shutdown, if the counter is > 0, the workflow engine is still processing message(s).
- Incrementing or decrementing the counter must be thread-safe, so acquire a thread lock before updating the counter.
- On shutdown, the workflow engine should initiate shutdown and stop receiving any more messages. Then it waits until the counter reaches zero.
- After the workflow engine has stopped processing, if no other workflow engine is running, acquire a distributed lock (use the coordinator), query the database for any active/running workflow executions, and pause those workflows. The workflow engine's startup sequence should use the same distributed lock when trying to resume paused workflows. This prevents workflow engines from stepping over each other.
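A minimal Python sketch of the counter-based shutdown outlined above, for illustration only. All names (`InflightCounter`, `EngineSketch`, `other_engines_running`) are hypothetical, a plain dict stands in for the database, and a `threading.Lock` stands in for the coordinator-backed distributed lock; this is not st2's actual API.

```python
import threading


class InflightCounter:
    """Thread-safe count of messages currently inside `process`."""

    def __init__(self):
        self._count = 0
        self._closed = False
        # A Condition wraps a lock and also lets shutdown sleep until the count hits zero.
        self._cond = threading.Condition()

    def try_enter(self):
        """Increment unless shutdown has started; check and increment share one lock."""
        with self._cond:
            if self._closed:
                return False
            self._count += 1
            return True

    def leave(self):
        with self._cond:
            self._count -= 1
            if self._count == 0:
                self._cond.notify_all()

    @property
    def value(self):
        with self._cond:
            return self._count

    def close_and_wait(self, timeout=None):
        """Refuse new work, then block until in-flight work drains (False on timeout)."""
        with self._cond:
            self._closed = True
            return self._cond.wait_for(lambda: self._count == 0, timeout=timeout)


class EngineSketch:
    """Hypothetical engine: `store` (workflow id -> status) stands in for the
    database, `dist_lock` for the distributed lock."""

    def __init__(self, store, dist_lock, other_engines_running):
        self.counter = InflightCounter()
        self.store = store
        self.dist_lock = dist_lock
        self.other_engines_running = other_engines_running  # hypothetical callable

    def process(self, message, handler):
        if not self.counter.try_enter():
            return False  # shutting down: leave the message for another engine
        try:
            handler(message)
        finally:
            self.counter.leave()  # always decrement, even if the handler raises
        return True

    def shutdown(self, timeout=None):
        # Stop taking new messages, then wait for in-flight ones to finish.
        self.counter.close_and_wait(timeout)
        if self.other_engines_running():
            return []  # another engine can keep processing these workflows
        # Last engine standing: pause running workflows under the distributed lock.
        paused = []
        with self.dist_lock:
            for wf_id, status in list(self.store.items()):
                if status == "running":
                    self.store[wf_id] = "paused"  # hypothetical pause request
                    paused.append(wf_id)
        return paused
```

Using a `Condition` lets shutdown block until the count reaches zero instead of busy-polling, and doing the "closed" check and the increment under the same lock avoids a message slipping in after shutdown has started.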
Please review and let me know if you need more clarification.
Branch updated from cbc0e56 to d65f198.
@m4dcoder I've updated the MR with the new implementation. One issue with the above flow is that during the last engine's shutdown, workflows that have active tasks may only transition to the `pausing` state instead of `paused`. To address this, I've added a delay (https://github.com/StackStorm/st2/pull/5463/files#diff-e4e1e16f3a5aefdf069b0af9bd510c1fde29a1b8df10796b7e74cb183820973cR79) in the engine's startup resume sequence so that workflows stuck in `pausing` can transition to `paused` before we start resuming. Please review.
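The startup side can be sketched in the same spirit. The PR adds a delay before resuming; this variant swaps in a bounded poll that waits until no workflow is left in `pausing` (or a timeout expires), then resumes `paused` workflows while holding the same lock used at shutdown. `resume_paused_workflows`, `settle_timeout`, and `poll_interval` are hypothetical names, and the dict and `threading.Lock` stand in for the database and the coordinator lock; this is not st2 code.

```python
import threading
import time


def resume_paused_workflows(store, dist_lock, settle_timeout=5.0, poll_interval=0.1):
    """Hypothetical startup hook: let `pausing` workflows settle into `paused`,
    then resume every `paused` workflow while holding the distributed lock."""
    with dist_lock:
        deadline = time.monotonic() + settle_timeout
        # Poll instead of sleeping a fixed amount, so startup is not delayed longer than needed.
        while any(status == "pausing" for status in list(store.values())):
            if time.monotonic() >= deadline:
                break  # stragglers stay in `pausing`; a later pass could pick them up
            time.sleep(poll_interval)
        resumed = []
        for wf_id, status in list(store.items()):
            if status == "paused":
                store[wf_id] = "running"  # hypothetical resume request
                resumed.append(wf_id)
        return resumed
```

Holding the lock for the whole wait-then-resume sequence means a second engine starting at the same time blocks instead of resuming the same workflows twice.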
This is going in the right direction. Thanks!
Additional follow-up. We are almost there. Great job!
@khushboobhatia01 Great job! Thanks for the contribution here. This is a significant improvement.
Awesome! I'm looking forward to graceful shutdowns of the workflow engine!
Branch updated from c198e79 to dea5f1e.
This PR ensures graceful shutdown of the workflow engine.
Please refer to #5373 for more details.