@thelightway24
Contributor

Description

This PR introduces a new recover method to the JobOperator interface and its implementations.
This method recovers a failed job execution based on its ID. This is useful for restarting a failed job from the point where it left off.

This PR also adds unit tests for the new recover method in JobOperator.

Changes

  • Add recover method to JobOperator interface.
  • Implement the recover method in SimpleJobOperator.
  • Add unit tests for JobOperator.recover in TaskExecutorJobOperatorTests.
  • Add recover method to CommandLineJobOperator.
  • Add unit tests for CommandLineJobOperator.recover in CommandLineJobOperatorTests.

Additional Notes

At SimpleJobOperator.java line 403, when a job execution is already recovered,
I currently throw an existing exception class.
I’m considering whether it would be better to:

  • Define a new exception class (e.g., AlreadyRecoveredException), similar to how abandon throws an exception, or
  • Skip throwing an exception and instead log the event while returning the JobExecution.

For now, I’ve chosen to throw an exception for consistency with abandon.
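
To make the exception-based option concrete, here is a minimal sketch. It is not the code in this PR: `Execution` is a hypothetical stand-in for the real JobExecution class, the `recovered` flag is an assumed way of remembering that recovery already ran, and `AlreadyRecoveredException` is only the name floated above (made unchecked here to keep the sketch short).

```java
public class ThrowingRecoverSketch {

    /** Hypothetical exception named after the suggestion above; it does not exist in the project. */
    static final class AlreadyRecoveredException extends RuntimeException {
        AlreadyRecoveredException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for a job execution, reduced to what the sketch needs. */
    static final class Execution {
        final long id;
        boolean recovered; // assumption: some marker records that recovery already ran

        Execution(long id) {
            this.id = id;
        }
    }

    /** Recovers the execution once; a second attempt is rejected with an exception. */
    static Execution recover(Execution execution) {
        if (execution.recovered) {
            throw new AlreadyRecoveredException(
                    "Job execution " + execution.id + " is already recovered");
        }
        execution.recovered = true;
        return execution;
    }

    public static void main(String[] args) {
        Execution execution = new Execution(1L);
        recover(execution);
        try {
            recover(execution); // second call hits the guard
        } catch (AlreadyRecoveredException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

With this shape, every caller has to decide how to handle a repeated recover call, which is the cost being weighed against simply logging.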

Related Issue

Resolves #4876

fmbenhassine added a commit that referenced this pull request Aug 19, 2025
- Log warnings instead of throwing exceptions when recovering
already recovered job executions
- Update tests
- Update Javadocs
@fmbenhassine
Contributor

This is awesome! Thank you for your contribution 🙏

I added a couple inline comments about some minor changes that I will take care of on merge.

> At SimpleJobOperator.java line 403, when a job execution is already recovered,
> I currently throw an existing exception class.
> I’m considering whether it would be better to:
> Define a new exception class (e.g., AlreadyRecoveredException), similar to how abandon throws an exception, or
> Skip throwing an exception and instead log the event while returning the JobExecution.
> For now, I’ve chosen to throw an exception for consistency with abandon

Unlike attempting to abandon a running execution, which is an error, attempting to recover an already recovered execution does not really require an exception, so logging a warning is enough. I will update the code and tests accordingly.
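
As a rough illustration of the log-and-return behavior described here, the sketch below uses `java.util.logging` and hypothetical types. `Execution` and its `recovered` flag are assumptions made for the example, not the real JobExecution class or the way the merged code detects an earlier recovery.

```java
import java.util.logging.Logger;

public class LoggingRecoverSketch {

    private static final Logger LOGGER = Logger.getLogger(LoggingRecoverSketch.class.getName());

    /** Hypothetical stand-in for a job execution, reduced to what the sketch needs. */
    static final class Execution {
        final long id;
        boolean recovered; // assumption: some marker records that recovery already ran

        Execution(long id) {
            this.id = id;
        }
    }

    /** Recovers the execution; a repeated call logs a warning and returns it unchanged. */
    static Execution recover(Execution execution) {
        if (execution.recovered) {
            LOGGER.warning("Job execution " + execution.id + " is already recovered");
            return execution;
        }
        execution.recovered = true;
        return execution;
    }

    public static void main(String[] args) {
        Execution execution = new Execution(42L);
        recover(execution);
        Execution again = recover(execution); // logs a warning, no exception
        System.out.println(again == execution);
    }
}
```

Under these assumptions the call is idempotent: a caller can invoke recover twice without a try/catch, and the warning still leaves a trace in the logs.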

fmbenhassine added a commit that referenced this pull request Aug 19, 2025
@injae-kim
Contributor

Nice work @thelightway24 !!
Thanks for your detailed review @fmbenhassine 👍👍👍

@thelightway24
Contributor Author

Thanks for the review and fix, really appreciate it.
I’ll be more thorough next time.

Development

Successfully merging this pull request may close these issues.

Add API to recover failed job executions