Skip to content

OptimisticLockingFailureException due to race condition in graceful shutdown #5217

@KMGeon

Description

@KMGeon

Bug Description

GracefulShutdownFunctionalTests.testStopJob fails intermittently due to a race condition (Flaky Test).

Affects Version

  • Spring Batch 6.0.1

Error Message

OptimisticLockingFailureException: Attempt to update step execution id=0
with wrong version (1), where current version is 2

Reproducibility

Note: This is a very rare flaky test. In my local environment (Mac), it occurred approximately 1 in 100 runs. This issue is highly timing-sensitive due to the race condition, so the failure rate varies depending on CPU load and thread scheduling.

I'm reporting this just in case it might be helpful. Feel free to close this issue if it's not considered a priority - I completely understand that rare flaky tests may not warrant immediate attention.

Failed Build Links

CI Failure:

04:48:43.282 [batch-executor1] INFO  - Job: [SimpleJob: [name=job]] launched
04:48:43.293 [batch-executor1] INFO  - Executing step: [chunkOrientedStep]
04:48:44.343 [main] INFO  - Stopping job execution: status=STARTED
04:48:44.345 [main] INFO  - Upgrading job execution status from STOPPING to STOPPED

org.springframework.dao.OptimisticLockingFailureException:
Attempt to update step execution id=0 with wrong version (1), where current version is 2
    at JdbcStepExecutionDao.updateStepExecution(JdbcStepExecutionDao.java:254)
    at SimpleJobRepository.update(SimpleJobRepository.java:174)
    at SimpleJobOperator.stop(SimpleJobOperator.java:375)
    at GracefulShutdownFunctionalTests.testStopJob(GracefulShutdownFunctionalTests.java:80)

Root Cause Analysis

1. Race Condition Scenario

[batch-executor1 Thread]              [main Thread]
          │                                 │
          │ Processing chunk                │
          │ update(stepExecution)           │
          │ version: 1 → 2                  │
          │                                 │
          │                                 │ stop() called
          │                                 │ Attempts update with version 1
          │                                 │ ❌ FAILS! (DB already has version 2)

2. Root Cause

In SimpleJobRepository.update(StepExecution), only isStopped() is checked, but isStopping() is not:

// SimpleJobRepository.java - line 166
if (latestStepExecution.getJobExecution().isStopped()) {  // ← Only checks STOPPED!
    stepExecution.setVersion(latestStepExecution.getVersion());
}

When stop() is called, the JobExecution status changes to STOPPING, but isStopped() only returns true for STOPPED status. Therefore, version synchronization does not occur in the STOPPING state.

Proposed Solution

// SimpleJobRepository.java - line 166

// Before
if (latestStepExecution.getJobExecution().isStopped()) {

// After
if (latestStepExecution.getJobExecution().isStopped()
    || latestStepExecution.getJobExecution().isStopping()) {

Adding the isStopping() check ensures version synchronization occurs during the STOPPING state, preventing the race condition.

Related Code

  • SimpleJobRepository.update(StepExecution)
  • SimpleJobOperator.stop()
  • JdbcStepExecutionDao.updateStepExecution()

Final Note

As mentioned above, this is an extremely rare issue that is difficult to reproduce consistently. I wanted to report it in case it provides useful information, but I fully understand if this is closed as low priority or "won't fix". Thank you for your time reviewing this!

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions