Skip to content

Conversation

@adoroszlai
Copy link
Contributor

What changes were proposed in this pull request?

Ozone build configures 20 minutes fork timeout:

ozone/pom.xml

Line 286 in 73e6f90

<surefire.fork.timeout>1200</surefire.fork.timeout>

but it's not applied, because the facility to check the fork is disabled by default:

Since 3.0.0-M4 the process checkers are disabled (source)

So instead of killing the fork after 20 minutes, any runaway test is caught only by the Github job timeout, 2.5 hours for integration tests. The problem is even worse for the flaky-test-check, which gives more time (6 hours) to jobs.

This change enables the process checkers.

However, current version of Surefire doesn't seem to kill the fork after timeout even with process checkers enabled. Thus we need to downgrade to 3.0.0-M5.

https://issues.apache.org/jira/browse/HDDS-10174

How was this patch tested?

Tested with a simple repro:

package com.example;

import org.junit.jupiter.api.Test;

class ForkTest {
  @Test
  void sleeps() throws Exception {
    Thread.sleep(300_000);
  }
}

and this command, using various versions of Surefire (including latest 3.2.5):

$ mvn -B clean test -Dsurefire.timeout=60 -Dmaven-surefire-plugin.version="${v}"

3.0.0-M6 and newer run for 5 minutes:

12:37:58,056 [INFO] --- maven-surefire-plugin:3.0.0-M6:test (default-test) @ surefire-fork-timeout ---
12:37:58,164 [INFO] Using auto detected provider org.apache.maven.surefire.junitplatform.JUnitPlatformProvider
12:37:58,746 [INFO] 
12:37:58,747 [INFO] -------------------------------------------------------
12:37:58,747 [INFO]  T E S T S
12:37:58,747 [INFO] -------------------------------------------------------
12:37:59,084 [INFO] Running com.example.ForkTest
12:42:59,111 [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 300.021 s - in com.example.ForkTest
12:42:59,120 [INFO] 
12:42:59,120 [INFO] Results:
12:42:59,120 [INFO] 
12:42:59,120 [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
12:42:59,120 [INFO] 
12:42:59,122 [INFO] ------------------------------------------------------------------------
12:42:59,122 [INFO] BUILD FAILURE
12:42:59,122 [INFO] ------------------------------------------------------------------------
12:42:59,123 [INFO] Total time:  05:01 min
12:42:59,123 [INFO] Finished at: 2024-01-23T12:42:59+01:00
12:42:59,123 [INFO] ------------------------------------------------------------------------
12:42:59,123 [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M6:test (default-test) on project surefire-fork-timeout: There was a timeout in the fork -> [Help 1]

3.0.0-M5 and earlier timeout after 1 minute as requested:

12:36:56,575 [INFO] --- maven-surefire-plugin:3.0.0-M5:test (default-test) @ surefire-fork-timeout ---
12:36:56,781 [INFO] 
12:36:56,781 [INFO] -------------------------------------------------------
12:36:56,781 [INFO]  T E S T S
12:36:56,781 [INFO] -------------------------------------------------------
12:36:57,094 [INFO] Running com.example.ForkTest
12:37:57,146 [INFO] 
12:37:57,146 [INFO] Results:
12:37:57,146 [INFO] 
12:37:57,147 [INFO] Tests run: 0, Failures: 0, Errors: 0, Skipped: 0
12:37:57,147 [INFO] 
12:37:57,150 [INFO] ------------------------------------------------------------------------
12:37:57,150 [INFO] BUILD FAILURE
12:37:57,150 [INFO] ------------------------------------------------------------------------
12:37:57,151 [INFO] Total time:  01:00 min
12:37:57,151 [INFO] Finished at: 2024-01-23T12:37:57+01:00
12:37:57,151 [INFO] ------------------------------------------------------------------------
12:37:57,152 [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M5:test (default-test) on project surefire-fork-timeout: There was a timeout in the fork -> [Help 1]

CI:
https://github.com/adoroszlai/ozone/actions/runs/7625431710

This reverts commit a41dae4.

Reason for revert: new Surefire fails to kill fork after surefire.timeout
@adoroszlai adoroszlai added the CI label Jan 23, 2024
@adoroszlai adoroszlai self-assigned this Jan 23, 2024
@adoroszlai adoroszlai requested a review from hemantk-12 January 23, 2024 16:59
Copy link
Member

@aswinshakil aswinshakil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM+1. Great find. Thanks for the patch @adoroszlai

Copy link
Contributor

@hemantk-12 hemantk-12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch @adoroszlai

LGTM.

@adoroszlai adoroszlai merged commit 8835ccf into apache:master Jan 24, 2024
@adoroszlai adoroszlai deleted the HDDS-10174-downgrade branch January 24, 2024 09:42
@adoroszlai
Copy link
Contributor Author

Thanks @aswinshakil, @hemantk-12 for the review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants