Describe the bug
`EndToEndRawSpanTest.testPipelineEndToEnd()` intermittently fails in the "Data Prepper Trace Analytics Raw Span Peer Forwarder End-to-end test with Gradle" CI workflow with the following error:
EndToEndRawSpanTest > testPipelineEndToEnd FAILED
java.lang.AssertionError at EndToEndRawSpanTest.java:122
Line 122 is `assertThat(foundEntrySet).containsAll(expectedDoc.entrySet())` inside `assertThatFoundDocumentsContainAllFieldsFromExpectedDocuments`, which is called within the `await().atMost(30, TimeUnit.SECONDS).untilAsserted()` block. This means that after 30 seconds of polling, at least one indexed document was still missing expected fields.
In the peer forwarder configuration, trace data is split across two Data Prepper nodes. Root spans and non-root spans for the same trace are sent to different nodes, so traceGroup fields must be propagated via the peer forwarder. If this propagation does not complete within the 30-second timeout, the indexed documents will be missing traceGroup fields, causing the assertion failure.
To Reproduce
This is difficult to reproduce locally. It has been observed in CI (GitHub Actions Ubuntu runner) under the `build (21)` and `build (11)` matrix jobs.
Recent failures on main:
Also observed on PR #6720:
Expected behavior
The test should pass consistently across all CI matrix jobs.
Additional context
The CI workflow (data-prepper-trace-analytics-raw-span-peer-forwarder-e2e-tests.yml) does not upload test result artifacts or run with --info, making it difficult to determine the exact assertion failure message (e.g., which fields were missing). Adding artifact upload and/or --info to the workflow would help diagnose future failures.
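Independently of the workflow change, the test itself could report which fields were missing. The following is only a sketch using the JDK alone, not the existing test code: the class name `MissingFieldReport` and its methods are hypothetical, and it assumes documents are compared as flat `Map<String, Object>` instances, as the `entrySet()` comparison on line 122 suggests.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical helper, not part of Data Prepper: lists the expected entries
// that a found document lacks, so a failure message can name them.
final class MissingFieldReport {

    private MissingFieldReport() {
    }

    // Returns every expected entry whose key is absent from `found`
    // or whose value differs from the one in `found`.
    static Map<String, Object> missingEntries(Map<String, Object> expected,
                                              Map<String, Object> found) {
        Map<String, Object> missing = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : expected.entrySet()) {
            String key = entry.getKey();
            boolean present = found.containsKey(key)
                    && Objects.equals(found.get(key), entry.getValue());
            if (!present) {
                missing.put(key, entry.getValue());
            }
        }
        return missing;
    }

    // Builds a human-readable message suitable for an assertion description.
    static String describe(Map<String, Object> expected, Map<String, Object> found) {
        Map<String, Object> missing = missingEntries(expected, found);
        if (missing.isEmpty()) {
            return "all expected fields present";
        }
        return "missing or mismatched fields: " + missing;
    }
}
```

If something like this were used, the string from `describe` could be attached to the existing assertion as its description, so that the missing fields (for example traceGroup) would appear in whatever test output the workflow retains.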