test(sdk): add processor test suite coverage for shutdown and timeout scenarios#3382
Open
bryantbiggs wants to merge 4 commits intoopen-telemetry:mainfrom
Open
test(sdk): add processor test suite coverage for shutdown and timeout scenarios#3382bryantbiggs wants to merge 4 commits intoopen-telemetry:mainfrom
bryantbiggs wants to merge 4 commits intoopen-telemetry:mainfrom
Conversation
The default thread-based processors (BatchSpanProcessor, BatchLogProcessor, PeriodicReader) call futures_executor::block_on() on their dedicated worker threads. When the exporter uses tonic/gRPC, the export future depends on tokio tasks (e.g. tonic's Buffer worker) that can only be polled by tokio worker threads. If all tokio worker threads are blocked (single-threaded runtime, or multi-thread with 1 worker), this creates a circular wait. Add BlockingStrategy that captures the tokio runtime handle at construction time and enters the runtime context via Handle::enter() before calling futures_executor::block_on(). This makes tokio types available on the dedicated background threads without taking ownership of the reactor. Falls back to plain futures_executor::block_on() without tokio. Fixes: open-telemetry#2802
Add tests with TokioSpawn*Exporter mocks that call tokio::spawn() inside export(), simulating tonic/gRPC exporters. These prove that BlockingStrategy correctly provides tokio runtime context on the processor's dedicated OS thread, preventing deadlocks on constrained multi_thread(1) runtimes (open-telemetry#2802, open-telemetry#3356).
…st suite Add coverage for processor test suite gaps identified in open-telemetry#3381: - Add shutdown() regression tests with TokioSpawn*Exporter mocks for BatchSpanProcessor, BatchLogProcessor, and PeriodicReader on multi_thread(1) runtime, verifying the same BlockingStrategy fix that resolved force_flush() deadlocks also works for shutdown(). - Add timeout behavior tests with HangingExporter mocks for BatchSpanProcessor and BatchLogProcessor, verifying that shutdown_with_timeout returns Err(Timeout) when exporters hang. - Improve documentation on SimpleLogProcessor's ignored deadlock tests, explaining they demonstrate inherent design limitations (not bugs) and linking to relevant issues (open-telemetry#2802, open-telemetry#3381). - Document current_thread runtime limitation across all processors and explain why PeriodicReader timeout test is omitted (hardcoded 5s timeout makes it too slow for regular test suite). Closes open-telemetry#3381
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #3382 +/- ##
======================================
Coverage 82.2% 82.3%
======================================
Files 128 128
Lines 24626 24825 +199
======================================
+ Hits 20267 20453 +186
- Misses 4359 4372 +13 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Addresses test coverage gaps identified in #3381:
Shutdown regression tests: Added
shutdown()tests withTokioSpawn*Exportermocks for all three processors (BatchSpanProcessor,BatchLogProcessor,PeriodicReader) onmulti_thread(worker_threads = 1)runtime — verifying theBlockingStrategyfix forforce_flush()deadlocks (OTLP MetricExporter deadlock issue #2802) also works forshutdown(), which goes through the same export code path.Timeout behavior tests: Added
HangingExportermocks forBatchSpanProcessorandBatchLogProcessorthat block forever inexport(), verifyingshutdown_with_timeout()returnsErr(Timeout)as expected.PeriodicReadertimeout test is omitted (documented why) due to its hardcoded 5-second timeout making it too slow for the regular test suite.SimpleLogProcessor documentation: Improved comments on the two
#[ignore]d async exporter deadlock tests, explaining they demonstrate inherent design limitations ofSimpleLogProcessor(not bugs), and linking to Processor test suite gaps: no coverage for tokio-dependent exporters or runtime deadlock scenarios #3381 and OTLP MetricExporter deadlock issue #2802. These deadlocks cannot be fixed without changingSimpleLogProcessor's synchronousblock_ondesign — users should useBatchLogProcessorfor production async exporters.current_thread limitation documentation: Documented across all three processor test files that
current_threadruntime with tokio-dependent exporters is a fundamental limitation, whilemulti_thread(1)(the 1-vCPU k8s pod scenario) is supported viaBlockingStrategy.Design notes
Why shutdown needs separate tests from force_flush: Both
shutdown()andforce_flush()funnel through the same export function (get_spans_and_export/get_logs_and_export/collect_and_export) which usesBlockingStrategyto properly enter the tokio runtime context. The shutdown path then additionally callsexporter.shutdown(). Without separate tests, a regression in the shutdown-specific code path could go undetected.Why
pending()instead ofNotify: The hanging exporter mocks usefutures_util::future::pending()rather thantokio::sync::Notify.pending()is simpler — it creates a future that never resolves, which is all we need to simulate a permanently hanging exporter. Since we're testing the caller's timeout behavior (not the exporter's ability to be cancelled), there's no need for the test to control when the hang resolves.PeriodicReader's timeout gap:
PeriodicReader::shutdown()has a hardcoded 5-second timeout that ignores the_timeoutparameter inshutdown_with_timeout()(marked with a TODO in the source). Until that's made configurable, a hanging exporter timeout test would take 5 seconds per run — too slow for CI. A comment documents this gap with an issue link.Dependencies
This PR is based on #3380 (BlockingStrategy fix) and should be merged after it.
Test plan
multi_thread(worker_threads = 1)Err(Timeout)is returned within expected duration (~500ms)