Switch to OpenTracing's `ContextVarsScopeManager` #18849
Merged
MadLittleMods merged 4 commits into develop on Aug 27, 2025
Conversation
Instead of our own `LogContextScopeManager`
MadLittleMods commented on Aug 20, 2025
`class LogContextScopeManagerTestCase(TestCase):` → `class TracingScopeTestCase(TestCase):`
Contributor
Author
Even though we've removed the `LogContextScopeManager` now, I've decided to keep these tests around. They appear to be good sanity checks to make sure our tracing still works with Twisted regardless of what we try to use.
Force-pushed from a9d7008 to b1f308c
MadLittleMods added a commit that referenced this pull request on Aug 20, 2025
These changes are being introduced in #18849 instead.
Contributor
Author
Thanks for the review @anoadragon453 🐍
MadLittleMods added a commit that referenced this pull request on Aug 27, 2025
…e while writing bytes to the request (#18804)

This will allow us to easily see how much time is taken up, by being able to filter by the `write_bytes_to_request` operation in Jaeger. Spawning from #17722.

The `write_bytes_to_request` span won't show up in the trace until #18849 is merged.

Note: it's totally fine for a child span to finish after its parent. See https://opentracing.io/specification/#references-between-spans, which shows "Child Span D" outliving the "Parent Span".
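For illustration only (not code from this PR), a minimal sketch using the bare OpenTracing API of a child span that finishes after its parent; the tracer and operation names here are placeholders:

```python
import time

import opentracing

# Placeholder no-op tracer; Synapse would use its configured Jaeger tracer.
tracer = opentracing.Tracer()

parent = tracer.start_span("servlet_request")
child = tracer.start_span("write_bytes_to_request", child_of=parent)

parent.finish()   # the parent (request handler) span ends first...
time.sleep(0.1)   # ...while bytes are still being written to the request
child.finish()    # ...and the child span finishes afterwards, which is allowed
```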
MadLittleMods added a commit that referenced this pull request on Sep 26, 2025
Before #18849, we were using our own custom `LogContextScopeManager`, which tied the tracing scope to the `LoggingContext`. Since we created a new `BackgroundProcessLoggingContext` any time we called `run_as_background_process(...)`, the trace for the background work was separate from the trace that kicked off the work, as expected (e.g. a request trace is separate from the background process we kicked off to fetch more messages over federation).

Since we've now switched to the `ContextVarsScopeManager` (in #18849), the tracing scope crosses `LoggingContext` boundaries (and thread boundaries) without a problem. This means we end up with request traces that include all of the background work we've kicked off, bloating the trace and making it hard to understand what's going on.

This PR separates the traces again, as they were before. Additionally, things are even better now since I added some cross-link references between the traces to make it easy to jump between them.

Follow-up to #18849

---

In the before, you can see that the trace is blown up by the background process (`bgproc.qwer`). In the after, we now only have a little cross-link marker span (`start_bgproc.qwer`) to jump to the background process trace.

Before | After
--- | ---
<some image> | <some image>

### Testing strategy

1. Run a Jaeger instance (https://www.jaegertracing.io/docs/1.6/getting-started/)
   ```shell
   $ docker run -d --name jaeger \
     -e COLLECTOR_ZIPKIN_HTTP_PORT=9411 \
     -p 5775:5775/udp \
     -p 6831:6831/udp \
     -p 6832:6832/udp \
     -p 5778:5778 \
     -p 16686:16686 \
     -p 14268:14268 \
     -p 9411:9411 \
     jaegertracing/all-in-one:1.59.0
   ```
1. Configure Synapse to use tracing in `homeserver.yaml`:
   ```yaml
   ## Tracing ##
   opentracing:
     enabled: true
     jaeger_config:
       sampler:
         type: const
         param: 1
       logging: false
   ```
1. Make sure the optional `opentracing` dependency is installed: `poetry install --extras all`
1. In the `VersionsRestServlet`, modify it to kick off a dummy background process (easy to test this way):
   ```python
   from synapse.metrics.background_process_metrics import run_as_background_process

   async def _qwer() -> None:
       await self.clock.sleep(1)

   run_as_background_process("qwer", "test_server", _qwer)
   ```
1. Run Synapse: `poetry run synapse_homeserver --config-path homeserver.yaml`
1. Fire off a version request: `curl http://localhost:8008/_matrix/client/versions`
1. Visit http://localhost:16686/search to view the traces
   - Select the correct service
   - Look for the `VersionsRestServlet` operation
   - Press the 'Find Traces' button
   - Select the relevant trace
   - Notice how the trace isn't bloated
   - Look for the `start_bgproc.qwer` span cross-linking to the background process
   - Jump to the other trace using the cross-link reference -> `bgproc.qwer`
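As a rough, hypothetical sketch of the marker-span idea described above (not the actual Synapse implementation; the tag name is made up), the request trace only carries a small span that records where to find the background process trace:

```python
import opentracing

tracer = opentracing.Tracer()  # placeholder; a real Jaeger tracer in practice

request_span = tracer.start_span("VersionsRestServlet")

# Tiny marker span in the request trace; instead of nesting all of the
# background work under the request, it just points at the other trace.
marker = tracer.start_span("start_bgproc.qwer", child_of=request_span)
marker.set_tag("bgproc.trace_id", "<id of the background process trace>")  # hypothetical tag
marker.finish()

request_span.finish()
```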
MadLittleMods added a commit that referenced this pull request on Oct 1, 2025
This reverts commit 27fc338.

Conflicts:
    synapse/logging/context.py
    tests/logging/test_opentracing.py
This was referenced Oct 1, 2025
MadLittleMods added a commit that referenced this pull request on Oct 2, 2025
…19007)

Revert #18849. Go back to our custom `LogContextScopeManager` after trying OpenTracing's `ContextVarsScopeManager`.

Fix #19004

### Why revert?

For reference, with the normal reactor, `ContextVarsScopeManager` worked just as well as our custom `LogContextScopeManager` as far as I can tell (and even better in some cases). But since Twisted appears to not fully support `ContextVar`s, it doesn't work as expected in all cases. Compounding things, `ContextVarsScopeManager` was causing errors with the experimental `SYNAPSE_ASYNC_IO_REACTOR` option.

Since we're not getting the full benefit that we originally desired, we might as well revert and figure out alternatives for extending the logcontext lifetimes to support the use case we were trying to unlock (c.f. #18804).

See #19004 (comment) for more info.

### Does this require backporting and patch releases?

No. Since `ContextVarsScopeManager` works just as well with the normal reactor and was only causing actual errors with the experimental `SYNAPSE_ASYNC_IO_REACTOR` option, I don't think this requires us to backport and make patch releases at all.

### Maintain cross-links between the main trace and background process work

In order to maintain the functionality introduced in #18932 (cross-links between the background process trace and the currently active trace), we also needed a small change. Previously, when we were using `ContextVarsScopeManager`, it tracked the tracing scope across logcontext changes without issue. Now that we're using our own custom `LogContextScopeManager` again, we need to capture the active span from the logcontext before we reset to the sentinel context because of the `PreserveLoggingContext()` below.

Added some tests to ensure we maintain the `run_as_background_process` tracing behavior regardless of the tracing scope manager we use.
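A rough sketch of the idea in the last paragraph, assuming Synapse's `active_span()` helper and `PreserveLoggingContext`; the surrounding structure is simplified and hypothetical, not the actual change:

```python
from synapse.logging.context import PreserveLoggingContext
from synapse.logging.opentracing import active_span

# Capture the caller's active span *before* resetting to the sentinel
# logcontext, so the background process trace can still reference it.
calling_span = active_span()

with PreserveLoggingContext():
    # ... kick off the background process here, passing `calling_span`
    # along so the two traces can be cross-linked ...
    pass
```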
MadLittleMods added a commit that referenced this pull request on Oct 2, 2025
…_available`

This test is still expected to fail with our current `LogContextScopeManager` implementation, but there were some flaws in how we were handling the logcontext before (like pumping the reactor in a non-sentinel logcontext).

This test originally comes from #18849, spawning from #18804 (comment).
Switch to OpenTracing's `ContextVarsScopeManager` instead of our own custom `LogContextScopeManager`. This is now possible because the linked Twisted issue from this comment is resolved:
`synapse/logging/scopecontextmanager.py`, lines 42 to 43 in 40edb10
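For context, a minimal sketch (assuming `opentracing` 2.2+ and `jaeger-client` are installed; the service name is a placeholder) of handing the stock `ContextVarsScopeManager` to the Jaeger tracer instead of a custom scope manager:

```python
from jaeger_client import Config
from opentracing.scope_managers.contextvars import ContextVarsScopeManager

config = Config(
    config={"sampler": {"type": "const", "param": 1}, "logging": False},
    service_name="synapse",  # placeholder service name
    scope_manager=ContextVarsScopeManager(),
)
tracer = config.initialize_tracer()
```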
This PR spawned from exploring different possibilities to solve the scope-loss problem I was encountering in #18804 (comment). This appears to solve the problem, and I've added the additional test from there to this PR ✅

Related to #10342 (previously matrix-org/synapse#10342) where we also want to change our `LoggingContext` machinery to be `ContextVar`-based.

### Testing strategy

1. Configure Synapse to use tracing in `homeserver.yaml`
1. Make sure the optional `opentracing` dependency is installed: `poetry install --extras all`
1. Run Synapse: `poetry run synapse_homeserver --config-path homeserver.yaml`
1. Fire off a request: `GET http://localhost:8008/_matrix/client/versions`
1. Look for the `VersionsRestServlet`'s trace in Jaeger

I tested a few endpoints and the traces seem to look normal.
### Dev notes
### Pull Request Checklist