Be mindful of other logging context filters in 3rd-party code#19068
Be mindful of other logging context filters in 3rd-party code#19068MadLittleMods merged 8 commits intodevelopfrom
Conversation
01d945b to
0256915
Compare
| def safe_set(attr: str, value: Any) -> None: | ||
| """ | ||
| Only write the attribute if it hasn't already been set or we actually have | ||
| a Synapse logcontext (indicating that this log record is relevant to | ||
| Synapse). | ||
| """ | ||
| if context is not SENTINEL_CONTEXT or not hasattr(record, attr): | ||
| setattr(record, attr, value) |
There was a problem hiding this comment.
Is the abstraction worth it? Should I just inline the usage?
Originally, I thought I would have to use it more for all of the request attributes below but turns out we can do a little optimization to avoid it.
There was a problem hiding this comment.
Why not move setting the server_name under the request attribute? Then you'd only have one usage of safe_set, and then it'd be even more clear that it's not necessary.
There was a problem hiding this comment.
I'm not following 🙇
There was a problem hiding this comment.
Sorry, I meant structuring the code to be be something like the following:
context = current_context()
record.request = self._default_request
# Avoid overwriting an existing `server_name` on the record. This is running in
# the context of a global log record filter so there may be 3rd-party code that
# adds their own `server_name` and we don't want to interfere with that.
if not hasattr(record, "server_name"):
record.server_name = "unknown_server_from_no_logcontext"
# context should never be None, but if it somehow ends up being, then
# we end up in a death spiral of infinite loops, so let's check, for
# robustness' sake.
if context is not None:
# Add some data from the HTTP request.
request = context.request
# The sentinel logcontext has no request so if we get past this point, we
# know we have some actual Synapse logcontext and don't need to worry about
# using `safe_set`. We'll consider this an optimization since this is a
# pretty hot-path.
if request is None:
return True
def safe_set(attr: str, value: Any) -> None:
"""
Only write the attribute if it hasn't already been set.
"""
if not hasattr(record, attr):
setattr(record, attr, value)
safe_set("server_name", context.server_name)
# Logging is interested in the request ID. Note that for backwards
# compatibility this is stored as the "request" on the record.
safe_set("request", str(context))
record.ip_address = request.ip_address
record.site_tag = request.site_tag
record.requester = request.requester
record.authenticated_entity = request.authenticated_entity
record.method = request.method
record.url = request.url
record.protocol = request.protocol
record.user_agent = request.user_agent
return Truewhere you get the check for SENTINEL out of the way early, and then run code that assumes a non-sentinel logcontext.
Though, I now realise safe_set was checking both that this wasn't the SENTINEL_CONTEXT, as well as checking that the attribute wasn't already set. So the above doesn't completely eliminate the need for safe_set.
Though if safe_set is now only checking that the attribute is not already set, then it becomes even less necessary, and can probably be removed.
There was a problem hiding this comment.
nit: I'd also refactor this block like so to save on indentation:
# context should never be None, but if it somehow ends up being, then
# we end up in a death spiral of infinite loops, so let's check, for
# robustness' sake.
if context is None:
return True
# Add some data from the HTTP request.
request = context.request
...
return TrueThere was a problem hiding this comment.
Sorry, I meant structuring the code to be be something like the following: [...] where you get the check for SENTINEL out of the way early, and then run code that assumes a non-sentinel logcontext.
This won't work because not everything is logged with a context.request (like start-up messages, background jobs, etc)
There was a problem hiding this comment.
Merged but if we come up with a better pattern, I can make a follow-up PR ⏩
There was a problem hiding this comment.
Ah, I was wondering if there was something like that. Thanks for explaining, looks fine to me then.
anoadragon453
left a comment
There was a problem hiding this comment.
No blocking comments - overall this looks good to me.
| def safe_set(attr: str, value: Any) -> None: | ||
| """ | ||
| Only write the attribute if it hasn't already been set or we actually have | ||
| a Synapse logcontext (indicating that this log record is relevant to | ||
| Synapse). | ||
| """ | ||
| if context is not SENTINEL_CONTEXT or not hasattr(record, attr): | ||
| setattr(record, attr, value) |
There was a problem hiding this comment.
Why not move setting the server_name under the request attribute? Then you'd only have one usage of safe_set, and then it'd be even more clear that it's not necessary.
|
Thanks for the review @anoadragon453 🦨 |
This was unintentionally changed in #19068. There is no real bug here. Without this PR, we just printed an empty string for the `sentinel` logcontext whereas the prior art behavior was to print `sentinel` which this PR restores. Found while staring at the logs in #19165 ### Reproduction strategy 1. Configure Synapse with [logging](https://github.com/element-hq/synapse/blob/df802882bb2ec7d52d5c064c20531fe5a5b263b1/docs/sample_log_config.yaml) 1. Start Synapse: `poetry run synapse_homeserver --config-path homeserver.yaml` 1. Notice the `asyncio - 64 - DEBUG - - Using selector: EpollSelector` log line (notice empty string `- -`) 1. With this PR, the log line will be `asyncio - 64 - DEBUG - sentinel - Using selector: EpollSelector` (notice `sentinel`)
Be mindful that Synapse can be run alongside other code in the same Python process. We shouldn't overwrite fields on given log record unless we know it's relevant to Synapse.
(no clobber)
Background
As part of Element's plan to support a light form of vhosting (virtual host) (multiple instances of Synapse in the same Python process), we're currently diving into the details and implications of running multiple instances of Synapse in the same Python process.
"Per-tenant logging" tracked internally by https://github.com/element-hq/synapse-small-hosts/issues/48
Pull Request Checklist
EventStoretoEventWorkerStore.".code blocks.