Prevent incorrect parsing in prune_staged_events_in_room() when room version is wrong#11819
Prevent incorrect parsing in prune_staged_events_in_room() when room version is wrong#11819ForestJohnson wants to merge 8 commits intomatrix-org:developfrom
Conversation
7bd86a4 to
254321c
Compare
whoops, I misunderstood the code, it's actually a list of lists not a list of dicts
|
Note that the high amount of logging was due to a typo in the original code, and a fix is available in Synapse v1.50.2. |
Ah interesting, I didn't catch that. So what I had assumed about the room version being wrong was probably not the case, it was simply a bug in the specific synapse version we were using. I still think that basing the parsing on the actual JSON which was sent makes more sense than basing it on what room version was retrieved from the DB. Even better would be if the room version was included as a key on the JSON message, it's typical for any message format which can change to include a version number at the beginning of the message or include a version number as associated metadata, right? I don't know anything about the internal machinations of synapse and its different room versions, maybe assuming the room version must be correct is fine because if it wasn't, something else would go catastrophically wrong and log an error which explains the root cause of the issue? |
|
This PR has more deletes than inserts and simplifies one little thing, but at the same time its just a nitpick considering the actual painful bug was already fixed. I don't really care if the PR is closed, if passing the room version into every federation fn is a pattern that yall wanna keep, don't let me step on your toes there. |
|
Eh, its hard to write a proper changelog for this change and it sounds like the actual problem was already fixed. one less thing for yall to have to look at |
Pull Request Checklist
Currently the JSON parsing logic inside
prune_staged_events_in_roomdepends on the room version. The event json can be formatted either one way or the other, however, there is no version number on the JSON itself, so theroom_versionpassed from outside of the function is used in lieu of that.However, if an event is federated with the wrong format (i.e., the room version is set wrong on either side) then an error will be logged. Our matrix server experienced an incident where a matrix.org room federated thousands of incorrectly formatted events to us, and each event was processed by this function multiple times, including some individual events that appear to have been processed upwards of one hundred times. This error message was logged so much that it grew the log file and filled our disk while we were asleep.EDIT: I was wrong, this insane amount of logging was probably due to a separate bug mentioned below.
Notice that both
prev_eventssemantics (listversusstr) are logically equivalent regardless of the room version. So I see no downside to simply handling them correctly (within this function) regardless of the room version.EDIT: here are a couple more for the check list 😅:
Signed-off-by: forest johnson (forest.n.johnson@gmail.com)