Skip to content
This repository was archived by the owner on Apr 26, 2024. It is now read-only.

Prevent incorrect parsing in prune_staged_events_in_room() when room version is wrong#11819

Closed
ForestJohnson wants to merge 8 commits intomatrix-org:developfrom
ForestJohnson:develop
Closed

Prevent incorrect parsing in prune_staged_events_in_room() when room version is wrong#11819
ForestJohnson wants to merge 8 commits intomatrix-org:developfrom
ForestJohnson:develop

Conversation

@ForestJohnson
Copy link
Copy Markdown
Contributor

@ForestJohnson ForestJohnson commented Jan 25, 2022

Pull Request Checklist

Currently the JSON parsing logic inside prune_staged_events_in_room depends on the room version. The event json can be formatted either one way or the other, however, there is no version number on the JSON itself, so the room_version passed from outside of the function is used in lieu of that.

However, if an event is federated with the wrong format (i.e., the room version is set wrong on either side) then an error will be logged. Our matrix server experienced an incident where a matrix.org room federated thousands of incorrectly formatted events to us, and each event was processed by this function multiple times, including some individual events that appear to have been processed upwards of one hundred times. This error message was logged so much that it grew the log file and filled our disk while we were asleep.

EDIT: I was wrong, this insane amount of logging was probably due to a separate bug mentioned below.

Notice that both prev_events semantics (list versus str) are logically equivalent regardless of the room version. So I see no downside to simply handling them correctly (within this function) regardless of the room version.

EDIT: here are a couple more for the check list 😅:

Signed-off-by: forest johnson (forest.n.johnson@gmail.com)

@ForestJohnson ForestJohnson requested a review from a team as a code owner January 25, 2022 01:00
@anoadragon453
Copy link
Copy Markdown
Member

Note that the high amount of logging was due to a typo in the original code, and a fix is available in Synapse v1.50.2.

@ForestJohnson
Copy link
Copy Markdown
Contributor Author

ForestJohnson commented Jan 25, 2022

Note that the high amount of logging was due to a typo in the original code, and a fix is available in Synapse v1.50.2.

Ah interesting, I didn't catch that. So what I had assumed about the room version being wrong was probably not the case, it was simply a bug in the specific synapse version we were using.

I still think that basing the parsing on the actual JSON which was sent makes more sense than basing it on what room version was retrieved from the DB. Even better would be if the room version was included as a key on the JSON message, it's typical for any message format which can change to include a version number at the beginning of the message or include a version number as associated metadata, right?

I don't know anything about the internal machinations of synapse and its different room versions, maybe assuming the room version must be correct is fine because if it wasn't, something else would go catastrophically wrong and log an error which explains the root cause of the issue?

@ForestJohnson ForestJohnson changed the title Prevent GBs logging from prune_staged_events_in_room() w/ wrong room version Prevent incorrect parsing in prune_staged_events_in_room() when room version is wrong Jan 25, 2022
@ForestJohnson
Copy link
Copy Markdown
Contributor Author

This PR has more deletes than inserts and simplifies one little thing, but at the same time its just a nitpick considering the actual painful bug was already fixed. I don't really care if the PR is closed, if passing the room version into every federation fn is a pattern that yall wanna keep, don't let me step on your toes there.

@ForestJohnson
Copy link
Copy Markdown
Contributor Author

ForestJohnson commented Jan 26, 2022

Eh, its hard to write a proper changelog for this change and it sounds like the actual problem was already fixed. one less thing for yall to have to look at

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants