Sliding sync: various fixups to the background update #17652
erikjohnston merged 23 commits into `develop`
Conversation
```diff
  # Scrutinize JSON values
- if room_name is None or isinstance(room_name, str):
+ if room_name is None or (
+     isinstance(room_name, str) and "\0" not in room_name
```
---
Why do we care to check this? As far as I can tell, it's valid JSON. Is this a Matrix spec thing?
Perhaps it's because we have to mix with a system that is sensitive to null-terminated strings (C strings)? Doing some quick searching, it seems like Postgres does not allow null bytes in TEXT fields but it is allowed in SQLite.
The only other place we do this is in `synapse/storage/databases/main/stats.py`, lines 268 to 288 at `391c4f8`.
Why aren't we checking other content values like we do in the stats code?
Why aren't we checking this in the events code where we also insert these state values into the database? Do we disallow this with some validation somewhere to prevent this with new data?
We should explain the reason why we're doing this. I assume it's the Postgres reason.
---
D'oh! Sorry yeah, postgres TEXT fields can't have \0 bytes in them (nyargghghgh why). Will update with comment
---
> Why aren't we checking other content values like we do in the stats code?

^
> Why aren't we checking this in the `events` code where we also insert these state values into the database? Do we disallow this with some validation somewhere to prevent this with new data?

^
---
Oh yeah, I forgot that `room_type` is also a thing. The rest I think are fine, e.g. the tombstone room ID should be a valid room ID (but we may as well check that too).
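As a hedged illustration of the scrutiny being discussed (the helper name `scrutinize_text` is made up for this sketch, not Synapse's actual code): Postgres rejects NUL bytes in `TEXT` columns while SQLite accepts them, so any JSON-derived string containing `\0` is treated as absent before insertion:

```python
from typing import Optional


def scrutinize_text(value: object) -> Optional[str]:
    """Return the value only if it is a str containing no NUL bytes.

    Postgres TEXT columns reject \0 bytes (SQLite allows them), so a
    string with one is dropped rather than passed to the database.
    """
    if isinstance(value, str) and "\0" not in value:
        return value
    return None


# Applied to the state values discussed above (illustrative inputs):
room_name = scrutinize_text("My room")    # kept as-is
bad_name = scrutinize_text("evil\0name")  # contains NUL -> None
room_type = scrutinize_text(None)         # not a str -> None
```

The same helper would cover `room_type` and other string-valued state fields mentioned in the thread.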
```diff
+     AND (c.membership != ? OR c.user_id != e.sender)
      ORDER BY c.room_id ASC, c.user_id ASC
      LIMIT ?
      """,
-     (last_room_id, last_user_id, batch_size),
+     (last_room_id, last_user_id, Membership.LEAVE, batch_size),
```
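The shape of this change, sketched against a toy SQLite schema (table name, columns, and values are illustrative, not Synapse's real schema), is that every `?` placeholder added to the SQL needs a matching entry in the parameter tuple, in the same position the placeholder appears:

```python
import sqlite3

LEAVE = "leave"  # stand-in for Membership.LEAVE in this sketch

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE memberships (room_id TEXT, user_id TEXT, membership TEXT)"
)
conn.executemany(
    "INSERT INTO memberships VALUES (?, ?, ?)",
    [("!a", "@u1", "join"), ("!a", "@u2", "leave"), ("!b", "@u3", "join")],
)

# The extra `membership != ?` placeholder gets a matching extra parameter,
# inserted between the keyset-pagination bounds and the LIMIT value.
rows = conn.execute(
    """
    SELECT room_id, user_id FROM memberships
    WHERE (room_id, user_id) > (?, ?) AND membership != ?
    ORDER BY room_id ASC, user_id ASC
    LIMIT ?
    """,
    ("", "", LEAVE, 10),
).fetchall()
print(rows)  # -> [('!a', '@u1'), ('!b', '@u3')] -- the leave row is skipped
```

Mismatching the placeholder count and the tuple length raises an error at execution time, which is why the parameter tuple is updated in the same diff.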
---
I'd prefer not to exclude leave memberships. It's just extra things to think about.
We want the table to be complete in the end, which means we will need to add another background update to fill in the leave events. If we're going to split the background update, we should just do that in this PR so we don't have to remove those assertions from the tests, which would probably get lost.
---
I'm a bit in two minds about this. The problem is that this is a surprisingly huge amount of data that will just never be read. On the other hand, it feels inconsistent to not port over the old rows but to keep new ones going forward.
I wonder if the right thing to do here might be to skip these rows, but add a background job that clears out old left rooms from the table? That's possible since we can respond with `M_UNKNOWN_POS` if we get a sliding sync request with a `pos` from before we purged.
---
If we want to accept the forever background update to clean up leaves and keep the size of the database table down, that can work 👍
I assume we want some grace period for left rooms (a day)? That way people don't immediately get an `M_UNKNOWN_POS` as soon as they leave a room, and it allows other clients to catch up gracefully if the room is left on another client.
Is it really worth this complexity though? We're not storing every membership ever, just the latest membership of a given user for a given room.
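The purge idea being debated could be sketched roughly like this (the table name, schema, and one-week grace period are assumptions for illustration; Synapse's real schema differs), with the understanding that any request carrying a `pos` from before the purge point would be answered with `M_UNKNOWN_POS`:

```python
import sqlite3
import time

# Assumed grace period, per the "probably something like a week" discussion.
GRACE_PERIOD_MS = 7 * 24 * 3600 * 1000

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sliding_sync_membership (room_id TEXT, user_id TEXT,"
    " membership TEXT, event_ts INTEGER)"
)
now_ms = int(time.time() * 1000)
conn.executemany(
    "INSERT INTO sliding_sync_membership VALUES (?, ?, ?, ?)",
    [
        ("!old", "@u", "leave", now_ms - 2 * GRACE_PERIOD_MS),  # purgeable
        ("!new", "@u", "leave", now_ms),                        # within grace
        ("!cur", "@u", "join", now_ms),                         # never purged
    ],
)

# Delete only leave rows older than the grace period; joins are untouched,
# and fresh leaves survive so other clients can catch up gracefully.
conn.execute(
    "DELETE FROM sliding_sync_membership"
    " WHERE membership = 'leave' AND event_ts < ?",
    (now_ms - GRACE_PERIOD_MS,),
)
remaining = conn.execute(
    "SELECT room_id FROM sliding_sync_membership ORDER BY room_id"
).fetchall()
```

Only the stale leave row is removed; recent leaves and all joins remain, matching the "latest membership per user per room" shape described above.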
---
Are we sure that we're not going to add an `include_leave` option like Sync v2 has?
---
> I assume we want some grace period for left rooms (a day)?

Yeah, probably something like a week, maybe longer.
> Is it really worth this complexity though? We're not storing every membership ever, just the latest membership of a given user for a given room.
It's kinda sucky to keep around things forever. It's fine when it's not causing issues, but it'll take a really long time for the background updates to run for data that we currently don't use.
> Are we sure that we're not going to add an `include_leave` option like Sync v2 has?
That's a fair question, I guess it would be good to know if any clients actually use that option.
I guess a potential half-way house is for us to not port over the metadata for left rooms for now? Which also seems wrong but will be a lot quicker and have a chance to actually complete.
---
Some discussion in an internal room (light discussion with people on both sides)
---
If I remove this patch and open a separate PR are you happy with the rest? We probably want the actual bug fixes to go into the RC
---
Sounds good 🙂 ("Ignore leave events for bg updates" moved to another PR)
Co-authored-by: Eric Eastwood <eric.eastwood@beta.gouv.fr>
Follow-up to #17641, #17634, #17631 and #17632 to fix up #17512
A few things are going on here:

- …`forgotten` state for those memberships, but I don't actually think we need to: if a row already exists for that room/user then it was inserted as a new one by the event persist path, in which case it is by definition not forgotten, so we don't need to fix that path up.
- Ignore leave events for bg updates.

Reviewable commit-by-commit.