Sliding sync: various fixups to the background update #17652
erikjohnston merged 23 commits into `develop`
Conversation
```diff
  # Scrutinize JSON values
- if room_name is None or isinstance(room_name, str):
+ if room_name is None or (
+     isinstance(room_name, str) and "\0" not in room_name
```
---
Why do we care to check this? As far as I can tell, it's valid JSON. Is this a Matrix spec thing?
Perhaps it's because we have to mix with a system that is sensitive to null-terminated strings (C strings)? Doing some quick searching, it seems like Postgres does not allow null bytes in TEXT fields but it is allowed in SQLite.
The only other place we do this is in `synapse/storage/databases/main/stats.py`, lines 268 to 288 at `391c4f8`.
Why aren't we checking other content values like we do in the stats code?
Why aren't we checking this in the events code where we also insert these state values into the database? Do we disallow this with some validation somewhere to prevent this with new data?
We should explain the reason why we're doing this. I assume it's the Postgres reason.
---
D'oh! Sorry yeah, postgres TEXT fields can't have \0 bytes in them (nyargghghgh why). Will update with comment
---
> Why aren't we checking other content values like we do in the stats code?

^
> Why aren't we checking this in the `events` code where we also insert these state values into the database? Do we disallow this with some validation somewhere to prevent this with new data?

^
---
Oh yeah, I forgot that `room_type` is also a thing. The rest I think are fine, e.g. the tombstone room ID should be a valid room ID (but we may as well check that too).
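As a hedged illustration of the scrutiny being discussed (the helper name `scrutinize_text` is made up for this sketch, not Synapse's actual code): Postgres rejects NUL bytes in `TEXT` columns while SQLite accepts them, so any JSON-derived string containing `\0` is treated as absent before insertion:

```python
from typing import Optional


def scrutinize_text(value: object) -> Optional[str]:
    """Return the value only if it is a str containing no NUL bytes.

    Postgres TEXT columns reject \0 bytes (SQLite allows them), so a
    string with one is dropped rather than passed to the database.
    """
    if isinstance(value, str) and "\0" not in value:
        return value
    return None


# Applied to the state values discussed above (illustrative inputs):
room_name = scrutinize_text("My room")    # kept as-is
bad_name = scrutinize_text("evil\0name")  # contains NUL -> None
room_type = scrutinize_text(None)         # not a str -> None
```

The same helper would cover `room_type` and other string-valued state fields mentioned in the thread.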
```diff
+     AND (c.membership != ? OR c.user_id != e.sender)
      ORDER BY c.room_id ASC, c.user_id ASC
      LIMIT ?
      """,
-     (last_room_id, last_user_id, batch_size),
+     (last_room_id, last_user_id, Membership.LEAVE, batch_size),
```
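The shape of this change, sketched against a toy SQLite schema (table name, columns, and values are illustrative, not Synapse's real schema), is that every `?` placeholder added to the SQL needs a matching entry in the parameter tuple, in the same position the placeholder appears:

```python
import sqlite3

LEAVE = "leave"  # stand-in for Membership.LEAVE in this sketch

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE memberships (room_id TEXT, user_id TEXT, membership TEXT)"
)
conn.executemany(
    "INSERT INTO memberships VALUES (?, ?, ?)",
    [("!a", "@u1", "join"), ("!a", "@u2", "leave"), ("!b", "@u3", "join")],
)

# The extra `membership != ?` placeholder gets a matching extra parameter,
# inserted between the keyset-pagination bounds and the LIMIT value.
rows = conn.execute(
    """
    SELECT room_id, user_id FROM memberships
    WHERE (room_id, user_id) > (?, ?) AND membership != ?
    ORDER BY room_id ASC, user_id ASC
    LIMIT ?
    """,
    ("", "", LEAVE, 10),
).fetchall()
print(rows)  # -> [('!a', '@u1'), ('!b', '@u3')] -- the leave row is skipped
```

Mismatching the placeholder count and the tuple length raises an error at execution time, which is why the parameter tuple is updated in the same diff.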
---
I'd prefer not to exclude leave memberships. It's just extra things to think about.
We want the table to be complete in the end, which means we will need to add another background update to fill in the leave events. If we're going to split the background update, we should just do that in this PR so we don't have to remove those assertions from the tests, which would probably get lost.
---
I'm a bit in two minds about this. The problem is that this is a surprisingly huge amount of data that will just never be read. On the other hand, it feels inconsistent to not port over the old rows but to keep new ones going forward.
I wonder if the right thing to do here might be to skip these rows, but add a background job that clears out old left rooms from the table? That's possible since we can respond with `M_UNKNOWN_POS` if we get a sliding sync request with a `pos` from before we purged.
---
If we want to accept the forever background update to clean up leaves and keep the size of the database table down, that can work 👍
I assume we want some grace period for left rooms (a day)? That way people don't immediately get an `M_UNKNOWN_POS` as soon as they leave a room, and it allows other clients to catch up gracefully if the room is left on another client.
Is it really worth this complexity though? We're not storing every membership ever, just the latest membership of a given user for a given room.
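The purge idea being debated could be sketched roughly like this (the table name, schema, and one-week grace period are assumptions for illustration; Synapse's real schema differs), with the understanding that any request carrying a `pos` from before the purge point would be answered with `M_UNKNOWN_POS`:

```python
import sqlite3
import time

# Assumed grace period, per the "probably something like a week" discussion.
GRACE_PERIOD_MS = 7 * 24 * 3600 * 1000

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sliding_sync_membership (room_id TEXT, user_id TEXT,"
    " membership TEXT, event_ts INTEGER)"
)
now_ms = int(time.time() * 1000)
conn.executemany(
    "INSERT INTO sliding_sync_membership VALUES (?, ?, ?, ?)",
    [
        ("!old", "@u", "leave", now_ms - 2 * GRACE_PERIOD_MS),  # purgeable
        ("!new", "@u", "leave", now_ms),                        # within grace
        ("!cur", "@u", "join", now_ms),                         # never purged
    ],
)

# Delete only leave rows older than the grace period; joins are untouched,
# and fresh leaves survive so other clients can catch up gracefully.
conn.execute(
    "DELETE FROM sliding_sync_membership"
    " WHERE membership = 'leave' AND event_ts < ?",
    (now_ms - GRACE_PERIOD_MS,),
)
remaining = conn.execute(
    "SELECT room_id FROM sliding_sync_membership ORDER BY room_id"
).fetchall()
```

Only the stale leave row is removed; recent leaves and all joins remain, matching the "latest membership per user per room" shape described above.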
---
Are we sure that we're not going to add an `include_leave` option like Sync v2 has?
---
> I assume we want some grace period for left rooms (a day)?

Yeah, probably something like a week, maybe longer.
> Is it really worth this complexity though? We're not storing every membership ever, just the latest membership of a given user for a given room.
It's kinda sucky to keep around things forever. It's fine when it's not causing issues, but it'll take a really long time for the background updates to run for data that we currently don't use.
> Are we sure that we're not going to add an `include_leave` option like Sync v2 has?
That's a fair question, I guess it would be good to know if any clients actually use that option.
I guess a potential half-way house is for us to not port over the metadata for left rooms for now? Which also seems wrong but will be a lot quicker and have a chance to actually complete.
---
Some discussion in an internal room (light discussion with people on both sides)
---
If I remove this patch and open a separate PR are you happy with the rest? We probably want the actual bug fixes to go into the RC
---
Sounds good 🙂 ("Ignore leave events for bg updates" moved to another PR)
Co-authored-by: Eric Eastwood <eric.eastwood@beta.gouv.fr>
Follow-up to #17641, #17634, #17631 and #17632 to fix up #17512
A few things are going on here:

- …`forgotten` state for those memberships, but I don't actually think we need to: if a row already exists for that room/user then it was inserted as a new one by the event persist path, in which case it is by definition not forgotten, so we don't need to fix that path up.
- Ignore leave events for bg updates.

Reviewable commit-by-commit.