Fix sliding sync performance slow down for long lived connections. #19206
erikjohnston merged 75 commits into develop from
Conversation
We then filter them out before sending to the client, but it is unnecessary to do so and interferes with later changes.
This is so that clients know if they can use a cached `/members` response or not.
Force-pushed from f67e114 to 0d6ccbe
This ensures that the set of required state doesn't keep growing as we add and remove member state. We then only load them from the DB when needed, rather than all state for all rooms when we get a request.
It was thinking the table name was `IN`, as it matched `connection_positi(on IS) NULL`.
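The mis-parse described above is the classic missing-word-boundary problem: a pattern that looks for a bare SQL keyword can match inside an identifier. A minimal illustration (this is not Synapse's actual parser code, just a demonstration of the failure mode):

```python
import re

CLAUSE = "connection_position IS NULL"

# Without a word boundary, "on IS" matches inside the identifier
# "connection_position", so a naive keyword scan picks up a bogus match.
assert re.search(r"on IS", CLAUSE) is not None

# Anchoring the keyword with \b prevents the mid-identifier match.
assert re.search(r"\bon IS\b", CLAUSE) is None
```

The `\b` assertion fails inside `connection_position` because both neighbouring characters are word characters, so no boundary exists there.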
Force-pushed from 0d6ccbe to 4984858
MadLittleMods left a comment
I haven't fully onboarded onto the concept and details to be confident in the approach.
synapse/storage/schema/main/delta/93/02_sliding_sync_members.sql
    else:
        # For non-limited timelines we always return all
        # membership changes. This is so that clients
        # who have fetched the full membership list
        # already can continue to maintain it for
        # non-limited syncs.
        #
        # This assumes that for non-limited syncs there
        # won't be many membership changes that wouldn't
        # have been included already (this can only
        # happen if membership state was rolled back due
        # to state resolution anyway).
        required_state_types.append((EventTypes.Member, None))
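The branch quoted above can be summarised as: for limited timelines, only lazy-load members who appear in the truncated timeline; for non-limited timelines, request all membership changes so clients that already hold the full member list can keep it current. A hedged sketch (function name and shape are illustrative, not Synapse's actual code):

```python
# Stand-in for EventTypes.Member in Synapse.
EVENT_TYPE_MEMBER = "m.room.member"

def member_state_filters(timeline_senders, limited):
    """Illustrative sketch: which (type, state_key) member filters to
    request, depending on whether the timeline was limited."""
    required_state_types = []
    if limited:
        # Only members who actually appear in the timeline.
        for user_id in sorted(set(timeline_senders)):
            required_state_types.append((EVENT_TYPE_MEMBER, user_id))
    else:
        # All membership changes, including state-reset rollbacks.
        required_state_types.append((EVENT_TYPE_MEMBER, None))
    return required_state_types
```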
This seems like a bigger behavioral change.
I think this fixes #18782 🤔 - If so, we should add a test.
Ah, I did mean to factor that out, but it sneaked in as it needs to be accounted for in the lazy loading stuff.
Added with test_lazy_load_state_reset ✅
Actually, this only fixes it for non-limited syncs. I think we should also return state reset membership in limited timeline scenarios as well.
We should at least leave a FIXME with a link to the issue in the if-block above.
I don't think we want to return all membership changes when it is limited? Only the ones for users that appear in the timeline / required_state?
When limited, we should do this: if the state reset/rollback happened in the timeline range, we should give an update. If we don't want to do that in this PR, we should a) fix it properly, b) leave a FIXME behind, or c) consider any state rollback as relevant regardless (because sending more state is not wrong).
Why should we do that? In the limited scenario the client knows it has missed some membership updates, and so will need to requery them if needed.
Because we're sending membership state for whatever is relevant in the timeline when lazy-loading. State rollbacks for membership can be just as relevant to the timeline.
We probably need to hop on a call for this.
In the limited case if there is a state rollback for a user who has sent a message in the timeline, then that will get included? We only won't include a state rollback if that user is not referenced in the timeline?
We only won't include a state rollback if that user is not referenced in the timeline?
Isn't that possible, and ideally shouldn't it be included?
Related MSC discussion, matrix-org/matrix-spec-proposals#4186 (comment)
…iously_returned in tests
Co-authored-by: Eric Eastwood <[email protected]>
When fetching previously sent lazy members we didn't filter by room, which meant that we didn't send down member events in a room if we'd previously sent that user's member event in another room.
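The bug described above can be reproduced with a toy schema: without a `room_id` filter, a member event sent for one room looks "already sent" for every other room on the connection. A runnable illustration using an illustrative table layout (not Synapse's exact schema):

```python
import sqlite3

# Minimal stand-in for the lazy-members tracking table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lazy_members (connection_key INT, room_id TEXT, user_id TEXT)"
)
# Alice's member event was previously sent down for !roomA only.
conn.execute("INSERT INTO lazy_members VALUES (1, '!roomA', '@alice:example.org')")

def already_sent(user_id, room_id, filter_by_room):
    """Has this member event already been sent on connection 1?"""
    sql = "SELECT 1 FROM lazy_members WHERE connection_key = 1 AND user_id = ?"
    args = [user_id]
    if filter_by_room:
        sql += " AND room_id = ?"
        args.append(room_id)
    return conn.execute(sql, args).fetchone() is not None

# Buggy query (no room filter): Alice wrongly appears already-sent in !roomB,
# so her member event in !roomB would be skipped.
assert already_sent("@alice:example.org", "!roomB", filter_by_room=False)
# Fixed query scopes the lookup to the room being synced.
assert not already_sent("@alice:example.org", "!roomB", filter_by_room=True)
```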
Force-pushed from fe94608 to ec45e00
synapse/storage/schema/main/delta/93/02_sliding_sync_members.sql
    -- When invalidating rows, we can just delete them. Technically this could
    -- invalidate for a forked position, but this is acceptable as equivalent to a
    -- cache eviction.
    CREATE TABLE sliding_sync_connection_lazy_members (
I think the current iteration doesn't explain the problem well. We try to share rows in sliding_sync_connection_required_state across as many rooms in a list as possible. With lazy-loading room members, sliding_sync_connection_required_state constantly churns for each room individually and they can no longer be shared. And since sliding_sync_connection_required_state stores a big JSON list of state of all of the required state for each room, it's not efficient. We can instead store a single row for each user in each room in this new table sliding_sync_connection_lazy_members, etc.
(not very good words)
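The row-per-member design described above can be contrasted with the old JSON-blob approach in miniature. Column names here are guesses for illustration; the real schema lives in the `02_sliding_sync_members.sql` migration:

```python
import sqlite3

# Illustrative sketch of the new table: one small row per
# (connection, room, member), instead of one big JSON list of all
# required state per room in sliding_sync_connection_required_state.
DDL = """
CREATE TABLE sliding_sync_connection_lazy_members (
    connection_position BIGINT NOT NULL,
    room_id TEXT NOT NULL,
    user_id TEXT NOT NULL
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(DDL)

# Recording "we sent Alice's membership for this room on this
# connection" is a single cheap insert; invalidation is a delete.
conn.execute(
    "INSERT INTO sliding_sync_connection_lazy_members VALUES (?, ?, ?)",
    (42, "!room:example.org", "@alice:example.org"),
)
```

Because each member is its own row, churn in lazy-loaded membership touches individual rows rather than rewriting a shared JSON blob, so rows for non-member required state can stay shared across rooms.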
    @@ -0,0 +1 @@
    Fix sliding sync performance slow down for long lived connections.
In terms of the optimizations being applied on top of Sliding Sync, we already had a pretty high complexity in this area and now it's being multiplied again.
I fear for anyone else who has to try to understand and adapt this further. It's hard enough for me as the one familiar with all of the Sliding Sync code and being witness to all of it growing over time.
We do have decent tests and comments explaining the decisions here if you want to move this forward ⏩
This PR is complex, but I think from a high-level PoV makes more sense. The concept is simple: we need to cache which memberships we've sent down when lazy-loading and we do that by storing it in a table. The actual implementation is definitely a bit finicky. If we were doing this from scratch I'd also factor out the optimisation for remembering what other state we've sent down too, as that is a great source of complexity.
Either way, we need to fix this bug ASAP as it's causing bad perf regressions for users.
Co-authored-by: Eric Eastwood <[email protected]>
Thanks for all the reviews @MadLittleMods ! ❤️

🎉
Fixes #19175
This PR moves the tracking of which lazy-loaded memberships we've sent for each room out of the required state table. This stops that table from continuously growing, which massively helps performance, as we pull out all matching rows for the connection when we receive a request.
The new table is only read when we have data in a room to send, so we end up reading a lot fewer rows from the DB. Though we now read from that table for every room we have events to return in, rather than once at the start of the request.
For an explanation of how the new table works, see the comment on the table schema.
The table is designed so that we can later prune old entries if we wish, but that is not implemented in this PR.
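The read pattern described above can be sketched as follows (illustrative names, not Synapse's actual code): previously-sent required state is still loaded up front, but the lazy-members table is consulted only for rooms that actually have events to return.

```python
def members_to_send(events_by_room, load_sent_members):
    """For each room with events to return, work out which senders'
    membership events still need sending.

    ``load_sent_members(room_id)`` is a stand-in for a read of the new
    sliding_sync_connection_lazy_members table; returns the set of
    user IDs whose member events were already sent for that room.
    """
    to_send = {}
    reads = 0
    for room_id, events in events_by_room.items():
        if not events:
            continue  # quiet rooms cost no DB reads at all
        reads += 1
        already_sent = load_sent_members(room_id)
        senders = {event["sender"] for event in events}
        to_send[room_id] = senders - already_sent
    return to_send, reads
```

This is the trade-off the description mentions: one read per active room instead of one big read at the start of the request, which wins whenever most rooms on a long-lived connection are quiet.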
Reviewable commit-by-commit.