Conversation
There was a problem hiding this comment.
Implementation requirements:
- Server (preferably multiple)
- Client with account key awareness (preferably multiple)
- Complement tests
There was a problem hiding this comment.
These implementation requirements are a bit light. Given that there is no security disclosure happening related to this MSC, could we please be a little more considerate with rolling this out vs V12?
There was a problem hiding this comment.
Tbh, why are implementation requirements this weak for a mainline room version?
There was a problem hiding this comment.
We do prefer multiple implementations for changes like this - I've clarified the comment.
Discussed over chat: this MSC does not define a (stable) room version like v12 - it describes a component of a possible future room version. The testing is more important for the future version's MSC when all the bundled changes are included in a "real" room version.
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
- As user IDs are user controlled, spammers set their localpart to abusive messages in order to harass and intimidate others. Redactions
  do not remove the user ID so these messages persist in the room.
There was a problem hiding this comment.
Similarly to https://github.com/matrix-org/matrix-spec-proposals/pull/4243/files#r2336511177, abusive domain names are still persisted
There was a problem hiding this comment.
The linked comment is:
> This does not work for single user instances, since their domain is still part of all mxids in this proposal.
There was a problem hiding this comment.
Resolved with the change to using the key as the principal. For clarity, the unsigned section now looks like:
{
// .. event fields
"unsigned": {
"sender_account": {
"key": "l8Hft5qXKn1vfHrg3p4-W8gELQVo8N13JkluMfmn2sQ",
"user_id": "@kegan:matrix.org",
}
}
}
Where `user_id` is optional, and set only for verified domains/users. Then:
> Clients can render the `unsigned.sender_account.user_id` field (if it exists) as a human-readable displayable identifier for the user. When the user is deleted, the `key` can be rendered instead.
This is specifically referring to GDPR erasure but the same can apply for redacting the membership event.
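As a minimal sketch of the rendering rule above: the helper name `display_identifier` is hypothetical, and the event is assumed to be a plain dictionary shaped like the JSON example in this comment. It prefers `user_id` when present and falls back to `key` otherwise.

```python
from typing import Any, Mapping, Optional


def display_identifier(event: Mapping[str, Any]) -> Optional[str]:
    """Pick the identifier a client might show for an event's sender.

    Hypothetical helper: assumes the `unsigned.sender_account` shape
    shown in the example above, with `user_id` being optional.
    """
    account = event.get("unsigned", {}).get("sender_account", {})
    # Prefer the human-readable user ID, which is only set for
    # verified domains/users.
    user_id = account.get("user_id")
    if user_id:
        return user_id
    # Otherwise fall back to the account key (e.g. after erasure).
    return account.get("key")
```

A client would call this once per event when building the sender label; events with no `sender_account` at all yield `None`, and what to show in that case is left to the client.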
may want to send the account name user ID as the user ID may be displayed on the notification, but some clients may want to map
the name to an account key user ID.

### Security Considerations
There was a problem hiding this comment.
How do we stop a server from generating an unbounded number of account keys for server names whose ownership cannot be verified (i.e. entirely made up), then leaking them to other servers to DoS the room with memberships for those account keys, or with other events? See https://github.com/matrix-org/matrix-spec-proposals/pull/4345/files#diff-20e93dc19ad1924029f07297e00cdc8cd11288f8ad65c77b6a71df8f05b3c1d0R48-R73
The reason an attacker would do this is to evade detection, with the spam-generating servers being unaffected by m.room.server_acl provided the leaker can avoid DAG forensics or coordinated traffic analysis (both of which would take time AND for targeted servers to synchronise with the room admin's servers). Unlike the current room model, the spam-generating servers under this MSC would not actually need any distinct infrastructure, deployment, or domain names. So the attack would be cheaper to conduct.
It can also be used in private rooms where the invite permission can fan out to new members.
There was a problem hiding this comment.
MSC4345 handles this by requiring room members to publish a server key AND requiring a room admin to acknowledge each new key. And the power level to do that is distinct to invite.
There was a problem hiding this comment.
Just to make sure I understand: your concern is that this MSC makes the barrier to entry for spammers to spam at the server-level too low, because you don't need any DNS registration. In comparison, MSC4345 has this admin handshake where new servers need to get sign-off from an existing admin prior to being able to join. Is that right?
Assuming it is right, then it's a tradeoff between safety and availability. My primary concern with an admin handshake is that it globally decreases availability in the protocol because if you cannot talk to an admin server then you cannot join the room. This may feel like some distant edge case but in practice it isn't. Last year matrix.org was down for an extended time and we were still able to communicate in existing rooms with only matrix.org admins because we could invite our backup accounts into the room. Had we been operating with MSC4345, we would not be able to do this and would have to create new rooms for the duration of the outage. In addition, whilst I can see the argument for an admin handshake on the public network, Matrix is used in a lot more than just the public federation. Many private, closed federations exist where a requirement to talk to the admin may not be appropriate or practical. These reasons are why I tend to push for more availability in the protocol, with the ability to centralise via policy servers (who can apply much more sophisticated rules) should the network need it.
That being said, I do think there is a place for traceable memberships in the protocol, but done in a less intrusive way. Specifically, I think we should force all public rooms to require invites, but allow the server to issue the invite (and thus not require any client interaction to join public rooms). This means the join-helper server in the make/send-join dance would be identifiable in the DAG, which can be critical to identify colluding servers.
It's worth noting that the attack here wouldn't affect clients as events from unverified domains are not sent to clients, so it would be purely a DoS attack, which can be mitigated via existing rate limiting tooling.
There was a problem hiding this comment.
> Just to make sure I understand: your concern is that this MSC makes the barrier to entry for spammers to spam at the server-level too low, because you don't need any DNS registration.
Yep
> Last year matrix.org was down for an extended time and we were still able to communicate in existing rooms with only matrix.org admins because we could invite our backup accounts into the room. Had we been operating with MSC4345, we would not be able to do this and would have to create new rooms for the duration of the outage.
So you are saying you had no redundant admin accounts and have lost control of all powered user accounts? In this scenario the integrity of your room is seriously compromised. Being able to join or even interact with the room as a joined user is a serious security issue.
> My primary concern with an admin handshake is that it globally decreases availability in the protocol because if you cannot talk to an admin server then you cannot join the room.
This is desirable, if no one can moderate new joins, then joining the room should be unavailable.
> Many private, closed federations exist where a requirement to talk to the admin may not be appropriate or practical.

Then the power level or any mechanism which allows a server to handle joins can be more distributed in these environments, much like the invite power level. Are we confident closed federations are not also vulnerable to the same attack vector?
> These reasons are why I tend to push for more availability in the protocol, with the ability to centralise via policy servers (who can apply much more sophisticated rules) should the network need it.
This would make any room without a policy server insecure, and private rooms will also have this problem.
> That being said, I do think there is a place for traceable memberships in the protocol, but done in a less intrusive way. Specifically, I think we should force all public rooms to require invites, but allow the server to issue the invite (and thus not require any client interaction to join public rooms). This means the join-helper server in the make/send-join dance would be identifiable in the DAG, which can be critical to identify colluding servers.
Would it not be simpler and equivalent to require a policy server to sign each join?
> It's worth noting that the attack here wouldn't affect clients as events from unverified domains are not sent to clients, so it would be purely a DoS attack, which can be mitigated via existing rate limiting tooling.
While the events aren't sent to clients, the memberships will surely add to the state complexity of the room? And so rate limiting will not be sufficient, and the attack couldn't be detected by room admins?
There was a problem hiding this comment.
> if no one can moderate new joins, then joining the room should be unavailable.

And that's the key difference in our views. It's ultimately going to be up to the SCT to decide if this risk is worth the gain in availability. It's really about the default behaviour: either option can add this moderation gate or be made more available, so the question is which one it does by default. It's a bit late when your admin server is knocked offline to change the PLs. Similarly, it's a bit late to add a policy server when your room is full of CSAM. Taken to the extreme, the loss in availability damages the ability to do P2P Matrix as the chances of admin users being online are much lower than admin servers. On the other extreme, the focus on policy servers everywhere would indeed make "any room without a policy server insecure", damaging the decentralisation efforts of the public federation.
> Would it not be simpler and equivalent to require a policy server to sign each join?
No, because any server can act as a join-helper currently, whereas a policy server is centralised.
> While the events aren't sent to clients, the memberships will surely add to the state complexity of the room? And so rate limiting will not be sufficient, and the attack couldn't be detected by room admins?

It adds more state events to the DAG, which is what "state complexity" tries to measure. Unfortunately this is just the way append-only data structures work. This problem exists irrespective of either MSC, meaning servers need to protect themselves (e.g. via rate limits) regardless. This isn't the room admin's responsibility as it only consumes server resources. I have some ideas on compacting BFT CRDTs, but it's very much at the academic research level; there aren't any good off-the-shelf options that we can use.
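To make "protect themselves via rate limits" concrete, here is a rough token-bucket sketch. It is not taken from either MSC or from any homeserver implementation: the class name `MembershipRateLimiter`, its parameters, and the choice to key buckets by the federation peer that delivered the event (rather than by the claimed server name, which this thread notes can be made up) are all assumptions for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class MembershipRateLimiter:
    """Token bucket per delivering server (hypothetical sketch)."""

    def __init__(
        self,
        rate_per_second: float,
        burst: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate_per_second
        self.burst = burst
        self.clock = clock
        # origin -> (tokens remaining, time of last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, origin: str) -> bool:
        """Return True if one more membership event from `origin` is accepted."""
        now = self.clock()
        tokens, last = self._buckets.get(origin, (float(self.burst), now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        if tokens < 1:
            self._buckets[origin] = (tokens, now)
            return False
        self._buckets[origin] = (tokens - 1, now)
        return True
```

A limiter like this bounds how fast any single peer can grow room state, but it does nothing to make the events traceable, which is the separate point being argued in this thread.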
There was a problem hiding this comment.
> It adds more state events to the DAG, which is what "state complexity" tries to measure. Unfortunately this is just the way append-only data structures work. This problem exists irrespective of either MSC, meaning servers need to protect themselves (e.g. via rate limits) regardless. This isn't the room admin's responsibility as it only consumes server resources. I have some ideas on compacting BFT CRDTs, but it's very much at the academic research level, there aren't any good off-the-shelf options that we can use.

A BFT CRDT only helps in the situation that a leaky server is used to evade detection. But it seems that, given these events from invalid domains would not be shown to clients, a Byzantine node may not be required. All that is happening is that nodes are generating a lot of noise. As for compaction? Maybe?
In MSC4345 these events are traceable and can be seen by the room admin. The leaking servers can be found and added to m.room.server_acl. It was designed with this attack in mind. So the idea that we can't do anything about this now without getting to the cutting edge of research isn't actually true.
There was a problem hiding this comment.
> Unfortunately this is just the way append-only data structures work. This problem exists irrespective of either MSC
Specifically, we already have a solution available outside of the current room model in m.room.server_acl and this proposal (MSC4243) does weaken the situations where m.room.server_acl is effective.
To be precise, m.room.server_acl is an example of thinking outside the box, i.e. Byzantine nodes cannot deliver their messages and so their events cannot be considered, and the rest of the room converges on what the history of Byzantine nodes is before the point they were cut off[1]. So it isn't fair or appropriate to make this appeal to the nature of append-only data structures as a reason to continue forward regardless.
Separate to that point being unfair, we can always do more and solve this in a better way. MSC4345 opens the door for a lot of exploration here. For example we can change the bar from "the rest of the room converges on what the history of an excluded server is" to canonicalising that history through the revoke participation auth event (which is coordinated with power level).
Footnotes
1. Though you might consider `m.room.server_acl` to be a product of the "ingenuity of fools"?
There was a problem hiding this comment.
> It's ultimately going to be up to the SCT to decide if this risk is worth the gain in availability.
I haven't run this past the SCT, but my quick suggestion here is:
- The protocol shouldn't force joins to go via an admin server in general, as otherwise it undermines availability during a partition for environments which care about that (i.e. any non-public-chatroom use case).
- However, if you're in a public chatroom, you probably do want your joins to go via a central point of control to avoid abuse - whether that's a joingate or policyserver or admin server. So you can and should add them in to protect your public rooms.
- However, this chokepoint should be layered on top for public rooms rather than baked into the protocol for all rooms.
There was a problem hiding this comment.
an alternative could be that public-joins always do need to be signed off by an admin server (or join gate, or similar) to prevent abuse, but invites/knocks don't?
There was a problem hiding this comment.
(See also #4345 (review))
@tulir already has join-gates deployed in all his public rooms via restricted join and a synapse patch. We could probably just change the join rule for public rooms to be more like restricted join and mandate that the homeserver implement some basic checks here like checking that the server name is valid on new keys before signing any join. And then we can later pass-through the capability to policy server or moderation bots so that they can do arbitrary checks on joining users (or this can continue to happen in an implementation dependent way).
https://github.com/maunium/meowlnir/blob/285d810e56e0586a9d0bcb61a3b3bc989fb409f4/policyeval/antispam.go#L141-L169
https://mau.dev/maunium/synapse/-/blob/master/synapse/module_api/callbacks/spamchecker_callbacks.py?ref_type=heads#L491-522
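The "basic checks before signing any join" idea could look roughly like the following. This is only a sketch under stated assumptions, not the logic in the linked meowlnir or Synapse code: `should_sign_join`, the `verified` mapping, and the `verify_server_name` callback are hypothetical names, and how a server name is actually verified is deliberately left to the caller.

```python
from typing import Callable, Dict


def should_sign_join(
    account_key: str,
    server_name: str,
    verified: Dict[str, str],
    verify_server_name: Callable[[str], bool],
) -> bool:
    """Decide whether a join-gate server signs a join (hypothetical sketch).

    `verified` maps account keys to the server name they were last
    verified against; `verify_server_name` is a caller-supplied check.
    """
    # Keys already verified for this server name are signed straight away.
    if verified.get(account_key) == server_name:
        return True
    # New (or moved) keys must pass the server name check first.
    if not verify_server_name(server_name):
        return False
    verified[account_key] = server_name
    return True
```

The same hook is where a policy server or moderation bot could later be consulted for arbitrary checks, by passing a different callback.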