There was a problem hiding this comment.
Implementation requirements:
- Complement tests for unstable room version
- Server with unstable room version
As with all room version changes, stable implementation is not possible until curated/aggregated into a stable room version in a different MSC.
| > If the content of the `m.room.create` event in the room state has the property `m.federate` set to `false`, and the `sender` domain of the event does not match the `sender` domain of the create event, reject. | ||
| Instead, `m.federate` will now function the same as server ACLs. When `m.federate` is `false` then the server MUST reject _all_ inbound federation traffic for that room. |
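A minimal sketch of the inbound rule as worded above, assuming a hypothetical helper that is handed the `content` of the room's `m.room.create` event. The function name is illustrative and not taken from any homeserver; the only spec fact relied on is that `m.federate` defaults to `true` when the key is absent.

```python
def should_reject_inbound_federation(create_content: dict) -> bool:
    """Return True if all inbound federation traffic for the room must be rejected.

    `create_content` is the `content` of the room's `m.room.create` event.
    `m.federate` defaults to True when the key is absent.
    (Hypothetical helper name, for illustration only.)
    """
    return create_content.get("m.federate", True) is False


print(should_reject_inbound_federation({"m.federate": False}))
print(should_reject_inbound_federation({}))
```

This covers only the inbound side stated in the text; it says nothing about outbound traffic.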
There was a problem hiding this comment.
This is probably not sufficient. The m.federate flag is often also used to prevent any outbound traffic, which ACLs do not limit, so that should probably be mentioned here. (At least that is my understanding?)
There was a problem hiding this comment.
I don't think that's universally true. I recall @jevolk mentioning that m.federate invites do get sent out, but they just can't accept it or something? We should absolutely clarify what we mean, but is this the MSC to do it?
| - [Checking](https://spec.matrix.org/v1.14/appendices/#checking-for-a-signature) the `m.room.member` event has a valid signature from that server signing key. | ||
| Links MUST be periodically rechecked to allow for server signing keys to be rotated, or domains to be reused. | ||
| This means a previously verified link can revert to unverified. |
There was a problem hiding this comment.
This generally seems like a bad idea for a few reasons.
- Introducing the need to poll servers, when none existed before, will result in an increase in the amount of outbound federation traffic on large servers. On a medium sized server, if we re-check just once a day, this would result in a new outbound server request - and a new server resolution - every few seconds. The actual time for this to be a useful security feature would be much much shorter.
- This seems like a bad idea anyway, because we have already witnessed that the event sender was authorised at a particular point in time. Discarding information that a server knows to be true seems like an antipattern.
- We can't soft-fail events that are already in the timeline.
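To make the arithmetic in the first point concrete, here is a rough sketch. The figure of 20,000 known servers is an assumption picked for illustration, not a measurement of any real deployment.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def seconds_between_requests(known_servers: int, rechecks_per_day: float = 1.0) -> float:
    """Average gap between outbound re-check requests, spread evenly over a day."""
    return SECONDS_PER_DAY / (known_servers * rechecks_per_day)


# Assumed figure: a server that knows about 20,000 remote servers.
print(seconds_between_requests(20_000))  # 4.32
```

Re-checking more often than once a day shrinks the gap proportionally.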
There was a problem hiding this comment.
Perhaps the wording can be improved here because I think you're misunderstanding.
> Introducing the need to poll servers, when none existed before
There has always been a need to poll as you need to check for server signing key revocation periodically. valid_until_ts puts a 7 day hard cap on how high it can be: "Servers MUST use the lesser of this field and 7 days into the future when determining if a key is valid."
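A minimal sketch of that cap, assuming timestamps in milliseconds as `valid_until_ts` uses; the function name is made up for illustration.

```python
SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000


def effective_valid_until_ms(valid_until_ts: int, now_ms: int) -> int:
    """Apply the 'lesser of this field and 7 days into the future' rule."""
    return min(valid_until_ts, now_ms + SEVEN_DAYS_MS)


now = 1_700_000_000_000
thirty_days = 30 * 24 * 60 * 60 * 1000
# A key advertising 30 days of validity is still only trusted for 7.
print(effective_valid_until_ms(now + thirty_days, now) - now == SEVEN_DAYS_MS)
```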
> we have already witnessed that the event sender was authorised at a particular point in time. Discarding information that a server knows to be true seems like an antipattern.
We're not discarding information. Whether the member is authorised to send an event is independent to the label we attach to said member (the user ID). That label is mutable, and should be kept up-to-date as it contains routing information.
> We can't soft-fail events that are already in the timeline.
I don't see the problem here. The aim is not to get all servers to agree on which events are soft-failed, which would require some way to soft-fail events already sent to clients.
There was a problem hiding this comment.
> There has always been a need to poll as you need to check for server signing key revocation periodically.
As far as I'm aware, this is not correct. All existing implementations, including synapse, validate keys on demand. Requiring a periodic polling is a significant change.
> We're not discarding information. Whether the member is authorised to send an event is independent to the label we attach to said member (the user ID). That label is mutable, and should be kept up-to-date as it contains routing information.
There probably needs to be a lot more clarity about what happens when this happens, and how to treat past events. In any case, current routing information has different semantics from past information, and knowing a user ID was valid in the past is useful information from a user experience perspective.
> I don't see the problem here.
I misinterpreted it as applying to all events - see prev quote - but even only applying to new servers, this would still result in a massive amount of bitrot.
There was a problem hiding this comment.
> As far as I'm aware, this is not correct. All existing implementations, including synapse, validate keys on demand. Requiring a periodic polling is a significant change.
We can't avoid periodic polling. Ignoring key revocation for a moment, servers will not converge on room state unless they periodically share their latest forward extremities in rooms. The current behaviour doesn't do this, which leads to scenarios where Server A knows that Alice is banned because Server C sent the ban event, but Server B remains ignorant because Server A never lets it know (and perhaps Server C did try to tell Server B but couldn't due to a network partition). This is represented in MSC4242 by the notion of "eventual delivery" which is a hard requirement for strong eventual consistency.
> knowing a user ID was valid in the past is useful information from a user experience perspective
@ara4n thought a lot about this. To paraphrase his thoughts:
- the problem is almost identical to historical profiles in the end
- so we could also implement historical mxids, by pulling the mapping from the membership event… but overlaying the map for live activity
Comments supporting your position being:
> i think it’s because i would be pissed if I sent 10y of messages as matthew@company_a and then somebody came and bought the company and unceremoniously ported me to be matthew@company_b and retrospectively everything i ever said was now branded as company B.
My counter-points were:
- other mainstream apps don't attempt to display historical display names.
- if you change your name / transition you may intentionally desire for the old identity to be scrubbed.
- erasure means we have to remove history where possible
- it's objectively simpler for clients to track a single current value than a series of changes on an unlinearisable data structure.
> but even only applying to new servers, this would still result in a massive amount of bitrot.
Can you clarify what you mean by "bitrot"?
There was a problem hiding this comment.
> Can you clarify what you mean by "bitrot"?
The loss of access to information over time. In this case, say we're a new server joining a historic room. Many servers that have participated in this room may have gone offline. As we backfill room history and encounter new profiles, we are unable to verify profiles for servers that have gone offline - if we follow the current proposal, this means we have to soft-fail them.
The current state of the federation is that signing keys can fail in the same way, but key notaries keep old_verify_keys around so that events can be validated as long as keys are not revoked. This greatly slows bitrot, because large/commonly-trusted servers are likely to have the keys for much of the federation. It's not ideal, though. I believe this proposal has the opportunity to improve things.
> the problem is almost identical to historical profiles in the end.
Yep, it's partially the same problem - especially as this proposal makes the localpart and domain a part of the membership - with the added semantics of verification. However, I'm less concerned about changing the name as I am about the loss of verification.
> - other mainstream apps don't attempt to display historical display names.
> - it's objectively simpler for clients to track a single current value than a series of changes on an unlinearisable data structure.
I can't argue there.
> - if you change your name / transition you may intentionally desire for the old identity to be scrubbed.
> - erasure means we have to remove history where possible
These both require explicit redaction, otherwise the data may still be accessible. And as a trans person who has changed my name, I get the importance - but decay of verifiability and explicit removal of information have very different semantics, and you can't just invalidate a server key to redact one user's identity anyway. Continuing to use the same identity also comes with risks that some of us choose to take and some of us don't. This also applies to Matthew's point - I think company profiles are likely to be 'server-controlled' for that reason.
I haven't looked at 4242 yet, but if that's why that change is relevant it should probably be contained in that proposal - perhaps it can be noted as a dependency.
As it is, from what I can see we could have semantics like this:
- Servers already play an important part in the join process. They can refuse to make a join for an identity they cannot verify. A server that accepts unverifiable joins can be excluded from the room via moderation tools. (perhaps this requirement can be controlled at a room level, for a future p2p-first experience where users might not have a server-based identity at all)
- Servers other than the joining server can validate the identity, and provide this information to clients. If they can't verify the identity, clients should not display the identity and should display a warning (but can display names, etc, as they wish). It may be worth having servers filter out unverifiable identities from clients, but not the events themselves.
- Events should otherwise be processed as per existing auth rules, with the key swapped to the user key.
Re-validating identities then becomes much less necessary - it can once again happen on demand, whether for routing or for user interaction - for example, viewing a global profile. Historical identities remain for servers that witness them, and are irrelevant for those that don't.
Please tell me if there's a glaring reason that couldn't work?
There was a problem hiding this comment.
> Servers already play an important part in the join process. They can refuse to make a join for an identity they cannot verify. A server that accepts unverifiable joins can be excluded from the room via moderation tools.
I don't see how this is related, but yes servers could apply some checks and refuse to take part in the join process if those checks fail. Other servers cannot blindly trust that those checks were done though, irrespective of moderation tooling since there's no guarantee the room will have said tooling. Something like MSC4416 would help here.
> Servers other than the joining server can validate the identity, and provide this information to clients. If they can't verify the identity, clients should not display the identity and should display a warning (but can display names, etc, as they wish).
Indeed, and the UX would look almost exactly like bluesky since we are concerned about the same thing (validity of DNS names attached to a user). See "invalid handle":
> Re-validating identities then becomes much less necessary
..but this is where we disagree. Just because you confirmed a Member Key <--> User ID link yesterday does not make the link valid tomorrow. The primary reason why this matters is for the case where the server signing key and/or member key were compromised. In this scenario, you've lost control over those members cryptographically, the only thing you have which an attacker does not have is control over the domain name. Whilst you can't invalidate events sent with the compromised key, you can unlink it from your domain to make their messages begin to soft-fail. However, you need to poll in order to do this because otherwise the attacker can artificially keep their timestamps within the existing validity period and so other servers will never re-query and see that it has now expired.
To flip it around, what are the downsides to polling:
- large amounts of bandwidth consumed? Not really, as we're only polling the server signing key which does not scale with the number of rooms or users on that server.
- lots of file descriptors used to reach out to independent domains? Sure, but you can stagger these requests / have a worker pool to ensure you never exceed a certain number of in-flight reqs at once.
- it slows down server processing as you have to wait for responses? The Member Key <--> User ID link is optional and doesn't need to sit on the critical path e.g. when joining a room, since we have a mechanism in MSC4428 to notify clients of changes independently to rooms.
- "needless" chatter when it isn't necessary (if you don't buy my compromised key scenario): there are other reasons why we will eventually need to poll (and they actually scale worse than just a signing key as it's per-room), so it's not like the general direction of travel is for zero background chatter
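As a rough illustration of the staggering / worker pool point, here is a sketch that caps the number of in-flight polls with a semaphore. `fake_poll` is a stand-in for a real signing key request and `poll_all` is a hypothetical name; neither is taken from an existing homeserver.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable

# Bookkeeping used only by the stand-in poller below.
stats = {"in_flight": 0, "peak": 0}


async def poll_all(
    servers: Iterable[str],
    poll_one: Callable[[str], Awaitable[bool]],
    max_in_flight: int = 10,
) -> Dict[str, bool]:
    """Poll every server while never exceeding `max_in_flight` concurrent requests."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def bounded(server: str):
        async with semaphore:  # waits here once the cap is reached
            return server, await poll_one(server)

    pairs = await asyncio.gather(*(bounded(s) for s in servers))
    return dict(pairs)


async def fake_poll(server: str) -> bool:
    """Stand-in for a real key request; records the peak concurrency seen."""
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    await asyncio.sleep(0.01)
    stats["in_flight"] -= 1
    return True


servers = [f"hs{i}.example" for i in range(5)]
results = asyncio.run(poll_all(servers, fake_poll, max_in_flight=2))
print(len(results), stats["peak"])
```

A real implementation would also spread the polls over time rather than starting them all at once.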
There was a problem hiding this comment.
> I don't see how this is related, but yes servers could apply some checks and refuse to take part in the join process if those checks fail. Other servers cannot blindly trust that those checks were done though, irrespective of moderation tooling since there's no guarantee the room will have said tooling. Something like MSC4416 would help here.
This is a moderation level concern - I'm not suggesting that we introduce trust of other servers here, but considering the potential lowered cost of attacking a room. And yes, although I was thinking of restricted join rules and join gates already in use by tulir, for example.
The reason why I brought this up is because I'm trying to think of a good reason to soft fail events here, and I still don't think it's a good idea.
> ..but this is where we disagree. Just because you confirmed a Member Key <--> User ID link yesterday does not make the link valid tomorrow. The primary reason why this matters is for the case where the server signing key and/or member key were compromised. In this scenario, you've lost control over those members cryptographically, the only thing you have which an attacker does not have is control over the domain name. Whilst you can't invalidate events sent with the compromised key, you can unlink it from your domain to make their messages begin to soft-fail. However, you need to poll in order to do this because otherwise the attacker can artificially keep their timestamps within the existing validity period and so other servers will never re-query and see that it has now expired.
I heavily disagree with a lot of the assumptions here.
The first thing I disagree with is the need to hide messages sent with an account key when the server key is compromised - these are two separate identities, and need to be treated differently.
The change in this MSC means that the server's identity is no longer attesting the validity of events - the member key is - but instead it's attesting the validity of the member key's claimed identity. However, the member's claimed identity doesn't have to be valid for it to participate in a room - there's no need to soft fail events for new servers when validating the identity fails.
The second thing I disagree with is the seeming conflation of expiration/non-verifiability with explicit revocation. These are things that need to be treated differently. If a key is expired, we know to stop verifying newly witnessed events/identities with it, but we can trust past events that we've witnessed. A key revocation, however, is telling us that all events/identities signed with the revoked key are potentially bad because it was compromised at some point. These are very different things. This proposal makes handling the latter case with good user experience nicer, because we only have to revoke past identities. However, we do need an explicit way of revoking keys - server keys currently (I believe there's some work on that going on in the background?) and possibly member keys with this proposal.
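Roughly, the distinction could be sketched like this. The names are made up for illustration and nothing here is specified by the MSC.

```python
from enum import Enum


class KeyState(Enum):
    VALID = "valid"
    EXPIRED = "expired"
    REVOKED = "revoked"


def trust_identity(key_state: KeyState, already_witnessed: bool) -> bool:
    """Decide whether an identity signed by a key should be trusted.

    - REVOKED: everything signed by the key is potentially bad, past or new.
    - EXPIRED: stop verifying newly witnessed identities, keep trusting past ones.
    - VALID: trust.
    """
    if key_state is KeyState.REVOKED:
        return False
    if key_state is KeyState.EXPIRED:
        return already_witnessed
    return True


print(trust_identity(KeyState.EXPIRED, already_witnessed=True))
print(trust_identity(KeyState.REVOKED, already_witnessed=True))
```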
The third thing is that polling is needed to verify current identities - the validity of the identities is only relevant when these things are read - either by the user or by the server for whatever reason. Polling is effectively a tradeoff of having a hot cache vs doing unneeded work. Mandating a behavior seems unnecessary, and limiting.
Finally, as far as I'm concerned, placing the claimed identity inside a membership event and signing the membership event means that the signature is attesting the validity of the identity claimed in the membership at the point of time of the membership. However, from what I can gather this proposal actually wants to care about the validity of the identity at the point each event is witnessed - hence periodic revalidation and soft failure of new events, right? But I think that if that's what matters then it should be stated explicitly that that is what we're doing, and rather than soft failing the event, the change in validity should be communicated to the client - both as a part of the state and as a part of the timeline. If we need to kill new events from the key, I don't think soft failing is the correct way to do things.
> large amounts of bandwidth consumed? Not really, as we're only polling the server signing key which does not scale with the number of rooms or users on that server.
> lots of file descriptors used to reach out to independent domains? Sure, but you can stagger these requests / have a worker pool to ensure you never exceed a certain number of in-flight reqs at once.
The default validity period of Synapse's keys is one day, even though the max is seven days. If we're revalidating keys, this effectively puts a performance-based cap on the number of servers we can be present in a room with while avoiding stale identity checks, even if no server is sending messages (like in a tombstoned room). This is important because some people are running homeservers on older raspberry pis or even phones. It's also a consideration for future peer to peer work - doing this on a mobile endpoint would be a bad idea for performance and battery life.
> there are other reasons why we will eventually need to poll (and they actually scale worse than just a signing key as it's per-room), so it's not like the general direction of travel is for zero background chatter
It's hard for me to evaluate this, as the relevant part of 4242 seems to state:
> This will be addressed in a future MSC.

> it slows down server processing as you have to wait for responses? The Member Key <--> User ID link is optional and doesn't need to sit on the critical path e.g. when joining a room, since we have a mechanism in MSC4428 to notify clients of changes independently to rooms.
This is a good idea, although it does raise the complexity of implementation so I don't expect all implementations to do this. It would also seemingly require soft failing events we've already sent to the client if we need to soft fail all unverified links.
There was a problem hiding this comment.
> The first thing I disagree with is the need to hide messages sent with an account key when the server key is compromised - these are two separate identities, and need to be treated differently.
This depends how important you see "claimed identity". For moderation, it is important because otherwise as you say there is a "potential lowered cost of attacking a room". Soft-failure as a general concept basically exists for moderation (to prevent backdated messages prior to a ban from being sent to clients) separate to the "validity of events", so it definitely feels like the right tool for the right purpose here. If/When we want to allow serverless users to participate (e.g. mixed P2P/federation) then this is something we can re-evaluate in a new room version. It's tough enough to change the identifier in a running federation as it is, I'd rather not increase this complexity further by enabling serverless users. I don't fundamentally oppose the idea, and yes I think it will happen, just more of a "not now". I can add something to this effect in the MSC. So to be clear, when you say "rather than soft failing the event, the change in validity should be communicated to the client - both as a part of the state and as a part of the timeline." - yes, 100% I agree, but not soft-failing materially makes it easier to attack a room (per @Gnuxie's work) which I'd rather not enable right now.
> The second thing I disagree with is the seeming conflation of the expiration/non-verifiability with explicit revocation.
Yes, there is a small difference because expired means you don't need to invalidate/expire past events. Much like with your first point, I don't oppose this, but adding correct revocation semantics to the event signing key (be it member or server) is not in-scope right now. Perhaps you've already seen the problem with this though because you allude to it with the phrase "past events". Today's model fails to accurately capture this: revocations would need to be in-DAG. I talk about this in the account portability section when talking about attestations. This is why the current proposal blows away all verification state and re-signs with a new key: we can't really do better here without making an entirely new MSC focused on revocations. I had a whole document around revocations which was elided as I thought the account portability section handled this, but I can add a dedicated section to Future Work.
> The third thing is that polling is needed to verify current identities - The validity of the identities is only relevant when these things are read - either by the user or by the server for whatever reason.
I think you're trying to make the distinction that if there's no clients reading the room, you can defer polling, or something to such effect. In practice, I don't think this distinction makes much sense because you must have at least 1 user in the room who resides on your server, and push notifications are used on most clients, so as soon as you receive an event you will need to "read" it to determine if you should be sending push notifications or not.
I disagree that polling is just "a hot cache". It's the primary way to be notified about key revocations. Today, you can mostly avoid this and "push" your new key if you ping all servers you've interacted with over federation with a new key, as that would cause a re-fetch. This assumes you know all servers you've interacted with though, which won't be the case if you've dropped your database / the domain is reused. We could decide not to handle those scenarios. In some ways we've already compromised the domain reuse scenario: the original proposal was even stricter as it treated expired keys as invalid (thus enforcing a hard cap on partition duration before identities reverted to unverified). This was changed just before publishing to fail-open instead, as introducing a hard cap when none previously existed didn't feel like the right call. Failing open weakens the domain reuse case as it only invalidates on 404 / 200 OK which omits the key, meaning if your Member Key was compromised and then the HS was decommissioned, the attacker can continue to sign events with the Member Key without being contested.
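A sketch of the fail-open behaviour described above, assuming a hypothetical poll result reduced to an HTTP status (or `None` when the server is unreachable) plus the set of key IDs present in the response. Signature checks and expiry handling are deliberately left out.

```python
from typing import Optional, Set


def link_survives_poll(status: Optional[int], returned_key_ids: Set[str], key_id: str) -> bool:
    """Return True if a previously verified link stays verified after a poll.

    Fail-open: the link is only invalidated on a 404, or on a 200 OK whose
    body omits the key. Timeouts, network errors and other statuses keep it.
    (Hypothetical helper; names and shapes are assumptions.)
    """
    if status == 404:
        return False
    if status == 200:
        return key_id in returned_key_ids
    return True  # unreachable or any other status: keep the existing state


print(link_survives_poll(None, set(), "ed25519:abc"))   # server is gone
print(link_survives_poll(200, {"ed25519:new"}, "ed25519:abc"))  # key omitted
```

The first call shows the weakness mentioned above: a decommissioned homeserver never produces a 404 or a 200, so the link is never invalidated.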
If we consciously choose not to handle those edge cases, we could avoid polling and instead push out key updates. It is just delaying the inevitable though, as I strongly suspect polling will make an appearance for eventual delivery guarantees.
Overall, I agree with basically everything you're saying. If we were designing a new protocol from scratch, we clearly would not be making the tradeoffs this MSC is making. But we aren't making a new protocol, meaning we need to balance the desire to rewrite the world and the desire to land protocol improvements. This means we shouldn't make it easier to attack the network and should not reinvent the revocation system. Unfortunately, a lot of these things end up becoming circular so it's not like "oh just land a revocation system first then" is an answer here (the tl;dr on this is we need revocations in-DAG meaning we need to agree on a single DAG which we don't do today, which means we need state DAGs, which needs self-verified events which is this MSC!).
As an aside: I'm worried I might miss something in these long responses, so please make a new comment thread for each point you're making, thanks.
| } | ||
| ``` | ||
| Upon receiving a member key request, if the user ID is known to the server, a member key for that user in that room should be created/returned. |
There was a problem hiding this comment.
Should this endpoint only return member keys with a valid link?
There was a problem hiding this comment.
Yes, that's the intent. There is concern that this exposes the user ID on the remote server hence:
> It's intentionally NOT a bulk endpoint to discourage account crawling,
This could be addressed by producing member keys for user IDs who do not exist yet, but that introduces its own set of challenges. Perhaps that is the better way though, considering that is the current behaviour today.
| * `localpart`: The claimed localpart for this Member Key. Redactable for GDPR / moderation. | ||
| * `domain`: The claimed domain for this Member Key. Not redactable as it provides routing information. |
There was a problem hiding this comment.
This might allow two members to have the same localpart and domain in the room.
Maybe that is desirable in case the member key is lost, but there should probably be some explanation of how to deal with it.
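To illustrate the collision, here is a small sketch that groups hypothetical Member Keys by their claimed `localpart` and `domain`. The data shapes are assumptions, and how a collision should be resolved is left open.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Claim = Tuple[str, str]  # (localpart, domain)


def find_duplicate_claims(members: Dict[str, Claim]) -> Dict[Claim, List[str]]:
    """Map each (localpart, domain) claimed by more than one Member Key to those keys."""
    by_claim: Dict[Claim, List[str]] = defaultdict(list)
    for member_key, claim in members.items():
        by_claim[claim].append(member_key)
    return {claim: keys for claim, keys in by_claim.items() if len(keys) > 1}


# Placeholder key names; real Member Keys would be public keys.
room_members = {
    "key_A": ("alice", "example.org"),
    "key_B": ("alice", "example.org"),  # e.g. a replacement after key_A was lost
    "key_C": ("bob", "example.org"),
}
print(find_duplicate_claims(room_members))
```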
There was a problem hiding this comment.
This is a good point. It's probably relevant for MSC4428 to address how that should be handled as well.
There was a problem hiding this comment.
I think it's more relevant for this proposal since this is the one that's actually discussing the federation part.
| Signatures on an event follow the same format as today for backwards compatibility with existing server code, but: | ||
| - the [entity](https://spec.matrix.org/v1.14/appendices/#checking-for-a-signature) signing the event is now the constant `sender`. | ||
| - the [signing key identifier](https://spec.matrix.org/v1.14/appendices/#checking-for-a-signature) is now the constant `ed25519:1`. |
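Reading the two bullets literally, the `signatures` object keeps today's nesting of entity, then key identifier, then signature, with both keys becoming constants. A minimal sketch under that reading; the helper name and the signature value are placeholders, not real data.

```python
def build_signatures(signature_b64: str) -> dict:
    """Build a `signatures` object using the constants from the proposal text.

    Entity is the literal string "sender"; key identifier is "ed25519:1".
    (Hypothetical helper; the signature itself is supplied by the caller.)
    """
    return {"sender": {"ed25519:1": signature_b64}}


# Placeholder signature value, for illustration only.
print(build_signatures("<unpadded base64 signature>"))
```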
There was a problem hiding this comment.
Did you have any thoughts on how to evolve the identifier in future? :1 sounds like it might be incremented or rotated in future?
If not, I was wondering if we could just use the public key as identifier, similar to how it's done for e.g. the MSK in /keys/query. The :1 strikes me as slightly too opaque / unintuitive otherwise.
There was a problem hiding this comment.
Member Key rotation would not be done this way because the signatures block is not consistent among all servers (it's not part of the event hash). The constant :1 was primarily used for compatibility reasons.
We could absolutely use the public key as the identifier, but it's much longer than :1, and even worse in a PQ world.
| Links MUST be periodically rechecked to allow for server signing keys to be rotated, or domains to be reused. | ||
| This means a previously verified link can revert to unverified. | ||
| Events received by Member Keys with _unverified links_ are automatically soft-failed. |
There was a problem hiding this comment.
Continuing the soft-failing topic of this thread here.
It would be incredibly strange for messages to disappear because the identity could no longer be verified so I think displaying that the handle is invalid is enough in all cases. Moderation concerns can be solved with existing tooling: either moderation bots kick members with unverified links or policy servers prevent new events from being sent while the member's identity is unverified.
There was a problem hiding this comment.
They wouldn't disappear, they would simply not appear. The distinction is that if we've already told clients about the event we won't suddenly stop telling them, but we will stop telling them about future events.
There was a problem hiding this comment.
That probably deserves clarification on its own but, as was previously mentioned, it would still be possible for newly joining servers to soft-fail messages other servers didn't. Then the history of the room looks like everyone's schizophrenic talking to apparently nobody. In some cases similar things can happen today (and have happened), and in the end those cases can self-heal, but in this case there might be no path to recovery.
I will admit I am not deeply familiar with the S-S side, but it doesn't feel like soft-failing is the right tool for dealing with volatile external information like these links anyway. In order for soft-failing to make sense there would need to be some sort of explicit revocation of the identity (just leave?) within the room state.
