feat: enforce protocols to always be valid utf8 strings#3745
thomaseizinger wants to merge 23 commits into master
@mxinden please review despite broken build, |
mxinden
left a comment
Preliminary review. Still need more time.
| @@ -158,7 +196,7 @@ pub trait InboundUpgrade<C>: UpgradeInfo { | |||
| /// method is called to start the handshake. | |||
| /// | |||
| /// The `info` is the identifier of the protocol, as produced by `protocol_info`. | |||
|
This is turning out to be pretty invasive but I think it is worth the breaking change. We reduce complexity by removing one layer of abstraction. This loses us some type safety, as mentioned in the description, but in reality this just means we can no longer do |
|
I'd be hoping that we can ship deprecations like #3807 first such that I can delete all this code and don't have to touch it in these breaking changes :) |
mxinden
left a comment
Overall direction looks good to me.
core/src/upgrade/select.rs
Outdated
| EitherName::A(info) => EitherFuture::First(self.0.upgrade_inbound(sock, info)), | ||
| EitherName::B(info) => EitherFuture::Second(self.1.upgrade_inbound(sock, info)), | ||
| fn upgrade_inbound(self, sock: C, selected_protocol: Protocol) -> Self::Future { | ||
| if self.0.protocols().any(|p| p == selected_protocol) { |
As described in your pull request, having to do this now is a bummer. This is in the hot-path. With our move to one-stream-per-request this will be executed often.
All that said, I am fine moving forward here. Based on intuition, this warrants a benchmark before introducing any optimization.
One issue is that SelectUpgrade is constructed for each upgrade, meaning it is pointless to try and cache something.
Once we get rid of upgrades for protocols, we can try and cache some information in the handler, i.e. a HashMap of Protocol to sub-handler.
I'd suggest we defer it until then. Despite being the hotpath, I think iterating a handful of protocols (which should be allocation-free) shouldn't be very slow?
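To make the trade-off above concrete, here is a minimal sketch of the two lookup strategies being discussed. Everything in it is an assumption for illustration: `Protocol` is a simplified stand-in for the PR's newtype, and `Side`, `route_by_scan` and `build_table` are hypothetical names, not rust-libp2p APIs.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the PR's `Protocol` newtype (hypothetical).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct Protocol(&'static str);

/// Which of the two combined upgrades should handle the stream (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Side {
    First,
    Second,
}

/// Per-upgrade routing as in the diff above: scan the first upgrade's
/// protocols and fall back to the second. No allocation, O(n) comparisons.
fn route_by_scan(first: &[Protocol], selected: &Protocol) -> Side {
    if first.iter().any(|p| p == selected) {
        Side::First
    } else {
        Side::Second
    }
}

/// What a long-lived handler could do instead: build the table once and
/// reuse it for every negotiated stream.
fn build_table(first: &[Protocol], second: &[Protocol]) -> HashMap<Protocol, Side> {
    let mut table = HashMap::new();
    // Insert `second` before `first` so that `first` wins on duplicates,
    // which matches the behaviour of the scan above.
    for p in second {
        table.insert(p.clone(), Side::Second);
    }
    for p in first {
        table.insert(p.clone(), Side::First);
    }
    table
}

fn main() {
    let first = [Protocol("/ipfs/ping/1.0.0"), Protocol("/ipfs/id/1.0.0")];
    let second = [Protocol("/ipfs/kad/1.0.0")];
    let table = build_table(&first, &second);

    let selected = Protocol("/ipfs/kad/1.0.0");
    assert_eq!(route_by_scan(&first, &selected), Side::Second);
    assert_eq!(table.get(&selected), Some(&Side::Second));
}
```

The table only pays off if it outlives a single upgrade, which is why caching is pointless while `SelectUpgrade` is rebuilt for each upgrade. Whether the scan is actually slow enough to matter would still need the benchmark mentioned earlier.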
An alternative that came to my mind:
Introduce a ToProtocol trait which Protocol implements by default. Our ToProtocolsIter would require an Iterator where Item implements ToProtocol.
With this abstraction we still have the simplicity for the user, namely that they can just use Protocol and that they can't mistake a Protocol for a str given that the latter is wrapped in a newtype. At the same time we maintain the functionality for e.g. the SelectUpgrade.
Just a thought. I don't feel strongly about this.
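A rough sketch of what the `ToProtocol` idea could look like. The trait name comes from the comment above and `EitherName` from the existing code in the diff. The signatures, the simplified `Protocol`, and the `negotiate` helper are assumptions made for illustration, not the actual rust-libp2p API.

```rust
/// Simplified stand-in for the PR's `Protocol` newtype (hypothetical).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct Protocol(&'static str);

/// Hypothetical trait: anything that can be viewed as a `Protocol`.
pub trait ToProtocol {
    fn to_protocol(&self) -> Protocol;
}

/// `Protocol` implements the trait itself, so users can use it directly.
impl ToProtocol for Protocol {
    fn to_protocol(&self) -> Protocol {
        self.clone()
    }
}

/// A typed wrapper that remembers which side offered the protocol.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum EitherName<A, B> {
    A(A),
    B(B),
}

impl<A: ToProtocol, B: ToProtocol> ToProtocol for EitherName<A, B> {
    fn to_protocol(&self) -> Protocol {
        match self {
            EitherName::A(a) => a.to_protocol(),
            EitherName::B(b) => b.to_protocol(),
        }
    }
}

/// Negotiation only needs `ToProtocol`. It hands back the original item, so
/// the caller can dispatch on its type instead of comparing strings again.
pub fn negotiate<I>(offered: I, remote_choice: &Protocol) -> Option<I::Item>
where
    I: IntoIterator,
    I::Item: ToProtocol,
{
    offered
        .into_iter()
        .find(|item| &item.to_protocol() == remote_choice)
}

fn main() {
    let offered: Vec<EitherName<Protocol, Protocol>> = vec![
        EitherName::A(Protocol("/ping/1.0.0")),
        EitherName::B(Protocol("/kad/1.0.0")),
    ];
    let picked = negotiate(offered, &Protocol("/kad/1.0.0"));
    assert_eq!(picked, Some(EitherName::B(Protocol("/kad/1.0.0"))));
}
```

With this shape, a combinator like `SelectUpgrade` could match on the returned `EitherName` variant rather than re-scanning `protocols()`. The cost is that the `Item` type becomes part of the user-facing signature again.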
That is effectively the same solution as we had before, just with a different name. It doesn't remove a "layer" for the user because they still have to specify the Item type.
But perhaps we should go with this path until we fully tackle: #2863
At that point, we will also be able to implement performance optimizations in the Select handler.
With the introduction of separate newtypes in multistream-select and libp2p-core, this is proving to be difficult.
On the one hand, I'd like to maintain this separation: I think multistream-select should be an implementation detail of libp2p-core and libp2p-swarm.
But with the two newtypes in place, I cannot re-create the type safety we had before, because multistream-select doesn't know about the type used in libp2p-core.
Perhaps it is too early to try and remove this abstraction. Here is a different idea:
- Resurrect "feat: replace `ProtocolName` with `AsRef<str>`" #3746
- Still introduce a `Protocol` newtype but have it in `libp2p-swarm`. This means protocols can use it and we can use it within our event types and commands
| #[derive(Clone, Debug, PartialEq, Eq)] | ||
| pub struct Protocol(Bytes); | ||
| #[derive(Clone, Debug, PartialEq, Eq, Hash)] | ||
| pub struct Protocol(Cow<'static, str>); // TODO: Instead of `Cow`, we should probably be storing `Arc<str>` |
How about we start with as is and introduce an Arc<str> if necessary? None of it is exposed, thus easy to change in a non-breaking way.
I am not fully sure. It improves performance for cases where we dynamically construct a Protocol. Ideally, users use from_static as much as possible, so it is kind of a niche use case, but I still want it to be performant.
It is not much work, so I think I am just going to do it. I don't see why a protocol string should ever be mutable, which is the only thing that becomes impossible with Arc<str>.
An Arc<str> would not be enough, right? Arc would require a heap allocation, no? So you would have Either<&'static str, Arc<String>> instead?
Yes, we would have an Either of &'static str and Arc<String>.
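A minimal sketch of the representation agreed on above, assuming a private `Repr` enum in place of an `Either` type. The names `Repr`, `from_owned` and `as_str` are hypothetical; only `from_static` is mentioned in the thread. Static strings are stored without allocating, and dynamically built ones are shared behind an `Arc`, so cloning never copies the string.

```rust
use std::hash::{Hash, Hasher};
use std::sync::Arc;

/// Private representation (hypothetical): either a borrowed static string or
/// a shared, immutable heap string.
#[derive(Clone, Debug)]
enum Repr {
    Static(&'static str),
    Owned(Arc<String>),
}

#[derive(Clone, Debug)]
pub struct Protocol(Repr);

impl Protocol {
    /// No allocation: just stores the reference.
    pub const fn from_static(s: &'static str) -> Self {
        Protocol(Repr::Static(s))
    }

    /// Moves the `String` into an `Arc`; later clones only bump a refcount.
    pub fn from_owned(s: String) -> Self {
        Protocol(Repr::Owned(Arc::new(s)))
    }

    pub fn as_str(&self) -> &str {
        match &self.0 {
            Repr::Static(s) => *s,
            Repr::Owned(s) => s.as_str(),
        }
    }
}

// Equality and hashing go through the string contents, so a static and an
// owned protocol with the same text compare equal and hash identically.
impl PartialEq for Protocol {
    fn eq(&self, other: &Self) -> bool {
        self.as_str() == other.as_str()
    }
}

impl Eq for Protocol {}

impl Hash for Protocol {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.as_str().hash(state);
    }
}

fn main() {
    let a = Protocol::from_static("/ipfs/ping/1.0.0");
    let b = Protocol::from_owned(format!("/ipfs/{}/1.0.0", "ping"));
    let c = b.clone(); // shares the same heap string as `b`
    assert_eq!(a, b);
    assert_eq!(c.as_str(), "/ipfs/ping/1.0.0");
}
```

Because the representation stays private, it could later be swapped (for example back to `Cow<'static, str>`) without a breaking change, which is the point made earlier in this thread.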
f1a9349 to
f909139
Compare
These functions were only used for some code in the interop-tests which is easily mitigated and perhaps even easier to understand now. We can thus deprecate these functions and their related types and thereby reduce the API surface of `libp2p-core` and the maintenance burden. This change is motivated by the work around making protocols always strings which requires/required updates to all these upgrades. Related #3806. Related #3271. Related #3745. Pull-Request: #3807.
|
@thomaseizinger please ping me when I should take another look. |
Will do! |
|
This pull request has merge conflicts. Could you please resolve them @thomaseizinger? 🙏 |
|
I am going with a different approach now: #3746 |
Description
Previously, a protocol could be any sequence of bytes as long as it started with `/`. Now, we directly parse a protocol as `String`, which enforces it to be valid UTF-8. This has a significant API impact. Instead of any type being able to represent a protocol, we now use a `Protocol` newtype that can only be constructed from a `String` or a `&'static str`.
Technically, this would not be necessary for protocols to always be strings. We could also only change the `ProtocolName` trait to return `&str` instead of `&[u8]`. However, both of these are breaking changes and I am of the opinion that there are currently too many abstraction layers in regards to stream upgrades. This change removes some of these layers by introducing the `UpgradeProtocols` trait, which replaces the `UpgradeInfo` trait.
We will eventually lose a bit of type safety with this approach. In particular, once we also ship #2863, types like `DeniedUpgrade` won't work anymore because each protocol is represented as the same type.
Notes & open questions
Is the change in abstraction layers good? Should we go this far, or retain the type safety of upgrades that we get from protocols being able to be any type that satisfies a trait? I am going to prototype that too and see what it is like.
This is a draft because currently, only the `multistream-select` tests compile and work. I've only touched the code necessary for that to work.
Dependencies
In order to make the implementation of this PR simpler, I've extracted several others. They are all deprecations, meaning we can release them first, then delete the code and finally merge this PR (once it is ready in itself).
- `{In,Out}boundUpgradeExt` #3807
- `OptionalUpgrade` #3806
- `libp2p-request-response`
Change checklist