-
-
Notifications
You must be signed in to change notification settings - Fork 33.7k
Fix handling of HTTP upgrades with bodies #60016
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Review requested:
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #60016 +/- ##
==========================================
- Coverage 88.58% 88.58% -0.01%
==========================================
Files 704 704
Lines 207858 207961 +103
Branches 40054 40073 +19
==========================================
+ Hits 184131 184219 +88
- Misses 15760 15769 +9
- Partials 7967 7973 +6
🚀 New features to boost your workflow:
|
mcollina
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
|
See also de4600e. It is an old fix for upgrade responses, but I think it is related because the behavior should be the same? |
@lpinca that bug is related to client parsing of server responses, whereas this change is only for server parsing of requests, so they're not directly related, and I'm not really sure how the same issue would apply in this scenario. There's no case where the server should receive the upgraded protocol data before it has handled the request and responded anyway (because you have to wait for the 101 response to confirm the upgrade). In general, the behaviour should not actually be the same for body parsing in these two cases. When a server is parsing a client upgrade request, there could possibly be a body, because any HTTP request of any kind could include upgrade headers. When a client is parsing a 101 response though, there can never be a body (1XX never include a body), which is why telling the parser to skip the body is correct in that fix. That previous bug seems to occur because the headers imply there is a body (transfer-encoding: chunked) even though it's not actually allowed and no valid chunk is actually sent. Maybe I'm missing something though - can you describe the kind of request that this change would parse wrong? I have tried tweaking the test here, and it works fine when using transfer-encoding: chunked and changing the first byte to be >127. |
fabde34 to
db9387a
Compare
db9387a to
00202ef
Compare
|
I've just pushed a new commit to fix the issue discussed above, and also rebased onto main. With this 2nd commit, this change now subtly changes the semantics of The issue comes when a request body is sent slowly. Previously, we always emitted This is very confusing and unhelpful. It's an edge case (most upgrade requests won't have a body, especially for the common case of websockets) but it's a perfectly valid one. The only reasonable solution I can see is to defer the The updated test covers this case. To compare that to the same scenario on current node, right now we fire upgrade immediately, and the request body appears split between |
If I understand correctly, the issue here is that you can't read the request body until the |
Yes, for some reason I thought that would be fine and the parser would buffer this somehow, but I did more testing and that does indeed run into issues. Past a certain point (about 85KB on my machine) the stream stops flowing, presumably because the buffers are full, so the body never completes and Not really sure how to resolve this:
Some options I can see:
It's very tempting to ignore the request body entirely (option 1) and for websockets (easily the most common case) it's irrelevant. Technically though you could upgrade to any protocol you want, and the RFC expects you to then handle the original request (implicitly including its body) as if it was sent over the new protocol, so it's not unreasonable to actually be able to access the request body somehow. |
|
I don't know. Discarding the data without being able to access it, does not seem good. It is worse than the current behavior. Currently the user can read the request body. For example, the user can check the |
Nobody does this though. WS for example inherits the equivalent bug - if I send a websocket upgrade request with a body, WS will parses the request body as part of websocket stream, and fails to start the connection. Nobody should send that request, since it's a GET with a body, but it's as valid as any other GET-with-body and people do weird things. I think I've reduced my options to:
I'm leaning towards option 2 I think - it's more complicated, but cleaner and more effective, and gives us a sensible API: you listen for |
Yes, but it can be done. The data is accessible. By properly inspecting the request headers the developer can ignore the whole request body. WS implementations do not care much because exchanging data before the opening handshake completes is forbidden by the spec. I have no opinions, but keeping the raw socket and the ability to read it directly seems sensible. |
|
@lpinca I'm reasonably confident I now have a plan that fixes everything 🤞 The current state of things:
I think this preserves existing behaviour in all cases without request bodies, exactly as it was. In the rare cases where you do have a request body, this does change the data you'll see on the socket (which is the point - this fixes the bug). It does not buffer indefinitely or block until you stream the request body though - instead, at the moment you start reading the upgrade stream, it starts the request body flowing automatically. That means if you just read the upgrade stream and ignore the request body (probably common) everything will work as expected. If you try to read both, everything will still work as expected. The only bad case is if you try to start streaming the upgrade stream immediately, but then stream the request body later on - you'll find it's already gone. I think this is a) unavoidable and b) doesn't work in the current implementation anyway, so nobody will be doing that successfully today. I've bulked up the new with-body test to cover various different cases here as well. To try to make sure this is non-breaking, I've also manually tested enabled --- a/lib/_http_server.js
+++ b/lib/_http_server.js
@@ -1076,7 +1076,9 @@ function onParserExecuteCommon(server, socket, parser, state, ret, d) {
- // If the request is complete (no body, or all body read upfront) then
- // we just emit the socket directly as the upgrade stream.
- upgradeStream = socket;
+ upgradeStream = new UpgradeStream(socket, req);
+ upgradeStream.requestBodyCompleted(undefined);What do you think? |
6fb8e66 to
0a97dda
Compare
|
Force pushed to rebase & resolve conflicts with latest main |
| } | ||
|
|
||
| _write(chunk, encoding, callback) { | ||
| this[kSocket].write(chunk, encoding, callback); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I remember correctly the callback is always called asynchronously even if the write is synchronous so there will be a small overhead over a raw net.Socket in case of multiple consecutive sync writes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there is definitely a small performance hit here. I doubt it will be significant for most cases, but it's not zero.
I'm open to ideas to improve that perf, but I think it's worth it for the correctness in this case, and the intention is that it only applies in this upgrade-with-body case, so it doesn't affect any non-body upgrade requests at all.
|
|
Ethan-Arrowood
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice work!
Any more thoughts on this @lpinca? I think this is ready to merge if you're happy, but it'd be good to confirm the plan here, or decide on & implement the alternative (most likely imo: remove the proxy entirely, so that arg 2 is a simple upgrade stream if a request body is present, not a socket at all - meaning more breakage, but less magic internally). |
|
I would like to remove the proxy. The documentation clarifies when the |
613d9a4 to
21f8245
Compare
Previously, we ignored all indicators of the body (content-length or transfer-encoding headers) and treated any information following the headers as part of the upgraded stream. We now fully process the requests bodies separately instead, allowing us to automatically handle correct parsing of the body like any other HTTP request. This is a breaking change if you are currently accepting HTTP Upgrade requests with request bodies successfully, or if you use socket-specific fields & methods on the upgraded stream argument. In the former case, before now you will have received the request body and then the upgraded data on the same stream without any distinction or HTTP parsing applied. Now, you will need to separately read the request body from the request (the 1st argument) and the upgraded data from the upgrade stream (the 2nd argument). If you're not interested in request bodies, you can continue to just read from the upgrade stream directly. In the latter case, if you want to access the raw socket, you should do so via request.socket, instead of expecting the 2nd argument to be a socket.
21f8245 to
6798b70
Compare
|
Alright, I'm on board @lpinca - let's break things now for the greater good instead of dancing around it. I've pushed a fix to do exactly that (updating the docs to match), and also rebased & updated the commit message to document this. I've separately tested manually what happens if I use this upgrade stream for all cases (i.e. all the non-request-body tests too) and the current breakage there is:
If I skip those assertions, everything else works functionally ok. In practice I think there will be some breakage in the real world, but very little. This shouldn't affect anybody handling upgrades without request bodies (=99.9% of use cases imo), and for the few cases where request bodies are received most stream parsing code will already be broken anyway. Other than fixing the stream content, nearly everything else shouldn't break even in that scenario - it's only very socket-specific & unusual things (reading local/remote address details after upgrade, passing sockets around via |
|
@nodejs/tsc: as semver-major, this needs two approvals to confirm the API-breaking fix we've settled on here. The PR description is a reasonable summary, my last comment covers the likely breakage in practice. |
RafaelGSS
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
mcollina
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
mcollina
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a request change here because I fear this would add significant overhead to all websockets.
|
There is no impact for WebSocket implementations as they continue to receive a raw socket. The spec specifies that no data should be exchanged until the handshake is complete. Only implementations where the initial GET request has a body and the |
mcollina
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
|
Landed in 4346c0f |
Fixes #58394
Previously, when processing and accepting a Upgrade request, we ignored all indicators of a body in the request (content-length or transfer-encoding headers) and treated any information following the headers as part of the upgraded stream itself.
This was not correct. An HTTP request to upgrade can have a body, and it will always indicate this with standard headers to do so. If the request had a valid HTTP body, you shouldn't treat it as part of the new protocol stream you've upgraded to.
With this change, we now fully process the requests bodies separately instead, allowing us to automatically handle correct parsing of the body like any other HTTP request.
Fixing this is a matter of persuading llhttp to parse the body like normal. The llhttp return values for on_headers_complete (i.e. this
parserOnIncomingfunction) are (docs):The current Node code for this basically assumes that
2is required for to finish parsing an accepted upgrade or CONNECT, but as far as I can tell (based on this) that's not true. In this case, the upgrade flag is already set (that'sreq.upgradehere) and that's what controls whether the upgrade happens after the message is completed (here). Setting2would set the upgrade flag to true (but it's already true) and set SKIPBODY to true (which is what we're trying to fix here, and isn't required because this condition skips the body for CONNECT or no-body-headers-present for upgrades anyway).This is a breaking change if you are currently accepting HTTP Upgrade requests with request bodies successfully, or if you use socket-specific fields & methods on the upgraded stream argument.
In the former case, before now you will have received the request body and then the upgraded data on the same stream without any distinction or HTTP parsing applied. Now, you will need to separately read the request body from the request (the 1st argument) and the upgraded data from the upgrade stream (the 2nd argument). If you're not interested in request bodies, you can continue to just read from the upgrade stream directly.
In the latter case, if you want to access the raw socket, you should do so via request.socket, instead of expecting the 2nd argument to be a socket.
Separately, it's debatable whether we should also actually support bodies on CONNECT requests. Even with this change, we currently don't. That would require a small change to llhttp. The latest HTTP RFCs say:
but also
so to my mind it's somewhat unspecified (as opposed to Upgrade requests, where this isn't ambiguous at all imo). Opinions welcome on whether to modify llhttp to support that.