-
Notifications
You must be signed in to change notification settings - Fork 721
fix(security,tls): handle no SSL/TLS shutdown from peer in server #1310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(security,tls): handle no SSL/TLS shutdown from peer in server #1310
Conversation
|
@dmg0345, please remember to bump the patch version per our readme. |
e1a16e2 to
f002a17
Compare
|
I bumped the patch version, and updated the commit message, I think commitlint will be happy now. 5 macOS jobs out of the 100+ macos/linux/windows where giving segfault / segabort in the test just added, I changed the way cURL requests are sent within the test after reading a comment of reuse of cURL handle if using https://curl.se/libcurl/c/CURLOPT_SSL_CTX_FUNCTION.html. OK to run CI again? (I can't do it from my side). |
|
The new test sends 4 requests in quick sucession, one after the other, to test that the lack of TLS shutdown of a previous request is handled properly in the server. Two macos jobs failed in the previous run also with segfaults in this test, I am not sure what is causing the segfaults in the macOS jobs, I can't reproduce it on Debian even if sending thousands of requests instead of the original four requests. I noticed that the other existing tests that use libcurl and https send a single request as part of the test, thus I created a different test that sends 20 normal requests in quick sucession to see if that exposes and isolates a different issue to this one in the macOS jobs, and simplified the test for this issue with just two request and a delay in between. Maybe this is not the final tests to deliver for the issue but hopefully it can help bring some clarity of what is going on with the macos jobs. Could we run the workflow again please? Thank you! |
|
Previous test with 20 requests in quick sucession passed ok in all jobs, the one added in this test with delay still failed in one macos job. Whatever is happening with the macos jobs seems to trigger from the new test. This time I refactored a bit the test to not use a lambda for the callback, added some additional checks for null pointers in the test and extra logging to try to figure out what might be happening from the logs.
|
|
I've noticed the following in this scenario between Linux and macOS. The trace below is for a HTTPS request without TLS shutdown in a Linux run: The trace below is for a HTTPS request without TLS shutdown in a macOS run: Note that the last error returned in macOS by SSL_read 6, SSL_ERROR_ZERO_RETURN, which stands for: The test is designed so that it does not send the close_notify alert, thus it shouldn´t be sending it, even if it was, as suggested by the error returned by SSL_read, that type of shutdown is already handled without these changes. The changes introduced in this issue to fix are not even executed in the macOS run. In Linux I haven't witnessed problems, in macOS I eventually end up with a deadly signal from SIGSEGV, SIGBUSV or SIGABRT crashing the test after some requests, I noticed the It might take a lot of digging on my side to figure out this one on the macOS implementation, I think it is not related to the issue of lack of TLS shutdown but reproducible by it, I will wrap my changes and stop the test from running in macOS as suggested fix for the TLS shutdown issue. |
ce84185 to
ae6fb08
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1310 +/- ##
==========================================
- Coverage 76.29% 76.23% -0.07%
==========================================
Files 64 65 +1
Lines 11115 10901 -214
==========================================
- Hits 8480 8310 -170
+ Misses 2635 2591 -44 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Codecov is complaining because of this fix in handling of error returned by SSL_Write server side (from <0 to <=0): But the code is not exercised, I do not have a use case to trigger this this, although it now follows the approach described in the documentation. |
|
@dmg0345 - thanks for working through this. For the crash in macOS mode, is it possible to obtain a stack trace, or even an indication where the crash occurred? (If you have a Mac, you could even build and run the test there, easier perhaps than using the github runner). In transport.cc, I notice the code doesn't call |
@dgreatwood thanks for the feedback. I have never used Mac 😅 I do not have a physical Mac nor virtualized one to debug it properly, I can only rely on the test runner if doing further debugging on that. I enabled logging and the debug logging in Github actions via compilation macros and by checking the logs that is how I found out that discrepancy above with the handling of the file descriptors. It is easily reproducible from the test in this PR, but it does not crash consistently at the same point nor with the same signals. If I remember correctly, the documentation explicitly states which error codes fill the queue with errors. Per OpenSSL documentation SSL_ERROR_ZERO_RETURN does not fill the error queue. |
|
Hi again @dmg0345 - I found a slice of time to take a closer look. So, I don't think it is a macOS Pistache bug. In fact it appears to be a libcurl/openssl bug that shows only on macOS. Specifically, I think the bug is that, if I am attaching a version of https_server_test.cc that demonstrates the problem, even though it does not invoke the Pistache library at all (it sends requests to google.com instead). I'm also pasting the key part of the code at the bottom of this message FYI. It is possible that this issue might be addressed in some way, for instance invoke libcurl in a different thread to the one in which the Pistache server is created. I'm afraid I don't have further time to experiment with that. Rather than ifdef-ing out your test completely, I'd suggest modifying your callback function as follows: and of course remove the Could try that? Thanks! Code that demonstrates the segfault without invoking Pistache: |
@dgreatwood interesting, thanks for the indepth analysis and the code to reproduce it on macos! I can already confirm that if the function to enable the quiet shutdown in the callback is commented out, the test won't crash in macOS, it is one of the things I tried to two weeks ago to attempt to find the root cause of the issue. However note that if this function is disabled or commented out, the requests in the test will sent the shutdown alert and the original issue of the PR is not exercised (handling a client request that closes without TLS shutdown alert). But with this new info provided I will also try a few things on my side too. Thanks for the feedback. |
Yep, make sense. BTW, even if you write 0 with SSL_CTX_set_quiet_shutdown (the default, i.e. quiet shutdown off), there will be still be an occasional seg fault. So it is the act of writing to the sslctx in the callback, not the quiet shutdown, which causes the issue. Likewise, if you do SSL_CTX_get_quiet_shutdown in the callback you'll sometimes see an obviously invalid value.
Great, thx. |
545506a to
e0c7e15
Compare
|
I ran the cURL requests, and even the full initialization of cURL, in a separate thread as a I ended up disabling the quiet shutdown call in the callback for macOS as suggested in the previous comments as I haven't been able to find a workaround for that environment. It is ready for review again from my side, I have updated the patch version date to today. This time I was also able to capture one of such faults in the logs of the Github runners: |
e0c7e15 to
ed5f832
Compare
|
[I thought I had added the following info a couple of days ago, but now I can't see it, perhaps I failed to submit the comment... apologies] The issue causing the intermittent macOS test crash is as follows: The default SSL library for libcurl on macOS is LibreSSL. Meanwhile, Pistache uses OpenSSL. In the test, when the The OpenSSL libcurl can be installed using: To fix the GitHub runner, in macos.yaml, we can do Meanwhile, in https_server_test.cc, in @dmg0345 - could you take a shot at the above? |
f79dcf1 to
72b9e70
Compare
Thanks for the feedback, in case you posted it before I missed it too. Yeah, I will try this. @dgreatwood I was also looking at an issue in the This was working fine two weeks ago so maybe a dependency changed, not sure what triggered it. The From what I've investigated via the Github runners, |
- Handle clearing error queue on SSL_read and SSL_write in `transport.cc` on error. - Treat `0` and `<0` as errors in SSL_read and SSL_write, use SSL_get_error for retries. - Disable sanitizer in `dump2def`. - Bump patch version. See https://docs.openssl.org/3.1/man3/SSL_read/ for error handling. See https://docs.openssl.org/3.1/man3/SSL_write/ for error handling. See pistacheio#1309 for more details.
fad3155 to
c38aab1
Compare
|
MacOS pipelines ran fine now in my fork with libcurl using OpenSSL, no more intermittent crashes (thanks @dgreatwood for the help). For the Windows 2019 pipelines that were failing, setting From my side changes are ready for review. |
[DG] Great.
[DG] dump2def is a pretty simple util, and of course is used only at Pistache build-time, not at run-time. I think it's perfectly fine if we don't use sanitizer with dump2def.
[DG] Great. I'll take a look when I get the chance, some time in the next few days. |
|
Looks good to me. Many thanks to @dmg0345 for putting this together and persistence in chasing down the question marks. |
The OpenSSL version of libcurl is required for the Pistache test https_server_test.basic_tls_requests_with_no_shutdown_from_peer. However, the default libcurl on macOS uses LibreSSL, not OpenSSL. On macOS, the convenience build scripts mesbuild.sh and mesbuilddebug.sh will now install the OpenSSL libcurl using brew, if it is not already present, and modify PKG_CONFIG_PATH if needed. See pistacheio#1310 (comment) for more details
transport.ccon error.0and<0as errors in SSL_read and SSL_write, use SSL_get_error for retries.See https://docs.openssl.org/3.1/man3/SSL_read/ for error handling.
See https://docs.openssl.org/3.1/man3/SSL_write/ for error handling.
See #1309 for more details.