@dmg0345
Contributor

@dmg0345 dmg0345 commented Apr 16, 2025

  • Handle clearing error queue on SSL_read and SSL_write in transport.cc on error.
  • Treat 0 and <0 as errors in SSL_read and SSL_write, use SSL_get_error for retries.
  • Bump patch version.

See https://docs.openssl.org/3.1/man3/SSL_read/ for error handling.
See https://docs.openssl.org/3.1/man3/SSL_write/ for error handling.
See #1309 for more details.
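The first two bullets can be sketched as a small decision step. This is a hypothetical, self-contained sketch, not the actual `transport.cc` code: `IoAction` and `classifyTlsIo` are illustrative names, and the `kSslError*` constants only mirror the numeric values of OpenSSL's `SSL_ERROR_*` macros (real code would include `<openssl/ssl.h>` and use the macros). The OpenSSL `SSL_get_error` documentation notes that the thread's error queue must be empty before a TLS I/O call, otherwise `SSL_get_error()` will not work reliably, which is why the fatal path is paired with `ERR_clear_error()`.

```cpp
// Local mirrors of the SSL_ERROR_* values from <openssl/ssl.h>, so the sketch
// compiles without OpenSSL. Real code should use the macros themselves.
constexpr int kSslErrorSsl = 1;         // SSL_ERROR_SSL
constexpr int kSslErrorWantRead = 2;    // SSL_ERROR_WANT_READ
constexpr int kSslErrorWantWrite = 3;   // SSL_ERROR_WANT_WRITE
constexpr int kSslErrorSyscall = 5;     // SSL_ERROR_SYSCALL
constexpr int kSslErrorZeroReturn = 6;  // SSL_ERROR_ZERO_RETURN

// What the caller should do after an SSL_read() or SSL_write() call.
enum class IoAction {
    Data,        // ret > 0: ret bytes were transferred
    Retry,       // wait until the socket is readable/writable, then call again
    PeerClosed,  // close_notify received: orderly TLS shutdown by the peer
    Fatal        // connection unusable: ERR_clear_error(), then close it
};

// ret:      return value of SSL_read()/SSL_write()
// sslError: SSL_get_error(ssl, ret); only consulted when ret <= 0
constexpr IoAction classifyTlsIo(int ret, int sslError) {
    if (ret > 0) {
        return IoAction::Data;
    }
    // 0 and < 0 are both failures; SSL_get_error() decides whether to retry.
    switch (sslError) {
    case kSslErrorWantRead:
    case kSslErrorWantWrite:
        return IoAction::Retry;
    case kSslErrorZeroReturn:
        return IoAction::PeerClosed;
    case kSslErrorSsl:  // OpenSSL 3.x reports an EOF without close_notify here
    case kSslErrorSyscall:
    default:
        return IoAction::Fatal;
    }
}
```

For example, the Linux trace later in this thread shows `SSL_read` returning 73 (Data), then -1 with error 2 (Retry), then 0 with error 1 (Fatal), while the macOS trace ends with 0 and error 6 (PeerClosed).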

@kiplingw
Member

@dmg0345, please remember to bump the patch version per our readme.

@dmg0345 dmg0345 force-pushed the feature/tls_shutdown_from_peer_handling branch 2 times, most recently from e1a16e2 to f002a17 on April 16, 2025 17:28
@dmg0345 dmg0345 changed the title [#1309] Handle no SLL shutdown notification from peer in server fix(security,tls): handle no SSL/TLS shutdown from peer in server Apr 16, 2025
@dmg0345
Contributor Author

dmg0345 commented Apr 16, 2025

I bumped the patch version and updated the commit message; I think commitlint will be happy now.

Five macOS jobs out of the 100+ macOS/Linux/Windows jobs were failing with a segfault / abort in the newly added test. After reading a comment about reusing the cURL handle when using https://curl.se/libcurl/c/CURLOPT_SSL_CTX_FUNCTION.html, I changed the way cURL requests are sent within the test.

OK to run CI again? (I can't do it from my side).

@dmg0345
Contributor Author

dmg0345 commented Apr 16, 2025

The new test sends four requests in quick succession, one after the other, to check that the server properly handles a previous request that ended without a TLS shutdown.

Two macOS jobs also failed in the previous run with segfaults in this test. I am not sure what is causing them; I can't reproduce it on Debian, even when sending thousands of requests instead of the original four.

I noticed that the other existing tests that use libcurl and HTTPS send a single request per test, so I created a separate test that sends 20 normal requests in quick succession, to see whether that exposes and isolates a different issue in the macOS jobs. I also simplified the test for this issue to just two requests with a delay in between.

Maybe these are not the final tests to deliver for the issue, but hopefully they help clarify what is going on with the macOS jobs. Could we run the workflow again please? Thank you!

@dmg0345
Contributor Author

dmg0345 commented Apr 17, 2025

The previous test with 20 requests in quick succession passed in all jobs, while the test added for this issue (with the delay) still failed in one macOS job. Whatever is happening in the macOS jobs seems to be triggered by the new test.

This time I refactored the test a bit so it does not use a lambda for the callback, and added some additional null-pointer checks and extra logging to try to figure out from the logs what might be happening.

Could we trigger it again? Never mind, I can run CI from my fork; I will try there.

@dmg0345
Contributor Author

dmg0345 commented Apr 18, 2025

I've noticed the following difference between Linux and macOS in this scenario.

The trace below is for an HTTPS request without TLS shutdown in a Linux run:

2025-04-18T13:17:36.2994281Z 18 13:17:36 DBG (7f015ee006c0 PSTCH) transport.cc:321 in handleIncoming(): SSL_read
2025-04-18T13:17:36.2994884Z 18 13:17:36 DBG (7f015ee006c0 PSTCH) transport.cc:370 in handleIncoming(): Fd 16, bytes read 73, totalBytes 0, retry false,err 0 

2025-04-18T13:17:36.3003158Z 18 13:17:36 DBG (7f015ee006c0 PSTCH) transport.cc:321 in handleIncoming(): SSL_read
2025-04-18T13:17:36.3003671Z 18 13:17:36 DBG (7f015ee006c0 PSTCH) transport.cc:331 in handleIncoming(): SSL_read err 2, bytes -1
2025-04-18T13:17:36.3004374Z 18 13:17:36 DBG (7f015ee006c0 PSTCH) transport.cc:370 in handleIncoming(): Fd 16, bytes read -1, totalBytes 0, retry true,err 11 Resource temporarily unavailable

2025-04-18T13:17:36.3073380Z 18 13:17:36 DBG (7f015ee006c0 PSTCH) transport.cc:321 in handleIncoming(): SSL_read
2025-04-18T13:17:36.3074518Z 18 13:17:36 DBG (7f015ee006c0 PSTCH) transport.cc:331 in handleIncoming(): SSL_read err 1, bytes 0
2025-04-18T13:17:36.3075097Z 18 13:17:36 DBG (7f015ee006c0 PSTCH) transport.cc:337 in handleIncoming(): SSL_read clearing error queue, last 0x0A000126
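
The last Linux entry, `0x0A000126`, is an OpenSSL 3.x packed error code. As a sketch of how to read it, the snippet below mirrors the bit layout used by `ERR_GET_LIB()`/`ERR_GET_REASON()` in `<openssl/err.h>`; the constant and function names are illustrative, and real code would simply call those macros or `ERR_error_string_n()`. Decoded, the code is library 20 (`ERR_LIB_SSL`) with reason 294 (`SSL_R_UNEXPECTED_EOF_WHILE_READING`, "unexpected eof while reading"), i.e. the peer closed the socket without sending close_notify, which is the case this PR handles.

```cpp
#include <cstdint>

// Bit layout of OpenSSL 3.x packed error codes, mirrored for illustration
// (see ERR_GET_LIB / ERR_GET_REASON in <openssl/err.h>).
constexpr std::uint32_t kSystemFlag = 0x80000000u;  // ERR_SYSTEM_FLAG
constexpr std::uint32_t kLibOffset = 23;            // ERR_LIB_OFFSET
constexpr std::uint32_t kLibMask = 0xFFu;           // ERR_LIB_MASK
constexpr std::uint32_t kReasonMask = 0x7FFFFFu;    // ERR_REASON_MASK
constexpr int kLibSys = 2;                          // ERR_LIB_SYS
constexpr int kLibSsl = 20;                         // ERR_LIB_SSL

// Library that raised the error.
constexpr int errGetLib(std::uint32_t code) {
    if (code & kSystemFlag) {
        return kLibSys;  // system errors store errno in the low bits instead
    }
    return static_cast<int>((code >> kLibOffset) & kLibMask);
}

// Library-specific reason code (or errno for system errors).
constexpr int errGetReason(std::uint32_t code) {
    if (code & kSystemFlag) {
        return static_cast<int>(code & ~kSystemFlag);
    }
    return static_cast<int>(code & kReasonMask);
}

// 0x0A000126 from the Linux trace: SSL library, reason 294.
static_assert(errGetLib(0x0A000126u) == kLibSsl, "SSL library");
static_assert(errGetReason(0x0A000126u) == 294, "unexpected eof while reading");
```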

The trace below is for an HTTPS request without TLS shutdown in a macOS run:

2025-04-18T13:17:04.5351090Z 18 13:17:03 DBG (016b66f0 PSTCH) transport.cc:321 in handleIncoming(): SSL_read
2025-04-18T13:17:04.5351330Z 18 13:17:03 DBG (016b66f0 PSTCH) transport.cc:370 in handleIncoming(): Fd 0x6000005e4720, bytes read 73, totalBytes 0, retry false,err 0 

2025-04-18T13:17:04.5354270Z 18 13:17:03 DBG (016b66f0 PSTCH) transport.cc:321 in handleIncoming(): SSL_read
2025-04-18T13:17:04.5354430Z 18 13:17:03 DBG (016b66f0 PSTCH) transport.cc:331 in handleIncoming(): SSL_read err 2, bytes -1
2025-04-18T13:17:04.5354730Z 18 13:17:03 DBG (016b66f0 PSTCH) transport.cc:370 in handleIncoming(): Fd 0x6000005e4720, bytes read -1, totalBytes 0, retry true,err 35 Resource temporarily unavailable

2025-04-18T13:17:04.5476970Z 18 13:17:03 DBG (016b66f0 PSTCH) transport.cc:321 in handleIncoming(): SSL_read
2025-04-18T13:17:04.5477120Z 18 13:17:03 DBG (016b66f0 PSTCH) transport.cc:331 in handleIncoming(): SSL_read err 6, bytes 0
2025-04-18T13:17:04.5477360Z 18 13:17:03 DBG (016b66f0 PSTCH) transport.cc:370 in handleIncoming(): Fd 0x6000005e4720, bytes read 0, totalBytes 0, retry false,err 0 

Note that the last error returned by SSL_read on macOS is 6, SSL_ERROR_ZERO_RETURN, which stands for:

SSL_ERROR_ZERO_RETURN

The TLS/SSL peer has closed the connection for writing by sending the close_notify alert. No more data can be read. Note that SSL_ERROR_ZERO_RETURN does not necessarily indicate that the underlying transport has been closed.

This error can also appear when the option SSL_OP_IGNORE_UNEXPECTED_EOF is set. See SSL_CTX_set_options(3) for more details.

The test is designed not to send the close_notify alert, so the client shouldn't be sending it. Even if it were, as the error returned by SSL_read suggests, that type of shutdown is already handled without these changes. The code paths this PR introduces to fix the issue are not even executed in the macOS run.

On Linux I haven't seen any problems. On macOS, the test eventually crashes after some requests with a deadly signal (SIGSEGV, SIGBUS or SIGABRT). I noticed the eventmeth and event code for the macOS implementation and how it mimics file descriptors; is it possible that the macOS libevent implementation is somehow mixing up events, or similar, in this particular scenario?

It might take a lot of digging on my side to figure this one out in the macOS implementation. I think it is not related to the lack of TLS shutdown, only reproducible by it, so I will wrap up my changes and stop the test from running on macOS as the suggested fix for the TLS shutdown issue.

@dmg0345 dmg0345 force-pushed the feature/tls_shutdown_from_peer_handling branch from ce84185 to ae6fb08 on April 18, 2025 15:03
@codecov

codecov bot commented Apr 18, 2025

Codecov Report

Attention: Patch coverage is 71.42857% with 4 lines in your changes missing coverage. Please review.

Project coverage is 76.23%. Comparing base (31ef837) to head (c38aab1).
Report is 4 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/common/transport.cc | 71.42% | 4 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1310      +/-   ##
==========================================
- Coverage   76.29%   76.23%   -0.07%     
==========================================
  Files          64       65       +1     
  Lines       11115    10901     -214     
==========================================
- Hits         8480     8310     -170     
+ Misses       2635     2591      -44     


@dmg0345
Contributor Author

dmg0345 commented Apr 19, 2025

Codecov is complaining because of this fix in the handling of the error returned by SSL_write on the server side (from <0 to <=0):

<= 0

The write operation was not successful, because either the connection was closed, an error occurred or action must be taken by the calling process. Call SSL_get_error() with the return value ret to find out the reason.

Old documentation indicated a difference between 0 and -1, and that -1 was retryable. You should instead call SSL_get_error() to find out if it's retryable.

But the code is not exercised: I do not have a use case that triggers this, although it now follows the approach described in the documentation.

@dgreatwood
Collaborator

@dmg0345 - thanks for working through this.

For the crash in macOS mode, is it possible to obtain a stack trace, or even an indication where the crash occurred? (If you have a Mac, you could even build and run the test there, easier perhaps than using the github runner).

In transport.cc, I notice the code doesn't call ERR_clear_error in the SSL_ERROR_ZERO_RETURN case. Perhaps it should?

@dmg0345
Contributor Author

dmg0345 commented May 1, 2025

@dgreatwood thanks for the feedback.

I have never used a Mac 😅 I don't have a physical or virtualized Mac to debug it properly, so I can only rely on the test runner for further debugging.

I enabled logging and debug logging in GitHub Actions via compilation macros, and by checking the logs I found the discrepancy above in the handling of file descriptors. It is easily reproducible with the test in this PR, but it does not crash consistently at the same point or with the same signal.

If I remember correctly, the documentation explicitly states which error codes fill the error queue, and per the OpenSSL documentation SSL_ERROR_ZERO_RETURN does not.

@dgreatwood
Collaborator

Hi again @dmg0345 -

I found a slice of time to take a closer look.

So, I don't think it is a macOS Pistache bug. In fact it appears to be a libcurl/openssl bug that shows only on macOS.

Specifically, I think the bug is that, if SSL_CTX_new is called from the same thread that is using libcurl, and libcurl uses CURLOPT_SSL_CTX_FUNCTION, then occasionally a bad sslctx pointer is passed to the CURLOPT_SSL_CTX_FUNCTION callback; and if the callback attempts to write to the sslctx - for instance, by calling SSL_CTX_set_quiet_shutdown - then sometimes a segfault may follow later, probably because of heap/memory corruption.

I am attaching a version of https_server_test.cc that demonstrates the problem, even though it does not invoke the Pistache library at all (it sends requests to google.com instead). I'm also pasting the key part of the code at the bottom of this message FYI.

It is possible that this issue might be addressed in some way, for instance by invoking libcurl in a different thread from the one in which the Pistache server is created. I'm afraid I don't have further time to experiment with that.

Rather than ifdef-ing out your test completely, I'd suggest modifying your callback function as follows:

    const auto sslctxCallback = +[](CURL* /* curl */, void* sslctx, void* /* parm */) -> CURLcode {
        // Enable quiet shutdown so that we do not send any shutdown notification to server from peer.
#if !defined(__APPLE__)
        // In macOS, setting this flag causes a seg fault intermittently,
        // seemingly because a bad sslctx is being passed in by
        // libcurl/openssl. See discussion in:
        //   https://github.com/pistacheio/pistache/pull/1310
        // @May/2025.
        SSL_CTX_set_quiet_shutdown(reinterpret_cast<SSL_CTX*>(sslctx), 1);
#endif
        return CURLE_OK;
    };

and of course remove the #if !defined(__APPLE__) that currently surrounds the TEST(https_server_test, basic_tls_requests_with_no_shutdown_from_peer).

Could you try that?

Thanks!


Code that demonstrates the segfault without invoking Pistache:

TEST(https_server_test, basic_tls_requests_with_no_shutdown_from_peer)
{
    CURLcode res = curl_global_init(CURL_GLOBAL_DEFAULT);

    ASSERT_EQ(res, CURLE_OK);

    const auto sslctxCallback = +[](CURL* /* curl */, void* sslctx, void* /* parm */) -> CURLcode {
        // Enable quiet shutdown so that we do not send any shutdown notification to server from peer.

        SSL_CTX_set_quiet_shutdown(reinterpret_cast<SSL_CTX*>(sslctx), 1L);
        return CURLE_OK;
    };

    SSL_load_error_strings();
    OpenSSL_add_ssl_algorithms();

    const SSL_METHOD* method = SSLv23_server_method();
    if (!method)
    {
        throw std::runtime_error("Cannot setup SSL context");
    }

    SSL_CTX * ssl_ctx_ptr = SSL_CTX_new(method);
    if (!ssl_ctx_ptr)
    {
        throw std::runtime_error("Cannot setup SSL context");
    }

    const std::string url("https://www.google.com:443");
    std::string buffer_no_shutdown;

    // Perform request with quiet shutdown and ensure it is handled.
    CURL* curl = curl_easy_init();
    ASSERT_NE(curl, nullptr);

    curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
    curl_easy_setopt(curl, CURLOPT_CAINFO, "./certs/rootCA.crt");
    curl_easy_setopt(curl, CURLOPT_SSL_VERIFYHOST, 0L);
    // write_cb and CSO_WIN_REVOKE_BEST_EFFORT are helpers defined elsewhere in the test sources.
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, &write_cb);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &buffer_no_shutdown);

    CSO_WIN_REVOKE_BEST_EFFORT;
    curl_easy_setopt(curl, CURLOPT_SSL_CTX_FUNCTION, sslctxCallback);

    res = curl_easy_perform(curl);

    EXPECT_EQ(res, CURLE_OK);

    curl_easy_cleanup(curl);

    buffer_no_shutdown.clear();

    // Release the server-side context created above.
    SSL_CTX_free(ssl_ctx_ptr);

    curl_global_cleanup();
}

https_server_test.txt

@dmg0345
Contributor Author

dmg0345 commented May 3, 2025

@dgreatwood interesting, thanks for the in-depth analysis and the code to reproduce it on macOS!

I can already confirm that if the call that enables quiet shutdown in the callback is commented out, the test won't crash on macOS; it is one of the things I tried two weeks ago while attempting to find the root cause. However, note that if this call is disabled or commented out, the requests in the test will send the shutdown alert, and the original issue of the PR (handling a client request that closes without a TLS shutdown alert) is not exercised.

But with this new info, I will also try a few things on my side.

Thanks for the feedback.

@dgreatwood
Collaborator

I can already confirm that if the call that enables quiet shutdown in the callback is commented out, the test won't crash on macOS; it is one of the things I tried two weeks ago while attempting to find the root cause.

Yep, makes sense. BTW, even if you write 0 with SSL_CTX_set_quiet_shutdown (the default, i.e. quiet shutdown off), there will still be an occasional segfault. So it is the act of writing to the sslctx in the callback, not the quiet shutdown itself, that causes the issue. Likewise, if you call SSL_CTX_get_quiet_shutdown in the callback, you'll sometimes see an obviously invalid value.

But with this new info, I will also try a few things on my side.

Great, thx.

@dmg0345 dmg0345 force-pushed the feature/tls_shutdown_from_peer_handling branch from 545506a to e0c7e15 on May 7, 2025 22:45
@dmg0345
Contributor Author

dmg0345 commented May 7, 2025

I ran the cURL requests, and even the full initialization of cURL, in a separate thread as a std::packaged_task, and also played a bit with some cURL options (CURLOPT_FORBID_REUSE, CURLOPT_FRESH_CONNECTION), but the segmentation fault still occurs on macOS.
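Running the client side on its own thread, as described above, can be sketched with `std::packaged_task`. `performRequests` and `runOnClientThread` are hypothetical names: `performRequests` is a placeholder for the cURL work (global init, easy handles, `curl_easy_perform`, cleanup) and does no networking here, so the sketch only shows the threading pattern, not the test's actual code.

```cpp
#include <cassert>
#include <future>
#include <thread>
#include <utility>

// Hypothetical placeholder for the client work done in the test (cURL global
// init, easy handle setup, curl_easy_perform, cleanup). It performs no I/O
// here and simply reports every "request" as successful.
int performRequests(int count) {
    assert(count >= 0);
    int succeeded = 0;
    for (int i = 0; i < count; ++i) {
        ++succeeded;  // a real version would check curl_easy_perform() == CURLE_OK
    }
    return succeeded;
}

// Runs performRequests() on a dedicated thread and returns its result.
int runOnClientThread(int count) {
    std::packaged_task<int(int)> task(performRequests);
    std::future<int> result = task.get_future();
    std::thread worker(std::move(task), count);
    // Join first: if the task threw, the exception is stored in the future and
    // rethrown by get() only after the thread is no longer joinable.
    worker.join();
    return result.get();
}
```

Joining before `get()` avoids `std::terminate` being called from the destructor of a still-joinable `std::thread` if the task's exception propagates.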

I ended up disabling the quiet shutdown call in the callback for macOS, as suggested in the previous comments, since I haven't been able to find a workaround for that environment.

It is ready for review again from my side; I have updated the patch version date to today.

This time I was also able to capture one of these faults in the logs of the GitHub runners:

UndefinedBehaviorSanitizer:DEADLYSIGNAL
==8870==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address 0x000000000009 (pc 0x00010ba54b96 bp 0x7ff7b5130f00 sp 0x7ff7b5130ee0 T32927)
==8870==The signal is caused by a READ memory access.
==8870==Hint: address points to the zero page.
    #0 0x10ba54b96 in alg_cleanup+0x18 (libcrypto.3.dylib:x86_64+0x212b96)
    #1 0x10b97e59e in sa_doall+0xc2 (libcrypto.3.dylib:x86_64+0x13c59e)
    #2 0x10ba54b43 in ossl_method_store_free+0x25 (libcrypto.3.dylib:x86_64+0x212b43)
    #3 0x10b96a616 in context_deinit_objs+0x6d (libcrypto.3.dylib:x86_64+0x128616)
    #4 0x10b969e17 in context_deinit+0x1a (libcrypto.3.dylib:x86_64+0x127e17)
    #5 0x10b969de7 in ossl_lib_ctx_default_deinit+0x18 (libcrypto.3.dylib:x86_64+0x127de7)
    #6 0x10b96d5eb in OPENSSL_cleanup+0xd2 (libcrypto.3.dylib:x86_64+0x12b5eb)
    #7 0x7ff80f808ba7 in __cxa_finalize_ranges+0x19f (libsystem_c.dylib:x86_64+0x2aba7)
    #8 0x7ff80f8089ba in exit+0x22 (libsystem_c.dylib:x86_64+0x2a9ba)
    #9 0x7ff80f94e8d2 in dyld4::LibSystemHelpers::exit(int) const+0xa (libdyld.dylib:x86_64+0x118d2)
    #10 0x7ff80f5dc44c in start+0x79c (dyld:x86_64+0xfffffffffff6e44c)

==8870==Register values:
rax = 0x0000000000000001  rbx = 0x0000000000000009  rcx = 0x0000000000000001  rdx = 0x00006000023dc420  
rdi = 0x0000000000000009  rsi = 0x0000000000000001  rbp = 0x00007ff7b5130f00  rsp = 0x00007ff7b5130ee0  
 r8 = 0x0000000000000400   r9 = 0x00000000000003ff  r10 = 0x00000000000000b0  r11 = 0x0000600002fdb4bc  
r12 = 0x0000600002df5e20  r13 = 0x0000000000000009  r14 = 0x00006000023dc420  r15 = 0x0000000000000001  
UndefinedBehaviorSanitizer can not provide additional info.
SUMMARY: UndefinedBehaviorSanitizer: SEGV (libcrypto.3.dylib:x86_64+0x212b96) in alg_cleanup+0x18
==8870==ABORTING

@dmg0345 dmg0345 force-pushed the feature/tls_shutdown_from_peer_handling branch from e0c7e15 to ed5f832 on May 11, 2025 16:30
@dgreatwood
Collaborator

[I thought I had added the following info a couple of days ago, but now I can't see it, perhaps I failed to submit the comment... apologies]

The issue causing the intermittent macOS test crash is as follows:

The default SSL library for libcurl on macOS is LibreSSL, whereas Pistache uses OpenSSL. In the test, when libcurl calls the CURLOPT_SSL_CTX_FUNCTION callback, the sslctx pointer passed to the callback points to a LibreSSL SSL_CTX. However, the callback then passes that pointer to SSL_CTX_set_quiet_shutdown, which is an OpenSSL function and of course expects an OpenSSL SSL_CTX. Since the SSL_CTX definitions of OpenSSL and LibreSSL have diverged, the result is memory corruption and an intermittent crash.
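The reasoning above can be illustrated with two made-up struct layouts. These are NOT the real OpenSSL or LibreSSL `SSL_CTX` definitions (both are opaque); the point is only that a setter compiled against one layout writes at that layout's field offset, regardless of which library actually allocated the object.

```cpp
#include <cstddef>

// Two hypothetical, simplified layouts of "the same" context struct as two
// different libraries might define it. Illustrative only.
struct CtxLayoutLibA {
    int references;
    int quiet_shutdown;
};

struct CtxLayoutLibB {
    const void* method;
    int references;
    int quiet_shutdown;
};

// A setter compiled against library A always writes at A's offset, even when
// the pointer it receives was allocated by library B.
constexpr std::size_t kOffsetInA = offsetof(CtxLayoutLibA, quiet_shutdown);
constexpr std::size_t kOffsetInB = offsetof(CtxLayoutLibB, quiet_shutdown);

static_assert(kOffsetInA != kOffsetInB, "same field, different offsets");
```

On a typical 64-bit target, `kOffsetInA` is 4, which falls inside `CtxLayoutLibB::method`; a write there silently corrupts a pointer that is only dereferenced later, which is consistent with crashes that appear far from the callback.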

The OpenSSL-based libcurl can be installed using:
brew install curl
Then, before the Pistache test code is built, we'd need to do:
export PKG_CONFIG_PATH="$(brew --prefix)/opt/curl/lib/pkgconfig:$PKG_CONFIG_PATH"

To fix the GitHub runner, in macos.yaml, we can do brew install curl in the Install dependencies (macOS) section, alongside the other brew install commands.
Likewise, we can do the
export PKG_CONFIG_PATH="$(brew --prefix)/opt/curl/lib/pkgconfig:$PKG_CONFIG_PATH"
in the Configure Meson section, before meson setup build.

Meanwhile, in https_server_test.cc, in basic_tls_requests_with_no_shutdown_from_peer, we should call curl_version and do a case-insensitive search in the curl_version response string for the substring openssl. If the response string does not contain openssl, the test should throw an error with a helpful message.
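The suggested check could look like the following sketch. `containsCaseInsensitive` is a hypothetical helper that uses only the standard library; in the test it would be called as `containsCaseInsensitive(curl_version(), "openssl")`, and the test would throw with a message pointing at `brew install curl` and `PKG_CONFIG_PATH` when it returns false.

```cpp
#include <algorithm>
#include <cctype>
#include <string_view>

// Case-insensitive substring search, e.g. for "openssl" in the string
// returned by curl_version().
bool containsCaseInsensitive(std::string_view haystack, std::string_view needle) {
    const auto equalsIgnoreCase = [](char a, char b) {
        // Convert through unsigned char: passing a negative char to
        // std::tolower is undefined behavior.
        return std::tolower(static_cast<unsigned char>(a)) ==
               std::tolower(static_cast<unsigned char>(b));
    };
    // std::search returns haystack.begin() for an empty needle.
    const auto it = std::search(haystack.begin(), haystack.end(),
                                needle.begin(), needle.end(), equalsIgnoreCase);
    return needle.empty() || it != haystack.end();
}
```

An alternative, also part of the libcurl API, is to inspect `curl_version_info(CURLVERSION_NOW)->ssl_version`, which names only the TLS backend(s) rather than every linked library.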

@dmg0345 - could you take a shot at the above?

@dmg0345 dmg0345 force-pushed the feature/tls_shutdown_from_peer_handling branch from f79dcf1 to 72b9e70 on May 12, 2025 06:59
@dmg0345
Contributor Author

dmg0345 commented May 12, 2025

Thanks for the feedback; if you did post it earlier, I missed it too. Yes, I will try this.

@dgreatwood I was also looking at an issue in the Windows / Windows with Libevent (windows-2019, msvc, address, y, x) job that seems to have started recently, and not just in this PR, e.g.:

https://github.com/pistacheio/pistache/actions/runs/14918842081/job/41910241447?pr=1313

This was working fine two weeks ago, so maybe a dependency changed; I'm not sure what triggered it. The dump2def util, when compiled with MSVC 2019 and with sanitizers, fails to launch when Meson starts it to create pistache.def from pistache.dump, so the build and the pipeline fail.

From what I've investigated via the GitHub runners, dump2def doesn't seem to find clang_rt.asan_dynamic-x86_64.dll when Meson calls the util in the 2019 setup, so it fails to start. Setting b_sanitize=none in meson.build for the dump2def util makes it work again.

- Handle clearing error queue on SSL_read and SSL_write in `transport.cc` on error.
- Treat `0` and `<0` as errors in SSL_read and SSL_write, use SSL_get_error for retries.
- Disable sanitizer in `dump2def`.
- Bump patch version.

See https://docs.openssl.org/3.1/man3/SSL_read/ for error handling.
See https://docs.openssl.org/3.1/man3/SSL_write/ for error handling.
See pistacheio#1309 for more details.
@dmg0345 dmg0345 force-pushed the feature/tls_shutdown_from_peer_handling branch from fad3155 to c38aab1 on May 12, 2025 09:30
@dmg0345
Contributor Author

dmg0345 commented May 12, 2025

The macOS pipelines now run fine in my fork with libcurl using OpenSSL; no more intermittent crashes (thanks @dgreatwood for the help).

For the Windows 2019 pipelines that were failing: setting b_sanitize=none as a default option in that Meson subproject, so that the sanitizer settings of the main project do not propagate to the dump2def subproject, lets the util start OK and generate pistache.def for the rest of the build. (Was it intended that those settings propagate to the util's subproject? Others, like buildtype, are fixed to release.)

From my side, the changes are ready for review.

@dgreatwood
Collaborator

The macOS pipelines now run fine in my fork with libcurl using OpenSSL; no more intermittent crashes (thanks @dgreatwood for the help).

[DG] Great.

For the Windows 2019 pipelines that were failing: setting b_sanitize=none as a default option in that Meson subproject, so that the sanitizer settings of the main project do not propagate to the dump2def subproject, lets the util start OK and generate pistache.def for the rest of the build. (Was it intended that those settings propagate to the util's subproject? Others, like buildtype, are fixed to release.)

[DG] dump2def is a pretty simple util, and of course is used only at Pistache build-time, not at run-time. I think it's perfectly fine if we don't use sanitizer with dump2def.

From my side, the changes are ready for review.

[DG] Great. I'll take a look when I get the chance, some time in the next few days.

@dgreatwood
Collaborator

Looks good to me.

Many thanks to @dmg0345 for putting this together and for the persistence in chasing down the question marks.

@dgreatwood dgreatwood merged commit 2679d10 into pistacheio:master May 14, 2025
135 of 137 checks passed
dgreatwood added a commit to dgreatwood/pistachefork that referenced this pull request May 16, 2025
The OpenSSL version of libcurl is required for the Pistache test
https_server_test.basic_tls_requests_with_no_shutdown_from_peer. However,
the default libcurl on macOS uses LibreSSL, not OpenSSL. On macOS, the
convenience build scripts mesbuild.sh and mesbuilddebug.sh will now
install the OpenSSL libcurl using brew, if it is not already present,
and modify PKG_CONFIG_PATH if needed.
See
pistacheio#1310 (comment)
for more details

3 participants