We're seeing issues in Windows CI that look like:
thread 'dummy_kernel' panicked at crates/ark/src/start.rs:112:9:
Couldn't connect to frontend: SocketBindError("Control", "tcp://127.0.0.1:23393", Address already in use)
I can also reproduce it locally by running integration tests in a loop. It's a mystery why it happens so often on the Windows CI though.
This might be due to the "classic jupyter race condition": jupyter/jupyter_client#487. In Jupyter's connection scheme, the client searches for available ports and communicates them to the server, which then tries to bind to them. This fails if any of those ports is taken by another process in the meantime.
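To make the race concrete, here is a minimal sketch using plain TCP sockets from the Python standard library as a stand-in for the ZeroMQ sockets. The function names `find_free_port` and `demonstrate_race` are hypothetical, and the "squatter" socket simulates an unrelated process grabbing the port between the client's probe and the kernel's bind.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Client-side probe: ask the OS for a free port, then release it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.bind((host, 0))
        port = probe.getsockname()[1]
    # The probe socket is closed here, so nothing reserves the port anymore.
    return port


def demonstrate_race(host="127.0.0.1"):
    """Return True if the kernel-side bind fails because the port was taken."""
    # The client picks a port and would write it to the connection file.
    port = find_free_port(host)

    # Simulated unrelated process that grabs the port before the kernel starts.
    squatter = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    squatter.bind((host, port))
    squatter.listen()

    # The kernel now tries to bind to the port it was told to use.
    kernel = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        kernel.bind((host, port))
        return False
    except OSError:
        # This is the "Address already in use" failure from the panic above.
        return True
    finally:
        kernel.close()
        squatter.close()
```

The window between `find_free_port` returning and the kernel binding is exactly where the failure happens, and the client has no way to close it.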
There is no other solution than to let the kernel pick the ports. In the linked issue, they suggest implementing this scheme:
The client opens a socket A, passes the port of this socket to the kernel that it launches and waits. The kernel starts, finds free ports to bind the shell, control, stdin, heartbeat and iopub sockets. Then it connects to socket A of the client, sends a message containing these ports, and closes the connection to socket A. Upon reception of this message, the client connects to the kernel and closes socket A.
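The quoted scheme could be sketched roughly as follows. This is only an illustration with standard-library TCP sockets and JSON, not Jupyter's or Ark's actual implementation: the names `kernel_side`, `client_side` and `CHANNELS` are hypothetical, a thread stands in for the launched kernel process, and the message signing with the connection key is left out.

```python
import json
import socket
import threading

CHANNELS = ("control", "shell", "stdin", "hb", "iopub")


def kernel_side(host, handshake_port):
    """Bind every channel to an OS-assigned port, then report the ports."""
    sockets = {}
    for name in CHANNELS:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.bind((host, 0))  # port 0: the OS picks a free port and we keep it
        s.listen()
        sockets[name] = s
    ports = {f"{name}_port": s.getsockname()[1] for name, s in sockets.items()}

    # Connect to socket A, send the ports, and close the handshake connection.
    with socket.create_connection((host, handshake_port)) as conn:
        conn.sendall(json.dumps(ports).encode("utf-8"))
    return sockets


def client_side(host="127.0.0.1"):
    """Open socket A, start the kernel, and wait for it to report its ports."""
    result = {}

    def run_kernel(port):
        # Stand-in for launching a kernel process with the handshake port.
        result["sockets"] = kernel_side(host, port)

    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as handshake:
        handshake.bind((host, 0))
        handshake.listen(1)
        handshake.settimeout(5)  # do not wait forever for a kernel that died
        kernel = threading.Thread(
            target=run_kernel, args=(handshake.getsockname()[1],)
        )
        kernel.start()

        conn, _ = handshake.accept()
        with conn:
            chunks = []
            while True:
                chunk = conn.recv(4096)
                if not chunk:  # the kernel closed its end: message complete
                    break
                chunks.append(chunk)
        kernel.join()

    # Socket A is closed at this point; the client can connect to the ports.
    return json.loads(b"".join(chunks).decode("utf-8")), result["sockets"]
```

Because the kernel never releases the sockets it bound, the ports it reports cannot be taken by another process before the client connects.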
Essentially the client would pick a port for a handshake socket, bind to it, and send this connection info:
{
"transport": "tcp",
"signature_scheme": "hmac-sha256",
"ip": "127.0.0.1",
"key": "a0436f6c-1916-498b-8eb9-e81ab9368e84",
"handshake_port": 40885
}
And the server would connect to the handshake socket and send back:
{
"control_port": 50160,
"shell_port": 57503,
"stdin_port": 52597,
"hb_port": 42540,
"iopub_port": 40885
}
On the server side, it looks like we can use :* or :0 to let the OS pick a port: https://stackoverflow.com/questions/16699890/connect-to-first-free-port-with-tcp-using-0mq
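As a rough illustration of what "let the OS pick a port" means, here is the same idea with a plain TCP socket from the Python standard library. The helper name `bind_to_os_assigned_port` is hypothetical. With ZeroMQ the equivalent would presumably be binding to `tcp://127.0.0.1:*` and then reading the `ZMQ_LAST_ENDPOINT` socket option to find out which port was assigned, but that part is not shown here.

```python
import socket


def bind_to_os_assigned_port(host="127.0.0.1"):
    """Bind a TCP socket to port 0 and return it with the port the OS chose."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))  # 0 asks the OS for any currently free port
    sock.listen()
    # The socket stays open, so the port remains ours until we close it.
    return sock, sock.getsockname()[1]


if __name__ == "__main__":
    control, control_port = bind_to_os_assigned_port()
    shell, shell_port = bind_to_os_assigned_port()
    print({"control_port": control_port, "shell_port": shell_port})
    control.close()
    shell.close()
```

The important difference from probing for a free port is that the bind and the port selection are a single step, so there is no gap in which another process can take the port.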
Positron could also use this to make the initial connection to Ark more robust, cc @jmcphers.