Skip to content

Conversation

@lithdew
Copy link
Contributor

@lithdew lithdew commented May 12, 2021

Cross-platform versions of msghdr, sendmsg, recvmsg, linger, and iovec were provided based on findings from glibc, musl, and Microsoft's documentation.

Implemented initial Reactor interface for epoll (linux) which wraps around I/O reactor subsystems such as epoll, kqueue, select, etc. across different platforms. The Reactor interface allows for driving async I/O in Zig applications.

A test was added for the Reactor interface to drive a TCP client/listener socket pair.

A greatest-common-subset of possible socket initialization flags (close socket on exec syscalls, initialize socket to be non-blocking) were implemented.

A test was added for using sendmsg/recvmsg syscalls across different platforms for a TCP client/listener socket pair.

@lithdew lithdew marked this pull request as draft May 12, 2021 15:32
@lithdew
Copy link
Contributor Author

lithdew commented May 14, 2021

On Azure CI on x86_64-linux-musl, it seems that:

sendmsg(
  socket(AF_INET, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP),
  const * struct msghdr {
    iovecs: [ "hello", " world" ],
    iovecs_len: 2,
    // ... everything else is zeroed
  },
  0,
)

for some reason returns EMSGSIZE. The test code is:

    const message = "hello world";
    _ = try conn.client.writeVectorized(Socket.Message.fromBuffers(&[_]Buffer{
        Buffer.from(message[0 .. message.len / 2]),
        Buffer.from(message[message.len / 2 ..]),
    }), 0);

The full stack trace is:

Test [1415/1884] 
x.net.tcp.test "std-x86_64-linux-musl-Debug-c-multi tcp/client: read and write multiple vectors"... FAIL (MessageTooBig)
/home/vsts/work/1/s/lib/std/x/os/socket_posix.zig:69:13: 0xe23199 in x.os.socket_posix.Mixin(x.os.socket.Socket).read (test)
            return os.recv(self.fd, buf, flags);
            ^
/home/vsts/work/1/s/lib/std/x/net/tcp.zig:136:9: 0xe22ef7 in x.net.tcp.Client.read (test)
        return self.socket.read(buf, flags);
        ^
/home/vsts/work/1/s/lib/std/x/net/tcp.zig:70:13: 0xe22ea3 in x.net.tcp.Reader.read (test)
            return self.client.read(buffer, self.flags);
            ^
/home/vsts/work/1/s/lib/std/io/reader.zig:32:13: 0xcb6daf in io.reader.Reader(x.net.tcp.Reader,@typeInfo(@typeInfo(@TypeOf(x.net.tcp.Reader.read)).Fn.return_type.?).ErrorUnion.error_set,x.net.tcp.Reader.read).read (test)
            return readFn(self.context, buffer);
            ^
/home/vsts/work/1/s/lib/std/os.zig:5031:29: 0xe237bf in os.sendmsg (test)
                EMSGSIZE => return error.MessageTooBig,
                            ^
/home/vsts/work/1/s/lib/std/x/os/socket_posix.zig:82:13: 0xe235d9 in x.os.socket_posix.Mixin(x.os.socket.Socket).writeVectorized (test)
            return os.sendmsg(self.fd, msg, flags);
            ^
/home/vsts/work/1/s/lib/std/x/net/tcp.zig:149:9: 0xcb70c7 in x.net.tcp.Client.writeVectorized (test)
        return self.socket.writeVectorized(msg, flags);
        ^
/home/vsts/work/1/s/lib/std/x/net/tcp.zig:403:9: 0x4d0174 in x.net.tcp.test "std-x86_64-linux-musl-Debug-c-multi tcp/client: read and write multiple vectors" (test)
    _ = try conn.client.writeVectorized(Socket.Message.fromBuffers(&[_]Buffer{
        ^

Might anyone know what could be the cause?

My hunch is that it may be an ABI issue with the layout of msghdr, though the definition of msghdr (defined in the PR as std.x.os.Socket.Message) should match the definition of msghdr in musl.

Running the test with the target locally on my Linux laptop passes. All other CI targets pass.

cc @LemonBoy @daurnimator @ifreund

EDIT: Here's the strace output from my laptop.

[lith@dew:~/Desktop]$ zig build-exe sendmsg.zig -target x86_64-linux-musl -lc

[lith@dew:~/Desktop]$ strace -f -e trace=network ./sendmsg
socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_TCP) = 3
bind(3, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
listen(3, 128)                          = 0
getsockname(3, {sa_family=AF_INET, sin_port=htons(41977), sin_addr=inet_addr("0.0.0.0")}, [128->16]) = 0
socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_TCP) = 4
connect(4, {sa_family=AF_INET, sin_port=htons(41977), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
accept4(3, {sa_family=AF_INET, sin_port=htons(48864), sin_addr=inet_addr("127.0.0.1")}, [128->16], SOCK_CLOEXEC) = 5
sendmsg(5, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="hello", iov_len=5}, {iov_base=" world", iov_len=6}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, 0) = 11
recvmsg(4, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="hello", iov_len=5}, {iov_base=" world", iov_len=7}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, 0) = 11
got message: hello world
+++ exited with 0 +++

@LemonBoy
Copy link
Contributor

I can't reproduce the problem locally, try adding some debug prints to see what the msghdr contents are before it's passed to the kernel.
Also keep in mind that (for some unknown reason) the x86_64 tests are going trough qemu's user mode emulator.

@lithdew
Copy link
Contributor Author

lithdew commented May 14, 2021

@LemonBoy printed out the msghdr just now alongside information about the target; it's contents seem to be fine. I'll add more debug prints in the next commit to see the contents of the iovec's that are part of the msghdr as well.

sizeof(usize): 8, endian: Endian.Little, message: Message{ .name = 0, .name_len = 0, .buffers = 274894677896, .buffers_len = 2, ._pad_1 = 0, .control = 0, .control_len = 0, ._pad_2 = 0, .flags = 0 }, flags: 0

@lithdew lithdew force-pushed the master branch 2 times, most recently from b6de155 to 0ff81a7 Compare May 15, 2021 08:00
@lithdew
Copy link
Contributor Author

lithdew commented May 15, 2021

Still haven't figured out how to get the Azure CI Linux target passing for sendmsg(). The Azure CI Linux target is x86_64-linux-musl (little-endian, 64-bit), and is essentially Ubuntu 18.04 (w/ a patched Linux kernel).

The msghdr struct passed to sendmsg() in Zig follows the same layout as musl's msghdr struct (which can be found in include/sys/socket.h), and the bytes representation of the msghdr struct passed to sendmsg() in Zig seems perfectly fine:

{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 120, 231, 255, 0, 64, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }

The size of the msghdr struct is 56 bytes, and its alignment is 8 bytes as expected. The contents of the pointer to the iovec's contained in the msghdr aren't malformed either, which I checked by inspecting the contents and lengths specified in each iovec.

The socket write buffer size is 2626560 bytes, which is the same on my laptop; so I assume that patches made by Azure aren't interfering with any settings to do with stream sockets.

Yet, sendmsg() still returns EMSGSIZE.

Is there any way to, for example, enable strace in the CI or to emulate the environment locally? Or, might anyone have any other ideas in mind on what could be wrong?

@LemonBoy
Copy link
Contributor

You're ignoring a pretty huge detail here: qrmu

@lithdew
Copy link
Contributor Author

lithdew commented May 15, 2021

You're ignoring a pretty huge detail here: qrmu

Thanks for the link, it finally makes a lot of sense 😌. Looks like the only way around it is to patch qemu on CI to zero out the msghdr struct though.

@kubkon @andrewrk is patching and building qemu for CI something desirable? Or might there be any other quick workarounds for this?

@LemonBoy
Copy link
Contributor

Looks like the only way around it is to patch qemu on CI to zero out the msghdr struct though.

Consider sending a patch upstream.

cc @mikdusan as he was considering an upgrade to Qemu 6.0

@mikdusan
Copy link
Member

cc @mikdusan as he was considering an upgrade to Qemu 6.0

I wouldn't be against patching qemu here before it shows up in upstream releases but if you want to pursue this angle I'd need to check first with andrew. Let me know if this is the case.

Side note: also unknown is if 6.0.0 needs patching wrt this issue. Ie: would an upgrade to 6.0.0 just fix things for you.

Side note 2: #8653 has a link to 6.0.0 tarball if you can test locally .

@LemonBoy
Copy link
Contributor

6.0 has the same problem, the problem was never patched nor reported upstream

@andrewrk
Copy link
Member

Patching is acceptable if we first send the patch upstream

@lithdew
Copy link
Contributor Author

lithdew commented May 17, 2021

@LemonBoy @mikdusan @andrewrk

Sent a patch upstream to QEMU here: https://patchew.org/QEMU/[email protected]/

Should I make a PR to add the patch to the custom QEMU w/ patches that Zig uses now?

@mikdusan
Copy link
Member

@lithdew yes please add a PR at https://github.com/ziglang/qemu-static

lithdew added a commit to lithdew/qemu-static that referenced this pull request May 17, 2021
Refer to this PR for more details about this patch: ziglang/zig#8750

Some spacing has also been added to the existing patch to adhere to QEMU's code style guide.
mikdusan pushed a commit to ziglang/qemu-static that referenced this pull request May 17, 2021
Refer to this PR for more details about this patch: ziglang/zig#8750

Some spacing has also been added to the existing patch to adhere to QEMU's code style guide.
mikdusan added a commit to mikdusan/zig that referenced this pull request May 17, 2021
- apply patch for qemu-user syscall do_sendrecvmsg_locked
- see ziglang#8750
mikdusan added a commit that referenced this pull request May 17, 2021
- apply patch for qemu-user syscall do_sendrecvmsg_locked
- see #8750
@lithdew lithdew marked this pull request as ready for review May 18, 2021 08:31
@lithdew
Copy link
Contributor Author

lithdew commented May 18, 2021

All tests are finally passing on CI 😄.

... except Drone though, looks like something's up with Drone CI for recent commits/PR's.

@kubkon
Copy link
Member

kubkon commented May 18, 2021

All tests are finally on CI smile.

... except Drone though, looks like something's up with Drone CI for recent commits/PR's.

Yep, Drone seems unable to connect to Github lately.

@lithdew lithdew force-pushed the master branch 2 times, most recently from 7c77f39 to c5bcfd4 Compare May 19, 2021 03:57
@lithdew
Copy link
Contributor Author

lithdew commented May 19, 2021

All CI tests are passing! 😄

@lithdew
Copy link
Contributor Author

lithdew commented May 31, 2021

Rebased with master to fix merge conflicts.

@lithdew lithdew force-pushed the master branch 2 times, most recently from 5cec5c3 to c28f3ff Compare May 31, 2021 06:58
lithdew added 11 commits June 1, 2021 18:22
Cross-platform versions of msghdr, sendmsg, recvmsg, linger, and iovec
were provided based on findings from glibc, musl, and Microsoft's
documentation.

Implemented initial Reactor interface for epoll (linux) which wraps
around I/O reactor subsystems such as epoll, kqueue, select, etc. across
different platforms. The Reactor interface allows for driving async I/O
in Zig applications.

A test was added for the Reactor interface to drive a TCP
client/listener socket pair.

A greatest-common-subset of possible socket initialization flags (close
socket on exec syscalls, initialize socket to be non-blocking) were
implemented.

A test was added for using sendmsg/recvmsg syscalls across different
platforms for a TCP client/listener socket pair.
`msghdr` and `msghdr_const` definitions have been added back the way
they were in std.os. std.os.sendmsg has also been modified to accept a
msghdr_const again to ensure backwards-compatibility with this PR.
Underneath the hood, std.os.sendmsg will @ptrCast the provided
msghdr_const into a std.x.os.Socket.Message.

`sockaddr_storage` definitions have been added back the way they were in
std.os, except that it now simply aliases
std.x.os.Socket.Address.Native.Storage as all of
std.x.os.Socket.Address.Native.Storage's fields are equivalent to the
fields that were previously defined for std.x.os.bits.sockaddr_storage.

std.x.os.Socket.sendMessage now no longer is a stub that aliases
std.os.sendmsg, but instead calls and handles
errors from std.os.system.sendmsg directly.

Addresses feedback to urge backwards compatibility from @andrewrk.
@lithdew
Copy link
Contributor Author

lithdew commented Jun 1, 2021

All CI tests are now passing 😄.

@andrewrk andrewrk merged commit 9c08a33 into ziglang:master Jun 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants