30 changes: 30 additions & 0 deletions docs/CIBIR.md
@@ -0,0 +1,30 @@
# CIBIR

## What is it

See [XDP](./XDP.md) first to understand the context.

When CIBIR is used, MsQuic programs XDP to filter and de-mux packets based on the
QUIC connection ID (CID) rather than on port numbers.

CIBIR (CID-Based Identification and Routing) relies on a well-known identifier: a short byte string
that XDP matches against the QUIC CID of every packet. A packet passes the filter when its CID contains a prefix equal to the configured CIBIR ID.
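
The matching rule can be sketched in a few lines of C. This is only an illustration: `CibirMatches` is a hypothetical helper, not an MsQuic or XDP function. It takes an `offset` argument because the trace messages in this PR log an offset next to the ID length; an offset of 0 gives the plain prefix match described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical helper (not an MsQuic or XDP API): returns true when the bytes
// of `cid` starting at `offset` are equal to the configured CIBIR ID.
bool
CibirMatches(
    const uint8_t* cid, size_t cid_len,
    const uint8_t* cibir_id, size_t cibir_len,
    size_t offset)
{
    assert(cid != NULL && cibir_id != NULL);
    if (cibir_len == 0 || offset > cid_len || cibir_len > cid_len - offset) {
        return false; // The CID is too short to contain the identifier.
    }
    return memcmp(cid + offset, cibir_id, cibir_len) == 0;
}
```

With a CID of `AB CD 01 02` and a CIBIR ID of `AB CD`, the helper reports a match at offset 0 and no match at offset 1.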

CIBIR also allows 2 separate server processes to share a single port. As long as
each process uses a different CIBIR configuration, XDP can de-mux received packets
and dispatch them to the right process.

## Port reservation

The first process that uses CIBIR still needs to reserve the OS ports so that
non-CIBIR applications do not get their traffic stolen. The second and any later
CIBIR processes skip reserving OS socket ports.


CIBIR is enabled by setting the `QUIC_PARAM_LISTENER_CIBIR_ID` parameter with `SetParam`.
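
A minimal sketch of how the parameter buffer might be assembled. The layout (one offset byte followed by the identifier bytes) is inferred from the `QuicListenerParamSet` hunk in this PR, which stores `BufferLength - 1` as the ID length and logs the first copied byte as the offset. `BuildCibirParam` is a hypothetical helper, not part of MsQuic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical helper (not part of MsQuic): packs the offset byte followed by
// the identifier bytes into `out`. Returns the number of bytes written, or 0
// if the identifier is empty or `out` is too small.
size_t
BuildCibirParam(
    uint8_t offset,
    const uint8_t* id, size_t id_len,
    uint8_t* out, size_t out_cap)
{
    assert(id != NULL && out != NULL);
    if (id_len == 0 || out_cap < id_len + 1) {
        return 0;
    }
    out[0] = offset;             // Where in the CID the identifier starts.
    memcpy(out + 1, id, id_len); // The identifier itself.
    return id_len + 1;
}
```

The returned length and buffer would then be passed to `SetParam` on the listener handle together with `QUIC_PARAM_LISTENER_CIBIR_ID`.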

Setting a CIBIR ID does 2 things:

1. XDP steers packets to the correct process/listener by matching the CIBIR prefix within the packet's QUIC connection ID.
2. If a port collision occurs when reserving OS UDP/TCP sockets, MsQuic continues initializing the datapath. If XDP is not available or not enabled, no traffic will flow for the listener that experiences the collision.
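
The collision handling in point 2 can be read as a small decision table. The sketch below is an assumption-laden illustration, not the datapath code: the enum, its values, and `DecideDatapathInit` are all hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

// Hypothetical outcome of datapath initialization for one listener.
typedef enum {
    DATAPATH_INIT_FAIL,           // Port is taken and CIBIR is not in use.
    DATAPATH_INIT_WITH_OS_SOCKET, // Port reserved through the OS socket.
    DATAPATH_INIT_XDP_ONLY,       // Collision ignored; traffic arrives via XDP.
    DATAPATH_INIT_NO_TRAFFIC      // Collision ignored, but XDP is unavailable.
} DATAPATH_INIT_RESULT;

// Hypothetical helper: maps the three conditions named in the text to an outcome.
DATAPATH_INIT_RESULT
DecideDatapathInit(bool port_collision, bool cibir_configured, bool xdp_available)
{
    if (!port_collision) {
        return DATAPATH_INIT_WITH_OS_SOCKET;
    }
    if (!cibir_configured) {
        return DATAPATH_INIT_FAIL;
    }
    // With CIBIR set, the collision is assumed to come from another CIBIR
    // process that already reserved the port, so initialization continues.
    return xdp_available ? DATAPATH_INIT_XDP_ONLY : DATAPATH_INIT_NO_TRAFFIC;
}
```

The last case is the one to watch for in deployments: initialization succeeds, yet the listener receives nothing because XDP is not there to deliver packets.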


2 changes: 1 addition & 1 deletion docs/Settings.md
@@ -169,7 +169,7 @@ These parameters are accessed by calling [GetParam](./api/GetParam.md) or [SetPa
|-------------------------------------------|---------------------------|-----------|-----------------------------------------------------------|
| `QUIC_PARAM_LISTENER_LOCAL_ADDRESS`<br> 0 | QUIC_ADDR | Get-only | Get the full address tuple the server is listening on. |
| `QUIC_PARAM_LISTENER_STATS`<br> 1 | QUIC_LISTENER_STATISTICS | Get-only | Get statistics specific to this Listener instance. |
-| `QUIC_PARAM_LISTENER_CIBIR_ID`<br> 2 | uint8_t[] | Both | The CIBIR well-known idenfitier. |
+| `QUIC_PARAM_LISTENER_CIBIR_ID`<br> 2 | uint8_t[] | Both | Sets a [CIBIR](./CIBIR.md) (CID-Based Identification and Routing) well-known identifier. |
| `QUIC_PARAM_DOS_MODE_EVENTS`<br> 2 | BOOLEAN | Both | The Listener opted in for DoS Mode event. |
| `QUIC_PARAM_LISTENER_PARTITION_INDEX`<br> (preview) | uint16_t | Both | The partition to use for listener callback events and incoming connections. |

44 changes: 44 additions & 0 deletions docs/XDP.md
@@ -0,0 +1,44 @@
# MsQuic over XDP

To avoid confusion, "XDP" here refers to [XDP-for-windows](https://github.com/microsoft/xdp-for-windows). Linux XDP has been experimented
with in the past and showed some promise for running MsQuic, but it is NOT a stable, actively maintained datapath today.

## What is XDP

XDP enables received packets to completely bypass the OS networking stack.

Applications can subscribe to XDP ring buffers to post packets to send,
and process packets that are received through AF_XDP sockets.

Additionally, applications can program XDP with the logic that determines
which packets to filter for and what to do with them.

For instance: "drop all packets with a UDP header and destination port
42."

## Port reservation logic

The type of logic MsQuic programs into XDP looks like:
"redirect all packets with a destination port X to an AF_XDP socket."

This runs into the issue of **packet stealing.** If an unrelated process binds an
OS socket to the same port that MsQuic programmed into XDP, XDP will steal that
process's traffic from underneath it.

This is why MsQuic always creates an OS UDP socket on the same port as the AF_XDP
socket: it reserves the port and plays nice with the rest of the stack.

There are *exceptions* to this port reservation.

- Sometimes, MsQuic may create a TCP OS socket instead, or both TCP and UDP (see [QTIP](./QTIP.md)).
- Sometimes, MsQuic may NOT create any OS sockets at all (see [CIBIR](./CIBIR.md)).



## MsQuic over XDP general architecture:

![](./images/XDP-arch.png)




Binary file added docs/images/XDP-arch.png
12 changes: 9 additions & 3 deletions src/core/connection.c
@@ -6708,12 +6708,18 @@ QuicConnParamSet(
     Connection->CibirId[0] = (uint8_t)BufferLength - 1;
     memcpy(Connection->CibirId + 1, Buffer, BufferLength);

+    uint64_t CibirIdValue = 0;
+    for (uint8_t i = 0; i < Connection->CibirId[0]; ++i) {
+        CibirIdValue = (CibirIdValue << 8) | Connection->CibirId[2 + i];
+    }
+
     QuicTraceLogConnInfo(
-        CibirIdSet,
+        CibirIdSetInfo,
         Connection,
-        "CIBIR ID set (len %hhu, offset %hhu)",
+        "CIBIR ID set (len %hhu, offset %hhu, id 0x%llx)",
         Connection->CibirId[0],
-        Connection->CibirId[1]);
+        Connection->CibirId[1],
+        (unsigned long long)CibirIdValue);

     return QUIC_STATUS_SUCCESS;
 }
12 changes: 9 additions & 3 deletions src/core/listener.c
@@ -884,12 +884,18 @@ QuicListenerParamSet(
     Listener->CibirId[0] = (uint8_t)BufferLength - 1;
     memcpy(Listener->CibirId + 1, Buffer, BufferLength);

+    uint64_t CibirIdValue = 0;
+    for (uint8_t i = 0; i < Listener->CibirId[0]; ++i) {
+        CibirIdValue = (CibirIdValue << 8) | Listener->CibirId[2 + i];
+    }
+
     QuicTraceLogVerbose(
-        ListenerCibirIdSet,
-        "[list][%p] CIBIR ID set (len %hhu, offset %hhu)",
+        ListenerCibirIdSetInfo,
+        "[list][%p] CIBIR ID set (len %hhu, offset %hhu, id 0x%llx)",
         Listener,
         Listener->CibirId[0],
-        Listener->CibirId[1]);
+        Listener->CibirId[1],
+        (unsigned long long)CibirIdValue);

     return QUIC_STATUS_SUCCESS;
 }
18 changes: 10 additions & 8 deletions src/generated/linux/connection.c.clog.h
@@ -853,21 +853,23 @@ tracepoint(CLOG_CONNECTION_C, LocalInterfaceSet , arg1, arg3);\


/*----------------------------------------------------------
// Decoder Ring for CibirIdSet
// [conn][%p] CIBIR ID set (len %hhu, offset %hhu)
// Decoder Ring for CibirIdSetInfo
// [conn][%p] CIBIR ID set (len %hhu, offset %hhu, id 0x%llx)
// QuicTraceLogConnInfo(
CibirIdSet,
CibirIdSetInfo,
Connection,
"CIBIR ID set (len %hhu, offset %hhu)",
"CIBIR ID set (len %hhu, offset %hhu, id 0x%llx)",
Connection->CibirId[0],
Connection->CibirId[1]);
Connection->CibirId[1],
(unsigned long long)CibirIdValue);
// arg1 = arg1 = Connection = arg1
// arg3 = arg3 = Connection->CibirId[0] = arg3
// arg4 = arg4 = Connection->CibirId[1] = arg4
// arg5 = arg5 = (unsigned long long)CibirIdValue = arg5
----------------------------------------------------------*/
#ifndef _clog_5_ARGS_TRACE_CibirIdSet
#define _clog_5_ARGS_TRACE_CibirIdSet(uniqueId, arg1, encoded_arg_string, arg3, arg4)\
tracepoint(CLOG_CONNECTION_C, CibirIdSet , arg1, arg3, arg4);\
#ifndef _clog_6_ARGS_TRACE_CibirIdSetInfo
#define _clog_6_ARGS_TRACE_CibirIdSetInfo(uniqueId, arg1, encoded_arg_string, arg3, arg4, arg5)\
tracepoint(CLOG_CONNECTION_C, CibirIdSetInfo , arg1, arg3, arg4, arg5);\

#endif

18 changes: 11 additions & 7 deletions src/generated/linux/connection.c.clog.h.lttng.h
@@ -912,27 +912,31 @@ TRACEPOINT_EVENT(CLOG_CONNECTION_C, LocalInterfaceSet,


/*----------------------------------------------------------
// Decoder Ring for CibirIdSet
// [conn][%p] CIBIR ID set (len %hhu, offset %hhu)
// Decoder Ring for CibirIdSetInfo
// [conn][%p] CIBIR ID set (len %hhu, offset %hhu, id 0x%llx)
// QuicTraceLogConnInfo(
CibirIdSet,
CibirIdSetInfo,
Connection,
"CIBIR ID set (len %hhu, offset %hhu)",
"CIBIR ID set (len %hhu, offset %hhu, id 0x%llx)",
Connection->CibirId[0],
Connection->CibirId[1]);
Connection->CibirId[1],
(unsigned long long)CibirIdValue);
// arg1 = arg1 = Connection = arg1
// arg3 = arg3 = Connection->CibirId[0] = arg3
// arg4 = arg4 = Connection->CibirId[1] = arg4
// arg5 = arg5 = (unsigned long long)CibirIdValue = arg5
----------------------------------------------------------*/
TRACEPOINT_EVENT(CLOG_CONNECTION_C, CibirIdSet,
TRACEPOINT_EVENT(CLOG_CONNECTION_C, CibirIdSetInfo,
TP_ARGS(
const void *, arg1,
unsigned char, arg3,
unsigned char, arg4),
unsigned char, arg4,
unsigned long long, arg5),
TP_FIELDS(
ctf_integer_hex(uint64_t, arg1, (uint64_t)arg1)
ctf_integer(unsigned char, arg3, arg3)
ctf_integer(unsigned char, arg4, arg4)
ctf_integer(uint64_t, arg5, arg5)
)
)

46 changes: 46 additions & 0 deletions src/generated/linux/datapath_winuser.c.clog.h
@@ -159,6 +159,52 @@ tracepoint(CLOG_DATAPATH_WINUSER_C, DatapathTestSetIpv6TrafficClassFailed , arg2



/*----------------------------------------------------------
// Decoder Ring for DatapathCibirWarning
// [data][%p] CIBIR detected, %s
// QuicTraceLogWarning(
DatapathCibirWarning,
"[data][%p] CIBIR detected, %s",
Socket,
"ignoring port collision by assuming some \
other MsQuic CIBIR process has reserved the OS port. \
Let's continue with initialization and skip port reservation.");
// arg2 = arg2 = Socket = arg2
// arg3 = arg3 = "ignoring port collision by assuming some \
other MsQuic CIBIR process has reserved the OS port. \
Let's continue with initialization and skip port reservation." = arg3
----------------------------------------------------------*/
#ifndef _clog_4_ARGS_TRACE_DatapathCibirWarning
#define _clog_4_ARGS_TRACE_DatapathCibirWarning(uniqueId, encoded_arg_string, arg2, arg3)\
tracepoint(CLOG_DATAPATH_WINUSER_C, DatapathCibirWarning , arg2, arg3);\

#endif




/*----------------------------------------------------------
// Decoder Ring for DatapathCibirIdUsed
// [data][%p] Using CIBIR ID (len %hhu, id 0x%llx)
// QuicTraceLogWarning(
DatapathCibirIdUsed,
"[data][%p] Using CIBIR ID (len %hhu, id 0x%llx)",
Socket,
Config->CibirIdLength,
(unsigned long long)CibirIdValue);
// arg2 = arg2 = Socket = arg2
// arg3 = arg3 = Config->CibirIdLength = arg3
// arg4 = arg4 = (unsigned long long)CibirIdValue = arg4
----------------------------------------------------------*/
#ifndef _clog_5_ARGS_TRACE_DatapathCibirIdUsed
#define _clog_5_ARGS_TRACE_DatapathCibirIdUsed(uniqueId, encoded_arg_string, arg2, arg3, arg4)\
tracepoint(CLOG_DATAPATH_WINUSER_C, DatapathCibirIdUsed , arg2, arg3, arg4);\

#endif




/*----------------------------------------------------------
// Decoder Ring for DatapathRecvEmpty
// [data][%p] Dropping datagram with empty payload.
54 changes: 54 additions & 0 deletions src/generated/linux/datapath_winuser.c.clog.h.lttng.h
@@ -134,6 +134,60 @@ TRACEPOINT_EVENT(CLOG_DATAPATH_WINUSER_C, DatapathTestSetIpv6TrafficClassFailed,



/*----------------------------------------------------------
// Decoder Ring for DatapathCibirWarning
// [data][%p] CIBIR detected, %s
// QuicTraceLogWarning(
DatapathCibirWarning,
"[data][%p] CIBIR detected, %s",
Socket,
"ignoring port collision by assuming some \
other MsQuic CIBIR process has reserved the OS port. \
Let's continue with initialization and skip port reservation.");
// arg2 = arg2 = Socket = arg2
// arg3 = arg3 = "ignoring port collision by assuming some \
other MsQuic CIBIR process has reserved the OS port. \
Let's continue with initialization and skip port reservation." = arg3
----------------------------------------------------------*/
TRACEPOINT_EVENT(CLOG_DATAPATH_WINUSER_C, DatapathCibirWarning,
TP_ARGS(
const void *, arg2,
const char *, arg3),
TP_FIELDS(
ctf_integer_hex(uint64_t, arg2, (uint64_t)arg2)
ctf_string(arg3, arg3)
)
)



/*----------------------------------------------------------
// Decoder Ring for DatapathCibirIdUsed
// [data][%p] Using CIBIR ID (len %hhu, id 0x%llx)
// QuicTraceLogWarning(
DatapathCibirIdUsed,
"[data][%p] Using CIBIR ID (len %hhu, id 0x%llx)",
Socket,
Config->CibirIdLength,
(unsigned long long)CibirIdValue);
// arg2 = arg2 = Socket = arg2
// arg3 = arg3 = Config->CibirIdLength = arg3
// arg4 = arg4 = (unsigned long long)CibirIdValue = arg4
----------------------------------------------------------*/
TRACEPOINT_EVENT(CLOG_DATAPATH_WINUSER_C, DatapathCibirIdUsed,
TP_ARGS(
const void *, arg2,
unsigned char, arg3,
unsigned long long, arg4),
TP_FIELDS(
ctf_integer_hex(uint64_t, arg2, (uint64_t)arg2)
ctf_integer(unsigned char, arg3, arg3)
ctf_integer(uint64_t, arg4, arg4)
)
)



/*----------------------------------------------------------
// Decoder Ring for DatapathRecvEmpty
// [data][%p] Dropping datagram with empty payload.
18 changes: 10 additions & 8 deletions src/generated/linux/listener.c.clog.h
@@ -68,21 +68,23 @@ tracepoint(CLOG_LISTENER_C, ListenerIndicateNewConnection , arg2, arg3);\


/*----------------------------------------------------------
// Decoder Ring for ListenerCibirIdSet
// [list][%p] CIBIR ID set (len %hhu, offset %hhu)
// Decoder Ring for ListenerCibirIdSetInfo
// [list][%p] CIBIR ID set (len %hhu, offset %hhu, id 0x%llx)
// QuicTraceLogVerbose(
ListenerCibirIdSet,
"[list][%p] CIBIR ID set (len %hhu, offset %hhu)",
ListenerCibirIdSetInfo,
"[list][%p] CIBIR ID set (len %hhu, offset %hhu, id 0x%llx)",
Listener,
Listener->CibirId[0],
Listener->CibirId[1]);
Listener->CibirId[1],
(unsigned long long)CibirIdValue);
// arg2 = arg2 = Listener = arg2
// arg3 = arg3 = Listener->CibirId[0] = arg3
// arg4 = arg4 = Listener->CibirId[1] = arg4
// arg5 = arg5 = (unsigned long long)CibirIdValue = arg5
----------------------------------------------------------*/
#ifndef _clog_5_ARGS_TRACE_ListenerCibirIdSet
#define _clog_5_ARGS_TRACE_ListenerCibirIdSet(uniqueId, encoded_arg_string, arg2, arg3, arg4)\
tracepoint(CLOG_LISTENER_C, ListenerCibirIdSet , arg2, arg3, arg4);\
#ifndef _clog_6_ARGS_TRACE_ListenerCibirIdSetInfo
#define _clog_6_ARGS_TRACE_ListenerCibirIdSetInfo(uniqueId, encoded_arg_string, arg2, arg3, arg4, arg5)\
tracepoint(CLOG_LISTENER_C, ListenerCibirIdSetInfo , arg2, arg3, arg4, arg5);\

#endif

18 changes: 11 additions & 7 deletions src/generated/linux/listener.c.clog.h.lttng.h
@@ -44,27 +44,31 @@ TRACEPOINT_EVENT(CLOG_LISTENER_C, ListenerIndicateNewConnection,


/*----------------------------------------------------------
// Decoder Ring for ListenerCibirIdSet
// [list][%p] CIBIR ID set (len %hhu, offset %hhu)
// Decoder Ring for ListenerCibirIdSetInfo
// [list][%p] CIBIR ID set (len %hhu, offset %hhu, id 0x%llx)
// QuicTraceLogVerbose(
ListenerCibirIdSet,
"[list][%p] CIBIR ID set (len %hhu, offset %hhu)",
ListenerCibirIdSetInfo,
"[list][%p] CIBIR ID set (len %hhu, offset %hhu, id 0x%llx)",
Listener,
Listener->CibirId[0],
Listener->CibirId[1]);
Listener->CibirId[1],
(unsigned long long)CibirIdValue);
// arg2 = arg2 = Listener = arg2
// arg3 = arg3 = Listener->CibirId[0] = arg3
// arg4 = arg4 = Listener->CibirId[1] = arg4
// arg5 = arg5 = (unsigned long long)CibirIdValue = arg5
----------------------------------------------------------*/
TRACEPOINT_EVENT(CLOG_LISTENER_C, ListenerCibirIdSet,
TRACEPOINT_EVENT(CLOG_LISTENER_C, ListenerCibirIdSetInfo,
TP_ARGS(
const void *, arg2,
unsigned char, arg3,
unsigned char, arg4),
unsigned char, arg4,
unsigned long long, arg5),
TP_FIELDS(
ctf_integer_hex(uint64_t, arg2, (uint64_t)arg2)
ctf_integer(unsigned char, arg3, arg3)
ctf_integer(unsigned char, arg4, arg4)
ctf_integer(uint64_t, arg5, arg5)
)
)

1 change: 1 addition & 0 deletions src/inc/quic_datapath.h
@@ -436,6 +436,7 @@ typedef enum CXPLAT_DATAPATH_FEATURES {
CXPLAT_DATAPATH_FEATURE_TTL = 0x00000080,
CXPLAT_DATAPATH_FEATURE_SEND_DSCP = 0x00000100,
CXPLAT_DATAPATH_FEATURE_RECV_DSCP = 0x00000200,
CXPLAT_DATAPATH_FEATURE_CIBIR = 0x00000400,
} CXPLAT_DATAPATH_FEATURES;

DEFINE_ENUM_FLAG_OPERATORS(CXPLAT_DATAPATH_FEATURES)
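
This diff does not show how the new flag is consumed. As a hedged sketch, a caller could test the bit before relying on CIBIR. The enum below is a local copy of the values visible in this hunk, and `HasFeature` is a hypothetical helper rather than an MsQuic function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Local copy of the flag values shown in the hunk, renamed to make clear
// that this is an illustration and not the real header.
typedef enum {
    FEATURE_TTL       = 0x00000080,
    FEATURE_SEND_DSCP = 0x00000100,
    FEATURE_RECV_DSCP = 0x00000200,
    FEATURE_CIBIR     = 0x00000400
} DATAPATH_FEATURES;

// Hypothetical helper: true when `flag` is set in the `features` bitmask.
bool
HasFeature(uint32_t features, DATAPATH_FEATURES flag)
{
    assert(flag != 0);
    return (features & (uint32_t)flag) != 0;
}
```

Because each feature occupies its own bit, adding `0x00000400` does not disturb the existing flags.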