Modernize Akka.IO TCP: ByteString removal, Stream+Pipe transport, ReadOnlySequence<byte> migration, .NET 10 TFM (#8132)

Merged

Aaronontheweb merged 41 commits into akkadotnet:dev from Aaronontheweb:feature/spec1-modernize-akka-io-tcp on May 10, 2026
Conversation

@Aaronontheweb (Member) commented Mar 25, 2026

Summary

Milestone 1 of the Akka.NET 1.6 transport/serialization epic. Modernizes the Akka.IO TCP layer:

  • BREAKING: Tcp.Received.Data and Tcp.Write.Data are now ReadOnlySequence<byte> (was ByteString). All Akka.Streams DSL stages that flowed ByteString now flow ReadOnlySequence<byte> — Framing, JsonFraming, FileIO, StreamConverters, and the TCP source/sink.
  • BREAKING: Drop netstandard2.0 + net6.0 — all projects target net10.0 only.
  • Replace SocketAsyncEventArgs internals in TcpConnection with Stream + System.IO.Pipelines, fronted by a new ITransportConnection abstraction (enables a future TLS provider).
  • Actor-driven pipe reads via PipeTo — no cross-thread synchronization.
  • Eliminate per-frame copies in Framing and JsonObjectParser via BufferSegment-chained ReadOnlySequence<byte> (no merge byte[] allocations, defensive ToArray() compactions removed).
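The BufferSegment-chained ReadOnlySequence<byte> mentioned above can be illustrated with a small ReadOnlySequenceSegment<byte> subclass. This is a minimal sketch of the idea only; the `Segment` type here is illustrative, not the PR's internal BufferSegment:

```csharp
using System;
using System.Buffers;

// Minimal linked-segment node: chaining buffers allocates only these small
// node objects, never a merged byte[].
sealed class Segment : ReadOnlySequenceSegment<byte>
{
    public Segment(ReadOnlyMemory<byte> memory)
    {
        Memory = memory;
    }

    public Segment Append(ReadOnlyMemory<byte> memory)
    {
        var next = new Segment(memory) { RunningIndex = RunningIndex + Memory.Length };
        Next = next;
        return next;
    }
}

class Demo
{
    static void Main()
    {
        // Two buffers exposed as one logical sequence, zero payload copies.
        var first = new Segment(new byte[] { 1, 2, 3 });
        var last = first.Append(new byte[] { 4, 5 });
        var seq = new ReadOnlySequence<byte>(first, 0, last, last.Memory.Length);
        Console.WriteLine(seq.Length); // 5
    }
}
```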

What changed

| Area | Change |
| --- | --- |
| TFM | netstandard2.0 + net6.0 → net10.0 only |
| Tcp.Received.Data | ByteString → ReadOnlySequence<byte> |
| Tcp.Write.Data | ByteString → ReadOnlySequence<byte> |
| TcpConnection | Full rewrite: SAEA → Stream + Pipe with actor-driven reads, fronted by ITransportConnection |
| ITransportConnection | New abstraction: TcpTransportConnection (plaintext), shape ready for a future TLS implementation |
| Akka.Streams DSL | Framing, JsonFraming, FileIO, StreamConverters, Tcp stream now flow ReadOnlySequence<byte> |
| Framing / JsonObjectParser | Per-frame Concat byte[] allocation replaced by a BufferSegment chain. Defensive parsedFrame.ToArray() / _buffer.ToArray() / emit.ToArray() compactions removed. JsonObjectParser uses SequenceReader<byte> for its byte-by-byte scan. |
| Reviewer cleanups | Dead NETSTANDARD2_1 blocks removed (ClusterClientDiscovery, AkkaAssertEqualityComparerAdapter, Akka.Streams.TestKit/TestUtils); ConfigureAwait(false) regression in ClusterClientDiscovery reverted with CancellationToken restored; TcpConnection state-machine doc restored; redundant [TcpConnection] log prefixes stripped. |
| Akka.Remote | Minimal changes (most ByteString refs were Google.Protobuf.ByteString) |
| API baselines | Akka.API.Tests baselines refreshed for the ReadOnlySequence<byte> public surface |
| Tests/Benchmarks | All migrated; ByteStringBenchmarks deleted |
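The SequenceReader<byte> scan mentioned in the Framing / JsonObjectParser row walks multi-segment input without flattening it. A hedged sketch of the primitive (illustrative only, not the PR's actual parser):

```csharp
using System;
using System.Buffers;

class Demo
{
    static void Main()
    {
        // SequenceReader scans across segment boundaries; each extracted
        // frame is a zero-copy slice of the input sequence.
        var data = new ReadOnlySequence<byte>("{\"a\":1}\n{\"b\":2}\n"u8.ToArray());
        var reader = new SequenceReader<byte>(data);

        int frames = 0;
        while (reader.TryReadTo(out ReadOnlySequence<byte> frame, (byte)'\n'))
            frames++; // no byte[] merge allocation per frame

        Console.WriteLine(frames); // 2
    }
}
```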

Test status

  • dotnet build -c Release — 0 errors, 0 warnings
  • Akka.API.Tests — 18/18 passing
  • Akka.Streams.Tests Framing/JsonFraming — 44/44 passing
  • Akka.Tests IO suite — 51/51 passing
  • Akka.Streams TcpSpec — 18/19 passing (1 intermittent 1-byte loss in echo batch run)
  • Full Akka.Streams.Tests suite — ~1800 passing
  • Akka.Remote.Tests — 350+ passing

Known issues

  • 1 intermittent test: Echo_should_work_even_if_server_is_in_full_close_mode loses 1 byte (999 vs 1000) when run in batch, passes individually. Root cause under investigation.

Performance

A new FramingBenchmarks (#8202 against dev) was used to measure the framing system end-to-end across three states: dev (ByteString) → mid-PR (ReadOnlyMemory<byte>) → final (ReadOnlySequence<byte>). Full data tables in this comment. Headlines:

  • Framing allocations are flat-or-better than dev at scale. At 1 KB messages: LengthField_Decode 1752 B → 632 B (−64%), Delimiter_Decode stays at 425 B regardless of payload size, LengthField_Encode 489 B → 553 B.
  • Framing throughput at 1 KB: Delimiter_Decode 1.88M → 4.17M req/s (+122%), LengthField_Decode 2.29M → 3.47M (+52%).
  • TCP keeps the wins from the SAEA → Pipe rewrite: −38% allocations vs dev at 40 clients, +21% throughput vs dev at 1 client (latency-bound).
  • Important note: the intermediate ReadOnlyMemory<byte> state of this PR was an allocation regression vs dev for any non-trivial payload (e.g. 1 KB delimiter went 449 B → 1441 B). The follow-on ReadOnlySequence<byte> work was required to recover and improve on dev.

Commits

  1. Migrate all projects to net10.0 single-target
  2. Delete ByteString and migrate Akka.IO to ReadOnlyMemory
  3. Migrate Akka.Streams, Akka.Remote, and Akka.Cluster from ByteString
  4. Add ITransportConnection abstraction and TcpTransportConnection
  5. Rewrite TcpConnection from SAEA to Stream + Pipe
  6. Migrate all test and benchmark projects from ByteString
  7. Fix TcpConnection: actor-driven reads, Props, blocking sockets
  8. Fix TcpConnection shutdown coordination and race conditions
  9. Clean up reviewer-flagged dead code and actor-context async issues (dd14510d)
  10. Migrate Akka.IO and Akka.Streams from ReadOnlyMemory<byte> to ReadOnlySequence<byte> (d49c02c1)
  11. Eliminate per-frame copies in Framing and JsonObjectParser (d52cf44e)
  12. Polish TcpConnection: phase doc, region-grouped state flags, drop log prefixes; refresh API baselines (baeb1364)

A pre-merge backup of the branch state at commit 8 (f2e80b3d8) is preserved at backup/spec1-modernize-akka-io-tcp-pre-ros on the fork.

Design decisions

See openspec/changes/modernize-akka-io-tcp/design.md for full rationale on the SAEA → Pipe rewrite and the buffer-type evolution.

@Aaronontheweb (Member Author):

Update after cleanup:

  • Fixed TcpConnection shutdown coordination, actor self-tells from async paths, and outbound opportunistic write batching. The large-message regression and the full-close / half-close TCP tests are now passing locally.
  • Removed the DNS fallback retry path from TcpOutgoingConnection. For DNS endpoints we now pick a single address based on the socket family and either connect or fail; no alternate-address fallback logic remains in this PR.
  • The one remaining local red is TcpIntegrationSpec.The_TCP_transport_implementation_should_properly_support_connecting_to_DNS_endpoints(family: InterNetworkV6), but I reproduced that same failure against a clean origin/dev worktree on this VM. On this machine localhost resolves only to 127.0.0.1 (getent ahosts localhost returns IPv4 only), which likely explains why it passes in CI but fails here.

@Aaronontheweb (Member Author):

Follow-up cleanup note:

There are still some legacy .NET Framework references and old project / conditional remnants elsewhere in the repo (older example projects, NETFRAMEWORK / NET48 test guards, stale docs / badges, etc.). We should clean those up in a separate follow-on task rather than broadening this PR further.

Recommendation for that later cleanup:

  • converge remaining projects on the central MSBuild TFM properties already defined in Directory.Build.props
  • modernize the old example projects to canonical modern .NET app SDK style where practical
  • remove stale NETFRAMEWORK / NET48 conditionals once the old targets are fully gone
  • sweep docs / badges / contributor guidance for remaining .NET Framework references

I intentionally did not fold that repo-wide cleanup into this PR.

@Aaronontheweb changed the title from "WIP: Modernize Akka.IO TCP — ByteString removal, Stream+Pipe, IStreamProvider" to "WIP: Modernize Akka.IO TCP (ByteString removal, Stream+Pipe, IStreamProvider) + Migrate to .NET 10 TFM" on Mar 25, 2026
@Aaronontheweb (Member Author):

Akka.IO TCP Benchmark: SAEA (dev) vs Stream+Pipe (this PR)

Environment: Linux Ubuntu 24.04.4, Intel Core i9-9900K 3.60GHz, 8 cores, .NET 10.0.5, ServerGC

Summary (10-byte messages, LongRun)

| Clients | SAEA (dev) req/s | Stream+Pipe req/s | Change | Alloc (dev) | Alloc (new) |
| --- | --- | --- | --- | --- | --- |
| 1 | 326,405 | 393,954 | +21% | 473 B | 258 B |
| 3 | 908,387 | 1,209,850 | +33% | ~0 B | 246 B |
| 5 | 1,392,469 | 2,109,859 | +52% | 471 B | 231 B |
| 7 | 1,770,914 | 2,946,997 | +66% | 454 B | 224 B |
| 10 | 2,088,483 | 3,758,319 | +80% | 448 B | 221 B |
| 20 | 2,257,921 | 3,886,077 | +72% | 442 B | 222 B |
| 30 | 2,321,135 | 3,938,591 | +70% | 442 B | 222 B |
| 40 | 2,311,378 | 4,253,284 | +84% | 441 B | 223 B |

Peak throughput: 2.3M → 4.25M req/sec (+84%). Allocations cut ~50%.


Full BenchmarkDotNet output — dev baseline (SAEA):

```
BenchmarkDotNet v0.15.4, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Intel Core i9-9900K CPU 3.60GHz (Coffee Lake), 1 CPU, 8 logical and 8 physical cores
.NET SDK 10.0.103
  [Host]   : .NET 10.0.5 (10.0.5, 10.0.526.15411), X64 RyuJIT x86-64-v3
  LongRun  : .NET 10.0.5 (10.0.5, 10.0.526.15411), X64 RyuJIT x86-64-v3

Runtime=.NET 10.0  Concurrent=True  Server=True  InvocationCount=1  UnrollFactor=1

Method MessageLength ClientsCount Mean Error StdDev Req/sec Gen0 Gen1 Gen2 Allocated
ClientServerCommunication 10 1 3,063.7 ns 62.04 ns 92.86 ns 326,405.24 0.0380 - - 473 B
ClientServerCommunication 10 3 1,100.9 ns 52.48 ns 78.55 ns 908,387.09 0.0720 0.0010 0.0010 -
ClientServerCommunication 10 5 718.1 ns 25.82 ns 38.65 ns 1,392,469.48 0.0230 - - 471 B
ClientServerCommunication 10 7 564.7 ns 28.78 ns 43.08 ns 1,770,913.55 0.0590 - - 454 B
ClientServerCommunication 10 10 478.8 ns 32.56 ns 48.74 ns 2,088,482.90 0.0230 - - 448 B
ClientServerCommunication 10 20 442.9 ns 22.70 ns 33.97 ns 2,257,921.00 0.0220 - - 442 B
ClientServerCommunication 10 30 430.8 ns 11.56 ns 17.30 ns 2,321,134.50 0.0210 - - 442 B
ClientServerCommunication 10 40 432.6 ns 24.70 ns 36.98 ns 2,311,378.02 0.0210 0.0010 - 441 B
ClientServerCommunication 100 1 3,500.5 ns 304.66 ns 456.00 ns 285,673.90 0.2580 - - 650 B
ClientServerCommunication 100 3 1,260.9 ns 94.96 ns 142.13 ns 793,093.77 0.0510 0.0010 0.0010 106 B
ClientServerCommunication 100 5 767.2 ns 29.28 ns 43.82 ns 1,303,420.22 0.0880 0.0010 - 652 B
ClientServerCommunication 100 7 577.4 ns 31.25 ns 46.77 ns 1,731,867.47 0.0430 - - 637 B
ClientServerCommunication 100 10 525.0 ns 28.49 ns 42.64 ns 1,904,581.55 0.0640 0.0010 - 628 B
ClientServerCommunication 100 20 478.9 ns 21.85 ns 32.71 ns 2,088,048.36 0.0500 0.0010 - 622 B
ClientServerCommunication 100 30 465.3 ns 22.26 ns 33.32 ns 2,149,020.88 0.0410 0.0010 - 622 B
ClientServerCommunication 100 40 480.8 ns 27.18 ns 40.68 ns 2,079,784.42 0.0300 0.0010 - 621 B
```
Full BenchmarkDotNet output — Stream+Pipe (this PR):

```
BenchmarkDotNet v0.15.4, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
Intel Core i9-9900K CPU 3.60GHz (Coffee Lake), 1 CPU, 8 logical and 8 physical cores
.NET SDK 10.0.103
  [Host]   : .NET 10.0.5 (10.0.5, 10.0.526.15411), X64 RyuJIT x86-64-v3
  LongRun  : .NET 10.0.5 (10.0.5, 10.0.526.15411), X64 RyuJIT x86-64-v3

Runtime=.NET 10.0  Concurrent=True  Server=True  InvocationCount=1  UnrollFactor=1

Method MessageLength ClientsCount Mean Error StdDev Req/sec Gen0 Gen1 Gen2 Allocated
ClientServerCommunication 10 1 2,538.4 ns 48.04 ns 71.90 ns 393,954.34 0.0130 - - 258 B
ClientServerCommunication 10 3 826.5 ns 29.40 ns 44.00 ns 1,209,849.81 0.0160 - - 246 B
ClientServerCommunication 10 5 474.0 ns 25.94 ns 38.82 ns 2,109,858.80 0.0100 - - 231 B
ClientServerCommunication 10 7 339.3 ns 11.75 ns 17.58 ns 2,946,996.65 0.0110 - - 224 B
ClientServerCommunication 10 10 266.1 ns 16.30 ns 24.39 ns 3,758,318.77 0.0110 - - 221 B
ClientServerCommunication 10 20 257.3 ns 18.30 ns 27.39 ns 3,886,077.14 0.0110 - - 222 B
ClientServerCommunication 10 30 253.9 ns 23.70 ns 35.47 ns 3,938,591.15 0.0100 - - 222 B
ClientServerCommunication 10 40 235.1 ns 24.71 ns 36.98 ns 4,253,283.60 0.0090 - - 223 B
ClientServerCommunication 100 1 2,646.1 ns 45.40 ns 67.96 ns 377,910.42 0.0420 0.0010 0.0010 -
ClientServerCommunication 100 3 882.0 ns 37.90 ns 56.73 ns 1,133,789.46 0.0210 - - 427 B
ClientServerCommunication 100 5 591.1 ns 25.50 ns 38.17 ns 1,691,862.31 0.0210 - - 414 B
ClientServerCommunication 100 7 420.7 ns 23.59 ns 35.31 ns 2,376,924.89 0.0200 - - 404 B
ClientServerCommunication 100 10 348.4 ns 19.28 ns 28.85 ns 2,870,038.51 0.0170 - - 401 B
ClientServerCommunication 100 20 307.0 ns 13.04 ns 19.51 ns 3,257,684.92 0.0200 0.0010 - 402 B
ClientServerCommunication 100 30 295.0 ns 15.11 ns 22.61 ns 3,389,985.80 0.0190 0.0010 - 402 B
ClientServerCommunication 100 40 295.9 ns 17.94 ns 26.85 ns 3,379,028.22 0.0180 0.0010 - 403 B
```

@Aaronontheweb (Member Author):

Spike: ITransportConnection with duplex pipe write path

Branch: spike/transport-connection-duplex-pipe

Explored replacing the Channel<WriteCommand> + direct stream.WriteAsync write path with a duplex pipe architecture encapsulated in a new ITransportConnection interface. The transport owns both read and write pipes plus their pump loops, handling all buffer management and flush batching internally.

What changed

  • ITransportConnection interface: WriteAsync, FlushAsync, ShutdownAsync, CloseAsync, Abort, plus ReadCompleted/WriteCompleted task observability
  • TcpTransportConnection implementation: read pump (Stream → Input pipe) + write pump (Output pipe → Stream)
  • TcpConnection simplified: removed Channel, WriteToStreamAsync, WriteBatchToStreamAsync, ArrayPool batching. Write path is now just _transport.WriteAsync(write.Data, ct) — pipe handles coalescing.

Why

NetworkStream.WriteAsync is unbuffered — every call is a sendmsg syscall. With ReadOnlySequence<byte> on the write path (needed for M4 zero-copy framing), iterating segments would mean N syscalls for N segments. The pipe-based design writes segments into the pipe buffer (memcpy) and the write pump flushes everything in one batch when it wakes up — same implicit coalescing pattern as Kestrel's DoSend loop and DotNetty's FlushConsolidationHandler.
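The coalescing idea described above can be sketched with System.IO.Pipelines directly. This is an illustrative pump loop under the same design assumptions; the names and structure are not the PR's actual TcpTransportConnection:

```csharp
using System;
using System.Buffers;
using System.IO;
using System.IO.Pipelines;
using System.Threading.Tasks;

// Illustrative write pump: producers write into the pipe (memcpy), the pump
// drains whatever has accumulated and issues one flush per wake-up.
static class WritePump
{
    public static async Task RunAsync(PipeReader output, Stream stream)
    {
        while (true)
        {
            ReadResult result = await output.ReadAsync();
            ReadOnlySequence<byte> buffer = result.Buffer;

            // Typically one large pipe segment per wake-up, even if many
            // small writes went in: this is the implicit coalescing.
            foreach (ReadOnlyMemory<byte> segment in buffer)
                await stream.WriteAsync(segment);

            await stream.FlushAsync();
            output.AdvanceTo(buffer.End);

            if (result.IsCompleted)
                break;
        }
        await output.CompleteAsync();
    }
}

class Demo
{
    static async Task Main()
    {
        var pipe = new Pipe();
        var sink = new MemoryStream();

        // Two small producer writes land in the pipe buffer...
        await pipe.Writer.WriteAsync(new byte[] { 1, 2 });
        await pipe.Writer.WriteAsync(new byte[] { 3 });
        await pipe.Writer.CompleteAsync();

        // ...and the pump drains them in one batch.
        await WritePump.RunAsync(pipe.Reader, sink);
        Console.WriteLine(sink.Length); // 3
    }
}
```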

Benchmark results (same machine, same config as PR numbers)

Clients SAEA (dev) Channel+Stream (PR) Pipe (spike) vs SAEA
1 326K 394K 372K +14%
5 1.39M 2.11M 1.90M +36%
10 2.09M 3.76M 3.17M +52%
20 2.26M 3.89M 3.39M +50%
40 2.31M 4.25M 3.60M +56%

Peak: 3.60M req/sec (10B messages, 40 clients). Allocations: 222-249B/op (down from 441-473B SAEA).

Note: significant run-to-run variance on this VM (~30%). The Channel+Stream approach benchmarked at both 4.25M and 2.87M on different runs. Dedicated hardware needed for clearer signal.

Test status

  • 42/42 TCP tests pass (23 TcpIntegration + 19 TcpSpec)
  • Full solution builds clean (0 errors, 0 warnings)

@Aaronontheweb (Member Author):

Dedicated Hardware Benchmark: Three-Way Comparison

Hardware: Intel Core i7-6700K 4.00GHz (Skylake), 4 physical / 8 logical cores, Linux Ubuntu 24.04.2
Runtime: .NET 10.0.5, ServerGC, LongRun (3 launches × 10 iterations)

All three branches benchmarked on the same dedicated machine with no other workloads — much cleaner signal than the VM runs posted earlier.

10-byte messages

| Clients | SAEA (dev) | Channel+Stream | ITransportConnection (pipe) | Pipe vs SAEA |
| --- | --- | --- | --- | --- |
| 1 | 186K | 216K | 212K | +14% |
| 3 | 441K | 601K | 586K | +33% |
| 5 | 670K | 1.02M | 970K | +45% |
| 7 | 866K | 1.36M | 1.30M | +50% |
| 10 | 1.00M | 1.73M | 1.74M | +74% |
| 20 | 1.16M | 1.97M | 2.02M | +74% |
| 30 | 1.22M | 2.07M | 2.08M | +70% |
| 40 | 1.25M | 2.07M | 2.04M | +63% |

100-byte messages

| Clients | SAEA (dev) | Channel+Stream | ITransportConnection (pipe) | Pipe vs SAEA |
| --- | --- | --- | --- | --- |
| 1 | 180K | 209K | 202K | +12% |
| 5 | 652K | 976K | 906K | +39% |
| 10 | 983K | 1.65M | 1.58M | +61% |
| 20 | 1.13M | 1.90M | 1.83M | +62% |
| 30 | 1.20M | 1.97M | 1.93M | +61% |
| 40 | 1.20M | 1.98M | 1.90M | +58% |

Allocations per operation (10B, 40 clients)

| Design | Allocated/op |
| --- | --- |
| SAEA (dev) | 441 B |
| Channel+Stream | 238 B |
| ITransportConnection (pipe) | 224 B |

Key Takeaway

On dedicated hardware, Channel+Stream and ITransportConnection (pipe) are within 5% of each other — statistically identical. Both deliver +63-74% over SAEA at scale with ~50% allocation reduction.

The pipe design is now merged into this branch. It replaces the Channel + direct stream.WriteAsync write path with a duplex pipe architecture encapsulated in ITransportConnection. The actor writes to the pipe (memcpy), the write pump flushes to the stream (syscall) — implicit Kestrel-style flush batching with no per-segment syscall risk for ReadOnlySequence<byte> data.

@Aaronontheweb (Member Author) left a review comment:

Not done with review yet, but need to flush these pending comments.

Comment thread src/benchmark/Akka.Benchmarks/IO/ByteStringBenchmarks.cs Outdated
```xml
<PropertyGroup>
  <OutputType>Exe</OutputType>
  <TargetFrameworks>$(NetFrameworkTestVersion);$(NetTestVersion);net6.0</TargetFrameworks>
  <TargetFramework>$(NetTestVersion)</TargetFramework>
```
@Aaronontheweb (Member Author):

.NET 10 or bust

@Aaronontheweb (Member Author):

The vast majority of "API changes" are from migrating .NET 6 -> .NET 10, but there are legitimate API changes here

```csharp
{
    [Akka.Annotations.InternalApi]
    public readonly struct ChunkedMessage
    public readonly struct ChunkedMessage : System.IEquatable<Akka.Delivery.Internal.ChunkedMessage>
```
@Aaronontheweb (Member Author):

Needed to restructure this since we can't depend on ByteString's equality members to do the heavy lifting for us on equality by value any more.
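The restructuring described here amounts to hand-rolled value equality over the byte payload. A hedged sketch under assumed member names (`ChunkLike` is illustrative, not the real ChunkedMessage):

```csharp
using System;

// Illustrative only: value equality over a ReadOnlyMemory<byte> payload,
// now that ByteString's equality members are gone.
public readonly struct ChunkLike : IEquatable<ChunkLike>
{
    public ChunkLike(ReadOnlyMemory<byte> payload, bool lastChunk)
    {
        Payload = payload;
        LastChunk = lastChunk;
    }

    public ReadOnlyMemory<byte> Payload { get; }
    public bool LastChunk { get; }

    public bool Equals(ChunkLike other)
        => LastChunk == other.LastChunk
           && Payload.Span.SequenceEqual(other.Payload.Span); // byte-wise compare

    public override bool Equals(object obj) => obj is ChunkLike c && Equals(c);

    // Cheap hash; distinct payloads of equal length collide, which is legal.
    public override int GetHashCode() => HashCode.Combine(Payload.Length, LastChunk);
}

class Demo
{
    static void Main()
    {
        var a = new ChunkLike(new byte[] { 1, 2 }, true);
        var b = new ChunkLike(new byte[] { 1, 2 }, true);
        Console.WriteLine(a.Equals(b)); // True
    }
}
```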

```csharp
    LittleEndian = 1,
}
[System.Diagnostics.DebuggerDisplay("(Count = {_count}, Buffers = {_buffers})")]
public sealed class ByteString : System.Collections.Generic.IEnumerable<byte>, System.Collections.IEnumerable, System.IEquatable<Akka.IO.ByteString>
```
@Aaronontheweb (Member Author):

Deleted by design - used the compilation errors as a "to do" list for Claude, but this class has been an absolute performance pig for years and I'm not sad it's gone.

Comment thread src/core/Akka.Cluster/Serialization/ReliableDeliverySerializer.cs Outdated
Add OpenSpec CLI skills and prompts for Claude Code and GitHub Copilot
to support formal specification workflows for the Akka.NET 1.6
transport and serialization epic.
Spec 1 of the Akka.NET 1.6 transport/serialization epic. Replaces
ByteString with System.Memory types and SocketAsyncEventArgs with
Stream + System.IO.Pipelines in Akka.IO TCP actors.

Includes proposal, design, capability specs (system-memory-io,
stream-pipe-transport), and implementation task breakdown.
Spec 2 of the Akka.NET 1.6 transport/serialization epic. Adds TLS
support at the Akka.IO level via TlsStreamProvider, leveraging the
IStreamProvider abstraction from Spec 1. All existing DotNetty TLS
HOCON configuration works unchanged.
Spec 3 of the Akka.NET 1.6 transport/serialization epic. Replaces
DotNetty with an Akka.Streams TCP-based transport. Features integrated
framing + serialization via FrameBufferWriter (IBufferWriter<byte>),
binary PDU encoding, and full DotNetty HOCON config compatibility.
Spec 4: SerializerV2 with IBufferWriter<byte>/ReadOnlySequence<byte>
API, SerializerV1Adapter, MessagePackSerializer, mechanical port of
internal Protobuf serializers. Source generator deferred.

Spec 5: Performance validation using RemotePingPong benchmark. New
transport must exceed DotNetty. Covers flush batching, Pipe tuning,
buffer pool optimization, and continuous benchmark tracking.
Defines 5 sequential milestones with branches, completion criteria,
and orchestration strategy. Includes DotNetty performance baseline
(~680K msgs/sec peak on .NET 10). Each milestone is reviewed by a
human before proceeding to the next.
Opus captain orchestrates Sonnet workers through OpenSpec task lists.
Reads tasks.md, dispatches task groups, verifies builds, fixes errors
in a loop, runs tests, and archives the change on completion. Designed
to execute one milestone at a time with human review between milestones.
Branch should be created manually before invoking the program,
not by the orchestrator itself.
- Replace netstandard2.0 + net6.0 multi-targeting with net10.0 only
- Remove NetLibVersion and NetFrameworkTestVersion from Directory.Build.props
- Remove all netstandard-conditional ItemGroup blocks from csproj files
- Remove Polyfill package references (no longer needed on net10.0)
- Remove BCL packages now included in net10.0 (System.Collections.Immutable, etc.)
- Suppress SYSLIB0050/0051 obsolete serialization warnings
- Update OpenSpec files to reflect net10.0 decision
- Update OpenProse milestone-runner to commit after each task group
- Change Tcp.Write.Data and Tcp.Received.Data from ByteString to ReadOnlyMemory<byte>
- Update all Write.Create() factory overloads for ReadOnlyMemory<byte>
- Delete Akka.Util.ByteString class entirely
- Move ByteOrder enum to ByteHelpers.cs
- Fix TcpConnection send/receive to work with ReadOnlyMemory<byte>
- Remove ByteString-based SocketAsyncEventArgs extensions
- Migrate Udp and UdpConnected message types to ReadOnlyMemory<byte>
- Fix ChunkedMessage and ProducerController/ConsumerController delivery code
- Akka.csproj builds with 0 errors, 0 warnings
Akka.Streams:
- Replace all ByteString with ReadOnlyMemory<byte> in DSL, TcpStages,
  FileIO, StreamConverters, Framing, JsonFraming
- Rewrite DelimiterFramingStage and LengthFieldFramingStage for Memory<byte>
- Rewrite JsonObjectParser for ReadOnlyMemory<byte> with Span-based access
- Update IOSources, IOSinks, FilePublisher/Subscriber, InputStreamPublisher

Akka.Remote:
- Fix ByteOrder ambiguity in DotNetty transport settings (DotNetty vs Akka.Util)
- Most ByteString references were already Google.Protobuf.ByteString (unchanged)

Akka.Cluster:
- Update ReliableDeliverySerializer for ReadOnlyMemory<byte> ChunkedMessage

All library projects build with 0 errors, 0 warnings.
- Create IStreamProvider interface with ConnectAsync/Close contract
- Create TcpStreamProvider: plaintext NetworkStream from connected Socket
- Update TcpOutgoingConnection to accept IStreamProvider (defaults to TcpStreamProvider)
- Update TcpListener to wrap accepted sockets in NetworkStream
- Update TcpIncomingConnection to accept Stream parameter
- Stream+Pipe usage deferred to Task Group 4; existing SAEA code untouched
Replace SocketAsyncEventArgs-based I/O with three background tasks
coordinated through the actor mailbox (TurboMQTT pattern):

- ReadFromStreamAsync: stream.ReadAsync → PipeWriter with backpressure
- ReadFromPipeAsync: PipeReader → byte[] copy → Tcp.Received delivery
- WriteToStreamAsync: Channel<WriteCommand> → stream.WriteAsync → ACK

Flow control:
- SuspendReading/ResumeReading via SemaphoreSlim gate
- Pull mode: auto-suspend after each Tcp.Received delivery

Shutdown sequences:
- Tcp.Close: flush writes → cancel reads → close stream → Tcp.Closed
- Tcp.Abort: immediate CTS cancel → RST → Tcp.Aborted
- Tcp.ConfirmedClose: FIN → await peer FIN → Tcp.ConfirmedClosed
- EOF: PipeWriter complete → Tcp.PeerClosed
- I/O error: Tcp.ErrorClosed with cause message

Lifecycle: Task.WhenAll tracking, Interlocked.CompareExchange CTS guard,
self-tell before caller-tell ordering.

Buffers/ and SocketEventArgsPool retained for UDP usage.
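The "Interlocked CTS guard" in the lifecycle notes above is a one-shot cancellation pattern: whichever path (close, abort, I/O error) races to cancel first wins exactly once. A minimal sketch, assuming nothing about the real field names (the commit mentions CompareExchange; plain Exchange is equivalent for a one-shot guard):

```csharp
using System;
using System.Threading;

// Illustrative one-shot shutdown guard: the first caller atomically takes
// ownership of the CTS; all later callers see null and do nothing.
class ConnectionLike
{
    private CancellationTokenSource _cts = new CancellationTokenSource();

    public bool TryShutdown()
    {
        var cts = Interlocked.Exchange(ref _cts, null);
        if (cts is null)
            return false; // another path already shut us down

        cts.Cancel();
        cts.Dispose();
        return true;
    }
}

class Demo
{
    static void Main()
    {
        var conn = new ConnectionLike();
        Console.WriteLine(conn.TryShutdown()); // True
        Console.WriteLine(conn.TryShutdown()); // False
    }
}
```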
- Fix ~198 ByteString compilation errors across 7 test/benchmark projects
- Akka.Streams.Tests: migrate TcpSpec, FileSourceSpec, JsonFramingSpec,
  InputStreamSourceSpec, OutputStreamSinkSpec, FlowGroupBySpec, BugSpec, etc.
- Akka.Streams.Tests.TCK: update ByteString type references
- Akka.Docs.Tests: update TelnetClient, EchoConnection, StreamTcpDocTests
- Akka.Benchmarks: delete ByteStringBenchmarks.cs, fix TcpOperationsBenchmarks
- Akka.Cluster.Tests: fix ReliableDeliverySerializerSpecs
- Akka.Tests: fix remaining ByteString references
- Full solution builds with 0 errors, 0 warnings
…ckets

Major fixes to the Stream+Pipe TcpConnection rewrite:

- Remove SemaphoreSlim gate — replace with actor-driven PipeTo pattern
  for pipe reads. All flow control is now in the actor's message loop,
  no cross-thread synchronization needed.
- Fix TcpOutgoingConnection Props creation (reflection needs explicit args)
- Set Socket.Blocking=true before wrapping in NetworkStream
- Fix EOF detection: detect from PipeReader.IsCompleted instead of racing
  StreamEof self-tell with buffered pipe data
- Pull mode works correctly: each ResumeReading triggers one pipe read

12/22 TcpSpec tests passing (7 remaining are close/abort/error shutdown)
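The actor-driven PipeTo pattern described in this commit keeps exactly one pipe read outstanding and only touches the pipe from the actor's message loop. An Akka-free sketch of the same shape, using a Channel as a stand-in for the actor mailbox (in the real code the read completion is PipeTo'd to Self):

```csharp
using System;
using System.IO.Pipelines;
using System.Threading.Channels;
using System.Threading.Tasks;

class Demo
{
    static async Task Main()
    {
        var pipe = new Pipe();
        var mailbox = Channel.CreateUnbounded<ReadResult>();

        // "RequestRead": start one read, post its completion as a message.
        void RequestRead() =>
            _ = Task.Run(async () => await mailbox.Writer.WriteAsync(await pipe.Reader.ReadAsync()));

        await pipe.Writer.WriteAsync(new byte[] { 1, 2, 3 });
        await pipe.Writer.CompleteAsync();

        RequestRead();
        long total = 0;
        while (await mailbox.Reader.WaitToReadAsync())
        {
            var result = await mailbox.Reader.ReadAsync();
            total += result.Buffer.Length;
            // The pipe is only advanced from this single loop, so no
            // cross-thread synchronization is needed.
            pipe.Reader.AdvanceTo(result.Buffer.End);
            if (result.IsCompleted) break;
            RequestRead(); // one outstanding read at a time
        }
        Console.WriteLine(total); // 3
    }
}
```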
- Never complete PipeWriter with exception (causes PipeReader.ReadAsync to
  throw, bypassing actor message loop and losing buffered data)
- Wrap RequestPipeRead in try-catch for defense in depth
- Handle cancelled pipe reads (send IoTaskFailed instead of silent drop)
- Track IoTasksCompleted with boolean flag to prevent message loss across
  behaviour transitions
- Add TryFinishClose() for coordinated shutdown readiness checks
- Add StreamEof handler to PeerSentEofBehaviour (prevent dead letters)
- Fix HandleConfirmedClose to always wait for WritesFlushed
- Add volatile _readStreamHasError flag for cross-thread error detection
- Add drain read when pipe completes with buffered data

18/19 TcpSpec tests passing (1 intermittent in batch, 3 pre-skipped)
- ChunkedMessage: implement IEquatable with Span.SequenceEqual for
  ReadOnlyMemory<byte> content equality (ReliableDeliverySerializer tests)
- TcpOutgoingConnection: pass DnsEndPoint directly to Socket.ConnectAsync
  for dual-stack sockets instead of manual resolution (DNS endpoint test)
- DotNetty TLS tests: handle TargetInvocationException wrapping and
  CryptographicException from .NET 10's X509CertificateLoader
- DeltaPropagationSelector: clamp Slice length to prevent
  ArgumentOutOfRangeException in round-robin propagation
- Revert dual-stack DnsEndPoint passthrough to Socket.ConnectAsync;
  always use Akka's async DNS resolver for consistent cross-platform
  behavior (Windows DNS timeout for unresolvable hosts is too slow)
- Update CoreAPISpec approval baselines for all public API changes
  (ByteString removal, ChunkedMessage IEquatable, IStreamProvider, etc.)
- Remove System.Runtime.Loader package (unnecessary on net10.0)
- Replace ByteString.FromString with Encoding.UTF8.GetBytes in Executor
- Fix TcpLoggingServer: decode Tcp.Received.Data to string via UTF8,
  use .Length instead of .Count on ReadOnlyMemory<byte>
- Fix nullable reference warnings in MultiNodeTestCase
- Akka.Cluster.TestKit.Xunit2: change TargetFrameworks (with undefined
  NetLibVersion) to single TargetFramework using NetStandardLibVersion
- Akka.Remote.TestKit.Xunit2.Tests: same fix for NetFrameworkTestVersion
- Update nuspec templates: netstandard2.0 -> net10.0 publish paths
The Xunit2 multi-node test adapter hardcoded inclusion of
xunit.runner.utility.netstandard15.dll which doesn't exist on net10.0.
This was a netstandard-era shim no longer needed.
@Aaronontheweb force-pushed the feature/spec1-modernize-akka-io-tcp branch from d4b00f6 to 62bae43 on May 9, 2026 22:57
Use ByteString.Memory directly instead of ToByteArray().AsMemory() in
the SequencedMessage and DurableQueue.MessageSent chunk paths to skip
an unnecessary byte[] allocation and copy. Also modernize the spec data
to collection expressions and u8 string literals.
@Aaronontheweb (Member Author) left a review comment:

Reviewed all of it and overall it looks very good, but:

  1. Why not ReadOnlySequence<T> instead of ReadOnlyMemory<T>? That would make it possible to do things like frame-length encoding without additional byte[] allocations, which is going to be a hot-path scenario for writes.
  2. There seems to be a bit of buffer-copying going on in a few places where it shouldn't be necessary any longer. All of those call sites are going to need to be revisited individually.
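Point 1 can be made concrete: with ReadOnlySequence<byte>, a length header can be prepended by chaining a small header segment in front of the payload, so the payload itself is never copied. A hedged sketch (the `Seg` type is illustrative):

```csharp
using System;
using System.Buffers;
using System.Buffers.Binary;

// Illustrative segment node for chaining a header in front of a payload.
sealed class Seg : ReadOnlySequenceSegment<byte>
{
    public Seg(ReadOnlyMemory<byte> memory, Seg previous = null)
    {
        Memory = memory;
        if (previous != null)
        {
            RunningIndex = previous.RunningIndex + previous.Memory.Length;
            previous.Next = this;
        }
    }
}

class Demo
{
    static void Main()
    {
        byte[] payload = new byte[1024]; // pretend this is the user's message

        // Encode the 4-byte length prefix into its own tiny buffer...
        byte[] header = new byte[4];
        BinaryPrimitives.WriteInt32BigEndian(header, payload.Length);

        // ...and chain it in front of the payload: no payload copy at all.
        var head = new Seg(header);
        var tail = new Seg(payload, head);
        var frame = new ReadOnlySequence<byte>(head, 0, tail, tail.Memory.Length);

        Console.WriteLine(frame.Length); // 1028
    }
}
```

With ReadOnlyMemory<byte> the same framing step would force allocating and filling a fresh `byte[4 + payload.Length]`, which is exactly the hot-path copy the comment is worried about.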

```yaml
- script: dotnet slopwatch analyze -d . --fail-on error --stats
  displayName: 'Run Slopwatch Analysis'

- template: azure-pipeline.template.yaml
```
@Aaronontheweb (Member Author):

Not doing .NET Framework CI/CD any longer

```csharp
    return null;
}

json = await response.Content.ReadAsStringAsync(ct);
```
@Aaronontheweb (Member Author):

This change shouldn't have been made. For starters, we should never call ConfigureAwait(false) on tasks executing inside an actor. Second, we're not adding .NET Standard 2.1 support, so this doesn't make any sense. Also, why would we discard the CancellationToken usage there?

```csharp
var i = (int)(_deltaNodeRoundRobinCounter % all.Length);
slice = all.Slice(i, sliceSize).ToImmutableArray();

var remainingFromI = all.Length - i;
```
@Aaronontheweb (Member Author):

Looks like a bounds-checking safety issue here, but I need to double-check whether it adversely affects the surrounding math. My understanding was that if sliceSize is greater than the length of the array it safely truncates, but I am probably mistaken.
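For the question raised here: Slice does not truncate. In the BCL, slicing past the end (on Span, ImmutableArray, and friends) throws ArgumentOutOfRangeException, so the caller has to clamp explicitly. A quick check, shown on Span for simplicity (ImmutableArray.Slice behaves the same way):

```csharp
using System;

class Demo
{
    static void Main()
    {
        int[] all = { 1, 2, 3, 4, 5 };

        // Asking for more elements than remain throws; it never truncates.
        try
        {
            var bad = all.AsSpan().Slice(3, 4);
        }
        catch (ArgumentOutOfRangeException)
        {
            Console.WriteLine("throws");
        }

        // So the round-robin caller must clamp the slice length itself:
        int i = 3, sliceSize = 4;
        var ok = all.AsSpan().Slice(i, Math.Min(sliceSize, all.Length - i));
        Console.WriteLine(ok.Length); // 2
    }
}
```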

@Aaronontheweb (Member Author):

Seems fine

```csharp
        new IPEndPoint(resolved.Ipv6.First(), remoteAddress.Port));
else // one or the other
    Register(new IPEndPoint(resolved.Addr, remoteAddress.Port), null);
// Pass DnsEndPoint directly to Socket.ConnectAsync — the runtime
```
@Aaronontheweb (Member Author):

welp, that certainly eliminates a lot of code....

@Aaronontheweb (Member Author):

LGTM

Comment thread src/core/Akka/IO/Udp.cs
```csharp
protected SocketCompleted(SocketAsyncEventArgs eventArgs)
{
    Data = ByteString.CopyFrom(eventArgs.Buffer, eventArgs.Offset, eventArgs.BytesTransferred);
    var copy = new byte[eventArgs.BytesTransferred];
```
@Aaronontheweb (Member Author):

this is ugly and I'm not sure how necessary it is

```csharp
protected SocketCompleted(SocketAsyncEventArgs eventArgs)
{
    Data = ByteString.CopyFrom(eventArgs.Buffer, eventArgs.Offset, eventArgs.BytesTransferred);
    var copy = new byte[eventArgs.BytesTransferred];
```
@Aaronontheweb (Member Author):

Again, kind of ugly and I'm not sure how necessary the copy is here. Probably is.

```csharp
    .ContinueWith(_ => { }, cancellationToken, TaskContinuationOptions.ExecuteSynchronously, TaskScheduler.Default);
}

#if NETSTANDARD2_1
```
@Aaronontheweb (Member Author):

More possibly dead code for backporting - we'll evaluate it later.

- ClusterClientDiscovery: drop NETSTANDARD2_1 conditional, remove
  ConfigureAwait(false) (inappropriate inside actor), and pass the
  CancellationToken through the single ReadAsStringAsync call.
- AkkaAssertEqualityComparerAdapter: drop NETSTANDARD2_0/2_1 branch;
  project targets net10.0 only and Nullable is enabled.
- Akka.Streams.TestKit/TestUtils: delete the NETSTANDARD2_1
  TaskWaitAsyncPolyfill block (no consumers; BCL provides WaitAsync).
- TcpIntegrationSpec: TODO note pointing at issue akkadotnet#8178 for the DNS
  resolution path; full fix is a separate PR per reviewer.
…ySequence<byte>

Replace ReadOnlyMemory<byte> with ReadOnlySequence<byte> across the public
TCP and Streams DSL surface so the read path can flow non-contiguous buffers
end-to-end without forcing flatten copies in downstream framing stages.

Public API changes:
- Tcp.Received.Data is now ReadOnlySequence<byte> (matches Tcp.Write.Data).
- Akka.Streams Tcp DSL (IncomingConnection.Flow, BindAndHandle,
  OutgoingConnection) now emits/consumes ReadOnlySequence<byte>.
- Framing.Delimiter, Framing.LengthField, SimpleFramingProtocol*,
  JsonFraming.ObjectScanner all flow ReadOnlySequence<byte>.
- FileIO.FromFile/ToFile, StreamConverters.* now use ReadOnlySequence<byte>.
- JsonObjectParser.Offer/Poll signatures updated; the empty-buffer fast path
  preserves the original zero-copy behaviour for single-segment inputs.

Internal:
- TcpConnection still copies pipe segments into a byte[] before delivery
  (Tell is non-blocking; eliminating the copy requires an ack protocol that
  is out of scope), but the result is wrapped as ReadOnlySequence<byte> so
  downstream stages can chain segments without further flattening.
- IO sinks (FileSubscriber, OutputStreamSubscriber) iterate sequence
  segments instead of using a single .Span path.
- Framing and JsonObjectParser keep their existing per-frame allocation
  footprint via bridge code; the SequenceReader-based rewrite that
  eliminates Concat / ToArray() compactions ships in the next commit.

Tests, benchmarks, docs, and examples updated to construct
ReadOnlySequence<byte> instances and consume via ToArray()/FirstSpan/CopyTo
extension methods.

Build: 0 errors, 0 warnings across the full solution.
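From a consumer's perspective, the migration pattern the commit describes can be sketched with plain BCL types (variable names here are illustrative; only `ReadOnlySequence<byte>` itself comes from the new public surface):

```csharp
using System;
using System.Buffers;
using System.Text;

// Producing: a contiguous byte[] wraps into a single-segment sequence, no copy.
var payload = Encoding.UTF8.GetBytes("hello");
var data = new ReadOnlySequence<byte>(payload);

// Consuming: take the zero-copy single-segment fast path when possible and
// fall back to ToArray() (one copy) only for multi-segment sequences.
var text = data.IsSingleSegment
    ? Encoding.UTF8.GetString(data.FirstSpan)
    : Encoding.UTF8.GetString(data.ToArray());

Console.WriteLine(text); // hello
```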
Replace the byte[]-allocating concat helpers and defensive ToArray()
compactions with zero-copy ReadOnlySequence<byte> slicing and segment
chains. The framing/parsing hot path no longer allocates merge buffers
or compact copies on every frame; only the segment node objects when
multiple inputs need to be chained.

- Add Implementation/BufferSegment.cs: small ReadOnlySequenceSegment<byte>
  helper with a Concat method that links existing segments without
  copying data.
- Framing.cs:
  - Concat now delegates to BufferSegment.Concat (zero data copy).
  - SimpleFramingProtocolEncoder pushes header+payload as a chained
    sequence instead of allocating a combined byte[].
  - DelimiterFraming.IndexOf / HasSubstring use SequenceReader<byte>
    so multi-segment inputs scan without materialization.
  - DelimiterFramingStage and LengthFieldFramingStage drop the
    parsedFrame.ToArray() and _buffer.ToArray() defensive copies; slicing
    a ReadOnlySequence is a struct view, consumed segments fall out of
    scope when _buffer is reassigned.
- JsonObjectParser.cs:
  - Internal storage is now ReadOnlySequence<byte>; multi-Offer scenarios
    chain segments via BufferSegment.Concat (no merge byte[]).
  - SeekObject scans via SequenceReader<byte> (no per-byte materialization).
  - Poll slices the buffer without compacting the remainder.

Tests: Framing/JsonFraming spec suite passes (44/44); Akka.Tests IO
suite passes (51/51). Full solution still builds clean.
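A minimal version of such a segment-chaining helper (illustrative; the PR's BufferSegment may differ in detail) looks like this — only the small segment node objects are allocated, and the payload bytes are never copied:

```csharp
using System;
using System.Buffers;

var seq = ChainSegment.Concat(new byte[] { 1, 2 }, new byte[] { 3, 4, 5 });
Console.WriteLine(seq.Length);          // 5
Console.WriteLine(seq.IsSingleSegment); // False

// Links existing buffers into one ReadOnlySequence<byte> by chaining
// ReadOnlySequenceSegment<byte> nodes; the underlying data is shared.
sealed class ChainSegment : ReadOnlySequenceSegment<byte>
{
    private ChainSegment(ReadOnlyMemory<byte> memory) => Memory = memory;

    private ChainSegment Append(ReadOnlyMemory<byte> memory)
    {
        var next = new ChainSegment(memory) { RunningIndex = RunningIndex + Memory.Length };
        Next = next;
        return next;
    }

    public static ReadOnlySequence<byte> Concat(ReadOnlyMemory<byte> first, ReadOnlyMemory<byte> second)
    {
        var head = new ChainSegment(first);
        var tail = head.Append(second);
        return new ReadOnlySequence<byte>(head, 0, tail, second.Length);
    }
}
```

With a helper shaped like this, an encoder can push header + payload as `Concat(header, payload)` instead of allocating a combined `byte[]`.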
… prefixes; refresh API baselines

- Restore the ASCII phase diagram at the top of TcpConnection.cs reflecting
  the post-rewrite states: Connecting → AwaitReg → Open → (PeerSentEof |
  Closing) → Closed. Map each state to its Become(...) handler.
- Group transient connection-state flags (_peerClosed, _outputShutdown,
  _keepOpenOnPeerClosed, _closingGracefully, _readPumpCompleted,
  _readPumpHasError, _readPumpError) inside a #region with a paragraph
  documenting the close-handshake invariants — what each flag means, when
  it is set, and how the read/write paths gate on combinations.
- Strip the redundant "[TcpConnection] " prefix from all Log.Debug calls.
  The log source already carries the actor path, which identifies the
  connection unambiguously.
- Refresh the Akka.Streams and Akka core API approval baselines for the
  ReadOnlyMemory<byte> → ReadOnlySequence<byte> public surface changes.

Tests: TcpIntegrationSpec + TcpConnectionBatchingSpec all green (24/24).
API approval tests pass.
@Aaronontheweb
Member Author

Benchmark update — three-way comparison after folding in the ROS migration work

Following review feedback, the ByteString → ReadOnlyMemory migration originally in this PR has been extended to ReadOnlySequence with an accompanying perf rewrite of Framing + JsonObjectParser. The new commits on this branch (dd14510d through baeb1364) add:

  • Tcp.Received.Data flips from ReadOnlyMemory<byte> → ReadOnlySequence<byte> (matches Tcp.Write.Data)
  • All Akka.Streams DSL stages that emitted ReadOnlyMemory<byte> now emit ReadOnlySequence<byte>
  • Framing.Concat replaced with a zero-copy BufferSegment.Concat segment chain (no merge byte[] alloc)
  • Defensive parsedFrame.ToArray() / _buffer.ToArray() / emit.ToArray() compactions removed across both Framing stages and JsonObjectParser
  • JsonObjectParser switched to a SequenceReader<byte>-driven scan
  • Reviewer cleanups (dead NETSTANDARD2_1 blocks, ConfigureAwait(false) regression in ClusterClientDiscovery, restored TcpConnection state-machine doc, stripped redundant [TcpConnection] log prefixes)
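The SequenceReader<byte>-driven scan mentioned above can be sketched roughly like this (helper name and shape are illustrative, not the actual DelimiterFraming code):

```csharp
using System;
using System.Buffers;
using System.Text;

var buffer = new ReadOnlySequence<byte>(Encoding.UTF8.GetBytes("abc\ndef"));
if (TryReadFrame(ref buffer, "\n"u8, out var frame))
    Console.WriteLine(Encoding.UTF8.GetString(frame.ToArray())); // abc
Console.WriteLine(buffer.Length); // 3 ("def" remains buffered)

// Scan a possibly multi-segment buffer for a delimiter without flattening it.
static bool TryReadFrame(ref ReadOnlySequence<byte> buffer,
                         ReadOnlySpan<byte> delimiter,
                         out ReadOnlySequence<byte> frame)
{
    var reader = new SequenceReader<byte>(buffer);
    if (reader.TryReadTo(out frame, delimiter, advancePastDelimiter: true))
    {
        // Consume up to and including the delimiter by slicing;
        // no bytes move and no scratch arrays are allocated.
        buffer = buffer.Slice(reader.Position);
        return true;
    }

    frame = default;
    return false; // incomplete frame: wait for more input
}
```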

Backup of the pre-merge branch state preserved at https://github.com/Aaronontheweb/akka.net/tree/backup/spec1-modernize-akka-io-tcp-pre-ros (commit f2e80b3d8) in case rollback is needed.

Benchmarks

A new BenchmarkDotNet FramingBenchmarks was added in #8202 (against dev), then ported (with type adaptations) to each branch and run in succession on the same hardware in one sitting.

Hardware: AMD Ryzen 9 9900X (Zen 5, 12C/24T), 64 GB RAM, Linux Ubuntu, .NET 10.0.7, ServerGC, BDN default Job. Three branches benchmarked:

  • dev — ByteString + SAEA TCP
  • rom — this PR before the ROS work (ByteString → ReadOnlyMemory, Stream+Pipe TCP)
  • ros — this PR after the ROS work (ReadOnlyMemory → ReadOnlySequence + Framing/JsonObjectParser perf rewrite)

Suite A — Framing allocations (bytes per framed message)

MessageCount = 100,000 per iteration; Allocated column = bytes-per-framed-message.

| Benchmark | MessageSize | dev (ByteString) | rom (ROM) | ros (ROS) | dev → ros |
|---|---|---|---|---|---|
| LengthField_Encode | 64 | 489 B | 521 B | 553 B | +13% |
| LengthField_Encode | 1024 | 489 B | 1481 B | 553 B | flat / vs ROM −63% |
| LengthField_Decode | 64 | 792 B | 680 B | 632 B | −20% |
| LengthField_Decode | 1024 | 1752 B | 2600 B | 632 B | −64% / vs ROM −76% |
| Delimiter_Decode | 64 | 449 B | 481 B | 425 B | −5% |
| Delimiter_Decode | 1024 | 449 B | 1441 B | 425 B | −5% / vs ROM −70% |
| JsonFraming_MultiChunk | 64 | 1302 B | 1409 B | 1644 B | +26% |
| JsonFraming_MultiChunk | 1024 | 1298 B | 3954 B | 1627 B | +25% / vs ROM −59% |

Key observations:

  1. The ROM-only intermediate state was a measurable allocation regression vs dev for any non-trivial payload size. ReadOnlyMemory<byte> lacks ByteString's rope-like internal structure, so every Concat allocates a fresh byte[] of the combined size. At 1 KB messages this showed up as 449 B → 1441 B (delimiter), 489 B → 1481 B (encode), and 1752 B → 2600 B (decode).

  2. The ROS migration recovers the regression and improves on dev. With ReadOnlySequence<byte> + segment-chain concat + dropped defensive compactions, allocations become flat (constant) regardless of payload size — 553 B / 632 B / 425 B across the LengthField + Delimiter paths.

  3. JsonFraming on ROS is ~25% more allocations than dev (constant ~340 B/object overhead from BufferSegment node allocations along the multi-chunk chain). Still ~60% better than the rom intermediate. Follow-up opportunity: segment node pooling could close that gap.

Suite A — Framing throughput (req/sec)

| Benchmark | MessageSize | dev | rom | ros | dev → ros |
|---|---|---|---|---|---|
| LengthField_Encode | 64 | 5.28M | 5.30M | 4.96M | −6% |
| LengthField_Encode | 1024 | 4.73M | 3.86M | 4.72M | flat |
| LengthField_Decode | 64 | 3.45M | 4.18M | 3.07M | −11% |
| LengthField_Decode | 1024 | 2.29M | 2.80M | 3.47M | +52% |
| Delimiter_Decode | 64 | 4.60M | 5.12M | 4.88M | +6% |
| Delimiter_Decode | 1024 | 1.88M | 2.15M | 4.17M | +122% |
| JsonFraming_MultiChunk | 64 | 1.17M | 1.45M | 1.05M | −10% |
| JsonFraming_MultiChunk | 1024 | 0.21M | 0.31M | 0.24M | +14% |

At 1 KB the ROS rewrite delivers +122% delimiter throughput and +52% length-field decode throughput vs dev. At 64 B the picture is mixed — SequenceReader<byte> setup adds fixed overhead that bigger messages amortize.

Suite B — TCP (TcpOperationsBenchmarks)

Spot checks at representative cells:

Allocations per echo round-trip (10 B, 40 clients):

| dev | rom | ros |
|---|---|---|
| 450 B | 227 B | 277 B |
  • ros vs dev: −38%

Throughput (10 B, 30 clients, peak cell):

| dev | rom | ros |
|---|---|---|
| 5.58M | 5.05M | 5.70M |
  • ros vs dev: +2%, vs rom: +13%

Throughput (10 B, 1 client, latency-bound):

| dev | rom | ros |
|---|---|---|
| 721K | 864K | 873K |
  • +21% over dev at minimum concurrency

The TCP path is essentially unchanged between rom and ros (expected — the SAEA → Stream+Pipe rewrite was already in rom, and ROS only changes the buffer type). High-concurrency cells (20–40 clients) show large StdDev / median–mean divergence on all three branches — likely loopback TCP buffer / kernel scheduler saturation rather than CPU.

Bottom line

  • The ROM-only state of this PR was a framing allocation regression that would have shipped if merged as-is.
  • With the ROS migration folded in, framing allocations go flat-or-better than dev at scale, and throughput at 1 KB jumps +50–120%.
  • TCP path retains its existing wins from rom (−38% allocations, +21% latency-bound throughput vs dev).

Caveats

  • BDN default Job, not LongRun. Numbers are clean for framing (low StdDev); 20–40 client TCP cells are noisier and would benefit from --launchCount 3 to tighten.
  • Ryzen 9 9900X is faster than the i7-6700K used for the original PR's reference benchmarks — relative deltas compress on faster hardware. The 1 KB framing wins are the most-defensible signal because per-message work dominates fixed overhead.

@Aaronontheweb Aaronontheweb changed the title WIP: Modernize Akka.IO TCP (ByteString removal, Stream+Pipe, IStreamProvider) + Migrate to .NET 10 TFM Modernize Akka.IO TCP: ByteString removal, Stream+Pipe transport, ReadOnlySequence<byte> migration, .NET 10 TFM May 10, 2026
@Aaronontheweb Aaronontheweb marked this pull request as ready for review May 10, 2026 15:00
The ByteString-flavored FramingBenchmarks.cs came in via the dev → PR
merge (akkadotnet#8202 landed first on dev). This branch retires ByteString
as part of the PR, so that benchmark fails to compile here. Swap to the
ReadOnlySequence<byte> variant that matches the post-migration types
on this branch.

Same benchmark structure and parameters as the dev version (deferred-
source pattern, MessageCount=100K, MessageSize ∈ {64, 1024}); only the
buffer-type construction and consumption sites differ.
Member Author

@Aaronontheweb Aaronontheweb left a comment

LGTM - numbers and design look good.

@Aaronontheweb Aaronontheweb enabled auto-merge (squash) May 10, 2026 15:34
@Aaronontheweb Aaronontheweb added the akka.net v1.6 Akka.NET v1.6-related issues label May 10, 2026
@Aaronontheweb Aaronontheweb added this to the 1.6.0 milestone May 10, 2026
@Aaronontheweb Aaronontheweb merged commit 7f120d1 into akkadotnet:dev May 10, 2026
11 checks passed
@Aaronontheweb Aaronontheweb deleted the feature/spec1-modernize-akka-io-tcp branch May 10, 2026 16:20
@to11mtm
Member

to11mtm commented May 10, 2026

My biggest worry here (beyond: we handle DisposeAsync properly in the actor lifecycle, right?) is that there may (or may not!) be other Akka.NET-related libraries that use ByteString. I'm not aware of any off the top of my head, but any such libraries will be broken. The same goes for user code, which will require refactoring.

Suggested mitigations:

Option 1: Keep ByteString, adapt #7491 as a bridge

Option 2: Provide a version of ByteString that basically acts as a fancy ReadOnlySequenceBuilder

Option 3: ??? (happy for ideas...)

The main concern is making sure devs can migrate to 1.6 easily, without the tech-debt cost becoming a blocking factor in their planning.
