Document: P4003R1
Authors: Vinnie Falco, Steve Gerbino, Mungo Gill
Date: 2026-03-31
Audience: LEWG
Link: wg21.link/p4003r1
New paper from Vinnie Falco (of Boost.Beast fame) proposing the IoAwaitable protocol as the standard vocabulary for coroutine-native byte-oriented I/O. The pitch is basically: `auto [ec, n] = co_await socket.read_some(buf);` - and then only enough infrastructure to make that line work.
The paper positions this as a "third way" - distinct from both the Networking TS executor model and sender/receiver (P2300). It proposes an io_env struct (executor, stop token, frame allocator), a two-argument await_suspend as the core protocol, and a type-erased executor_ref. Claims to complement std::execution rather than compete with it.
Notable: SG14 has formally recommended this direction in P4029R0, saying networking should not be built on P2300. There's also a companion design rationale paper (P4172R0), shipping implementations on three platforms (Capy + Corosio), and a field experience report from a derivatives exchange (P4125R1).
The paper asks for floor time in LEWG and proposes five straw polls, starting with "is this a distinct approach" and ending with "advance IoAwaitable."
Reminder: be civil. Paper authors sometimes read these threads. Argue the technical merits, not the people.
can we please just get networking in the standard before I retire
bold of you to assume C++ developers ever retire. we just move to management and stop writing code, which is basically the same as dying
my grandchildren will inherit my half-finished networking proposal along with my student loans
i've been hearing "networking next cycle" since I was an intern. I have grey hair now.
the absolute state of C++ async:
2005: "we need networking"
2012: "we need executors first"
2017: "we need sender/receiver first"
2021: "sender/receiver covers networking"
2023: "actually only sender/receiver for networking"
2026: "actually there's a third way"
2029: "we need to unify the three ways first"
2032: networking
to be fair the timeline is more like:
2005: "we need networking"
2024: "we have std::execution"
2026: "we still need networking"
the `connect`/`start` dance in P2300 makes me feel like I'm writing Java with extra template instantiations

Ok, I actually read the paper. Twice. Here's where I landed.
The comparison table in Section 6 is the strongest argument in the paper and I'm surprised it's buried that deep. The key row is "`await_ready` can return true" - for awaitables yes, for senders "structurally impossible." That's not a minor difference. If your I/O completes synchronously (data already in the kernel buffer, say), the awaitable path can skip the suspension entirely. The sender path through `connect`/`start` cannot. For high-throughput byte streams where you're reading small chunks in a loop, that adds up.

The executor concept is basically Asio with `coroutine_handle<>` replacing completion handlers. That's... not a bad thing? The Networking TS executors were rejected partly because the completion handler model was clunky for coroutines. This strips it down to `dispatch(continuation&)`, which returns a handle for symmetric transfer. Clean.

What I'm less convinced by:

1. The "structurally impossible" claim. It's true under P2300R10's current architecture, but the paper presents it as a fundamental limitation rather than an implementation choice. A sender could potentially define an `await_ready`-like fast path if the design cared to. The paper is right that P2300 doesn't, but "impossible" is doing a lot of heavy lifting.

2. The frame allocator via `get_current_frame_allocator()`/`set_current_frame_allocator()`. The paper doesn't say thread-local, but that's the obvious implementation. What happens when you have work-stealing schedulers moving coroutines between threads? The "safe_resume" protocol is mentioned but deferred to P4172R0.

3. "The two belong in the standard together." This is the hard part. Not technically - conceptually. How does a user know when to reach for `co_await socket.read_some(buf)` vs `execution::then(async_read(socket, buf), ...)`? The paper says byte I/O is sequential and execution is DAG-shaped. Real programs aren't that clean.

That said - the code speaks for itself. Three platforms, a Compiler Explorer demo, and a derivatives exchange using it. That's more implementation experience than most papers bring to LEWG.
This. I've measured this in production. With io_uring, you get synchronous completions on ~40% of reads when the buffer is warm. Skipping the suspend/resume round trip on 40% of operations is material.
Let me push back on the "implementation experience" framing.
The Kona 2023 SG4 poll reached consensus: networking should support only a sender/receiver model. 5-5-1-0-1. This paper's Section 7 acknowledges this and then says "coroutine-native I/O wasn't among the alternatives considered." That's technically true but politically naive.
You can't keep showing up with a third option every time the committee makes a directional decision. At some point the answer has to stick or the committee can't make progress on anything.
Poll 1 asks "is this a distinct approach." Obviously yes. That doesn't mean the committee is obligated to re-open the question.
I hear you on process stability. But the Kona poll was between the Networking TS executor model and sender/receiver. If a genuinely new option with shipping code shows up, "we already decided between two different things" isn't a great reason to refuse to look at it.
The question isn't whether P4003 is distinct - that's a tautology. The question is whether the coroutine model is better enough for byte I/O specifically to justify the cost of reconsidering. The Section 6 table is that argument.
"Better enough" is a judgment call that depends on how much you value committee bandwidth. There's a reason the direction group exists. But fine, I'll concede that if LEWG gives it floor time and the polls in Section 7 pass, that IS the process working. I just don't want to pretend this is costless.
From the networking side - the `continuation` struct with the intrusive `next` pointer is a nice touch. Anyone who's done epoll-level work knows that per-operation allocation kills you under load. The fact that `dispatch` returns a `coroutine_handle<>` for symmetric transfer means you can chain operations without touching the allocator at all. That's the kind of detail that says "someone actually wrote a reactor."

Firmware dev here. I opened this paper because of the frame allocator section and it didn't disappoint.
Section 3.5 numbers: MSVC with a recycling allocator hits 1265ms vs 3926ms with `std::allocator`. That's 3.1x. Not a micro-benchmark curiosity. That's the difference between "coroutines are too expensive" and "coroutines are fine actually."

The protocol propagates the allocator to every frame in the chain automatically. No `allocator_arg_t` in every signature. That's the right call.

We can't use coroutines right now because the frame allocation cost is unpredictable. If this protocol standardizes a way to plug in a pool allocator at launch and have it propagate silently, that changes the calculation. Not every embedded target can afford `new`/`delete` per coroutine frame, but a pre-allocated pool? That works.

I'll be watching what happens with `get_current_frame_allocator()` though. The paper is cagey about the storage mechanism. Thread-local is fine for our use case (single-threaded event loop) but I know the multi-threaded crowd will have opinions.

have you considered that coroutine frames could be constexpr allocated in C++29
The networking story is similar. Our production server does ~50k connections with coroutine-per-connection. Without a frame recycler, we were hemorrhaging memory from fragmentation. With one, steady state dropped 60%. The 3.1x figure tracks.
The key insight in the paper is that frame allocators are a first-class concern, not an afterthought you bolt on. Most coroutine tutorials pretend `operator new` doesn't exist.

what exactly is a "recycling" frame allocator? like it reuses the same memory?
Exactly that. When a coroutine completes and its frame is freed, the allocator holds onto that memory instead of returning it to the OS. Next coroutine of a similar size gets the same block. Zero fragmentation in the steady state. Works like a freelist.
Edit: the Capy implementation is in the repo if you want to look at concrete code. github.com/cppalliance/capy
Let me push on the "structurally impossible" claim in Section 6.
The paper says this is because `connect(sndr, rcvr)` creates an operation state that must be `start()`-ed. But there's nothing stopping a sender-based wrapper from checking whether the operation has already completed before suspending. You could have an `as_awaitable` that calls `connect` + `start`, and if the receiver is synchronously signaled, sets a flag that makes `await_ready` return true.

That's not "structurally impossible." It's "not how P2300R10 is currently specified" and "would require implementation effort." Those are different.

Edit: Actually, re-reading more carefully - the paper's point might be that `start()` is a void-returning function that signals completion through the receiver, not as a return value. So you'd need the receiver to set a thread-local flag or something. Which... is exactly what the paper is doing with frame allocators. Hmm.

Edit2: OK, I think the accurate statement is "structurally more expensive, not structurally impossible." The synchronous path exists but costs you the `connect` + `start` overhead plus a flag check. For byte I/O in a tight loop that matters. Conceding the point partially.

Your Edit2 is right. Here's what the hot path actually looks like:
When you're doing 100k reads/sec on a hot TCP connection, the op_state construction on every operation is the overhead.
"Edit2: OK I think the accurate statement is..."
character development
It's about the steady state. Individual operations are nanoseconds either way. But coroutine byte I/O in a loop - read, process, write, repeat - the awaitable path eliminates the per-operation ceremony. Over millions of operations that's your budget.
For anyone who wants to see the actual protocol without reading the full paper, here's the minimum viable IoAwaitable:
The two-argument `await_suspend` is the whole protocol. The `io_env*` gives you executor, stop token, and frame allocator. That's it. Everything else in the paper is infrastructure around this one signature.

Godbolt demo from the paper: godbolt.org/z/Wzrb7McrT
wait this actually looks... nice? like, significantly nicer than the sender equivalent? what's the catch?
A godbolt link is not a proposal. The catch is that getting from "look at this cool code" to standardized wording takes years of committee work. The paper itself acknowledges this by asking for floor time rather than proposing wording.
Process analysis of the straw polls in Section 7:
Polls 1-2 are the gates. If the committee doesn't agree that this is a distinct approach AND that new research warrants consideration, polls 3-5 never happen.
Poll 1 is straightforward - yes, coroutine-native I/O is distinct from both the Networking TS executor model and S/R. Nobody's disputing that.
Poll 2 is the real fight. "New research... warrants consideration" is polite for "please re-open a question you thought was settled." The Kona SG4 poll was consensus. Some committee members will view this as relitigating. Others will say three platforms + production deployment + SG14 backing is exactly the kind of new information that justifies revisiting.
Polls 3-5 are technical. Frame allocator propagation, separate compilation, advancing the protocol. These are the easy votes IF polls 1-2 pass.
My prediction: polls 1-2 get rough consensus but with real SA votes. The old guard who drove the Kona decision aren't going to be happy. This will be a contentious session.
Accurate read. The SG14 backing (P4029R0) will carry weight though. When your low-latency constituency formally says "don't build networking on P2300," that changes the calculus.
I just want to point out that we are in year 21 of the C++ networking saga and the current state of the art is five competing proposals, three history papers explaining why we have five competing proposals, and a companion paper explaining the companion paper to the paper explaining the proposal.
The paper itself says P0592R0 listed networking as a C++20 priority. C++20 shipped five years ago. We're talking about C++29 now.
I'm going to go write a Python script that does what this paper does in 3 lines and then cry into my CMakeLists.txt.
remember when the biggest controversy was whether `auto` should be allowed in function signatures? simpler times.

at this rate the heat death of the universe will arrive before `std::socket`

As someone who used Beast heavily before switching shops - Falco knows the networking space. Beast is battle-tested in production at scale. The `execution_context` + service model in Section 4 is clearly descended from Asio, which is both a strength (proven design) and a weakness (some committee members have Asio fatigue).

The real question is whether the committee can evaluate this on technical merit or whether it'll get caught in the political crossfire between the P2300 camp and the "just ship the Networking TS already" camp. This paper tries to position itself as a "third way," but the executor model is close enough to Asio that opponents will call it Asio-with-coroutines.
Which... it kind of is? And that might be fine?
Asio-with-coroutines minus completion handlers minus strand complexity minus the io_context-executor coupling. The executor concept in Section 4.3 is dramatically simpler than Asio's. But yeah, the lineage is obvious and the committee has history with Asio.
Meanwhile in Rust: `let n = socket.read(&mut buf).await?;`

One line. No concepts. No frame allocators. No executor model. No companion paper. No twenty-year standards process.
I'm not saying "just use Rust" but I am gesturing in its general direction.
We get it. Rust exists. Every single thread.
The Rust comparison is actually interesting here though. Rust's async is essentially the same model - `.await` suspends the future, the executor decides when to poll it. The difference is Rust baked the executor abstraction into the ecosystem early (tokio won) instead of standardizing it. C++ is trying to standardize first, and that's why we're in year 21.

[removed]
what did they say?
something about how C++ should just adopt Rust's borrow checker. you know, the usual.
[deleted]