Document: P4003R0
Authors: Vinnie Falco, Steve Gerbino, Mungo Gill
Date: 2026-02-22
Audience: LEWG
A proposal for a coroutine-native I/O protocol built from working code. The core is two concepts - IoAwaitable and IoRunnable - a type-erased executor, and a thread-local frame allocator. The paper argues that five C++20 coroutine properties (type erasure through coroutine_handle<>, promise customization, stackless frames, symmetric transfer, and compiler-managed state) together make coroutines the optimal basis for byte-oriented I/O.
Implementation experience comes from Boost.Capy and Corosio - a complete networking stack (sockets, timers, DNS, TLS, HTTP) on multiple platforms. Frame allocator recycling shows 3.1x speedup over std::allocator on MSVC. Same author line as P4007R0 and P4014R0.
This is the sentence I've been waiting to read in a WG21 paper for six years. Not "we think this could work." Not "future work includes implementation." They built the thing, benchmarked it, and now they're telling us what they found. More papers should work this way.
The Boost.Beast guy built a networking library. This is not his first rodeo. The question is whether the committee will listen to someone who has shipped production I/O code, or whether the debate will stay theoretical.
The core design decision is the two-argument await_suspend. Standard awaitables take one argument; this takes two - the second carries the executor, stop token, and frame allocator. The upside: compile-time protocol checking. If you co_await an IoAwaitable from a non-conforming promise, you get a compile error, not a silent runtime bug. The downside: IoAwaitables don't compose with standard awaitables. You're in the IoAwaitable world or you're not. The paper addresses this explicitly - the static_assert in await_transform catches mismatches at the right point. It's a deliberate walled garden with a clearly marked door.
The walled garden is the problem. If I have an existing awaitable library that works with task<T>, lazy<T>, or any other coroutine type, it won't work with IoAwaitable tasks without modification. Every awaitable in the ecosystem needs a second overload of await_suspend to participate.
The paper compares this to the alternative - templating on the promise type and extracting the environment. That approach compiles silently with non-conforming awaitables and produces runtime bugs. The two-argument form is a forced opt-in. You either implement the protocol or you don't. No middle ground.
Is that the right tradeoff? It depends on whether you value interop with the existing awaitable ecosystem or compile-time correctness guarantees. The paper chose correctness.
The frame allocator uses thread-local storage for propagation. Section 5.5 addresses the concerns - fibers, thread migration, M:N schedulers - and argues the window is narrow (set before operator new, cleared after allocation). But TLS for a critical-path resource makes me nervous. What happens when someone runs this on a fiber-based scheduler that multiplexes coroutines across threads? The paper says "the thread-local is written immediately before the coroutine allocation and read immediately during allocation," so the window is tiny. But tiny is not zero.
The alternative is putting the allocator in the coroutine signature - which makes every io_task a template on the allocator type. The paper explicitly rejects this. Ergonomics win: task<T>, not task<T, Alloc>.
The 3.1x speedup over std::allocator on MSVC is real. Frame sizes repeat and lifetimes nest. A recycling allocator exploiting that pattern is faster than mimalloc (1.28x). The TLS propagation is how you get the allocator to operator new before the frame exists.
The benchmarks are from one implementation on two compilers. I'd want to see numbers from at least one other team before baking TLS into a standard protocol. Single-vendor data for a standardization proposal always makes me cautious.
The paper positions this as "coroutines for I/O," but std::execution already handles I/O through senders. P2762 has been exploring sender-based networking. The committee already made a strategic decision that networking should be based on P2300. This paper is proposing a parallel, incompatible model.
The 2021 poll had no consensus on stopping the Networking TS (13 SF, 13 WF, 8 N, 6 WA, 10 SA). The "networking should be based on P2300" poll was weak consensus, with the chairs noting many neutrals and no concrete sender-based networking paper in hand. Five years later, there's still no sender-based networking paper with implementation experience. This paper has a working stack.
P2762 exists. The beman::execution work exists. Sender-based networking is being worked on. It's harder because senders provide more guarantees - zero allocation, compile-time work graphs, cancellation through the type system. That takes longer to design correctly.
"Takes longer to design correctly" is doing a lot of heavy lifting. Boost.Asio has shipped production networking for 20 years. This paper builds on that design lineage with coroutines. At some point "takes longer" becomes "maybe the approach is wrong for this domain."
The protocol is genuinely small. Two concepts, one struct, one type-erased executor. That's the entire public surface. Compare that to the surface area of std::execution's sender algorithms. The paper's line about narrow abstractions - iterators, RAII, allocators - is earned here. IoAwaitable captures one thing: how a coroutine gets its I/O context. Saint-Exupéry would approve.
Small protocol, sure. But the implementation mixin in Section 7 (io_awaitable_promise_base) is not small. And the IoRunnable concept has seven requirements. The protocol is narrow at the concept boundary but deep behind it. Which is fine - that's how good abstractions work - but let's not pretend it's trivial.
Every three years someone proposes networking for C++. Every three years the committee argues about whether it should be based on the current executor/async model du jour. This has been going on since 2005. The Networking TS. The Executors TS. Now senders. Now this. At least this one has a working library.
Boost.Asio has been shipping production networking since 2003. Boost.Beast added HTTP/WebSocket in 2017. Boost.Cobalt added coroutines. The ecosystem exists. The question is why the standard still doesn't have sockets.
Section 6 makes a strong case for coroutine frames as natural type erasure. A task<T> erases everything behind a coroutine_handle<>. You get runtime polymorphism without virtual functions. The any_read_stream example in Appendix B shows how this enables type-erased I/O streams with one template parameter instead of the usual template spaghetti.
Same authors as P4007R0 and P4014R0. Three papers, one argument: I/O should use coroutines directly, not route through senders. The committee will want to see analysis from someone who doesn't have a competing library before treating this as settled direction. That said, the implementation experience is real, and the Asio design lineage is decades deep.
Who else would write this paper? The people who have built production I/O libraries are the ones qualified to say what I/O needs. Asking for "someone without a library" is asking for a theorist. We tried that approach already. It produced executors.
The executor_ref type erasure is clean. Two function pointers (dispatch and post), one void pointer for context. That's the entire executor interface. No concepts, no traits, no customization points. A socket just needs to know how to resume a coroutine; executor_ref tells it how.
Appendix A is the best introduction to async I/O fundamentals I've read in a WG21 paper. Event loops, completion handlers, executors, strands, cancellation - explained from scratch with diagrams. Every new C++ developer writing networking code should start there.
Cancellation works through std::stop_token propagating from launch site to I/O object to platform primitive (CancelIoEx, IORING_OP_ASYNC_CANCEL). Cooperative, composable, and it builds on what's already in the standard. This is how cancellation should work for I/O.
Between P4003 (the protocol), P4007 (the sender boundary analysis), P4014 (the sub-language guide), and P2583 (the symmetric transfer gap), there are now four papers making the case for coroutine I/O as a first-class model. The committee can agree or disagree, but they can't say the case hasn't been made.
The execution context owning its I/O objects maps directly to how io_uring and IOCP actually work. The socket knows its ring. The operation goes through that ring. The completion comes back on that ring. This is what a platform-native design looks like when you don't force an abstraction layer between the coroutine and the kernel.
Suggested straw poll: "Coroutine-driven async I/O should have the same freedom to optimize for its domain as heterogeneous compute did." I predict SF/F/N/A/SA of roughly 4/8/5/3/2. Not unanimous. Not killed. Enough to keep the direction alive.
Can we please just get sockets in the standard. I don't care if they're sender-based, coroutine-based, or pigeon-based. Just ship something.
IP over Avian Carriers is actually standardized (RFC 1149). C++ networking is not. Think about that.
I've been using Rust's Tokio for three years now. Every time I read a WG21 paper about C++ async I feel like I'm watching a documentary about the space shuttle program. Impressive engineering. Decade-long timelines. Enormous cost. And my TCP server already works.
committee gonna committee