r/wg21
P4003R0 - Coroutines for I/O WG21
Posted by u/async_io_watcher · 16 hr. ago

Document: P4003R0
Authors: Vinnie Falco, Steve Gerbino, Mungo Gill
Date: 2026-02-22
Audience: LEWG

A proposal for a coroutine-native I/O protocol built from working code. The core consists of two concepts (IoAwaitable and IoRunnable), a type-erased executor, and a thread-local frame allocator. The paper argues that five C++20 coroutine properties (type erasure through coroutine_handle<>, promise customization, stackless frames, symmetric transfer, and compiler-managed state) together make coroutines the optimal basis for byte-oriented I/O.

Implementation experience comes from Boost.Capy and Corosio - a complete networking stack (sockets, timers, DNS, TLS, HTTP) on multiple platforms. Frame allocator recycling shows a 3.1x speedup over std::allocator on MSVC. Same authors as P4007R0 and P4014R0.

▲ 412 points (83% upvoted) · 58 comments
u/AutoModerator 1 point 16 hr. ago pinned comment

Paper: P4003R0 · Target: LEWG · Type: Proposal

u/finally_someone_tried 278 points 15 hr. ago 🏆
"This paper is a research report drawn from working code: a complete coroutine-only networking library."

This is the sentence I've been waiting to read in a WG21 paper for six years. Not "we think this could work." Not "future work includes implementation." They built the thing, benchmarked it, and now they're telling us what they found. More papers should work this way.

u/implementation_first 94 points 14 hr. ago

The Boost.Beast guy built a networking library. This is not his first rodeo. The question is whether the committee will listen to someone who has shipped production I/O code, or whether the debate will stay theoretical.

u/awaitable_protocol_nerd 198 points 15 hr. ago

The core design decision is the two-argument await_suspend:

coroutine_handle<> await_suspend(
    coroutine_handle<> cont,
    io_env const* env);

Standard awaitables take one argument. This takes two - the second carries the executor, stop token, and frame allocator. The upside: compile-time protocol checking. If you co_await an IoAwaitable from a non-conforming promise, you get a compile error, not a silent runtime bug. The downside: IoAwaitables don't compose with standard awaitables. You're in the IoAwaitable world or you're not.

The paper addresses this explicitly - the static_assert in await_transform catches mismatches at the right point. It's a deliberate walled garden with a clearly marked door.

u/generic_awaitable_fan 82 points 14 hr. ago

The walled garden is the problem. If I have an existing awaitable library that works with task<T>, lazy<T>, or any other coroutine type, it won't work with IoAwaitable tasks without modification. Every awaitable in the ecosystem needs a second overload of await_suspend to participate.

u/awaitable_protocol_nerd 57 points 13 hr. ago

The paper compares this to the alternative - templating on the promise type and extracting the environment. That approach compiles silently with non-conforming awaitables and produces runtime bugs. The two-argument form is a forced opt-in. You either implement the protocol or you don't. No middle ground.

Is that the right tradeoff? Depends on whether you value interop with the existing awaitable ecosystem or compile-time correctness guarantees. The paper chose correctness.

u/tls_considered_harmful 156 points 14 hr. ago

The frame allocator uses thread-local storage for propagation. Section 5.5 addresses the concerns - fibers, thread migration, M:N schedulers - and argues the window is narrow (set before operator new, cleared after allocation). But TLS for a critical-path resource makes me nervous.

What happens when someone runs this on a fiber-based scheduler that multiplexes coroutines across threads? The paper says "the thread-local is written immediately before the coroutine allocation and read immediately during allocation" so the window is tiny. But tiny is not zero.

u/frame_allocator_enjoyer 68 points 13 hr. ago

The alternative is putting the allocator in the coroutine signature - which makes every io_task a template on the allocator type. The paper explicitly rejects this. Ergonomics win. task<T> not task<T, Alloc>.

The 3.1x speedup over std::allocator on MSVC is real. Frame sizes repeat and lifetimes nest. A recycling allocator exploiting that pattern is faster than mimalloc (1.28x). The TLS propagation is how you get the allocator to operator new before the frame exists.
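A rough sketch of the TLS handoff described above, under stated assumptions: `recycling_allocator`, `tls_frame_allocator`, `frame_allocator_scope`, `frame_new`, and `frame_delete` are hypothetical names made up for illustration, and the size-keyed free list is just one simple recycling strategy, not the one the paper benchmarks.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <unordered_map>
#include <vector>

// Hypothetical recycling allocator: freed blocks are cached by size,
// which pays off when frame sizes repeat.
class recycling_allocator {
    std::unordered_map<std::size_t, std::vector<void*>> free_;
public:
    recycling_allocator() = default;
    recycling_allocator(recycling_allocator const&) = delete;
    recycling_allocator& operator=(recycling_allocator const&) = delete;
    ~recycling_allocator() {
        for (auto& [size, blocks] : free_)
            for (void* p : blocks) ::operator delete(p);
    }
    void* allocate(std::size_t n) {
        auto& bucket = free_[n];
        if (!bucket.empty()) {
            void* p = bucket.back();
            bucket.pop_back();
            return p;  // recycled block of the same size
        }
        return ::operator new(n);
    }
    void deallocate(void* p, std::size_t n) { free_[n].push_back(p); }
};

// The thread-local slot: written just before the coroutine is called,
// read inside the promise's operator new.
inline thread_local recycling_allocator* tls_frame_allocator = nullptr;

// RAII guard that keeps the window narrow: set, allocate, restore.
struct frame_allocator_scope {
    recycling_allocator* prev;
    explicit frame_allocator_scope(recycling_allocator& a)
        : prev(tls_frame_allocator) { tls_frame_allocator = &a; }
    ~frame_allocator_scope() { tls_frame_allocator = prev; }
};

// Header stored in front of each frame so deallocation does not need TLS.
struct alignas(std::max_align_t) frame_header {
    recycling_allocator* owner;  // nullptr means the global heap was used
    std::size_t total;
};

inline void* frame_new(std::size_t n) {
    std::size_t total = n + sizeof(frame_header);
    recycling_allocator* a = tls_frame_allocator;  // the only TLS read
    void* raw = a ? a->allocate(total) : ::operator new(total);
    auto* h = ::new (raw) frame_header{a, total};
    return h + 1;
}

inline void frame_delete(void* p) {
    assert(p != nullptr);
    auto* h = static_cast<frame_header*>(p) - 1;
    recycling_allocator* a = h->owner;
    std::size_t total = h->total;
    if (a) a->deallocate(h, total);
    else ::operator delete(h);
}

// A promise type would forward its allocation functions like this.
struct promise_allocation_mixin {
    static void* operator new(std::size_t n) { return frame_new(n); }
    static void operator delete(void* p) { frame_delete(p); }
};
```

Note that in this sketch the allocator must outlive every frame it handed out, and the header costs one max-aligned slot per frame. Those are choices of the sketch, not claims about the paper's design.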

u/tls_considered_harmful 34 points 12 hr. ago

The benchmarks are from one implementation on two compilers. I'd want to see numbers from at least one other team before baking TLS into a standard protocol. Single-vendor data for a standardization proposal always makes me cautious.

u/sender_networking_advocate 124 points 14 hr. ago

The paper positions this as "coroutines for I/O" but std::execution already handles I/O through senders. P2762 has been exploring sender-based networking. The committee already made a strategic decision that networking should be based on P2300. This paper is proposing a parallel, incompatible model.

u/asio_veteran_2012 89 points 13 hr. ago

The 2021 poll had no consensus on stopping the Networking TS (13 SF, 13 WF, 8 N, 6 WA, 10 SA). The "networking should be based on P2300" poll was weak consensus with the chairs noting many neutrals and no concrete sender-based networking paper in hand. Five years later, there's still no sender-based networking paper with implementation experience. This paper has a working stack.

u/sender_networking_advocate 41 points 12 hr. ago

P2762 exists. The beman::execution work exists. Sender-based networking is being worked on. It's harder because senders provide more guarantees - zero allocation, compile-time work graphs, cancellation through the type system. That takes longer to design correctly.

u/asio_veteran_2012 27 points 11 hr. ago

"Takes longer to design correctly" is doing a lot of heavy lifting. Boost.Asio has shipped production networking for 20 years. This paper builds on that design lineage with coroutines. At some point "takes longer" becomes "maybe the approach is wrong for this domain."

u/narrow_abstraction_enjoyer 108 points 13 hr. ago

The protocol is genuinely small. Two concepts, one struct, one type-erased executor. That's the entire public surface. Compare to the surface area of std::execution sender algorithms. The paper's line about narrow abstractions - iterators, RAII, allocators - is earned here. IoAwaitable captures one thing: how a coroutine gets its I/O context.

"We could not find anything to remove from it without losing function."

Saint-Exupéry would approve.

u/complexity_realist 42 points 12 hr. ago

Small protocol, sure. But the implementation mixin in Section 7 (io_awaitable_promise_base) is not small. And the IoRunnable concept has seven requirements. The protocol is narrow at the concept boundary but deep behind it. Which is fine - that's how good abstractions work - but let's not pretend it's trivial.

u/networking_ts_survivor 87 points 14 hr. ago

Every three years someone proposes networking for C++. Every three years the committee argues about whether it should be based on the current executor/async model du jour. This has been going on since 2005. The Networking TS. The Executors TS. Now senders. Now this. At least this one has a working library.

u/just_use_boost_beast 48 points 13 hr. ago

Boost.Asio has been shipping production networking since 2003. Boost.Beast added HTTP/WebSocket in 2017. Boost.Cobalt added coroutines. The ecosystem exists. The question is why the standard still doesn't have sockets.

u/type_erasure_skeptic 62 points 12 hr. ago

Section 6 makes a strong case for coroutine frames as natural type erasure. A task<T> erases everything behind a coroutine_handle<>. You get runtime polymorphism without virtual functions. The any_read_stream example in Appendix B shows how this enables type-erased I/O streams with one template parameter instead of the usual template spaghetti.

u/disclosure_reader 53 points 11 hr. ago

Same authors as P4007R0 and P4014R0. Three papers, one argument: I/O should use coroutines directly, not route through senders. The committee will want to see analysis from someone who doesn't have a competing library before treating this as settled direction. That said, the implementation experience is real, and the Asio design lineage is decades deep.

u/who_else_would_build_it 38 points 10 hr. ago

Who else would write this paper? The people who have built production I/O libraries are the ones qualified to say what I/O needs. Asking for "someone without a library" is asking for a theorist. We tried that approach already. It produced executors.

u/executor_ref_enjoyer 44 points 12 hr. ago

The executor_ref type erasure is clean. Two function pointers (dispatch and post), one void pointer for context. That's the entire executor interface. No concepts, no traits, no customization points. A socket just needs to know how to resume a coroutine. executor_ref tells it how.

u/appendix_a_appreciator 36 points 10 hr. ago

Appendix A is the best introduction to async I/O fundamentals I've read in a WG21 paper. Event loops, completion handlers, executors, strands, cancellation - explained from scratch with diagrams. Every new C++ developer writing networking code should start there.

u/stop_token_propagation 31 points 11 hr. ago

Cancellation through std::stop_token propagating from launch site to I/O object to platform primitive (CancelIoEx, IORING_OP_ASYNC_CANCEL). Cooperative, composable, and it builds on what's already in the standard. This is how cancellation should work for I/O.

u/coroutine_partisan 26 points 9 hr. ago

Between P4003 (the protocol), P4007 (the sender boundary analysis), P4014 (the sub-language guide), and P2583 (symmetric transfer gap) - there are now four papers making the case for coroutine I/O as a first-class model. The committee can agree or disagree but they can't say the case hasn't been made.

u/io_uring_for_everything 22 points 8 hr. ago

The execution context owning its I/O objects maps directly to how io_uring and IOCP actually work. The socket knows its ring. The operation goes through that ring. The completion comes back on that ring. This is what a platform-native design looks like when you don't force an abstraction layer between the coroutine and the kernel.

u/straw_poll_predictor 18 points 7 hr. ago

Suggested straw poll: "Coroutine-driven async I/O should have the same freedom to optimize for its domain as heterogeneous compute did." I predict SF/F/N/A/SA of roughly 4/8/5/3/2. Not unanimous. Not killed. Enough to keep the direction alive.

u/great_another_10_years 14 points 6 hr. ago

Can we please just get sockets in the standard. I don't care if they're sender-based, coroutine-based, or pigeon-based. Just ship something.

u/rfc_1149_implementer 9 points 5 hr. ago

IP over Avian Carriers is actually standardized (RFC 1149). C++ networking is not. Think about that.

u/skill_issue_42 7 points 4 hr. ago

I've been using Rust's Tokio for three years now. Every time I read a WG21 paper about C++ async I feel like I'm watching a documentary about the space shuttle program. Impressive engineering. Decade-long timelines. Enormous cost. And my TCP server already works.

u/committee_gonna_committee 4 points 2 hr. ago

committee gonna committee