r/wg21 · Posted by u/modern_cpp_news · 8 hr. ago
P4003R1 - Ask: IoAwaitable for Coroutine-Native Byte-Oriented I/O WG21
327 · 89% Upvoted

Document: P4003R1
Authors: Vinnie Falco, Steve Gerbino, Mungo Gill
Date: 2026-03-31
Audience: LEWG
Link: wg21.link/p4003r1

New paper from Vinnie Falco (of Boost.Beast fame) proposing the IoAwaitable protocol as the standard vocabulary for coroutine-native byte-oriented I/O. The pitch is basically auto [ec, n] = co_await socket.read_some(buf); plus only enough infrastructure to make that one line work.

The paper positions this as a "third way" - distinct from both the Networking TS executor model and sender/receiver (P2300). It proposes an io_env struct (executor, stop token, frame allocator), a two-argument await_suspend as the core protocol, and a type-erased executor_ref. Claims to complement std::execution rather than compete with it.

Notable: SG14 has formally recommended this direction in P4029R0, saying networking should not be built on P2300. There's also a companion design rationale paper (P4172R0), shipping implementations on three platforms (Capy + Corosio), and a field experience report from a derivatives exchange (P4125R1).

The paper asks for floor time in LEWG and proposes five straw polls, starting with "is this a distinct approach" and ending with "advance IoAwaitable."

44 comments · share · save · hide · report
sorted by: best
u/r_cpp_janitor 📌 stickied comment 8 hr. ago

Reminder: be civil. Paper authors sometimes read these threads. Argue the technical merits, not the people.

u/compiles_first_try 423 points 3 hr. ago 🏆

can we please just get networking in the standard before I retire

u/yet_another_cpp_dev 187 points 2 hr. ago

bold of you to assume C++ developers ever retire. we just move to management and stop writing code, which is basically the same as dying

u/segfault_enjoyer 89 points 1 hr. ago

my grandchildren will inherit my half-finished networking proposal along with my student loans

u/mass_pointer_deref 42 points 47 minutes ago

i've been hearing "networking next cycle" since I was an intern. I have grey hair now.

u/mass_pointer_deref 201 points 7 hr. ago 🏆

the absolute state of C++ async:

2005: "we need networking"
2012: "we need executors first"
2017: "we need sender/receiver first"
2021: "sender/receiver covers networking"
2023: "actually only sender/receiver for networking"
2026: "actually there's a third way"
2029: "we need to unify the three ways first"
2032: networking

u/definitely_not_a_compiler_dev 56 points 6 hr. ago

to be fair the timeline is more like:

2005: "we need networking"
2024: "we have std::execution"
2026: "we still need networking"

u/compiles_first_try 34 points 5 hr. ago

the connect/start dance in P2300 makes me feel like I'm writing Java with extra template instantiations

u/async_skeptic concurrency nerd 156 points 5 hr. ago

Ok, I actually read the paper. Twice. Here's where I landed.

The comparison table in Section 6 is the strongest argument in the paper and I'm surprised it's buried that deep. The key row is await_ready can return true - for awaitables yes, for senders "structurally impossible." That's not a minor difference. If your I/O completes synchronously (data already in the kernel buffer, say), the awaitable path can skip the suspension entirely. The sender path through connect/start cannot. For high-throughput byte streams where you're reading small chunks in a loop, that adds up.

The executor concept is basically Asio with coroutine_handle<> replacing completion handlers. That's... not a bad thing? The Networking TS executors were rejected partly because the completion handler model was clunky for coroutines. This strips it down to dispatch(continuation&) which returns a handle for symmetric transfer. Clean.

What I'm less convinced by:

1. The "structurally impossible" claim. It's true under P2300R10's current architecture, but the paper presents it as a fundamental limitation rather than an implementation choice. A sender could potentially define an await_ready-like fast path if the design cared to. The paper is right that P2300 doesn't, but "impossible" is doing a lot of heavy lifting.

2. The frame allocator via get_current_frame_allocator() / set_current_frame_allocator(). The paper doesn't say thread-local but that's the obvious implementation. What happens when you have work-stealing schedulers moving coroutines between threads? The "safe_resume" protocol is mentioned but deferred to P4172R0.

3. "The two belong in the standard together." This is the hard part. Not technically - conceptually. How does a user know when to reach for co_await socket.read_some(buf) vs execution::then(async_read(socket, buf), ...)? The paper says byte I/O is sequential and execution is DAG-shaped. Real programs aren't that clean.

That said - the code speaks for itself. Three platforms, a Compiler Explorer demo, and a derivatives exchange using it. That's more implementation experience than most papers bring to LEWG.

u/coroutine_mechanic compiler adjacent 67 points 4 hr. ago
"The key row is await_ready can return true"

This. I've measured this in production. With io_uring, you get synchronous completions on ~40% of reads when the buffer is warm. Skipping the suspend/resume round trip on 40% of operations is material.

u/not_on_the_committee 43 points 3 hr. ago

Let me push back on the "implementation experience" framing.

The Kona 2023 SG4 poll had consensus: networking should support only a sender/receiver model. 5-5-1-0-1. This paper's Section 7 acknowledges this and then says "coroutine-native I/O wasn't among the alternatives considered." That's technically true but politically naive.

You can't keep showing up with a third option every time the committee makes a directional decision. At some point the answer has to stick or the committee can't make progress on anything.

Poll 1 asks "is this a distinct approach." Obviously yes. That doesn't mean the committee is obligated to re-open the question.

u/async_skeptic 28 points 2 hr. ago

I hear you on process stability. But the Kona poll was between the Networking TS executor model and sender/receiver. If a genuinely new option with shipping code shows up, "we already decided between two different things" isn't a great reason to refuse to look at it.

The question isn't whether P4003 is distinct - that's a tautology. The question is whether the coroutine model is better enough for byte I/O specifically to justify the cost of reconsidering. The Section 6 table is that argument.

u/not_on_the_committee 15 points 1 hr. ago

"Better enough" is a judgment call that depends on how much you value committee bandwidth. There's a reason the direction group exists. But fine, I'll concede that if LEWG gives it floor time and the polls in Section 7 pass, that IS the process working. I just don't want to pretend this is costless.

u/daily_reactor_loop epoll was a mistake 38 points 3 hr. ago

From the networking side - the continuation struct with the intrusive next pointer is a nice touch. Anyone who's done epoll-level work knows that per-operation allocation kills you under load. The fact that dispatch returns a coroutine_handle<> for symmetric transfer means you can chain operations without touching the allocator at all. That's the kind of detail that says "someone actually wrote a reactor."

u/embedded_for_20_years bare metal or bust 134 points 6 hr. ago

Firmware dev here. I opened this paper because of the frame allocator section and it didn't disappoint.

Section 3.5 numbers: MSVC with a recycling allocator hits 1265ms vs 3926ms with std::allocator. That's 3.1x. Not a micro-benchmark curiosity. That's the difference between "coroutines are too expensive" and "coroutines are fine actually."

The protocol propagates the allocator to every frame in the chain automatically. No allocator_arg_t in every signature. That's the right call.

We can't use coroutines right now because the frame allocation cost is unpredictable. If this protocol standardizes a way to plug in a pool allocator at launch and have it propagate silently, that changes the calculation. Not every embedded target can afford new/delete per coroutine frame, but a pre-allocated pool? That works.

I'll be watching what happens with get_current_frame_allocator() though. The paper is cagey about storage mechanism. Thread-local is fine for our use case (single-threaded event loop) but I know the multi-threaded crowd will have opinions.

u/constexpr_everything 22 points 5 hr. ago

have you considered that coroutine frames could be constexpr allocated in C++29

u/daily_reactor_loop epoll was a mistake 45 points 4 hr. ago

The networking story is similar. Our production server does ~50k connections with coroutine-per-connection. Without a frame recycler, we were hemorrhaging memory from fragmentation. With one, steady-state memory usage dropped 60%. The 3.1x figure tracks.

The key insight in the paper is that frame allocators are a first-class concern, not an afterthought you bolt on. Most coroutine tutorials pretend operator new doesn't exist.

u/throwaway_84729 18 points 3 hr. ago

what exactly is a "recycling" frame allocator? like it reuses the same memory?

u/embedded_for_20_years bare metal or bust 25 points 1 hr. ago

Exactly that. When a coroutine completes and its frame is freed, the allocator holds onto that memory instead of returning it to the OS. Next coroutine of a similar size gets the same block. Zero fragmentation in the steady state. Works like a freelist.

Edit: the Capy implementation is in the repo if you want to look at concrete code. github.com/cppalliance/capy

u/template_instantiation_limit 89 points 4 hr. ago

Let me push on the "structurally impossible" claim in Section 6.

await_ready can return true - Yes (awaitable) / No - structurally impossible (sender)

The paper says this is because connect(sndr, rcvr) creates an operation state that must be start()-ed. But there's nothing stopping a sender-based wrapper from checking whether the operation has already completed before suspending. You could have an as_awaitable that calls connect + start, and if the receiver is synchronously signaled, sets a flag that makes await_ready return true.

That's not "structurally impossible." It's "not how P2300R10 is currently specified" and "would require implementation effort." Those are different.

Edit: Actually, re-reading more carefully - the paper's point might be that start() is a void-returning function that signals completion through the receiver, not as a return value. So you'd need the receiver to set a thread-local flag or something. Which... is exactly what the paper is doing with frame allocators. Hmm.

Edit2: OK I think the accurate statement is "structurally more expensive, not structurally impossible." The synchronous path exists but costs you the connect + start overhead plus a flag check. For byte I/O in a tight loop that matters. Conceding the point partially.

u/daily_reactor_loop epoll was a mistake 52 points 3 hr. ago

Your Edit2 is right. Here's what the hot path actually looks like:

// Awaitable path (sync complete):
await_ready() -> true
await_resume() -> result
// 0 extra allocations, 0 suspensions

// Sender path (sync complete):
connect(sndr, rcvr) -> op_state
start(op_state) -> signals rcvr synchronously
// still paid for op_state construction

When you're doing 100k reads/sec on a hot TCP connection, the op_state construction on every operation is the overhead.

u/UB_enjoyer_69 44 points 2 hr. ago

"Edit2: OK I think the accurate statement is..."

character development

u/coroutine_mechanic compiler adjacent 31 points 2 hr. ago

It's about the steady state. Individual operations are nanoseconds either way. But coroutine byte I/O in a loop - read, process, write, repeat - the awaitable path eliminates the per-operation ceremony. Over millions of operations that's your budget.

u/coroutine_mechanic compiler adjacent 78 points 6 hr. ago

For anyone who wants to see the actual protocol without reading the full paper, here's the minimum viable IoAwaitable:

struct my_read_awaitable {
    bool await_ready() noexcept { return has_data_; }

    void await_suspend(
        std::coroutine_handle<> h,
        io_env const* env) {
        h_ = h;
        // submit to OS; the executor resumes h_ on completion
        env->executor.post(h_);
    }

    std::pair<error_code, size_t>
    await_resume() noexcept { return {ec_, n_}; }

    bool has_data_ = false;
    std::coroutine_handle<> h_;
    error_code ec_;
    size_t n_ = 0;
};

The two-argument await_suspend is the whole protocol. The io_env* gives you executor, stop token, and frame allocator. That's it. Everything else in the paper is infrastructure around this one signature.

Godbolt demo from the paper: godbolt.org/z/Wzrb7McrT

u/throwaway_84729 35 points 5 hr. ago

wait this actually looks... nice? like, significantly nicer than the sender equivalent? what's the catch?

u/not_on_the_committee 12 points 4 hr. ago

A godbolt link is not a proposal. The catch is that getting from "look at this cool code" to standardized wording takes years of committee work. The paper itself acknowledges this by asking for floor time rather than proposing wording.

u/not_on_the_committee 72 points 5 hr. ago

Process analysis of the straw polls in Section 7:

Polls 1-2 are the gates. If the committee doesn't agree that this is a distinct approach AND that new research warrants consideration, polls 3-5 never happen.

Poll 1 is straightforward - yes, coroutine-native I/O is distinct from both the Networking TS executor model and S/R. Nobody's disputing that.

Poll 2 is the real fight. "New research... warrants consideration" is polite for "please re-open a question you thought was settled." The Kona SG4 poll was consensus. Some committee members will view this as relitigating. Others will say three platforms + production deployment + SG14 backing is exactly the kind of new information that justifies revisiting.

Polls 3-5 are technical. Frame allocator propagation, separate compilation, advancing the protocol. These are the easy votes IF polls 1-2 pass.

My prediction: polls 1-2 get rough consensus but with real SA votes. The old guard who drove the Kona decision aren't going to be happy. This will be a contentious session.

u/async_skeptic concurrency nerd 29 points 4 hr. ago

Accurate read. The SG14 backing (P4029R0) will carry weight though. When your low-latency constituency formally says "don't build networking on P2300," that changes the calculus.

u/senior_build_system_victim 67 points 8 hr. ago

I just want to point out that we are in year 21 of the C++ networking saga and the current state of the art is five competing proposals, three history papers explaining why we have five competing proposals, and a companion paper explaining the companion paper to the paper explaining the proposal.

The paper itself says P0592R0 listed networking as a C++20 priority. C++20 shipped five years ago. We're talking about C++29 now.

I'm going to go write a Python script that does what this paper does in 3 lines and then cry into my CMakeLists.txt.

u/move_semantics_were_enough 23 points 7 hr. ago

remember when the biggest controversy was whether auto should be allowed in function signatures? simpler times.

u/just_mass_assign 11 points 6 hr. ago

at this rate the heat death of the universe will arrive before std::socket

u/former_boost_contributor retired from template metaprogramming 45 points 4 hr. ago

As someone who used Beast heavily before switching shops - Falco knows the networking space. Beast is battle-tested in production at scale. The execution_context + service model in Section 4 is clearly descended from Asio, which is both a strength (proven design) and a weakness (some committee members have Asio fatigue).

The real question is whether the committee can evaluate this on technical merit or whether it'll get caught in the political crossfire between the P2300 camp and the "just ship the Networking TS already" camp. This paper tries to position itself as a "third way" but the executor model is close enough to Asio that opponents will call it Asio-with-coroutines.

Which... it kind of is? And that might be fine?

u/daily_reactor_loop epoll was a mistake 19 points 3 hr. ago

Asio-with-coroutines minus completion handlers minus strand complexity minus the io_context-executor coupling. The executor concept in Section 4.3 is dramatically simpler than Asio's. But yeah, the lineage is obvious and the committee has history with Asio.

u/rust_evangelist_strike_force 12 points 7 hr. ago

Meanwhile in Rust:

let mut buf = [0u8; 1024];
let n = socket.read(&mut buf).await?;

One line. No concepts. No frame allocators. No executor model. No companion paper. No twenty-year standards process.

I'm not saying "just use Rust" but I am gesturing in its general direction.

u/former_boost_contributor -2 points 6 hr. ago

We get it. Rust exists. Every single thread.

u/definitely_not_a_compiler_dev 34 points 5 hr. ago

The Rust comparison is actually interesting here though. Rust's async is essentially the same model - .await suspends the future, the executor decides when to poll it. The difference is Rust baked the executor abstraction into the ecosystem early (tokio won) instead of standardizing it. C++ is trying to standardize first and that's why we're in year 21.

[removed by moderator]

[removed]

u/turbo_llama_9000 8 points 4 hr. ago

what did they say?

u/segfault_enjoyer 15 points 3 hr. ago

something about how C++ should just adopt Rust's borrow checker. you know, the usual.