Authors: Vinnie Falco, Mungo Gill
Document: P4007R0
Date: 2026-02-22
Target: LEWG
Link: wg21.link/p4007r0
This paper identifies four structural gaps where the sender model meets coroutines: error reporting (the three-channel model can't express partial I/O results without losing data), error returns (co_yield with_error reverses six years of coroutine convention), frame allocator propagation (senders get the allocator they don't need, coroutines need the one they don't get), and symmetric transfer (sender algorithms are structs with void-returning completions - no handle to transfer to).
The key argument: each gap is the cost of a property the sender model requires for compile-time analysis. They're tradeoffs, not defects. Mandating that standard networking be built on the sender model forces coroutine I/O users to pay all four costs. The companion paper P4003R0 demonstrates a coroutine-native I/O design where these costs don't exist.
The recommendation: ship std::execution for C++26 (it earned its place), defer task to C++29 (the forwarding poll for C++29 was unanimous; for C++26 it was weak consensus with "if possible"), and explore coroutine-native I/O alongside sender-based designs. The SG4 poll at Kona 2023 offered two alternatives - the Networking TS executor model and the sender model. A coroutine-native approach was not among the choices.
Reminder: be civil. The paper authors sometimes read these threads.
The paper's strongest section is 3.3. Let me walk through it because I don't think most people are going to click through to the PDF.
P2300R10's own motivating example (Section 1.3.3) implements a composed read using sender pipelines.
async_read completes through set_value with the byte count, or through set_error with the error code. If the second read fails partway through - connection reset after transferring some bytes - set_error fires. The byte count has no channel. The then algorithm only handles set_value. The error propagates past it, and the dynamic_buffer with its partially valid contents is destroyed.
This is verifiable. Go check the stdexec repo. Check Leahy's use_sender adaptor. In the set_error path, the byte count is discarded.
For anyone who's done production socket programming: how many bytes transferred before the error matters. Connection reset after 47 bytes of a 1024-byte read is qualitatively different from connection reset after 0 bytes. Boost.Asio got this right twenty-five years ago with void(error_code, size_t). The three-channel model cannot express it without losing something.
The three-channel model exists for a reason though. Type-level routing is what makes sender algorithms composable without runtime inspection. upon_error attaches at the type level. let_value chains successes. You can't get that if everything goes through set_value. The paper acknowledges this in Appendix A.1 but then treats "I/O doesn't fit the model" as evidence the model is wrong, rather than evidence the model has a specific scope. Which, to be fair, is what Section 7 eventually says - tradeoff, not defect. But the framing for the three sections before that is pure prosecution.
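To make the data-loss claim concrete without pulling in stdexec or Asio, here is a toy sketch. The types three_channel_receiver, asio_style_handler, and simulated_read are made up for illustration: they only mimic the shape of set_value/set_error/set_stopped and of an Asio-style void(error_code, size_t) handler, and are not either library's real API. The simulated read moves 47 bytes and then hits a connection reset. Call demo_partial_read() from a main() to run the checks.

```cpp
#include <cassert>
#include <cstddef>
#include <system_error>

// Toy receiver with three completion channels, loosely modelled on
// set_value / set_error / set_stopped. Not the std::execution API.
struct three_channel_receiver {
    std::size_t bytes = 0;
    std::error_code error;
    bool stopped = false;
    void set_value(std::size_t n) { bytes = n; }
    void set_error(std::error_code ec) { error = ec; }  // no slot for a byte count
    void set_stopped() { stopped = true; }
};

// Toy Asio-style completion handler: void(error_code, size_t).
struct asio_style_handler {
    std::size_t bytes = 0;
    std::error_code error;
    void operator()(std::error_code ec, std::size_t n) {
        error = ec;
        bytes = n;
    }
};

// Hypothetical read: 47 of 1024 bytes arrive, then the peer resets.
struct simulated_read {
    std::size_t transferred = 47;
    std::error_code failure = std::make_error_code(std::errc::connection_reset);

    void complete(three_channel_receiver& r) const {
        if (failure)
            r.set_error(failure);  // the 47 has nowhere to go
        else
            r.set_value(transferred);
    }
    void complete(asio_style_handler& h) const { h(failure, transferred); }
};

void demo_partial_read() {
    simulated_read op;
    three_channel_receiver r;
    asio_style_handler h;
    op.complete(r);
    op.complete(h);
    assert(r.error == std::errc::connection_reset && r.bytes == 0);   // count lost
    assert(h.error == std::errc::connection_reset && h.bytes == 47);  // count kept
}
```

The sender-side counterpoint still stands: keeping errors on their own channel is what lets upon_error be attached by type, which the single handler signature gives up.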
A prosecution with receipts. The paper traces this question back five years - P2430R0 in 2021, LEWG telecons through that fall, the concurrent queue debate at Wroclaw, Hagenberg. Every proposed resolution carries real costs and none has shipped. At some point the evidence supports "structural constraint" over "missing insight."
The scope argument is the paper's point exactly: std::execution serves specific domains, not all of async C++. Ship it for those domains. Don't mandate that I/O pay costs that serve compile-time analysis it doesn't need.
Fair on the timeline. But "ship execution without task" means coroutine users get zero value from the entire std::execution investment in C++26. Every networking library already has its own task type. We had a chance to standardize interop and you're saying wait another cycle.
And the partial success problem has known mitigations. Dimov's mapping preserves partial results for the n>0 case. Not perfect, but not nothing.
Dimov's mapping routes (ec, 0) through set_error, which P3552's task delivers as an exception. ECONNRESET with zero bytes becomes stack unwinding. That's section 3.7, and it's the part I can't get past.
Coroutine interop requires the awaitable protocol, not type identity. The paper cites P3552 itself on this: "different coroutine task implementations can live side by side." The interop argument for rushing task doesn't hold when interop doesn't require type identity.
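A toy model of the (ec, 0) problem as this thread describes it: partial reads with n > 0 stay on the value channel, while (ec, 0) goes to set_error, and a task that rethrows the error channel turns that into an exception. route, await_read, and on_read are hypothetical helpers; the routing rule is a paraphrase of the comments above, not a transcription of Dimov's actual proposal or of P3552's code. Call demo_mapping() from a main() to run it.

```cpp
#include <cassert>
#include <cstddef>
#include <system_error>

enum class channel { value, error };

// Assumed routing rule (paraphrasing this thread, not a spec):
// an error with zero bytes goes to the error channel, everything else
// stays on the value channel with the error_code attached.
channel route(std::error_code ec, std::size_t n) {
    return (ec && n == 0) ? channel::error : channel::value;
}

struct read_result {
    std::error_code ec;  // still set on a partial read
    std::size_t n = 0;
};

// Stand-in for awaiting the read inside a task that, as described above,
// surfaces the error channel as an exception.
read_result await_read(std::error_code ec, std::size_t n) {
    if (route(ec, n) == channel::error)
        throw std::system_error(ec);
    return read_result{ec, n};
}

// A connection handler ends up handling one disconnect two different ways.
// Returns how many bytes it got to keep.
std::size_t on_read(std::error_code ec, std::size_t n) {
    try {
        read_result r = await_read(ec, n);
        return r.n;  // partial data; r.ec still says why the read stopped
    } catch (const std::system_error&) {
        return 0;    // same ECONNRESET, now via stack unwinding
    }
}

void demo_mapping() {
    auto reset = std::make_error_code(std::errc::connection_reset);
    assert(route(reset, 47) == channel::value);
    assert(route(reset, 0) == channel::error);
    assert(on_read(reset, 47) == 47);
    assert(on_read(reset, 0) == 0);
}
```

The split is the complaint in a nutshell: an if (r.ec) branch when some bytes arrived, a catch block when none did.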
can we please just get networking in the standard before I retire. this is like watching two people argue about the color of the bike shed while the building has no plumbing
bold of you to assume you'll retire before C++ gets networking
inb4 "just use Rust" but seriously. Rust shipped tokio, async-std, and actual production networking years ago. we're still debating which channel an ECONNRESET goes through
here we go
to be fair Rust's async model has its own holy wars. Pin is universally hated, the runtime fragmentation (tokio vs async-std vs smol) is a real problem, and they're still arguing about whether to add generators. the grass is not greener.
at least they shipped something that works end to end while we debate tradeoff taxonomy
I spent an hour with the PDF. Some observations from someone who uses both sender pipelines and coroutines in production:
The paper is better than most committee papers at separating observation from advocacy. The disclosure section opens by listing what the authors' own design can't do - compile-time work graphs, heterogeneous dispatch, cooperative runtime assumption. You don't see that often.
The strongest arguments, in order:
1. Section 3.3 - data loss. The claim that every published sender implementation of composed I/O loses partial results on the error path is verifiable and, as far as I can tell, correct. This isn't theoretical.
2. Section 7.5 - ABI lock-in. Once shipped, the three-channel model, connect/start, and void await_suspend become ABI. Closing any of the four gaps means changing these relationships. "Ship task is the risky choice, not the safe one" is a sentence that will make people uncomfortable because it might be right.
3. Section 5.4 - silent fallback. Forgetting allocator_arg at one call site falls back to std::allocator with no diagnostic. That's a production bug factory.
The weakest argument is Section 8, "Why Wait To Ship?" - it reads like a separate paper stapled on at the end. The cost-of-waiting analysis is thin compared to the gap analysis.
The task type table in 7.5 is devastating though. Every independent implementation: one template parameter. P3552: two. When the entire ecosystem converges on something and your design diverges, that's a signal.
best summary in this thread tbh
the disclosure section is genuinely refreshing. you rarely see paper authors lead with "here are the things our own design can't do." most committee papers are pure advocacy from start to finish
From Section 4:
In every other coroutine context in the C++ ecosystem, co_yield means "produce a value and continue." Here it means "fail and terminate."
cppcoro, folly::coro, Boost.Cobalt, Boost.Asio, libcoro, asyncpp - six years of established practice. None uses co_yield for error signaling. P1713R0 proposed lifting the return_void/return_value restriction in 2019. No consensus in Cologne. Now P3950R0 proposes the same change because task requires it.
A language rule that served every coroutine library for a decade must be reconsidered because one library needs it. The rule is not the problem.
imagine explaining to a junior dev that co_yield doesn't yield and co_return doesn't return an error. "well you see, the sender model needs three channels, and the promise can only define return_void or return_value but not both, so..."
Baker proposed this exact language change in 2019 (P1713R0). No consensus. Now Leahy proposes it in P3950R0 because task requires it and Müller confirmed the need in P3801R0.
three separate authors, spanning six years, identifying the same constraint. that's a pattern.
The committee has been here before. Two data points:
1. The P2300 deferral from C++23. Same pattern - ongoing design changes, open questions, "ship now iterate later" pressure. The committee deferred. std::execution got substantially better as a result.
2. The forwarding polls. "Forward P3552R1 to LWG for C++29" - SF:5 / F:7 / N:0 / A:0 / SA:0. Unanimous. "Forward P3552R1 to LWG with a recommendation to apply for C++26 (if possible)" - SF:5 / F:3 / N:4 / A:1 / SA:0. Weak consensus, with "if possible" qualifier.
C++29 was unanimous. C++26 was conditional and weak. Kühl himself filed sixteen open concerns in P3796R1. The frame allocator poll got five neutral votes and nothing else - the entire room abstained.
The paper's framing is right: shipping task is the risky choice, not the safe one.
C++29 is an eternity. by then we'll all be writing in whatever language the LLMs prefer to generate
we deferred P2300 from C++23 for exactly this pattern. it got better. the precedent supports deferring task too
and look how much better std::execution got. but at some point you have to ship. C++26 without task means coroutine users get nothing from the entire std::execution investment this cycle. "iterate independently" is easy to say when you're not the one waiting
I've been doing embedded C++ for two decades. Frame allocation overhead is not theoretical for us.
The benchmark table from Section 5 (4-deep coroutine call chain, 2M iterations):
MSVC with recycling allocator: 1265ms. MSVC with std::allocator: 3927ms. 3.1x speedup just by controlling the frame allocator.
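For anyone unfamiliar with the term: a recycling frame allocator keeps freed blocks on free lists and hands them to the next request of a similar size instead of going back to the global heap. The sketch below is a minimal, single-threaded illustration with a hypothetical recycling_pool; it is not the allocator behind the paper's numbers, and a production version would need per-thread pools and support for over-aligned frames. Call demo_recycling() from a main() to run it.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical single-threaded recycling pool. Freed blocks go onto an
// intrusive free list per 16-byte size class and are reused by the next
// allocation in that class; oversized blocks bypass the pool.
class recycling_pool {
public:
    recycling_pool() = default;
    recycling_pool(const recycling_pool&) = delete;
    recycling_pool& operator=(const recycling_pool&) = delete;

    ~recycling_pool() {
        for (node*& head : free_) {
            while (head) {
                node* next = head->next;
                ::operator delete(head);
                head = next;
            }
        }
    }

    void* allocate(std::size_t n) {
        std::size_t c = size_class(n);
        if (c < classes && free_[c]) {  // recycle: pop from the free list
            node* p = free_[c];
            free_[c] = p->next;
            ++reused_;
            return p;
        }
        ++fresh_;
        return ::operator new(c * granule);  // round up so any size in the class fits
    }

    void deallocate(void* p, std::size_t n) noexcept {
        std::size_t c = size_class(n);
        if (c < classes)
            free_[c] = ::new (p) node{free_[c]};  // push instead of freeing
        else
            ::operator delete(p);
    }

    std::size_t fresh() const { return fresh_; }
    std::size_t reused() const { return reused_; }

private:
    struct node { node* next; };
    static constexpr std::size_t granule = 16;  // must be >= sizeof(node)
    static constexpr std::size_t classes = 64;  // recycles blocks up to 1008 bytes
    static_assert(sizeof(node) <= granule);

    static std::size_t size_class(std::size_t n) {
        return n == 0 ? 1 : (n + granule - 1) / granule;
    }

    node* free_[classes] = {};
    std::size_t fresh_ = 0;
    std::size_t reused_ = 0;
};

void demo_recycling() {
    recycling_pool pool;
    void* a = pool.allocate(200);  // first frame: comes from the heap
    pool.deallocate(a, 200);
    void* b = pool.allocate(195);  // same 208-byte class: recycled, no heap call
    assert(b == a);
    pool.deallocate(b, 195);
    assert(pool.fresh() == 1 && pool.reused() == 1);
}
```

In a coroutine setting, allocate/deallocate would be called from promise_type::operator new/delete; how the promise finds the pool is exactly the allocator_arg versus thread-local question in the comments around this one.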
To get that speedup with P3552, you need Appendix B's five-layer ceremony: custom environment type, task alias with that env, allocator_arg at every call site, environment construction at the launch site, write_env injection. Forgetting any one step silently falls back to std::allocator. No compile error, no warning, no diagnostic.
In production code with dozens of coroutine call sites, someone will miss one. Users will profile, see heap allocation cost, and conclude coroutines are slow. Coroutines are not slow. The fast path is too hard to use.
Edit: I checked the beman::task reference implementation. allocator_arg is the only propagation mechanism. No thread-local fallback, no automatic propagation.
thread-local propagation is the obvious answer. set the allocator once at the launch site, every coroutine in the chain picks it up automatically via promise_type::operator new. no signature pollution, no silent fallback
that's literally what P4003 demonstrates. section 5. the operator new hook + resume-time restore using C++17 evaluation order guarantees. zero signature impact.
the paper's offer (section 2.4) to work with the P3552 authors on making thread-local propagation work for task is worth taking seriously
five layers of machinery just to allocate a coroutine frame. the sender/receiver experience™
the senders vs coroutines discourse has more revisions than the papers themselves at this point
skill issue
I work in finance. The paper quotes Sutter: "We already use C++26's std::execution in production for an entire asset class." I believe it. We've evaluated stdexec. For work graph construction and deterministic pipelines it's excellent.
But my connection handler that processes 50K concurrent TCP sessions is not a GPU dispatch graph. ECONNRESET is not exceptional - it's every third connection on a bad day. Dimov's mapping routes (ECONNRESET, 0) through set_error. P3552's task delivers that as an exception. Stack unwinding for every client disconnect.
Boost.Asio got this right decades ago. if (ec) is the right model for I/O errors. The paper's recommendation - ship execution for its domains, let coroutine I/O iterate independently - is the pragmatic call.
Sutter literally said they use it in production at Citadel. that's the strongest endorsement std::execution has for HFT pipelines. not for their HTTP infrastructure. the paper's point is that execution serves specific domains well, not that it failed. "narrowing the scope is not admitting failure - it is recognizing success where it exists and clearing the path where it does not." that's the right framing
Section 6 deserves more attention than it's getting in this thread. The symmetric transfer gap:
P2300's sender-awaitable uses void await_suspend. Synchronous completions call .resume() as a function call within set_value. Stack grows by one frame per synchronous completion. Sender algorithms are structs, not coroutines. No coroutine_handle exists at intermediate points. There is nothing to symmetric-transfer to.
P2583R0 surveys six production coroutine libraries. Five use symmetric transfer. Different authors, platforms, goals - same mechanism. P0913R1 was adopted specifically to enable this.
just use a trampoline scheduler lol
P0913R1 was adopted specifically to eliminate trampolines. Suggesting we add them back as a workaround for the sender bridge is not a fix, it's reverting the progress C++20 made
The table in Section 7.5 speaks for itself:
asyncpp: template<typename T> class task - 1 param.
Boost.Cobalt: template<typename T> class task - 1 param.
cppcoro: template<typename T> class task - 1 param.
libcoro: template<typename T> class task - 1 param.
P3552: template<typename T, typename Environment> class task - 2 params.
When every independent implementation converges on the same signature and your design diverges, that's not a feature. The second parameter leaks the sender model's execution environment into the coroutine's public API surface.
the env parameter is a real design smell. every time a library's type signature differs from the ecosystem consensus in the same way, for the same structural reason, it's worth asking whether the structural reason is load-bearing or accidental
[removed by moderator]
what did they say?
something about sender/receiver being Java Enterprise patterns for C++. you know, the usual
as someone who uses Boost.Asio daily for production networking, the paper's characterization of I/O error handling matches my experience exactly. ECONNRESET is not exceptional. it's Tuesday
Asio has been the standard in everything but name for 20 years. and it works with error codes just fine. the fact that the committee still hasn't figured out how to standardize that pattern says something
I'm just here waiting for someone to benchmark what all those nested sender template types do to compile times
sender template instantiation is rough. our build went from 30s to 4 minutes when we tried stdexec. probably user error but still
so let me get this straight. the authors of P4003 wrote a paper arguing that the committee should explore the approach described in P4003. I am shocked
they literally disclose that in section 1. quote:
how many papers do you see where the authors lead with the weaknesses of their own work? the disclosure section lists four limitations of the coroutine-native approach before making a single argument against the sender approach. that's unusual transparency for a committee paper
can someone ELI5 what a sender even is? I've been writing async code with Asio for a year and I've never needed one
that's kind of the paper's point
this paper is just vendor advocacy against std::execution disguised as technical analysis. the committee already voted. senders are in the standard. task is on track. move on. not every paper from the same group of authors needs a reddit thread where everyone pretends the four gaps are news - Kühl and the P2300 authors are already addressing these in follow-up papers