r/wg21 - Senders and Coroutines

P4007R0 - Senders and Coroutines WG21

Posted by u/sender_receiver_watcher · 8 hr. ago

Authors: Vinnie Falco, Mungo Gill
Document: P4007R0
Date: 2026-02-22
Target: LEWG
Link: wg21.link/p4007r0

This paper identifies four structural gaps where the sender model meets coroutines: error reporting (the three-channel model can't express partial I/O results without losing data), error returns (co_yield with_error reverses six years of coroutine convention), frame allocator propagation (senders get the allocator they don't need, coroutines need the one they don't get), and symmetric transfer (sender algorithms are structs with void-returning completions - no handle to transfer to).

The key argument: each gap is the cost of a property the sender model requires for compile-time analysis. They're tradeoffs, not defects. Mandating that standard networking be built on the sender model forces coroutine I/O users to pay all four costs. The companion paper P4003R0 demonstrates a coroutine-native I/O design where these costs don't exist.

The recommendation: ship std::execution for C++26 (it earned its place), defer task to C++29 (the forwarding poll for C++29 was unanimous; for C++26 it was weak consensus with "if possible"), and explore coroutine-native I/O alongside sender-based designs. The SG4 poll at Kona 2023 offered two alternatives - the Networking TS executor model and the sender model. A coroutine-native approach was not among the choices.

▲ 312 points (82% upvoted) · 48 comments


u/AutoModerator 1 point 8 hr. ago pinned comment

Reminder: be civil. The paper authors sometimes read these threads.


u/networking_at_scale 247 points 7 hr. ago 🏆

The paper's strongest section is 3.3. Let me walk through it because I don't think most people are going to click through to the PDF.

P2300R10's own motivating example (Section 1.3.3) implements a composed read using sender pipelines. async_read completes through set_value with the byte count, or through set_error with the error code. If the second read fails partway through - connection reset after transferring some bytes - set_error fires. The byte count has no channel. The then algorithm only handles set_value, so the error propagates past it and the dynamic_buffer, with its partially valid contents, is destroyed.

To our knowledge, no published sender code - P2300R10, stdexec, Leahy's adaptor - discriminates the channel based on bytes transferred.

This is verifiable. Go check the stdexec repo. Check Leahy's use_sender adaptor. In the set_error path, the byte count is discarded.

For anyone who's done production socket programming: how many bytes were transferred before the error matters. Connection reset after 47 bytes of a 1024-byte read is qualitatively different from connection reset after 0 bytes. Boost.Asio got this right two decades ago with void(error_code, size_t). The three-channel model cannot express it without losing something.
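A minimal sketch of the data-loss point, in plain C++ with no sender library involved. The names io_result, value_channel, error_channel, and to_channels are hypothetical and exist only for this illustration; the routing rule is the one described above, where any error goes to the error channel.

```cpp
#include <cstddef>
#include <system_error>
#include <variant>

// Hypothetical stand-in for an Asio-style void(error_code, size_t) result.
struct io_result {
    std::error_code ec;
    std::size_t     n;   // bytes transferred before ec was observed
};

// Hypothetical model of two of the three channels (stopped is omitted).
struct value_channel { std::size_t n; };
struct error_channel { std::error_code ec; };
using completion = std::variant<value_channel, error_channel>;

// Any error goes to the error channel, which has no slot for the byte count.
inline completion to_channels(io_result r) {
    if (r.ec) return error_channel{r.ec};
    return value_channel{r.n};
}

// True only when the completion still carries a byte count.
inline bool byte_count_survives(const completion& c) {
    return std::holds_alternative<value_channel>(c);
}
```

Feeding {connection_reset, 47} through to_channels yields an error_channel, so the 47 is unrecoverable downstream, while the original io_result keeps both pieces.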


u/sender_pipeline_user 45 points 6 hr. ago

The three-channel model exists for a reason though. Type-level routing is what makes sender algorithms composable without runtime inspection. upon_error attaches at the type level. let_value chains successes. You can't get that if everything goes through set_value.

The paper acknowledges this in Appendix A.1 but then treats "I/O doesn't fit the model" as evidence the model is wrong, rather than evidence the model has a specific scope. Which, to be fair, is what Section 7 eventually says - tradeoff, not defect. But the framing for the three sections before that is pure prosecution.


u/networking_at_scale 67 points 6 hr. ago

A prosecution with receipts. The paper traces this question back five years - P2430R0 in 2021, LEWG telecons through that fall, the concurrent queue debate at Wroclaw, Hagenberg. Every proposed resolution carries real costs and none has shipped. At some point the evidence supports "structural constraint" over "missing insight."

The scope argument is the paper's point exactly: std::execution serves specific domains, not all of async C++. Ship it for those domains. Don't mandate I/O pay costs that serve compile-time analysis it doesn't need.


u/sender_pipeline_user 23 points 5 hr. ago

Fair on the timeline. But "ship execution without task" means coroutine users get zero value from the entire std::execution investment in C++26. Every networking library already has its own task type. We had a chance to standardize interop and you're saying wait another cycle.

And the partial success problem has known mitigations. Dimov's mapping preserves partial results for the n>0 case. Not perfect, but not nothing.


u/networking_at_scale 38 points 5 hr. ago

Dimov's mapping routes (ec, 0) through set_error, which P3552's task delivers as an exception. ECONNRESET with zero bytes becomes stack unwinding. That's section 3.7, and it's the part I can't get past.
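A rough sketch of that routing rule as it is characterized in this thread: (ec, n > 0) stays on the value path, (ec, 0) goes to the error path, and a task-like consumer turns the error path into an exception. The names route, partial_value, error_only, and deliver are hypothetical, and carrying ec alongside n on the value path is an assumption of this sketch, not a statement about the actual mapping.

```cpp
#include <cstddef>
#include <system_error>
#include <variant>

// Hypothetical value-path payload: byte count plus the error that cut it short.
struct partial_value { std::error_code ec; std::size_t n; };
// Hypothetical error-path payload: the error alone.
struct error_only { std::error_code ec; };
using routed = std::variant<partial_value, error_only>;

// Routing rule as described in the thread: only (ec, 0) takes the error path.
inline routed route(std::error_code ec, std::size_t n) {
    if (ec && n == 0) return error_only{ec};
    return partial_value{ec, n};
}

// Task-like delivery (assumption): the error path surfaces as an exception.
inline std::size_t deliver(const routed& r) {
    if (const auto* e = std::get_if<error_only>(&r))
        throw std::system_error(e->ec);
    return std::get<partial_value>(r).n;
}
```

Under these assumptions, a reset after 47 bytes returns normally with 47, while a reset after 0 bytes unwinds the stack.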

Coroutine interop requires the awaitable protocol, not type identity. The paper cites P3552 itself on this: "different coroutine task implementations can live side by side." The interop argument for rushing task doesn't hold when interop doesn't require type identity.


u/still_waiting_for_networking 189 points 7 hr. ago 🏆

can we please just get networking in the standard before I retire. this is like watching two people argue about the color of the bike shed while the building has no plumbing


u/compile_time_refugee 112 points 6 hr. ago

bold of you to assume you'll retire before C++ gets networking


u/sendme_a_receiver 145 points 7 hr. ago

inb4 "just use Rust" but seriously. Rust shipped tokio, async-std, and actual production networking years ago. we're still debating which channel an ECONNRESET goes through


u/not_bitter_about_concepts 87 points 6 hr. ago

here we go


u/actually_knows_rust 52 points 6 hr. ago

to be fair Rust's async model has its own holy wars. Pin is universally hated, the runtime fragmentation (tokio vs async-std vs smol) is a real problem, and they're still arguing about whether to add generators. the grass is not greener.


u/compile_time_refugee 34 points 5 hr. ago

at least they shipped something that works end to end while we debate tradeoff taxonomy


u/actually_read_papers 134 points 4 hr. ago

I spent an hour with the PDF. Some observations from someone who uses both sender pipelines and coroutines in production:

The paper is better than most committee papers at separating observation from advocacy. The disclosure section opens by listing what the authors' own design can't do - compile-time work graphs, heterogeneous dispatch, cooperative runtime assumption. You don't see that often.

The strongest arguments, in order:

1. Section 3.3 - data loss. The claim that every published sender implementation of composed I/O loses partial results on the error path is verifiable and, as far as I can tell, correct. This isn't theoretical.

2. Section 7.5 - ABI lock-in. Once shipped, the three-channel model, connect/start, and void await_suspend become ABI. Closing any of the four gaps means changing these relationships. "Shipping task is the risky choice, not the safe one" is a sentence that will make people uncomfortable because it might be right.

3. Section 5.4 - silent fallback. Forgetting allocator_arg at one call site falls back to std::allocator with no diagnostic. That's a production bug factory.

The weakest argument is Section 8, "Why Wait To Ship?" - it reads like a separate paper stapled on at the end. The cost-of-waiting analysis is thin compared to the gap analysis.

The task type table in 7.5 is devastating though. Every independent implementation: one template parameter. P3552: two. When the entire ecosystem converges on something and your design diverges, that's a signal.


u/daily_segfault 28 points 3 hr. ago

best summary in this thread tbh


u/template_enthusiast_42 19 points 3 hr. ago

the disclosure section is genuinely refreshing. you rarely see paper authors lead with "here are the things our own design can't do." most committee papers are pure advocacy from start to finish


u/coroutine_practitioner 121 points 6 hr. ago

From Section 4:

std::execution::task<std::size_t>
do_read(tcp_socket& s, buffer& buf)
{
    auto [ec, n] = co_await s.async_read(buf);
    if (ec)
        co_yield with_error(ec); // terminates the coroutine
    co_return n;
}

In every other coroutine context in the C++ ecosystem, co_yield means "produce a value and continue." Here it means "fail and terminate."

cppcoro, folly::coro, Boost.Cobalt, Boost.Asio, libcoro, asyncpp - six years of established practice. None uses co_yield for error signaling. P1713R0 proposed lifting the return_void/return_value restriction in 2019. No consensus in Cologne. Now P3950R0 proposes the same change because task requires it.

A language rule that has served every coroutine library for six years must be reconsidered because one library needs it. The rule is not the problem.


u/operator_new_enjoyer 76 points 5 hr. ago

imagine explaining to a junior dev that co_yield doesn't yield and co_return doesn't return an error. "well you see, the sender model needs three channels, and the promise can only define return_void or return_value but not both, so..."


u/async_skeptic 41 points 5 hr. ago

Baker proposed this exact language change in 2019 (P1713R0). No consensus. Now Leahy proposes it in P3950R0 because task requires it and Müller confirmed the need in P3801R0.

three separate authors, spanning six years, identifying the same constraint. that's a pattern.


u/process_archaeology 98 points 5 hr. ago

The committee has been here before. Two data points:

1. The P2300 deferral from C++23. Same pattern - ongoing design changes, open questions, "ship now iterate later" pressure. The committee deferred. std::execution got substantially better as a result.

2. The forwarding polls. "Forward P3552R1 to LWG for C++29" - SF:5 / F:7 / N:0 / A:0 / SA:0. Unanimous. "Forward P3552R1 to LWG with a recommendation to apply for C++26 (if possible)" - SF:5 / F:3 / N:4 / A:1 / SA:0. Weak consensus, with "if possible" qualifier.

C++29 was unanimous. C++26 was conditional and weak. Kühl himself filed sixteen open concerns in P3796R1. The frame allocator poll got five neutral votes and nothing else - the entire room abstained.

The paper's framing is right: shipping task is the risky choice, not the safe one.


u/UB_is_a_feature 43 points 4 hr. ago

C++29 is an eternity. by then we'll all be writing in whatever language the LLMs prefer to generate


u/async_skeptic 31 points 4 hr. ago

we deferred P2300 from C++23 for exactly this pattern. it got better. the precedent supports deferring task too


u/sender_pipeline_user 18 points 4 hr. ago

and look how much better std::execution got. but at some point you have to ship. C++26 without task means coroutine users get nothing from the entire std::execution investment this cycle. "iterate independently" is easy to say when you're not the one waiting


u/embedded_for_20_years 87 points 5 hr. ago

I've been doing embedded C++ for two decades. Frame allocation overhead is not theoretical for us.

The benchmark table from Section 5 (4-deep coroutine call chain, 2M iterations):

MSVC with recycling allocator: 1265ms. MSVC with std::allocator: 3927ms. 3.1x speedup just by controlling the frame allocator.

To get that speedup with P3552, you need Appendix B's five-layer ceremony: custom environment type, task alias with that env, allocator_arg at every call site, environment construction at the launch site, write_env injection. Forgetting any one step silently falls back to std::allocator. No compile error, no warning, no diagnostic.

In production code with dozens of coroutine call sites, someone will miss one. Users will profile, see heap allocation cost, and conclude coroutines are slow. Coroutines are not slow. The fast path is too hard to use.

Edit: I checked the beman::task reference implementation. allocator_arg is the only propagation mechanism. No thread-local fallback, no automatic propagation.


u/coroutine_spaghetti 29 points 4 hr. ago

thread-local propagation is the obvious answer. set the allocator once at the launch site, every coroutine in the chain picks it up automatically via promise_type::operator new. no signature pollution, no silent fallback


u/networking_at_scale 22 points 4 hr. ago

that's literally what P4003 demonstrates. section 5. the operator new hook + resume-time restore using C++17 evaluation order guarantees. zero signature impact.

the paper's offer (section 2.4) to work with the P3552 authors on making thread-local propagation work for task is worth taking seriously


u/xX_constexpr_Xx 56 points 4 hr. ago

five layers of machinery just to allocate a coroutine frame. the sender/receiver experience™


u/daily_segfault 78 points 7 hr. ago

the senders vs coroutines discourse has more revisions than the papers themselves at this point


u/turbo_llama_9000 35 points 6 hr. ago

skill issue


u/hft_latency_nerd 72 points 5 hr. ago

I work in finance. The paper quotes Sutter: "We already use C++26's std::execution in production for an entire asset class." I believe it. We've evaluated stdexec. For work graph construction and deterministic pipelines it's excellent.

But my connection handler that processes 50K concurrent TCP sessions is not a GPU dispatch graph. ECONNRESET is not exceptional - it's every third connection on a bad day. Dimov's mapping routes (ECONNRESET, 0) through set_error. P3552's task delivers that as an exception. Stack unwinding for every client disconnect.

A server handling thousands of connections sees ECONNRESET constantly - clients disconnect, networks flap, load balancers probe, mobile users lose signal.

Boost.Asio got this right decades ago. if (ec) is the right model for I/O errors. The paper's recommendation - ship execution for its domains, let coroutine I/O iterate independently - is the pragmatic call.


u/sender_pipeline_user 14 points 4 hr. ago

Sutter literally said they use it in production at Citadel. that's the strongest endorsement std::execution has


u/hft_latency_nerd 27 points 4 hr. ago

for HFT pipelines. not for their HTTP infrastructure. the paper's point is that execution serves specific domains well, not that it failed. "narrowing the scope is not admitting failure - it is recognizing success where it exists and clearing the path where it does not." that's the right framing


u/concurrent_by_trade 56 points 5 hr. ago

Section 6 deserves more attention than it's getting in this thread. The symmetric transfer gap:

// Coroutine-native: constant stack
coroutine_handle<> await_suspend(
    coroutine_handle<> h) noexcept
{
    auto next = start(state);
    return next ? next : noop_coroutine();
}

// Sender bridge: stack grows
void await_suspend(
    coroutine_handle<> h) noexcept
{
    start(state); // .resume() within set_value
}

P2300's sender-awaitable uses void await_suspend. Synchronous completions call .resume() as a function call within set_value. Stack grows by one frame per synchronous completion. Sender algorithms are structs, not coroutines. No coroutine_handle exists at intermediate points. There is nothing to symmetric-transfer to.

P2583R0 surveys six production coroutine libraries. Five use symmetric transfer. Different authors, platforms, goals - same mechanism. P0913R1 was adopted specifically to enable this.


u/operator_new_enjoyer 18 points 4 hr. ago

just use a trampoline scheduler lol


u/concurrent_by_trade 32 points 4 hr. ago

P0913R1 was adopted specifically to eliminate trampolines. Suggesting we add them back as a workaround for the sender bridge is not a fix, it's reverting the progress C++20 made


u/library_design_nerd 45 points 3 hr. ago

The table in Section 7.5 speaks for itself:

asyncpp: template<typename T> class task - 1 param.
Boost.Cobalt: template<typename T> class task - 1 param.
cppcoro: template<typename T> class task - 1 param.
libcoro: template<typename T> class task - 1 param.
P3552: template<typename T, typename Environment> class task - 2 params.

When every independent implementation converges on the same signature and your design diverges, that's not a feature. The second parameter leaks the sender model's execution environment into the coroutine's public API surface.


u/template_enthusiast_42 21 points 2 hr. ago

the env parameter is a real design smell. every time a library's type signature differs from the ecosystem consensus in the same way, for the same structural reason, it's worth asking whether the structural reason is load-bearing or accidental


[deleted] -14 points 6 hr. ago

[removed by moderator]


u/not_bitter_about_concepts 15 points 6 hr. ago

what did they say?


u/turbo_llama_9000 24 points 5 hr. ago

something about sender/receiver being Java Enterprise patterns for C++. you know, the usual


u/async_void_warrior 38 points 6 hr. ago

as someone who uses Boost.Asio daily for production networking, the paper's characterization of I/O error handling matches my experience exactly. ECONNRESET is not exceptional. it's Tuesday


u/compile_time_refugee 25 points 5 hr. ago

Asio has been the standard in everything but name for 20 years. and it works with error codes just fine. the fact that the committee still hasn't figured out how to standardize that pattern says something


u/build_system_victim 34 points 5 hr. ago

I'm just here waiting for someone to benchmark what all those nested sender template types do to compile times


u/xX_constexpr_Xx 21 points 4 hr. ago

sender template instantiation is rough. our build went from 30s to 4 minutes when we tried stdexec. probably user error but still


u/definitely_not_a_committee_member 22 points 5 hr. ago

so let me get this straight. the authors of P4003 wrote a paper arguing that the committee should explore the approach described in P4003. I am shocked


u/actually_read_papers 47 points 4 hr. ago

they literally disclose that in section 1. quote:

A coroutine-only design cannot express compile-time work graphs, does not support heterogeneous dispatch, and assumes a cooperative runtime. Those are real costs.

how many papers do you see where the authors lead with the weaknesses of their own work? the disclosure section lists the limitations of the coroutine-native approach before making a single argument against the sender approach. that's unusual transparency for a committee paper


u/junior_dev_2024 15 points 47 minutes ago

can someone ELI5 what a sender even is? I've been writing async code with Asio for a year and I've never needed one


u/async_skeptic 31 points 38 minutes ago

that's kind of the paper's point


u/throwaway_cpp_47291 -8 points 4 hr. ago

this paper is just vendor advocacy against std::execution disguised as technical analysis. the committee already voted. senders are in the standard. task is on track. move on. not every paper from the same group of authors needs a reddit thread where everyone pretends the four gaps are news - Kühl and the P2300 authors are already addressing these in follow-up papers
