r/wg21
P4014R0 - The Sender Sub-Language WG21
Posted by u/execution_observer_26 · 14 hr. ago

Document: P4014R0
Authors: Vinnie Falco, Mungo Gill
Date: 2026-02-22
Audience: LEWG

The paper frames std::execution (P2300) as a continuation-passing-style sub-language within C++ - with its own control flow primitives, variable binding model, error handling, and iteration strategy. It maps every major C++ control flow construct to its sender equivalent, traces the theoretical roots to monads and CPS, and walks through progressively complex examples from a simple pipeline to a full retry algorithm.

The argument: the complexity serves GPU dispatch, HFT, and embedded well, but networking and I/O should have the same freedom to choose their own model. Ends with two suggested straw polls for LEWG. Same authors as P4007R0 and P4003R0.

▲ 487 points (81% upvoted) · 62 comments
u/AutoModerator 1 point 14 hr. ago pinned comment

Paper: P4014R0 · Target: LEWG · Type: Directional

I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.

u/coroutine_partisan 312 points 13 hr. ago 🏆

The retry algorithm comparison is the whole paper in one slide.

Sender version: 100+ lines, deferred construction helper, receiver adaptor with CRTP, operation state lifecycle with std::optional, completion signature transformation, template metaprogramming across four struct definitions.

Coroutine version:

template <class T, class F>
auto retry(F make_sender) -> task<T> {
    for (;;) {
        try {
            co_return co_await make_sender();
        } catch (...) {}
    }
}

Seven lines. Same behavior. I understand why the sender version exists - it handles the full completion signature algebra. But for the other 90% of us, this is the paper's strongest argument.

u/sender_apologist 89 points 12 hr. ago

The coroutine version allocates a frame on every retry. The sender version doesn't. The sender version propagates completion signatures through the type system so the compiler sees the full pipeline at once. The coroutine version type-erases through task<T>.

These aren't the same algorithm with different syntax. They have different performance characteristics and different composition properties. The paper shows the syntax difference without showing the semantic difference.

u/coroutine_partisan 54 points 11 hr. ago

The question isn't whether the sender version does more. It's whether anyone retrying a network request needs it to. I've written retry loops for HTTP clients, database connections, message queue consumers. Not once did I need zero-allocation retry with compile-time signature transformation.

u/hft_latency_nerd 28 points 10 hr. ago

I have. But I'm not going to pretend my use case is everyone's use case. That's literally what the paper is arguing.

u/async_historian 187 points 13 hr. ago

Section 4 is going to make some people uncomfortable. The paper traces Eric Niebler's published writing from 2017 to 2024 and documents the shift from "90% of all async code should be coroutines" to "senders are the foundation, coroutines are one way to consume them." It's all public blog posts with direct quotes and links. No private communications, no inference.

Whether you read this as "the emphasis changed as the target problems changed" or "the vision drifted" depends on which side of the sender/coroutine divide you sit on. But the timeline itself is just facts.

u/api_design_matters 67 points 12 hr. ago

People's views evolve as they work on harder problems. Calling it "emphasis change" rather than "intellectual growth" is a framing choice. Niebler started working on GPU dispatch and discovered that senders could do things coroutines structurally cannot. His views changed because the problem space expanded. That's normal.

u/async_historian 41 points 11 hr. ago

Agreed. The paper says exactly that - "the emphasis naturally shifted as the target problems changed." It's not accusing anyone of bad faith. It's documenting a design trajectory so the committee can see where the current design came from and decide if the balance is right for all the domains C++ serves.

u/just_ship_it_already 34 points 10 hr. ago

The timeline is just links to his own blog posts. If anything it's a service - now I don't have to find them myself.

u/library_design_snob 142 points 12 hr. ago

Whether you agree with the paper's conclusions or not, calling std::execution a "sub-language" is not loaded - it's accurate. The equivalence table in Section 2 maps sequential statements, local variables, return, try/catch, for/while, if/else, throw, and break/continue to sender primitives. When a library provides its own versions of every fundamental control flow construct, it has become a programming model, not an API.

Template metaprogramming is a sub-language. constexpr evaluation is a sub-language. The preprocessor is a sub-language. Calling senders a sub-language puts them in that category, which is descriptive, not pejorative. The paper even says it's "an achievement."

u/ranges_v3_survivor 78 points 11 hr. ago

You could make the same argument about ranges. views::filter | views::transform | views::take replaces for loops and conditionals with a pipeline model. Nobody calls it a sub-language.

u/library_design_snob 63 points 10 hr. ago

Ranges replace iteration patterns. Senders replace control flow, variable binding, error handling, iteration, AND resource management. Ranges compose over values. Senders compose over computations. The paper's Section 3 table maps senders to CPS, monadic bind, and algebraic effect theory. Ranges don't reach that level of abstraction.

u/stop_when_enjoyer 108 points 13 hr. ago

Three of four algorithms in the P2300R7 motivating example aren't in C++26. stop_when: removed. timeout: never proposed. first_successful: never proposed.

The motivating example motivates a library that doesn't exist.

u/sender_apologist 56 points 12 hr. ago

They exist in stdexec. Just not in the standard. Which is kind of the point of iterating - ship the core, add algorithms in C++29.

u/networking_in_2035 22 points 11 hr. ago

"Ship the core, add the algorithms later, and the coroutine integration is tracked by 29 open issues in a companion paper." What a pitch.

u/gpu_dispatch_daily 94 points 12 hr. ago

I work on GPU dispatch pipelines. Senders give me zero-allocation composition with compile-time work graphs that the compiler can optimize end-to-end. Nothing else in C++ does this. The "complexity" in Section 5 is the complexity of expressing what I need expressed. If you don't need it, don't write sender algorithms - compose the existing ones.

u/io_context_veteran 72 points 11 hr. ago

Nobody's asking to take that away. The paper explicitly says senders serve your domain well. The question is whether networking developers should also have to route every co_await through the sender composition layer when they don't need zero-allocation pipelines or compile-time work graphs. P2583R0 shows that the sender bridge structurally prevents symmetric transfer. That's a real cost for I/O.

u/gpu_dispatch_daily 38 points 10 hr. ago

You already have coroutines. They're in the language. You can use them right now without senders.

u/io_context_veteran 24 points 9 hr. ago

Without a standard task type that works well for I/O, without standard I/O objects, without a standard event loop. The coroutine language feature shipped in C++20. The library support for using it on I/O is still Boost-or-roll-your-own six years later. Meanwhile std::execution::task is in the WP with a companion paper tracking 29 open issues.

u/gpu_dispatch_daily 12 points 8 hr. ago

Fair. P3796R1 is... a document.

u/code_review_bot_9000 86 points 11 hr. ago

Section 5.6, the backtracker example. I've read it three times. I still can't trace where the fail continuation goes through three levels of std::move and two nested let_value lambdas. This is a code review rejection in any codebase I've worked in.

u/actually_writes_senders 42 points 10 hr. ago

In practice you compose the algorithms, you don't write them. I've used stdexec for over a year and I've never implemented a custom sender algorithm. just | then | let_value covers 95% of what I need. The paper is showing the internals to make a point, but users don't normally touch that layer.

u/code_review_bot_9000 31 points 9 hr. ago

Right, and the fold example in Section 5.5 - the one that users would write - requires any_sender_of<> for type erasure because the recursive type would be infinite. A type-erased sender. For folding a range. The coroutine version is four lines with a for loop.

u/lewg_process_watcher 78 points 11 hr. ago

The suggested straw polls are well-constructed. "std::execution serves coroutine-driven async I/O less ideally than heterogeneous compute." I don't think anyone on LEWG would disagree with that. The question is whether they'll actually take it and what follows from it.

u/committee_tea_leaves 53 points 10 hr. ago

Taking the poll would require acknowledging that the coroutine integration was under-designed relative to the sender algorithms. That's politically expensive when you've just shipped std::execution into the WP.

u/lewg_process_watcher 27 points 9 hr. ago

P3796R1 already acknowledged it. Twenty-nine issues. The paper's authors aren't making a controversial claim - they're building on what Kühl and Müller already documented.

u/abstraction_skeptic 71 points 10 hr. ago

The nvexec precedent argument in Section 6.2 is clever but it proves more than intended. nvexec reimplements every standard sender algorithm with CUDA extensions - bulk, then, when_all, continues_on, let_value, split, reduce - all in .cuh files. And it requires a non-standard compiler.

If the domain that senders were specifically designed for needs a complete reimplementation with non-standard extensions, maybe "universal asynchrony abstraction" was always the wrong goal. Maybe the right goal is domain-specific models that share boundaries. Which is... what the paper argues.

u/sender_apologist 45 points 9 hr. ago

nvexec reimplements the algorithms because GPU kernels can't run standard C++. That's a hardware limitation, not a design failure. The sender abstraction is what lets the same pipeline description target either CPU or GPU with different algorithm implementations. That's the universality working as designed.

u/disclosure_appreciator 64 points 12 hr. ago

Points for putting the disclosure section first. The authors are upfront about having written P4003R0 (Coroutines for I/O) and P4007R0 (Senders and Coroutines). They have a position. They're telling you what it is before the argument starts.

u/three_papers_one_take 37 points 11 hr. ago

It's a trilogy at this point. P4007 documents integration costs. P2583 documents symmetric transfer. P4014 documents what the sender model IS. Three different lenses on the same structural question. Agree or disagree, that's a thorough job.

u/plt_crossover_fan 58 points 13 hr. ago

The theoretical foundations table in Section 3 - Moggi (1991), Lambda Papers (1975), Danvy and Filinski (1990) - is the nerdiest thing I've read this week and I loved every line. They're really going back to first principles to explain why senders look the way they do. This should be required reading for anyone implementing a sender algorithm.

u/ci_budget_exceeded 52 points 10 hr. ago

The trade-off table in Section 6.1 lists "long compile times" as a cost. I just want compile times to go down. Every new template-heavy feature makes it worse and my CI already takes 45 minutes.

u/hft_latency_nerd 17 points 9 hr. ago

The compile time IS the feature. That's where the optimizer sees the entire work graph and eliminates allocations. You're paying at compile time so you don't pay at runtime.

u/ci_budget_exceeded 8 points 8 hr. ago

My boss does not consider compile time a feature.

u/networking_in_2035 47 points 13 hr. ago

Can we get networking in the standard before I retire? I started this job writing C++ in 2018.

u/just_use_boost_beast 19 points 12 hr. ago

Bold of you to assume you'll retire before C++32.

[deleted] score hidden 9 hr. ago

[removed by moderator]

u/what_did_they_say 14 points 8 hr. ago

What did they say?

u/just_ship_it_already 23 points 7 hr. ago

Something about how P2300 was a conspiracy to sell more GPUs. You know, the usual.

u/template_dungeon_42 39 points 12 hr. ago

Senders: solving yesterday's problems with tomorrow's compile times since 2020.

u/p2300_concepts_parallel 32 points 10 hr. ago

Is P2300 the new concepts? Years of design, massive committee investment, and a significant chunk of the community wondering if the cure is worse than the disease.

u/concepts_defender_2020 18 points 9 hr. ago

Concepts actually work and are universally useful now. We're at year 1 of senders. Give it time.

u/embedded_for_20_years 28 points 9 hr. ago

The zero-allocation property matters to me. In my world, malloc is a bug. Senders give me composable async without touching the heap. That's real. But I also write approximately zero networking code, so I have no opinion on the I/O question.

u/everyone_can_win 24 points 8 hr. ago

"The standard is stronger when each domain gets the model it needs, and neither is forced to use the other's tool."

This is the most reasonable conclusion I've seen in any of the sender/coroutine papers. Nobody loses anything. GPU keeps senders. I/O gets coroutines. Both exist. Both are standardized. Done.

u/committee_tea_leaves 11 points 7 hr. ago

The hard part isn't the principle. The hard part is implementation: who designs the I/O model, who reviews it, and how long it takes to get through the process. We've been waiting for networking since 2005.

u/haskell_refugee_2021 21 points 11 hr. ago

As someone who used Haskell for five years before switching to C++, seeing just mapped to monadic return and let_value mapped to bind made my entire career flash before my eyes. The paper is right that this is CPS. It's also right that most C++ developers never asked for CPS.
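
For readers who have not seen the correspondence, a toy sketch of it. This is an assumption-heavy model, not the `std::execution` API: a "sender" here is just a callable that takes a continuation, and the names `just`, `let_value`, `then`, and `sync_wait` in the `toy` namespace only mirror the real ones. There are no error or stopped channels, no schedulers, and no operation states.

```cpp
#include <cassert>
#include <utility>

namespace toy {

// just(v): monadic return. A computation that hands v to its continuation k.
template <class T>
auto just(T value) {
    return [value](auto&& k) { return k(value); };
}

// let_value(s, f): monadic bind. Run s, pass its result to f, which returns a
// new computation, then run that computation with the outer continuation k.
template <class S, class F>
auto let_value(S sender, F f) {
    return [sender, f](auto&& k) {
        return sender([&](auto v) { return f(v)(k); });
    };
}

// then(s, f): functor map, expressed through bind and return.
template <class S, class F>
auto then(S sender, F f) {
    return let_value(sender, [f](auto v) { return just(f(v)); });
}

// Run a computation by supplying the identity continuation.
template <class S>
auto sync_wait(S sender) {
    return sender([](auto v) { return v; });
}

// (20 + 1) * 2, written in continuation-passing style.
inline int demo() {
    auto pipeline = then(
        let_value(just(20), [](int x) { return just(x + 1); }),
        [](int x) { return x * 2; });
    return sync_wait(pipeline);
}

}  // namespace toy
```

`toy::demo()` evaluates to 42. Nothing runs until `sync_wait` supplies the final continuation, which is the CPS shape the comment describes.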

u/senior_cpp_dev_tired 18 points 7 hr. ago

*laughs in Boost.Asio still working fine for my use case*

u/rust_mentioned_drink 15 points 8 hr. ago

Meanwhile Rust shipped async/await, Tokio is the de facto standard runtime, and nobody argues about whether the abstraction is a sub-language. It just works.

u/actually_writes_senders 9 points 7 hr. ago

Rust doesn't target GPUs with the same abstraction. Different problem space, different solution space.

u/implementation_reality 13 points 6 hr. ago

I implemented a sender-based connection pool at work last year. Took three weeks. Reimplemented it with coroutines in four days. Both versions handle 40K concurrent connections. The coroutine version is 400 lines shorter and two junior devs can maintain it.

u/conference_talk_addict 11 points 5 hr. ago

For anyone who wants more context, watch Eric Niebler's CppCon 2024 talk on senders and Lewis Baker's symmetric transfer talk. Two different perspectives from two people who both understand the design deeply.

u/compiles_first_try 8 points 4 hr. ago

The paper acknowledges that senders are "an achievement" and says the committee should be "proud of it." That's not a hit piece. It's asking for coexistence. I don't understand why that's controversial.

u/error_code_appreciator 6 points 3 hr. ago

The "concurrent selection" row in the equivalence table being marked (absent) is doing a lot of quiet work. when_all shipped but its dual didn't. You can fork but you can't race.

u/great_another_10_years 4 points 2 hr. ago

Great, another paper that will take 10 years to get through LEWG.