Document: P4014R0
Authors: Vinnie Falco, Mungo Gill
Date: 2026-02-22
Audience: LEWG
The paper frames std::execution (P2300) as a continuation-passing-style sub-language within C++ - with its own control flow primitives, variable binding model, error handling, and iteration strategy. It maps every major C++ control flow construct to its sender equivalent, traces the theoretical roots to monads and CPS, and walks through progressively complex examples from a simple pipeline to a full retry algorithm.
The argument: the complexity serves GPU dispatch, HFT, and embedded well, but networking and I/O should have the same freedom to choose their own model. Ends with two suggested straw polls for LEWG. Same authors as P4007R0 and P4003R0.
Paper: P4014R0 · Target: LEWG · Type: Directional
The retry algorithm comparison is the whole paper in one slide.
Sender version: 100+ lines, deferred construction helper, receiver adaptor with CRTP, operation state lifecycle with `std::optional`, completion signature transformation, template metaprogramming across four struct definitions.

Coroutine version: seven lines. Same behavior.

I understand why the sender version exists - it handles the full completion signature algebra. But for the other 90% of us, this is the paper's strongest argument.
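For anyone who hasn't read the paper: the coroutine version's control flow is essentially a loop with try/catch. A synchronous analogue of that shape (hypothetical names, my sketch, not the paper's code - the real version would `co_await submit()` inside a `task<T>` body) looks like:

```cpp
#include <cassert>
#include <stdexcept>

// Sketch of the control-flow shape of the coroutine retry, shown
// synchronously. The coroutine version replaces submit() with
// `co_await submit()` inside a task<T>; the loop is identical.
template <class Fn>
auto retry(Fn submit, int max_attempts) {
    for (int attempt = 1;; ++attempt) {
        try {
            return submit();        // success: hand the result back
        } catch (...) {
            if (attempt == max_attempts)
                throw;              // attempts exhausted: propagate
        }
    }
}
```

Each retry of the coroutine version re-enters the same frame; the allocation cost the reply below mentions comes from creating the `task<T>` itself, not from the loop.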
The coroutine version allocates a frame on every retry. The sender version doesn't. The sender version propagates completion signatures through the type system so the compiler sees the full pipeline at once. The coroutine version type-erases through `task<T>`.

These aren't the same algorithm with different syntax. They have different performance characteristics and different composition properties. The paper shows the syntax difference without showing the semantic difference.
The question isn't whether the sender version does more. It's whether anyone retrying a network request needs it to. I've written retry loops for HTTP clients, database connections, message queue consumers. Not once did I need zero-allocation retry with compile-time signature transformation.
I have. But I'm not going to pretend my use case is everyone's use case. That's literally what the paper is arguing.
Section 4 is going to make some people uncomfortable. The paper traces Eric Niebler's published writing from 2017 to 2024 and documents the shift from "90% of all async code should be coroutines" to "senders are the foundation, coroutines are one way to consume them." It's all public blog posts with direct quotes and links. No private communications, no inference.
Whether you read this as "the emphasis changed as the target problems changed" or "the vision drifted" depends on which side of the sender/coroutine divide you sit on. But the timeline itself is just facts.
People's views evolve as they work on harder problems. Calling it "emphasis change" rather than "intellectual growth" is a framing choice. Niebler started working on GPU dispatch and discovered that senders could do things coroutines structurally cannot. His views changed because the problem space expanded. That's normal.
Agreed. The paper says exactly that - "the emphasis naturally shifted as the target problems changed." It's not accusing anyone of bad faith. It's documenting a design trajectory so the committee can see where the current design came from and decide if the balance is right for all the domains C++ serves.
The timeline is just links to his own blog posts. If anything it's a service - now I don't have to find them myself.
Whether you agree with the paper's conclusions or not, calling `std::execution` a "sub-language" is not loaded - it's accurate. The equivalence table in Section 2 maps sequential statements, local variables, return, try/catch, for/while, if/else, throw, and break/continue to sender primitives. When a library provides its own versions of every fundamental control flow construct, it has become a programming model, not an API.

Template metaprogramming is a sub-language. constexpr evaluation is a sub-language. The preprocessor is a sub-language. Calling senders a sub-language puts them in that category, which is descriptive, not pejorative. The paper even says it's "an achievement."
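The CPS claim behind that table is checkable in a few lines of plain C++. In this toy model (my names with trailing underscores, not `std::execution`'s API), a "sender" is just a callable that takes a continuation; sequencing and variable binding then fall out the way the table says:

```cpp
#include <cassert>

// Toy synchronous CPS model, not std::execution. A "sender" here is
// any callable that accepts a continuation k and calls k(value).
auto just_ = [](auto v) {                 // ~ just: inject a value
    return [=](auto&& k) { k(v); };
};
auto then_ = [](auto snd, auto f) {       // ~ then: sequential statement
    return [=](auto&& k) { snd([&](auto v) { k(f(v)); }); };
};
auto let_value_ = [](auto snd, auto f) {  // ~ let_value: bind v, then
    return [=](auto&& k) {                //   continue with a new sender
        snd([&](auto v) { f(v)(k); });
    };
};
```

`then_` is the monadic map and `let_value_` is bind, which is why Section 3 can line senders up against Moggi-style monads. The real library adds error/stopped channels and operation states on top of this skeleton.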
You could make the same argument about ranges. `views::filter | views::transform | views::take` replaces for loops and conditionals with a pipeline model. Nobody calls it a sub-language.

Ranges replace iteration patterns. Senders replace control flow, variable binding, error handling, iteration, AND resource management. Ranges compose over values. Senders compose over computations. The paper's Section 3 table maps senders to CPS, monadic bind, and algebraic effect theory. Ranges don't reach that level of abstraction.
Three of four algorithms in the P2300R7 motivating example aren't in C++26. `stop_when`: removed. `timeout`: never proposed. `first_successful`: never proposed. The motivating example motivates a library that doesn't exist.
They exist in stdexec. Just not in the standard. Which is kind of the point of iterating - ship the core, add algorithms in C++29.
"Ship the core, add the algorithms later, and the coroutine integration is tracked by 29 open issues in a companion paper." What a pitch.
I work on GPU dispatch pipelines. Senders give me zero-allocation composition with compile-time work graphs that the compiler can optimize end-to-end. Nothing else in C++ does this. The "complexity" in Section 5 is the complexity of expressing what I need expressed. If you don't need it, don't write sender algorithms - compose the existing ones.
Nobody's asking to take that away. The paper explicitly says senders serve your domain well. The question is whether networking developers should also have to route every `co_await` through the sender composition layer when they don't need zero-allocation pipelines or compile-time work graphs. P2583R0 shows that the sender bridge structurally prevents symmetric transfer. That's a real cost for I/O.

You already have coroutines. They're in the language. You can use them right now without senders.
Without a standard task type that works well for I/O, without standard I/O objects, without a standard event loop. The coroutine language feature shipped in C++20. The library support for using it on I/O is still Boost-or-roll-your-own six years later. Meanwhile `std::execution::task` is in the WP with a companion paper tracking 29 open issues.

Fair. P3796R1 is... a document.
Section 5.6, the backtracker example. I've read it three times. I still can't trace where the `fail` continuation goes through three levels of `std::move` and two nested `let_value` lambdas. This is a code review rejection in any codebase I've worked in.

In practice you compose the algorithms, you don't write them. I've used stdexec for over a year and I've never implemented a custom sender algorithm. `just | then | let_value` covers 95% of what I need. The paper is showing the internals to make a point, but users don't normally touch that layer.

Right, and the fold example in Section 5.5 - the one that users would write - requires `any_sender_of<>` for type erasure because the recursive type would be infinite. A type-erased sender. For folding a range. The coroutine version is four lines with a for loop.

The suggested straw polls are well-constructed. "std::execution serves coroutine-driven async I/O less ideally than heterogeneous compute." I don't think anyone on LEWG would disagree with that. The question is whether they'll actually take it and what follows from it.
Taking the poll would require acknowledging that the coroutine integration was under-designed relative to the sender algorithms. That's politically expensive when you've just shipped `std::execution` into the WP.

P3796R1 already acknowledged it. Twenty-nine issues. The paper's authors aren't making a controversial claim - they're building on what Kühl and Müller already documented.
The nvexec precedent argument in Section 6.2 is clever but it proves more than intended. nvexec reimplements every standard sender algorithm with CUDA extensions - `bulk`, `then`, `when_all`, `continues_on`, `let_value`, `split`, `reduce` - all in `.cuh` files. And it requires a non-standard compiler.

If the domain that senders were specifically designed for needs a complete reimplementation with non-standard extensions, maybe "universal asynchrony abstraction" was always the wrong goal. Maybe the right goal is domain-specific models that share boundaries. Which is... what the paper argues.
nvexec reimplements the algorithms because GPU kernels can't run standard C++. That's a hardware limitation, not a design failure. The sender abstraction is what lets the same pipeline description target either CPU or GPU with different algorithm implementations. That's the universality working as designed.
Points for putting the disclosure section first. The authors are upfront about having written P4003R0 (Coroutines for I/O) and P4007R0 (Senders and Coroutines). They have a position. They're telling you what it is before the argument starts.
It's a trilogy at this point. P4007 documents integration costs. P2583 documents symmetric transfer. P4014 documents what the sender model IS. Three different lenses on the same structural question. Agree or disagree, that's a thorough job.
The theoretical foundations table in Section 3 - Moggi (1991), Lambda Papers (1975), Danvy and Filinski (1990) - is the nerdiest thing I've read this week and I loved every line. They're really going back to first principles to explain why senders look the way they do. This should be required reading for anyone implementing a sender algorithm.
The trade-off table in Section 6.1 lists "long compile times" as a cost. I just want compile times to go down. Every new template-heavy feature makes it worse and my CI already takes 45 minutes.
The compile time IS the feature. That's where the optimizer sees the entire work graph and eliminates allocations. You're paying at compile time so you don't pay at runtime.
My boss does not consider compile time a feature.
Can we get networking in the standard before I retire. I started this job writing C++ in 2018.
Bold of you to assume you'll retire before C++32.
[removed by moderator]
What did they say?
Something about how P2300 was a conspiracy to sell more GPUs. You know, the usual.
Senders: solving yesterday's problems with tomorrow's compile times since 2020.
Is P2300 the new concepts? Years of design, massive committee investment, and a significant chunk of the community wondering if the cure is worse than the disease.
Concepts actually work and are universally useful now. We're at year 1 of senders. Give it time.
The zero-allocation property matters to me. In my world, `malloc` is a bug. Senders give me composable async without touching the heap. That's real. But I also write approximately zero networking code, so I have no opinion on the I/O question.

This is the most reasonable conclusion I've seen in any of the sender/coroutine papers. Nobody loses anything. GPU keeps senders. I/O gets coroutines. Both exist. Both are standardized. Done.
The hard part isn't the principle. The hard part is implementation: who designs the I/O model, who reviews it, and how long does it take through the process. We've been waiting for networking since 2005.
As someone who used Haskell for five years before switching to C++, seeing `just` mapped to monadic return and `let_value` mapped to bind made my entire career flash before my eyes. The paper is right that this is CPS. It's also right that most C++ developers never asked for CPS.

*laughs in Boost.Asio still working fine for my use case*
Meanwhile Rust shipped async/await, Tokio is the de facto standard runtime, and nobody argues about whether the abstraction is a sub-language. It just works.
Rust doesn't target GPUs with the same abstraction. Different problem space, different solution space.
I implemented a sender-based connection pool at work last year. Took three weeks. Reimplemented it with coroutines in four days. Both versions handle 40K concurrent connections. The coroutine version is 400 lines shorter and two junior devs can maintain it.
For anyone who wants more context, watch Eric Niebler's CppCon 2024 talk on senders and Lewis Baker's symmetric transfer talk. Two different perspectives from two people who both understand the design deeply.
The paper acknowledges that senders are "an achievement" and says the committee should be "proud of it." That's not a hit piece. It's asking for coexistence. I don't understand why that's controversial.
The "concurrent selection" row in the equivalence table being marked (absent) is doing a lot of quiet work. `when_all` shipped but its dual didn't. You can fork but you can't race.

Great, another paper that will take 10 years to get through LEWG.