std::execution — WG21 Document: P2300R10 · Authors: Michał Dominiak, Lewis Baker, Lee Howes, Kirk Shoop, Michael Garland, Eric Niebler, Bryce Adelstein Lelbach · Date: 2024-06-28 · Audience: LEWG, LWG
After fourteen years, ten revisions, multiple design pivots, and at least one vote that failed to reach consensus, P2300R10 proposes the standard async execution framework C++ has been waiting for—or dreading, depending on who you ask. The paper defines schedulers, senders, and receivers as the core abstractions for asynchronous execution, with composable algorithms (then, let_value, when_all, bulk, etc.) and a cancellation model built on generalized stop tokens. R10 removes ensure_started and start_detached (deemed foot-guns, to be replaced by P3149 async_scope), renames transfer to continues_on and on to starts_on, fixes the sender algorithm customization mechanism via P3303R1, and adds the __cpp_lib_senders feature test macro.
The design is purely lazy—no work happens until you call start() on a connected operation state—and the paper spends considerable ink defending this choice against eager alternatives. The proposed wording is massive, touching stop tokens, the <execution> header, and adding a new subclause for asynchronous operations. Reference implementations exist at NVIDIA/stdexec and Meta/libunifex, with Intel’s bare-metal variant targeting embedded. Given this paper has been debated in SG1, LEWG, and plenary for over a decade, everyone has an opinion.
Paper Metadata
Document: P2300R10 · Title: `std::execution` · Authors: Michał Dominiak, Lewis Baker, Lee Howes, Kirk Shoop, Michael Garland, Eric Niebler, Bryce Adelstein Lelbach · Date: 2024-06-28 · Target: LEWG, LWG · Revision: R10 (previous: R9)
Is there a quick summary of what changed from R9? I’m still mentally processing the `tag_invoke` removal.
I’ve been waiting for a standard async framework since I started my C++ career. I’m about to retire.
I’ve shipped three products, had two kids, and changed careers since executors were first proposed. My oldest is learning to code. In Rust.
your kids have better taste than the committee
This paper has more revisions than my git repo has commits.
The biggest under-discussed change in R10 is the removal of `ensure_started` and `start_detached`. These were the escape hatches—the “I need to fire off some work and not block on it” tools. Their removal forces every use of senders into a fully structured concurrency model where every operation must be awaited through a scope. The paper points to P3149 (async_scope) as the replacement, but P3149 is a separate paper that isn’t in the same revision. So we’re shipping the async framework without the tool you need for one of the most common async patterns. It’s like shipping `<algorithm>` but putting `std::sort` in a companion paper.
I understand why they were removed. P3187R1 makes a solid case that `ensure_started` is a foot-gun—if the returned sender is destroyed before the operation completes, you get detached work, which is exactly the problem structured concurrency is supposed to solve. But the alternative is “wait for async_scope to ship.” In the meantime, users who need fire-and-forget have to either roll their own or reach for the reference implementation’s extensions.
That “to be replaced” is doing a lot of heavy lifting. Is the committee comfortable shipping a framework with this dependency on a not-yet-adopted companion paper?
The removal is the right call. `ensure_started` was fundamentally at odds with the structured concurrency model. If you want fire-and-forget, you need a scope that owns the lifetime of that work. That’s what `counting_scope` in P3149 provides. It’s not a gap; it’s a deliberate sequencing of features.
The committee has explicitly committed to shipping async_scope for C++26. P3109 lays out the plan. This isn’t a “networking someday” situation.
I run a server that needs to spawn background tasks for cleanup on disconnect. In Asio I do this today with `co_spawn` on a detached executor. What’s the P2300 answer in R10? “Wait for the next paper”?
You use `counting_scope::spawn()`. The scope owns the lifetime. When the scope is destroyed, it joins all outstanding work. That’s better than detached—you can’t leak.
TL;DR they removed the easy way to do things and the replacement is in a different paper that hasn’t shipped yet. Peak committee energy.
I went back and re-read the rationale in P3187. The key argument is:
Which is fair. But the fix is to make the destructor join, not to remove the feature entirely. `std::jthread` solved exactly this problem for threads.
Edit: I know blocking destructors are controversial. My point is that there’s a design space between “detach silently” and “don’t exist.”
Edit2: actually rethinking this. The problem with a blocking destructor on a sender is that you don’t know what thread will run the destructor. If it’s the event loop thread, you deadlock. OK fine, removal is defensible.
skill issue
So we now have a 150-page async execution framework in C++26, and you still can’t read from a socket. The paper doesn’t mention networking once. Not in the motivation, not in the examples, not in the design rationale.
Asio has been production-ready since 2003. Chris Kohlhoff has maintained it through twenty years of standardization chaos. The Networking TS was built on it. And instead of adopting the thing that works, we got a decade-long argument about whether execution resources should be lazy or eager, and now we have a framework that can parallelize `std::inclusive_scan` across a GPU but can’t open a TCP connection.
I get it. P2300 is “the foundation.” Networking comes “later.” P2762 shows how networking senders might look. But “later” has been the answer for networking since C++11, and I’m running out of patience.
P2300 is the execution model. Networking is an I/O concern that builds on top of the execution model. You don’t ship TCP in the same paper as the scheduler concept for the same reason you don’t ship `std::format` in the same paper as `char_traits`.
The whole reason the Networking TS stalled is that it baked in its own execution model (Asio’s completion token mechanism) rather than building on a standard one. P2300 fixes that by providing the standard execution model first. Then you layer networking on top, and it composes with everything else.
“First the foundation, then the building.” Sure. Except the Asio completion token model composes today. I compose async operations with `deferred`, `use_awaitable`, and `parallel_group`. It works. It’s deployed. I don’t need to understand `transform_completion_signatures_of` to read from a socket.
Show me one production deployment of stdexec that involves networking. Not a toy echo server. A production system.
Asio’s completion token model doesn’t compose with parallel algorithms, bulk operations, or structured concurrency. It’s a callback-adapter layer on top of proactor I/O. Senders compose with everything—that’s why they’re worth the complexity.
libunifex is deployed at Meta’s scale. NVIDIA uses stdexec internally for GPU pipeline orchestration. Intel has a bare-metal variant running on microcontrollers. The deployment exists, just not in the “TCP echo server” domain you’re looking at.
So the async framework for C++ is deployed at two companies—both of which employ the paper’s authors. And neither deployment involves I/O. You’re proving my point: this is a GPU compute framework that got generalized into a standard, not a general-purpose async framework that happens to support GPUs.
Asio is deployed at thousands of companies for networking. That’s a different scale of validation.
libunifex supports io_uring. It does I/O. The networking bridge (stdexec asioexec) was merged months ago. And the design explicitly supports I/O—see section 1.4, the Windows socket recv example. The paper just doesn’t standardize I/O because that’s a separate concern.
I just want to read from a socket without a PhD in template metaprogramming. Is that too much to ask.
just use tokio lmao
committee gonna committee
14 years. Executors have been in progress since 2012. There are people on the committee who started as grad students and now have gray hair because of this topic.
and people say C++ moves too fast
Firmware developer here. The zero-allocation property of operation states is the part of this paper that nobody is talking about. When you `connect` a sender chain, the resulting operation state composes statically. The paper’s `run_loop` example uses space in its operation states to build an intrusive linked list of work items—all on the stack. Zero heap allocations.
Intel’s bare-metal implementation proves this works in constrained environments. I’ve been building ad-hoc versions of exactly this pattern for twenty years. Having it in the standard with proper type safety and composability is genuinely exciting.
The people complaining about complexity are looking at the implementer-facing API. The user-facing API for someone who just wants to compose `then | continues_on | bulk` is clean.
The intrusive linked list in `run_loop` is exactly what we’ve been doing by hand for decades on embedded systems. Having it formalized in a standard type that composes with generic algorithms is worth the complexity of the specification. The specification is complex so that the usage doesn’t have to be.
will this fit in 64K though
The Intel implementation runs on STM32. So yes, if your toolchain supports C++20 concepts. The code size overhead is mostly template instantiation, which you’re already paying for if you use any modern C++.
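For anyone who hasn’t written this by hand: here’s a from-scratch sketch of the intrusive-queue idea (this is not the P2300 `run_loop` API, and the names `tiny_loop`/`op_state` are made up). Each operation state carries its own list node, so queueing work needs zero heap allocations—the states live on the stack:

```cpp
// From-scratch sketch of the intrusive work queue idea (not the P2300
// run_loop API): each operation state embeds its own list node, so the
// queue never allocates -- the states themselves live on the stack.
struct op_base {
    op_base* next = nullptr;
    void (*execute)(op_base*) = nullptr;
};

class tiny_loop {
    op_base* head_ = nullptr;
    op_base** tail_ = &head_;
public:
    void push(op_base* op) { *tail_ = op; tail_ = &op->next; }
    void run() {                        // drain the queue in FIFO order
        for (op_base* op = head_; op != nullptr; op = op->next)
            op->execute(op);
        head_ = nullptr; tail_ = &head_;
    }
};

template <class Fn>
struct op_state : op_base {             // one node per pending operation
    Fn fn;
    explicit op_state(Fn f) : fn(f) {
        execute = [](op_base* self) { static_cast<op_state*>(self)->fn(); };
    }
};
```

Usage: declare the `op_state` objects as locals, `push` their addresses, `run`. No `new`, no `delete`, no type erasure beyond one function pointer per node.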
The paper’s priorities in section 1.2 say “Make it easy to be correct by construction.” Then section 1.5.1 shows the implementation of `then`—the simplest possible sender adaptor—and it requires:
- a `connect` member and a `get_completion_signatures` member
- `transform_completion_signatures_of<S, Env, _except_ptr_sig, _set_value_t>` to compute the completion signatures
- a `connect_result_t` return type for the wrapped operation state
For `then`. The “hello world” of sender adaptors. If you want to write a custom sender for your execution resource, the learning curve is vertical.
The user-facing API (`schedule | then | continues_on`) is genuinely nice. But the gap between “using senders” and “implementing senders” is the widest I’ve seen in any standard library feature, and that includes ranges.
Eric Niebler literally said “the P2300 crew have collectively done a terrible job of making this work accessible.” At least the authors know. The question is whether acknowledging the problem counts as addressing it.
imagine debugging a template error from `transform_completion_signatures_of<S, Env, completion_signatures<set_error_t(exception_ptr)>, _set_value_t>`
laughs in compile times
I compiled a hello world with stdexec and it took 47 seconds. I am not joking. Forty-seven seconds. The binary was 8 bytes of useful work and 12 MB of template instantiations.
The end-user API is actually pretty clean. The pipe syntax reads left-to-right. You don’t need to understand completion signatures to use senders any more than you need to understand iterator traits to use a range-based for loop. Same tradeoff as ranges: the spec is complex so the usage doesn’t have to be.
Fair point on ranges. But with ranges, if I need to write a custom view, I can look at existing views and the pattern is learnable. With senders, the customization mechanism goes through `transform_sender` on an execution domain, which requires understanding the entire domain dispatch machinery in section 5.4. That’s a qualitatively different barrier.
tell me you’ve never shipped production code without telling me
Section 4.9.4 contains this remarkable admission:
“Still being investigated.” In R10. The revision being proposed for the standard. The cancellation overhead concern from SG1 is documented but unresolved in the text that’s going into C++26.
To be clear: the design mitigates this with `never_stop_token` and `unstoppable_token`, which should let the compiler optimize out the cancellation path. But “should” and “does” are different things, and the paper itself isn’t confident enough to remove the caveat.
I work on a major compiler. The `never_stop_token` optimization works with `-O2` in practice. The `unstoppable_token` concept is a compile-time check, so the compiler sees `bool_constant<(!tok.stop_possible())>::value` and eliminates the cancellation code path entirely. But it requires the stop token type to be statically known as `never_stop_token`, not just a `stoppable_token` that happens to return `false` from `stop_possible()` at runtime. That’s the right design—it puts the optimization decision in the type system.
Good to know. So the SG1 concern is effectively addressed by implementation quality, not by the specification? That’s... fine, I guess, but it means the paper is relying on optimizers doing the right thing with a complex type-level protocol. Which historically has not been C++’s strongest suit.
great, another paper that will take 10 years to get through LEWG
it already took 14
Section 4.10 is the most important part of this paper and the least discussed. The decision to make all senders lazy is what enables GPU kernel fusion, static operation state composition, and zero-overhead structured concurrency. If senders could be eager, every algorithm would need to handle the race between “the operation completed before connect was called” and “the operation hasn’t started yet,” which is exactly the `std::future` problem we’re trying to escape.
The paper lays out five failure modes of eager senders (UB on destruction, detached work, blocking destructors, type-erased stop callbacks, loss of execution context). Every one of them is a real bug I’ve debugged in production GPU code. Laziness eliminates all five by construction.
In HFT we need to start work immediately when market data arrives. The latency between “data received” and “computation begins” is measured in nanoseconds. With lazy senders, there’s always a connect-then-start overhead between constructing the sender and beginning execution.
The paper’s argument in 4.10.1 against eager senders doesn’t account for the case where you know the operation won’t be cancelled, the receiver is ready, and the only thing between you and execution is the framework’s own laziness overhead.
The paper addresses exactly this in section 4.10:
You can build eager on top of lazy. You cannot build lazy on top of eager without introducing synchronization. The design correctly defaults to the composable primitive and lets users opt into eagerness when they need it.
“Build eager on top of lazy” adds overhead. In our measurements with stdexec, the connect+start path adds 15-30ns compared to a direct function call. At our scale, that’s 2-3 ticks of market data processing. I’m not saying the design is wrong—I’m saying the paper claims “zero overhead” abstraction and it isn’t, for this use case.
Your 15-30ns overhead is my kernel fusion. The paper never claims zero overhead for all use cases. Section 1.2 says “care about all reasonable use cases, domains and platforms,” and section 4.10 explicitly acknowledges that the lazy model trades eager-start latency for composability and correctness. That’s a design choice, not a bug.
This is why the standard should support both models. Instead we got one camp’s preference enshrined as the only option. The committee had a chance to provide eager-when-available, lazy-by-default, and chose not to.
For people who haven’t been following: the committee history of this paper is wild.
The 2022 vote is the one to study. Thirty-seven people voted in favor, seventeen against, and nobody abstained. That vote tells you everything about how polarizing this design is. The fact that we got from there to a shippable proposal in two years is, frankly, impressive committee work.
Zero neutral votes on an executors poll is the most on-brand WG21 thing I’ve ever heard. Everyone had an opinion and nobody was willing to sit it out.
54 people in a room, all with strong opinions about async C++, zero abstentions. Surprised nobody got stabbed.
P3109 was the turning point. Once there was a concrete plan with milestones—what ships in C++26, what waits for C++29—the opposition softened. Not because the design changed, but because the process became legible. Half the SA votes in 2022 were probably “not ready yet” rather than “wrong direction.”
[removed by moderator]
what did they say?
something about the Networking TS being sabotaged, you know the usual
Rule 2. Conspiracy theories about committee process are not constructive.
I teach a graduate C++ course. My plan for integrating senders is: year one, coroutines and ranges; year two, senders. You need the prerequisites. Students need to be comfortable with concepts, `operator|` composition, and the idea of lazy evaluation before they can approach this.
The good news: the pipe syntax (`schedule(sch) | then(f) | continues_on(other_sch) | then(g)`) is genuinely intuitive once you’ve seen range adaptors. The bad news: the first time a student makes a mistake and gets a template error message, they’ll lose a week.
Two years of prerequisites to use the standard async library. This is a real cost that the paper doesn’t acknowledge. The motivation says “care about all reasonable use cases” but the onboarding curve says “care about use cases where the developer has a graduate degree in template metaprogramming.”
and this is why bootcamp kids pick JavaScript
R10 changelog for names alone:
- `transfer` → `continues_on`
- `on` → `starts_on`
- `on` = `starts_on` + `continues_on`
- `get_delegatee_scheduler` → `get_delegation_scheduler`
- `read` → `read_env`
R9 renamed `in_place_stop_*` to `inplace_stop_*`. R8 renamed `make_completion_signatures` to `transform_completion_signatures_of`. Every revision renames things. If you’ve been writing code against the reference implementation, you’ve rewritten your import statements every six months.
I’ve rewritten my blog tutorial three times. At this point I’m going to wait for C++29 before I publish anything, just so I don’t have to update it again when R11 renames `sync_wait` to `blocks_caller_until_done_with_optional_variant_tuple`.
The rename from `transfer` to `continues_on` is actually an improvement. “Transfer” is ambiguous—transfer what? `continues_on(snd, sch)` reads as “the work described by `snd` continues on `sch`.” Same for `starts_on`. The new names describe the operation. See P3175R3 for the rationale.
Fair, `continues_on` is clearer. My complaint isn’t about any individual rename—it’s that ten revisions of renames means the entire pre-standardization ecosystem’s documentation is wrong. Every CppCon talk, every blog post, every Stack Overflow answer uses the old names.
Has anyone noticed there’s no `schedule_after` or `schedule_at`? The paper mentions in section 4.2:
“Future papers.” So C++26 gets an async framework where you can’t say “do this in 5 seconds.” Timer-based scheduling is one of the most basic async patterns—timeouts, debouncing, polling, heartbeats—and it’s deferred to a future standard.
Time-based scheduling requires a clock source, which is execution-resource-specific. A thread pool uses `steady_clock`. An event loop uses its own timer mechanism. An embedded system uses a hardware timer. Standardizing all of that in the same paper would triple its size. The right move is to ship the core framework and layer timers on top.
ah yes, the C++ tradition of shipping half the feature and promising the rest later. See also: modules (no standard build system), coroutines (no `generator` until C++23 and still no `task`), ranges (no `to<vector>` until C++23). We keep standardizing the hard infrastructure and leaving out the parts people actually use.
The completion signatures mechanism (section 5.8) is probably the most underappreciated innovation in the paper. It’s a type-level protocol that lets a sender statically declare every way it can complete—value types, error types, and whether it can send stopped. This means the compiler can check at `connect` time whether the receiver can handle all possible completions.
Compare this to `std::future<T>`, which can only send one value type or an `exception_ptr`. Senders can send multiple different value types depending on runtime conditions, and the type system tracks all of them. That’s why you see things like `into_variant`—it collapses all possible value completions into a single variant type when you want to unify them.
`transform_completion_signatures_of` is the real power tool here. It lets you write an adaptor that transforms value completions while passing through error and stopped completions automatically. Once you understand it, writing adaptors becomes mechanical. The problem is the “once you understand it” part takes about a week of staring at the spec.
Meanwhile Rust has had async/await since 2019, tokio has been production-ready for years, and their async story “just works.” But sure, let’s spend another decade debating whether senders should be lazy.
Rust async is single-threaded by default and their `Send`/`Sync` pain for multi-threaded runtimes is well documented. Their model also doesn’t address heterogeneous compute (GPUs, FPGAs, DSPs). Different design space, different tradeoffs.
Every time someone mentions Rust async, they forget to mention the function coloring problem, the `Pin<Box<dyn Future>>` dance, and the fact that `select!` is a macro because the type system can’t express it. Async is hard in every language.
Sir, this is a Wendy’s
The author list is four NVIDIA employees and two Meta employees. The reference implementation is an NVIDIA project. The primary deployment is at NVIDIA and Meta. The design prioritizes GPU compute and large-scale server workloads, which are NVIDIA and Meta’s business domains.
This is a competent design for the problems the authors personally face. The committee should recognize that the recommendation to adopt this as the standard async model for all of C++ comes from authors with a documented advocacy position and commercial interest in this specific design direction. Corroborating analysis from authors outside the GPU/hyperscaler bubble would strengthen the case.
That’s not a criticism, that’s a feature. You want the people who actually need async at scale to design the async framework. Eric Niebler is also the person who designed range-v3, which became `std::ranges`. Lewis Baker wrote cppcoro. These aren’t corporate drones pushing a product—they’re the people who actually understand the problem space.
I didn’t say the design is wrong. I said the recommendation is advocacy-informed rather than neutral. Eric Niebler’s track record is excellent. But “this person has been right before” is not the same as “we don’t need independent validation.” What does someone who writes database engines, or embedded firmware, or game engines think about this design? We should hear from them too.
can we please just get networking in the standard before I retire
Want to 10x your C++ skills? Our Advanced C++ Masterclass covers async, coroutines, and more! Use code SENDERS20 for 20% off. learnmoderncpp.io
report and move on
Game engine perspective: we evaluated stdexec for our ECS job system. The runtime performance was competitive with our hand-rolled fiber-based scheduler—within 5% on our benchmark suite. The structured concurrency model maps well to frame-scoped work: spawn all jobs for a frame, `sync_wait` at the frame boundary, guaranteed completion.
The compile times were brutal. Our job system compiles in 3 seconds. The stdexec version took 90 seconds for the same translation unit. For a game studio that does 200+ builds per day, that’s a non-starter. We’re watching this space but won’t adopt until compile times improve by at least 10x.
Compile times are an implementation quality issue, not a design issue. The `basic-sender` exposition-only class template in R8+ is specifically designed to reduce template instantiation depth. As compilers get better at concepts and as `<execution>` moves into the standard library (precompiled), this will improve. Modules will also help—eventually.
I’ve heard “modules will help” for four years. My build system doesn’t even support modules yet and we’re shipping on three platforms. The 47-second hello world someone mentioned upthread was not a joke, was it.
tangentially, has anyone benchmarked `<execution>` compile times compared to, say, `<ranges>` or `<format>`? My CI pipeline already takes 2 hours and I’m not thrilled about adding another heavy header.
modules will fix this
modules will fix this
“modules will fix this” 🤡
I actually read the whole paper. All of it. Here’s what everyone is missing while arguing about naming and networking:
The sender algorithm customization mechanism in section 5.4 is the actual innovation. When you call `then(snd, f)`, the algorithm doesn’t just wrap the sender in a generic adaptor. It queries the sender’s completion scheduler for an execution domain, and then asks that domain to `transform_sender` the algorithm. This means a CUDA scheduler can intercept `bulk(snd, shape, f)` and turn it into a GPU kernel launch without the user writing any GPU-specific code.
This is why P3303R1 (fixing `transform_sender` in `connect`/`get_completion_signatures`) was critical for R10. Without it, the domain dispatch happened too early and the scheduler couldn’t see the full sender chain. The fix makes it lazy—the transform happens at connect time, when the scheduler has full context.
This is the part that justifies the complexity. Not `then`. Not `sync_wait`. The domain dispatch is what makes senders a platform rather than just another callback library.
This is the comment. The domain dispatch mechanism is what separates P2300 from “just another async library.” Without it, every GPU runtime would need to customize every algorithm. With it, the domain sees the whole pipeline and can fuse operations. That’s the magic.
Unpopular opinion apparently: I’ve been using coroutines for 3 years and this is the first time I’ve seen a coherent story for async in C++ that doesn’t feel like duct tape. The pipe syntax is nice, the structured concurrency model is sound, and the fact that operation states compose on the stack without allocation is genuinely novel for a standard library feature. I’ll take it.
brave of you to say something positive on r/wg21
I work on a major compiler and the amount of work required to implement this header is... significant. The `basic-sender` exposition-only type alone has more moving parts than most standard headers. We’re looking at 6-12 months of implementation work after the wording is frozen. Users should not expect `<execution>` to be available in their compiler the day C++26 ships.
oh no
proposed wording for my codebase:
you forgot `| stopped_as_error(std::make_error_code(std::errc::operation_canceled))` for when you rage-quit
I thought about this more and there’s a gap nobody’s mentioned: the interaction between senders and coroutines.
Section 5.7 says you can `co_await` a sender in a coroutine that uses `with_awaitable_senders`. That’s great. But the paper doesn’t provide a standard coroutine `task` type. So you can co_await senders, but only if you write your own promise type first. The gap between “coroutines exist” and “senders exist” and “they work together seamlessly” is still two or three papers wide.
This is the biggest practical gap. Most C++ developers who’ve used async at all have used coroutines. They want to write `co_await read_from_socket()`, not `schedule(sch) | then(f) | continues_on(sch2) | then(g)`. A standard `task<T>` that bridges senders and coroutines is what makes this usable for the median developer. Without it, P2300 is infrastructure for library authors, not end users.
is this the thread where we complain about executors taking too long, or the thread where we complain about them being too complicated? I want to make sure I’m posting in the right one.
yes
Boost/Folly/libunifex already do this. But sure, let’s spend another meeting cycle on wording for `get_delegation_scheduler`.
The whole point of standardization is that you don’t need to pick between Boost, Folly, and libunifex. One vocabulary. One set of concepts. `sender auto` works with any execution resource. That’s worth the pain, even if it takes 14 years.