r/wg21 - Symmetric Transfer and Sender Composition

P2583R0 - Symmetric Transfer and Sender Composition

Posted by u/coroutine_chain_observer · 14 hr. ago

Document: P2583R0
Authors: Mungo Gill, Vinnie Falco
Date: 2026-02-22
Audience: LEWG

C++20 symmetric transfer (P0913) lets await_suspend return a coroutine_handle<> so one coroutine suspends and another resumes as a tail call - constant stack space. std::execution sender algorithms create receivers that are structs, not coroutines. No coroutine_handle<> exists at any intermediate point. When a coroutine co_awaits a sender that completes synchronously, the stack grows by one frame per completion.

The paper argues this isn't a missing feature - it's architectural. The sender model's zero-allocation composition and symmetric transfer's constant-stack property can't both be satisfied. It surveys six production coroutine libraries, five of which use symmetric transfer, and documents that the proposed fix for std::execution::task only covers the task-to-task case, not the general one. The same authors wrote P4003R0 and P4007R0.

▲ 394 points (82% upvoted) · 56 comments

u/AutoModerator 1 point 14 hr. ago pinned comment

Paper: P2583R0 · Target: LEWG · Type: Directional

u/stack_overflow_preventer 267 points 13 hr. ago 🏆

The four facts in Section 5 are the whole paper:

1. Sender algorithms create receivers that are structs, not coroutines. These structs have no coroutine_handle<>.
2. Even coroutine-backed receivers complete through void-returning set_value.
3. The handle exists inside the receiver but the protocol provides no way to return it.
4. await_suspend cannot return what neither the composition layer nor the protocol provides.

That's the proof. You can't symmetric-transfer to a struct. You can't return a handle from a void function. The gap is structural.

u/sender_algorithm_author 104 points 12 hr. ago

Facts 1-4 are correct. The conclusion - that it's "architectural" rather than "missing" - is where it gets debatable. If set_value returned coroutine_handle<>, the handle would propagate. The paper dismisses this because "every non-coroutine receiver would need to produce one." But noop_coroutine() exists for exactly this case.

u/stack_overflow_preventer 72 points 11 hr. ago

The paper addresses this. Returning noop_coroutine() transfers control to a no-op. The continuation remains suspended. You'd need to resume it separately, defeating the purpose. Symmetric transfer is a tail call to the continuation. noop_coroutine makes it a tail call to nothing.

u/sender_algorithm_author 38 points 10 hr. ago

OK, fair point. The struct receiver in the middle doesn't have a continuation to transfer to - it completes by calling set_value on the next receiver. The handle only exists at the terminal coroutine-backed receiver. So you'd need every intermediate struct to somehow propagate the handle backward. Which is... changing the receiver abstraction.

u/cppcoro_enjoyer 178 points 13 hr. ago

The library survey is convincing. cppcoro, folly::coro, Boost.Cobalt, libcoro, Boost.Capy - five of six libraries independently converged on coroutine_handle<>-returning await_suspend in their final awaiters. That's not a design choice. That's the ecosystem telling you what the right answer is.

Lewis Baker literally implemented symmetric transfer in cppcoro and then co-authored P2300. He knows both models.

u/asyncpp_author_adjacent 47 points 12 hr. ago

asyncpp is the sixth - it uses event-based notification instead of symmetric transfer. Worth noting that it's the outlier, not the rule.

u/zero_alloc_or_die 142 points 12 hr. ago

The paper is honest about the tradeoff. Making sender algorithms into coroutines would give you handles for symmetric transfer. It would also give you heap-allocated frames at every intermediate point. Senders' zero-allocation composition is the entire value proposition for GPU and HFT use cases. You can't have both.

The sender model's zero-allocation composition property and symmetric transfer's constant-stack property cannot both be satisfied. One requires structs. The other requires coroutines.

This is a real engineering tradeoff, not a design flaw.

u/io_context_veteran 68 points 11 hr. ago

Right. And the paper's conclusion isn't "senders are bad." It's "the domains that need zero-allocation composition should use senders, and the domains that need constant-stack coroutine chains should use coroutines." Different tools for different problems. The issue is that std::execution::task forces coroutines through the sender composition layer whether they need it or not.

u/task_type_watcher 118 points 13 hr. ago

Section 6 on std::execution::task is the most concrete part. Kühl's P3796R1 says the spec doesn't mention symmetric transfer, and that the task gets wrapped in affine_on, which produces a different sender type. Müller's P3801R0 shows the iterative-code-that's-actually-recursive stack overflow. Both acknowledge the fix is partial.

The paper adds Section 8: neither sync_wait nor spawn avoids the sender composition layer. Every path into std::execution::task goes through sender algorithms. Every path out loses symmetric transfer.

u/inline_scheduler_victim 74 points 12 hr. ago

Müller's quote is devastating:

"Having iterative code that is actually recursive is a potential security vulnerability."

A for loop that calls co_await f(i) N times with inline_scheduler grows O(N) stack frames. With symmetric transfer, it's O(1). The proposed task-to-task fix only handles the case where both sides are tasks. A task awaiting any other sender still grows the stack.

u/trampoline_scheduler_fan 83 points 11 hr. ago

The paper is too dismissive of trampoline schedulers. Yes, they add runtime overhead. Yes, P0913 was designed to eliminate that overhead. But trampolines work in practice. Java's virtual threads use a similar approach. Go's goroutines use a stack-switching model. Runtime mitigation isn't elegant but it ships.

u/zero_cost_absolutist 52 points 10 hr. ago

"Don't pay for what you don't use." The language already has a zero-cost mechanism for this specific problem. Using a runtime mitigation instead is paying for something you already paid for at language design time. The paper says exactly this: "the runtime overhead in the completion path that P0913R1 was specifically adopted to eliminate."

u/disclosure_reader 58 points 12 hr. ago

Section 1 disclosure is appreciated. The authors developed P4003R0 and P4007R0. They have a position. They state it upfront. The limitation they document exists independently of their position - Kühl and Müller documented the same issue from different angles.

u/tail_call_dreamer 46 points 10 hr. ago

Müller identifies guaranteed tail calls as the language-level fix. If set_value could be a tail call, the stack wouldn't grow. But C++ doesn't have guaranteed tail calls. And adding them is a much bigger lift than fixing the sender model.

u/compiler_impl_perspective 28 points 9 hr. ago

Guaranteed tail calls in C++ would be a massive language feature. Destructors, exception handling, stack unwinding - all of these interact with tail call optimization. It's not impossible but it's a multi-year EWG effort. The paper is right to mention it and right not to propose it.

u/implementation_first 38 points 11 hr. ago

Section 8.4 shows a coroutine-native launcher (Boost.Capy) that avoids the sender pipeline entirely. The task chain uses symmetric transfer throughout. The gap doesn't arise because the composition layer is the awaitable protocol, not the sender protocol. That's the existence proof that the problem is specific to the sender launch path.

u/spawn_doesnt_compile 34 points 9 hr. ago

Section 8.2: spawn(task, token) doesn't compile because the task is scheduler-affine and counting_scope doesn't provide a scheduler. The working pattern wraps in on(sch, task). But on is a sender algorithm - Section 5 proved those can't support symmetric transfer. So the only launch mechanism that works also loses symmetric transfer. Catch-22.

u/coroutine_partisan 29 points 8 hr. ago

Between this paper, P4007R0, and P4014R0 there are now three papers making the same argument from different angles: symmetric transfer (this paper), interface costs (P4007), and the sub-language analysis (P4014). The committee can't say they weren't warned.

[deleted] score hidden 7 hr. ago

[removed by moderator]

u/what_did_they_say 11 points 6 hr. ago

Something about symmetric transfer being "a solution looking for a problem." Rule 2.

u/p0913_author_adjacent 24 points 10 hr. ago

Gor Nishanov's original motivation for P0913:

"Recursive generators, zero-overhead futures and other facilities require efficient coroutine to coroutine control transfer."

P0913 was adopted to solve exactly the problem this paper documents. Six years later, the standard task type doesn't use the mechanism. That should concern people.

u/great_another_10_years 19 points 7 hr. ago

This is why we can't have nice things.

u/just_use_rust_already 15 points 6 hr. ago

Tokio doesn't have this problem because Rust's async model doesn't try to unify GPU dispatch and TCP sockets under the same abstraction. Different tools for different jobs. C++ keeps trying to build one tool that does everything and ending up with a tool that does everything poorly.

u/actually_writes_cpp 8 points 5 hr. ago

That's literally what this paper is arguing for. Different models for different domains. You're agreeing with the paper while thinking you're dunking on C++.

u/embedded_for_20_years 12 points 5 hr. ago

The O(N) stack growth for N synchronous completions is not just a performance concern. In embedded, stack is a fixed resource. An iterative loop that grows the stack unboundedly is a hard failure, not a soft one. If std::execution::task can't prevent that for the general case, it's unusable for us.

u/skill_issue_42 6 points 3 hr. ago

I read the whole paper, understood every word, and now I'm depressed. Symmetric transfer is beautiful. The fact that it can't work through sender algorithms is genuinely sad.