Document: P2583R0
Authors: Mungo Gill, Vinnie Falco
Date: 2026-02-22
Audience: LEWG
C++20 symmetric transfer (P0913) lets await_suspend return a coroutine_handle<> so one coroutine suspends and another resumes as a tail call - constant stack space. std::execution sender algorithms create receivers that are structs, not coroutines. No coroutine_handle<> exists at any intermediate point. When a coroutine co_awaits a sender that completes synchronously, the stack grows by one frame per completion.
The paper argues this isn't a missing feature - it's architectural. The sender model's zero-allocation composition and symmetric transfer's constant-stack property can't both be satisfied. It surveys six production coroutine libraries, five of which use symmetric transfer, and documents that the fix proposed for std::execution::task only covers task-to-task awaits, not the general case. Same authors as P4003R0 and P4007R0.
Paper: P2583R0 · Target: LEWG · Type: Directional
The four facts in Section 5 are the whole paper:
That's the proof. You can't symmetric-transfer to a struct. You can't return a handle from a void function. The gap is structural.
Facts 1-4 are correct. The conclusion - that it's "architectural" rather than "missing" - is where it gets debatable. If set_value returned coroutine_handle<>, the handle would propagate. The paper dismisses this because "every non-coroutine receiver would need to produce one." But noop_coroutine() exists for exactly this case.
The paper addresses this. Returning noop_coroutine() transfers control to a no-op. The continuation remains suspended. You'd need to resume it separately, defeating the purpose. Symmetric transfer is a tail call to the continuation. noop_coroutine makes it a tail call to nothing.
OK, fair point. The struct receiver in the middle doesn't have a continuation to transfer to - it completes by calling set_value on the next receiver. The handle only exists at the terminal coroutine-backed receiver. So you'd need every intermediate struct to somehow propagate the handle backward. Which is... changing the receiver abstraction.
The library survey is convincing. cppcoro, folly::coro, Boost.Cobalt, libcoro, Boost.Capy - five of six libraries independently converged on coroutine_handle<>-returning await_suspend in their final awaiters. That's not a design choice. That's the ecosystem telling you what the right answer is.
Lewis Baker literally implemented symmetric transfer in cppcoro and then co-authored P2300. He knows both models.
asyncpp is the sixth - it uses event-based notification instead of symmetric transfer. Worth noting that it's the outlier, not the rule.
The paper is honest about the tradeoff. Making sender algorithms into coroutines would give you handles for symmetric transfer. It would also give you heap-allocated frames at every intermediate point. Senders' zero-allocation composition is the entire value proposition for GPU and HFT use cases. You can't have both.
This is a real engineering tradeoff, not a design flaw.
Right. And the paper's conclusion isn't "senders are bad." It's "the domains that need zero-allocation composition should use senders, and the domains that need constant-stack coroutine chains should use coroutines." Different tools for different problems. The issue is that std::execution::task forces coroutines through the sender composition layer whether they need it or not.
Section 6 on std::execution::task is the most concrete part. Kühl's P3796R1 says the spec doesn't mention symmetric transfer and the task gets wrapped in affine_on which produces a different sender type. Müller's P3801R0 shows the iterative-code-that's-actually-recursive stack overflow. Both acknowledge the fix is partial.
The paper adds Section 8: neither sync_wait nor spawn avoids the sender composition layer. Every path into std::execution::task goes through sender algorithms. Every path out loses symmetric transfer.
Müller's quote is devastating:
A for loop that calls co_await f(i) N times with inline_scheduler grows O(N) stack frames. With symmetric transfer, it's O(1). The proposed task-to-task fix only handles the case where both sides are tasks. A task awaiting any other sender still grows the stack.
The paper is too dismissive of trampoline schedulers. Yes, they add runtime overhead. Yes, P0913 was designed to eliminate that overhead. But trampolines work in practice. Java's virtual threads use a similar approach. Go's goroutines use a stack-switching model. Runtime mitigation isn't elegant but it ships.
"Don't pay for what you don't use." The language already has a zero-cost mechanism for this specific problem. Using a runtime mitigation instead is paying for something you already paid for at language design time. The paper says exactly this: "the runtime overhead in the completion path that P0913R1 was specifically adopted to eliminate."
Section 1 disclosure is appreciated. The authors developed P4003R0 and P4007R0. They have a position. They state it upfront. The limitation they document exists independently of their position - Kühl and Müller documented the same issue from different angles.
Müller identifies guaranteed tail calls as the language-level fix. If set_value could be a tail call, the stack wouldn't grow. But C++ doesn't have guaranteed tail calls. And adding them is a much bigger lift than fixing the sender model.
Guaranteed tail calls in C++ would be a massive language feature. Destructors, exception handling, stack unwinding - all of these interact with tail call optimization. It's not impossible but it's a multi-year EWG effort. The paper is right to mention it and right not to propose it.
Section 8.4 shows a coroutine-native launcher (Boost.Capy) that avoids the sender pipeline entirely. The task chain uses symmetric transfer throughout. The gap doesn't arise because the composition layer is the awaitable protocol, not the sender protocol. That's the existence proof that the problem is specific to the sender launch path.
Section 8.2: spawn(task, token) doesn't compile because the task is scheduler-affine and counting_scope doesn't provide a scheduler. The working pattern wraps in on(sch, task). But on is a sender algorithm - Section 5 proved those can't support symmetric transfer. So the only launch mechanism that works also loses symmetric transfer. Catch-22.
Between this paper, P4007R0, and P4014R0 there are now three papers making the same argument from different angles: symmetric transfer (this paper), interface costs (P4007), and the sub-language analysis (P4014). The committee can't say they weren't warned.
[removed by moderator]
Something about symmetric transfer being "a solution looking for a problem." Rule 2.
Gor Nishanov's original motivation for P0913:
P0913 was adopted to solve exactly the problem this paper documents. Six years later, the standard task type doesn't use the mechanism. That should concern people.
This is why we can't have nice things.
Tokio doesn't have this problem because Rust's async model doesn't try to unify GPU dispatch and TCP sockets under the same abstraction. Different tools for different jobs. C++ keeps trying to build one tool that does everything and ending up with a tool that does everything poorly.
That's literally what this paper is arguing for. Different models for different domains. You're agreeing with the paper while thinking you're dunking on C++.
The O(N) stack growth for N synchronous completions is not just a performance concern. In embedded, stack is a fixed resource. An iterative loop that grows the stack unboundedly is a hard failure, not a soft one. If std::execution::task can't prevent that for the general case, it's unusable for us.
I read the whole paper, understood every word, and now I'm depressed. Symmetric transfer is beautiful. The fact that it can't work through sender algorithms is genuinely sad.
committee gonna committee