Author: Dietmar Kühl
Document: P3941R2
Date: 2026-03-14
Target: LEWG
Link: wg21.link/p3941r2
Dietmar Kühl's latest tackles the messy state of affine_on in std::execution::task - the algorithm responsible for making sure your coroutine resumes on the same scheduler it was running on before a co_await. R2 addresses five NB comments at once, all stemming from concerns raised in P3796R1.
The headline changes: affine_on drops its scheduler parameter and pulls it from the receiver's environment instead. Schedulers used with affine_on must now be infallible - no set_error or set_stopped completions allowed when you're trying to resume. inline_scheduler, task_scheduler, and run_loop::scheduler can meet this bar. parallel_scheduler probably can't.
The paper also removes change_coroutine_scheduler entirely, arguing that locals destroyed on the wrong scheduler and invisible overhead for all tasks make it a net negative. The replacement is nesting a task with starts_on - more verbose but structurally cleaner. There's also an optional new query get_start_scheduler to separate "where should I schedule new work" from "where was I started."
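The split between the two questions can be sketched in plain C++17. This is a toy model, not std::execution: toy_scheduler, the member functions scheduler() and start_scheduler(), and both environment types are made up for illustration, and the fallback to get_scheduler is assumed from the thread's description of the paper's first wording option.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

namespace toy {

// Toy stand-in for a scheduler handle; not a std::execution type.
struct toy_scheduler { std::string name; };

// Detects an optional member start_scheduler() on an environment (hypothetical hook).
template <class Env, class = void>
struct has_start_scheduler : std::false_type {};

template <class Env>
struct has_start_scheduler<
    Env, std::void_t<decltype(std::declval<const Env&>().start_scheduler())>>
    : std::true_type {};

// "Where should I schedule new work?"
template <class Env>
toy_scheduler get_scheduler(const Env& env) { return env.scheduler(); }

// "Where was I started?" Falls back to get_scheduler when the environment
// does not give a separate answer, so older environments keep working.
template <class Env>
toy_scheduler get_start_scheduler(const Env& env) {
    if constexpr (has_start_scheduler<Env>::value) {
        return env.start_scheduler();
    } else {
        return get_scheduler(env);
    }
}

// An environment that only knows one scheduler.
struct legacy_env {
    toy_scheduler scheduler() const { return {"pool"}; }
};

// An environment where the two answers differ.
struct split_env {
    toy_scheduler scheduler() const { return {"pool"}; }
    toy_scheduler start_scheduler() const { return {"ui"}; }
};

// Small self-check showing both paths through the fallback.
inline void self_check() {
    assert(get_start_scheduler(legacy_env{}).name == "pool");
    assert(get_start_scheduler(split_env{}).name == "ui");
}

}  // namespace toy
```

With legacy_env both queries give the same scheduler; with split_env, new work still goes to "pool" while resumption would target "ui".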
P3941R2 | Scheduler Affinity | Dietmar Kühl | LEWG
https://wg21.link/p3941r2
Reminder: be civil. The paper authors sometimes read these threads.
great, another execution paper. at this rate we'll have async hello world standardized by C++32
I've read this paper three times and I still can't explain what affine_on does to my coworkers without drawing a diagram on a whiteboard and then erasing the whole thing halfway through
It means your coroutine goes back to the scheduler it came from after a co_await. Like a boomerang. A boomerang made of templates and completion signatures.
Seriously though, it's NB comment resolution - this is on the fast track for C++26, not a new feature proposal.
That's Section 3.7. In its entirety. The full analysis. A 15-page paper rewriting the algorithm's parameters, constraints, and semantics, and the naming section is two sentences.
WG21 naming committee in action. "We have identified the problem. Meeting adjourned."
be grateful. it could have been basic_affine_on_view_adaptor_closure_t
The elephant in the room is parallel_scheduler.
The paper requires infallible schedulers for affine_on, then says:
And specifically about parallel_scheduler:
So we're standardizing task with scheduler affinity baked in, and one of the four standard schedulers can't participate. The paper acknowledges this but offers "the user can adapt the scheduler" without providing any such adapter in the standard library. The only adaptation strategies listed are (1) call std::terminate on failure, or (2) silently break affinity and hope nobody notices.
I understand why infallibility is the right constraint - if scheduling fails you genuinely cannot guarantee resumption on the right execution agent. But the practical consequence is that task + parallel_scheduler is a compile error. That seems like something LEWG should discuss explicitly rather than discover after the fact.
From where I sit this is exactly right. On embedded targets we need deterministic guarantees about where code runs. If scheduling can fail mid-co_await and you end up on the wrong execution agent, that's not an error you can recover from - it's a category violation. Your ISR handler is now running user-mode code.
The parallel_scheduler gap is real but I'd rather have a hard constraint I can reason about than a soft one that silently degrades. Make fallible schedulers opt-in with an explicit adapter and force the user to confront what "scheduling failed" means for their domain.
Fair point for embedded. My concern is more server-side: we're building executors for thread pools and work-stealing queues where scheduling failure is a real operational condition (pool at capacity, thread creation failure). Requiring the user to write an adapter before they can use task with the standard thread pool scheduler is a usability cliff that won't show up until someone tries it in production.
The paper's own Section 3.3.3 basically says "if this turns out to be too strong we can relax it later." That's usually committee-speak for "we'll ship it and see who complains."
Something I haven't seen anyone mention: Section 4 contains two complete alternative wordings. One adds get_start_scheduler as a new query that defaults to get_scheduler. The other overloads the existing get_scheduler with the "started on" semantics.
The paper identifies a real dual-use problem - get_scheduler currently means both "which scheduler should I use for new work" and "which scheduler was I started on." Those aren't the same thing. But instead of picking a direction, it ships both options to LEWG.
I'd bet on get_start_scheduler getting consensus. The fallback-to-get_scheduler default means existing code keeps working, and the separation of concerns is cleaner. But LEWG is going to need to poll this explicitly, and I don't see it flagged as an open question in the change history.
The dual-use ambiguity was already causing confusion in P3796R1. Having a single query that means different things depending on who's asking is the kind of thing that generates four more papers and a study group.
get_start_scheduler is the right call. The alternative is having every algorithm that cares about "where was I started" document which interpretation of get_scheduler it uses.
I agree the two problems with change_coroutine_scheduler are real. Locals on the wrong scheduler is a correctness issue, and the invisible storage cost for all tasks is unfortunate. But look at the proposed alternative:
You're asking users to create a lambda, invoke it immediately, wrap it in starts_on, and co_await the result. That's four concepts chained together to do what co_await change_coroutine_scheduler(s) did in one line.
The paper calls this "a bit verbose." That's underselling it. The mental model goes from "I'm now on scheduler S" to "I'm spawning a nested coroutine that runs on scheduler S and I get the result back." Those are different programming models with different failure surfaces.
The verbosity is the feature. change_coroutine_scheduler makes it look like changing a local variable when it's actually restructuring the execution graph. The nested task makes the scope explicit - you can see exactly where the different scheduler starts and ends.
Compare:
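A toy model of the two shapes, in plain C++17 with no std::execution involved. The names current, tracked and on_scheduler are invented for illustration: switch_in_place mimics switching schedulers inside one coroutine body, and nested_scope mimics awaiting a nested task started on another scheduler.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

namespace toy {

// Toy global state: the scheduler the "coroutine" is currently running on.
inline std::string current = "A";
inline std::vector<std::string> events;

// A local object that records where it was created and where it was destroyed.
struct tracked {
    explicit tracked(std::string n) : name(std::move(n)), created_on(current) {}
    ~tracked() {
        events.push_back(name + " created on " + created_on +
                         ", destroyed on " + current);
    }
    std::string name;
    std::string created_on;
};

// Scope guard: runs the enclosed region on another scheduler, then restores.
struct on_scheduler {
    explicit on_scheduler(std::string target) : saved(current) {
        current = std::move(target);
    }
    ~on_scheduler() { current = saved; }
    std::string saved;
};

// Form 1: switch in place. The rest of the body, including destruction of
// earlier locals, happens on the new scheduler.
inline void switch_in_place() {
    tracked buffer("buffer");  // created on A
    current = "B";             // stands in for an in-place scheduler change
}                              // buffer is destroyed while still on B

// Form 2: nested scope. Locals of the inner region live and die on B;
// locals of the outer body live and die on A.
inline void nested_scope() {
    tracked buffer("buffer");  // created on A
    {
        on_scheduler region("B");
        tracked scratch("scratch");  // created on B
    }  // scratch destroyed on B, then region restores A
}      // buffer destroyed on A

// Resets the toy state, runs one form, and returns what was recorded.
template <class F>
std::vector<std::string> run(F body) {
    current = "A";
    events.clear();
    body();
    return events;
}

// Small self-check of the first form.
inline void self_check() {
    assert(run(switch_in_place).front() == "buffer created on A, destroyed on B");
}

}  // namespace toy
```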
The second form makes the lifetime boundary visible. You can't accidentally destroy locals on the wrong scheduler because they live in a different coroutine frame.
I take your point on scoping. The correctness argument is sound. My worry is adoption cost. People coming from Asio or libunifex have a mental model where changing execution context is a lightweight operation. We're replacing it with "wrap your logic in a nested coroutine" which, even if structurally better, is going to be a migration stumbling block.
And the lambda-immediately-invoked pattern is going to confuse anyone who isn't steeped in this idiom.
Yeah, the adoption concern is real. I think the right move is: remove change_coroutine_scheduler (the paper is correct on the problems), but make sure the ecosystem has a convenience wrapper - something like run_on(scheduler, callable) - that hides the lambda-IILE-co_await boilerplate. It doesn't need to be in this paper, but it should exist before people start writing production task code.
Edit: actually, on(sch, nested_task) already exists. The gap is more about discoverability than missing functionality.
In Tokio you just spawn a task and the runtime handles affinity. Fifteen pages of wording changes to achieve what a runtime scheduler does by default.
C++ doesn't have a runtime. That's literally the design constraint. You're comparing a language with a built-in task scheduler to one where the scheduler is a library type the user provides.
this is exactly why sender/receiver will be the next Concepts - technically correct and nobody outside the committee and two companies will use it for a decade
Worth noting context: Kühl is a long-time P2300 contributor at Bloomberg and the author of P3552 (std::execution::task). This isn't a drive-by - he's fixing problems in his own design based on feedback from NB review.
That said, the paper's recommendation to require infallible schedulers for affine_on naturally favors the execution model where schedulers are lightweight and deterministic - which is the Bloomberg use case. LEWG might want to hear from implementers whose thread pool schedulers can't easily guarantee infallibility before treating the constraint as settled.
Putting the infallibility constraint in concrete terms. The paper is proposing a static check at connect time:
This is actually elegant. You find out at compile time, not when your server is handling 10k connections at 3 AM, that your scheduler can't guarantee affinity. The trade-off is that parallel_scheduler is locked out, but stdexec's static_thread_pool already meets this bar in practice - it just needs the signatures to declare it.
TIL unstoppable_token exists. The sender/receiver API surface is truly something
Why not just use continues_on and skip all this complexity? It does the same thing.
It doesn't. continues_on takes the scheduler as a parameter and forwards stop tokens. affine_on gets the scheduler from the receiver environment, requires infallibility, and blocks stop token propagation to the scheduling operation. The constraints are what make affinity guaranteeable - continues_on can't promise it'll actually land on the target scheduler if the scheduling gets cancelled.
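The compile-time flavor of the infallibility requirement can be sketched in plain C++17. This is a toy model, not the paper's wording: completions, schedule_completions, infallible_scheduler, require_affinity_capable and the two scheduler types are hypothetical names, and a real check would inspect the completion signatures of the scheduler's schedule sender in the receiver's environment.

```cpp
#include <type_traits>

namespace toy {

// Tags standing in for the three completion channels.
struct set_value_t {};
struct set_error_t {};
struct set_stopped_t {};

// A toy scheduler advertises the channels its schedule operation may use.
template <class... Tags>
struct completions {};

// True only if the list is non-empty and every entry is set_value_t.
template <class List>
struct only_set_value : std::false_type {};

template <class... Tags>
struct only_set_value<completions<Tags...>>
    : std::bool_constant<(sizeof...(Tags) > 0) &&
                         (std::is_same_v<Tags, set_value_t> && ...)> {};

template <class Sched>
inline constexpr bool infallible_scheduler =
    only_set_value<typename Sched::schedule_completions>::value;

// Scheduling can only succeed, in the spirit of an inline scheduler.
struct inline_like {
    using schedule_completions = completions<set_value_t>;
};

// Scheduling may fail or be stopped, in the spirit of a thread pool.
struct pool_like {
    using schedule_completions =
        completions<set_value_t, set_error_t, set_stopped_t>;
};

// Stand-in for the constraint: fallible schedulers are rejected when the
// code is compiled, not when a resumption fails at run time.
template <class Sched>
constexpr bool require_affinity_capable(Sched) {
    static_assert(infallible_scheduler<Sched>,
                  "scheduler may complete with set_error or set_stopped");
    return true;
}

static_assert(require_affinity_capable(inline_like{}));
// require_affinity_capable(pool_like{});  // would fail to compile

}  // namespace toy
```

An adapter for a fallible scheduler would, in this model, be a wrapper type whose schedule_completions lists only set_value_t, which is where the choice between terminating and silently breaking affinity has to be made.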