P3941R2 - Scheduler Affinity WG21
Posted by u/execution_enjoyer_23 · 8 hr. ago

Author: Dietmar Kühl
Document: P3941R2
Date: 2026-03-14
Target: LEWG
Link: wg21.link/p3941r2

Dietmar Kühl's latest tackles the messy state of affine_on in std::execution::task - the algorithm responsible for making sure your coroutine resumes on the same scheduler it was running on before a co_await. R2 addresses five NB comments at once, all stemming from concerns raised in P3796R1.

The headline changes: affine_on drops its scheduler parameter and pulls it from the receiver's environment instead. Schedulers used with affine_on must now be infallible - no set_error or set_stopped completions allowed when you're trying to resume. inline_scheduler, task_scheduler, and run_loop::scheduler can meet this bar. parallel_scheduler probably can't.

The paper also removes change_coroutine_scheduler entirely, arguing that locals destroyed on the wrong scheduler and invisible overhead for all tasks make it a net negative. The replacement is nesting a task with starts_on - more verbose but structurally cleaner. There's also an optional new query get_start_scheduler to separate "where should I schedule new work" from "where was I started."

▲ 89 points (84% upvoted) · 24 comments
sorted by: best
u/AutoModerator 1 point 8 hr. ago Pinned

P3941R2 | Scheduler Affinity | Dietmar Kühl | LEWG
https://wg21.link/p3941r2

Reminder: be civil. The paper authors sometimes read these threads.

u/turbo_llama_9000 156 points 8 hr. ago

great, another execution paper. at this rate we'll have async hello world standardized by C++32

u/daily_segfault 89 points 7 hr. ago

I've read this paper three times and I still can't explain what affine_on does to my coworkers without drawing a diagram on a whiteboard and then erasing the whole thing halfway through

u/execution_enjoyer_23 14 points 7 hr. ago

It means your coroutine goes back to the scheduler it came from after a co_await. Like a boomerang. A boomerang made of templates and completion signatures.

Seriously though, it's NB comment resolution - this is on the fast track for C++26, not a new feature proposal.

u/coroutine_hater_42 134 points 7 hr. ago 🏆

The name affine_on isn't great. It may be worth giving the algorithm a better name.

That's Section 3.7. In its entirety. The full analysis. A 15-page paper rewriting the algorithm's parameters, constraints, and semantics, and the naming section is two sentences.

u/template_wizard_2019 71 points 6 hr. ago

WG21 naming committee in action. "We have identified the problem. Meeting adjourned."

u/compiles_first_try 45 points 5 hr. ago

be grateful. it could have been basic_affine_on_view_adaptor_closure_t

u/scheduler_herder 52 points 6 hr. ago 🏆

The elephant in the room is parallel_scheduler.

The paper requires infallible schedulers for affine_on, then says:

It seems unlikely that all schedulers can be constrained to be infallible.

And specifically about parallel_scheduler:

It seems unlikely that this interface can be constrained to make it infallible.

So we're standardizing task with scheduler affinity baked in, and one of the four standard schedulers can't participate. The paper acknowledges this but offers "the user can adapt the scheduler" without providing any such adapter in the standard library. The only adaptation strategies listed are (1) call std::terminate on failure, or (2) silently break affinity and hope nobody notices.

I understand why infallibility is the right constraint - if scheduling fails you genuinely cannot guarantee resumption on the right execution agent. But the practical consequence is that task + parallel_scheduler is a compile error. That seems like something LEWG should discuss explicitly rather than discover after the fact.

u/embedded_for_20_years 33 points 5 hr. ago

From where I sit this is exactly right. On embedded targets we need deterministic guarantees about where code runs. If scheduling can fail mid-co_await and you end up on the wrong execution agent, that's not an error you can recover from - it's a category violation. Your ISR handler is now running user-mode code.

The parallel_scheduler gap is real but I'd rather have a hard constraint I can reason about than a soft one that silently degrades. Make fallible schedulers opt-in with an explicit adapter and force the user to confront what "scheduling failed" means for their domain.

u/scheduler_herder 19 points 4 hr. ago

Fair point for embedded. My concern is more server-side: we're building executors for thread pools and work-stealing queues where scheduling failure is a real operational condition (pool at capacity, thread creation failure). Requiring the user to write an adapter before they can use task with the standard thread pool scheduler is a usability cliff that won't show up until someone tries it in production.

The paper's own Section 3.3.3 basically says "if this turns out to be too strong we can relax it later." That's usually committee-speak for "we'll ship it and see who complains."

u/actually_reads_papers 38 points 5 hr. ago

Something I haven't seen anyone mention: Section 4 contains two complete alternative wordings. One adds get_start_scheduler as a new query that defaults to get_scheduler. The other overloads the existing get_scheduler with the "started on" semantics.

The paper identifies a real dual-use problem - get_scheduler currently means both "which scheduler should I use for new work" and "which scheduler was I started on." Those aren't the same thing. But instead of picking a direction, it ships both options to LEWG.

I'd bet on get_start_scheduler getting consensus. The fallback-to-get_scheduler default means existing code keeps working, and the separation of concerns is cleaner. But LEWG is going to need to poll this explicitly, and I don't see it flagged as an open question in the change history.

u/former_sg1_lurker 24 points 4 hr. ago

The dual-use ambiguity was already causing confusion in P3796R1. Having a single query that means different things depending on who's asking is the kind of thing that generates four more papers and a study group.

get_start_scheduler is the right call. The alternative is having every algorithm that cares about "where was I started" document which interpretation of get_scheduler it uses.

u/async_skeptic 31 points 5 hr. ago

I agree the two problems with change_coroutine_scheduler are real. Locals on the wrong scheduler is a correctness issue, and the invisible storage cost for all tasks is unfortunate. But look at the proposed alternative:

co_await ex::starts_on(s, [](parameters) -> task<T, E> { logic }(arguments));


You're asking users to create a lambda, invoke it immediately, wrap it in starts_on, and co_await the result. That's four concepts chained together to do what co_await change_coroutine_scheduler(s) did in one line.

The paper calls this "a bit verbose." That's underselling it. The mental model goes from "I'm now on scheduler S" to "I'm spawning a nested coroutine that runs on scheduler S and I get the result back." Those are different programming models with different failure surfaces.

u/scheduler_herder 21 points 4 hr. ago

The verbosity is the feature. change_coroutine_scheduler makes it look like changing a local variable when it's actually restructuring the execution graph. The nested task makes the scope explicit - you can see exactly where the different scheduler starts and ends.

Compare:

// Before: scheduler change leaks
co_await change_coroutine_scheduler(pool);
auto result = co_await heavy_compute();
// locals declared above now get destroyed on pool, not their original scheduler

// After: scheduler change is scoped
auto result = co_await starts_on(pool,
    []() -> task<Result> {
        co_return co_await heavy_compute();
    }());

The second form makes the lifetime boundary visible. You can't accidentally destroy locals on the wrong scheduler because they live in a different coroutine frame.

u/async_skeptic 16 points 3 hr. ago

I take your point on scoping. The correctness argument is sound. My worry is adoption cost. People coming from Asio or libunifex have a mental model where changing execution context is a lightweight operation. We're replacing it with "wrap your logic in a nested coroutine" which, even if structurally better, is going to be a migration stumbling block.

And the lambda-immediately-invoked pattern is going to confuse anyone who isn't steeped in this idiom.

u/scheduler_herder 12 points 2 hr. ago

Yeah, the adoption concern is real. I think the right move is: remove change_coroutine_scheduler (the paper is correct on the problems), but make sure the ecosystem has a convenience wrapper - something like run_on(scheduler, callable) - that hides the lambda-IILE-co_await boilerplate. It doesn't need to be in this paper, but it should exist before people start writing production task code.

Edit: actually, on(sch, nested_task) already exists. The gap is more about discoverability than missing functionality.

u/just_use_rust_lol 5 points 6 hr. ago

In Tokio you just spawn a task and the runtime handles affinity. Fifteen pages of wording changes to achieve what a runtime scheduler does by default.

u/not_a_real_cxx_dev -3 points 5 hr. ago

C++ doesn't have a runtime. That's literally the design constraint. You're comparing a language with a built-in task scheduler to one where the scheduler is a library type the user provides.

u/build_system_victim 9 points 4 hr. ago

this is exactly why sender/receiver will be the next Concepts - technically correct and nobody outside the committee and two companies will use it for a decade

u/former_sg1_lurker 17 points 4 hr. ago

Worth noting context: Kühl is a long-time P2300 contributor at Bloomberg and the author of P3552 (std::execution::task). This isn't a drive-by - he's fixing problems in his own design based on feedback from NB review.

That said, the paper's recommendation to require infallible schedulers for affine_on naturally favors the execution model where schedulers are lightweight and deterministic - which is the Bloomberg use case. LEWG might want to hear from implementers whose thread pool schedulers can't easily guarantee infallibility before treating the constraint as settled.

u/embedded_for_20_years 28 points 3 hr. ago

Putting the infallibility constraint in concrete terms. The paper is proposing a static check at connect time:

// The completion signatures of schedule(sch)
// with an unstoppable_token environment
// must be exactly:
//   completion_signatures<set_value_t()>
//
// If your scheduler can complete with
// set_error_t(E) or set_stopped_t(),
// affine_on(sndr) won't compile.

This is actually elegant. You find out at compile time, not when your server is handling 10k connections at 3 AM, that your scheduler can't guarantee affinity. The trade-off is that parallel_scheduler is locked out, but stdexec's static_thread_pool already meets this bar in practice - it just needs the signatures to declare it.

u/compiles_first_try 5 points 2 hr. ago

TIL unstoppable_token exists. The sender/receiver API surface is truly something