r/wg21
P2929R2 - Proposal to add simd_invoke to std::simd WG21
Posted by u/simd_paper_tracker · 7 hr. ago

Document: P2929R2
Authors: Daniel Towner, Ruslan Arutyunyan (Intel)
Date: 2026-01-26
Audience: LEWG

Intel folks proposing simd::chunked_invoke - a utility that breaks large basic_vec values into register-sized chunks, applies a callable (typically wrapping a platform intrinsic) to each chunk, and reassembles the results via simd::cat. The paper title still says simd_invoke but the function was renamed to chunked_invoke in this revision to avoid confusion with std::invoke.

The motivation is practical: if your basic_vec is wider than a native register, you currently have to manually chunk, call intrinsics on each piece, and cat the results back together. This paper wraps that boilerplate into a single function. It also optionally passes a chunk index to the callable if the callable accepts one, using callable probing rather than a separate _indexed variant.
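
For anyone who hasn't written that boilerplate by hand, here is a minimal sketch of the chunk / invoke / cat pattern. It is not the paper's wording or implementation: `vec` is a plain `std::array` stand-in for basic_vec, and `take`, `chunked_invoke_into` and `chunked_invoke_sketch` are hypothetical names made up for illustration.

```cpp
#include <array>
#include <cstddef>

// Stand-in for a basic_vec: a plain std::array, NOT the real std::simd type.
template <std::size_t N>
using vec = std::array<float, N>;

// "chunk" step: copy the K lanes of v that start at lane Off.
template <std::size_t Off, std::size_t K, std::size_t N>
vec<K> take(const vec<N>& v) {
    vec<K> out{};
    for (std::size_t i = 0; i < K; ++i) out[i] = v[Off + i];
    return out;
}

// Walk v in pieces of Native lanes (the last piece may be shorter), call fn
// on each piece, and write the result back at the same offset ("cat" step).
template <std::size_t Native, std::size_t Off = 0, std::size_t N, class Fn>
void chunked_invoke_into(const vec<N>& v, vec<N>& out, Fn&& fn) {
    if constexpr (Off < N) {
        // Full chunk if enough lanes remain, otherwise the tail.
        constexpr std::size_t K = (N - Off < Native) ? (N - Off) : Native;
        vec<K> r = fn(take<Off, K>(v));
        for (std::size_t i = 0; i < K; ++i) out[Off + i] = r[i];
        chunked_invoke_into<Native, Off + K>(v, out, fn);
    }
}

// Hypothetical convenience wrapper: roughly the shape of call the paper
// wants to standardize, reduced to arrays.
template <std::size_t Native, std::size_t N, class Fn>
vec<N> chunked_invoke_sketch(const vec<N>& v, Fn fn) {
    vec<N> out{};
    chunked_invoke_into<Native>(v, out, fn);
    return out;
}
```

With a 19-lane input and `Native = 8`, the callable is instantiated for 8-lane and 3-lane pieces, so something like `chunked_invoke_sketch<8>(v, [](auto c) { for (auto& x : c) x *= 2.0f; return c; })` needs a generic lambda to compile.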

▲ 23 points (78% upvoted) · 8 comments
u/constexpr_all_the_things 15 points 6 hr. ago

std::simd: the library that keeps growing before it ships. At this rate the API surface will be larger than the register file.

u/actual_simd_user 8 points 5 hr. ago

The core idea is a reasonable convenience. But the callable probing design gives me pause. From section 4.1.1:

a precedent has been set in the simd::permute function to allow Callables to be probed for their capabilities rather than naming the function to call it out

"Probing capabilities" means SFINAE on whether the callable accepts an extra parameter. The note in the wording even warns about ambiguous overloads. I'd rather have an explicit chunked_invoke_indexed than silent overload resolution deciding whether my lambda gets an index or not. The permute precedent isn't great justification for propagating the same pattern.
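
To make the concern concrete, here is a rough sketch of what probing could look like. This is my own guess at the mechanism, not the paper's wording: `call_probed` and `index_tag` are hypothetical names, and passing the index as a `std::integral_constant` is an assumption.

```cpp
#include <cstddef>
#include <type_traits>

// Assumption: the chunk index is passed as a compile-time constant.
template <std::size_t I>
using index_tag = std::integral_constant<std::size_t, I>;

// Hypothetical helper: prefer the (chunk, index) form when the callable
// accepts it, otherwise fall back to the (chunk) form.
template <std::size_t I, class Fn, class Chunk>
auto call_probed(Fn&& fn, const Chunk& chunk) {
    if constexpr (std::is_invocable_v<Fn&, const Chunk&, index_tag<I>>) {
        return fn(chunk, index_tag<I>{});
    } else {
        return fn(chunk);
    }
}

// Three hypothetical callables (int stands in for a chunk type here).
inline constexpr auto plain = [](int c) { return c; };
inline constexpr auto indexed = [](int c, auto idx) {
    return c + static_cast<int>(decltype(idx)::value);
};
// Accepts both forms, so the probe decides which one is used.
inline constexpr auto variadic = [](int c, auto... extra) {
    return c + 100 * static_cast<int>(sizeof...(extra));
};
```

`call_probed<3>(plain, 10)` takes the one-argument branch and `call_probed<3>(indexed, 10)` takes the indexed one, as you'd expect. `variadic` is valid for both, and in this sketch the probe silently hands it the index. That is the kind of case an explicit `_indexed` name would make visible at the call site.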

That said, the chunk-invoke-cat boilerplate is genuinely annoying. I've written it enough times to appreciate the motivation.

u/highway_fan_2023 11 points 5 hr. ago

Genuinely curious - what does this buy you that Highway or xsimd don't already handle? Both of those have had "apply op to native-sized pieces" as a core abstraction for years.

u/constexpr_all_the_things 4 points 4 hr. ago

Standardization, presumably. Though at the rate std::simd is moving, Highway will have rewritten their entire API twice before this ships.

u/register_pressure_victim 6 points 4 hr. ago

The tail chunk handling is the interesting part. The paper's 19-element example on AVX (native size 8) gives chunks of 8, 8, and 3. Your callable has to handle that trailing 3-element chunk correctly.

The mandate says:

The callable fn shall be invocable for every combination of chunk argument types that may be produced, including tail chunks of sizes less than N.

So your lambda needs to be a generic lambda or have overloads for every possible tail size. In practice that means an if constexpr chain inside the lambda to dispatch the right intrinsic per chunk size - which is pretty much the boilerplate this paper was supposed to eliminate.

The non-tail case is where the convenience wins. The tail case still requires the same manual dispatch.
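
Roughly what that dispatch ends up looking like, as a sketch rather than anything from the paper: `vec` is a `std::array` stand-in for basic_vec, `fake_abs8` / `fake_abs4` are made-up stand-ins for fixed-width intrinsics, and `resize` is a hypothetical pad/truncate helper.

```cpp
#include <array>
#include <cstddef>
#include <tuple>

// Stand-in for a basic_vec: a plain std::array, NOT the real std::simd type.
template <std::size_t N>
using vec = std::array<float, N>;

// Stand-ins for platform intrinsics: each accepts exactly one width.
inline vec<8> fake_abs8(vec<8> v) {
    for (float& x : v) x = (x < 0.0f) ? -x : x;
    return v;
}
inline vec<4> fake_abs4(vec<4> v) {
    for (float& x : v) x = (x < 0.0f) ? -x : x;
    return v;
}

// Hypothetical helper: zero-pad or truncate to To lanes.
template <std::size_t To, std::size_t From>
vec<To> resize(const vec<From>& v) {
    vec<To> out{};
    for (std::size_t i = 0; i < To && i < From; ++i) out[i] = v[i];
    return out;
}

// Generic lambda that must be valid for full chunks and every tail size.
inline constexpr auto abs_chunk = [](auto chunk) {
    constexpr std::size_t n = std::tuple_size_v<decltype(chunk)>;
    static_assert(n >= 1 && n <= 8, "sketch only handles up to 8 lanes");
    if constexpr (n == 8) {
        return fake_abs8(chunk);                              // full chunk
    } else if constexpr (n == 4) {
        return fake_abs4(chunk);                              // exact half
    } else if constexpr (n < 4) {
        return resize<n>(fake_abs4(resize<4>(chunk)));        // tail of 1..3
    } else {
        return resize<n>(fake_abs8(resize<8>(chunk)));        // tail of 5..7
    }
};
```

For the 8, 8, 3 split, the first branch covers the two full chunks and the third branch covers the tail. Whether padding the tail up to a register width is acceptable depends on the operation, which is why the callable author still has to think about it.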

u/naming_things_is_hard_cpp 3 points 3 hr. ago

From the revision history:

Changed the name of the function to avoid confusion with std::invoke.

Ironic that the paper title still says simd_invoke. chunked_invoke is better though - it signals that the chunking is the point, not just "invoke but for SIMD." Aligns with chunk and cat naming in the rest of the library.

u/lurking_undergrad_cpp 2 points 2 hr. ago

Wait, std::simd made it into C++26? I thought it was still in the Parallelism TS.

u/actual_simd_user 5 points 1 hr. ago

P1928 merged it into the C++26 working draft. Long road from the TS, but it's in.