P4023R0 - Strategic Direction for AI in C++: Governance and Ecosystem
Posted by u/standards_watcher_24 · 10 hr. ago

Authors: Jeff Garland, Paul E. McKenney, Roger Orr, Bjarne Stroustrup, David Vandevoorde, Michael Wong (Directions Group)
Document: P4023R0
Date: 2026-02-23
Target: WG21 (Plenary)
Link: wg21.link/p4023r0

The Directions Group has dropped a paper on how C++ should deal with AI. Six authors including Bjarne, and this one touches two completely different nerve clusters at once.

Thrust I: Governance. Aligning WG21 with ISO/IEC JTC1's existing guidance on AI use. The human author is the "intelligence of record." AI can assist with research, summarization, and consistency checks, but generating normative wording or core design proposals without rigorous human verification is out. Bots are forbidden from attending ISO meetings. And the paper directly acknowledges the "AI slop" problem - voluminous but low-quality submissions wasting committee time.

Thrust II: The ImageNet Challenge. The DG is calling on the ecosystem - Boost, Beman Project, academics, open source foundations - to build a curated, human-validated dataset of modern idiomatic C++ (C++20/23/26). Tagged by domain (embedded, finance, AI), favoring spans over pointers, sender/receiver over callbacks, algorithms over raw loops. They also want tooling that surfaces intent at the call site for AI agents, including potentially connecting compilers to MCP.

This is a directional paper updating P2000 - no proposed wording, no straw polls. It is the DG saying "here is what we think the strategy should be." Whether the ecosystem actually builds this dataset is a different question entirely.

▲ 1,247 points (89% upvoted) · 92 comments
sorted by: best
u/AutoModerator 1 point 10 hr. ago pinned comment

Reminder: paper authors sometimes read these threads. Critique the paper, not the person. Rule 2 is enforced.

P4023R0 - Audience: WG21 (Plenary) - Authors: Jeff Garland, Paul E. McKenney, Roger Orr, Bjarne Stroustrup, David Vandevoorde, Michael Wong - PDF

u/pragma_once_and_done 89 points 9 hr. ago

tl;dr: the directions group said AI bad but also AI good but also please make an ImageNet for C++ but also we can't do it ourselves but someone should. got it.

u/constexpr_doomer 34 points 8 hr. ago

that's... actually not far off

u/the_real_training_data 178 points 8 hr. ago 🏆

I work in ML. The ImageNet analogy is doing a lot of heavy lifting here and it does not hold up.

ImageNet worked because image classification has a ground truth. A picture of a cat is a cat. You can label 14 million images and have humans agree on what they depict. The labels are objective.

Code quality is not objective. "Modern, idiomatic C++" is a moving target that the committee itself disagrees about. Is std::optional<T&> idiomatic? Depends on which mailing you read. Are structured bindings in a range-for idiomatic? Depends on who you ask. Is co_await on a sender idiomatic? P2300 hasn't shipped outside NVIDIA's stack.

ImageNet didn't need to resolve philosophical debates about what a cat is. A curated C++ dataset would need to resolve philosophical debates about what good code is - and those debates are literally what the committee does for a living, slowly, and without consensus half the time.

The paper proposes domain tags (ai/, embedded/, finance/) as if that's sufficient quality control. It is not. You need pedagogical scaffolding - progressive complexity, explicit rationale for each design choice, anti-pattern comparisons. A flat corpus of "good" examples is how we got the current mess of stackoverflow-trained models in the first place.

The goal is right. The analogy undersells the difficulty by a factor of ten.

u/ranges_all_day 67 points 7 hr. ago

this is the comment I came here for. saving this.

u/library_design_matters 45 points 7 hr. ago

You're right that ImageNet is a bad analogy, but the underlying point stands: AI models generate terrible C++ because they trained on terrible C++. The question isn't whether we need curated data - we do. The question is who's going to do the curation and what "correct" means for a language with this many dialects.

The Core Guidelines were supposed to be this. They have 400+ rules and still don't cover half the design space the paper mentions.

u/the_real_training_data 28 points 6 hr. ago

Fair. I should have said the analogy is misleading, not that the goal is wrong. The Core Guidelines point is good - they're a natural starting point and they already exist. The paper doesn't even mention them, which is strange.

u/just_ship_networking 423 points 9 hr. ago 🏆🏆

can we please just get networking in the standard before we start governing AI. priorities, people.

u/template_goes_brr 156 points 8 hr. ago

this comment is posted under every single paper regardless of topic and it always gets 400 upvotes

u/segfault_whisperer_42 87 points 8 hr. ago

Sir, this is a Wendy's.

u/actually_reads_papers 89 points 8 hr. ago

Read the whole thing. Thrust I is 90% restating what ISO/IEC JTC1 SC22 N5991 already says. The "author is the intelligence of record" principle, the prohibition on AI-generated normative text, the copyright concerns - that document covers all of it.

What P4023 adds is (a) making it explicit for WG21 context, (b) the "no bots in meetings" rule, and (c) the direct acknowledgment that AI slop is already happening. Point (c) is the interesting one. The DG is putting it on paper that they've seen low-quality AI-generated submissions. That's a political statement as much as a policy one.

Thrust II is where the new content is, and also where the paper is weakest. More below.

u/compiles_first_try_jk 52 points 7 hr. ago

so the Directions Group wrote a paper to say "what ISO already said, but with C++ branding"?

u/actually_reads_papers 38 points 6 hr. ago

For Thrust I, basically yes. The value-add is making it WG21-specific and putting the "slop" problem on the record. For Thrust II - the ImageNet challenge - that part is genuinely new. The problem is it has no execution plan. No funding, no hosting org, no timeline, no success criteria. "The ecosystem should" is not a plan.

u/library_design_matters 134 points 7 hr. ago

From the paper:

Library designers are encouraged to create APIs that are human consumable because it also helps agents to consume. Make intent evident, reducing the context required for an AI agent to use them correctly.

This is exactly backwards. APIs that are easy for humans and APIs that are easy for machines are not the same thing. Human-friendly APIs use overloading, implicit conversions, ADL, and contextual defaults. Machine-friendly APIs use explicit types, no overloading, flat namespaces, and named parameters.

Consider:

// human-friendly (overloaded, contextual)
auto result = connect(host, port);
auto result = connect(endpoint);
auto result = connect(uri, opts);

// machine-friendly (explicit, discoverable)
auto result = tcp_connect_to_host_port(host, port);
auto result = tcp_connect_to_endpoint(endpoint);
auto result = tcp_connect_to_uri(uri, opts);

An LLM doesn't know which connect overload to call without reading the header. A human knows from context. These are different design pressures. The paper glosses over this entirely.

u/compiler_frontend_dev 67 points 7 hr. ago

I actually agree with the paper here more than with you. The direction isn't "make C++ APIs look like Java." It's "surface intent at the call site." Inlay hints showing parameter names, concepts constraining template parameters, [[nodiscard]] on return types - these help both humans and machines.

The paper isn't asking you to stop overloading. It's asking you to make the intent of each overload discoverable without reading the implementation.

u/library_design_matters 48 points 6 hr. ago

You're thinking about parameter names and inlay hints. I'm talking about overload sets and ADL. An LLM calling swap(a, b) has no way to know if it's getting std::swap, a hidden friend found via ADL, or a namespace-scope overload - and the behavior differences matter. Inlay hints don't help there.

The paper's advice reduces to "write clearer code" which - yes, obviously. But it claims the same clarity that helps humans also helps agents, and that's where the reasoning breaks down.

u/compiler_frontend_dev 39 points 5 hr. ago

Fair point about ADL. That's a machine-hostile design pattern regardless of how you document it. I'll concede that the paper's framing of "human-friendly = agent-friendly" oversimplifies things. The reality is more like: some human-friendly patterns also help agents (concepts, nodiscard, strong types), and some don't (ADL, implicit conversions, overload sets with subtle SFINAE).

u/segfault_whisperer_42 112 points 5 hr. ago

did two people just have a civil technical disagreement on reddit and reach partial consensus? is this real life?

u/constexpr_doomer 267 points 8 hr. ago

so to summarize: C++ is so complex that even AI can't write it correctly, and the solution is... more papers

u/UB_is_a_feature_69 43 points 7 hr. ago

based

u/cpp_refugee_2024 89 points 7 hr. ago

meanwhile in Rust, the compiler just tells you what's wrong. no curated training dataset needed. the type system is the training data.

u/legacy_codebase_survivor 156 points 7 hr. ago

cool story, now compile your Rust project with the 50 million lines of existing C++ it needs to interface with

u/ranges_all_day 34 points 6 hr. ago

this is why we can't have nice things

u/definitely_not_copilot 23 points 6 hr. ago

Rust borrow checker violations produce better training signal than C++ UB. That's just a fact. A compiler error is a labeled negative example. UB at runtime is an unlabeled catastrophe that might pass all your tests.

[deleted] 7 hr. ago

[removed by moderator]

u/template_goes_brr 12 points 6 hr. ago

what did they say?

u/pragma_once_and_done 45 points 6 hr. ago

something about all DG papers being written by AI already, you know the usual

u/embedded_for_20_years 92 points 7 hr. ago

The paper lists domain tags: ai/, embedded/, finance/. As someone who has written embedded C++ for two decades: 99% of C++ training data online is web examples with std::cout and heap allocation. My production code has zero dynamic allocations, no exceptions, no RTTI, and compiles for a Cortex-M4 with 256K flash.

A "curated dataset" that includes an embedded/ tag is meaningless unless someone decides which embedded style. MISRA C++? AUTOSAR? Bare-metal RTOS? Safety-critical avionics? These are different worlds with different rules. The paper treats "embedded" as one thing. It is not.

I want this dataset to exist. I do not believe a volunteer effort can produce it. This needs institutional backing and money.

u/UB_is_a_feature_69 8 points 6 hr. ago

skill issue

u/grad_student_teaching_cpp 31 points 6 hr. ago

this is exactly the problem they're trying to solve though. the fact that AI generates std::cout hello-world when you ask for embedded C++ is the failure mode they're describing.

u/embedded_for_20_years 47 points 5 hr. ago

Right, but "describing the problem" and "having a plan to solve it" are different documents. The paper says "WG21 cannot solve this alone" and then stops. Who funds it? Who curates it? Who decides that my no-allocation Cortex-M4 code is more "correct" than someone's Arduino sketch? The paper outsources the hard part to "the ecosystem" without defining what that means.

u/safety_profile_watcher 76 points 6 hr. ago

The dependency nobody is talking about: you cannot curate "safe modern C++" training data until you define what safe C++ is.

The paper says "favoring spans over pointers" and "null ptr checks." But the safety profiles work (P3081) is still in flight. The boundary between "safe" and "unsafe" C++ is actively being debated in SG23. Training an AI on today's definition of "safe" means retraining when the profiles ship and the definition changes.

Three-point version of the problem:

  1. P4023 says "train AI on safe patterns"
  2. P3081/P2759/P3651 are still defining what "safe" means
  3. The dataset can't be curated before the definition stabilizes

The DG is proposing a dataset that depends on committee output that doesn't exist yet. That's a sequencing error.

u/compiles_first_try_jk -23 points 6 hr. ago

profiles are vapor until they ship. this whole paper is building on quicksand.

u/safety_profile_watcher 19 points 5 hr. ago

They have a reference implementation and active SG23 work. "Vapor" is unfair. "Not finished" is accurate. My point isn't that profiles will fail - it's that the training data challenge depends on their completion.

u/actually_reads_papers 34 points 5 hr. ago

P3081, P2759, P3651 all feed into this. The dependency graph is real. You can't have a curated "safe C++" corpus without a stable definition of safety, and we don't have one yet.

u/latency_is_not_optional 58 points 6 hr. ago

I work in HFT. We have been generating C++ with internal tooling for two years. Our training data is proprietary. Our patterns are proprietary. The code that matters - the code where nanoseconds count - will never appear in a public dataset because it's worth money.

A public curated corpus will produce AI that writes competent library code and terrible performance-critical code. Which is fine for 90% of use cases and useless for the 10% that pays my salary.

The paper doesn't grapple with this. The best C++ is behind NDAs.

u/UB_is_a_feature_69 24 points 5 hr. ago

tell me you've never shipped open source without telling me you've never shipped open source

u/async_skeptic 45 points 5 hr. ago

This is the silent majority problem. The people who write the most critical C++ can't contribute to the dataset. Abseil and Boost are the exception - high-quality public C++ that's actually used at scale. Everything else is either toy examples or locked behind corporate walls.

u/ai_wrote_this_comment 89 points 5 hr. ago

I have used Claude and GPT to help draft committee papers. Not generate - help draft. Research summaries, consistency checks, finding prior art, rewriting unclear paragraphs. The quality after human review is indistinguishable from fully human-written papers. I know because reviewers have told me my recent work is "noticeably clearer."

The governance section is solving a problem that doesn't exist yet. The real problem is not "AI is writing bad papers." The real problem is "bad papers exist and now people can produce them faster." That's a quality bar issue, not an AI issue.

u/legacy_codebase_survivor 56 points 5 hr. ago

that you know of

u/boost_contributor_since_04 72 points 5 hr. ago

I appreciate the transparency, and I don't doubt your workflow produces good results. But the governance question isn't about quality - it's about accountability.

If an AI hallucinates a technical claim that makes it into normative text - say, a mischaracterization of implementation-defined behavior that influences wording - who owns that error? The human author, yes. But the failure mode is different. A human making an error has understood the surrounding context and made a judgment call. An AI hallucinating has no understanding at all. The error surface is different even if the output looks identical.

u/ai_wrote_this_comment 34 points 4 hr. ago

The human author is the intelligence of record. Which is exactly what the paper says. The accountability sits with the author regardless of their tools. We don't audit whether someone used a spellchecker or a thesaurus. Why are we auditing whether they used an LLM for research?

u/boost_contributor_since_04 89 points 4 hr. ago

A spellchecker doesn't hallucinate new technical claims. That's not a fair comparison and you know it.

The real issue is the "slop" problem. We have already seen papers that are clearly 95% ChatGPT with five minutes of editing. I don't mean "the prose is suspiciously clean." I mean entire sections that read like prompted output - the hedging, the "it is worth noting that" constructions, the passive voice avalanche, the way every paragraph restates its own thesis. Three papers in the last mailing had this pattern.

The DG is putting governance in place because the quality bar is being gamed. That is not hypothetical.

u/ai_wrote_this_comment 23 points 3 hr. ago

Name them or it's FUD.

u/definitely_not_a_committee_member 167 points 3 hr. ago 🏆

I could name at least two from Hagenberg but I enjoy being invited to meetings

u/paper_trail_2019 1 point 3 hr. ago

Rule 2. Name-calling papers, not people. Last warning in this chain.

u/grad_student_teaching_cpp 78 points 3 hr. ago

the fact that you can't tell which papers are AI-generated is literally the entire argument for governance, not against it

u/segfault_whisperer_42 189 points 4 hr. ago

imagine being replaced by an AI that can't even get std::variant visitor syntax right

u/template_goes_brr 234 points 4 hr. ago 🏆

bold of you to assume I can get std::variant visitor syntax right

u/ranges_all_day 56 points 3 hr. ago

overloaded{} gang rise up

u/career_in_fortran 312 points 4 hr. ago 🏆🏆🏆

I was supposed to be replaced by Java. Then by C#. Then by Go. Then by Rust. Now by AI. I'm still here. The FORTRAN is still here. My codebase from 1997 is still running in production and nobody wants to touch it, which is exactly why I still have a job.

u/constexpr_doomer 78 points 3 hr. ago

the FORTRAN is eternal. the FORTRAN endures.

u/async_skeptic 95 points 4 hr. ago

The paper lists "Sender/Receiver over Callbacks" as a training data preference. Let's check the receipts.

P2300 (std::execution) was voted into C++26 but has zero production deployments outside of NVIDIA's stdexec. The API surface is complex enough that even experienced async developers need days to internalize the model. The reference implementation is a research artifact, not a production library.

We are asking AI to learn patterns that humans haven't adopted yet. The corpus would contain sender/receiver examples that approximately nobody has written in production. How is that "curated, human-validated" data? It's aspirational data. We're training the AI on what we wish people wrote, not what they actually write.

u/definitely_not_copilot 15 points 3 hr. ago

stdexec has been in libcu++ for a while now. It's not zero deployments.

u/async_skeptic 43 points 3 hr. ago

libcu++ is NVIDIA's internal stack. That's my point. One vendor's CUDA library is not "ecosystem adoption."

u/grad_student_teaching_cpp 123 points 4 hr. ago 🏆

I teach C++ to undergrads. The biggest AI problem isn't training data - it's that students submit AI-generated code they don't understand. It compiles. It passes the basic tests. And it has three undefined behaviors that only show up under ASan.

Last semester I got a submission that used reinterpret_cast<int*>(&float_val) to "convert" a float to int. The student couldn't explain why it worked on their machine. The AI gave them code that looked correct, compiled without warnings, and violated strict aliasing. No amount of curated training data fixes this - the AI doesn't understand UB any more than the student does.

The paper's heart is in the right place. But the crisis in my classroom isn't "AI generates C++98." It's "AI generates plausible-looking C++20 that's subtly broken in ways that require deep understanding to detect."

u/UB_is_a_feature_69 34 points 3 hr. ago

the "three undefined behaviors" thing hit different. I just audited an intern's code and found exactly this pattern.

u/compiles_first_try_jk 67 points 3 hr. ago

we need to teach C++ to the AI and use AI to teach C++. surely nothing can go wrong with this circular dependency.

u/the_real_training_data 45 points 3 hr. ago

This is exactly why the curated dataset matters - but also why the paper's approach is insufficient. Domain tags (ai/, embedded/, finance/) are not pedagogical scaffolding. You need progressive complexity, explicit rationale for each design choice, and anti-pattern comparisons showing why the wrong approach is wrong. A flat corpus of "good" examples doesn't teach; it just provides more sophisticated patterns for models to parrot without understanding.

u/compiler_frontend_dev 67 points 3 hr. ago

From the tooling section:

Make compilers answer questions about argument types and modern usage possibly connecting to MCP

MCP is a specific protocol from Anthropic that has existed for about a year. It might not exist in three years. Why is a directions paper - a document meant to set strategy for the next decade - name-dropping a specific vendor protocol? This is like a 2015 directions paper saying "possibly connecting to Google+."

The underlying idea - structured compiler queries for AI agents - is sound. But pin it to the concept, not the implementation. LSP took a decade to get where it is. MCP might be a footnote by C++29.

u/pragma_once_and_done 89 points 3 hr. ago

because Jeff uses Cursor

u/shared_ptr_of_grief 23 points 2 hr. ago

the language server protocol has entered the chat

u/compiler_frontend_dev 34 points 2 hr. ago

LSP has been "entering the chat" for 10 years and clangd still can't reliably parse template metaprogramming in large codebases. We should probably fix the protocol we have before adopting a new one.

u/shared_ptr_of_grief 78 points 3 hr. ago

Half the papers in the last mailing read like ChatGPT wrote them. The DG knows this. That's why Thrust I exists. This isn't preemptive governance - it's reactive damage control dressed up as strategy.

u/compiles_first_try_jk 34 points 2 hr. ago

which papers? people keep saying this and never back it up with specifics.

u/shared_ptr_of_grief 56 points 2 hr. ago

I'm not naming papers because rule 2 exists. But the pattern is recognizable: every paragraph restates its own thesis, "it is worth noting that" appears four times, the "related work" section reads like a prompted summary, and the proposed wording has grammatical patterns that no native English speaker produces. You know it when you see it.

u/template_goes_brr 89 points 2 hr. ago

"the passive voice avalanche" describes half the existing standard text. are we sure the standard wasn't AI-generated?

u/legacy_codebase_survivor 45 points 2 hr. ago

accusing someone's paper of being AI-generated without evidence is genuinely harmful behavior. papers are public. authors are real people. some of them are non-native English speakers and the "it is worth noting" pattern is common in academic ESL writing.

[deleted] 1 hr. ago

[removed by moderator]

u/not_on_the_committee 1 point 1 hr. ago

Thread locked. Rule 2 is not optional. If you want to discuss paper quality, do it without pointing fingers at specific submissions or authors.

u/ai_wrote_this_comment 134 points 2 hr. ago

the irony of discussing AI-generated slop on a thread where half the comments are probably AI-generated

u/boost_contributor_since_04 56 points 2 hr. ago

Thrust II is basically asking for Boost 2.0 but for training data. The paper name-drops the Beman Project as a potential home for this kind of initiative, but Beman is already stretched thin incubating actual libraries for standardization. Adding "curate a massive cross-domain C++ corpus" to their scope without additional resources is wishful thinking.

The paper says "WG21 cannot solve this alone" but never identifies who can. Boost doesn't have the infrastructure for dataset curation. Beman doesn't have the funding. Academic groups could contribute but need grants. The only orgs with both the data and the money are the compiler vendors (Google, Microsoft, Apple, NVIDIA), and the paper doesn't ask them to do anything specific.

I want to see this happen. I don't see a mechanism for it to happen.

u/ranges_all_day 12 points 2 hr. ago

Beman Project for the curious: github.com/bemanproject - they're doing good work on near-standard library incubation but this would be a completely different kind of project.

u/embedded_for_20_years 23 points 2 hr. ago

The Beman Project doesn't even have funding for its current scope. The C++ Alliance does some of this but they're focused on Boost infrastructure. Who actually writes the check?

u/UB_is_a_feature_69 178 points 3 hr. ago

the paper says "Algorithms over Loops: Biasing generation toward <algorithm>." my brother in Christ, #include <algorithm> adds 2 seconds to my compile time

u/constexpr_doomer 56 points 2 hr. ago

laughs in compile times

u/segfault_whisperer_42 143 points 2 hr. ago

modules will fix this. (said increasingly nervous man for the 7th year in a row)

[deleted] 4 hr. ago

[deleted]

u/ranges_all_day 3 points 3 hr. ago

what was this about?

u/template_goes_brr 89 points 3 hr. ago

something about training AI on the Boost mailing list, which honestly would produce the most aggressive code reviewer ever built

u/compiles_first_try_jk 134 points 2 hr. ago

Honest question: if we had this magical curated modern C++ dataset, who decides what's "modern" and "idiomatic"? The committee that took 3 years to agree on range adaptor closure semantics? The same body where SF/SA votes split 12-11?

Edit: I'm told the range adaptor thing was actually resolved fairly quickly. The point stands for basically everything else.

u/pragma_once_and_done 67 points 2 hr. ago

the real paper was the bike-shedding we did along the way

[deleted] -12 points 1 hr. ago

[removed by moderator]

u/legacy_codebase_survivor 2 points 1 hr. ago

report and move on

u/ranges_all_day 89 points 2 hr. ago

the fact that Bjarne Stroustrup co-authored a paper about AI governance for C++ is peak 2026

u/constexpr_doomer 34 points 1 hr. ago

wait until you see P4024 on quantum computing governance

u/library_design_matters 67 points 1 hr. ago

Read the whole thing twice. The real gap nobody is talking about:

Section 3 says:

the C++ ecosystem needs a curated, human validated collection of modern examples of high-quality dataset of Modern, Idiomatic C++

Section 5 says:

Make compilers answer questions about argument types and modern usage possibly connecting to MCP

These are two completely different strategies. One crowdsources quality through human curation. The other automates discovery through compiler tooling. The paper never reconciles them. Are we training AI on curated examples, or are we giving AI tools to query the compiler directly? Because those two approaches have fundamentally different failure modes, resource requirements, and timelines.

A paper that's setting "strategic direction" should at least acknowledge when its two thrusts point in different directions.

Edit: to be fair, maybe the answer is "both." But the paper doesn't say "both" - it just presents them adjacently without connecting the dots.

u/template_goes_brr 45 points 47 minutes ago

tl;dr: AI is both the future and the problem and the solution and we need a dataset and also governance and also tooling and also the ecosystem should do something and also we can't do it ourselves. got it.

u/definitely_not_copilot 23 points 23 minutes ago

disclosure: this comment was written by Claude