r/wg21 - ASCII character utilities

P3688R6 - ASCII character utilities WG21

Posted by u/template_wrangler_23 · 9 hr. ago

Authors: Jan Schultke, Corentin Jabot
Document: P3688R6
Date: 2026-02-21
Target: LEWG
Link: wg21.link/p3688r6

The <cctype> functions have been a quiet source of pain for decades - locale-dependent behavior, no constexpr support, UB traps with signed char, and zero support for Unicode character types. P3688R6 proposes a new <ascii> header with lightweight, locale-independent, constexpr alternatives for all the character classification and transformation functions you actually need when parsing ASCII text.

The paper covers 18 functions total - the usual suspects like ascii_is_digit, ascii_is_alphabetic, ascii_to_lower, plus additions like ascii_is_bit, ascii_is_any, and case-insensitive comparison helpers. Everything is constexpr, noexcept (mostly), and works with char, wchar_t, char8_t, char16_t, and char32_t.

SG16 has been iterating on the naming across revisions. R6 uses the ascii_is_* prefix convention, following feedback on earlier revisions that used is_ascii_*. Six revisions and a handful of SG16 polls later, this is heading to LEWG.

▲ 67 points (91% upvoted) · 9 comments

u/just_ship_it_42 47 points 7 hr. ago

Abseil has had absl::ascii_is* for years. Nice that the standard is finally catching up. At least the naming ended up in the same ballpark.

u/constexpr_everything_2024 12 points 47 minutes ago

The constexpr angle is what makes this worth standardizing over just using abseil. None of the existing solutions give you that. There is a godbolt demo in the paper showing the whole API. Also the char8_t/char16_t support - try passing those to std::isdigit and see what happens.
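
For anyone who has not hit this: std::isdigit takes an int and is only defined for values representable as unsigned char (or EOF), so both a negative char and a large char16_t are out of contract. A rough sketch of the wrappers people end up writing today, with `legacy_is_digit` and `check_legacy_wrappers` being made-up names for this example:

```cpp
#include <cassert>
#include <cctype>

// Hypothetical helper names; only std::isdigit is a real library function here.

// A plain char may be negative where char is signed, so it is converted to
// unsigned char before being handed to std::isdigit.
inline bool legacy_is_digit(char c) {
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
}

// A char16_t converts to int silently, but most of its values are outside the
// range std::isdigit is defined for. This sketch rejects anything beyond
// ASCII (0x7F) before making the call.
inline bool legacy_is_digit(char16_t c) {
    return c <= 0x7F && std::isdigit(static_cast<int>(c)) != 0;
}

// Runtime self-check; std::isdigit is not constexpr, so static_assert is out.
inline void check_legacy_wrappers() {
    assert(legacy_is_digit('7'));
    assert(!legacy_is_digit(static_cast<char>(0xE9)));  // negative where char is signed
    assert(legacy_is_digit(u'7'));
    assert(!legacy_is_digit(u'\u0663'));                // ARABIC-INDIC DIGIT THREE
}
```

None of this can run at compile time, which is the gap the constexpr point above is about.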

u/still_waiting_for_networking 31 points 8 hr. ago

committee gonna committee. Six revisions to standardize c >= '0' && c <= '9'.

u/template_wrangler_23 5 points 2 hr. ago

This is literally the simplest kind of proposal the committee has seen in months. If LEWG cannot fast-track something this straightforward, we have bigger problems.

u/embedded_for_20_years 23 points 5 hr. ago

The design choice in section 3.7.3 is worth reading carefully. They considered three options for non-ASCII-compatible encodings and landed on "treat the input as ASCII regardless of the literal encoding." Which means:

depending on encoding, ascii_is_digit('0') may be false, which may be surprising to the user

This is the right call for protocol parsing - JSON, HTTP, and most XML in the wild are ASCII/UTF-8 regardless of the host encoding. But it does mean the functions are not a drop-in replacement for <cctype> on every platform. On EBCDIC, '0' is 0xf0, not 0x30. The functions work on the numeric value of the code unit, not on the character you typed in your source.

If your code already does c >= '0' && c <= '9' and works on EBCDIC, switching to ascii_is_digit will break you. Narrow use case, but the paper is honest about the tradeoff.
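
To make the difference concrete, here is a small sketch that contrasts the two semantics. All names in the `sketch` namespace are hypothetical, and the EBCDIC value is written out as a constant so the example also makes sense on an ASCII machine:

```cpp
namespace sketch {

// ASCII semantics: compares against fixed code points, so the result does not
// depend on how the compiler encodes character literals.
constexpr bool ascii_digit(unsigned char c) noexcept {
    return c >= 0x30 && c <= 0x39;
}

// Literal semantics: compares against whatever '0' and '9' are in the
// ordinary literal encoding of the compiler in use.
constexpr bool literal_digit(char c) noexcept {
    return c >= '0' && c <= '9';
}

constexpr unsigned char ascii_zero  = 0x30;  // '0' in ASCII and UTF-8
constexpr unsigned char ebcdic_zero = 0xF0;  // '0' in EBCDIC

}  // namespace sketch

// The ASCII-semantics check accepts the ASCII code unit and rejects the EBCDIC one.
static_assert(sketch::ascii_digit(sketch::ascii_zero));
static_assert(!sketch::ascii_digit(sketch::ebcdic_zero));

// The literal-semantics check always accepts the literal '0', on any platform.
static_assert(sketch::literal_digit('0'));

// The two agree on the literal '0' exactly when the literal encoding puts it at 0x30.
static_assert(sketch::ascii_digit('0') == ('0' == 0x30));
```

On an EBCDIC target the last assertion still holds, but both sides are false, which is exactly the surprise described above.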

u/UB_enjoyer_69 3 points 3 hr. ago

Wait, ascii_is_digit('0') can be false? On what planet?

u/embedded_for_20_years 7 points 2 hr. ago

EBCDIC. Some of us still ship to mainframes. '0' is 0xf0 there, not 0x30. The paper calls these functions "ASCII utilities," not "literal encoding utilities" - it means it.

u/api_design_matters 15 points 4 hr. ago

Section 3.13 dismisses namespace ascii because of hypothetical SIMD overloads:

would it be std::simd::ascii::is_lower or std::ascii::simd::is_lower?

That is not a convincing argument. We can cross the SIMD bridge when we get there, and nested namespaces are not that hard. A dedicated namespace would let users write using std::ascii::is_lower; instead of the mouthful std::ascii_is_lower. The ascii_is_* prefix works fine but std::ascii:: would have been cleaner.

Not a dealbreaker. But I expect this bikeshed to reopen in LEWG.
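
To show the ergonomic difference being argued about, here is a sketch with both shapes side by side. Everything lives in a made-up `sketch` namespace rather than `std`, and `user_code` and `both_agree` are likewise hypothetical names:

```cpp
namespace sketch {

// Prefix style, mirroring the ascii_is_* shape described in the post.
constexpr bool ascii_is_lower(char c) noexcept {
    return c >= 0x61 && c <= 0x7A;  // ASCII 'a'..'z'
}

// Namespace style, as this comment would prefer.
namespace ascii {
constexpr bool is_lower(char c) noexcept {
    return c >= 0x61 && c <= 0x7A;  // ASCII 'a'..'z'
}
}  // namespace ascii

}  // namespace sketch

namespace user_code {

// One using-declaration brings in the short name...
using sketch::ascii::is_lower;

constexpr bool both_agree(char c) noexcept {
    // ...so the call site reads is_lower(c) instead of sketch::ascii_is_lower(c).
    return is_lower(c) == sketch::ascii_is_lower(c);
}

}  // namespace user_code

static_assert(user_code::both_agree(char{0x71}));  // 0x71 is ASCII 'q'
static_assert(user_code::both_agree(char{0x51}));  // 0x51 is ASCII 'Q'
```

The flip side is that a bare `is_lower` no longer says "ASCII" at the call site, which may be part of why the prefix survived.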

u/ranges_convert 8 points 3 hr. ago

From the design discussion on function objects:

Cherry-picking the functions in <ascii> to be function objects is far from solving the general problem

I get the argument but this is still going to be painful in practice. Every single time you want to use one of these in ranges::find_if you are wrapping it in a lambda:

auto it = std::ranges::find_if(str, [](char c) {
    return std::ascii_is_digit(c);
});

Yes, the general LIFT problem exists. Section 3.6 punts to P3312R1 (Overload Set Types) as a potential general solution, which is not exactly around the corner. Knowing that does not make the boilerplate less annoying.
