Boost.URL: Audited, Constexpr, and Polished

Alan de Freitas · Apr 21, 2026

We had been putting off the Boost.URL security review for a while. There was always something more urgent. When the review finally happened, it confirmed what we hoped: the core parsing logic held up well. Around the same time, a constexpr feature request that we had been dismissing suddenly became a cross-library collaboration when other Boost maintainers started applying changes to their own libraries. And while working on Boost.Beast2 integration, we noticed friction in common URL operations that led us to clear a backlog of usability improvements.

Security Review

The C++ Alliance arranges professional security audits for the libraries we maintain. The results for Boost.Beast (2020) and Boost.JSON (2021) are publicly available. For Boost.URL, we always had the plan but kept delaying because there was so much other work to do first. That delay turned out to be a good thing: we found and fixed issues ourselves first, so the reviewers could focus on the subtle problems.

Laurel Lye Systems Engineering conducted three rounds of assessment. Each finding was manually reviewed against the source code and categorized as a confirmed bug (fixed), a false positive, or a deliberate design choice. For every confirmed bug, we also proposed new test cases to prevent regressions.

Round 1: 1,207 Findings (February 2, 2026)

The first assessment was the broadest. Of 1,207 findings, 15 were confirmed bugs resulting in fix commits. The vast majority were false positives or by-design patterns:

Verdict         CRITICAL  HIGH  MEDIUM  LOW  INFO  Total
FIXED                  1     9       0    2     3     15
FALSE POSITIVE         3    47      46  186   110    392
BY DESIGN              0   129     445  170    56    800
Total                  4   185     491  358   169  1,207

The single CRITICAL fix was a loop condition in url_base that dereferenced *it before checking it != end. Three other CRITICAL findings were false positives: the audit flagged raw-pointer writes in the format engine, but these use a two-phase measure/format design that guarantees the buffer is pre-sized correctly.
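The ordering problem is easy to reproduce in isolation. The following is a minimal sketch, not the url_base code: count_leading_slashes is a hypothetical helper chosen only to show the bound check placed before the dereference.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not taken from url_base.
// The buggy ordering would be `while (*it == '/' && it != end)`,
// which reads *it even when it == end.
inline std::size_t count_leading_slashes(char const* it, char const* end)
{
    std::size_t n = 0;
    // Check the bound first: && short-circuits, so *it is only
    // evaluated when it != end.
    while (it != end && *it == '/')
    {
        ++it;
        ++n;
    }
    return n;
}

inline void demo_count_leading_slashes()
{
    char const s[] = "//a";
    assert(count_leading_slashes(s, s + 3) == 2);
    assert(count_leading_slashes(s, s) == 0); // empty range: no dereference
}
```

With the conditions swapped, the empty-range call in the demo would read one byte past the range.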

Most false positives fell into recognizable themes:

  • BOOST_ASSERT as sole bounds check (29 HIGH findings): internal _unsafe functions rely on preconditions validated by the public API. The _unsafe suffix signals the contract. This is the standard Boost/STL pattern (std::vector::operator[] vs at()).
  • Non-owning view lifetime safety (27 HIGH findings): string_view and url_view types do not own their data. The audit flagged potential use-after-free, but lifetime management is the caller’s responsibility by design.
  • Atomic reference counting (multiple findings across all rounds): the audit tool did not recognize the #ifdef BOOST_URL_DISABLE_THREADS guard that switches between std::atomic<std::size_t> and plain std::size_t.

Round 1 fix commits
  • bcdc891 CRITICAL: url_base loop condition order
  • ec15fce HIGH: encode() UB pointer arithmetic for small buffers
  • 81fcb95 HIGH: LLONG_MIN negation UB in format
  • 42c8fe7 HIGH: ci_less::operator() return type
  • 76279f5 HIGH: incorrect noexcept in segments_base::front() and back()
  • d4ae92d HIGH: recycled_ptr::get() nullptr when empty
  • 8d98fe6 LOW: decode() noexcept on throwing template
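
The LLONG_MIN entry is a good example of undefined behavior that hides in an edge case: -v overflows when v is the most negative value. A minimal sketch of one common way to avoid it, assuming a hypothetical magnitude helper (the actual fix in the format engine may look different):

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: magnitude of a signed value as an unsigned one,
// without ever evaluating -v in signed arithmetic.
inline unsigned long long magnitude(long long v)
{
    if (v >= 0)
        return static_cast<unsigned long long>(v);
    // Convert first, then negate in unsigned arithmetic, which wraps
    // modulo 2^N and is well defined even for LLONG_MIN.
    return 0ULL - static_cast<unsigned long long>(v);
}

inline void demo_magnitude()
{
    assert(magnitude(-5) == 5ULL);
    assert(magnitude(LLONG_MIN) ==
           static_cast<unsigned long long>(LLONG_MAX) + 1ULL);
}
```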

The proportion of false positives to confirmed bugs was large enough that we discussed a second round with Laurel Lye, where we shared the false positive categories we had identified so they could be more targeted.

Round 2: 27 Findings (February 17, 2026)

The second assessment was more targeted. The reviewers had learned from our Round 1 triage and produced fewer false positives:

Verdict         HIGH  MEDIUM  LOW  INFO  Total
FIXED              7       3    1     1     12
FALSE POSITIVE     2       2    0     0      4
BY DESIGN          0       0    1     1      2
ALREADY FIXED      0       5    4     0      9
Total              9      10    6     2     27

Nine of the 27 findings had already been fixed by Round 1 commits. The new confirmed bugs included a heap overflow in format center-alignment padding (lpad = w / 2 used the total width instead of the padding amount), an infinite loop in decode_view::ends_with with empty strings, and an out-of-bounds (OOB) read in ci_is_less on mismatched-length strings.
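
The center-alignment bug comes down to which quantity gets halved. The sketch below is a hypothetical stand-in for the format engine, written with std::string instead of a pre-sized buffer. Where the extra fill character goes when the padding is odd is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for center alignment in a format engine.
inline std::string center(std::string const& s, std::size_t width, char fill = ' ')
{
    if (s.size() >= width)
        return s; // nothing to pad
    std::size_t const pad  = width - s.size(); // total fill characters
    std::size_t const lpad = pad / 2;          // halve the padding, not the width
    std::size_t const rpad = pad - lpad;       // odd remainder goes right (assumption)
    std::string out;
    out.reserve(width);
    out.append(lpad, fill);
    out += s;
    out.append(rpad, fill);
    return out;
}

inline void demo_center()
{
    assert(center("ab", 6, '*') == "**ab**");
    assert(center("abc", 6, '*') == "*abc**");
}
```

Computing lpad from width instead of pad would produce more than width characters in total, which is what overflows a buffer sized during the measure phase.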

Both rounds are tracked in PR #982.

Round 2 fix commits
  • d06df88 HIGH: format center-alignment padding
  • 4fe2438 HIGH: decode_view::ends_with with empty string
  • f5727ed HIGH: stale pattern n.path after colon-encoding
  • d045d71 HIGH: ci_is_less OOB read
  • 88efbae HIGH: recycled_ptr copy self-assignment
  • fe4bdf6 MEDIUM: url move self-assignment
  • ab5d812 MEDIUM: encode_one signed char right-shift
  • b662a8f MEDIUM: encode() noexcept on throwing template
  • 5bc52ed LOW: port_rule has_number for port zero at end of input
  • 9c9850f INFO: ci_equal arguments by const reference
  • 4f466ce test: public interface boundary and fuzz tests
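
The encode_one entry is a classic signedness problem: on platforms where char is signed, a byte above 0x7F is negative, and shifting it right typically yields a negative table index. A minimal sketch of the usual remedy, using a hypothetical pct_encode_byte function, not the library's encode_one:

```cpp
#include <cassert>
#include <string>

// Hypothetical percent-encoder for a single byte.
inline std::string pct_encode_byte(char c)
{
    static char const hex[] = "0123456789ABCDEF";
    // Convert to unsigned char before shifting, so the high nibble
    // is always in the range 0..15.
    unsigned char const u = static_cast<unsigned char>(c);
    std::string out = "%";
    out += hex[u >> 4];
    out += hex[u & 0x0F];
    return out;
}

inline void demo_pct_encode_byte()
{
    assert(pct_encode_byte(' ') == "%20");
    assert(pct_encode_byte(static_cast<char>(0xE9)) == "%E9");
}
```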

Round 3: 15 Findings (April 2, 2026)

The third round was the shortest and the most precise. Of 15 findings, 4 were confirmed bugs and 11 were false positives. No CRITICAL findings. The false positives were the same recurring themes (atomic refcounting, pre-validated format strings, preconditions guaranteed by callers).

Verdict         HIGH  MEDIUM  LOW  Total
FIXED              0       1    3      4
FALSE POSITIVE     4       6    1     11
Total              4       7    4     15

The confirmed bugs were more subtle: a decoded-length calculation error in segments_iter_impl::decrement() that only manifested during backward iteration over percent-encoded paths, two noexcept specifications on functions that allocate std::string (which can throw bad_alloc), and a memcpy with null source when size is zero (undefined behavior per the C standard, even though it copies nothing).
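
The memcpy case can be guarded with a single size check. This is a sketch with a hypothetical copy_bytes wrapper, and the actual fix in url_view may be structured differently:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical wrapper: returns early when there is nothing to copy,
// so a null src paired with n == 0 never reaches std::memcpy.
inline void copy_bytes(char* dest, char const* src, std::size_t n)
{
    if (n == 0)
        return;
    std::memcpy(dest, src, n);
}

inline void demo_copy_bytes()
{
    char buf[4] = "abc";
    copy_bytes(buf, nullptr, 0); // no-op instead of undefined behavior
    assert(std::strcmp(buf, "abc") == 0);
    copy_bytes(buf, "xy", 2);
    assert(std::strcmp(buf, "xyc") == 0);
}
```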

This round is tracked in PR #988.

Round 3 fix commits
  • 7cd6702 MEDIUM: segments_iter_impl decoded-length in decrement()
  • b55336c LOW: param noexcept on throwing constructor
  • 7c0665d LOW: string_view_base noexcept on throwing operator std::string()
  • 003696d LOW: url_view memcpy with null source when size is zero

The progression from 1,207 findings to 27 to 15 shows the reviewers learning the peculiarities of our codebase. The ratio of false positives dropped with each round, and the confirmed bugs got more subtle.

Confirmed bugs by category (diagram):

  • UB in edge cases: encode_one right-shift, LLONG_MIN negation, pointer arithmetic
  • Self-assignment: url move, recycled_ptr copy
  • OOB reads: ci_is_less, decode_view ends_with
  • Incorrect noexcept: encode / decode, segments_base front/back, param constructor, string_view_base operator
  • Iterator bugs: segments decoded-length
  • Null pointer: recycled_ptr get, url_view memcpy

Compile-Time URL Parsing

constexpr URL parsing has been one of the most recurring requests since the library’s inception. Every few months someone would ask about it, and every few months we would decide the refactoring cost was too high. The parsing engine is heavily buffer-oriented, and moving enough code into headers for constexpr evaluation required careful refactoring without breaking the shared library build.

When we finally prototyped it, the diff touched thousands of lines, but most of those were code being moved from source files to headers rather than new logic. The actual new code was limited to alternative code paths to bypass non-literal types and refactoring url_view_base to eliminate a self-referencing pointer that prevented constexpr evaluation. Still, given the size of the change, we initially marked it as unactionable and moved on to the security review.

Beyond the refactoring cost, we had blockers beyond our control. Our parsing code depended on boost::optional (not a literal type, no constexpr constructors), boost::variant2 (not literal when containing optional), and boost::system::result (could not be constructed with a custom error_code in constexpr because error_category::failed() is virtual). Without changes to those libraries, constexpr URL parsing was not possible regardless of how much we refactored our own code.

The Conversation That Changed Everything

Then Peter Dimov, the maintainer of Boost.System and Boost.Variant2, joined the conversation. We had assumed that system::result<T> could not be constexpr in C++14 because it wraps error_code, which uses virtual functions. Peter pointed out that system::result<T> is already a literal type in C++14 when T is literal and the error code is not custom. Boost.URL uses a custom error code category, and constructing a result from a custom error_code requires calling error_category::failed(), which is virtual and therefore not constexpr before C++20. Peter offered to fix this in Boost.System (#141, af53f17) for C++20 so that custom error codes would also work at compile time.

Allowing constexpr virtual functions in C++20

Peter Dimov is also one of the authors of P1064: “Allowing Virtual Function Calls in Constant Expressions”, the C++ committee proposal that made constexpr virtual functions possible in C++20. The paper uses error_code and error_category as the motivating example.

That shifted the problem. Instead of building our own constexpr_result<T> type to bypass the entire error handling system, we could use system::result directly in C++20. The scope of the refactoring shrank, and we focused on C++20 as the initial target. The remaining blocker was that system::result<T> requires T to be a literal type, and we use boost::optional heavily in our parsing code. boost::optional was not a literal type.

Andrzej Krzemieński, the Boost.Optional maintainer, started working on it. The conversation went back and forth on the C++14 constraints: std::addressof is not constexpr until C++17, mandatory copy elision is only available in C++17, and there were questions about what subset of constructors could realistically become constexpr in C++14. After several iterations (including a feature/constexpr branch), the constexpr implementation landed on develop.

With optional becoming literal, boost::variant2 containing optional could also become literal. All three blockers were now resolved. Peter had fixed Boost.System, Andrzej had fixed Boost.Optional, and we contributed fixes to Boost.Variant2. There was no going back: we could no longer dismiss the constexpr feature after three library maintainers had already done their part.

Dependency diagram: Boost.URL constexpr parsing depends on Boost.Optional (boost::optional constexpr), Boost.Variant2 (boost::variant2::variant constexpr), and Boost.System (boost::system::result constexpr, boost::system::error_code constexpr).

Cross-library commits for constexpr support

Boost.URL (PR #976, PR #981)

  • 0a2c39f feat: constexpr URL parsing for C++20
  • b9db439 build: remove -Wno-maybe-uninitialized from GCC flags (see below)
  • 59b4540 fix: suppress GCC false-positive -Wmaybe-uninitialized in tuple_rule (see below)

Boost.Optional (issue #143, PR #145)

  • 3df2337 make optional constexpr in C++14
  • 046357c add more robust constexpr support
  • 88e2378 add -Wmaybe-uninitialized pragma (see below)

Boost.Variant2 (PR #57)

  • b6ce8ac add missing -Wmaybe-uninitialized pragma (see below)

Boost.System (issue #141)

  • af53f17 add constexpr to virtual functions on C++20 or later

Error Handling at Compile Time

Boost.URL attaches source location information to error codes for better diagnostics at runtime. In a constexpr context, BOOST_CURRENT_LOCATION is not available, so the BOOST_URL_CONSTEXPR_RETURN_EC macro branches on __builtin_is_constant_evaluated(): at compile time it returns the error enum directly, at runtime it attaches the source location.

#if defined(BOOST_URL_HAS_CXX20_CONSTEXPR)
# define BOOST_URL_CONSTEXPR_RETURN_EC(ev) \
    do { \
        if (__builtin_is_constant_evaluated()) { \
            return (ev); \
        } \
        return [](auto e) { \
            BOOST_URL_RETURN_EC(e); \
        }(ev); \
    } while(0)
#endif
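
The same branching idea can be tried outside the library. The sketch below is an illustration under stated assumptions: located_error, runtime_line, and make_error are hypothetical names, and runtime_line stands in for whatever runtime-only work (such as capturing a source location) cannot run during constant evaluation. It relies on the __builtin_is_constant_evaluated() builtin that the macro above also uses.

```cpp
#include <cassert>

// Hypothetical error type: a code plus an optional source line.
struct located_error
{
    int code;
    int line; // 0 means "no location attached"
};

// Hypothetical runtime-only hook standing in for source-location capture.
inline int runtime_line() noexcept { return __LINE__; }

constexpr located_error make_error(int code)
{
    if (__builtin_is_constant_evaluated())
        return {code, 0};              // compile time: bare error code
    else
        return {code, runtime_line()}; // runtime: attach the location
}

// Constant evaluation takes the first branch.
static_assert(make_error(7).code == 7, "code is preserved");
static_assert(make_error(7).line == 0, "no location at compile time");

inline void demo_make_error()
{
    volatile int code = 7; // forces a runtime evaluation
    located_error const e = make_error(code);
    assert(e.code == 7 && e.line != 0);
}
```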

The -Wmaybe-uninitialized Problem

GCC’s -Wmaybe-uninitialized flagged code inside boost::optional and boost::variant2 union storage constructors. The root cause was neither library.

The inlining chain: Boost.URL’s parsing code constructs a variant2::variant that contains an optional alternative. At -O3, GCC inlines the entire chain:

  • Parse function
  • Variant construction
  • variant2 storage
  • optional storage
  • Union constructor

After inlining, GCC sees a union with a dummy_ member and a value_ member, and it cannot prove which member is active. It conflates the “uninitialized dummy” path with the “initialized value” path. The in_place_index_t<I> dispatch guarantees which member is initialized, but GCC’s data flow analysis loses track across the nested layers. -fsanitize=address makes it worse by changing inlining thresholds.
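
For readers unfamiliar with the pattern, here is a simplified sketch of tag-dispatched union storage. The member names follow the description above, but the code is illustrative and not copied from Boost.Optional or Boost.Variant2. It is restricted to trivially destructible T to stay short.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Simplified tagged union storage (trivially destructible T only).
template<class T>
union storage
{
    char dummy_; // active when no value is held
    T value_;    // active when a value is held

    // The tag type, not a runtime flag, selects the member to initialize.
    constexpr explicit storage(std::in_place_index_t<0>) noexcept
        : dummy_() {}

    template<class... Args>
    constexpr explicit storage(std::in_place_index_t<1>, Args&&... args)
        : value_(std::forward<Args>(args)...) {}
};

inline void demo_storage()
{
    storage<int> empty(std::in_place_index<0>);
    storage<int> full(std::in_place_index<1>, 42);
    assert(empty.dummy_ == 0);
    assert(full.value_ == 42);
}
```

Each constructor initializes exactly one member, so the code is correct by construction. The warning appears only once several such layers are inlined together and the analysis can no longer connect the tag to the active member.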

The compiler blames the wrong library. The pattern that trips the analysis lives in variant2’s union storage, but when variant2 contains an optional, GCC reports the warning in optional’s code. The pragma has to go where GCC reports it, not where the issue originates. We contributed pragmas to both Boost.Optional and Boost.Variant2, and replaced Boost.URL’s blanket -Wno-maybe-uninitialized flag with targeted pragmas.

This particular false positive requires GCC 14+, -O3, ASan, on x86_64 Linux, with a variant2::variant containing a boost::optional, constructed through a system::result dereference. Change any one of those conditions and the warning disappears.

This leaves an open question for the Boost ecosystem: when a false positive surfaces because the inlining of library A’s code interacts with library B’s union storage and the warning is reported in library C’s code, who is responsible for the pragma? For now, we placed pragmas where GCC reports the issue, but the underlying problem recurs every time a new combination of types triggers the same inlining pattern.

The Shared Library Problem

Making URL parsing constexpr means the parsing functions must be available in headers. But Boost.URL is a compiled library, and on MSVC, __declspec(dllexport) on a class exports all members, including inline and constexpr ones. This causes LNK2005 (duplicate symbol) errors for any class that mixes compiled and header-only members.

Each class must follow exactly one of two policies:

  • (a) Fully compiled: class BOOST_URL_DECL C. All members in .cpp files. No inline or constexpr members.
  • (b) Fully header-only: class BOOST_SYMBOL_VISIBLE C. All inline/constexpr/template. No .cpp file.
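
A sketch of the two policies side by side. MYLIB_DECL and MYLIB_SYMBOL_VISIBLE are hypothetical stand-ins for BOOST_URL_DECL and BOOST_SYMBOL_VISIBLE, defined empty here so the example builds as a single translation unit. In a real build they would expand to the platform's export and visibility attributes.

```cpp
#include <cassert>

// Hypothetical stand-ins, defined empty for this single-file sketch.
#define MYLIB_DECL
#define MYLIB_SYMBOL_VISIBLE

// Policy (a): fully compiled. The header only declares the members.
class MYLIB_DECL compiled_type
{
    int v_;
public:
    explicit compiled_type(int v);
    int value() const noexcept;
};

// Policy (b): fully header-only. Every member is inline or constexpr.
class MYLIB_SYMBOL_VISIBLE header_only_type
{
    int v_;
public:
    constexpr explicit header_only_type(int v) noexcept : v_(v) {}
    constexpr int value() const noexcept { return v_; }
};

// In a real build these definitions would live in a .cpp file.
compiled_type::compiled_type(int v) : v_(v) {}
int compiled_type::value() const noexcept { return v_; }

static_assert(header_only_type(3).value() == 3, "usable at compile time");

inline void demo_policies()
{
    assert(compiled_type(5).value() == 5);
}
```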

We documented the full rationale in config.hpp. We suspect other C++ libraries have not encountered this because they either do not test shared library builds as extensively as we do, or they are header-only.

The Result

Boost.URL can now parse URLs at compile time under C++20 (PR #976). All parse functions (parse_uri, parse_uri_reference, parse_relative_ref, parse_absolute_uri, and parse_origin_form) are fully constexpr. A malformed URL literal becomes a compile error rather than a runtime failure:

// Parsed and validated at compile time.
// A malformed literal would fail to compile.
constexpr url_view api_base =
    parse_uri("https://api.example.com/v2").value();

Pre-parsed constexpr URL views also serve as zero-cost constants: because all parsing happens during compilation, components like scheme, host, and port are available at runtime with no parsing overhead. This is useful for applications that compare against well-known endpoints, pre-populate configuration defaults, or build routing tables without paying for string parsing at startup.
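
The mechanism that turns a malformed literal into a compile error can be shown without the library. The sketch below uses a toy checked_scheme function, far simpler than Boost.URL's grammar and not part of its API: reaching a throw during constant evaluation makes the initializer ill-formed, while the same function throws normally at runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string_view>

// Toy scheme extractor: returns the text before "://" and
// throws if the separator is missing or the scheme is empty.
constexpr std::string_view checked_scheme(std::string_view s)
{
    std::size_t const pos = s.find("://");
    if (pos == std::string_view::npos || pos == 0)
        throw std::invalid_argument("malformed input");
    return s.substr(0, pos);
}

// Evaluated during compilation; no parsing happens at runtime.
constexpr std::string_view api_scheme =
    checked_scheme("https://api.example.com/v2");
static_assert(api_scheme == "https", "scheme resolved at compile time");

// Uncommenting the next line fails to compile, because constant
// evaluation would reach the throw:
// constexpr std::string_view bad = checked_scheme("not a url");

inline bool throws_at_runtime(std::string_view s)
{
    try { (void)checked_scheme(s); }
    catch (std::invalid_argument const&) { return true; }
    return false;
}
```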

The constexpr feature taught us that dismissing a request because the cost seems too high for one library misses the bigger picture. Once Peter Dimov and the other maintainers got involved, the cost was shared and the scope shrank. In the Boost ecosystem, a feature that seems expensive in isolation can become practical when the dependencies cooperate.

Usability Improvements

While integrating Boost.URL into Boost.Beast2, the Beast2 authors noticed friction in common operations that worked correctly but required more code than they should. At the same time, several community issues had been open for a while. We used this as an opportunity to address both.

Convenience Functions

The most requested feature was get_or for query containers: look up a query parameter by key and return a default value if it is not present.

Before:

auto it = url.params().find("page");
auto page = it != url.params().end() ? (*it).value : "1";

After:

auto page = url.params().get_or("page", "1");
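
The lookup-with-default semantics are easy to model on a plain container. The following sketch is a free function over a vector of key/value pairs, not the Boost.URL member function. Returning the first match for a repeated key is an assumption made for the example.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

using query = std::vector<std::pair<std::string, std::string>>;

// Hypothetical model of lookup-with-default over query parameters.
inline std::string get_or(
    query const& q, std::string_view key, std::string_view fallback)
{
    for (auto const& kv : q)
        if (kv.first == key)
            return kv.second;     // first match wins (assumption)
    return std::string(fallback); // key not present
}

inline void demo_get_or()
{
    query const q{{"page", "3"}, {"sort", "asc"}};
    assert(get_or(q, "page", "1") == "3");
    assert(get_or(q, "limit", "10") == "10");
}
```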

We also added standalone decode functions for working with individual URL components without constructing a full URL object:

auto plain = decode("My%20Stuff");
assert(plain && *plain == "My Stuff");

auto n = decoded_size("Program%20Files");
assert(n && *n == 13);
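
For readers who want to see what such a decode step involves, here is a toy percent-decoder. It is a sketch, not Boost.URL's implementation: toy_decode and hex_value are hypothetical names, and it reports failure with std::optional where the library functions above return a result type.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Returns the value of a hex digit, or -1 if c is not one.
constexpr int hex_value(char c) noexcept
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Toy decoder: std::nullopt on a truncated or non-hex escape.
inline std::optional<std::string> toy_decode(std::string_view s)
{
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        if (s[i] != '%') { out += s[i]; continue; }
        if (s.size() - i < 3) return std::nullopt; // truncated escape
        int const hi = hex_value(s[i + 1]);
        int const lo = hex_value(s[i + 2]);
        if (hi < 0 || lo < 0) return std::nullopt; // not hex digits
        out += static_cast<char>(hi * 16 + lo);
        i += 2; // skip the two digits just consumed
    }
    return out;
}

inline void demo_toy_decode()
{
    assert(toy_decode("My%20Stuff").value() == "My Stuff");
    assert(!toy_decode("bad%2").has_value());
}
```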

C++20 Integration

enable_borrowed_range is now specialized for 10 Boost.URL view types (segments_view, params_view, decode_view, and others). Unlike a std::vector, which owns its data, Boost.URL views point into the URL’s buffer without owning it. When a temporary view is destroyed, its iterators still point to valid memory. Specializing enable_borrowed_range tells the ranges library this is safe, so algorithms like std::ranges::find return a usable iterator from a temporary view instead of std::ranges::dangling:

segments_view::iterator it;
{
    segments_view ps("/path/to/file.txt");
    it = ps.begin();
}
// iterator is still valid (points to external buffer)
assert(*it == "path");

The grammar system gained user-provided RangeRule support. Custom grammar rules for parsing URL components satisfy a concept requiring first() and next() methods returning system::result<value_type>:

struct my_range_rule
{
    using value_type = core::string_view;

    system::result<value_type>
    first(char const*& it, char const* end) const noexcept;

    system::result<value_type>
    next(char const*& it, char const* end) const noexcept;
};

The motivation was performance and API clarity (#943). Previously, grammar::range<T> always type-erased the rule through a recycled_ptr with string storage. Stateless rules were paying for storage they did not need. With user-provided RangeRule, range<T, RangeRule> detects empty rules and avoids the type-erasure overhead entirely.
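
The source does not show how range<T, RangeRule> detects empty rules, so the following is only a sketch of one standard way to do it: select a specialization with std::is_empty and keep no rule object at all for stateless rules. rule_storage, stateless_rule, and stateful_rule are hypothetical names.

```cpp
#include <cassert>
#include <type_traits>

// Primary template: a stateful rule is stored by value.
template<class Rule, bool Empty = std::is_empty<Rule>::value>
class rule_storage
{
    Rule r_;
public:
    explicit rule_storage(Rule r) : r_(r) {}
    Rule const& get() const noexcept { return r_; }
};

// Specialization: an empty rule carries no state, so nothing is stored
// and a fresh instance is created on demand.
template<class Rule>
class rule_storage<Rule, true>
{
public:
    explicit rule_storage(Rule) noexcept {}
    Rule get() const noexcept { return Rule(); }
};

struct stateless_rule { int apply(int x) const { return x + 1; } };
struct stateful_rule  { int k; int apply(int x) const { return x + k; } };

static_assert(std::is_empty<rule_storage<stateless_rule>>::value,
    "no storage is paid for a stateless rule");
static_assert(sizeof(rule_storage<stateful_rule>) >= sizeof(int),
    "a stateful rule keeps its data");

inline void demo_rule_storage()
{
    rule_storage<stateless_rule> a{stateless_rule{}};
    rule_storage<stateful_rule> b{stateful_rule{2}};
    assert(a.get().apply(1) == 2);
    assert(b.get().apply(1) == 3);
}
```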

Performance

Component offsets in url_impl changed from size_t to uint32_t, reducing the size of every URL object on 64-bit platforms. The maximum URL size is capped at UINT32_MAX - 1 (enforced by a static_assert). Constructing a segments_view or segments_encoded_view from a URL is now a constant-time operation: offsets are computed directly from iterator indices without scanning the path.
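
The size effect is simple arithmetic. A minimal sketch, assuming a hypothetical table of eight offsets (the real number of components in url_impl is not stated here):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t n_parts = 8; // hypothetical component count

struct offsets_wide   { std::size_t   at[n_parts]; };
struct offsets_narrow { std::uint32_t at[n_parts]; };

static_assert(sizeof(offsets_narrow) == n_parts * sizeof(std::uint32_t),
    "4 bytes per offset");
static_assert(sizeof(offsets_narrow) <= sizeof(offsets_wide),
    "never larger, and half the size where size_t is 64 bits");

// Hypothetical guard mirroring the size cap: every offset must fit
// in 32 bits, with one value left spare.
constexpr bool fits(std::size_t url_size) noexcept
{
    return url_size <= static_cast<std::size_t>(UINT32_MAX) - 1;
}

static_assert(fits(4096), "ordinary URLs are far below the cap");

inline void demo_offsets()
{
    assert(sizeof(offsets_narrow) == 32);
    assert(fits(0));
}
```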

Other improvements

Fixes

  • a87998a params_iter_impl::decrement() computed incorrect decoded key/value sizes when a query parameter’s value contains literal = characters (PR #978, #972)
  • 60c281a decode_view::remove_prefix/remove_suffix asserted n <= size() instead of preventing undefined behavior (PR #978, #973)
  • 01e0571 decode_view was forward-declared but not complete when pct_string_view::operator*() was declared (PR #963)
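
The first fix concerns parameters such as token=a=b, where only the first = separates key from value. A toy illustration of that splitting rule, not the iterator code that was fixed (split_param is a hypothetical name):

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <utility>

// Toy splitter: the first '=' ends the key; later '=' characters
// remain part of the value.
constexpr std::pair<std::string_view, std::string_view>
split_param(std::string_view p) noexcept
{
    std::size_t const pos = p.find('=');
    if (pos == std::string_view::npos)
        return {p, std::string_view{}}; // key with no value
    return {p.substr(0, pos), p.substr(pos + 1)};
}

static_assert(split_param("token=a=b").first == "token", "key");
static_assert(split_param("token=a=b").second == "a=b", "value keeps '='");

inline void demo_split_param()
{
    assert(split_param("flag").first == "flag");
    assert(split_param("flag").second.empty());
}
```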

Most of these improvements came from real usage. The Beast2 integration exposed friction that we would not have found from inside the library, and the community issues represented patterns that multiple users had independently hit. The best usability feedback comes from people who are actually building something with the library.

Acknowledgments and Reflections

The constexpr work benefited from the contributions of Peter Dimov (Boost.System, Boost.Variant2) and Andrzej Krzemieński (Boost.Optional), who applied fixes to their libraries so that Boost.URL could proceed. The Beast2 usability feedback came from the Beast2 authors as they integrated Boost.URL into the new design.

The work on Boost.URL has shifted. The problems we are solving now (edge cases found by professional auditors, compiler limitations for constexpr, usability friction from real integrations) are different from the problems we used to solve. They are smaller and more specific, but they matter more because real people hit them.

The complete set of changes is available in the Boost.URL repository.
