Ruben's Q2 2024 Update

Jul 9, 2024

On C++20 modules and Boost

This quarter started with exciting discussions about the possibility of introducing C++20 modules into Boost. I’ve dedicated a lot of time to studying and reducing Boost.MySQL build times, so I promptly volunteered to investigate the benefits and costs of modules.

I’ve written two articles (available here and here) about this topic. They can be roughly summed up as:

  • Module clean builds aren’t as fast as I’d expect, but partial re-builds are much faster.
  • Modules are still poorly supported by compilers and tooling, although work on them is ongoing.
  • Making a library consumable as a module requires non-trivial development, testing and maintenance effort.
  • At the time of writing, mixing standard includes and imports causes problems (although the standard says this should work). As a consequence, if we’re to support consuming a certain Boost library as a module, all its dependencies must support it, too.
  • At the time of writing, most of the Boost community considers the effort too big, and prefers to wait until modules are more stable.
  • Distributing modules so they can be consumed by CMake is highly non-trivial and one of the biggest blockers.

I’m glad this discussion took place. We’ve all learnt a lot, and we now know enough to make informed decisions.

using std::cpp 2024

I’ve had the honor of getting a talk on Boost.Asio’s universal async model accepted for using std::cpp 2024. You can find the slides and code samples here.

The best part has been getting in touch with the community. I’ve met a lot of real Asio users and been able to discuss their pain points in person. I’m happy to see most people working with C++17 and C++20 rather than older standards. Many people expressed interest in C++20 coroutines but were not willing to roll their own awaitable types: they just wanted to call co_await and go. I was surprised to learn that they didn’t know asio::deferred could already do this for them, so I think I got the talk topic right.

I think we need to lead by example, and I’d like to rewrite servertech-chat using C++20 coroutines, since it’s a great example for newcomers to learn from.

I’ve also been answering many of the questions coming up in #boost-asio in Slack.

Client-side SQL formatting enhancements

Boost.MySQL’s client-side SQL formatting allows composing queries on the client without introducing injection vulnerabilities. It works great for simple queries, but it was too verbose for cases involving ranges. Consider a batch lookup:

// Retrieve all users matching the IDs provided in ids
asio::awaitable<std::vector<user>> lookup_users(mysql::any_connection& conn, std::span<const std::int64_t> ids)
{
    // Compose the query
    mysql::format_context ctx(conn.format_opts().value());
    ctx.append_raw("SELECT * FROM user WHERE id IN (");
    bool is_first = true;
    for (auto id: ids)
    {
        // Comma separator
        if (!is_first) ctx.append_raw(", ");
        is_first = false;

        // Actual id
        mysql::format_sql_to(ctx, "{}", id);
    }
    ctx.append_raw(")");
    std::string query = std::move(ctx).get().value();

    // Run it
    mysql::static_results<user> res;
    co_await conn.async_execute(query, res);
    co_return {res.rows().begin(), res.rows().end()};
}

That’s verbose and easy to get wrong, and an error here can introduce a vulnerability. To solve this, we’ve added built-in support for ranges:

asio::awaitable<std::vector<user>> lookup_users(mysql::any_connection& conn, std::span<const std::int64_t> ids)
{
    // Compose the query. May generate "SELECT * FROM user WHERE id IN (10, 21, 202)"
    auto query = mysql::format_sql(conn.format_opts().value(), "SELECT * FROM user WHERE id IN ({})", ids);

    // Run it
    mysql::static_results<user> res;
    co_await conn.async_execute(query, res);
    co_return {res.rows().begin(), res.rows().end()};
}

Much better, isn’t it? And if you need additional functionality, mysql::sequence lets you pass custom glue strings and per-element formatting functions. Most cases, including batch inserts, can be expressed in terms of a single format string.
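To make the "glue string plus per-element formatting function" idea concrete, here is a minimal standalone sketch that uses only the standard library. Everything in it (toy_context, toy_sequence, employee, compose_batch_insert) is a hypothetical stand-in written for this post, not the Boost.MySQL API or its implementation, and the string quoting is deliberately naive, so don't use it against a real server.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-in for a formatting context: it only accumulates text.
// Hypothetical type, NOT Boost.MySQL's format_context.
struct toy_context
{
    std::string out;

    // Appends trusted SQL text verbatim
    void append_raw(const std::string& sql) { out += sql; }

    // Integers need no escaping
    void append_value(std::int64_t v) { out += std::to_string(v); }

    // Naive quoting, for illustration only: backslash-escapes quotes and backslashes.
    // Real escaping depends on the connection's character set and SQL mode.
    void append_value(const std::string& s)
    {
        out += '\'';
        for (char c : s)
        {
            if (c == '\'' || c == '\\') out += '\\';
            out += c;
        }
        out += '\'';
    }
};

// Formats every element of range by calling fn(element, ctx),
// inserting glue between consecutive elements
template <class Range, class FormatFn>
void toy_sequence(toy_context& ctx, const Range& range, FormatFn fn, const std::string& glue = ", ")
{
    bool is_first = true;
    for (const auto& elm : range)
    {
        if (!is_first) ctx.append_raw(glue);
        is_first = false;
        fn(elm, ctx);
    }
}

// Hypothetical row type for the batch insert below
struct employee
{
    std::string first_name;
    std::string last_name;
};

// Composes "INSERT INTO employee (first_name, last_name) VALUES (...), (...)"
std::string compose_batch_insert(const std::vector<employee>& employees)
{
    assert(!employees.empty());  // an empty VALUES list is not valid SQL
    toy_context ctx;
    ctx.append_raw("INSERT INTO employee (first_name, last_name) VALUES ");
    toy_sequence(ctx, employees, [](const employee& e, toy_context& c) {
        c.append_raw("(");
        c.append_value(e.first_name);
        c.append_raw(", ");
        c.append_value(e.last_name);
        c.append_raw(")");
    });
    return ctx.out;
}
```

The point of the design is that the separator bookkeeping and the escaping each live in exactly one place, so callers can’t forget either of them. The library version additionally has to deal with character sets and error reporting, which this sketch ignores.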

Our next step here is implementing an easy-to-use execution request that coalesces composing the query and executing it into a single step. This came up during the review, and it’s finally going to be a reality.

Pipeline mode

The MySQL client/server protocol is strictly half-duplex: the client sends a request, and the server responds. My measurements show that some use cases involving lightweight requests are dominated by round-trip time. In these cases, coalescing individual requests into a single message helps performance.

Use cases fitting this description include connection setup code and preparing/closing statements in batch.
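A back-of-the-envelope model shows why round-trip time dominates for lightweight requests. The sketch below is a deliberate simplification: it ignores serialization cost, bandwidth and any server-side batching effects, and the workload struct and both function names are hypothetical names chosen for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// All durations are in microseconds. Any numbers fed in are hypothetical.
struct workload
{
    std::int64_t num_requests;    // number of lightweight requests to run
    std::int64_t round_trip_us;   // network round-trip time
    std::int64_t server_time_us;  // server processing time per request
};

// One request at a time: every request pays a full round trip
std::int64_t sequential_latency_us(const workload& w)
{
    assert(w.num_requests >= 0);
    return w.num_requests * (w.round_trip_us + w.server_time_us);
}

// Pipelined: all requests travel in a single message, so the round trip
// is paid once, while the server still processes each request in turn
std::int64_t pipelined_latency_us(const workload& w)
{
    assert(w.num_requests >= 0);
    if (w.num_requests == 0) return 0;
    return w.round_trip_us + w.num_requests * w.server_time_us;
}
```

For instance, with three requests, a 500 microsecond round trip and 50 microseconds of server time each, the sequential estimate is 1650 microseconds and the pipelined one is 650. The gap shrinks as per-request server time grows, which is why the benefit is concentrated in lightweight requests.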

The connection_pool class has been using this feature internally for a release, and we now expose it to the general public:

// Sets up a connection for re-use. connection_pool cleans up connections in a similar way
asio::awaitable<void> setup_connection(mysql::any_connection& conn)
{
    // Build a pipeline describing what to do.
    mysql::pipeline_request req;
    req.add_reset_connection() // wipe session state
        .add_set_character_set(mysql::utf8mb4_charset) // SET NAMES utf8mb4
        .add_execute("SET time_zone = '+00:00'"); // Use UTC as time zone
    std::vector<mysql::stage_response> res;

    // Execute the pipeline
    co_await conn.async_run_pipeline(req, res, asio::deferred);
}

We can write this because we fully control the library’s serialization and networking, rather than wrapping other MySQL clients.

Other Boost.MySQL work

I’ve also worked on smaller (but necessary) Boost.MySQL tasks, including enabling buffer size limits for any_connection, adding support for C++20 time types to our date and datetime types, as well as some bug fixing and refactoring.
