A postgres library for Boost
Do you know Boost.MySQL? If you’ve been reading my posts, you probably do. Many people have wondered ‘why not Postgres?’. Well, the time is now. TL;DR: I’m writing the equivalent of Boost.MySQL, but for PostgreSQL. You can find the code here.
Since libPQ is already a good library, the NativePG project intends to be more ambitious than Boost.MySQL. In addition to the expected Asio interface, I intend to provide a sans-io API that exposes primitives like message serialization.
Throughout this post, I will go into the intended library design and the rationale behind it.
The lowest level: message serialization
PostgreSQL clients communicate with the server using a binary protocol on top of TCP, termed the frontend/backend protocol. The protocol defines a set of messages used for interactions. For example, when running a query, the following happens:
┌────────┐ ┌────────┐
│ Client │ │ Server │
└───┬────┘ └───┬────┘
│ │
│ Query │
│ ──────────────────────────────────────────> │
│ │
│ RowDescription │
│ <────────────────────────────────────────── │
│ │
│ DataRow │
│ <────────────────────────────────────────── │
│ │
│ CommandComplete │
│ <────────────────────────────────────────── │
│ │
│ ReadyForQuery │
│ <────────────────────────────────────────── │
│ │
At this lowest layer, the library provides functions to serialize and parse such messages. The goal is to be as efficient as possible: parsing functions are non-allocating and use an approach inspired by Boost.URL collections.
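To give a feel for the wire format this layer deals with, here is a minimal, standalone sketch of serializing a Query message: a 1-byte 'Q' tag, a big-endian Int32 length that counts everything after the tag (including itself), and the SQL text as a null-terminated string. It uses a plain std::vector instead of the library's actual buffer types, so take it as an illustration rather than the real implementation:
#include <cstdint>
#include <string_view>
#include <vector>

// Illustration only: serializes a Query ('Q') frontend message into a byte buffer
void serialize_query_message(std::string_view sql, std::vector<unsigned char>& buf)
{
    // The length field counts itself (4 bytes) plus the SQL text and its null terminator
    std::uint32_t len = 4 + static_cast<std::uint32_t>(sql.size()) + 1;
    buf.push_back('Q'); // message type tag
    for (int shift = 24; shift >= 0; shift -= 8)
        buf.push_back(static_cast<unsigned char>(len >> shift)); // big-endian length
    buf.insert(buf.end(), sql.begin(), sql.end()); // the query text
    buf.push_back('\0'); // null terminator
}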
Parsing database types
The PostgreSQL type system is quite rich. In addition to the usual SQL built-in types, it supports more advanced types such as UUIDs, arrays, and user-defined aggregates.
When running a query, libPQ exposes retrieved data as either raw text or bytes. This is what the server sends in the DataRow packets shown above. To do something useful with the data, users will likely need to parse and serialize such types.
The next layer of NativePG is in charge of providing such functions. This will likely contain some extension points for users to plug in their types. This is the general form of such functions:
system::error_code parse(span<const std::byte> from, T& to, const connection_state&);
void serialize(const T& from, dynamic_buffer& to, const connection_state&);
Note that some types might require access to session configuration. For instance, dates may be expressed using different wire formats depending on the connection’s runtime settings.
At the time of writing, only ints and strings are supported, but this will be extended soon.
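As a concrete illustration of what such a parse function involves, an INTEGER transmitted in binary format is just a 4-byte big-endian value. The sketch below uses std:: vocabulary types and a bool return to stay self-contained; the real functions follow the error_code-based signatures shown above:
#include <cstddef>
#include <cstdint>
#include <span>

// Illustration only: decodes a binary-format int4 column value (4 bytes, big-endian)
bool parse_int4(std::span<const std::byte> from, std::int32_t& to)
{
    if (from.size() != 4u)
        return false; // malformed value
    std::uint32_t value = 0;
    for (std::byte b : from)
        value = (value << 8) | std::to_integer<std::uint32_t>(b); // accumulate MSB first
    to = static_cast<std::int32_t>(value);
    return true;
}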
Composing requests
Efficiency in database communication is achieved with pipelining. A network round-trip with the server is worth a thousand allocations in the client. It is thus critical that:
- The protocol properly supports pipelining. This is the case with PostgreSQL.
- The client exposes an interface to it and makes it very easy to use. libPQ does the first; NativePG intends to achieve the second.
NativePG pipelines by default. A request object is always a pipeline:
// Create a request
request req;
// These two queries will be executed as part of a pipeline
req.add_query("SELECT * FROM libs WHERE author = $1", {"Ruben"});
req.add_query("DELETE FROM libs WHERE author <> $1", {"Ruben"});
Everything you may ask the server can be added to a request. This includes preparing and executing statements, establishing pipeline synchronization points, and so on. The request API aims to be close enough to the protocol to be powerful, while also exposing high-level functions to make things easier.
Reading responses
Like request, the core response mechanism aims to be as close
to the protocol as possible. Since use cases here are much more varied,
there is no single response class but a concept instead.
This is what a response_handler looks like:
struct my_handler {
// Check that the handler is compatible with the request,
// and prepare any required data structures. Called once at the beginning
handler_setup_result setup(const request& req, std::size_t pipeline_offset);
// Called once for every message received from the server
// (e.g. `RowDescription`, `DataRow`, `CommandComplete`)
void on_message(const any_request_message& msg);
// The overall result of the operation (error_code + diagnostic string).
// Called after the operation has finished.
const extended_error& result() const;
};
Note that on_message is not allowed to report errors.
Even if a handler encounters a problem with a message
(imagine finding a NULL for a field where the user isn’t expecting one),
this is a user error, rather than a protocol error.
Subsequent steps in the pipeline must not be affected by this.
This is powerful but very low-level. Using this mechanism, the library exposes an interface to parse the result of a query into a user-supplied struct, using Boost.Describe:
struct library
{
std::int32_t id;
std::string name;
std::string cpp_version;
};
BOOST_DESCRIBE_STRUCT(library, (), (id, name, cpp_version))
// ...
std::vector<library> libs;
auto handler = nativepg::into(libs); // this is a valid response_handler
Network algorithms
Given a user request and response handler, how do we send these to the server?
We need a set of network algorithms to achieve this. Some of these are trivial:
sending a request to the server is an asio::write on the request’s buffer.
Others, however, are more involved:
- Reading a pipeline response needs to verify that the message sequence is what we expected, for security, and handle errors gracefully.
- The handshake algorithm, in charge of authentication when we connect to the server, needs to respond to server authentication challenges, which may come in different forms.
Writing these using asio::async_compose is problematic because:
- They become tied to Boost.Asio.
- They are difficult to test.
- They result in long compile times and code bloat due to templating.
At the moment, these are written as finite state machines, similar to how OpenSSL behaves in non-blocking mode:
// Reads the response of a pipeline (simplified).
// This is a hand-wired generator.
class read_response_fsm {
public:
// User-supplied arguments: request and response
read_response_fsm(const request& req, response_handler_ref handler);
// Yielded to signal that we should read from the server
struct read_args { span<std::byte> buffer; };
// Yielded to signal that we're done
struct done_args { system::error_code result; };
variant<read_args, done_args>
resume(connection_state&, system::error_code io_result, std::size_t bytes_transferred);
};
The idea is that higher-level code should call resume until it returns a done_args value. This allows decoupling from the underlying I/O runtime.
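For illustration, a minimal Asio-based driver for this FSM could look roughly like the sketch below. It assumes a connected TCP socket and variant-style get/get_if accessors; the names and completion tokens are illustrative, not the library's final API:
// Illustration only: pumps the FSM, reading from the socket whenever it asks for more bytes
asio::awaitable<system::error_code> async_read_response(
    asio::ip::tcp::socket& sock,
    read_response_fsm& fsm,
    connection_state& st
)
{
    system::error_code io_ec;
    std::size_t bytes_transferred = 0;
    for (;;)
    {
        auto action = fsm.resume(st, io_ec, bytes_transferred);
        if (auto* done = get_if<read_response_fsm::done_args>(&action))
            co_return done->result; // the FSM has finished
        // Otherwise the FSM yielded read_args: fill its buffer with more data
        auto& rd = get<read_response_fsm::read_args>(action);
        std::tie(io_ec, bytes_transferred) = co_await sock.async_read_some(
            asio::buffer(rd.buffer.data(), rd.buffer.size()),
            asio::as_tuple(asio::use_awaitable)
        );
    }
}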
Since NativePG targets C++20, I'm considering rewriting these state machines as coroutines. Boost.Capy (currently under development - hopefully part of Boost soon) could be a good candidate for this.
Putting everything together: the Asio interface
At the end of the day, most users just want a connection object they can easily
use. Once all the sans-io parts are working, writing it is pretty straightforward.
This is what end user code looks like:
// Create a connection
connection conn{co_await asio::this_coro::executor};
// Connect
co_await conn.async_connect(
{.hostname = "localhost", .username = "postgres", .password = "", .database = "postgres"}
);
std::cout << "Startup complete\n";
// Compose our request and response
request req;
req.add_query("SELECT * FROM libs WHERE author = $1", {"Ruben"});
std::vector<library> libs;
// Run the request
co_await conn.async_exec(req, into(libs));
Auto-batch connections
While connection is good, experience has shown me that it's still too low-level for most users:
- Connection establishment is manual with async_connect.
- No built-in reconnection or health checks.
- No built-in concurrent execution of requests. That is, async_exec first writes the request, then reads the response. Other requests may not be executed during this period. This limits the connection's throughput.
For this reason, NativePG will provide some higher-level interfaces that will make server communication easier and more efficient. To get a feel for what we need, we should first understand the two main usage patterns we expect.
Most of the time, connections are used in a stateless way. For example, consider querying data from the server:
request req;
req.add_query("SELECT * FROM libs WHERE author = $1", {"Ruben"});
co_await conn.async_exec(req, res);
This query is not mutating connection state in any way. Other queries could be inserted before and after it without making any difference.
I plan to add a higher-level connection type, similar to
redis::connection in Boost.Redis, that automatically
batches concurrent requests and handles reconnection.
The key differences with connection would be:
- Several independent tasks can share an auto-batch connection (see the sketch after this list). This is an error for connection.
- If several requests are queued at the same time, the connection may send them together to the server using a single system call.
- There is no async_connect in an auto-batch connection. Reconnection is handled automatically.
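As a usage sketch (the auto_batch_connection name and its construction are purely hypothetical at this point), two independent coroutines could issue requests against the same object; if their requests overlap in time, the connection may write them to the server as a single batch:
// Hypothetical: type name and construction are not final; ex is some Asio executor
auto_batch_connection conn{ex, {.hostname = "localhost", .username = "postgres", .password = "", .database = "postgres"}};

// Task 1: a query
asio::co_spawn(ex, [&]() -> asio::awaitable<void> {
    request req;
    req.add_query("SELECT * FROM libs WHERE author = $1", {"Ruben"});
    std::vector<library> libs;
    co_await conn.async_exec(req, into(libs));
}, asio::detached);

// Task 2: a delete, potentially batched together with task 1's request
asio::co_spawn(ex, [&]() -> asio::awaitable<void> {
    request req;
    req.add_query("DELETE FROM libs WHERE author <> $1", {"Ruben"});
    co_await conn.async_exec(req, ignore);
}, asio::detached);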
Note that this pattern is not exclusive to read-only or individual queries. Transactions can work by using protocol features:
request req;
req.set_autosync(false); // All subsequent queries are part of the same transaction
req.add_query("UPDATE table1 SET x = $1 WHERE y = 2", {42});
req.add_query("UPDATE table2 SET x = $1 WHERE y = 42", {2});
req.add_sync(); // The two updates run atomically
co_await conn.async_exec(req, res);
Connection pools
I mentioned there were two main usage scenarios in the library. Sometimes, it is required to use connections in a stateful way:
request req;
req.add_simple_query("BEGIN"); // start a transaction manually
req.add_query("SELECT * FROM library WHERE author = $1 FOR UPDATE", {"Ruben"}); // lock rows
library lib;
co_await conn.async_exec(req, into(lib));
// Do something in the client that depends on lib
if (lib.name == "Boost.MySQL")
co_return; // don't
// Now compose another request that depends on what we read from lib
req.clear();
req.add_query("UPDATE library SET status = 'deprecated' WHERE id = $1", {lib.id});
req.add_simple_query("COMMIT");
co_await conn.async_exec(req, ignore);
The key point here is that this pattern requires exclusive access to conn.
No other requests should be interleaved between the first and the second
async_exec invocations.
The best way to solve this is by using a connection pool. This is what client code could look like:
co_await pool.async_exec([&] (connection& conn) -> asio::awaitable<system::error_code> {
request req;
req.add_simple_query("BEGIN");
req.add_query("SELECT balance, status FROM accounts WHERE user_id = $1 FOR UPDATE", {user_id});
account_info acc;
co_await conn.async_exec(req, into(acc));
// Check if account has sufficient funds and is active
if (acc.balance < payment_amount || acc.status != "active")
co_return error::insufficient_funds;
// Call external payment gateway API - this CANNOT be done in SQL
auto result = co_await payment_gateway.process_charge(user_id, payment_amount);
// Compose next request based on the external API response
req.clear();
if (result.success) {
req.add_query(
"UPDATE accounts SET balance = balance - $1 WHERE user_id = $2",
{payment_amount, user_id}
);
req.add_simple_query("COMMIT");
} else {
req.add_simple_query("ROLLBACK"); // release the row lock if the charge failed
}
co_await conn.async_exec(req, ignore);
// The connection is automatically returned to the pool when this coroutine completes
co_return result.success ? error_code{} : error::payment_failed;
});
I explicitly want to avoid having a connection_pool::async_get_connection()
function like the one in Boost.MySQL. This function returns a proxy object that grants access
to a free connection. When destroyed, the connection is returned to the pool.
This pattern looks great on paper, but runs into severe complications in
multi-threaded code. The proxy object’s destructor needs to mutate the pool’s state,
thus needing at least an asio::dispatch to the pool’s executor, which may or may not
be a strand. It is so easy to get wrong that Boost.MySQL added a pool_params::thread_safe boolean
option to take care of this automatically, adding extra complexity. Definitely something to avoid.
SQL formatting
As we’ve seen, the protocol has built-in support for adding
parameters to queries (see placeholders like $1). These placeholders
are expanded securely on the server.
While this covers most cases, sometimes we need to generate SQL
that is too dynamic to be handled by the server. For instance,
a website might allow multiple optional filters, translating into
WHERE clauses that might or might not be present.
These use cases require SQL generated in the client. To do so,
we need a way of formatting user-supplied values without
running into SQL injection vulnerabilities. The final piece
of the library becomes a format_sql function akin to the
one in Boost.MySQL.
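To make this concrete, client-side generation of an optional filter might look like the sketch below, modeled after Boost.MySQL's format_sql. The names used here (nativepg::format_sql, conn.format_opts()) are illustrative only:
// Hypothetical sketch: the user-supplied value is escaped on the client
std::string sql = "SELECT * FROM libs";
if (author_filter) // e.g. an std::optional<std::string> coming from a web form
    sql += nativepg::format_sql(conn.format_opts(), " WHERE author = {}", *author_filter);
request req;
req.add_simple_query(sql);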
Final thoughts
While the plan is clear, there is still much to be done here. There are dedicated APIs for high-throughput data copying and push notifications that need to be implemented. Some of the described APIs have a solid working implementation, while others still need some work. All in all, I hope that this library can soon reach a state where it can be useful to people.