Apache Arrow over a pluggable wire.
A DuckDB-side extension, your worker, and a transport between them — with a bind / process / finalize lifecycle that maps cleanly onto DuckDB's vectorized engine.
Big picture
Your systems, arriving as SQL
DuckDB issues an Arrow IPC request through the transport; the worker dispatches to the matching method and streams an Arrow IPC response back. The wire is the same whether the worker is a subprocess or a service across the network.
The transport layer
Same protocol, your choice of pipe
vgi-rpc abstracts the wire so you can switch transports without changing your worker.
Pipes
OS pipes between processes on the same machine. Lowest setup cost. The default for the spawned-subprocess case.
Unix sockets
Local domain sockets when the worker is already running. Same performance envelope as pipes, and the worker keeps running independently of the DuckDB process.
Shared memory
Zero-copy Arrow IPC over a memory region. Highest throughput for very large batches on the same host. Opt-in.
HTTP
Workers anywhere on the network. Trades latency for reach — useful for shared inference services and remote teams.
Benchmarks and a comparison matrix live on the dedicated RPC site. vgi-rpc.query.farm →
Function lifecycle
bind → process → finalize
Every VGI function (scalar, table, aggregate) follows the same lifecycle, just with different hooks.
bind
DuckDB asks the worker what types it returns and validates argument types. Type bounds (`type_bound=...`) are enforced here, before any data moves. Errors at this stage are reported as SQL plan errors.
process
For each Arrow record batch DuckDB hands the worker, the worker runs `compute()` and returns the result batch. The vectorized engine on both sides keeps this loop tight.
finalize
Aggregates and table-in/out functions get a final call to flush state — emit the running aggregate, close the cursor, release resources.
For aggregates this maps to initialize / update / finalize on the Python class:
class RowCount(AggregateFunction):
    @classmethod
    def initialize(cls) -> int:
        return 0

    @classmethod
    def update(cls, state: int, batch: pa.RecordBatch) -> int:
        return state + batch.num_rows

    @classmethod
    def finalize(cls, state: int) -> int:
        return state
Wire format
Apache Arrow IPC, all the way down
Every request and response is an Arrow IPC stream — a self-describing schema followed by record batches. That's it. There's no bespoke serialization, no JSON envelope, no language-specific wire types. If you can read Arrow, you can implement a worker.
The full byte-level specification — opcodes, framing, error handling — is published on the RPC site.
Read the Wire Protocol Spec →