Building a VGI worker
A worker is a small program that exposes functions and tables to DuckDB. Write it in your language, run it, and ATTACH it — that's the whole contract.
The contract
A worker is a small program
A function is a class with a typed compute method;
you list your functions on a Worker and run it.
The reference SDK is Python, and
the same shape exists in other languages.
Its docstring becomes the function's description in DuckDB.
# Install the reference Python SDK.
pip install vgi from typing import Annotated
import pyarrow as pa
import pyarrow.compute as pc
from vgi import ScalarFunction, Param, Returns, Worker
class UpperCase(ScalarFunction):
"""Convert string values to uppercase."""
@classmethod
def compute(
cls,
value: Annotated[pa.StringArray, Param(doc="String value to uppercase")],
) -> Annotated[pa.StringArray, Returns()]:
return pc.utf8_upper(value)
class MyWorker(Worker):
catalog_name = "my_funcs"
functions = [UpperCase]
if __name__ == "__main__":
MyWorker().run()
Types are declared with Annotated hints, and
compute receives and returns
Apache Arrow arrays — so you work on
whole columns at once (here, with pyarrow.compute)
rather than row by row. If compute raises, the
error surfaces back in the DuckDB query.
Build → run → attach → query
Attach it like any worker
Point ATTACH at your script. With a local path,
VGI launches the worker as a pooled subprocess for you; with an
http(s):// URL it reaches a worker you host. Either
way, the functions and tables it exposes are now ordinary SQL.
-- In DuckDB (or Haybarn) with the VGI extension loaded:
INSTALL vgi FROM community;
LOAD vgi;
-- Attach your worker. With a path, VGI launches it as a subprocess for you.
ATTACH 'my_funcs' (TYPE vgi, LOCATION './my_worker.py');
-- Its functions are now native SQL.
SELECT upper_case(name) FROM users; See Architecture for the transport options (local subprocess, HTTP, Unix socket, and the launcher that keeps one warm worker shared across clients).
Beyond scalar UDFs
Every function shape
A worker isn't limited to one-value-in, one-value-out. It can implement the full range of DuckDB function shapes, all surfaced as native SQL.
Scalar
One value in, one value out per row — like the UpperCase example. The everyday UDF.
Table
Returns a set of rows. The common case for wrapping an API, dataset, or service as a queryable table.
Table-in-out (streaming)
Receives rows and emits rows as it goes — for transforms that process a stream without buffering all of it.
Buffered table
Sees every input row before emitting any output — for operations that need the whole input first.
Aggregate
Combines many rows into one result, usable in GROUP BY like SUM or COUNT.
Windowed aggregate
An aggregate evaluated over a moving window, usable with OVER (...).
Workers also expose schemas, tables, and views with column statistics and pushdown — see the Python SDK examples for each shape.