Building a VGI worker

A worker is a small program that exposes functions and tables to DuckDB. Write it in your language, run it, and ATTACH it — that's the whole contract.

The contract

A worker is a small program

A function is a class with a typed compute method; you list your functions on a Worker and run it. The reference SDK is Python, and the same shape exists in other languages. Its docstring becomes the function's description in DuckDB.

terminal
# Install the reference Python SDK.
pip install vgi
my_worker.py
from typing import Annotated
import pyarrow as pa
import pyarrow.compute as pc
from vgi import ScalarFunction, Param, Returns, Worker


class UpperCase(ScalarFunction):
    """Convert string values to uppercase."""

    @classmethod
    def compute(
        cls,
        value: Annotated[pa.StringArray, Param(doc="String value to uppercase")],
    ) -> Annotated[pa.StringArray, Returns()]:
        return pc.utf8_upper(value)


class MyWorker(Worker):
    catalog_name = "my_funcs"
    functions = [UpperCase]


if __name__ == "__main__":
    MyWorker().run()

Types are declared with Annotated hints, and compute receives and returns Apache Arrow arrays — so you work on whole columns at once (here, with pyarrow.compute) rather than row by row. If compute raises, the error surfaces back in the DuckDB query.

Build → run → attach → query

Attach it like any worker

Point ATTACH at your script. With a local path, VGI launches the worker as a pooled subprocess for you; with an http(s):// URL it reaches a worker you host. Either way, the functions and tables it exposes are now ordinary SQL.

DuckDB / Haybarn shell
-- In DuckDB (or Haybarn) with the VGI extension loaded:
INSTALL vgi FROM community;
LOAD vgi;

-- Attach your worker. With a path, VGI launches it as a subprocess for you.
ATTACH 'my_funcs' (TYPE vgi, LOCATION './my_worker.py');

-- Its functions are now native SQL.
SELECT upper_case(name) FROM users;

See Architecture for the transport options (local subprocess, HTTP, Unix socket, and the launcher that keeps one warm worker shared across clients).

Beyond scalar UDFs

Every function shape

A worker isn't limited to one-value-in, one-value-out. It can implement the full range of DuckDB function shapes, all surfaced as native SQL.

Scalar

One value in, one value out per row — like the UpperCase example. The everyday UDF.

Table

Returns a set of rows. The common case for wrapping an API, dataset, or service as a queryable table.

Table-in-out (streaming)

Receives rows and emits rows as it goes — for transforms that process a stream without buffering all of it.

Buffered table

Sees every input row before emitting any output — for operations that need the whole input first.

Aggregate

Combines many rows into one result, usable in GROUP BY like SUM or COUNT.

Windowed aggregate

An aggregate evaluated over a moving window, usable with OVER (...).

Workers also expose schemas, tables, and views with column statistics and pushdown — see the Python SDK examples for each shape.