📡

Events

Hook into DuckDB's internal events — query begin/end, transactions, connections — and ship them as JSON to any external program via stdin. Audit logging, error alerting, observability integration, and SIEM feeds, all configured with PRAGMAs.

Install

-- Install the extension
INSTALL events FROM community;

-- Load it into your session
LOAD events;

-- Configure where events go and what to capture
SET events_destination = '/usr/local/bin/event-handler.py';
SET events_types = ['query_begin', 'query_end'];

-- Optional: don't block queries on the handler
SET events_async = TRUE;

-- Optional: tag every event with a session name
SET events_session_name = 'analytics-replica';

-- Every query now produces a JSON event line on the handler's stdin

Technical Overview

Why Use Events?

Capture DuckDB's internal lifecycle — queries executed, transactions opened and closed, errors raised, connections churning — and stream the timeline as JSON lines into any external program. In-process telemetry, audit trails, cost tracking, and SIEM feeds, all wired up with PRAGMAs.

📡 What this extension is for

Events makes the otherwise-opaque inside of a DuckDB process observable. You get a structured event line for every query, transaction, and connection — without instrumenting your application code or scraping logs after the fact.

  • Audit trails: Capture query_begin / query_end for every statement and ship the JSON to a tamper-resistant log store. Every query, with timestamps, error status, and the originating session name.
  • Error alerting: query_end carries has_error, error_message, and error_type when a query fails — pipe the stream into a handler that pages on regressions or unexpected error types.
  • Cost and usage tracking: Pair query_begin and query_end by query_id to get per-query latency and per-session volume. Use events_session_name to attribute events to tenants, jobs, or replicas.
  • Transaction lifecycle observability: transaction_begin / transaction_commit / transaction_rollback each carry the same transaction_id and is_read_only flag — so you can reconstruct exactly which statements ran inside which transaction.

⚙️ How it works

The extension hooks DuckDB's internal event surface and pipes one JSON object per event into the stdin of an external program. There is no SQL function to call — configuration is entirely PRAGMA-driven, so wiring up event capture is four SET statements.

  • Configure with PRAGMAs: Set events_destination to a fully-qualified executable path. Pick the event types to capture with events_types. Optional: events_session_name tags every event, and events_async switches between synchronous and fire-and-forget delivery.
  • Seven event types: connection_opened, connection_closed, query_begin, query_end, transaction_begin, transaction_commit, and transaction_rollback. Default capture is ['query_begin', 'query_end'] — the smallest set that still gives you query-level audit visibility.
  • JSON lines on stdin: Every event becomes one line of JSON on the handler's stdin. Common fields are event, timestamp (ISO 8601), database_path, session_name, connection_id, and process_id. Type-specific fields layer on top — query_id, transaction_id, error_message, etc.
  • Sync or fire-and-forget: Default delivery is synchronous: DuckDB writes the event and waits for the handler to read it before returning from the originating query. Reliable, but each event adds latency. SET events_async = TRUE switches to fire-and-forget — no blocking, but events can be dropped if the handler is slow or dies.

🛡️ Production caveats

What to know before pointing this at a production database.

  • Process-scoped, not durable: Configuration lives in the running DuckDB process — restart and it's gone. Set the PRAGMAs at session start (or wrap them in a startup script). Same caveat as the cronjob extension: events are observable while the process is up, not durable across restarts.
  • One handler per session: Only one destination at a time. If you need fan-out — file plus Kafka plus alerting — point events_destination at a small dispatcher that reads stdin and forks the stream itself (tee, a Python script, vector, etc.).
  • Sync mode blocks queries: In synchronous mode (the default) a slow or hung handler will hold up every query. Add timeouts in the handler, or switch to events_async = TRUE for high-throughput workloads where the occasional dropped event is acceptable.
  • Handler runs as DuckDB: The destination program is launched by the DuckDB process and inherits its privileges. Treat the events_destination path as security-sensitive — point it at a fully-qualified, read-only-by-others script.

🎯 Common Use Cases

In-process audit log

Capture query_begin / query_end and append the JSON lines to a daily file or a tamper-evident store. Every statement that touched the database, with timestamps and error status.

Real-time error alerting

Filter query_end events for has_error = true and post the payload to PagerDuty, OpsGenie, Slack, or your SIEM. No log scraping, no polling — events arrive as they happen.

Per-session cost attribution

Tag each DuckDB process with events_session_name (tenant id, job id, replica name) and emit query_begin/query_end pairs. Aggregate downstream for query latency, count, and error rate per attribution key.

Transaction replay and forensics

Capture all three transaction events plus query events. Group by transaction_id to reconstruct exactly which statements ran inside each commit or rollback — useful for incident postmortems.

Deep Dive

Technical Details

What you can do with four PRAGMAs

The single most useful pattern: turn every query DuckDB runs into a structured JSON line on the stdin of a program of your choice, with no SQL functions to call and no application code to change:

LOAD events;

SET events_destination  = '/usr/local/bin/event-handler.py';
SET events_types        = ['query_begin', 'query_end'];
SET events_session_name = 'analytics-replica';
SET events_async        = TRUE;

From this point on, every statement you run produces a query_begin and query_end line on the handler’s stdin — paired by query_id, tagged with the session name, with has_error and error_message populated when something goes wrong. The handler can be a shell script, a Python service, a Go binary, or anything else that reads stdin a line at a time.

Process-scoped, just like cronjob

Events is configured per running DuckDB process, not in the database file. PRAGMAs are forgotten on restart — set them again at session start, or stash them in a startup script. The same caveat applies to the sibling cronjob extension: both are “operate your DuckDB process” tools and observability is only available while the process is up.

For a durable audit trail, point events_destination at a handler that writes to durable storage (file, S3, Kafka, syslog) — the destination is durable, the configuration is not.

How delivery works

The extension launches the program at events_destination as a child process and writes one JSON event per line to its stdin. The handler is a long-lived consumer — DuckDB does not re-spawn it per event.

  • Synchronous (default). DuckDB writes the event line and waits for the handler to consume it before returning from the originating query. Reliable: every event the database emits is acknowledged by the handler before the query completes. The cost is latency on every query.
  • SET events_async = TRUE. Fire-and-forget. DuckDB does not block on the handler. If the handler stalls, dies, or can’t keep up, events are dropped silently. Use for high-throughput workloads where “best-effort observability” is the right tradeoff.

There’s no in-between mode — pick one per session.

Event types and what they carry

All events share a common envelope: event, timestamp (ISO 8601), database_path, session_name, connection_id, process_id. Per-type fields layer on top:

EventExtra fields
connection_opened, connection_closed(envelope only)
query_beginquery_id, transaction_id, attached_databases[]
query_endquery_id, transaction_id, has_error, error_message?, error_type?, attached_databases[]
transaction_begin, transaction_committransaction_id, start_timestamp, is_read_only
transaction_rollbacktransaction_id, start_timestamp, is_read_only, has_error, error_message?, error_type?

query_id correlates query_begin and query_end for the same statement; transaction_id ties query events to the surrounding transaction. attached_databases is an array of {name, path, type, read_only, temporary} records describing every database visible to the connection at event time — useful when you have multiple ATTACH-ed catalogs and want to attribute activity to one of them.

Choosing what to capture

The default ['query_begin', 'query_end'] is the right starting point: query-level audit visibility, two events per statement. Scale up or down as needed:

  • Audit-only: keep the default. Two events per query, both with query_id so you can pair them.
  • Errors-only: ['query_end'] and filter has_error = true in the handler — half the volume.
  • Connection churn: add connection_opened, connection_closed to spot connection leaks or ephemeral session storms.
  • Transactional forensics: add the three transaction_* events; group by transaction_id to reconstruct exactly which statements ran inside each commit or rollback.

There’s no per-query filter at the extension layer — every event of an enabled type is emitted. Filter downstream in the handler if you need to.

Operational guidance

  • Path must be fully qualified. events_destination is invoked directly, not via a shell. python3 my_handler.py won’t work — point at the executable file (/usr/local/bin/my-handler.py with a shebang, or /usr/bin/python3 with the script as argv[1] if you wrap it).
  • The handler runs as DuckDB. It inherits the database process’s privileges and environment. Treat the destination path as security-sensitive: write-protect it from the database user.
  • Sync mode = your handler is on the critical path. A 50ms-per-event handler adds 50ms to every query. Add timeouts and a fast path; or accept the tradeoff and switch to async.
  • Use events_session_name for attribution. When several DuckDB processes feed the same handler — replicas, tenant-per-process layouts, CI runners — give each one a session name so events are distinguishable downstream.

Reference

Extension Contents

Quick reference to all available functions and settings organized by category.

Name Description
Configuration

PRAGMAs that wire DuckDB's internal event stream to an external program. Set events_destination to a fully-qualified executable path, pick which event types to capture with events_types, optionally tag the stream with events_session_name, and switch between synchronous (default) and fire-and-forget delivery via events_async.

events_async When TRUE, events are delivered fire-and-forget — DuckDB doesn't wait for the handler before returning from the originating query
events_destination Path to an external program that receives event JSON via stdin
events_session_name Optional string included in every event
events_types List of event types to capture

Configuration

Settings

Configure the events extension behavior using these pragma settings.

events_async

When TRUE, events are delivered fire-and-forget — DuckDB doesn't wait for the handler before returning from the originating query. Trades reliability for latency.

Default Value: false

events_destination

Path to an external program that receives event JSON via stdin. Set to enable event forwarding; unset to disable. Must be a fully-qualified executable path.

Default Value: null

events_session_name

Optional string included in every event. Useful for distinguishing events from multiple DuckDB processes feeding the same handler.

Default Value: ''

events_types

List of event types to capture. Available: 'connection_opened', 'connection_closed', 'query_begin', 'query_end', 'transaction_begin', 'transaction_commit', 'transaction_rollback'. Default: ['query_begin', 'query_end'].

Default Value: '[query_begin, query_end]'

Practical Examples

Cookbook

Real-world recipes and patterns for common use cases.

Wire up a handler

LOAD events;

-- Required: a fully-qualified path to an executable
SET events_destination = '/usr/local/bin/event-handler.py';

-- Pick the event types (default is ['query_begin', 'query_end'])
SET events_types = ['query_begin', 'query_end'];

-- Optional: a tag included in every event line
SET events_session_name = 'analytics-replica';

-- Optional: don't block queries on the handler
SET events_async = TRUE;

Append every query to a daily log file

The smallest possible audit trail — one JSONL file per day, written by a shell handler:

#!/usr/bin/env bash
# /usr/local/bin/log-to-file.sh — append every event to a daily log
exec >> "/var/log/duckdb/events-$(date +%F).jsonl"
cat
SET events_destination = '/usr/local/bin/log-to-file.sh';
SET events_types       = ['query_begin', 'query_end'];

-- Every query now appends two lines to today's log file
SELECT count(*) FROM orders;

Page on query errors

query_end includes has_error, error_message, and error_type when a query fails. Filter in the handler and post to your alerting system:

#!/usr/bin/env python3
# /usr/local/bin/error-alerter.py
import json, sys, urllib.request

for line in sys.stdin:
    e = json.loads(line)
    if e.get('event') == 'query_end' and e.get('has_error'):
        urllib.request.urlopen(
            'https://alerts.example.com/page',
            data=line.encode(),
        )
SET events_destination = '/usr/local/bin/error-alerter.py';
SET events_types       = ['query_end'];
SET events_async       = TRUE;     -- don't let a slow alerter block queries

-- A failed query produces a query_end with has_error=true
SELECT * FROM nonexistent_table;

Track transaction lifecycle

Capture the full begin/commit/rollback sequence and group by transaction_id:

SET events_destination = '/usr/local/bin/txn-tracker.py';
SET events_types       = ['transaction_begin', 'transaction_commit', 'transaction_rollback'];

BEGIN;
INSERT INTO orders VALUES (1, 'widget', 99.99);
COMMIT;
-- Two events: transaction_begin and transaction_commit, same transaction_id.

A transaction_rollback carries has_error and error_message when the rollback was triggered by a failure.

Per-tenant attribution

When several DuckDB processes feed the same handler, tag each one and group downstream by session_name:

-- In the per-tenant startup script
SET events_destination  = '/usr/local/bin/multi-tenant-collector.py';
SET events_session_name = 'tenant-acme';
SET events_types        = ['query_begin', 'query_end'];
SET events_async        = TRUE;
# /usr/local/bin/multi-tenant-collector.py — bucket events by tenant
import json, sys, collections

counts = collections.Counter()
for line in sys.stdin:
    e = json.loads(line)
    if e.get('event') == 'query_end':
        counts[e.get('session_name', 'unknown')] += 1
        # ... ship to your metrics backend, or write to per-tenant files

Pair query_begin and query_end for latency

Both events share a query_id. Maintain a small in-memory map in the handler:

#!/usr/bin/env python3
# /usr/local/bin/latency-recorder.py
import json, sys
from datetime import datetime

starts = {}
for line in sys.stdin:
    e = json.loads(line)
    qid = e.get('query_id')
    ts  = datetime.fromisoformat(e['timestamp'].replace('Z', '+00:00'))

    if e['event'] == 'query_begin':
        starts[qid] = ts
    elif e['event'] == 'query_end' and qid in starts:
        latency_ms = (ts - starts.pop(qid)).total_seconds() * 1000
        print(f"qid={qid} latency_ms={latency_ms:.1f} error={e.get('has_error')}",
              flush=True)
SET events_destination = '/usr/local/bin/latency-recorder.py';
SET events_types       = ['query_begin', 'query_end'];

Fan out to multiple consumers

Only one events_destination is allowed. Point it at a small dispatcher that forks the stream:

#!/usr/bin/env bash
# /usr/local/bin/fanout.sh — split the event stream three ways
exec tee \
  >(cat >> /var/log/duckdb/events.jsonl) \
  >(curl -s -X POST -H 'content-type: application/x-ndjson' \
        --data-binary @- https://collector.example.com/ingest) \
  > /dev/null
SET events_destination = '/usr/local/bin/fanout.sh';

Re-applying configuration on session start

PRAGMAs do not persist across restarts (same caveat as the cronjob extension). Stash the four SET statements in a startup file that every connection runs:

-- /etc/duckdb/init.sql
LOAD events;
SET events_destination  = '/usr/local/bin/event-handler.py';
SET events_types        = ['query_begin', 'query_end'];
SET events_session_name = 'prod-replica-1';
SET events_async        = TRUE;
duckdb -init /etc/duckdb/init.sql /data/analytics.duckdb

Platform Support

Compatibility

Extension availability may vary by platform and DuckDB version. Check below to ensure this extension supports your environment before installation.

Quick Facts

Software License MIT
Pricing Free
Written In C++
Source Available Yes
View on GitHub
Usage
2,718+
loads in last 30 days

Platforms

Linux
Linux (musl)
macOS
Windows
WASM
Supported platform architectures
Linux: x86_64, aarch64
Linux (musl): Not available
macOS: Intel, Apple Silicon
Windows: x86_64
WASM: mvp, threads, eh
Compiled binary sizes
Platform Architecture Size
Linux aarch64 2.66 MB
Linux x86_64 3.03 MB
macOS Apple Silicon 2.27 MB
macOS Intel 2.62 MB
Windows x86_64 7.51 MB
WASM eh 18.1 KB
WASM mvp 15.6 KB
WASM threads 15.6 KB

Gzipped download size from the DuckDB community-extensions registry.

DuckDB Versions

Release calendar
Supported
v1.5.2

Stream DuckDB Events Out

Install Events to forward query, transaction, and connection events as JSON lines to any handler program — log shipping, audit trails, observability.