Events
Hook into DuckDB's internal events — query begin/end, transactions, connections — and ship them as JSON to any external program via stdin. Audit logging, error alerting, observability integration, and SIEM feeds, all configured with PRAGMAs.
Install
-- Install the extension
INSTALL events FROM community;
-- Load it into your session
LOAD events;
-- Configure where events go and what to capture
SET events_destination = '/usr/local/bin/event-handler.py';
SET events_types = ['query_begin', 'query_end'];
-- Optional: don't block queries on the handler
SET events_async = TRUE;
-- Optional: tag every event with a session name
SET events_session_name = 'analytics-replica';
-- Every query now produces a JSON event line on the handler's stdin Technical Overview
Why Use Events?
Capture DuckDB's internal lifecycle — queries executed, transactions opened and closed, errors raised, connections churning — and stream the timeline as JSON lines into any external program. In-process telemetry, audit trails, cost tracking, and SIEM feeds, all wired up with PRAGMAs.
📡 What this extension is for
Events makes the otherwise-opaque inside of a DuckDB process observable. You get a structured event line for every query, transaction, and connection — without instrumenting your application code or scraping logs after the fact.
- • Audit trails: Capture
query_begin/query_endfor every statement and ship the JSON to a tamper-resistant log store. Every query, with timestamps, error status, and the originating session name. - • Error alerting:
query_endcarrieshas_error,error_message, anderror_typewhen a query fails — pipe the stream into a handler that pages on regressions or unexpected error types. - • Cost and usage tracking: Pair
query_beginandquery_endbyquery_idto get per-query latency and per-session volume. Useevents_session_nameto attribute events to tenants, jobs, or replicas. - • Transaction lifecycle observability:
transaction_begin/transaction_commit/transaction_rollbackeach carry the sametransaction_idandis_read_onlyflag — so you can reconstruct exactly which statements ran inside which transaction.
⚙️ How it works
The extension hooks DuckDB's internal event surface and pipes one JSON object per event into the stdin of an external program. There is no SQL function to call — configuration is entirely PRAGMA-driven, so wiring up event capture is four SET statements.
- • Configure with PRAGMAs: Set
events_destinationto a fully-qualified executable path. Pick the event types to capture withevents_types. Optional:events_session_nametags every event, andevents_asyncswitches between synchronous and fire-and-forget delivery. - • Seven event types:
connection_opened,connection_closed,query_begin,query_end,transaction_begin,transaction_commit, andtransaction_rollback. Default capture is['query_begin', 'query_end']— the smallest set that still gives you query-level audit visibility. - • JSON lines on stdin: Every event becomes one line of JSON on the handler's stdin. Common fields are
event,timestamp(ISO 8601),database_path,session_name,connection_id, andprocess_id. Type-specific fields layer on top —query_id,transaction_id,error_message, etc. - • Sync or fire-and-forget: Default delivery is synchronous: DuckDB writes the event and waits for the handler to read it before returning from the originating query. Reliable, but each event adds latency.
SET events_async = TRUEswitches to fire-and-forget — no blocking, but events can be dropped if the handler is slow or dies.
🛡️ Production caveats
What to know before pointing this at a production database.
- • Process-scoped, not durable: Configuration lives in the running DuckDB process — restart and it's gone. Set the PRAGMAs at session start (or wrap them in a startup script). Same caveat as the
cronjobextension: events are observable while the process is up, not durable across restarts. - • One handler per session: Only one destination at a time. If you need fan-out — file plus Kafka plus alerting — point
events_destinationat a small dispatcher that reads stdin and forks the stream itself (tee, a Python script,vector, etc.). - • Sync mode blocks queries: In synchronous mode (the default) a slow or hung handler will hold up every query. Add timeouts in the handler, or switch to
events_async = TRUEfor high-throughput workloads where the occasional dropped event is acceptable. - • Handler runs as DuckDB: The destination program is launched by the DuckDB process and inherits its privileges. Treat the
events_destinationpath as security-sensitive — point it at a fully-qualified, read-only-by-others script.
🎯 Common Use Cases
In-process audit log
Capture query_begin / query_end and append the JSON lines to a daily file or a tamper-evident store. Every statement that touched the database, with timestamps and error status.
Real-time error alerting
Filter query_end events for has_error = true and post the payload to PagerDuty, OpsGenie, Slack, or your SIEM. No log scraping, no polling — events arrive as they happen.
Per-session cost attribution
Tag each DuckDB process with events_session_name (tenant id, job id, replica name) and emit query_begin/query_end pairs. Aggregate downstream for query latency, count, and error rate per attribution key.
Transaction replay and forensics
Capture all three transaction events plus query events. Group by transaction_id to reconstruct exactly which statements ran inside each commit or rollback — useful for incident postmortems.
Deep Dive
Technical Details
What you can do with four PRAGMAs
The single most useful pattern: turn every query DuckDB runs into a structured JSON line on the stdin of a program of your choice, with no SQL functions to call and no application code to change:
LOAD events;
SET events_destination = '/usr/local/bin/event-handler.py';
SET events_types = ['query_begin', 'query_end'];
SET events_session_name = 'analytics-replica';
SET events_async = TRUE;
From this point on, every statement you run produces a query_begin and query_end line on the handler’s stdin — paired by query_id, tagged with the session name, with has_error and error_message populated when something goes wrong. The handler can be a shell script, a Python service, a Go binary, or anything else that reads stdin a line at a time.
Events is configured per running DuckDB process, not in the database file. PRAGMAs are forgotten on restart — set them again at session start, or stash them in a startup script. The same caveat applies to the sibling cronjob extension: both are “operate your DuckDB process” tools and observability is only available while the process is up.
For a durable audit trail, point events_destination at a handler that writes to durable storage (file, S3, Kafka, syslog) — the destination is durable, the configuration is not.
How delivery works
The extension launches the program at events_destination as a child process and writes one JSON event per line to its stdin. The handler is a long-lived consumer — DuckDB does not re-spawn it per event.
- Synchronous (default). DuckDB writes the event line and waits for the handler to consume it before returning from the originating query. Reliable: every event the database emits is acknowledged by the handler before the query completes. The cost is latency on every query.
SET events_async = TRUE. Fire-and-forget. DuckDB does not block on the handler. If the handler stalls, dies, or can’t keep up, events are dropped silently. Use for high-throughput workloads where “best-effort observability” is the right tradeoff.
There’s no in-between mode — pick one per session.
Event types and what they carry
All events share a common envelope: event, timestamp (ISO 8601), database_path, session_name, connection_id, process_id. Per-type fields layer on top:
| Event | Extra fields |
|---|---|
connection_opened, connection_closed | (envelope only) |
query_begin | query_id, transaction_id, attached_databases[] |
query_end | query_id, transaction_id, has_error, error_message?, error_type?, attached_databases[] |
transaction_begin, transaction_commit | transaction_id, start_timestamp, is_read_only |
transaction_rollback | transaction_id, start_timestamp, is_read_only, has_error, error_message?, error_type? |
query_id correlates query_begin and query_end for the same statement; transaction_id ties query events to the surrounding transaction. attached_databases is an array of {name, path, type, read_only, temporary} records describing every database visible to the connection at event time — useful when you have multiple ATTACH-ed catalogs and want to attribute activity to one of them.
Choosing what to capture
The default ['query_begin', 'query_end'] is the right starting point: query-level audit visibility, two events per statement. Scale up or down as needed:
- Audit-only: keep the default. Two events per query, both with
query_idso you can pair them. - Errors-only:
['query_end']and filterhas_error = truein the handler — half the volume. - Connection churn: add
connection_opened,connection_closedto spot connection leaks or ephemeral session storms. - Transactional forensics: add the three
transaction_*events; group bytransaction_idto reconstruct exactly which statements ran inside each commit or rollback.
There’s no per-query filter at the extension layer — every event of an enabled type is emitted. Filter downstream in the handler if you need to.
Operational guidance
- Path must be fully qualified.
events_destinationis invoked directly, not via a shell.python3 my_handler.pywon’t work — point at the executable file (/usr/local/bin/my-handler.pywith a shebang, or/usr/bin/python3with the script asargv[1]if you wrap it). - The handler runs as DuckDB. It inherits the database process’s privileges and environment. Treat the destination path as security-sensitive: write-protect it from the database user.
- Sync mode = your handler is on the critical path. A 50ms-per-event handler adds 50ms to every query. Add timeouts and a fast path; or accept the tradeoff and switch to async.
- Use
events_session_namefor attribution. When several DuckDB processes feed the same handler — replicas, tenant-per-process layouts, CI runners — give each one a session name so events are distinguishable downstream.
Reference
Extension Contents
Quick reference to all available functions and settings organized by category.
| Name | Description | |
|---|---|---|
| Configuration PRAGMAs that wire DuckDB's internal event stream to an external program. Set | ||
| events_async | When TRUE, events are delivered fire-and-forget — DuckDB doesn't wait for the handler before returning from the originating query | |
| events_destination | Path to an external program that receives event JSON via stdin | |
| events_session_name | Optional string included in every event | |
| events_types | List of event types to capture | |
Configuration
Settings
Configure the events extension behavior using these pragma settings.
events_async
When TRUE, events are delivered fire-and-forget — DuckDB doesn't wait for the handler before returning from the originating query. Trades reliability for latency.
false events_destination
Path to an external program that receives event JSON via stdin. Set to enable event forwarding; unset to disable. Must be a fully-qualified executable path.
null events_session_name
Optional string included in every event. Useful for distinguishing events from multiple DuckDB processes feeding the same handler.
'' events_types
List of event types to capture. Available: 'connection_opened', 'connection_closed', 'query_begin', 'query_end', 'transaction_begin', 'transaction_commit', 'transaction_rollback'. Default: ['query_begin', 'query_end'].
'[query_begin, query_end]' Practical Examples
Cookbook
Real-world recipes and patterns for common use cases.
Wire up a handler
LOAD events;
-- Required: a fully-qualified path to an executable
SET events_destination = '/usr/local/bin/event-handler.py';
-- Pick the event types (default is ['query_begin', 'query_end'])
SET events_types = ['query_begin', 'query_end'];
-- Optional: a tag included in every event line
SET events_session_name = 'analytics-replica';
-- Optional: don't block queries on the handler
SET events_async = TRUE;
Append every query to a daily log file
The smallest possible audit trail — one JSONL file per day, written by a shell handler:
#!/usr/bin/env bash
# /usr/local/bin/log-to-file.sh — append every event to a daily log
exec >> "/var/log/duckdb/events-$(date +%F).jsonl"
cat
SET events_destination = '/usr/local/bin/log-to-file.sh';
SET events_types = ['query_begin', 'query_end'];
-- Every query now appends two lines to today's log file
SELECT count(*) FROM orders;
Page on query errors
query_end includes has_error, error_message, and error_type when a query fails. Filter in the handler and post to your alerting system:
#!/usr/bin/env python3
# /usr/local/bin/error-alerter.py
import json, sys, urllib.request
for line in sys.stdin:
e = json.loads(line)
if e.get('event') == 'query_end' and e.get('has_error'):
urllib.request.urlopen(
'https://alerts.example.com/page',
data=line.encode(),
)
SET events_destination = '/usr/local/bin/error-alerter.py';
SET events_types = ['query_end'];
SET events_async = TRUE; -- don't let a slow alerter block queries
-- A failed query produces a query_end with has_error=true
SELECT * FROM nonexistent_table;
Track transaction lifecycle
Capture the full begin/commit/rollback sequence and group by transaction_id:
SET events_destination = '/usr/local/bin/txn-tracker.py';
SET events_types = ['transaction_begin', 'transaction_commit', 'transaction_rollback'];
BEGIN;
INSERT INTO orders VALUES (1, 'widget', 99.99);
COMMIT;
-- Two events: transaction_begin and transaction_commit, same transaction_id.
A transaction_rollback carries has_error and error_message when the rollback was triggered by a failure.
Per-tenant attribution
When several DuckDB processes feed the same handler, tag each one and group downstream by session_name:
-- In the per-tenant startup script
SET events_destination = '/usr/local/bin/multi-tenant-collector.py';
SET events_session_name = 'tenant-acme';
SET events_types = ['query_begin', 'query_end'];
SET events_async = TRUE;
# /usr/local/bin/multi-tenant-collector.py — bucket events by tenant
import json, sys, collections
counts = collections.Counter()
for line in sys.stdin:
e = json.loads(line)
if e.get('event') == 'query_end':
counts[e.get('session_name', 'unknown')] += 1
# ... ship to your metrics backend, or write to per-tenant files
Pair query_begin and query_end for latency
Both events share a query_id. Maintain a small in-memory map in the handler:
#!/usr/bin/env python3
# /usr/local/bin/latency-recorder.py
import json, sys
from datetime import datetime
starts = {}
for line in sys.stdin:
e = json.loads(line)
qid = e.get('query_id')
ts = datetime.fromisoformat(e['timestamp'].replace('Z', '+00:00'))
if e['event'] == 'query_begin':
starts[qid] = ts
elif e['event'] == 'query_end' and qid in starts:
latency_ms = (ts - starts.pop(qid)).total_seconds() * 1000
print(f"qid={qid} latency_ms={latency_ms:.1f} error={e.get('has_error')}",
flush=True)
SET events_destination = '/usr/local/bin/latency-recorder.py';
SET events_types = ['query_begin', 'query_end'];
Fan out to multiple consumers
Only one events_destination is allowed. Point it at a small dispatcher that forks the stream:
#!/usr/bin/env bash
# /usr/local/bin/fanout.sh — split the event stream three ways
exec tee \
>(cat >> /var/log/duckdb/events.jsonl) \
>(curl -s -X POST -H 'content-type: application/x-ndjson' \
--data-binary @- https://collector.example.com/ingest) \
> /dev/null
SET events_destination = '/usr/local/bin/fanout.sh';
Re-applying configuration on session start
PRAGMAs do not persist across restarts (same caveat as the cronjob extension). Stash the four SET statements in a startup file that every connection runs:
-- /etc/duckdb/init.sql
LOAD events;
SET events_destination = '/usr/local/bin/event-handler.py';
SET events_types = ['query_begin', 'query_end'];
SET events_session_name = 'prod-replica-1';
SET events_async = TRUE;
duckdb -init /etc/duckdb/init.sql /data/analytics.duckdb Platform Support
Compatibility
Extension availability may vary by platform and DuckDB version. Check below to ensure this extension supports your environment before installation.
Quick Facts
Platforms
Supported platform architectures
Compiled binary sizes
| Platform | Architecture | Size |
|---|---|---|
| Linux | aarch64 | 2.66 MB |
| Linux | x86_64 | 3.03 MB |
| macOS | Apple Silicon | 2.27 MB |
| macOS | Intel | 2.62 MB |
| Windows | x86_64 | 7.51 MB |
| WASM | eh | 18.1 KB |
| WASM | mvp | 15.6 KB |
| WASM | threads | 15.6 KB |
Gzipped download size from the DuckDB community-extensions registry.
Stream DuckDB Events Out
Install Events to forward query, transaction, and connection events as JSON lines to any handler program — log shipping, audit trails, observability.