Cronjob
Schedule SQL queries to run on a cron expression — but only while the DuckDB process is alive. Six-field expressions with second-level resolution, list/inspect/cancel from SQL. Best fit: periodic refreshes, recurring exports, scheduled webhook alerts, and maintenance tasks inside a long-running DuckDB process.
Install
-- Install the extension
INSTALL cronjob FROM community;
-- Load it into your session
LOAD cronjob;
-- Run a query every 15 seconds during hours 1–4
SELECT cron('SELECT now()', '*/15 * 1-4 * * *');
-- Inspect scheduled jobs
SELECT * FROM cron_jobs();
-- Cancel a job by its job_id
SELECT cron_delete('task_0'); Technical Overview
Why Use Cronjob?
Schedule SQL queries to run on a cron expression directly from inside DuckDB — no external scheduler, no shell wrapper, no extra process. The catch is the scope: schedules live in memory inside the running DuckDB process. When the process exits, every job is forgotten. Best fit when DuckDB is already a long-running service (a server, a daemon, an analytics worker), not a CLI you start and stop.
⏰ What this extension is for
Cronjob turns a long-running DuckDB process into its own task scheduler. Register a SQL query against a cron expression and DuckDB fires it on schedule — useful when the work is already SQL and you want to keep it that way.
- • Periodic refresh of materialized data: Rebuild a rollup table every 5 minutes, or recompute a daily aggregate at 02:00 — wrap a
CREATE OR REPLACE TABLEincronand the scheduler keeps it current as long as the process is up. - • Recurring exports: Schedule a
COPY ... TOto write Parquet, CSV, or JSON to disk / S3 on a cadence. Pair with thehttpfsextension for cloud destinations. - • Periodic webhook / Slack alerts: Combine with the
http_clientorwebmacroextensions: a scheduled query fires an HTTP POST whenever a metric crosses a threshold. - • Maintenance tasks: Schedule
ANALYZE,VACUUM, statistics rebuilds, log-table pruning, or any other housekeeping that should happen on a cadence rather than on every query.
⚙️ How it works
cron(query, schedule) registers an entry in an in-memory job table managed by the extension. A background thread inside the DuckDB process wakes on the cron tick, executes the registered SQL through the same connection, and records the result in cron_jobs.
- • Six-field cron expressions: Schedules use the six-field form
second minute hour day-of-month month day-of-week, with the standard*,?,,,-,/operators andMON–SUNweekday names. Sanity-check expressions on crontab.guru (note: crontab.guru parses the five-field POSIX form — Cronjob's leadingsecondsfield is an extension). - • In-memory job registry: Jobs live in process memory only. There is no table on disk, no WAL entry, no cluster coordination.
cron_jobsis the introspection table;cron_deleteis the un-register call. - • Same connection, same access: Scheduled queries run against the database the scheduler was registered from, with the same access rights. Anything you can run interactively, you can schedule.
- • No persistence: Restart the process and every registration is gone. There is no file to delete, no stale entry to clean up — and equally, no resume after a crash. Restart-safe scheduling lives outside this extension.
🛡️ Production caveats
The fundamentals to internalize before scheduling anything important.
- • Schedules die with the process: This is the central caveat. A
kill -9, a deploy, an OOM, or a clean shutdown all wipe the registry — there is no resume. If the work needs to happen on a wall-clock cadence regardless of process state, drive the run from an external orchestrator (systemcron(8),systemdtimers, Kubernetes CronJob) and have that invoke DuckDB. - • No multi-process coordination: Two DuckDB processes that both load Cronjob and register the same job will both run it. There is no leader election, no lease — pick one process to own the schedule, or de-duplicate downstream.
- • Errors are recorded, not retried: A failed query writes its error into
last_resultincron_jobs. The extension does not retry, back off, or alert — wrap your scheduled query in its own error-reporting logic (write to an audit table, fire a webhook) if you need notification. - • Long-running queries block the next tick: Each registered job is single-threaded against itself. If a query takes longer than the schedule interval, the next tick waits — no overlap, no parallel runs of the same job.
- • Experimental status: Marked experimental upstream. The function surface is small and stable in spirit, but pin a known-good extension version in production until it's promoted.
🎯 Common Use Cases
Refresh a rollup table on a cadence
CREATE OR REPLACE TABLE wrapped in cron — the simplest path to an always-fresh aggregate without an external scheduler.
Periodic exports to disk or S3
Schedule a COPY ... TO to write Parquet to a local path or, with httpfs, to S3 / GCS — daily reports without leaving SQL.
Threshold-triggered webhooks
Pair with the http_client extension: a query that POSTs to a Slack / PagerDuty webhook whenever a metric crosses a threshold, scheduled to run every minute.
Routine maintenance
Schedule ANALYZE, prune log tables older than N days, or rebuild secondary indexes — work that doesn't belong on the hot path of any single query.
Deep Dive
Technical Details
What you can do with one query
The shortest path from “I want this SQL to run every 5 minutes” to a live schedule:
SELECT cron(
'CREATE OR REPLACE TABLE hourly_rollup AS
SELECT bucket, SUM(amount) AS total
FROM events
WHERE ts >= now() - INTERVAL 1 HOUR
GROUP BY bucket',
'0 */5 * * * *'
);
-- → 'task_0'
cron registers the query against a six-field cron expression and returns a job_id. A background thread in this DuckDB process wakes on each tick, runs the query, and writes the outcome into cron_jobs. No external scheduler, no shell wrapper.
Cronjob lives entirely inside the running DuckDB process. A graceful shutdown, a kill, a crash, an OOM, a deploy — any of these wipe every registered job. There is no on-disk registry, no WAL entry, no resume after restart.
If the work needs to happen on wall-clock time regardless of process state, drive it from an external scheduler — system cron(8), systemd timers, or Kubernetes CronJob — and have it invoke DuckDB. Use this extension when DuckDB is already a long-running service and you want SQL-defined scheduling inside that lifetime.
Cron expression format
Cronjob uses six fields, with seconds as the leading field — one more than POSIX cron(8):
┌────────── second (0–59)
│ ┌──────── minute (0–59)
│ │ ┌────── hour (0–23)
│ │ │ ┌──── day of month (1–31)
│ │ │ │ ┌── month (1–12)
│ │ │ │ │ ┌ day of week (0–6, Sun..Sat, or MON–SUN)
* * * * * *
Operators: * (any), ? (no specific value, used in day-of-month / day-of-week), , (list), - (range), / (step). crontab.guru is the standard sanity-check tool — note that crontab.guru parses the five-field POSIX form, so prepend a 0 (or */N ) to translate.
A few common patterns:
| Pattern | Meaning |
|---|---|
0 0 * * * * | Top of every hour |
0 */5 * * * * | Every 5 minutes |
*/15 * * * * * | Every 15 seconds |
0 0 0 * * * | Daily at midnight |
0 0 7 ? * MON-FRI | Weekdays at 07:00 |
0 0 0 1 * ? | First of every month |
What runs, where
cron registers the query against the same database connection the scheduler was loaded into. That has two consequences:
- The scheduled query inherits the same access rights, attached databases, and secrets as the registering session.
- If you load Cronjob in a session that targets an in-memory database, the schedule disappears with the database — even before process exit.
Each job runs single-threaded against itself: if a tick’s query takes longer than the schedule interval, the next tick waits its turn. No overlap, no parallel runs of the same registration.
Inspect and cancel
cron_jobs is the introspection table — every registered job, its next scheduled run, the last run, and the last result (or error message):
SELECT job_id, schedule, next_run, status, last_run, last_result
FROM cron_jobs()
ORDER BY next_run;
cron_delete un-registers a job by job_id. It returns TRUE if a matching job existed; restart wipes everything anyway, so this is for un-registration during a live process:
SELECT cron_delete('task_0');
Pairing with other extensions
Cronjob is one of two “operate your DuckDB process” extensions in this collection — pick by what you want to drive:
- Cronjob schedules SQL queries on a wall-clock cadence inside the process.
- Events streams query / transaction / connection events out of the process to an external handler.
Together they cover push (Events fires on internal activity) and pull (Cronjob fires on time). Both share the same fundamental scope: they only operate while the DuckDB process is alive.
For the alerting and exporting recipes, Cronjob composes naturally with:
http_client— schedule a query that POSTs to a webhook (Slack, PagerDuty, generic HTTP).webmacro— wrap repeated HTTP integrations as named macros, then schedule the macro.httpfs— schedule aCOPY ... TOthat writes Parquet to S3 / GCS.
When to use system cron instead
If any of the following are true, reach for system cron(8) / systemd timers / Kubernetes CronJob and have that invoke duckdb -c '...':
- The DuckDB process is short-lived (a CLI invocation, a script, a query runner).
- The schedule must survive deploys, restarts, or host moves.
- You need delivery guarantees, retries, or observability the host scheduler already provides.
- The work spans multiple databases or extends beyond what one DuckDB session can express.
Cronjob is for the opposite case: DuckDB is already a long-running service, and the convenience of “schedule it from SQL” outweighs the cost of “schedule lifetime equals process lifetime.”
Reference
Extension Contents
Quick reference to all available functions and settings organized by category.
| Name | Description | |
|---|---|---|
| Scheduling Register, list, and cancel cron-style scheduled SQL queries. Six-field cron expressions with second-level resolution. Schedules live in memory inside the DuckDB process — when the process exits, every job is forgotten. For restart-safe scheduling, drive the DuckDB run from an external orchestrator (system [ | ||
| cron() | Register a SQL query to run on a six-field cron expression (second minute hour day-of-month month day-of-week) | |
| cron_delete() | Cancel a scheduled job by job_id (the value returned from cron or visible in cron_jobs) | |
| cron_jobs() | Table function listing every scheduled job in this process — job_id, query, schedule, next_run, status, last_run, and last_result | |
API Reference
Function Documentation
Detailed documentation for each function including signatures, parameters, and examples.
cron
Signature
Parameters (Positional)
| Parameter | Type | Mode | Description |
|---|---|---|---|
col0 | VARCHAR | Positional | |
col1 | VARCHAR | Positional |
Returns
Description
Register a SQL query to run on a six-field cron expression (second minute hour day-of-month month day-of-week). Returns a job_id (e.g. task_0) you can later pass to cron_delete or look up in cron_jobs. The schedule fires only while this DuckDB process is alive — when the process exits, the registration is gone. Use crontab.guru to sanity-check expressions before registering.
Examples
Every 15 seconds during hours 1–4
SELECT cron('SELECT now()', '*/15 * 1-4 * * *'); Weekdays at 7am — daily metrics export
SELECT cron('CALL daily_metrics_export()', '0 0 7 ? * MON-FRI'); Top of every hour — refresh a materialized table
SELECT cron('CREATE OR REPLACE TABLE hourly_rollup AS SELECT ...', '0 0 * * * *'); Related Functions
cron_delete
Signature
Parameters (Positional)
| Parameter | Type | Mode | Description |
|---|---|---|---|
col0 | VARCHAR | Positional |
Returns
Description
Cancel a scheduled job by job_id (the value returned from cron or visible in cron_jobs). Returns TRUE if a matching job was found and removed. Process restarts also cancel jobs — cron_delete is for un-registering while DuckDB is still running.
Examples
Cancel a specific job
SELECT cron_delete('task_0'); Related Functions
cron_jobs
Signature
Parameters
| Parameter | Type | Mode | Description |
|---|
Returns
Description
Table function listing every scheduled job in this process — job_id, query, schedule, next_run, status, last_run, and last_result. The introspection primitive: filter by status to find failing jobs, or join last_result against expected output to assert that recent runs succeeded.
Examples
List all scheduled jobs
SELECT * FROM cron_jobs(); Find jobs that have never run
SELECT job_id, schedule FROM cron_jobs() WHERE last_run IS NULL; Related Functions
Practical Examples
Cookbook
Real-world recipes and patterns for common use cases.
Schedule a query
-- Refresh an hourly rollup every 5 minutes
SELECT cron(
'CREATE OR REPLACE TABLE hourly_rollup AS
SELECT bucket, SUM(amount) AS total
FROM events
WHERE ts >= now() - INTERVAL 1 HOUR
GROUP BY bucket',
'0 */5 * * * *'
);
-- → 'task_0'
cron returns a job_id (e.g. task_0) — capture it if you need to cancel later. Schedules tick only while this DuckDB process is running.
Inspect what’s scheduled
-- All scheduled jobs and their last result
SELECT job_id, schedule, next_run, status, last_run, last_result
FROM cron_jobs()
ORDER BY next_run;
-- Jobs that have never run yet
SELECT job_id, schedule
FROM cron_jobs()
WHERE last_run IS NULL;
-- Recently failed runs
SELECT job_id, last_run, last_result
FROM cron_jobs()
WHERE status = 'error';
Cancel a job
SELECT cron_delete('task_0');
-- TRUE if a matching job was found
Restart of the DuckDB process also cancels every registered job — cron_delete is for un-registering during a live process.
Periodic refresh of materialized data
-- Every 5 minutes: rebuild a recent-events rollup
SELECT cron(
'CREATE OR REPLACE TABLE recent_events AS
SELECT * FROM events WHERE ts >= now() - INTERVAL 1 HOUR',
'0 */5 * * * *'
);
-- Daily at 02:00: rebuild a heavy daily aggregate
SELECT cron(
'CREATE OR REPLACE TABLE daily_summary AS
SELECT date_trunc(''day'', ts) AS day,
user_id,
SUM(amount) AS total
FROM events
GROUP BY 1, 2',
'0 0 2 * * *'
);
Recurring exports to disk or S3
With the httpfs extension loaded, COPY ... TO writes to S3 / GCS just like any local path:
-- Local Parquet snapshot every hour
SELECT cron(
'COPY (SELECT * FROM events WHERE ts >= now() - INTERVAL 1 HOUR)
TO ''/var/exports/events_'' || strftime(now(), ''%Y%m%d_%H'') || ''.parquet''
(FORMAT PARQUET)',
'0 0 * * * *'
);
-- Daily export to S3 at 03:00
SELECT cron(
'COPY (SELECT * FROM daily_summary)
TO ''s3://my-bucket/exports/daily/'' || strftime(now(), ''%Y-%m-%d'') || ''.parquet''
(FORMAT PARQUET)',
'0 0 3 * * *'
);
Periodic Slack / webhook alerts
Pair with the http_client extension to fire a webhook when a metric crosses a threshold:
-- Every minute: post to Slack if error rate is high
SELECT cron(
'WITH r AS (
SELECT 100.0 * SUM(CASE WHEN status >= 500 THEN 1 END) / COUNT(*) AS pct
FROM requests WHERE ts >= now() - INTERVAL 5 MINUTE
)
SELECT http_post(
''https://hooks.slack.com/services/XXX/YYY/ZZZ'',
''{"text":"Error rate '' || round(pct, 2) || ''%"}'',
MAP {''Content-Type'': ''application/json''}
)
FROM r WHERE pct > 5',
'0 * * * * *'
);
The query writes nothing to disk — http_post is a side effect that fires only when the WHERE filter passes.
Maintenance: ANALYZE and pruning
-- Nightly ANALYZE at 01:30
SELECT cron('ANALYZE', '0 30 1 * * *');
-- Prune log rows older than 30 days, weekly on Sunday at 04:00
SELECT cron(
'DELETE FROM audit_log WHERE ts < now() - INTERVAL 30 DAY',
'0 0 4 ? * SUN'
);
Six-field cron quick reference
┌────────── second (0–59)
│ ┌──────── minute (0–59)
│ │ ┌────── hour (0–23)
│ │ │ ┌──── day of month (1–31)
│ │ │ │ ┌── month (1–12)
│ │ │ │ │ ┌ day of week (0–6, Sun..Sat, or MON–SUN)
* * * * * *
| Pattern | Meaning |
|---|---|
0 0 * * * * | Top of every hour |
0 */5 * * * * | Every 5 minutes |
*/15 * * * * * | Every 15 seconds |
0 0 0 * * * | Daily at midnight |
0 0 7 ? * MON-FRI | Weekdays at 07:00 |
0 0 0 1 * ? | First of every month |
Operators: * (any), ? (no specific value), , (list), - (range), / (step). Sanity-check on crontab.guru — note crontab.guru is five-field POSIX, so drop the leading seconds field when pasting.
Restart-safe scheduling: external orchestrator
When jobs must survive process restarts, drive DuckDB from system cron(8) instead of using this extension. Example crontab entry:
*/5 * * * * /usr/local/bin/duckdb /var/data/analytics.db -c "CREATE OR REPLACE TABLE hourly_rollup AS SELECT ..."
Or a systemd timer, or a Kubernetes CronJob. Use the Cronjob extension only when DuckDB is already a long-running service and the schedule’s lifetime can equal the process’s.
Platform Support
Compatibility
Extension availability may vary by platform and DuckDB version. Check below to ensure this extension supports your environment before installation.
Quick Facts
Platforms
Supported platform architectures
Compiled binary sizes
| Platform | Architecture | Size |
|---|---|---|
| Linux | aarch64 | 9.15 MB |
| Linux | x86_64 | 10.41 MB |
| Linux (musl) | x86_64 | 11.01 MB |
| macOS | Apple Silicon | 7.19 MB |
| macOS | Intel | 8.18 MB |
| Windows | x86_64 | 7.51 MB |
| WASM | eh | 18.3 KB |
| WASM | mvp | 16.0 KB |
| WASM | threads | 15.9 KB |
Gzipped download size from the DuckDB community-extensions registry.
Cron-style scheduling inside DuckDB
Install Cronjob to register repeating SQL tasks — refresh aggregates, export reports, run ANALYZE, fire webhooks — that tick along while your DuckDB process is running.