Cronjob

Schedule SQL queries to run on a cron expression — but only while the DuckDB process is alive. Six-field expressions with second-level resolution, list/inspect/cancel from SQL. Best fit: periodic refreshes, recurring exports, scheduled webhook alerts, and maintenance tasks inside a long-running DuckDB process.

Install

-- Install the extension
INSTALL cronjob FROM community;

-- Load it into your session
LOAD cronjob;

-- Run a query every 15 seconds during hours 1–4
SELECT cron('SELECT now()', '*/15 * 1-4 * * *');

-- Inspect scheduled jobs
SELECT * FROM cron_jobs();

-- Cancel a job by its job_id
SELECT cron_delete('task_0');

Technical Overview

Why Use Cronjob?

Schedule SQL queries to run on a cron expression directly from inside DuckDB — no external scheduler, no shell wrapper, no extra process. The catch is the scope: schedules live in memory inside the running DuckDB process. When the process exits, every job is forgotten. Best fit when DuckDB is already a long-running service (a server, a daemon, an analytics worker), not a CLI you start and stop.

What this extension is for

Cronjob turns a long-running DuckDB process into its own task scheduler. Register a SQL query against a cron expression and DuckDB fires it on schedule — useful when the work is already SQL and you want to keep it that way.

  • Periodic refresh of materialized data: Rebuild a rollup table every 5 minutes, or recompute a daily aggregate at 02:00 — wrap a CREATE OR REPLACE TABLE in cron and the scheduler keeps it current as long as the process is up.
  • Recurring exports: Schedule a COPY ... TO to write Parquet, CSV, or JSON to disk / S3 on a cadence. Pair with the httpfs extension for cloud destinations.
  • Periodic webhook / Slack alerts: Combine with the http_client or webmacro extensions: a scheduled query fires an HTTP POST whenever a metric crosses a threshold.
  • Maintenance tasks: Schedule ANALYZE, VACUUM, statistics rebuilds, log-table pruning, or any other housekeeping that should happen on a cadence rather than on every query.

⚙️ How it works

cron(query, schedule) registers an entry in an in-memory job table managed by the extension. A background thread inside the DuckDB process wakes on the cron tick, executes the registered SQL through the same connection, and records the result in cron_jobs.

  • Six-field cron expressions: Schedules use the six-field form second minute hour day-of-month month day-of-week, with the standard *, ?, ,, -, / operators and MONSUN weekday names. Sanity-check expressions on crontab.guru (note: crontab.guru parses the five-field POSIX form — Cronjob's leading seconds field is an extension).
  • In-memory job registry: Jobs live in process memory only. There is no table on disk, no WAL entry, no cluster coordination. cron_jobs is the introspection table; cron_delete is the un-register call.
  • Same connection, same access: Scheduled queries run against the database the scheduler was registered from, with the same access rights. Anything you can run interactively, you can schedule.
  • No persistence: Restart the process and every registration is gone. There is no file to delete, no stale entry to clean up — and equally, no resume after a crash. Restart-safe scheduling lives outside this extension.

🛡️ Production caveats

The fundamentals to internalize before scheduling anything important.

  • Schedules die with the process: This is the central caveat. A kill -9, a deploy, an OOM, or a clean shutdown all wipe the registry — there is no resume. If the work needs to happen on a wall-clock cadence regardless of process state, drive the run from an external orchestrator (system cron(8), systemd timers, Kubernetes CronJob) and have that invoke DuckDB.
  • No multi-process coordination: Two DuckDB processes that both load Cronjob and register the same job will both run it. There is no leader election, no lease — pick one process to own the schedule, or de-duplicate downstream.
  • Errors are recorded, not retried: A failed query writes its error into last_result in cron_jobs. The extension does not retry, back off, or alert — wrap your scheduled query in its own error-reporting logic (write to an audit table, fire a webhook) if you need notification.
  • Long-running queries block the next tick: Each registered job is single-threaded against itself. If a query takes longer than the schedule interval, the next tick waits — no overlap, no parallel runs of the same job.
  • Experimental status: Marked experimental upstream. The function surface is small and stable in spirit, but pin a known-good extension version in production until it's promoted.

🎯 Common Use Cases

Refresh a rollup table on a cadence

CREATE OR REPLACE TABLE wrapped in cron — the simplest path to an always-fresh aggregate without an external scheduler.

Periodic exports to disk or S3

Schedule a COPY ... TO to write Parquet to a local path or, with httpfs, to S3 / GCS — daily reports without leaving SQL.

Threshold-triggered webhooks

Pair with the http_client extension: a query that POSTs to a Slack / PagerDuty webhook whenever a metric crosses a threshold, scheduled to run every minute.

Routine maintenance

Schedule ANALYZE, prune log tables older than N days, or rebuild secondary indexes — work that doesn't belong on the hot path of any single query.

Deep Dive

Technical Details

What you can do with one query

The shortest path from “I want this SQL to run every 5 minutes” to a live schedule:

SELECT cron(
  'CREATE OR REPLACE TABLE hourly_rollup AS
     SELECT bucket, SUM(amount) AS total
     FROM events
     WHERE ts >= now() - INTERVAL 1 HOUR
     GROUP BY bucket',
  '0 */5 * * * *'
);
-- → 'task_0'

cron registers the query against a six-field cron expression and returns a job_id. A background thread in this DuckDB process wakes on each tick, runs the query, and writes the outcome into cron_jobs. No external scheduler, no shell wrapper.

Schedules die when the DuckDB process exits

Cronjob lives entirely inside the running DuckDB process. A graceful shutdown, a kill, a crash, an OOM, a deploy — any of these wipe every registered job. There is no on-disk registry, no WAL entry, no resume after restart.

If the work needs to happen on wall-clock time regardless of process state, drive it from an external scheduler — system cron(8), systemd timers, or Kubernetes CronJob — and have it invoke DuckDB. Use this extension when DuckDB is already a long-running service and you want SQL-defined scheduling inside that lifetime.

Cron expression format

Cronjob uses six fields, with seconds as the leading field — one more than POSIX cron(8):

┌────────── second  (0–59)
│ ┌──────── minute  (0–59)
│ │ ┌────── hour    (0–23)
│ │ │ ┌──── day of month (1–31)
│ │ │ │ ┌── month   (1–12)
│ │ │ │ │ ┌ day of week (0–6, Sun..Sat, or MON–SUN)
* * * * * *

Operators: * (any), ? (no specific value, used in day-of-month / day-of-week), , (list), - (range), / (step). crontab.guru is the standard sanity-check tool — note that crontab.guru parses the five-field POSIX form, so prepend a 0 (or */N ) to translate.

A few common patterns:

PatternMeaning
0 0 * * * *Top of every hour
0 */5 * * * *Every 5 minutes
*/15 * * * * *Every 15 seconds
0 0 0 * * *Daily at midnight
0 0 7 ? * MON-FRIWeekdays at 07:00
0 0 0 1 * ?First of every month

What runs, where

cron registers the query against the same database connection the scheduler was loaded into. That has two consequences:

  • The scheduled query inherits the same access rights, attached databases, and secrets as the registering session.
  • If you load Cronjob in a session that targets an in-memory database, the schedule disappears with the database — even before process exit.

Each job runs single-threaded against itself: if a tick’s query takes longer than the schedule interval, the next tick waits its turn. No overlap, no parallel runs of the same registration.

Inspect and cancel

cron_jobs is the introspection table — every registered job, its next scheduled run, the last run, and the last result (or error message):

SELECT job_id, schedule, next_run, status, last_run, last_result
FROM cron_jobs()
ORDER BY next_run;

cron_delete un-registers a job by job_id. It returns TRUE if a matching job existed; restart wipes everything anyway, so this is for un-registration during a live process:

SELECT cron_delete('task_0');

Pairing with other extensions

Cronjob is one of two “operate your DuckDB process” extensions in this collection — pick by what you want to drive:

  • Cronjob schedules SQL queries on a wall-clock cadence inside the process.
  • Events streams query / transaction / connection events out of the process to an external handler.

Together they cover push (Events fires on internal activity) and pull (Cronjob fires on time). Both share the same fundamental scope: they only operate while the DuckDB process is alive.

For the alerting and exporting recipes, Cronjob composes naturally with:

  • http_client — schedule a query that POSTs to a webhook (Slack, PagerDuty, generic HTTP).
  • webmacro — wrap repeated HTTP integrations as named macros, then schedule the macro.
  • httpfs — schedule a COPY ... TO that writes Parquet to S3 / GCS.

When to use system cron instead

If any of the following are true, reach for system cron(8) / systemd timers / Kubernetes CronJob and have that invoke duckdb -c '...':

  • The DuckDB process is short-lived (a CLI invocation, a script, a query runner).
  • The schedule must survive deploys, restarts, or host moves.
  • You need delivery guarantees, retries, or observability the host scheduler already provides.
  • The work spans multiple databases or extends beyond what one DuckDB session can express.

Cronjob is for the opposite case: DuckDB is already a long-running service, and the convenience of “schedule it from SQL” outweighs the cost of “schedule lifetime equals process lifetime.”

Reference

Extension Contents

Quick reference to all available functions and settings organized by category.

Name Description
Scheduling

Register, list, and cancel cron-style scheduled SQL queries. Six-field cron expressions with second-level resolution. Schedules live in memory inside the DuckDB process — when the process exits, every job is forgotten. For restart-safe scheduling, drive the DuckDB run from an external orchestrator (system [cron(8)](https://man7.org/linux/man-pages/man8/cron.8.html), systemd timers, Kubernetes CronJob).

cron() Register a SQL query to run on a six-field cron expression (second minute hour day-of-month month day-of-week)
cron_delete() Cancel a scheduled job by job_id (the value returned from cron or visible in cron_jobs)
cron_jobs() Table function listing every scheduled job in this process — job_id, query, schedule, next_run, status, last_run, and last_result

API Reference

Function Documentation

Detailed documentation for each function including signatures, parameters, and examples.

cron

Scalar Function Scheduling
Signature
cron(col0: VARCHAR, col1: VARCHAR) → VARCHAR
Parameters (Positional)
Parameter Type Mode Description
col0 VARCHAR Positional
col1 VARCHAR Positional
Returns
Description

Register a SQL query to run on a six-field cron expression (second minute hour day-of-month month day-of-week). Returns a job_id (e.g. task_0) you can later pass to cron_delete or look up in cron_jobs. The schedule fires only while this DuckDB process is alive — when the process exits, the registration is gone. Use crontab.guru to sanity-check expressions before registering.

Examples
1

Every 15 seconds during hours 1–4

SELECT cron('SELECT now()', '*/15 * 1-4 * * *');
2

Weekdays at 7am — daily metrics export

SELECT cron('CALL daily_metrics_export()', '0 0 7 ? * MON-FRI');
3

Top of every hour — refresh a materialized table

SELECT cron('CREATE OR REPLACE TABLE hourly_rollup AS SELECT ...', '0 0 * * * *');

cron_delete

Scalar Function Scheduling
Signature
cron_delete(col0: VARCHAR) → BOOLEAN
Parameters (Positional)
Parameter Type Mode Description
col0 VARCHAR Positional
Returns
Description

Cancel a scheduled job by job_id (the value returned from cron or visible in cron_jobs). Returns TRUE if a matching job was found and removed. Process restarts also cancel jobs — cron_delete is for un-registering while DuckDB is still running.

Examples
1

Cancel a specific job

SELECT cron_delete('task_0');

cron_jobs

Table Function Scheduling
Signature
cron_jobs() → None
Parameters
Parameter Type Mode Description
Returns
Description

Table function listing every scheduled job in this process — job_id, query, schedule, next_run, status, last_run, and last_result. The introspection primitive: filter by status to find failing jobs, or join last_result against expected output to assert that recent runs succeeded.

Examples
1

List all scheduled jobs

SELECT * FROM cron_jobs();
2

Find jobs that have never run

SELECT job_id, schedule FROM cron_jobs() WHERE last_run IS NULL;

Practical Examples

Cookbook

Real-world recipes and patterns for common use cases.

Schedule a query

-- Refresh an hourly rollup every 5 minutes
SELECT cron(
  'CREATE OR REPLACE TABLE hourly_rollup AS
     SELECT bucket, SUM(amount) AS total
     FROM events
     WHERE ts >= now() - INTERVAL 1 HOUR
     GROUP BY bucket',
  '0 */5 * * * *'
);
-- → 'task_0'

cron returns a job_id (e.g. task_0) — capture it if you need to cancel later. Schedules tick only while this DuckDB process is running.

Inspect what’s scheduled

-- All scheduled jobs and their last result
SELECT job_id, schedule, next_run, status, last_run, last_result
FROM cron_jobs()
ORDER BY next_run;

-- Jobs that have never run yet
SELECT job_id, schedule
FROM cron_jobs()
WHERE last_run IS NULL;

-- Recently failed runs
SELECT job_id, last_run, last_result
FROM cron_jobs()
WHERE status = 'error';

Cancel a job

SELECT cron_delete('task_0');
-- TRUE if a matching job was found

Restart of the DuckDB process also cancels every registered job — cron_delete is for un-registering during a live process.

Periodic refresh of materialized data

-- Every 5 minutes: rebuild a recent-events rollup
SELECT cron(
  'CREATE OR REPLACE TABLE recent_events AS
     SELECT * FROM events WHERE ts >= now() - INTERVAL 1 HOUR',
  '0 */5 * * * *'
);

-- Daily at 02:00: rebuild a heavy daily aggregate
SELECT cron(
  'CREATE OR REPLACE TABLE daily_summary AS
     SELECT date_trunc(''day'', ts) AS day,
            user_id,
            SUM(amount) AS total
     FROM events
     GROUP BY 1, 2',
  '0 0 2 * * *'
);

Recurring exports to disk or S3

With the httpfs extension loaded, COPY ... TO writes to S3 / GCS just like any local path:

-- Local Parquet snapshot every hour
SELECT cron(
  'COPY (SELECT * FROM events WHERE ts >= now() - INTERVAL 1 HOUR)
   TO ''/var/exports/events_'' || strftime(now(), ''%Y%m%d_%H'') || ''.parquet''
   (FORMAT PARQUET)',
  '0 0 * * * *'
);

-- Daily export to S3 at 03:00
SELECT cron(
  'COPY (SELECT * FROM daily_summary)
   TO ''s3://my-bucket/exports/daily/'' || strftime(now(), ''%Y-%m-%d'') || ''.parquet''
   (FORMAT PARQUET)',
  '0 0 3 * * *'
);

Periodic Slack / webhook alerts

Pair with the http_client extension to fire a webhook when a metric crosses a threshold:

-- Every minute: post to Slack if error rate is high
SELECT cron(
  'WITH r AS (
     SELECT 100.0 * SUM(CASE WHEN status >= 500 THEN 1 END) / COUNT(*) AS pct
     FROM requests WHERE ts >= now() - INTERVAL 5 MINUTE
   )
   SELECT http_post(
     ''https://hooks.slack.com/services/XXX/YYY/ZZZ'',
     ''{"text":"Error rate '' || round(pct, 2) || ''%"}'',
     MAP {''Content-Type'': ''application/json''}
   )
   FROM r WHERE pct > 5',
  '0 * * * * *'
);

The query writes nothing to disk — http_post is a side effect that fires only when the WHERE filter passes.

Maintenance: ANALYZE and pruning

-- Nightly ANALYZE at 01:30
SELECT cron('ANALYZE', '0 30 1 * * *');

-- Prune log rows older than 30 days, weekly on Sunday at 04:00
SELECT cron(
  'DELETE FROM audit_log WHERE ts < now() - INTERVAL 30 DAY',
  '0 0 4 ? * SUN'
);

Six-field cron quick reference

┌────────── second  (0–59)
│ ┌──────── minute  (0–59)
│ │ ┌────── hour    (0–23)
│ │ │ ┌──── day of month (1–31)
│ │ │ │ ┌── month   (1–12)
│ │ │ │ │ ┌ day of week (0–6, Sun..Sat, or MON–SUN)
* * * * * *
PatternMeaning
0 0 * * * *Top of every hour
0 */5 * * * *Every 5 minutes
*/15 * * * * *Every 15 seconds
0 0 0 * * *Daily at midnight
0 0 7 ? * MON-FRIWeekdays at 07:00
0 0 0 1 * ?First of every month

Operators: * (any), ? (no specific value), , (list), - (range), / (step). Sanity-check on crontab.guru — note crontab.guru is five-field POSIX, so drop the leading seconds field when pasting.

Restart-safe scheduling: external orchestrator

When jobs must survive process restarts, drive DuckDB from system cron(8) instead of using this extension. Example crontab entry:

*/5 * * * * /usr/local/bin/duckdb /var/data/analytics.db -c "CREATE OR REPLACE TABLE hourly_rollup AS SELECT ..."

Or a systemd timer, or a Kubernetes CronJob. Use the Cronjob extension only when DuckDB is already a long-running service and the schedule’s lifetime can equal the process’s.

Platform Support

Compatibility

Extension availability may vary by platform and DuckDB version. Check below to ensure this extension supports your environment before installation.

Quick Facts

Software License MIT
Pricing Free
Written In C++
Source Available Yes
View on GitHub
Usage
14,703+
loads in last 30 days

Platforms

Linux
Linux (musl)
macOS
Windows
WASM
Supported platform architectures
Linux: aarch64, x86_64
Linux (musl): x86_64
macOS: Apple Silicon, Intel
Windows: x86_64
WASM: eh, mvp, threads
Compiled binary sizes
Platform Architecture Size
Linux aarch64 9.15 MB
Linux x86_64 10.41 MB
Linux (musl) x86_64 11.01 MB
macOS Apple Silicon 7.19 MB
macOS Intel 8.18 MB
Windows x86_64 7.51 MB
WASM eh 18.3 KB
WASM mvp 16.0 KB
WASM threads 15.9 KB

Gzipped download size from the DuckDB community-extensions registry.

DuckDB Versions

Release calendar
Supported
v1.4.4 v1.5.2

Cron-style scheduling inside DuckDB

Install Cronjob to register repeating SQL tasks — refresh aggregates, export reports, run ANALYZE, fire webhooks — that tick along while your DuckDB process is running.