🐚

Shellfs

Use Unix pipes as DuckDB filenames. Append `|` to a path and DuckDB executes it as a shell pipeline and reads from stdout; prefix with `|` to write to stdin. Lets read_csv / read_json / COPY chain into curl, awk, custom scripts β€” anything that speaks stdio.

Install

-- Install the extension
INSTALL shellfs FROM community;

-- Load it into your session
LOAD shellfs;

-- Read from a shell pipeline
SELECT * FROM read_csv('seq 1 100 | grep 2 |');

-- Curl directly into a JSON read
SELECT * FROM read_json('curl -s http://worldtimeapi.org/api/timezone/Etc/UTC |');

-- Pipe out of DuckDB into another program
COPY (SELECT * FROM big_table) TO '| gzip > out.csv.gz';

Deep Dive

Technical Details

Architecture

The Shellfs extension registers a DuckDB FileSystem subclass that recognizes two patterns:

  • Path ending in | β€” DuckDB calls popen(cmd, "r") and reads bytes from the resulting pipe.
  • Path beginning with | β€” DuckDB calls popen(cmd, "w") and writes bytes to the pipe’s stdin.

Everything else goes to the next filesystem in the chain (local disk, httpfs, aws, etc. as usual). The integration is one sentence of glue: any DuckDB function that takes a filename β€” read_csv, read_json, read_parquet, COPY ... TO, COPY ... FROM β€” accepts pipe paths transparently.

Why this exists

DuckDB ships with a strong built-in toolbox β€” HTTP, S3, parquet, JSON, etc. But the long tail is endless: a private API with custom auth, a tool that emits a proprietary text format, a one-off script that walks a directory tree. Rather than building a DuckDB extension for each, the Shellfs extension lets the existing Unix toolchain be the adapter.

Performance

Pipes are streaming β€” DuckDB starts processing rows as soon as the first chunk arrives, not after the upstream command finishes. For a seq 1 1_000_000_000 | you can LIMIT 100 and the upstream seq is killed via SIGPIPE without producing the rest.

popen overhead per pipeline is small (one fork+exec). The bottleneck is whatever your shell pipeline is doing, not the bridge.

Compared to alternatives

  • copy ... from program 'cmd' in PostgreSQL β€” same idea, more cumbersome syntax. The Shellfs extension slips into any DuckDB filename context.
  • Python subprocess + DataFrames β€” much heavier setup; you give up DuckDB’s vectorized execution against the streamed data.
  • DuckDB built-ins β€” use those first. Reach for the Shellfs extension only when the data lives behind a tool DuckDB can’t speak natively.

Security

Pipe paths execute arbitrary shell commands

The same rule that applies to os.system(user_input) in Python applies here: never build pipe paths from untrusted strings. In a multi-tenant or hosted DuckDB environment, consider not loading the Shellfs extension at all, or restricting which roles can use it. See OWASP β€” Command Injection for the threat model.

Configuration

Settings

Configure the shellfs extension behavior using these pragma settings.

ignore_sigpipe

Ignore SIGPIPE

Default Value: false

Practical Examples

Cookbook

Real-world recipes and patterns for common use cases.

Read a pipeline

-- Numbers 1..100 containing a "2"
SELECT * FROM read_csv('seq 1 100 | grep 2 |');

Chain multiple commands:

-- First two multiples of 7 in [1, 35]
SELECT * FROM read_csv('seq 1 35 | awk "\$1 % 7 == 0" | head -n 2 |');

Curl an HTTP API

For custom headers, query params, retries, or pre-processing β€” pipe through curl:

SELECT abbreviation, unixtime
FROM read_json('curl -s http://worldtimeapi.org/api/timezone/Etc/UTC |');

For programmatic HTTP from SQL without shelling out, see the http_client extension.

Drive a custom script

Any program that prints CSV (or JSON, or Parquet, etc.) to stdout becomes a queryable source β€” useful when you need bespoke parsing, schema-shaping, or to wrap an API DuckDB doesn’t speak natively:

# test-csv.py
print("counter1,counter2")
for i in range(10_000_000):
    print(f"{i},{i}")
SELECT COUNT(DISTINCT counter1) FROM read_csv('./test-csv.py |');

Pipe out of DuckDB

The | prefix turns COPY into a streaming export:

-- Stream a query result through gzip
COPY (SELECT * FROM big_table) TO '| gzip > out.csv.gz';

-- Stream into a downstream program
COPY (SELECT * FROM events) TO '| ./load_into_kafka.sh';

Platform Support

Compatibility

Extension availability may vary by platform and DuckDB version. Check below to ensure this extension supports your environment before installation.

Quick Facts

Software License MIT
Pricing Free
Written In C++
Source Available Yes
View on GitHub
Usage
73,834+
loads in last 30 days

Platforms

Linux
Linux (musl)
macOS
Windows
WASM
Supported platform architectures
Linux: x86_64, aarch64
Linux (musl): x86_64
macOS: Apple Silicon, Intel
Windows: x86_64
WASM: Not available
Compiled binary sizes
Platform Architecture Size
Linux aarch64 2.64 MB
Linux x86_64 3.00 MB
Linux (musl) x86_64 2.26 MB
macOS Apple Silicon 1.79 MB
macOS Intel 2.10 MB
Windows x86_64 7.50 MB

Gzipped download size from the DuckDB community-extensions registry.

DuckDB Versions

Release calendar
Supported
v1.4.4 v1.5.2

Pipe Anything Into DuckDB

Install the `Shellfs` extension to treat shell pipelines as readable / writable filenames β€” read_csv 'seq 1 100 | grep 2 |' just works.