Shellfs
Use Unix pipes as DuckDB filenames. Append `|` to a path and DuckDB executes it as a shell pipeline and reads from stdout; prefix with `|` to write to stdin. Lets read_csv / read_json / COPY chain into curl, awk, custom scripts β anything that speaks stdio.
Install
-- Install the extension
INSTALL shellfs FROM community;
-- Load it into your session
LOAD shellfs;
-- Read from a shell pipeline
SELECT * FROM read_csv('seq 1 100 | grep 2 |');
-- Curl directly into a JSON read
SELECT * FROM read_json('curl -s http://worldtimeapi.org/api/timezone/Etc/UTC |');
-- Pipe out of DuckDB into another program
COPY (SELECT * FROM big_table) TO '| gzip > out.csv.gz'; Deep Dive
Technical Details
Architecture
The Shellfs extension registers a DuckDB FileSystem subclass that recognizes two patterns:
- Path ending in
|β DuckDB callspopen(cmd, "r")and reads bytes from the resulting pipe. - Path beginning with
|β DuckDB callspopen(cmd, "w")and writes bytes to the pipeβs stdin.
Everything else goes to the next filesystem in the chain (local disk, httpfs, aws, etc. as usual). The integration is one sentence of glue: any DuckDB function that takes a filename β read_csv, read_json, read_parquet, COPY ... TO, COPY ... FROM β accepts pipe paths transparently.
Why this exists
DuckDB ships with a strong built-in toolbox β HTTP, S3, parquet, JSON, etc. But the long tail is endless: a private API with custom auth, a tool that emits a proprietary text format, a one-off script that walks a directory tree. Rather than building a DuckDB extension for each, the Shellfs extension lets the existing Unix toolchain be the adapter.
Performance
Pipes are streaming β DuckDB starts processing rows as soon as the first chunk arrives, not after the upstream command finishes. For a seq 1 1_000_000_000 | you can LIMIT 100 and the upstream seq is killed via SIGPIPE without producing the rest.
popen overhead per pipeline is small (one fork+exec). The bottleneck is whatever your shell pipeline is doing, not the bridge.
Compared to alternatives
copy ... from program 'cmd'in PostgreSQL β same idea, more cumbersome syntax. TheShellfsextension slips into any DuckDB filename context.- Python
subprocess+ DataFrames β much heavier setup; you give up DuckDBβs vectorized execution against the streamed data. - DuckDB built-ins β use those first. Reach for the
Shellfsextension only when the data lives behind a tool DuckDB canβt speak natively.
Security
The same rule that applies to os.system(user_input) in Python applies here: never build pipe paths from untrusted strings. In a multi-tenant or hosted DuckDB environment, consider not loading the Shellfs extension at all, or restricting which roles can use it. See OWASP β Command Injection for the threat model.
Configuration
Settings
Configure the shellfs extension behavior using these pragma settings.
Practical Examples
Cookbook
Real-world recipes and patterns for common use cases.
Read a pipeline
-- Numbers 1..100 containing a "2"
SELECT * FROM read_csv('seq 1 100 | grep 2 |');
Chain multiple commands:
-- First two multiples of 7 in [1, 35]
SELECT * FROM read_csv('seq 1 35 | awk "\$1 % 7 == 0" | head -n 2 |');
Curl an HTTP API
For custom headers, query params, retries, or pre-processing β pipe through curl:
SELECT abbreviation, unixtime
FROM read_json('curl -s http://worldtimeapi.org/api/timezone/Etc/UTC |');
For programmatic HTTP from SQL without shelling out, see the http_client extension.
Drive a custom script
Any program that prints CSV (or JSON, or Parquet, etc.) to stdout becomes a queryable source β useful when you need bespoke parsing, schema-shaping, or to wrap an API DuckDB doesnβt speak natively:
# test-csv.py
print("counter1,counter2")
for i in range(10_000_000):
print(f"{i},{i}")
SELECT COUNT(DISTINCT counter1) FROM read_csv('./test-csv.py |');
Pipe out of DuckDB
The | prefix turns COPY into a streaming export:
-- Stream a query result through gzip
COPY (SELECT * FROM big_table) TO '| gzip > out.csv.gz';
-- Stream into a downstream program
COPY (SELECT * FROM events) TO '| ./load_into_kafka.sh'; Platform Support
Compatibility
Extension availability may vary by platform and DuckDB version. Check below to ensure this extension supports your environment before installation.
Quick Facts
Platforms
Supported platform architectures
Compiled binary sizes
| Platform | Architecture | Size |
|---|---|---|
| Linux | aarch64 | 2.64 MB |
| Linux | x86_64 | 3.00 MB |
| Linux (musl) | x86_64 | 2.26 MB |
| macOS | Apple Silicon | 1.79 MB |
| macOS | Intel | 2.10 MB |
| Windows | x86_64 | 7.50 MB |
Gzipped download size from the DuckDB community-extensions registry.
Pipe Anything Into DuckDB
Install the `Shellfs` extension to treat shell pipelines as readable / writable filenames β read_csv 'seq 1 100 | grep 2 |' just works.