Crypto
Cryptographic hashing, HMAC, and secure random bytes inside DuckDB. Adds BLAKE3, BLAKE2b, SHA-3, the SHA-2 family, MD5, and SHA-1 alongside an order-deterministic aggregate hash for dataset checksums and tamper-evident audit trails.
Install
-- Install the extension
INSTALL crypto FROM community;
-- Load it into your session
LOAD crypto;
-- SHA-256 of a string (BLOB output, hex it for display)
SELECT lower(to_hex(crypto_hash('sha2-256', 'hello world'))) AS sha256;
-- HMAC-SHA256
SELECT lower(to_hex(crypto_hmac('sha2-256', 'my-secret', 'message'))) AS hmac;
-- 32 cryptographically secure random bytes (e.g. an AES-256 key)
SELECT crypto_random_bytes(32) AS key;
-- Deterministic checksum of an ordered dataset
SELECT lower(to_hex(crypto_hash_agg('sha2-256', email ORDER BY email))) AS dataset_hash
FROM users; Technical Overview
Cryptographic hashing, HMAC, and secure random — directly in SQL
DuckDB ships with hash() (non-cryptographic) and a built-in sha256(). The Crypto extension adds the full SHA-2 and SHA-3 families, BLAKE3 and BLAKE2b, HMAC, an order-deterministic aggregate hash, and a CSPRNG for keys, salts, and nonces. Scope is hashing and authentication only — no symmetric ciphers (AES), no password-hashing KDFs (bcrypt/scrypt/Argon2), no public-key signing.
🧮 What you get
Four functions, organized around the primitives most data pipelines actually need from cryptography in SQL.
- • Cryptographic hashes over any column:
crypto_hashoverVARCHAR,BLOB, every numeric type,BOOLEAN,DATE,TIME,TIMESTAMP,UUID, and lists of fixed-length types. Algorithms: BLAKE3,blake2b-512, the full SHA-2 family (224/256/384/512), the full SHA-3 family (224/256/384/512), Keccak (mapped to SHA-3), and the legacy MD4 / MD5 / SHA-1 for compatibility. - • Order-deterministic dataset checksum:
crypto_hash_aggfolds a column into one digest inORDER BYorder. By construction,crypto_hash_agg('algo', col ORDER BY ord)produces the same bytes ascrypto_hash('algo', LIST(col ORDER BY ord))— so any client (Python, Go, Rust) that can build the equivalent ordered list can verify a DuckDB digest byte-for-byte. - • Keyed message authentication:
crypto_hmacimplements HMAC with the same algorithm vocabulary ascrypto_hash— for signing API requests, verifying webhook payloads, or per-row tamper detection. BLAKE3 keying requires a 32-byte key. - • Secure random bytes:
crypto_random_bytesreturns bytes from OpenSSL'sRAND_bytes— the same CSPRNG TLS implementations rely on. Use for HMAC keys, salts, nonces, or random IDs.
🧷 Type-aware, length-safe hashing
The hashing surface is intentionally type-sensitive — 42::INTEGER and 42::BIGINT produce different digests because their binary representations differ. This is the right behaviour for cryptographic use: a hash that collapses across logically distinct domains is a footgun.
- • Length-prefixed list elements: Variable-length elements (
VARCHAR,BLOB) inside a list are hashed as[8-byte length][content]. This defends against length-extension-style ambiguity so that['ab', 'c']and['a', 'bc']produce distinct digests — both viacrypto_hashon lists and viacrypto_hash_aggoverVARCHAR/BLOBcolumns. - • Aggregate equals list:
crypto_hash_agg('algo', col ORDER BY ord)is byte-equal tocrypto_hash('algo', LIST(col ORDER BY ord)). TheORDER BYis mandatory — DuckDB's aggregate execution does not guarantee row order, so an unordered hash aggregate would be non-deterministic. - • NULL hashes to NULL:
crypto_hashandcrypto_hmacpropagate NULL.crypto_hash_aggreturns NULL for empty groups. Treat NULL like any other column in your hash inputs — coalesce or cast it explicitly if you need a stable representation. - • BLOB output, hex it for display: All three hash functions return
BLOB. Wrap withlower(to_hex(...))for a hex string, or compare BLOBs directly with=for byte-equality.
🛡️ Honest scope — what's not in this extension
The extension is deliberately narrow. If your problem needs primitives from a wider library, you'll need to combine this with something else.
- • No symmetric encryption: There is no
crypto_encrypt/crypto_decrypt. The extension does not implement AES (see NIST FIPS 197) or any other cipher. Encrypt-at-rest belongs at storage / file level, not in row expressions. - • No password-hashing KDFs: No bcrypt, scrypt, Argon2, or PBKDF2. For storing user passwords, follow the OWASP Password Storage Cheat Sheet and run a real KDF in your application layer — DuckDB SQL is the wrong place for that work. Salts can be generated here with
crypto_random_bytes; the actual KDF should not. - • No public-key signatures: No RSA / ECDSA / Ed25519. Only symmetric authentication via HMAC.
- • Legacy algorithms exposed for compatibility: MD4, MD5, and SHA-1 are present so you can interop with existing systems, but they are not cryptographically secure for new designs. See NIST SP 800-107 Rev. 1 for current guidance — prefer
sha2-256,sha2-512, orblake3. MD4 may be disabled in modern OpenSSL builds.
🎯 Common use cases
Whole-dataset checksum / change detection
One crypto_hash_agg over a sorted column produces a digest that any other system can recompute and compare. The fastest path from "did this partition change?" to a one-row answer.
Hash-chained audit trail
Combine crypto_hash with LAG over an ordered audit log to build a Merkle-style hash chain. Tampering with any historical row breaks the chain at the first altered position. See the hash-chain recipe.
Stable surrogate keys from many columns
Hash a tuple of source columns with crypto_hash to mint deterministic, idempotent IDs that survive re-runs. Useful for content-addressed storage and dedup pipelines.
API request signing and verification
crypto_hmac with a shared secret signs outgoing payloads and verifies incoming ones, all in one SQL expression. Pair with crypto_random_bytes for the secret itself.
Deep Dive
Technical Details
What this extension adds
DuckDB ships with hash() (non-cryptographic), md5(), and sha256(). The Crypto extension adds the rest of what production cryptographic workloads need:
- More algorithms — BLAKE3, BLAKE2b, the full SHA-2 family (224/256/384/512), the SHA-3 family (224/256/384/512), and Keccak variants.
- More input types — every numeric type, VARCHAR, BLOB, BOOLEAN, DATE, TIME, TIMESTAMP, UUID, and lists of those — not just strings. Different DuckDB types hash to different digests by design.
- HMAC with the same algorithm vocabulary — for keyed message authentication.
crypto_hash_agg— an order-deterministic aggregate hash for dataset checksums and change detection.crypto_random_bytes— cryptographically secure random bytes from OpenSSL’s CSPRNG.
Cryptographic vs non-cryptographic hashing
Use Crypto when you need:
- Tamper resistance — adversaries can’t construct collisions cheaply (BLAKE3, SHA-2, SHA-3).
- Authentication — HMAC for verifying message integrity with a shared secret.
- Stable cross-system fingerprints — agreed-upon algorithm output that other systems can compute identically (e.g. ETags, content-addressed storage).
- Secure random — keys, salts, nonces.
Use Hashfuncs instead when you need:
- Speed — xxHash and RapidHash run at near-RAM bandwidth with no cryptographic guarantees.
- Hash partitioning, sharding, cache keys — collision risk on adversarial inputs is acceptable.
- Bloom-filter / cuckoo-filter inputs — paired with
bitfilters.
Both extensions can coexist; pick per use case.
Determinism and types
crypto_hash is deterministic for a given (algorithm, value, type) triple but is type-sensitive — 42::INTEGER and 42::BIGINT produce different hashes. This is the right behaviour for cryptographic use: it keeps the hash from collapsing across logically distinct domains.
Lists of VARCHAR or BLOB values include each element’s length (as a 64-bit integer) before its content, defending against length-extension-style ambiguity attacks.
NULLs hash to NULL — crypto_hash('sha2-256', NULL::VARCHAR) IS NULL returns true.
Aggregate hash semantics
crypto_hash_agg('algo', col ORDER BY ord) produces the same digest as crypto_hash('algo', LIST(col ORDER BY ord)). The ORDER BY is required — DuckDB’s query planner doesn’t guarantee row order, so an unordered aggregate would be non-deterministic.
This equivalence is useful when comparing datasets across systems: anything that can produce the equivalent ordered list (Python, Go, Rust) can compute the same digest and verify a DuckDB result.
Random byte generation
crypto_random_bytes(n) returns n bytes from OpenSSL’s RAND_bytes(). That’s the same CSPRNG used by TLS implementations, key generation libraries, and openssl rand.
- Minimum length: 1 byte
- Maximum length: 4,294,967,295 bytes (4 GB − 1, DuckDB’s BLOB cap)
- Length 0 or negative raises
InvalidInputException
Reference
Extension Contents
Quick reference to all available functions and settings organized by category.
| Name | Description | |
|---|---|---|
| HMAC Keyed message authentication codes. Use for signing API requests, verifying message integrity, or building tamper-evident audit trails — anywhere a hash needs a shared secret. | ||
| crypto_hmac() | Hash-based Message Authentication Code (HMAC) | |
| Hashing Compute cryptographic digests of values, lists, and entire datasets. Supports BLAKE3, BLAKE2b, the SHA-2 / SHA-3 families, plus legacy MD4 / MD5 / SHA-1 for compatibility. The aggregate variant produces an order-deterministic dataset checksum. | ||
| crypto_hash() | Compute a cryptographic hash of a value with the given algorithm | |
| crypto_hash_agg() | Order-deterministic aggregate hash | |
| Random Cryptographically secure random bytes from OpenSSL's CSPRNG. Use for encryption keys, salts, nonces, and any other cryptographic material that must be unpredictable. | ||
| crypto_random_bytes() | Cryptographically secure random bytes from OpenSSL's RAND_bytes | |
API Reference
Function Documentation
Detailed documentation for each function including signatures, parameters, and examples.
crypto_hash
Signature
Parameters (Positional)
| Parameter | Type | Mode | Description |
|---|---|---|---|
algorithm | VARCHAR | Positional | |
value | ANY | Positional |
Returns
Description
Compute a cryptographic hash of a value with the given algorithm. Supported algorithms: blake2b-512, blake3, md4, md5, sha1, sha2-{224,256,384,512}, sha3-{224,256,384,512}, keccak{224,256,384,512}. Different DuckDB types of the same numeric value produce different hashes.
Examples
SHA-256 of a string, displayed as hex
SELECT lower(to_hex(crypto_hash('sha2-256', 'hello world'))); Output
| lower(to_hex(crypto_hash('sha2-256', 'hello world'))) |
|---|
| b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9 |
BLAKE3 of arbitrary integer
SELECT lower(to_hex(crypto_hash('blake3', 42::INTEGER))); Output
| lower(to_hex(crypto_hash('blake3', CAST(42 AS INTEGER)))) |
|---|
| 95dbc3244503309f26dee9436b39568cc80b0cbe17ef409e9273c4edb58653fd |
Hash a list — each VARCHAR/BLOB element's length is hashed before its content
SELECT lower(to_hex(crypto_hash('sha2-256', ['hello', 'world']::VARCHAR[]))); Output
| lower(to_hex(crypto_hash('sha2-256', CAST(main.list_value('hello', 'world') AS VARCHAR[])))) |
|---|
| 306a0d104017a29193be6c7464b1fd5ee65495353a7ccad7dd2928e5fb9731fd |
Related Functions
crypto_hash_agg
Signature
Parameters (Positional)
| Parameter | Type | Mode | Description |
|---|---|---|---|
algorithm | VARCHAR | Positional | |
value | BLOB | TINYINT | HUGEINT | … 14 concrete typesBIGINTBLOBDOUBLEFLOATHUGEINTINTEGERSMALLINTTINYINTUBIGINTUHUGEINTUINTEGERUSMALLINTUTINYINTVARCHAR | Positional |
Returns
Description
Order-deterministic aggregate hash. Hashes column values sequentially in the ORDER BY order — yielding the same digest as crypto_hash over an equivalent ordered list. Useful for dataset checksums, change detection, and Merkle-style hierarchical hashing.
Examples
Per-department checksum of employee IDs
SELECT department,
lower(to_hex(crypto_hash_agg('sha2-256', employee_id ORDER BY employee_id))) AS dept_hash
FROM employees
GROUP BY department; Aggregate hash matches list hash exactly when input is the same ordered set
SELECT crypto_hash_agg('sha2-256', value ORDER BY value)
= crypto_hash('sha2-256', [1, 2, 3, 4, 5]::INTEGER[])
FROM (VALUES (1),(2),(3),(4),(5)) t(value); Output
| (crypto_hash_agg('sha2-256', "value" ORDER BY "value") = crypto_hash('sha2-256', CAST(main.list_value(1, 2, 3, 4, 5) AS INTEGER[]))) |
|---|
| true |
crypto_hmac
Signature
Parameters (Positional)
| Parameter | Type | Mode | Description |
|---|---|---|---|
algorithm | VARCHAR | Positional | |
key | VARCHAR | Positional | |
message | VARCHAR | Positional |
Returns
Description
Hash-based Message Authentication Code (HMAC). Combines a secret key with a message under a specified hash algorithm — used for message integrity, API request signing, and timing-safe authentication.
Examples
HMAC-SHA256 over a message with a secret key
SELECT lower(to_hex(crypto_hmac('sha2-256', 'my-secret-key', 'important message'))); Output
| lower(to_hex(crypto_hmac('sha2-256', 'my-secret-key', 'important message'))) |
|---|
| 97f324adef061b4ad0abeb6be543913d7db6ba8e6e7f33cd3c4395d619b56df4 |
crypto_random_bytes
Signature
Parameters (Positional)
| Parameter | Type | Mode | Description |
|---|---|---|---|
length | BIGINT | Positional |
Returns
Description
Cryptographically secure random bytes from OpenSSL's RAND_bytes. Suitable for keys, salts, nonces, and other cryptographic material. Length must be between 1 and 4 GB − 1.
Examples
32 random bytes — an AES-256 key
SELECT crypto_random_bytes(32); Output
| crypto_random_bytes(32) |
|---|
| a35ef22ac875491a6cd476345f91a8daef58bfa1c417eed32a98a0d7eda7f9e9 |
16-byte salt displayed as hex
SELECT lower(to_hex(crypto_random_bytes(16))); Output
| lower(to_hex(crypto_random_bytes(16))) |
|---|
| 3f8483715eda5adabf0851568cb83642 |
Practical Examples
Cookbook
Real-world recipes and patterns for common use cases.
Hash a Column
crypto_hash returns a BLOB. For human-readable output, hex-encode it:
SELECT id, lower(to_hex(crypto_hash('sha2-256', email))) AS email_hash
FROM users;
Different DuckDB types of the same numeric value hash differently — 42::INTEGER, 42::BIGINT, and '42'::VARCHAR all produce distinct digests. That’s intentional; it keeps the hash from collapsing across logically distinct values.
Sign a Message
HMAC combines a secret key with a message under a hash algorithm:
SELECT lower(to_hex(crypto_hmac('sha2-256', :secret, request_body))) AS signature
FROM api_requests;
Compare signatures byte-for-byte (or in DuckDB, =) to verify authenticity. The verification side performs the same HMAC and rejects on mismatch.
Generate a Salt
-- 16-byte random salt for password hashing
SELECT crypto_random_bytes(16) AS salt;
The bytes come from OpenSSL’s CSPRNG — suitable for cryptographic use. Length is unbounded up to 4 GB − 1.
Dataset Checksum
crypto_hash_agg is the aggregate counterpart of crypto_hash. Hashing all values in a column in a specified order produces a single deterministic digest you can use for change detection or to verify a partition matches a known-good baseline:
-- Whole-table checksum
SELECT lower(to_hex(crypto_hash_agg('sha2-256', email ORDER BY email)))
FROM users;
-- Per-partition checksum — matches across systems if rows are identical
SELECT day,
lower(to_hex(crypto_hash_agg('sha2-256', payload ORDER BY id))) AS day_hash
FROM events
GROUP BY day
ORDER BY day;
The ORDER BY is required — without it, hashes would be non-deterministic across query plans.
Tamper-Evident Audit Log
Build a hash chain by hashing each row’s content together with the previous row’s hash:
WITH chained AS (
SELECT
id,
action,
user_id,
created_at,
LAG(hash) OVER (ORDER BY created_at) AS prev_hash,
crypto_hash('sha2-256',
[CAST(id AS VARCHAR), action, CAST(user_id AS VARCHAR), CAST(created_at AS VARCHAR),
COALESCE(LAG(hash) OVER (ORDER BY created_at)::VARCHAR, '')]
) AS hash
FROM audit_log
)
SELECT * FROM chained ORDER BY created_at;
Verify the chain later by recomputing each hash and comparing — any tampering breaks the chain at the first altered row.
Stable IDs from Multiple Columns
Generate a deterministic ID from any combination of columns:
SELECT
customer_email,
product_sku,
purchase_date,
-- stable surrogate key, idempotent across re-runs
lower(to_hex(crypto_hash('sha2-256',
[customer_email, product_sku, CAST(purchase_date AS VARCHAR)]
))) AS purchase_id
FROM raw_purchases;
Pair with hashfuncs when you want the same property but cheaper (non-cryptographic) for partitioning, sharding, or cache keys.
Algorithm Reference
| Algorithm | Output | Notes |
|---|---|---|
blake3 | 32 B | Modern, fast — recommended default |
blake2b-512 | 64 B | Strong all-rounder, no BLAKE3 hardware needed |
sha2-256 | 32 B | Industry standard SHA-2 |
sha2-512 | 64 B | Larger SHA-2 output |
sha3-256 | 32 B | Keccak-based, post-SHA-2 standard |
keccak256 | 32 B | Original Keccak (mapped to SHA3-256 here) |
md5 | 16 B | Legacy — fast but not cryptographically secure |
sha1 | 20 B | Legacy — not cryptographically secure |
Platform Support
Compatibility
Extension availability may vary by platform and DuckDB version. Check below to ensure this extension supports your environment before installation.
Quick Facts
Platforms
Supported platform architectures
Compiled binary sizes
| Platform | Architecture | Size |
|---|---|---|
| Linux | aarch64 | 5.22 MB |
| Linux | x86_64 | 5.28 MB |
| Linux (musl) | x86_64 | 5.05 MB |
| macOS | Apple Silicon | 3.65 MB |
| macOS | Intel | 3.64 MB |
| Windows | x86_64 | 9.04 MB |
Gzipped download size from the DuckDB community-extensions registry.
Cryptographic Hashing in SQL
Install the Crypto extension to compute hashes, HMACs, and secure random bytes against any DuckDB column — including aggregate hashes for dataset checksums.