🔐

Crypto

Cryptographic hashing, HMAC, and secure random bytes inside DuckDB. Adds BLAKE3, BLAKE2b, SHA-3, the SHA-2 family, MD5, and SHA-1 alongside an order-deterministic aggregate hash for dataset checksums and tamper-evident audit trails.

Install

-- Install the extension
INSTALL crypto FROM community;

-- Load it into your session
LOAD crypto;

-- SHA-256 of a string (BLOB output, hex it for display)
SELECT lower(to_hex(crypto_hash('sha2-256', 'hello world'))) AS sha256;

-- HMAC-SHA256
SELECT lower(to_hex(crypto_hmac('sha2-256', 'my-secret', 'message'))) AS hmac;

-- 32 cryptographically secure random bytes (e.g. an AES-256 key)
SELECT crypto_random_bytes(32) AS key;

-- Deterministic checksum of an ordered dataset
SELECT lower(to_hex(crypto_hash_agg('sha2-256', email ORDER BY email))) AS dataset_hash
FROM users;

Technical Overview

Cryptographic hashing, HMAC, and secure random — directly in SQL

DuckDB ships with hash() (non-cryptographic) and a built-in sha256(). The Crypto extension adds the full SHA-2 and SHA-3 families, BLAKE3 and BLAKE2b, HMAC, an order-deterministic aggregate hash, and a CSPRNG for keys, salts, and nonces. Scope is hashing and authentication only — no symmetric ciphers (AES), no password-hashing KDFs (bcrypt/scrypt/Argon2), no public-key signing.

🧮 What you get

Four functions, organized around the primitives most data pipelines actually need from cryptography in SQL.

  • Cryptographic hashes over any column: crypto_hash over VARCHAR, BLOB, every numeric type, BOOLEAN, DATE, TIME, TIMESTAMP, UUID, and lists of fixed-length types. Algorithms: BLAKE3, blake2b-512, the full SHA-2 family (224/256/384/512), the full SHA-3 family (224/256/384/512), Keccak (mapped to SHA-3), and the legacy MD4 / MD5 / SHA-1 for compatibility.
  • Order-deterministic dataset checksum: crypto_hash_agg folds a column into one digest in ORDER BY order. By construction, crypto_hash_agg('algo', col ORDER BY ord) produces the same bytes as crypto_hash('algo', LIST(col ORDER BY ord)) — so any client (Python, Go, Rust) that can build the equivalent ordered list can verify a DuckDB digest byte-for-byte.
  • Keyed message authentication: crypto_hmac implements HMAC with the same algorithm vocabulary as crypto_hash — for signing API requests, verifying webhook payloads, or per-row tamper detection. BLAKE3 keying requires a 32-byte key.
  • Secure random bytes: crypto_random_bytes returns bytes from OpenSSL's RAND_bytes — the same CSPRNG TLS implementations rely on. Use for HMAC keys, salts, nonces, or random IDs.

🧷 Type-aware, length-safe hashing

The hashing surface is intentionally type-sensitive42::INTEGER and 42::BIGINT produce different digests because their binary representations differ. This is the right behaviour for cryptographic use: a hash that collapses across logically distinct domains is a footgun.

  • Length-prefixed list elements: Variable-length elements (VARCHAR, BLOB) inside a list are hashed as [8-byte length][content]. This defends against length-extension-style ambiguity so that ['ab', 'c'] and ['a', 'bc'] produce distinct digests — both via crypto_hash on lists and via crypto_hash_agg over VARCHAR/BLOB columns.
  • Aggregate equals list: crypto_hash_agg('algo', col ORDER BY ord) is byte-equal to crypto_hash('algo', LIST(col ORDER BY ord)). The ORDER BY is mandatory — DuckDB's aggregate execution does not guarantee row order, so an unordered hash aggregate would be non-deterministic.
  • NULL hashes to NULL: crypto_hash and crypto_hmac propagate NULL. crypto_hash_agg returns NULL for empty groups. Treat NULL like any other column in your hash inputs — coalesce or cast it explicitly if you need a stable representation.
  • BLOB output, hex it for display: All three hash functions return BLOB. Wrap with lower(to_hex(...)) for a hex string, or compare BLOBs directly with = for byte-equality.

🛡️ Honest scope — what's not in this extension

The extension is deliberately narrow. If your problem needs primitives from a wider library, you'll need to combine this with something else.

  • No symmetric encryption: There is no crypto_encrypt / crypto_decrypt. The extension does not implement AES (see NIST FIPS 197) or any other cipher. Encrypt-at-rest belongs at storage / file level, not in row expressions.
  • No password-hashing KDFs: No bcrypt, scrypt, Argon2, or PBKDF2. For storing user passwords, follow the OWASP Password Storage Cheat Sheet and run a real KDF in your application layer — DuckDB SQL is the wrong place for that work. Salts can be generated here with crypto_random_bytes; the actual KDF should not.
  • No public-key signatures: No RSA / ECDSA / Ed25519. Only symmetric authentication via HMAC.
  • Legacy algorithms exposed for compatibility: MD4, MD5, and SHA-1 are present so you can interop with existing systems, but they are not cryptographically secure for new designs. See NIST SP 800-107 Rev. 1 for current guidance — prefer sha2-256, sha2-512, or blake3. MD4 may be disabled in modern OpenSSL builds.

🎯 Common use cases

Whole-dataset checksum / change detection

One crypto_hash_agg over a sorted column produces a digest that any other system can recompute and compare. The fastest path from "did this partition change?" to a one-row answer.

Hash-chained audit trail

Combine crypto_hash with LAG over an ordered audit log to build a Merkle-style hash chain. Tampering with any historical row breaks the chain at the first altered position. See the hash-chain recipe.

Stable surrogate keys from many columns

Hash a tuple of source columns with crypto_hash to mint deterministic, idempotent IDs that survive re-runs. Useful for content-addressed storage and dedup pipelines.

API request signing and verification

crypto_hmac with a shared secret signs outgoing payloads and verifies incoming ones, all in one SQL expression. Pair with crypto_random_bytes for the secret itself.

Deep Dive

Technical Details

What this extension adds

DuckDB ships with hash() (non-cryptographic), md5(), and sha256(). The Crypto extension adds the rest of what production cryptographic workloads need:

  • More algorithms — BLAKE3, BLAKE2b, the full SHA-2 family (224/256/384/512), the SHA-3 family (224/256/384/512), and Keccak variants.
  • More input types — every numeric type, VARCHAR, BLOB, BOOLEAN, DATE, TIME, TIMESTAMP, UUID, and lists of those — not just strings. Different DuckDB types hash to different digests by design.
  • HMAC with the same algorithm vocabulary — for keyed message authentication.
  • crypto_hash_agg — an order-deterministic aggregate hash for dataset checksums and change detection.
  • crypto_random_bytes — cryptographically secure random bytes from OpenSSL’s CSPRNG.

Cryptographic vs non-cryptographic hashing

Use Crypto when you need:

  • Tamper resistance — adversaries can’t construct collisions cheaply (BLAKE3, SHA-2, SHA-3).
  • Authentication — HMAC for verifying message integrity with a shared secret.
  • Stable cross-system fingerprints — agreed-upon algorithm output that other systems can compute identically (e.g. ETags, content-addressed storage).
  • Secure random — keys, salts, nonces.

Use Hashfuncs instead when you need:

  • Speed — xxHash and RapidHash run at near-RAM bandwidth with no cryptographic guarantees.
  • Hash partitioning, sharding, cache keys — collision risk on adversarial inputs is acceptable.
  • Bloom-filter / cuckoo-filter inputs — paired with bitfilters.

Both extensions can coexist; pick per use case.

Determinism and types

crypto_hash is deterministic for a given (algorithm, value, type) triple but is type-sensitive42::INTEGER and 42::BIGINT produce different hashes. This is the right behaviour for cryptographic use: it keeps the hash from collapsing across logically distinct domains.

Lists of VARCHAR or BLOB values include each element’s length (as a 64-bit integer) before its content, defending against length-extension-style ambiguity attacks.

NULLs hash to NULL — crypto_hash('sha2-256', NULL::VARCHAR) IS NULL returns true.

Aggregate hash semantics

crypto_hash_agg('algo', col ORDER BY ord) produces the same digest as crypto_hash('algo', LIST(col ORDER BY ord)). The ORDER BY is required — DuckDB’s query planner doesn’t guarantee row order, so an unordered aggregate would be non-deterministic.

This equivalence is useful when comparing datasets across systems: anything that can produce the equivalent ordered list (Python, Go, Rust) can compute the same digest and verify a DuckDB result.

Random byte generation

crypto_random_bytes(n) returns n bytes from OpenSSL’s RAND_bytes(). That’s the same CSPRNG used by TLS implementations, key generation libraries, and openssl rand.

  • Minimum length: 1 byte
  • Maximum length: 4,294,967,295 bytes (4 GB − 1, DuckDB’s BLOB cap)
  • Length 0 or negative raises InvalidInputException

Reference

Extension Contents

Quick reference to all available functions and settings organized by category.

Name Description
HMAC

Keyed message authentication codes. Use for signing API requests, verifying message integrity, or building tamper-evident audit trails — anywhere a hash needs a shared secret.

crypto_hmac() Hash-based Message Authentication Code (HMAC)
Hashing

Compute cryptographic digests of values, lists, and entire datasets. Supports BLAKE3, BLAKE2b, the SHA-2 / SHA-3 families, plus legacy MD4 / MD5 / SHA-1 for compatibility. The aggregate variant produces an order-deterministic dataset checksum.

crypto_hash() Compute a cryptographic hash of a value with the given algorithm
crypto_hash_agg() Order-deterministic aggregate hash
Random

Cryptographically secure random bytes from OpenSSL's CSPRNG. Use for encryption keys, salts, nonces, and any other cryptographic material that must be unpredictable.

crypto_random_bytes() Cryptographically secure random bytes from OpenSSL's RAND_bytes

API Reference

Function Documentation

Detailed documentation for each function including signatures, parameters, and examples.

crypto_hash

Scalar Function Hashing
Signature
crypto_hash(algorithm: VARCHAR, value: ANY) → BLOB
Parameters (Positional)
Parameter Type Mode Description
algorithm VARCHAR Positional
value ANY Positional
Returns
Description

Compute a cryptographic hash of a value with the given algorithm. Supported algorithms: blake2b-512, blake3, md4, md5, sha1, sha2-{224,256,384,512}, sha3-{224,256,384,512}, keccak{224,256,384,512}. Different DuckDB types of the same numeric value produce different hashes.

Examples
1

SHA-256 of a string, displayed as hex

SELECT lower(to_hex(crypto_hash('sha2-256', 'hello world')));

Output

lower(to_hex(crypto_hash('sha2-256', 'hello world')))
b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9
2

BLAKE3 of arbitrary integer

SELECT lower(to_hex(crypto_hash('blake3', 42::INTEGER)));

Output

lower(to_hex(crypto_hash('blake3', CAST(42 AS INTEGER))))
95dbc3244503309f26dee9436b39568cc80b0cbe17ef409e9273c4edb58653fd
3

Hash a list — each VARCHAR/BLOB element's length is hashed before its content

SELECT lower(to_hex(crypto_hash('sha2-256', ['hello', 'world']::VARCHAR[])));

Output

lower(to_hex(crypto_hash('sha2-256', CAST(main.list_value('hello', 'world') AS VARCHAR[]))))
306a0d104017a29193be6c7464b1fd5ee65495353a7ccad7dd2928e5fb9731fd

crypto_hash_agg

Aggregate Function Hashing
Signature
crypto_hash_agg(algorithm: VARCHAR, value: BLOB | TINYINT | HUGEINT | …) → BLOB
Parameters (Positional)
Parameter Type Mode Description
algorithm VARCHAR Positional
value BLOB | TINYINT | HUGEINT | …
14 concrete types
BIGINTBLOBDOUBLEFLOATHUGEINTINTEGERSMALLINTTINYINTUBIGINTUHUGEINTUINTEGERUSMALLINTUTINYINTVARCHAR
Positional
Returns
Description

Order-deterministic aggregate hash. Hashes column values sequentially in the ORDER BY order — yielding the same digest as crypto_hash over an equivalent ordered list. Useful for dataset checksums, change detection, and Merkle-style hierarchical hashing.

Examples
1

Per-department checksum of employee IDs

SELECT department,
       lower(to_hex(crypto_hash_agg('sha2-256', employee_id ORDER BY employee_id))) AS dept_hash
FROM employees
GROUP BY department;
2

Aggregate hash matches list hash exactly when input is the same ordered set

SELECT crypto_hash_agg('sha2-256', value ORDER BY value)
     = crypto_hash('sha2-256', [1, 2, 3, 4, 5]::INTEGER[])
FROM (VALUES (1),(2),(3),(4),(5)) t(value);

Output

(crypto_hash_agg('sha2-256', "value" ORDER BY "value") = crypto_hash('sha2-256', CAST(main.list_value(1, 2, 3, 4, 5) AS INTEGER[])))
true

crypto_hmac

Scalar Function HMAC
Signature
crypto_hmac(algorithm: VARCHAR, key: VARCHAR, message: VARCHAR) → BLOB
Parameters (Positional)
Parameter Type Mode Description
algorithm VARCHAR Positional
key VARCHAR Positional
message VARCHAR Positional
Returns
Description

Hash-based Message Authentication Code (HMAC). Combines a secret key with a message under a specified hash algorithm — used for message integrity, API request signing, and timing-safe authentication.

Examples
1

HMAC-SHA256 over a message with a secret key

SELECT lower(to_hex(crypto_hmac('sha2-256', 'my-secret-key', 'important message')));

Output

lower(to_hex(crypto_hmac('sha2-256', 'my-secret-key', 'important message')))
97f324adef061b4ad0abeb6be543913d7db6ba8e6e7f33cd3c4395d619b56df4

crypto_random_bytes

Scalar Function Random
Signature
crypto_random_bytes(length: BIGINT) → BLOB
Parameters (Positional)
Parameter Type Mode Description
length BIGINT Positional
Returns
Description

Cryptographically secure random bytes from OpenSSL's RAND_bytes. Suitable for keys, salts, nonces, and other cryptographic material. Length must be between 1 and 4 GB − 1.

Examples
1

32 random bytes — an AES-256 key

SELECT crypto_random_bytes(32);

Output

crypto_random_bytes(32)
a35ef22ac875491a6cd476345f91a8daef58bfa1c417eed32a98a0d7eda7f9e9
2

16-byte salt displayed as hex

SELECT lower(to_hex(crypto_random_bytes(16)));

Output

lower(to_hex(crypto_random_bytes(16)))
3f8483715eda5adabf0851568cb83642

Practical Examples

Cookbook

Real-world recipes and patterns for common use cases.

Hash a Column

crypto_hash returns a BLOB. For human-readable output, hex-encode it:

SELECT id, lower(to_hex(crypto_hash('sha2-256', email))) AS email_hash
FROM users;

Different DuckDB types of the same numeric value hash differently — 42::INTEGER, 42::BIGINT, and '42'::VARCHAR all produce distinct digests. That’s intentional; it keeps the hash from collapsing across logically distinct values.

Sign a Message

HMAC combines a secret key with a message under a hash algorithm:

SELECT lower(to_hex(crypto_hmac('sha2-256', :secret, request_body))) AS signature
FROM api_requests;

Compare signatures byte-for-byte (or in DuckDB, =) to verify authenticity. The verification side performs the same HMAC and rejects on mismatch.

Generate a Salt

-- 16-byte random salt for password hashing
SELECT crypto_random_bytes(16) AS salt;

The bytes come from OpenSSL’s CSPRNG — suitable for cryptographic use. Length is unbounded up to 4 GB − 1.

Dataset Checksum

crypto_hash_agg is the aggregate counterpart of crypto_hash. Hashing all values in a column in a specified order produces a single deterministic digest you can use for change detection or to verify a partition matches a known-good baseline:

-- Whole-table checksum
SELECT lower(to_hex(crypto_hash_agg('sha2-256', email ORDER BY email)))
FROM users;
-- Per-partition checksum — matches across systems if rows are identical
SELECT day,
       lower(to_hex(crypto_hash_agg('sha2-256', payload ORDER BY id))) AS day_hash
FROM events
GROUP BY day
ORDER BY day;

The ORDER BY is required — without it, hashes would be non-deterministic across query plans.

Tamper-Evident Audit Log

Build a hash chain by hashing each row’s content together with the previous row’s hash:

WITH chained AS (
  SELECT
    id,
    action,
    user_id,
    created_at,
    LAG(hash) OVER (ORDER BY created_at) AS prev_hash,
    crypto_hash('sha2-256',
      [CAST(id AS VARCHAR), action, CAST(user_id AS VARCHAR), CAST(created_at AS VARCHAR),
       COALESCE(LAG(hash) OVER (ORDER BY created_at)::VARCHAR, '')]
    ) AS hash
  FROM audit_log
)
SELECT * FROM chained ORDER BY created_at;

Verify the chain later by recomputing each hash and comparing — any tampering breaks the chain at the first altered row.

Stable IDs from Multiple Columns

Generate a deterministic ID from any combination of columns:

SELECT
  customer_email,
  product_sku,
  purchase_date,
  -- stable surrogate key, idempotent across re-runs
  lower(to_hex(crypto_hash('sha2-256',
    [customer_email, product_sku, CAST(purchase_date AS VARCHAR)]
  ))) AS purchase_id
FROM raw_purchases;

Pair with hashfuncs when you want the same property but cheaper (non-cryptographic) for partitioning, sharding, or cache keys.

Algorithm Reference

AlgorithmOutputNotes
blake332 BModern, fast — recommended default
blake2b-51264 BStrong all-rounder, no BLAKE3 hardware needed
sha2-25632 BIndustry standard SHA-2
sha2-51264 BLarger SHA-2 output
sha3-25632 BKeccak-based, post-SHA-2 standard
keccak25632 BOriginal Keccak (mapped to SHA3-256 here)
md516 BLegacy — fast but not cryptographically secure
sha120 BLegacy — not cryptographically secure

Platform Support

Compatibility

Extension availability may vary by platform and DuckDB version. Check below to ensure this extension supports your environment before installation.

Quick Facts

Software License MIT
Pricing Free
Written In C++
Source Available Yes
View on GitHub
Usage
106,925+
loads in last 30 days

Platforms

Linux
Linux (musl)
macOS
Windows
WASM
Supported platform architectures
Linux: x86_64, aarch64
Linux (musl): x86_64
macOS: Intel, Apple Silicon
Windows: x86_64
WASM: Not available
Compiled binary sizes
Platform Architecture Size
Linux aarch64 5.22 MB
Linux x86_64 5.28 MB
Linux (musl) x86_64 5.05 MB
macOS Apple Silicon 3.65 MB
macOS Intel 3.64 MB
Windows x86_64 9.04 MB

Gzipped download size from the DuckDB community-extensions registry.

DuckDB Versions

Release calendar
Supported
v1.4.4 v1.5.2

Cryptographic Hashing in SQL

Install the Crypto extension to compute hashes, HMACs, and secure random bytes against any DuckDB column — including aggregate hashes for dataset checksums.