Crypto DuckDB Extension
The Crypto extension, developed by Query.Farm, adds cryptographic hash functions and HMAC to DuckDB.
While DuckDB already includes basic hash functions like hash() and sha256(), this extension provides additional algorithms including Blake3, SHA-3, and supports hashing of various data types beyond just strings.
Installation
crypto is a DuckDB Community Extension.
To install and use the extension, run these SQL commands in your DuckDB session:
INSTALL crypto FROM community;
LOAD crypto;Thatโs it! The extension is now ready to use.
Quick Start
-- Hash a string with SHA-256 (returns BLOB, convert to hex for readability)
SELECT lower(to_hex(crypto_hash('sha2-256', 'hello world')));
-- Result: b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9
-- Hash an integer
SELECT lower(to_hex(crypto_hash('sha2-256', 42)));
-- Result: e8a4b2ee7ede79a3afb332b5b6cc3d952a65fd8cffb897f5d18016577c33d7cc
-- Calculate HMAC with a secret key
SELECT lower(to_hex(crypto_hmac('sha2-256', 'my-secret-key', 'important message')));
-- Result: 97f324adef061b4ad0abeb6be543913d7db6ba8e6e7f33cd3c4395d619b56df4Hash Functions
crypto_hash()
Syntax:
crypto_hash(algorithm, value) โ BLOBComputes a cryptographic hash of the input value using the specified algorithm.
Parameters:
algorithm(VARCHAR): The hash algorithm name (see supported algorithms below)value: The value to hash (supports multiple data types)
Returns: BLOB containing the raw hash bytes
Supported Data Types:
Strings:
VARCHAR,BLOBIntegers:
TINYINT,SMALLINT,INTEGER,BIGINT,HUGEINT,UTINYINT,USMALLINT,UINTEGER,UBIGINTFloating point:
FLOAT,DOUBLEOther:
BOOLEAN,DATE,TIME,TIMESTAMP,UUIDLists: Arrays of fixed-length types (e.g.,
INTEGER[],VARCHAR[],BLOB[])- Important: When hashing lists of
VARCHARorBLOB, each elementโs length (as a 64-bit integer) is hashed before its content to prevent length extension attacks
- Important: When hashing lists of
Note: Different data types with the same value will produce different hashes (e.g., 42::INTEGER vs 42::BIGINT vs '42'::VARCHAR).
crypto_hash_agg()
Syntax:
crypto_hash_agg(algorithm, value ORDER BY sort_expression) โ BLOBAn aggregate function that computes a cryptographic hash over multiple rows of data. This is useful for creating checksums of entire datasets, detecting changes in groups of records, or generating deterministic identifiers for sets of values.
Parameters:
algorithm(VARCHAR): The hash algorithm name (same algorithms ascrypto_hash)value: The column/expression to hash (supports same data types ascrypto_hash)ORDER BY: Required - ensures deterministic ordering of values before hashing
Returns: BLOB containing the raw hash bytes, or NULL for empty result sets
Important Notes:
- The
ORDER BYclause is mandatory because hash aggregation is order-dependent - Values are hashed sequentially in the order specified by
ORDER BY - For
VARCHARandBLOBtypes, each valueโs length is hashed before its content (same as list hashing) - The function produces the same hash as
crypto_hash()would produce for an equivalent list - Empty result sets return
NULL
Use Cases:
- Dataset Checksums: Verify data integrity across tables or partitions
- Change Detection: Detect if any values in a group have changed
- Merkle-like Hashing: Create hierarchical hashes of grouped data
- Deterministic IDs: Generate stable identifiers for sets of values
Supported Hash Algorithms
| Algorithm | Output Size | Description |
|---|---|---|
blake2b-512 |
64 bytes | BLAKE2b with 512-bit output |
blake3 |
32 bytes | BLAKE3, a modern cryptographic hash |
md4 |
16 bytes | MD4 (deprecated, may not work on some systems) |
md5 |
16 bytes | MD5 (not cryptographically secure) |
sha1 |
20 bytes | SHA-1 (not cryptographically secure) |
sha2-224 |
28 bytes | SHA-2 family with 224-bit output |
sha2-256 |
32 bytes | SHA-2 family with 256-bit output |
sha2-384 |
48 bytes | SHA-2 family with 384-bit output |
sha2-512 |
64 bytes | SHA-2 family with 512-bit output |
sha3-224 |
28 bytes | SHA-3 family with 224-bit output |
sha3-256 |
32 bytes | SHA-3 family with 256-bit output |
sha3-384 |
48 bytes | SHA-3 family with 384-bit output |
sha3-512 |
64 bytes | SHA-3 family with 512-bit output |
keccak224 |
28 bytes | Keccak-224 (mapped to SHA3-224) |
keccak256 |
32 bytes | Keccak-256 (mapped to SHA3-256) |
keccak384 |
48 bytes | Keccak-384 (mapped to SHA3-384) |
keccak512 |
64 bytes | Keccak-512 (mapped to SHA3-512) |
Note: Keccak variants are mapped to their SHA-3 equivalents in this implementation.
Examples
-- Hash a string with different algorithms
SELECT lower(to_hex(crypto_hash('sha2-256', 'test')));
-- 9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08
SELECT lower(to_hex(crypto_hash('blake3', 'test')));
-- 4878ca0425c739fa427f7eda20fe845f6b2e46ba5fe2a14df5b1e32f50603215
SELECT lower(to_hex(crypto_hash('md5', 'test')));
-- 098f6bcd4621d373cade4e832627b4f6
-- Hash different data types
SELECT lower(to_hex(crypto_hash('sha2-256', 42::INTEGER)));
-- e8a4b2ee7ede79a3afb332b5b6cc3d952a65fd8cffb897f5d18016577c33d7cc
SELECT lower(to_hex(crypto_hash('sha2-256', 3.14::DOUBLE)));
-- 2ee9194f7fa84ec9aec9742f02ba1a7f76b6b61b6ecf961a925fa9b4a67b22aa
SELECT lower(to_hex(crypto_hash('sha2-256', true::BOOLEAN)));
SELECT lower(to_hex(crypto_hash('sha2-256', DATE '2024-01-01')));
SELECT lower(to_hex(crypto_hash('sha2-256', UUID '550e8400-e29b-41d4-a716-446655440000')));
-- Hash a list of values
SELECT lower(to_hex(crypto_hash('sha2-256', [1, 2, 3, 4, 5]::INTEGER[])));
-- 4f6addc9659d6fb90fe94b6688a79f2a1fa8d36ec43f8f3e1d9b6528c448a384
SELECT lower(to_hex(crypto_hash('sha2-256', ['hello', 'world']::VARCHAR[])));
-- 306a0d104017a29193be6c7464b1fd5ee65495353a7ccad7dd2928e5fb9731fd
-- Hash data in a table column
CREATE TABLE users (id INTEGER, email VARCHAR);
INSERT INTO users VALUES (1, 'alice@example.com'), (2, 'bob@example.com');
SELECT id, lower(to_hex(crypto_hash('sha2-256', email))) as email_hash FROM users;
-- Get raw binary output size
SELECT octet_length(crypto_hash('sha2-256', 'test'));
-- 32
-- Handle NULL values
SELECT crypto_hash('sha2-256', NULL::VARCHAR) IS NULL;
-- true
-- Aggregate hash over multiple rows (requires ORDER BY)
SELECT lower(to_hex(crypto_hash_agg('sha2-256', email ORDER BY email)))
FROM users;
-- Produces a single hash representing all email values in order
-- Aggregate hash with grouping
SELECT
department,
lower(to_hex(crypto_hash_agg('sha2-256', employee_id ORDER BY employee_id))) as dept_hash
FROM employees
GROUP BY department;
-- Produces a hash for each department's employee IDs
-- Verify aggregate produces same hash as list
SELECT crypto_hash_agg('sha2-256', value ORDER BY value) =
crypto_hash('sha2-256', [1, 2, 3, 4, 5]::INTEGER[])
FROM (VALUES (1), (2), (3), (4), (5)) t(value);
-- true (aggregate hash matches list hash)HMAC Functions
crypto_hmac()
Syntax:
crypto_hmac(algorithm, key, message) โ BLOBComputes an HMAC (Hash-based Message Authentication Code) using the specified algorithm, secret key, and message.
Parameters:
algorithm(VARCHAR): The hash algorithm name (same algorithms ascrypto_hash)key(VARCHAR/BLOB): The secret key for HMAC calculationmessage(VARCHAR/BLOB): The message to authenticate
Returns: BLOB containing the raw HMAC bytes
Special Requirements:
- Blake3: Requires a key of exactly 32 bytes (will raise an error otherwise)
Examples
-- Basic HMAC with SHA-256
SELECT lower(to_hex(crypto_hmac('sha2-256', 'secret-key', 'message')));
-- 287a3bd8a4fc7731a94c722079055323644d8798bd291bf9878abc9b8fd4b1d0
-- HMAC with different algorithms
SELECT lower(to_hex(crypto_hmac('sha2-512', 'key', 'The quick brown fox jumps over the lazy dog')));
-- b42af09057bac1e2d41708e48a902e09b5ff7f12ab428a4fe86653c73dd248fb82f948a549f7b791a5b41915ee4d1ec3935357e4e2317250d0372afa2ebeeb3a
SELECT lower(to_hex(crypto_hmac('sha3-256', 'key', 'message')));
-- 0f43852a24d5597a8200312a95993991581679d63264f1b1ad4b5ccac7fe8ba4
-- Blake3 HMAC (requires exactly 32-byte key)
SELECT lower(to_hex(crypto_hmac('blake3', 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', 'message')));
-- HMAC with empty key or message
SELECT lower(to_hex(crypto_hmac('sha2-256', '', 'message')));
SELECT lower(to_hex(crypto_hmac('sha2-256', 'key', '')));
-- Verify HMAC in authentication scenarios
CREATE TABLE api_requests (user_id INTEGER, data VARCHAR, received_hmac BLOB);
SELECT
user_id,
crypto_hmac('sha2-256', 'shared-secret', data) = received_hmac AS is_valid
FROM api_requests;Common Use Cases
Generating Unique IDs
-- Generate unique IDs from multiple columns
SELECT
lower(to_hex(crypto_hash('sha2-256', id::VARCHAR || email || created_at::VARCHAR))) AS unique_id
FROM users;Data Deduplication
-- Find duplicate content by hash
SELECT
lower(to_hex(crypto_hash('blake3', content))) AS content_hash,
count(*) AS duplicate_count
FROM documents
GROUP BY content_hash
HAVING count(*) > 1;API Request Signing
-- Sign API requests with HMAC
SELECT
request_id,
lower(to_hex(crypto_hmac('sha2-256', 'api-secret-key', request_body))) AS signature
FROM api_requests;Hashing Sensitive Data
-- Hash email addresses for privacy
SELECT
id,
lower(to_hex(crypto_hash('sha2-256', email))) AS email_hash
FROM users;Dataset Integrity Verification
-- Create a checksum for an entire table partition
SELECT
partition_date,
lower(to_hex(crypto_hash_agg('blake3', transaction_id ORDER BY transaction_id))) AS partition_checksum
FROM transactions
GROUP BY partition_date;
-- Detect changes in a dataset by comparing checksums
WITH current_hash AS (
SELECT crypto_hash_agg('sha2-256', data ORDER BY id) AS hash
FROM critical_table
)
SELECT hash = '\x<expected_hash_value>'::BLOB AS data_unchanged
FROM current_hash;Merkle-Style Hierarchical Hashing
-- Create hierarchical hashes for efficient change detection
-- Level 1: Hash individual user transactions
WITH user_hashes AS (
SELECT
user_id,
crypto_hash_agg('sha2-256', transaction_id ORDER BY timestamp) AS user_hash
FROM transactions
GROUP BY user_id
)
-- Level 2: Hash all user hashes to get global hash
SELECT
lower(to_hex(crypto_hash_agg('sha2-256', user_hash ORDER BY user_id))) AS global_hash
FROM user_hashes;Important Notes
Output Format:
crypto_hash(),crypto_hash_agg(), andcrypto_hmac()all return raw binary data asBLOB. Useto_hex()to convert to hexadecimal strings, orlower(to_hex(...))for lowercase hex.Type Sensitivity: The hash is computed on the binary representation of the data type. The same numeric value with different types will produce different hashes:
SELECT crypto_hash('sha2-256', 42::INTEGER) != crypto_hash('sha2-256', 42::BIGINT); -- true (different hashes)NULL Handling:
crypto_hash()andcrypto_hmac()returnNULLif the input value isNULL.crypto_hash_agg()returnsNULLfor empty result sets.List and Aggregate Hashing with Length Encoding:
Applies to both
crypto_hash()when hashing lists andcrypto_hash_agg()when aggregating valuesFor fixed-length types (integers, floats, dates, etc.), only the raw binary data is hashed
For variable-length types (
VARCHARandBLOB), each element is hashed as:[8-byte length][content]The length is encoded as a 64-bit unsigned integer (uint64_t) in native byte order
This prevents length extension attacks where
['ab', 'c']would otherwise hash the same as['a', 'bc']Example:
-- These produce different hashes due to length encoding SELECT lower(to_hex(crypto_hash('sha2-256', ['ab', 'c']::VARCHAR[]))); -- 43ee655579de01ca739b3f95c1c2d3f46d353b2c0df818064ea594506cdb2617 SELECT lower(to_hex(crypto_hash('sha2-256', ['a', 'bc']::VARCHAR[]))); -- 9a8acca1b6c6c0befd3fbc756aed625da998c998f7252e738c4ef061906b9b21 -- Different hashes prove length encoding prevents collisions -- Same applies to aggregate function SELECT lower(to_hex(crypto_hash_agg('sha2-256', data ORDER BY data))) FROM (VALUES ('ab'), ('c')) t(data); -- Produces different hash than ['a', 'bc']
Security Considerations:
- MD4, MD5, and SHA-1 are not cryptographically secure and should not be used for security purposes
- For modern applications, use SHA-2 (sha2-256, sha2-512) or Blake3
- For HMAC operations, use a strong, randomly generated secret key
- Blake3 HMAC requires exactly a 32-byte key
Aggregate Function Requirements:
crypto_hash_agg()requires anORDER BYclause to ensure deterministic resultsWithout
ORDER BY, the function will raise an errorThe aggregate produces the same hash as
crypto_hash()would for an equivalent ordered listExample:
-- This works - produces same hash as list [1,2,3,4,5] SELECT crypto_hash_agg('sha2-256', value ORDER BY value) FROM (VALUES (5), (2), (1), (4), (3)) t(value); -- This fails - ORDER BY is required SELECT crypto_hash_agg('sha2-256', value) FROM (VALUES (1), (2)) t(value); -- Error: Hash aggregation requires a distinct total ordering
Algorithm Availability: MD4 is deprecated and may be disabled in modern OpenSSL builds.
Comparison with Built-in DuckDB Functions
DuckDB has built-in hash() and sha256() functions, but this extension provides:
- More hash algorithms (Blake3, SHA-3, SHA-512, etc.)
- Support for hashing multiple data types beyond strings
- HMAC calculation capabilities
- Standard cryptographic implementations via OpenSSL and Blake3
License
This extension uses OpenSSL for most cryptographic operations and includes the Blake3 hash implementation. This extension uses the MIT license.
Love โค๏ธ this DuckDB extension? Youโll Love This.
Get the best from Query.Farm โ smart tips, powerful tools, and project updates sent directly to your inbox, but only when weโve got something great to share.