-- Install the extension
INSTALL example FROM community;
-- Load it into your session
LOAD example;
Technical Overview
Why Use Crypto Functions?
Understanding when and how to leverage cryptographic capabilities in your data workflows.
🔒 Data Security & Compliance
Modern data systems must handle sensitive information securely. This extension provides enterprise-grade cryptographic primitives directly within DuckDB, eliminating the need to export sensitive data to external systems for encryption or hashing.
•Password Storage: Use bcrypt hashing for secure password storage with automatic salt generation and configurable work factors.
•PII Protection: Hash or encrypt personally identifiable information (PII) to meet GDPR, CCPA, and HIPAA requirements.
•Data Masking: Create consistent, irreversible hashes for data anonymization in analytics and testing environments.
✓ Data Integrity & Verification
Ensure data hasn't been tampered with by generating cryptographic fingerprints and audit trails. Detect unauthorized modifications and maintain chain-of-custody for critical data.
•Blockchain-Style Auditing: Build immutable audit trails using hash chaining to detect any tampering with historical records.
•Aggregate Verification: Generate Merkle roots and aggregate hashes to verify entire datasets with a single value.
•Checksum Validation: Quickly validate data integrity during ETL pipelines and data transfers.
âš¡ Performance & Scalability
Native DuckDB extensions leverage columnar processing and vectorized execution for high-performance cryptographic operations on large datasets without moving data out of the database.
•In-Database Processing: Process millions of rows without data serialization or network overhead.
•Vectorized Operations: Benefit from DuckDB's SIMD optimizations for batch hashing and encryption.
•Zero-Copy Architecture: Avoid expensive data transfers between database and application layers.
🎯 Common Use Cases
Financial Services
Transaction verification, audit trails for compliance, and secure customer data handling.
Healthcare Systems
HIPAA-compliant patient data encryption, secure medical record hashing, and audit logging.
Data Engineering
ETL pipeline checksums, data deduplication via hashing, and secure data lake operations.
SaaS Applications
Multi-tenant data isolation, API key management, and secure password authentication.
Deep Dive
Technical Details
Algorithm Support
The crypto extension supports a wide range of cryptographic algorithms for different use cases. Below is a summary of the supported algorithms and their characteristics.
Hash Functions
Algorithm
Output Size
Security Level
Use Case
MD5
128 bits
Legacy
Checksums only (not cryptographically secure)
SHA-1
160 bits
Legacy
Compatibility with older systems
SHA-256
256 bits
Strong
General purpose hashing
SHA-512
512 bits
Strong
High security applications
BLAKE3
256 bits
Strong
High performance hashing
Deprecation Notice
MD5 and SHA-1 are provided for compatibility only. Do not use them for security-sensitive applications. Use SHA-256 or BLAKE3 for new projects.
Encryption Algorithms
Algorithm
Key Size
Mode
Performance
AES-128
128 bits
GCM
Fast
AES-256
256 bits
GCM
Moderate
ChaCha20-Poly1305
256 bits
AEAD
Very Fast
Algorithm Usage Distribution
Based on typical workloads, here’s how different algorithms are used in production:
Performance Benchmarks
The following benchmarks were performed on a dataset of 1 million rows using DuckDB’s vectorized execution engine.
-- Benchmark: Hash 1M rowsSELECT COUNT(*) FROM ( SELECT crypto_sha256(data) as hash FROM test_data);-- Result: ~450ms for 1M rows
Throughput Comparison
Operation
Rows/Second
MB/Second
SHA-256
2.2M
180
BLAKE3
4.5M
360
AES-256-GCM Encrypt
1.1M
90
AES-256-GCM Decrypt
1.2M
95
Architecture
The crypto extension integrates seamlessly with DuckDB’s query execution pipeline, leveraging vectorized processing for maximum performance.
Decision Flow: Choosing the Right Algorithm
Algorithm Selection Guide
Implementation Details
Vectorized Processing
All cryptographic operations in this extension are fully vectorized, meaning they process data in batches of ~2000 rows at a time. This approach maximizes CPU cache utilization and enables SIMD optimizations.
Info
Vectorized execution is automatic - you don’t need to do anything special to take advantage of it. Just write your SQL queries as usual.
-- Example: Batch hashing with saltSELECT id, crypto_sha256(CONCAT(salt, ':', password)) as password_hashFROM users;
Memory Management
The extension uses DuckDB’s native memory allocator and follows these principles:
Zero-copy operations - Input data is processed in-place when possible
Buffer reuse - Intermediate buffers are reused across operations
Streaming support - Large files are processed in chunks
Security Considerations
When using cryptographic functions in a database context, consider the following:
Key Management: Never store encryption keys in the same database as encrypted data
Salt Generation: Always use cryptographically secure random salts
Algorithm Selection: Use SHA-256 or better for new applications
Danger
Never hardcode encryption keys or passwords in your SQL queries. Use DuckDB secrets or environment variables for sensitive credentials.
-- Secure password hashing exampleSELECT crypto_bcrypt_hash( password, crypto_random_bytes(16), -- Random salt 12 -- Work factor) as secure_hashFROM new_users;
Tip
For bcrypt, a work factor of 12 provides a good balance between security and performance. Increase it for highly sensitive applications, but test the impact on query time.
Reference
Extension Contents
Quick reference to all available functions and settings organized by category.
Default: true Include column headers in the output
QUOTE
VARCHAR
Named
Default: " Quote character for string fields
Returns
Description
Exports query results to an encrypted CSV file. The CSV file is encrypted before being written to disk, protecting sensitive data in a portable text format. Useful for secure data exchange with systems that require CSV input.
Examples
1
Export encrypted customer data to CSV
COPY (
SELECT customer_id, name, email, phone
FROM customers
WHERE region = 'US'
)
TO 'encrypted_customers.csv'
WITH (
ENCRYPTION_KEY 'YourSecureKey123',
HEADER true,
DELIMITER ','
);
Destination file path for the encrypted Parquet file
Options
Available Options
Parameter
Type
Mode
Description
COMPRESSION
VARCHAR
Named
Default: 'SNAPPY' Compression codec: 'SNAPPY', 'GZIP', 'ZSTD', or 'NONE'
ENCRYPTION_ALGORITHM
VARCHAR
Named
Default: 'AES-256' Encryption algorithm to use: 'AES-256' or 'AES-128'
ENCRYPTION_KEY
VARCHAR
Named
Base64-encoded AES-256 encryption key for client-side encryption
ROW_GROUP_SIZE
INTEGER
Named
Default: 100000 Number of rows per Parquet row group
Returns
Description
Exports query results to an encrypted Parquet file with client-side encryption. The entire file is encrypted at rest using AES-256, ensuring data confidentiality during storage and transit. Compatible with standard Parquet readers when decryption keys are provided.
Examples
1
Export encrypted user data to Parquet
COPY (
SELECT user_id, email, created_at, subscription_tier
FROM users
WHERE created_at > '2024-01-01'
)
TO 'encrypted_users.parquet'
WITH (
ENCRYPTION_KEY 'YourBase64EncodedKey==',
COMPRESSION 'ZSTD'
);
2
Export with custom row group size for large datasets
COPY (
SELECT transaction_id, amount, timestamp, customer_id
FROM transactions
WHERE amount > 1000
)
TO 's3://my-bucket/encrypted-transactions.parquet'
WITH (
ENCRYPTION_KEY 'YourBase64EncodedKey==',
ROW_GROUP_SIZE 500000,
COMPRESSION 'SNAPPY'
);
Default: 'ARRAY' JSON format: 'ARRAY' (array of objects) or 'NEWLINE_DELIMITED' (JSONL)
PRETTY
BOOLEAN
Named
Default: false Pretty-print the JSON output with indentation
SIGNING_KEY
VARCHAR
Named
HMAC signing key for generating digital signatures
Returns
Description
Exports query results to a cryptographically signed JSON file. Each record is signed with HMAC-SHA256, enabling tamper detection and data integrity verification. The signature is included in the output, allowing consumers to verify authenticity.
Examples
1
Export signed API response data
COPY (
SELECT api_key, rate_limit, created_at, permissions
FROM api_keys
WHERE status = 'active'
)
TO 'signed_api_keys.json'
WITH (
SIGNING_KEY 'your-hmac-secret-key',
FORMAT 'ARRAY',
PRETTY true
);
2
Export newline-delimited signed events
COPY (
SELECT event_id, event_type, user_id, timestamp, payload
FROM audit_events
WHERE timestamp > CURRENT_DATE - INTERVAL 7 DAYS
)
TO 'signed_events.jsonl'
WITH (
SIGNING_KEY 'audit-signing-key',
FORMAT 'NEWLINE_DELIMITED'
);
Creates a cryptographic audit trail for database operations using blockchain-style hash chaining. Each record contains a hash of its data plus the previous record's hash, making tampering detectable.
Examples
1
Generate audit trail for financial transactions
SELECT * FROM crypto_audit_trail('financial_transactions', 'INSERT') LIMIT 5;
WITH audit AS (
SELECT * FROM crypto_audit_trail('sensitive_data', 'UPDATE')
)
SELECT
sequence_num,
record_hash,
chain_valid,
CASE
WHEN NOT chain_valid THEN 'TAMPERING DETECTED!'
ELSE 'Valid'
END AS status
FROM audit
WHERE NOT chain_valid;
Default: 10 Work factor (4-31, higher is slower/more secure)
Returns
Bcrypt hash string containing salt and hash (60 characters)
Description
Hashes a password using the bcrypt algorithm with automatic salt generation. Recommended for password storage. The cost parameter controls computational cost (2^cost iterations).
Examples
1
Hash password with default cost (10)
INSERT INTO users (username, password_hash)
VALUES ('alice', crypto_bcrypt('secret_password'));
Default: 2048 RSA key size in bits (1024, 2048, or 4096)
Returns
A table with the following columns:
Column
Type
Description
key_id
INTEGER
Unique identifier for the key pair
public_key
VARCHAR
PEM-encoded public key
private_key
VARCHAR
PEM-encoded private key
created_at
TIMESTAMP
Generation timestamp
Description
Generates RSA public/private key pairs in bulk. Useful for provisioning multiple users or services with cryptographic keys. Returns a table where each row contains one complete key pair.
Examples
1
Generate 10 RSA key pairs with default 2048-bit keys
SELECT * FROM crypto_generate_keys(10) LIMIT 3;
Output
key_id
public_key
private_key
created_at
1
-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEF...
SELECT * FROM crypto_generate_keys(5, key_size := 4096);
3
Store generated keys in a table
CREATE TABLE user_keys AS
SELECT
key_id,
public_key,
private_key,
created_at
FROM crypto_generate_keys(100);
4
Generate keys and immediately assign to users
INSERT INTO user_keypairs (user_id, public_key, private_key)
SELECT
u.user_id,
k.public_key,
k.private_key
FROM users u
CROSS JOIN LATERAL crypto_generate_keys(1) k
WHERE u.needs_keypair = true;
Default: sha256 Hash algorithm to use ('sha256' or 'sha512')
Returns
Aggregated hash combining all input values
Description
Aggregates multiple values into a single cryptographic hash. Order-independent - produces the same hash regardless of row order. Useful for creating fingerprints of entire datasets or groups.
Examples
1
Generate fingerprint for all users
SELECT crypto_hash_agg(email) AS users_fingerprint
FROM users;
2
Hash all transaction IDs for a daily batch
SELECT
DATE(created_at) AS batch_date,
crypto_hash_agg(transaction_id) AS batch_hash
FROM transactions
GROUP BY DATE(created_at);
3
Use SHA-512 for higher security
SELECT crypto_hash_agg(sensitive_field, algorithm := 'sha512') AS secure_hash
FROM sensitive_table;
4
Detect data changes by comparing hashes
WITH current_hash AS (
SELECT crypto_hash_agg(data_column) AS hash FROM production_table
),
backup_hash AS (
SELECT crypto_hash_agg(data_column) AS hash FROM backup_table
)
SELECT
CASE
WHEN c.hash = b.hash THEN 'Data matches'
ELSE 'Data has changed!'
END AS status
FROM current_hash c, backup_hash b;
Varargs
One or more values to combine into a single hash. Values are concatenated in order before hashing.
Returns
64-character hexadecimal SHA-256 hash of all combined values
Description
Combines multiple values into a single cryptographic hash. Accepts any number of arguments, concatenates them in order, and produces a SHA-256 hash. Useful for creating composite keys, fingerprinting multiple columns, or generating deterministic IDs from multiple fields.
Examples
1
Combine multiple fields into a single hash
SELECT crypto_hash_combine(first_name, last_name, email) AS user_fingerprint
FROM users;
Creates an HMAC (Hash-based Message Authentication Code) signature using SHA-256. Used for API authentication, webhook verification, and message integrity.
Examples
1
Sign API requests with HMAC
SELECT
request_id,
payload,
crypto_hmac_sha256(payload, 'my_secret_key') AS signature
FROM api_requests;
The value to include in the Merkle tree (from each row)
Returns
Merkle root hash (64-character hex string)
Description
Computes a Merkle tree root hash for a set of values. Like blockchain transaction verification - any change to input data produces a different root. More efficient than concatenating all values for large datasets.
Examples
1
Calculate Merkle root for a block of transactions
SELECT
block_id,
crypto_merkle_root(transaction_hash) AS merkle_root
FROM blockchain_transactions
GROUP BY block_id;
2
Verify data integrity using Merkle root
WITH daily_blocks AS (
SELECT
DATE(timestamp) AS date,
crypto_merkle_root(record_id::VARCHAR) AS merkle_root
FROM audit_log
GROUP BY DATE(timestamp)
)
SELECT
date,
merkle_root,
LAG(merkle_root) OVER (ORDER BY date) AS previous_root
FROM daily_blocks;
3
Create tamper-evident file manifest
SELECT crypto_merkle_root(file_hash) AS manifest_hash
FROM uploaded_files
WHERE upload_batch_id = 'batch_123';
64-character hexadecimal string representing the SHA-256 hash
Description
Computes the SHA-256 cryptographic hash of the input string. SHA-256 is part of the SHA-2 family and produces a 256-bit (32-byte) hash value. Commonly used for data integrity verification and anonymization.
Creates a hash chain for audit trail records by combining the current record hash with the previous hash. Ensures tamper-evident logging where any modification breaks the chain.
Macro Definition
CREATE MACRO audit_hash_chain(current_data, previous_hash) AS crypto_sha256(CAST(current_data AS VARCHAR) || COALESCE(previous_hash, ''));
Examples
1
Build tamper-evident audit log
WITH audit_chain AS (
SELECT
id,
action,
user_id,
created_at,
LAG(hash) OVER (ORDER BY created_at) AS prev_hash,
audit_hash_chain(
STRUCT_PACK(id, action, user_id, created_at),
LAG(hash) OVER (ORDER BY created_at)
) AS hash
FROM audit_log
)
SELECT * FROM audit_chain
ORDER BY created_at;
2
Verify audit trail integrity
WITH audit_chain AS (
SELECT
*,
LAG(hash) OVER (ORDER BY created_at) AS prev_hash,
audit_hash_chain(
STRUCT_PACK(id, action, user_id, created_at),
LAG(hash) OVER (ORDER BY created_at)
) AS computed_hash
FROM audit_log
)
SELECT
id,
action,
CASE
WHEN hash = computed_hash THEN 'Valid'
ELSE 'Tampered'
END AS integrity_status
FROM audit_chain;
Simplified field encryption macro using AES-256. Wraps complex encryption operations into an easy-to-use function for encrypting sensitive data at rest.
Macro Definition
CREATE MACRO encrypt_field(plaintext, key) AS encode(crypto_aes_encrypt(CAST(plaintext AS VARCHAR), key));
Examples
1
Encrypt sensitive fields before storage
INSERT INTO secure_storage (id, encrypted_ssn, encrypted_credit_card)
SELECT
id,
encrypt_field(ssn, 'your-encryption-key') AS encrypted_ssn,
encrypt_field(credit_card, 'your-encryption-key') AS encrypted_credit_card
FROM sensitive_data;
Generates a cryptographic fingerprint for an entire row by concatenating all column values and hashing them. Useful for change detection, deduplication, and data integrity verification.
Macro Definition
CREATE MACRO fingerprint_row(cols) AS crypto_sha256(CAST(cols AS VARCHAR));
Examples
1
Detect changed records
SELECT
id,
fingerprint_row(STRUCT_PACK(*)) AS row_hash,
name,
email,
updated_at
FROM users
WHERE updated_at > CURRENT_DATE - INTERVAL 7 DAYS;
2
Deduplicate records based on content
WITH fingerprinted AS (
SELECT
*,
fingerprint_row(STRUCT_PACK(name, email, phone)) AS content_hash
FROM contacts
)
SELECT DISTINCT ON (content_hash)
name,
email,
phone
FROM fingerprinted
ORDER BY content_hash, created_at DESC;
Convenience macro for hashing personally identifiable information (PII) fields using SHA-256. Simplifies anonymization workflows by providing a shorthand for common hashing operations.
Macro Definition
CREATE MACRO hash_pii(field) AS crypto_sha256(CAST(field AS VARCHAR));
Examples
1
Anonymize PII fields in a query
SELECT
hash_pii(email) AS user_hash,
hash_pii(phone) AS phone_hash,
order_total,
created_at
FROM orders
LIMIT 10;
Output
user_hash
phone_hash
order_total
created_at
a591a6d40bf42040...
f7c3bc1d808e04732...
125.5
2024-01-15
3c59dc048e88461f...
1679091c5a880faf6...
89.99
2024-01-16
2
Create anonymized export
COPY (
SELECT
hash_pii(customer_email) AS customer_id,
hash_pii(customer_phone) AS phone_id,
product_category,
purchase_amount
FROM sales
WHERE created_at >= '2024-01-01'
) TO 'anonymized_sales.parquet' (FORMAT PARQUET);
Timing-safe string comparison macro that prevents timing attacks. Uses constant-time comparison to ensure the comparison duration does not leak information about the strings being compared.
Macro Definition
CREATE MACRO secure_compare(a, b) AS crypto_hmac_sha256(CAST(a AS VARCHAR), 'compare') = crypto_hmac_sha256(CAST(b AS VARCHAR), 'compare');
Examples
1
Safely compare sensitive tokens
SELECT
token_id,
secure_compare(stored_token, input_token) AS is_valid
FROM api_tokens
WHERE user_id = 12345;
Logs all cryptographic operations performed by the extension including hashing, encryption, and key generation. Captures operation type, input size, algorithm used, and execution time for security auditing and performance monitoring.
Examples
1
Enable crypto operation logging
-- Enable logging for crypto operations
SET log_crypto_operations = true;
-- Perform some crypto operations
SELECT crypto_sha256('test data');
SELECT crypto_bcrypt_hash('password123');
SELECT crypto_hmac('message', 'secret', 'sha256');
-- View the log entries
SELECT * FROM duckdb_logs() WHERE log_type = 'CryptoOperationLog';
Tracks all access to encryption keys and secrets managed by the crypto extension. Records key identifiers, access timestamps, and the context of key usage for compliance and security monitoring.
Examples
1
Track key access events
-- Enable key access logging
SET log_key_access = true;
-- Use keys for encryption
SELECT crypto_encrypt(data, 'key_id_123') FROM sensitive_table;
-- View key access logs
SELECT * FROM duckdb_logs() WHERE log_type = 'KeyAccessLog';
Detects and logs potential hash collisions when using fingerprinting or deduplication features. Helps identify data quality issues and potential security concerns in hash-based operations.
Examples
1
Monitor for hash collisions
-- Enable collision detection logging
SET log_hash_collisions = true;
-- Perform batch hashing with collision detection
SELECT
id,
crypto_sha256(data) as hash,
COUNT(*) OVER (PARTITION BY crypto_sha256(data)) as collision_count
FROM large_dataset;
-- Check collision log
SELECT * FROM duckdb_logs() WHERE log_type = 'HashCollisionLog';
Send log events to Security Information and Event Management (SIEM) systems like Splunk, Elastic, or Datadog. Supports HTTP/HTTPS endpoints with authentication and batching for efficient log transmission.
Configuration Parameters
Parameter
Type
Required
Description
endpoint
VARCHAR
Required
SIEM HTTP endpoint URL for log ingestion
api_key
VARCHAR
Required
API key or authentication token for SIEM access
batch_size
INTEGER
Optional
Default: 100 Number of log events to batch before sending
batch_timeout_ms
INTEGER
Optional
Default: 5000 Maximum time to wait before sending partial batch
compression
BOOLEAN
Optional
Default: true Enable gzip compression for log transmission
Examples
1
Configure SIEM log storage for Datadog
-- Configure log storage to send to Datadog
SET log_storage = 'SIEM';
SET log_storage_config = '{
"endpoint": "https://http-intake.logs.datadoghq.com/v1/input",
"api_key": "your_datadog_api_key",
"batch_size": 50,
"batch_timeout_ms": 3000,
"compression": true
}';
-- Enable crypto operation logging
SET log_crypto_operations = true;
-- All crypto operations will now be sent to Datadog
SELECT crypto_sha256('test');
Output
Log storage configured. Crypto operations will be sent to Datadog SIEM.
Store log events in Amazon S3 buckets with automatic partitioning by date and log type. Supports encryption at rest and configurable retention policies.
Configuration Parameters
Parameter
Type
Required
Description
bucket
VARCHAR
Required
S3 bucket name for log storage
prefix
VARCHAR
Optional
Default: duckdb-logs/ S3 key prefix for organizing logs
region
VARCHAR
Required
AWS region where the bucket is located
format
VARCHAR
Optional
Default: parquet Log file format (json, parquet, csv)
partition_by
VARCHAR
Optional
Default: date Partitioning strategy (date, hour, logtype)
-- Set up S3 log storage
SET log_storage = 'S3';
SET log_storage_config = '{
"bucket": "my-company-duckdb-logs",
"prefix": "production/crypto-logs/",
"region": "us-east-1",
"format": "parquet",
"partition_by": "date",
"encryption": "SSE-KMS"
}';
-- Logs will be written to:
-- s3://my-company-duckdb-logs/production/crypto-logs/date=2024-12-04/CryptoOperationLog.parquet
SELECT crypto_sha256('test');
Store log events in a PostgreSQL database table with full SQL query capabilities. Ideal for centralized logging across multiple DuckDB instances with long-term retention.
Configuration Parameters
Parameter
Type
Required
Description
connection_string
VARCHAR
Required
PostgreSQL connection string
table_name
VARCHAR
Optional
Default: duckdb_logs Table name for storing logs
auto_create_table
BOOLEAN
Optional
Default: true Automatically create table if it does not exist
async_write
BOOLEAN
Optional
Default: true Write logs asynchronously to avoid blocking queries
Examples
1
Store logs in PostgreSQL for centralized monitoring
-- Configure PostgreSQL log storage
SET log_storage = 'PostgreSQL';
SET log_storage_config = '{
"connection_string": "postgresql://user:pass@localhost:5432/logs",
"table_name": "duckdb_crypto_logs",
"auto_create_table": true,
"async_write": true
}';
-- All logs now written to PostgreSQL
SET log_crypto_operations = true;
SELECT crypto_bcrypt_hash('password');
-- Query logs from PostgreSQL
ATTACH 'logs_db' (TYPE POSTGRES, connection_string 'postgresql://user:pass@localhost:5432/logs');
SELECT * FROM logs_db.duckdb_crypto_logs ORDER BY timestamp DESC LIMIT 10;
Attach to PostgreSQL databases with automatic client-side encryption and decryption. All data transferred between DuckDB and PostgreSQL is encrypted using AES-256, protecting sensitive information in transit and at rest.
Parameters
Parameter
Type
Required
Description
host
VARCHAR
Required
PostgreSQL server hostname or IP address
port
INTEGER
Optional
Default: 5432 PostgreSQL server port number
database
VARCHAR
Required
Name of the PostgreSQL database to attach
user
VARCHAR
Required
PostgreSQL username for authentication
password
VARCHAR
Optional
PostgreSQL password (can use secrets instead)
encryption_key
VARCHAR
Required
Base64-encoded AES-256 encryption key for data encryption
Connect to MySQL databases with transparent encryption layer. Provides client-side encryption for all data transfers, ensuring sensitive information remains protected even when the MySQL server is compromised.
Parameters
Parameter
Type
Required
Description
host
VARCHAR
Required
MySQL server hostname or IP address
port
INTEGER
Optional
Default: 3306 MySQL server port number
database
VARCHAR
Required
Name of the MySQL database to attach
user
VARCHAR
Required
MySQL username for authentication
password
VARCHAR
Optional
MySQL password (can use secrets instead)
encryption_key
VARCHAR
Required
Base64-encoded AES-256 encryption key
Examples
1
Attach encrypted MySQL database
ATTACH 'analytics_db' (
TYPE ENCRYPTED_MYSQL,
host 'mysql.example.com',
database 'analytics',
user 'analyst',
password 'password123',
encryption_key 'your_base64_key=='
);
-- Access encrypted tables
SELECT COUNT(*) FROM analytics_db.transactions;
Attach to databases using credentials dynamically fetched from HashiCorp Vault. Eliminates hardcoded credentials and provides automatic credential rotation, audit logging, and centralized secret management.
Parameters
Parameter
Type
Required
Description
vault_addr
VARCHAR
Required
HashiCorp Vault server address (e.g., https://vault.example.com:8200)
vault_token
VARCHAR
Required
Vault authentication token or AppRole credentials
secret_path
VARCHAR
Required
Path to database credentials in Vault (e.g., secret/data/postgres/prod)
db_type
VARCHAR
Required
Database type (postgres, mysql, sqlite, etc.)
refresh_interval
INTEGER
Optional
Default: 3600 Credential refresh interval in seconds for automatic rotation
Examples
1
Attach database using Vault-managed credentials
-- Attach using Vault for credential management
ATTACH 'prod_db' (
TYPE VAULT_DB,
vault_addr 'https://vault.company.com:8200',
vault_token 'hvs.CAESIJ...',
secret_path 'secret/data/databases/production',
db_type 'postgres'
);
-- Credentials automatically rotated based on Vault policies
SELECT * FROM prod_db.customers;
2
Use Vault AppRole authentication
-- Attach with AppRole for service authentication
ATTACH 'analytics' (
TYPE VAULT_DB,
vault_addr 'https://vault.company.com:8200',
vault_token 'role_id=xxx&secret_id=yyy',
secret_path 'database/creds/analytics-role',
db_type 'postgres',
refresh_interval 1800
);
AWS S3 credentials for encrypting and decrypting data stored in S3 buckets. Enables server-side encryption with customer-provided keys (SSE-C) and client-side encryption workflows.
Parameters
Parameter
Type
Required
Description
access_key_id
VARCHAR
Required
AWS access key ID for authentication
secret_access_key
VARCHAR
Required
AWS secret access key for authentication
region
VARCHAR
Required
AWS region where the S3 bucket is located (e.g., us-east-1)
session_token
VARCHAR
Optional
Temporary session token for AWS STS credentials
encryption_key
VARCHAR
Optional
Base64-encoded 256-bit AES key for client-side encryption
Examples
1
Create an S3 secret with encryption key
CREATE SECRET aws_s3_crypto (
TYPE S3,
access_key_id 'AKIAIOSFODNN7EXAMPLE',
secret_access_key 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
region 'us-east-1',
encryption_key 'YourBase64EncodedEncryptionKey=='
);
2
Read encrypted data from S3
SELECT *
FROM read_parquet('s3://my-bucket/encrypted-data/*.parquet')
WHERE created_at > '2024-01-01';
3
Write encrypted data to S3
COPY (
SELECT
crypto_sha256(user_id) AS anonymous_id,
transaction_amount,
created_at
FROM transactions
)
TO 's3://my-bucket/encrypted-output/data.parquet'
(FORMAT PARQUET, ENCRYPTION 'AES-256');
ATTACH 'host=db.example.com' AS external_db (TYPE POSTGRES, SECRET database_crypto);
SELECT *
FROM external_db.public.sensitive_data
WHERE department = 'Finance';
Virtual filesystem for reading and writing encrypted data to Amazon S3. Automatically handles client-side encryption and decryption using AES-256 with customer-provided keys. Seamlessly integrates with DuckDB's S3 filesystem while adding transparent encryption layer.
Usage
Access this filesystem using the encrypted_s3:// protocol prefix in DuckDB file paths.
Virtual filesystem for Azure Blob Storage with built-in encryption support. Provides transparent encryption and decryption using Azure-compatible encryption standards. Works with customer-managed keys and Azure Key Vault integration.
Usage
Access this filesystem using the encrypted_azure:// protocol prefix in DuckDB file paths.
Local filesystem overlay that provides transparent file-level encryption. All files are automatically encrypted at rest using AES-256. Ideal for storing sensitive data on disk while maintaining DuckDB's standard file access patterns.
Usage
Access this filesystem using the encrypted_local:// protocol prefix in DuckDB file paths.
HashiCorp Vault integration filesystem for reading secrets and credentials directly from Vault. Enables secure access to centrally managed secrets without storing credentials in query scripts or configuration files.
Usage
Access this filesystem using the vault:// protocol prefix in DuckDB file paths.
Real-world recipes and patterns for common use cases.
Encrypting Sensitive Data
Protect PII (Personally Identifiable Information) in your database by encrypting sensitive columns.
Encrypt a Column with AES-256-GCM
-- Create a table with sensitive dataCREATE TABLE users ( id INTEGER PRIMARY KEY, email VARCHAR, ssn VARCHAR);-- Insert some dataINSERT INTO users VALUES (1, '[email protected]', '123-45-6789'), (2, '[email protected]', '987-65-4321');-- Generate a secure encryption key (store this safely!)SELECT crypto_random_bytes(32) AS encryption_key;-- Create an encrypted view of the SSN columnSELECT id, email, crypto_aes_encrypt( ssn::BLOB, '\x0123456789abcdef0123456789abcdef'::BLOB, -- Your 32-byte key 'aes-256-gcm' ) AS encrypted_ssnFROM users;
Secure Password Storage
Never store passwords in plain text. Use Argon2id for secure password hashing.
Hash Passwords with Argon2id
-- Hash a password for storageSELECT crypto_argon2_hash( 'user_password', 16, -- salt length 3, -- iterations (time cost) 65536, -- memory in KB 1 -- parallelism) AS password_hash;-- Verify a password against stored hashSELECT crypto_argon2_verify( 'user_password', stored_hash) AS is_valid;
Generating Secure Tokens
Create secure random tokens for API keys, session IDs, or verification codes.
Generate API Keys
-- Generate a 32-byte random token as hex stringSELECT crypto_encode( crypto_random_bytes(32), 'hex') AS api_key;-- Generate a URL-safe base64 tokenSELECT crypto_encode( crypto_random_bytes(24), 'base64') AS session_token;
Data Integrity with HMAC
Use HMAC to verify data hasn’t been tampered with.
Sign and Verify Data
-- Create an HMAC signatureSELECT crypto_hmac( 'important data to protect', 'your-secret-key', 'sha256') AS signature;-- Verify by comparing signaturesWITH signed_data AS ( SELECT 'important data to protect' AS data, crypto_hmac('important data to protect', 'your-secret-key', 'sha256') AS original_sig)SELECT data, original_sig = crypto_hmac(data, 'your-secret-key', 'sha256') AS is_validFROM signed_data;
Hashing for Deduplication
Use fast hashing algorithms to identify duplicate records.
Find Duplicate Files by Content Hash
-- Hash file contents to find duplicatesCREATE TABLE files ( path VARCHAR, content BLOB);-- Find duplicates by comparing BLAKE3 hashesWITH hashed_files AS ( SELECT path, crypto_blake3(content) AS content_hash FROM files)SELECT content_hash, list(path) AS duplicate_paths, count(*) AS countFROM hashed_filesGROUP BY content_hashHAVING count(*) > 1;
Key Derivation for Multi-Tenant Encryption
Derive unique encryption keys for each tenant from a master key.
Derive Tenant-Specific Keys
-- Derive a unique key for each tenant using HKDFSELECT tenant_id, crypto_hkdf( master_key, -- Your master secret key tenant_id::BLOB, -- Salt: unique per tenant 'tenant-encryption', -- Info: context string 32, -- Output key length 'sha256' ) AS tenant_keyFROM tenants;
Commercial Extension
Pricing
Unlike many DuckDB extensions, the Crypto extension is a commercially licensed product. This allows us to provide enterprise-grade security features, dedicated support, and continuous development. Start with a 15-day free trial to evaluate.
Start with a 15-day free trial — no credit card required
Registration required. You'll receive a license key to activate Example.
Supporting DuckDB Development
A portion of every purchase is shared with DuckDB Labs to support the continued development of DuckDB. By using this extension, you're helping sustain the open-source database that powers it.
Activating Your License
After registration, you'll receive a license key. Add it to your DuckDB configuration using one of these methods:
-- Option 1: Set via SQL
SET example_license_key = 'YOUR_LICENSE_KEY';
-- Option 2: Create a persistent secret
CREATE SECRET example_license (
TYPE example_license,
KEY 'YOUR_LICENSE_KEY'
);
View Terms & Conditions
License Terms
The Crypto extension is licensed based on concurrent instances. An instance is defined as a single DuckDB process with the extension loaded. The base license covers up to 5 concurrent instances across your infrastructure.
Instance Counting
Instances are counted based on unique license key activations within a rolling 24-hour window. Development and testing environments do not count against your instance limit when using the provided development key.
Academic & Research Discounts
We offer significant discounts for educational and research institutions:
Universities & Colleges: 75% discount on all plans for accredited educational institutions.
Research Institutions: 50% discount for non-profit research organizations.
Students: Free access for individual students with a valid .edu email address (limited to 1 instance).
To apply for academic pricing, contact us with proof of institutional affiliation.
Trial Period
Your 15-day free trial includes full access to all features with up to 2 concurrent instances. No credit card is required to start. At the end of the trial, you can purchase a subscription or the extension will revert to evaluation mode.
Refund Policy
We offer a 30-day money-back guarantee for all paid subscriptions. If you're not satisfied, contact support for a full refund.
Support
All paid plans include email and chat support with response times within 24 hours on business days. For mission-critical deployments, ask about our premium support options.