
Hashfuncs

Fast, deterministic, non-cryptographic hash functions — xxHash, MurmurHash3, RapidHash. For partitioning, sharding, cache keys, and probabilistic-filter inputs; not for security.

Install

-- Install the extension
INSTALL hashfuncs FROM community;

-- Load it into your session
LOAD hashfuncs;

-- 64-bit xxHash3 — recommended general-purpose default
SELECT xxh3_64(payload) AS h FROM events;

-- Seeded for reproducible partitioning
SELECT xxh3_64(user_id, 42) % 16 AS partition FROM users;

-- 128-bit hex digest (canonical xxhash byte order;
-- matches Python xxhash.xxh3_128().hexdigest())
SELECT xxh3_128_hex('hello') AS digest;

Technical Overview

Why Use Hashfuncs?

Industry-standard, non-cryptographic hashes inside DuckDB SQL — xxHash, MurmurHash3, and RapidHash. Use them for partitioning, sharding, cache keys, and probabilistic-filter inputs where speed wins and adversarial collisions don't matter. For digests, signatures, or HMAC, use the crypto extension instead.

What this extension is for

Hashfuncs gives you fast, deterministic 32-, 64-, and 128-bit hashes — the kind you reach for when DuckDB's built-in hash() isn't enough and a cryptographic hash is overkill.

  • Hash partitioning and sharding: Take a hash modulo the partition count to bucket rows. xxh3_64 is a sensible default; murmurhash3_x64_128 is the right pick if you need to interoperate with Cassandra's Murmur3Partitioner.
  • Probabilistic-filter inputs: Bloom, XOR, and Cuckoo filters need a fast UBIGINT hash per element. xxh3_64 and rapidhash feed straight into the bitfilters extension.
  • Stable cache and content keys: xxh3_128_hex emits the canonical 32-character hex digest used by Python xxhash and xxhash-rust — useful as a cross-language fingerprint for cache keys, dedup, and content-addressed lookups.
  • Cheap deduplication: SELECT DISTINCT xxh3_64(...) collapses duplicate payloads at near-RAM bandwidth. The 128-bit variants (xxh3_128, murmurhash3_x64_128) cut accidental-collision risk for very large keyspaces.

🧭 Hashfuncs vs. crypto — picking the right extension

These two extensions look similar but solve different problems. Pick by the threat model, not by output width.

  • Use hashfuncs when speed wins: Partitioning, sharding, deduplication, cache keys, Bloom-filter inputs — workloads where the adversary is randomness, not a human. xxh3_64 and rapidhash run roughly an order of magnitude faster than SHA-256.
  • Use crypto when collisions must be hard to forge: Digital signatures, HMAC, content-addressed storage with adversarial input, tamper-evident audit logs, password salts, secure random bytes — anything an attacker could try to break. The crypto extension gives you BLAKE3, SHA-2, SHA-3, HMAC, and a CSPRNG.
  • Don't substitute one for the other: These hashes are explicitly not collision-resistant against a motivated adversary. Treating an xxh3_128 like a SHA-256 digest is a security bug, not a performance optimization. Conversely, spending SHA-256 on partition routing is wasted CPU.
  • They compose: Both extensions can be loaded together. A common pattern: crypto for the durable identity hash on a row, hashfuncs for the routing hash that decides which shard processes it.

🔧 How it works

Every function is a scalar SQL function with one or two arguments — the value, and an optional seed. All families ship as upstream-faithful C/C++ implementations linked into the extension binary.

  • Three families, one interface: xxHash (32 / 64 / 128-bit, including a hex digest), MurmurHash3 (32 / 128-bit), and RapidHash. Every function takes any DuckDB scalar value as input, except xxh3_128_hex, whose signature takes a BLOB.
  • Optional seeds for independent hash spaces: Every function has a seeded overload. Two different seeds give two uncorrelated hash spaces from the same column — useful for double-hashing in Bloom filters or running parallel shardings without correlation.
  • Canonical 128-bit hex digest: The 128-bit xxHash variant emits the lowercase 32-character canonical hex form defined in the xxHash specification, matching the Python xxhash and other bindings — so digests round-trip across languages.
  • Installs from the community repository: Available via DuckDB's community extensions channel — INSTALL hashfuncs FROM community; LOAD hashfuncs;. No external services or secrets required.

🎯 Common Use Cases

Partition rows for parallel processing

SELECT user_id, xxh3_64(user_id) % 16 AS partition FROM users gives you a stable, well-distributed bucket assignment. Pair it with COPY (query) TO 'out_dir' (FORMAT parquet, PARTITION_BY (partition)) to write shardable Parquet output.

Build Bloom / XOR filters at SQL speed

Feed xxh3_64 into the bitfilters extension to construct probabilistic set-membership filters in a single query — millions of keys per second on commodity hardware.

Cross-language content fingerprints

Use xxh3_128_hex to produce digests that match what Python xxhash or xxhash-rust compute over the same bytes — handy for cache keys shared between SQL and application code.

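Interoperate with MurmurHash3-based routing

murmurhash3_x64_128 is the x64 128-bit MurmurHash3 that Apache Cassandra's Murmur3Partitioner is built on (Cassandra uses the first 64-bit half as its token), which is useful when DuckDB is downstream of a system that already routes by MurmurHash3. Spark SQL's hash partitioning uses a 32-bit MurmurHash3 variant with seed 42 instead. In either case, confirm against hashes the other system has already produced, since the byte encoding of each key must match too.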

Deep Dive

Technical Details

Magic moment — partition a billion rows in one query

Take any column, hit it with a fast hash, modulo the partition count, and you have a stable, well-distributed bucket assignment with zero configuration:

-- Bucket every event into one of 64 shards using xxHash3
SELECT
  event_id,
  payload,
  xxh3_64(event_id) % 64 AS shard
FROM events;

xxh3_64 runs at near-RAM bandwidth on modern CPUs — typically an order of magnitude faster than a cryptographic hash. For workloads where the only “adversary” is randomness — partitioning, sharding, deduplication, Bloom filter inputs, cache keys — that’s the right trade.

Not cryptographically secure — by design

Every function in this extension is a non-cryptographic hash. They are fast and well-distributed against random inputs, but they do not resist a motivated adversary who controls the input. Treating xxh3_128 like a SHA-256 digest is a security bug, not a performance optimization.

For digital signatures, HMAC, content-addressed storage with adversarial input, password salts, tamper-evident audit logs, or secure random bytes, use the crypto extension instead — it provides BLAKE3, SHA-2, SHA-3, HMAC, and a CSPRNG.

Architecture

Hashfuncs is a small, focused extension: a set of scalar SQL functions, each a thin wrapper around an upstream-faithful C/C++ implementation linked into the extension binary. There is no catalog, no secrets, no network — every call is pure CPU.

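Three families ship in the box: xxHash (32-, 64-, and 128-bit, plus a 128-bit hex digest), MurmurHash3 (32-bit and two 128-bit variants), and RapidHash (64-bit).

Every function accepts a DuckDB scalar value (xxh3_128_hex's signature takes a BLOB) and, optionally, a seed. Two different seeds produce two uncorrelated hash spaces from the same input, which is useful for double-hashing in Bloom filters or running parallel shardings without correlation.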

Hashfuncs vs. crypto — picking the right extension

Both extensions output deterministic hashes, but they answer different questions.

| You need… | Reach for | Why |
|---|---|---|
| Partitioning / sharding | xxh3_64, murmurhash3_x64_128 | Speed; well-distributed for random input |
| Bloom / XOR / Cuckoo filter input | xxh3_64, rapidhash | UBIGINT output, near-RAM bandwidth |
| Cross-language fingerprint | xxh3_128_hex | Canonical hex; matches Python / Rust / C |
| Cassandra interop | murmurhash3_x64_128 | Basis of Cassandra's Murmur3Partitioner token |
| Cheap deduplication | xxh3_64 or xxh3_128 | 64 bits is faster; 128 bits gives more collision headroom |
| Digital signatures, HMAC | crypto | Adversarial collision resistance |
| Tamper-evident logs / chains | crypto | SHA-2 / SHA-3 / BLAKE3 |
| Salts, keys, nonces (CSPRNG) | crypto | crypto_random_bytes from OpenSSL |
| Password storage | Neither; use a real password hash (bcrypt / argon2) outside DuckDB | These are general-purpose hashes, far too fast for passwords |

The two extensions compose. A common pattern is a crypto digest as the durable identity hash on a row, and an xxh3_64 hash for the routing decision that picks which shard processes it.

Compared to alternatives

vs. DuckDB’s built-in hash(): hash() is non-cryptographic too and is great for join/group-by internals. But its algorithm is implementation-defined and may change between DuckDB versions — don’t persist its output, and don’t expect another tool to reproduce it. Hashfuncs gives you named, stable algorithms whose output other systems can compute identically.

vs. cryptographic hashes (SHA-256, BLAKE3): roughly an order of magnitude slower per byte, but they resist adversarial collisions. Use crypto when that property matters; don’t pay for it when it doesn’t.

vs. doing this in Python / Pandas: pulling the column out of DuckDB just to hash it usually costs more than the hash itself. xxh3_64(...) in SQL is one vectorized scan with no Python round-trip — the same algorithm xxhash would compute, but in-process.

Seeds and overloads

Every function has both an unseeded form and a seeded form. The seed type follows the hash width; for example, xxh32 takes a UINTEGER seed.

Seeded variants are the right tool whenever you want the same column to participate in two different hash spaces — for example, a Bloom filter that uses two independent hash functions, or two independent shardings with no correlation between bucket assignments.

Pairing with bitfilters

The bitfilters extension (Bloom, XOR, Quotient, and Binary Fuse filters) takes UBIGINT hashes as its input. Hashfuncs is the natural producer:

-- Build an xor8 filter from a column of strings and keep it in a table
CREATE TABLE filter_table AS
SELECT xor8_filter(xxh3_64(email)) AS filter
FROM allowed_users;

-- Test membership in a streaming query
SELECT *
FROM events e, filter_table f
WHERE xor8_filter_contains(f.filter, xxh3_64(e.email));

The combination gives you probabilistic set-membership tests at SQL speed. Install both extensions from the community repository (INSTALL hashfuncs FROM community; INSTALL bitfilters FROM community;), LOAD them, and you have everything in one DuckDB session.

Reference

Extension Contents

Quick reference to all available functions, organized by family.

MurmurHash3

Austin Appleby's MurmurHash3: a battle-tested non-cryptographic hash widely used in distributed systems, including Apache Cassandra, Apache Spark, and many partitioner implementations. Choose [murmurhash3_32](#murmurhash3_32), [murmurhash3_128](#murmurhash3_128), or [murmurhash3_x64_128](#murmurhash3_x64_128) when you need to interoperate with an existing MurmurHash3-based consumer.

| Name | Description |
|---|---|
| murmurhash3_128() | 128-bit MurmurHash3, x86 variant |
| murmurhash3_32() | 32-bit MurmurHash3 |
| murmurhash3_x64_128() | 128-bit MurmurHash3, x64 variant |

RapidHash

RapidHash is purpose-built for raw 64-bit throughput while keeping competitive distribution quality. It is frequently the fastest published 64-bit non-cryptographic hash on x86_64; reach for [rapidhash](#rapidhash) when [xxh3_64](#xxh3_64)'s speed isn't quite enough.

| Name | Description |
|---|---|
| rapidhash() | 64-bit non-cryptographic hash designed for exceptional speed with competitive distribution quality |

xxHash

Yann Collet's xxHash family: among the fastest non-cryptographic hashes available, with excellent distribution. [xxh3_64](#xxh3_64) is a sensible modern default; the 128-bit variants ([xxh3_128](#xxh3_128), [xxh3_128_hex](#xxh3_128_hex)) give you more headroom against accidental collisions and a portable hex digest format used by Python xxhash and xxhash-rust.

| Name | Description |
|---|---|
| xxh3_128() | 128-bit xxHash3 (XXH3_128) returned as a UHUGEINT |
| xxh3_128_hex() | 128-bit xxHash3 as a 32-character lowercase hex string in canonical xxHash byte order (high 64 bits first, then low 64 bits, each big-endian), as defined by the xxHash specification |
| xxh3_64() | 64-bit xxHash3 (XXH3_64), the modern xxHash variant tuned for vectorized CPUs (SSE2/AVX2) |
| xxh32() | 32-bit xxHash (XXH32), the classic 32-bit member of the xxHash family |
| xxh64() | 64-bit xxHash (XXH64) |

API Reference

Function Documentation

Detailed documentation for each function including signatures, parameters, and examples.

murmurhash3_128

Scalar function (MurmurHash3 family)

Signature
murmurhash3_128(value: ANY) → UHUGEINT

Parameters (Positional)
| Parameter | Type | Mode | Description |
|---|---|---|---|
| value | ANY | Positional | Value to hash |

Returns
UHUGEINT

Description

128-bit MurmurHash3 — the x86 variant. Returns a UHUGEINT covering both 64-bit halves of the hash.

Included for compatibility with consumers that specifically expect the x86 128-bit form. On modern 64-bit hardware, murmurhash3_x64_128 is generally the right pick.

Examples

128-bit on x86

SELECT murmurhash3_128('hello world');

Output

murmurhash3_128('hello world')
206095855024402301784664199839047883400

Seeded

SELECT murmurhash3_128('hello world', 42);

Output

murmurhash3_128('hello world', 42)
181354507998755592033411453641571670529

murmurhash3_32

Scalar function (MurmurHash3 family)

Signature
murmurhash3_32(value: ANY) → UINTEGER

Parameters (Positional)
| Parameter | Type | Mode | Description |
|---|---|---|---|
| value | ANY | Positional | Value to hash |

Returns
UINTEGER

Description

32-bit MurmurHash3. Battle-tested across distributed systems — variants of MurmurHash3 underpin hash partitioning in Apache Cassandra, Apache Spark, and many other tools.

Choose murmurhash3_32 when you need cross-system compatibility with an existing MurmurHash3-32 consumer. For new in-DuckDB workloads, xxh3_64 is generally faster.
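When application code has to agree with murmurhash3_32, a dependency-free reference is handy for spot checks. The sketch below is a pure-Python port of the reference MurmurHash3_x86_32 algorithm. murmur3_32 is a hypothetical helper name, and the sketch assumes the extension hashes a VARCHAR's raw UTF-8 bytes, which is consistent with the 'hello world' output documented here but not confirmed for other types.

```python
def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit unsigned integer left by r bits."""
    return ((x << r) | (x >> (32 - r))) & 0xFFFFFFFF


def _mix_k(k: int) -> int:
    """Scramble one 32-bit block before it is folded into the state."""
    k = (k * 0xCC9E2D51) & 0xFFFFFFFF
    k = _rotl32(k, 15)
    return (k * 0x1B873593) & 0xFFFFFFFF


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3_x86_32 over raw bytes, returned as an unsigned 32-bit integer."""
    h = seed & 0xFFFFFFFF
    n_blocks = len(data) // 4

    # Body: fold in each little-endian 4-byte block.
    for i in range(n_blocks):
        h ^= _mix_k(int.from_bytes(data[4 * i : 4 * i + 4], "little"))
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & 0xFFFFFFFF

    # Tail: the last 1-3 bytes, read little-endian, mixed without the rotate/add step.
    tail = data[4 * n_blocks :]
    if tail:
        h ^= _mix_k(int.from_bytes(tail, "little"))

    # Finalization: fold in the length, then apply the fmix32 avalanche.
    h ^= len(data) & 0xFFFFFFFF
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h


# Strings are hashed as their UTF-8 bytes (an assumption about the extension).
print(murmur3_32("hello world".encode("utf-8")))
print(murmur3_32("hello world".encode("utf-8"), 123))
```

Integers, dates, and other non-string types may be serialized differently by the extension, so compare a few known keys before relying on a match.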

Examples

Default

SELECT murmurhash3_32('hello world');

Output

murmurhash3_32('hello world')
1586663183

Seeded

SELECT murmurhash3_32('hello world', 123);

Output

murmurhash3_32('hello world', 123)
679062093

murmurhash3_x64_128

Scalar function (MurmurHash3 family)

Signature
murmurhash3_x64_128(value: ANY) → UHUGEINT

Parameters (Positional)
| Parameter | Type | Mode | Description |
|---|---|---|---|
| value | ANY | Positional | Value to hash |

Returns
UHUGEINT

Description

128-bit MurmurHash3, the x64 variant. This is the standard 128-bit MurmurHash3 form on 64-bit platforms and the one Apache Cassandra's Murmur3Partitioner builds its tokens from (using the first 64-bit half). Spark SQL's hash partitioning uses a 32-bit MurmurHash3 variant instead.

Returned as a UHUGEINT. Use this when DuckDB needs to produce hashes that match what another system has already computed for the same key.

Examples

128-bit hash optimized for x64

SELECT murmurhash3_x64_128('hello world');

Output

murmurhash3_x64_128('hello world')
228083453807047072434243676435732455694

Seeded

SELECT murmurhash3_x64_128('hello world', 42);

Output

murmurhash3_x64_128('hello world', 42)
46796720576937137733623800116632579848

rapidhash

Scalar function (RapidHash family)

Signature
rapidhash(value: ANY) → UBIGINT

Parameters (Positional)
| Parameter | Type | Mode | Description |
|---|---|---|---|
| value | ANY | Positional | Value to hash |

Returns
UBIGINT

Description

RapidHash — a 64-bit non-cryptographic hash designed for exceptional speed while keeping competitive distribution quality. Frequently the fastest 64-bit hash in published benchmarks on modern x86_64.

Reach for rapidhash when xxh3_64's throughput isn't quite enough, or when most of your hashed inputs are very short integers and you want the lowest per-call latency. As with the rest of this extension, this is not cryptographically secure — use crypto for digests and signatures.

Examples

Default

SELECT rapidhash('hello world');

Output

rapidhash('hello world')
3397907815814400320

Seeded

SELECT rapidhash('hello world', 2023);

Output

rapidhash('hello world', 2023)
11789095433300219990

xxh3_128

Scalar function (xxHash family)

Signature
xxh3_128(value: ANY) → UHUGEINT

Parameters (Positional)
| Parameter | Type | Mode | Description |
|---|---|---|---|
| value | ANY | Positional | Value to hash |

Returns
UHUGEINT

Description

128-bit xxHash3 (XXH3_128) returned as a UHUGEINT. The wider hash space dramatically reduces accidental-collision probability for very large keyspaces (billions of distinct keys).

Use when 64 bits feels tight — content fingerprints, dedup keys at scale, or any place a 64-bit birthday-bound starts to matter. For a portable hex digest instead of a number, use xxh3_128_hex.

Examples

128-bit hash

SELECT xxh3_128('hello world');

Output

xxh3_128('hello world')
225447084758876380551077147957698971904

Seeded

SELECT xxh3_128('hello world', 42);

Output

xxh3_128('hello world', 42)
186337813068040876697969533742389665544

xxh3_128_hex

Scalar function (xxHash family)

Signature
xxh3_128_hex(col0: BLOB) → VARCHAR

Parameters (Positional)
| Parameter | Type | Mode | Description |
|---|---|---|---|
| col0 | BLOB | Positional | Bytes to hash |

Returns
VARCHAR

Description

128-bit xxHash3 returned as a 32-character lowercase hex string in canonical xxHash byte order (the high 64 bits first, then the low 64 bits, each big-endian), as defined by the xxHash specification.

Matches the digest produced by XXH128_canonicalFromHash in C, xxhash.xxh3_128().hexdigest() in Python, and xxhash-rust when formatted as hex. Use it when the hash needs to be a string — Redis keys, filenames, or a fingerprint shared with non-SQL code.
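In the reference C library, XXH128_canonicalFromHash writes the high 64-bit half of the hash first and the low half second, each in big-endian byte order. That makes the canonical hex string identical to the 128-bit value printed as 32 zero-padded hex digits, which is why Python xxhash's hexdigest() equals format(intdigest(), '032x'). The sketch below shows that layout; canonical_hex and as_u128 are hypothetical helpers, and the two halves are made-up values rather than a real digest.

```python
import struct


def canonical_hex(high64: int, low64: int) -> str:
    """Canonical XXH128 hex: high64 then low64, each as 8 big-endian bytes."""
    return struct.pack(">QQ", high64, low64).hex()


def as_u128(high64: int, low64: int) -> int:
    """The same hash as a single unsigned 128-bit integer."""
    return (high64 << 64) | low64


# Made-up halves for illustration only; not the digest of any real input.
high, low = 0x0123456789ABCDEF, 0xFEDCBA9876543210
print(canonical_hex(high, low))
print(format(as_u128(high, low), "032x"))  # same string
```

Whether the UHUGEINT returned by xxh3_128 packs its halves the same way is an assumption; comparing format(value, '032x') with xxh3_128_hex for one key settles it.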

Examples

32-char hex digest

SELECT xxh3_128_hex('hello');

Output

xxh3_128_hex('hello')
c779cfaa5e523818b5e9c1ad071b3e7f

Seeded variant

SELECT xxh3_128_hex('hello', 42);

Output

xxh3_128_hex('hello', 42)
8c2f5b7e4cd59e166ce89a0bdba81f08

xxh3_64

Scalar function (xxHash family)

Signature
xxh3_64(value: ANY) → UBIGINT

Parameters (Positional)
| Parameter | Type | Mode | Description |
|---|---|---|---|
| value | ANY | Positional | Value to hash |

Returns
UBIGINT

Description

64-bit xxHash3 (XXH3_64) — the modern xxHash variant tuned for vectorized CPUs (SSE2/AVX2). Faster than xxh64 on virtually all modern hardware, especially for short inputs, with comparable distribution quality.

This is the recommended general-purpose 64-bit non-cryptographic hash for partitioning, sharding, deduplication, and as input to probabilistic filters in bitfilters.

Examples

Default unseeded

SELECT xxh3_64('hello world');

Output

xxh3_64('hello world')
15296390279056496779

Seeded for partitioning

SELECT xxh3_64('hello world', 999);

Output

xxh3_64('hello world', 999)
3002856137354040482

xxh32

Scalar function (xxHash family)

Signature
xxh32(value: ANY) → UINTEGER

Parameters (Positional)
| Parameter | Type | Mode | Description |
|---|---|---|---|
| value | ANY | Positional | Value to hash |

Returns
UINTEGER

Description

32-bit xxHash (XXH32) — the classic 32-bit member of the xxHash family. Fast and well-distributed; use when 32 bits is enough — for in-memory hashtables, lightweight checksums, or short-lived bucket assignment.

Accepts an optional UINTEGER seed. Prefer xxh3_64 when you don't have a specific reason to use the 32-bit format — XXH3 is faster on modern hardware and offers more headroom against collisions.
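To see what XXH32 actually computes, or to reproduce xxh32 outputs in application code without a native dependency, here is a pure-Python sketch of the reference XXH32 algorithm. xxh32_py is a hypothetical name, and the sketch assumes strings are hashed as their raw UTF-8 bytes.

```python
MASK32 = 0xFFFFFFFF
P1, P2, P3, P4, P5 = 0x9E3779B1, 0x85EBCA77, 0xC2B2AE3D, 0x27D4EB2F, 0x165667B1


def _rotl32(x: int, r: int) -> int:
    return ((x << r) | (x >> (32 - r))) & MASK32


def _round(acc: int, lane: int) -> int:
    """One accumulator step: acc = rotl(acc + lane * P2, 13) * P1."""
    acc = (acc + lane * P2) & MASK32
    return (_rotl32(acc, 13) * P1) & MASK32


def _le32(data: bytes, i: int) -> int:
    return int.from_bytes(data[i : i + 4], "little")


def xxh32_py(data: bytes, seed: int = 0) -> int:
    """XXH32 of raw bytes, following the reference algorithm."""
    n, i = len(data), 0
    if n >= 16:
        # Four parallel accumulators consume 16-byte stripes.
        v1 = (seed + P1 + P2) & MASK32
        v2 = (seed + P2) & MASK32
        v3 = seed & MASK32
        v4 = (seed - P1) & MASK32
        while i + 16 <= n:
            v1 = _round(v1, _le32(data, i))
            v2 = _round(v2, _le32(data, i + 4))
            v3 = _round(v3, _le32(data, i + 8))
            v4 = _round(v4, _le32(data, i + 12))
            i += 16
        h = (_rotl32(v1, 1) + _rotl32(v2, 7) + _rotl32(v3, 12) + _rotl32(v4, 18)) & MASK32
    else:
        h = (seed + P5) & MASK32

    h = (h + n) & MASK32
    # Remaining 4-byte words, then single bytes.
    while i + 4 <= n:
        h = (h + _le32(data, i) * P3) & MASK32
        h = (_rotl32(h, 17) * P4) & MASK32
        i += 4
    while i < n:
        h = (h + data[i] * P5) & MASK32
        h = (_rotl32(h, 11) * P1) & MASK32
        i += 1

    # Final avalanche.
    h ^= h >> 15
    h = (h * P2) & MASK32
    h ^= h >> 13
    h = (h * P3) & MASK32
    h ^= h >> 16
    return h


print(xxh32_py(b"hello world"))
print(xxh32_py(b"hello world", 42))
```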

Examples

Hash a string

SELECT xxh32('hello world');

Output

xxh32('hello world')
3468387874

Seeded variant — independent hash space

SELECT xxh32('hello world', 42);

Output

xxh32('hello world', 42)
4225033588

xxh64

Scalar function (xxHash family)

Signature
xxh64(value: ANY) → UBIGINT

Parameters (Positional)
| Parameter | Type | Mode | Description |
|---|---|---|---|
| value | ANY | Positional | Value to hash |

Returns
UBIGINT

Description

64-bit xxHash (XXH64). The original 64-bit xxHash, broadly supported across language bindings; its output is identical to XXH64 from older xxHash releases such as 0.6.x.

Pick xxh64 when you must match an existing XXH64-based system; otherwise xxh3_64 is faster on modern CPUs.
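XXH64's reference algorithm is short enough to port directly, which helps when cross-checking against an existing XXH64-based system. The sketch below is a pure-Python version (xxh64_py is a hypothetical name); it assumes the input is the same raw bytes the other system hashed. Python integers are unbounded, so every arithmetic step is masked back to 64 bits.

```python
M64 = 0xFFFFFFFFFFFFFFFF
P1 = 0x9E3779B185EBCA87
P2 = 0xC2B2AE3D27D4EB4F
P3 = 0x165667B19E3779F9
P4 = 0x85EBCA77C2B2AE63
P5 = 0x27D4EB2F165667C5


def _rotl64(x: int, r: int) -> int:
    return ((x << r) | (x >> (64 - r))) & M64


def _round(acc: int, lane: int) -> int:
    """One accumulator step: acc = rotl(acc + lane * P2, 31) * P1."""
    acc = (acc + lane * P2) & M64
    return (_rotl64(acc, 31) * P1) & M64


def _merge(acc: int, v: int) -> int:
    """Fold one stripe accumulator into the converged hash."""
    acc ^= _round(0, v)
    return (acc * P1 + P4) & M64


def xxh64_py(data: bytes, seed: int = 0) -> int:
    """XXH64 of raw bytes, following the reference algorithm."""
    n, i = len(data), 0

    def le(p: int, width: int) -> int:
        return int.from_bytes(data[p : p + width], "little")

    if n >= 32:
        # Four accumulators consume 32-byte stripes, 8 bytes per lane.
        v = [(seed + P1 + P2) & M64, (seed + P2) & M64, seed & M64, (seed - P1) & M64]
        while i + 32 <= n:
            for lane in range(4):
                v[lane] = _round(v[lane], le(i + 8 * lane, 8))
            i += 32
        h = (_rotl64(v[0], 1) + _rotl64(v[1], 7) + _rotl64(v[2], 12) + _rotl64(v[3], 18)) & M64
        for acc in v:
            h = _merge(h, acc)
    else:
        h = (seed + P5) & M64

    h = (h + n) & M64
    # Remaining 8-byte words, at most one 4-byte word, then single bytes.
    while i + 8 <= n:
        h ^= _round(0, le(i, 8))
        h = (_rotl64(h, 27) * P1 + P4) & M64
        i += 8
    if i + 4 <= n:
        h ^= (le(i, 4) * P1) & M64
        h = (_rotl64(h, 23) * P2 + P3) & M64
        i += 4
    while i < n:
        h ^= (data[i] * P5) & M64
        h = (_rotl64(h, 11) * P1) & M64
        i += 1

    # Final avalanche.
    h ^= h >> 33
    h = (h * P2) & M64
    h ^= h >> 29
    h = (h * P3) & M64
    h ^= h >> 32
    return h


print(xxh64_py(b"hello world"))
print(xxh64_py(b"hello world", 42))
```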

Examples

64-bit hash of a string

SELECT xxh64('hello world');

Output

xxh64('hello world')
5020219685658847592

Seeded

SELECT xxh64('hello world', 42);

Output

xxh64('hello world', 42)
14078989533569169714

Practical Examples

Cookbook

Real-world recipes and patterns for common use cases.

Pick the right function

| You want… | Pick |
|---|---|
| A modern fast 64-bit default | xxh3_64 |
| Maximum throughput | rapidhash |
| Cross-system compatibility (Cassandra) | murmurhash3_x64_128 |
| Stable hex digest matching Python / Rust xxhash | xxh3_128_hex |
| 128-bit space for billions of distinct keys | xxh3_128 or murmurhash3_x64_128 |
| Cryptographic guarantees | Not this extension; use crypto |

For digests, signatures, HMAC, secure random, or anywhere an attacker controls the input, reach for crypto. None of the functions in this extension are designed to resist adversarial collisions.

Hash partitioning

Pick a partition by taking the hash modulo the partition count:

SELECT user_id,
       xxh3_64(user_id) % 16 AS partition
FROM users;

A seed gives you a separate, uncorrelated hash space — useful when you want two different shardings of the same column without correlation between bucket assignments:

SELECT user_id,
       xxh3_64(user_id, 1) % 16 AS shard_a,
       xxh3_64(user_id, 2) % 16 AS shard_b
FROM users;

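To write a partitioned Parquet dataset, pass the query to COPY with PARTITION_BY, for example COPY (SELECT user_id, xxh3_64(user_id) % 16 AS partition FROM users) TO 'users_by_partition' (FORMAT parquet, PARTITION_BY (partition)); (CREATE TABLE AS materializes a DuckDB table, not Parquet files).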
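The bucket math is easy to check outside SQL. This Python sketch mimics the queries above. Because xxh3_64 isn't in Python's standard library, hash64 uses keyed hashlib.blake2b purely as a stand-in 64-bit hash (hash64 and partition are hypothetical helpers; in production a cryptographic hash here is exactly the wasted CPU this page warns about). It checks the three properties the recipe relies on: balanced buckets, stable assignment, and near-zero correlation between two seeded shardings.

```python
import hashlib
from collections import Counter


def hash64(key: str, seed: int = 0) -> int:
    """Stand-in seeded 64-bit hash (keyed BLAKE2b). In DuckDB you would use xxh3_64(key, seed)."""
    digest = hashlib.blake2b(
        key.encode("utf-8"), digest_size=8, key=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def partition(key: str, n_partitions: int, seed: int = 0) -> int:
    """Same idea as `xxh3_64(key, seed) % n_partitions` in SQL."""
    return hash64(key, seed) % n_partitions


keys = [f"user-{i}" for i in range(100_000)]

# 1. Balance: each of 16 buckets should get roughly 100_000 / 16 = 6_250 keys.
counts = Counter(partition(k, 16) for k in keys)
print(sorted(counts.values()))

# 2. Stability: the assignment is a pure function of the key and the seed.
assert partition("user-42", 16) == partition("user-42", 16)

# 3. Independence: with two seeds, a key lands in the same bucket in both
#    shardings about 1/16 of the time, as expected for uncorrelated hashes.
agreement = sum(partition(k, 16, seed=1) == partition(k, 16, seed=2) for k in keys) / len(keys)
print(f"{agreement:.4f}")
```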

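Cross-system routing (Cassandra)

Use murmurhash3_x64_128 when DuckDB needs to reproduce the MurmurHash3 that an upstream Apache Cassandra Murmur3Partitioner computes. Cassandra takes its token from the first 64-bit half of this hash, read as a signed 64-bit integer, over the serialized partition-key bytes, so check a few keys against tokens Cassandra has already assigned. (Spark SQL's hash partitioning uses a 32-bit MurmurHash3 variant with seed 42, not this function.)

-- 128-bit x64 MurmurHash3; Cassandra's token comes from its first 64-bit half
SELECT key,
       murmurhash3_x64_128(key) AS murmur3_128
FROM rows_to_route;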

String cache keys

xxh3_128_hex emits the canonical 32-character hex digest used by Python xxhash and xxhash-rust — the right pick when the key needs to be a string (Redis key, filename, URL component):

-- Stable cache key for a (user, query) tuple
SELECT user_id,
       query_text,
       xxh3_128_hex(
         CAST(user_id AS VARCHAR) || '|' || query_text
       ) AS cache_key
FROM query_log;

Bloom / XOR / Cuckoo filter input

Pair with the bitfilters extension. Most filter constructors take UBIGINT hashes — exactly what xxh3_64 and rapidhash produce:

-- Build an xor8 filter from a column of strings and keep it in a table
CREATE TABLE filter_table AS
SELECT xor8_filter(xxh3_64(email)) AS filter
FROM allowed_users;

-- Test membership
SELECT xor8_filter_contains(f.filter, xxh3_64($candidate_email))
       AS might_be_allowed
FROM filter_table f;

For a Bloom filter using two independent hash functions, use two seeds of the same algorithm rather than two different algorithms:

SELECT xxh3_64(key, 1) AS h1,
       xxh3_64(key, 2) AS h2
FROM data;
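The two-seed trick is usually applied through double hashing (Kirsch and Mitzenmacher): all k probe positions are derived as (h1 + i * h2) mod m from just two hashes. Below is a minimal Python sketch; keyed hashlib.blake2b stands in for xxh3_64(key, 1) and xxh3_64(key, 2) because the standard library has no xxHash, the class and helper names are hypothetical, and this is not how the bitfilters extension is implemented.

```python
import hashlib
import math


def hash64(key: str, seed: int) -> int:
    """Stand-in seeded 64-bit hash (keyed BLAKE2b); in DuckDB this would be xxh3_64(key, seed)."""
    digest = hashlib.blake2b(
        key.encode("utf-8"), digest_size=8, key=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


class BloomFilter:
    """Bloom filter whose k probe positions all come from two seeded hashes."""

    def __init__(self, n_items: int, fp_rate: float) -> None:
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 probes.
        self.m = math.ceil(-n_items * math.log(fp_rate) / math.log(2) ** 2)
        self.k = max(1, round(self.m / n_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, key: str) -> list:
        h1 = hash64(key, 1)
        h2 = hash64(key, 2)
        # Double hashing: probe i is (h1 + i * h2) mod m, for i = 0..k-1.
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


bf = BloomFilter(n_items=1_000, fp_rate=0.01)
members = [f"user{i}@example.com" for i in range(1_000)]
for email in members:
    bf.add(email)

non_members = [f"other{i}@example.com" for i in range(10_000)]
false_positive_rate = sum(bf.might_contain(x) for x in non_members) / len(non_members)
print(bf.m, bf.k, false_positive_rate)
```

Members are always reported as present; non-members should come back positive at roughly the 1% target rate.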

Cross-language fingerprints

xxh3_128_hex matches the canonical lowercase 32-character hex form (high 64 bits first, then low 64 bits, each big-endian) defined by the xxHash specification. The same bytes hashed with any of these should produce the same digest:

SELECT xxh3_128_hex('hello');
-- c779cfaa5e523818b5e9c1ad071b3e7f
# Python — pip install xxhash (https://pypi.org/project/xxhash/)
import xxhash
xxhash.xxh3_128('hello').hexdigest()
# 'c779cfaa5e523818b5e9c1ad071b3e7f'
// Rust — xxhash-rust (https://docs.rs/xxhash-rust/latest/xxhash_rust/)
use xxhash_rust::xxh3::xxh3_128;
let digest = format!("{:032x}", xxh3_128(b"hello"));

Useful as a content fingerprint shared between SQL and application code.

Cheap deduplication

SELECT DISTINCT xxh3_64(...) collapses duplicate payloads at near-RAM bandwidth. Drop to 128 bits when the keyspace is large enough that 64-bit birthday collisions matter:

-- Distinct payloads in a high-volume event stream
SELECT DISTINCT xxh3_64(payload) AS payload_hash
FROM events;

-- Same idea with more headroom
SELECT DISTINCT xxh3_128(payload) AS payload_hash
FROM events;

Treat the hash as an accidental-collision-safe key, not a security claim.
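How large is "large enough"? Assuming the hash behaves like a uniformly random function, the standard birthday approximation puts the chance of at least one accidental collision among n distinct keys in a b-bit hash at 1 - exp(-n(n-1) / 2^(b+1)). A quick Python calculation (collision_probability is a hypothetical helper):

```python
import math


def collision_probability(n_keys: int, bits: int) -> float:
    """Birthday-bound estimate of P(at least one collision) among n_keys distinct keys."""
    expected_pairs = n_keys * (n_keys - 1) / 2 / 2.0 ** bits
    # -expm1(-x) computes 1 - exp(-x) without rounding tiny values to zero.
    return -math.expm1(-expected_pairs)


for n in (10**6, 10**9, 10**10):
    p64 = collision_probability(n, 64)
    p128 = collision_probability(n, 128)
    print(f"{n:>14,} keys   64-bit: {p64:.3g}   128-bit: {p128:.3g}")
```

At a billion keys a 64-bit hash carries roughly a 2.7% chance of at least one collision, and at ten billion it is above 90%; the 128-bit variants keep both cases below 1e-19.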

Independent hashes via seeds

Two different seeds give two uncorrelated hashes from the same input. Useful for double-hashing schemes, parallel shardings, and probabilistic data structures that need multiple hash functions:

SELECT key,
       xxh3_64(key, 1) AS h1,
       xxh3_64(key, 2) AS h2,
       xxh3_64(key, 3) AS h3
FROM keys;

Platform Support

Compatibility

Extension availability may vary by platform and DuckDB version. Check below to ensure this extension supports your environment before installation.

Quick Facts

Software license: MIT
Pricing: Free
Written in: C++
Source available: Yes (on GitHub)
Usage: 280,287+ loads in the last 30 days

Platforms

Linux
Linux (musl)
macOS
Windows
WASM
Supported platform architectures
Linux: aarch64, x86_64
Linux (musl): x86_64
macOS: Intel, Apple Silicon
Windows: x86_64
WASM: eh, threads, mvp
Compiled binary sizes (gzipped download size from the DuckDB community-extensions registry)

| Platform | Architecture | Size |
|---|---|---|
| Linux | aarch64 | 2.70 MB |
| Linux | x86_64 | 3.06 MB |
| Linux (musl) | x86_64 | 2.32 MB |
| macOS | Apple Silicon | 1.85 MB |
| macOS | Intel | 2.16 MB |
| Windows | x86_64 | 7.53 MB |
| WASM | eh | 24.9 KB |
| WASM | mvp | 19.4 KB |
| WASM | threads | 19.3 KB |

DuckDB Versions

Supported versions: v1.4.4, v1.5.2

Fast Hashing in SQL

Install Hashfuncs to plug industry-standard non-cryptographic hashes into DuckDB queries. For digests, signatures, or anything adversarial, use the [crypto](/products/extensions/crypto) extension instead.