Bitfilters
Probabilistic set-membership filters as DuckDB columns. Quotient, XOR (8/16-bit), Binary Fuse (8/16-bit), and DuckDB-style Bloom filters β answer 'is X in this set?' in O(1) using a fraction of the memory of the source set, with tunable false-positive rates and zero false negatives.
Install
-- Install the extension
INSTALL bitfilters FROM community;
-- Load it into your session
LOAD bitfilters;
-- Build an xor8 filter from hashed keys (1% false-positive rate, ~9 bits per key)
WITH f AS (
SELECT xor8_filter(hash(email)) AS filter FROM users
)
SELECT email,
xor8_filter_contains(f.filter, hash(email)) AS might_be_known
FROM new_signups, f
LIMIT 10;
-- Per-partition Quotient filter β supports deletion and resize
SELECT bucket,
quotient_filter(16, 4, hash(id)) AS filter
FROM ids
GROUP BY bucket; Technical Overview
Why Use Bitfilters?
Build a probabilistic set-membership filter from a column, ship it as a small BLOB, and probe it from anywhere DuckDB runs. Answer "is this maybe in the set?" in O(1) using a fraction of the memory of the source set β with no false negatives and a tunable false-positive rate. The classic use is skipping work when the answer is definitely no: pre-filter joins, skip partitions, dodge cache misses.
π― What this extension is for
Bitfilters materializes four families of approximate-membership-query (AMQ) data structures as DuckDB aggregate functions. Build the filter once, store the BLOB as an ordinary column, probe it later β locally, in another DuckDB instance, or shipped over the network.
- β’ Skip a join when the answer is definitely no: Build a filter from the small "interesting" side of a join, then test the large side against it. False positives let an unwanted row through; false negatives never happen β so the cheap probe + expensive verify pattern is exact for correctness.
- β’ Per-partition data skipping: Materialize one filter per logical partition, store alongside metadata, and probe at query time to skip whole partitions that can't possibly contain the key. For high-cardinality lookups against fact tables this routinely skips 99%+ of partitions.
- β’ Compact set artifacts: A Binary Fuse 8 filter is under 9 bits per key β three orders of magnitude smaller than the source set for typical UUID workloads. Persist it in a Parquet column, ship it to another process, embed it in a manifest.
- β’ DuckDB-native join filters: One family emits filters in the same internal format DuckDB uses for its own runtime join filters β persist a filter, then re-use it in custom data-skipping logic the optimizer recognizes.
π§ͺ Filter families
Four families, three design points. Pick on mutation, memory, and false-positive rate in that order β not the other way around.
- β’ Quotient β dynamic sets: The only family here that supports deletion and resize. Pick it when the underlying set evolves at runtime. Tunable slot count and FPR. See Quotient filter.
- β’ XOR β static, cross-system compatible: ~9 bits per key at ~0.39% FPR (8-bit) or ~18 bits per key at ~0.0015% FPR (16-bit) β about 20% smaller than a Bloom at the same FPR. Built once, queried many times. See the Xor filters paper (Graf & Lemire, 2019).
- β’ Binary Fuse β modern best-in-class: Typically lands under 9 bits per key with very fast construction and lookups β currently the most space-efficient static filter family. Default choice for read-heavy, append-only sets.
- β’ DuckDB Bloom β native compatibility: Speaks DuckDB's internal Bloom filter format and hash layout, so a persisted filter slots into the engine's own data-skipping path.
βοΈ How it works
Every filter constructor is a DuckDB aggregate function that takes UBIGINT hashes as input and returns a single BLOB. The result is dense, self-contained, and storable like any other column.
- β’ Hashes go in, BLOBs come out: Hash your input first β DuckDB's built-in
hash()is fine for most cases. For specific algorithms (xxh3_64, MurmurHash3, rapidhash), pair with thehashfuncsextension. Filters distribute keys based on hash bits β quality of the hash matters. - β’ Filters as columns: Every filter is just a
BLOB. Store it in a regular column, persist to Parquet, send over the wire, hold in a CTE, partition by anything β there is no special filter type, only the matching*_containsprobe function. - β’ No false negatives, ever: The filter never lies about absence: a
falsefrom a probe means the key is definitively not in the source set. False positives are bounded β measure them by probing known non-members and confirm the rate matches the family. - β’ Set-membership only β not cardinality: Bitfilters answers "is X in the set?". For cardinality estimation ("how many distinct items?") and set intersection sizes, use the sibling
datasketchesextension β it has Theta sketches and HyperLogLog for that workload. The two extensions cover complementary problems.
π‘οΈ Honest limitations
What probabilistic filters cannot do β read this before designing a system around one.
- β’ False positives are inherent: Every family here trades exactness for memory. A
truefrom*_containsmeans probably present, not definitely present. If your downstream cannot tolerate even one wrong yes, pair the probe with an exact verify β the filter just narrows the candidate set. - β’ Filters cannot enumerate: There's no way to list the members from a filter β it's set-membership only. If you need the elements themselves, store them; if you need cardinality, use
datasketches. - β’ Static families don't support updates: XOR, Binary Fuse, and DuckDB-Bloom filters are built once and immutable. Adding a single key requires a full rebuild. Use
quotient_filterwhen the set genuinely churns, or rebuild on a cadence that matches your write throughput. - β’ Tiny sets don't earn the filter: For a few thousand elements an in-memory hash set or a sorted array is faster and exact. Bitfilters earns its keep when the source set is large enough that storing the full set is what hurts β millions of keys, per-partition replication, network ship costs.
π― Common Use Cases
Pre-filter an expensive join or lookup
Build a filter from the small side, probe from the large side, verify only candidates that pass. The probe is O(1) and fits in cache; the verify only runs at the false-positive rate. Net: the join sees a fraction of the rows it would otherwise.
Per-partition data skipping
Materialize one Binary Fuse filter per partition at write time, store alongside partition metadata. At query time the probe tells the planner which partitions can possibly contain the key β skip the rest.
Ship a set as a tiny artifact
A 10M-key Binary Fuse 8 filter is ~11 MB. Persist it to S3, attach it to a manifest, distribute it to read replicas, embed it in a CDN β any place where shipping the full set would be too big.
Dedupe pre-check / cache hit prediction
Probe a filter built from "already-seen" keys before doing the expensive dedupe work. Misses are free; hits go to the exact path. Same pattern for predicting cache hits before issuing the round-trip.
Deep Dive
Technical Details
What is a probabilistic filter?
A probabilistic set-membership filter answers βIs X in this set?β with two carefully chosen guarantees:
- No false negatives. If the filter says NO, the element is definitely not in the set.
- Bounded false positives. If the filter says YES, the element is probably in the set β with a measurable probability of being wrong.
Thatβs a powerful tradeoff. You give up exactness in one direction in exchange for:
- Memory β typically 1β2 bytes per element instead of storing the whole element.
- Speed β O(1) probe, often a single cache line read.
Use cases include join pre-filtering, data skipping, dedupe pre-checks, cache hit/miss prediction, and routing requests to the right shard.
Filter family comparison
| Family | Memory (β, per element) | False positive rate | Mutation | Best for |
|---|---|---|---|---|
| Quotient | tunable (q, r) | ~2^(βr) | Add, delete, resize | Dynamic sets |
| XOR-8 | ~9 bits | ~0.39% | Static | General-purpose |
| XOR-16 | ~18 bits | ~0.0015% | Static | Very low FPR needed |
| Binary Fuse-8 | <9 bits | ~0.39% | Static | Smallest static filter |
| Binary Fuse-16 | ~18 bits | ~0.0015% | Static | Smallest very-low-FPR filter |
| DuckDB Bloom | tunable | tunable | Static | DuckDB-native compat |
For most read-heavy workloads, Binary Fuse 8 is the right default. Switch to Quotient when the underlying set changes; switch to a 16-bit variant when even 0.4% false positives are too many.
Hashing the input
Every filter takes UBIGINT hashes as input β the algorithm-quality of the hash matters because filters distribute keys based on the hash bits. Three options, in increasing fanciness:
hash(value)β DuckDBβs built-in. Good distribution, very fast, no extension needed.xxh3_64(value)fromhashfuncsβ slightly faster, well-distributed, cross-language compatible.crypto_hash('sha2-256', value)fromcryptocast to UBIGINT β only when adversarial inputs matter (cryptographic resistance to filter-poisoning).
For 99% of cases use option 1 or 2. Option 3 is correct but ~10Γ slower.
Filters as columns
Every filter constructor is an aggregate function and produces a single BLOB. That makes filters first-class values you can:
- Store in a regular column (
CREATE TABLE filters (bucket INT, filter BLOB)). - Persist in Parquet for cross-process reuse.
- Pass to other queries via CTE or table valued.
- Combine via UNION (build per-partition filters separately, then probe each).
Thereβs no special filter type β just BLOB plus the matching *_contains function.
When not to use a filter
- You actually need the elements, not membership. Filters canβt enumerate; theyβre set-only.
- The set is tiny. A 10K-element set in a HashMap is faster than a filter.
- You canβt tolerate any false positives, including in a probe-then-verify pattern. (Though probe-then-verify makes filters exact for correctness purposes β you just pay the verify cost on FPR-rate of inputs.)
- The set churns continuously. Even Quotient filters have construction cost; if you rebuild on every query, the filter isnβt earning its keep.
Reference
Extension Contents
Quick reference to all available functions and settings organized by category.
| Name | Description | |
|---|---|---|
| Binary Fuse Filters Binary Fuse is the modern best-in-class space-efficient family. <9 bits per key for the 8-bit variant, very fast construction and lookups. Static structure β pick this for read-heavy workloads where storage matters most. | ||
| binary_fuse16_filter() | Aggregate that builds a 16-bit Binary Fuse filter β same family as binary_fuse8 but with much lower false-positive rate at the cost of ~2Γ the memory | |
| binary_fuse16_filter_contains() | Test whether a hash value is a member of a 16-bit Binary Fuse filter | |
| binary_fuse8_filter() | Aggregate that builds an 8-bit Binary Fuse filter from hashed keys | |
| binary_fuse8_filter_contains() | Test whether a hash value is a member of an 8-bit Binary Fuse filter | |
| DuckDB Bloom Filters Bloom filters in DuckDB's native internal format. Use these when you need to materialize join filters or zone-map-style data-skipping structures that the DuckDB optimizer can consume directly. | ||
| bitfilters_duckdb_bloom_filter_create() | Build a Bloom filter compatible with DuckDB's internal join filter format | |
| bitfilters_duckdb_bloom_filter_probe() | Test whether a hash value might be a member of a DuckDB-compatible Bloom filter built by bitfilters_duckdb_bloom_filter_create | |
| bitfilters_duckdb_hash() | Hash function compatible with DuckDB's internal Bloom filter probe layout | |
| Quotient Filter Quotient filters support deletion and resize. Use when the underlying set evolves at runtime β adds, removes, occasional rebuilds. Configurable via | ||
| quotient_filter() | Aggregate that builds a Quotient filter from hashed keys | |
| quotient_filter_contains() | Test whether a hash value is a member of a Quotient filter | |
| XOR Filters XOR filters are static β built once, queried many times. ~9 bits per key for the 8-bit variant (0.39% FPR), ~18 bits per key for the 16-bit (0.0015% FPR). ~20% smaller than Bloom for the same false-positive rate. | ||
| xor16_filter() | Aggregate that builds a 16-bit XOR filter from hashed keys | |
| xor16_filter_contains() | Test whether a hash value is a member of a 16-bit XOR filter | |
| xor8_filter() | Aggregate that builds an 8-bit XOR filter from hashed keys | |
| xor8_filter_contains() | Test whether a hash value is a member of an 8-bit XOR filter | |
API Reference
Function Documentation
Detailed documentation for each function including signatures, parameters, and examples.
binary_fuse16_filter
Signature
Parameters (Positional)
| Parameter | Type | Mode | Description |
|---|---|---|---|
col0 | UBIGINT | Positional |
Returns
Description
Aggregate that builds a 16-bit Binary Fuse filter β same family as binary_fuse8 but with much lower false-positive rate at the cost of ~2Γ the memory.
Examples
Build a Binary Fuse 16 filter
SELECT binary_fuse16_filter(hash(id)) AS filter FROM ids; Related Functions
binary_fuse16_filter_contains
Signature
Parameters (Positional)
| Parameter | Type | Mode | Description |
|---|---|---|---|
col0 | BLOB | Positional | |
col1 | UBIGINT | Positional |
Returns
Description
Test whether a hash value is a member of a 16-bit Binary Fuse filter.
Examples
Membership test
SELECT binary_fuse16_filter_contains(filter, hash(:candidate)) FROM filter_table; binary_fuse8_filter
Signature
Parameters (Positional)
| Parameter | Type | Mode | Description |
|---|---|---|---|
col0 | UBIGINT | Positional |
Returns
Description
Aggregate that builds an 8-bit Binary Fuse filter from hashed keys. The most space-efficient filter family β typically <9 bits per key β with very fast construction and lookups. Static structure.
Examples
Build a Binary Fuse 8 filter
SELECT binary_fuse8_filter(hash(id)) AS filter FROM ids; Related Functions
- binary_fuse8_filter_contains() β Test whether a hash value is a member of an 8-bit Binary Fuse filter
- binary_fuse16_filter() β Aggregate that builds a 16-bit Binary Fuse filter β same family as binary_fuse8 but with much lower false-positive rate at the cost of ~2Γ the memory
- xor8_filter() β Aggregate that builds an 8-bit XOR filter from hashed keys
binary_fuse8_filter_contains
Signature
Parameters (Positional)
| Parameter | Type | Mode | Description |
|---|---|---|---|
col0 | BLOB | Positional | |
col1 | UBIGINT | Positional |
Returns
Description
Test whether a hash value is a member of an 8-bit Binary Fuse filter.
Examples
Membership test
SELECT binary_fuse8_filter_contains(filter, hash(:candidate)) FROM filter_table; Related Functions
bitfilters_duckdb_bloom_filter_create
Signature
Parameters (Positional)
| Parameter | Type | Mode | Description |
|---|---|---|---|
col0 | VARCHAR | Positional | |
col1 | INTEGER | Positional | |
col2 | UBIGINT | Positional |
Returns
Description
Build a Bloom filter compatible with DuckDB's internal join filter format. Produced filters can be used in custom join optimization paths and in zone-map-style data skipping.
Examples
SELECT bitfilters_duckdb_bloom_filter_create('v1.5.1', 16384, bitfilters_duckdb_hash('v1.5.1', key)) FROM tbl; Related Functions
bitfilters_duckdb_bloom_filter_probe
Signature
Parameters (Positional)
| Parameter | Type | Mode | Description |
|---|---|---|---|
col0 | VARCHAR | Positional | |
col1 | BLOB | Positional | |
varargs | ANY | Positional | Varargs |
Returns
Description
Test whether a hash value might be a member of a DuckDB-compatible Bloom filter built by bitfilters_duckdb_bloom_filter_create.
Examples
SELECT bitfilters_duckdb_bloom_filter_probe('v1.5.1', filter_blob, key_col) FROM tbl; SELECT bitfilters_duckdb_bloom_filter_probe('v1.5.1', filter_blob, col1, col2) FROM tbl; Related Functions
bitfilters_duckdb_hash
Signature
Parameters (Positional)
| Parameter | Type | Mode | Description |
|---|---|---|---|
col0 | VARCHAR | Positional | |
varargs | ANY | Positional | Varargs |
Returns
Description
Hash function compatible with DuckDB's internal Bloom filter probe layout. Use this rather than a generic hash when feeding values into the DuckDB-style Bloom filter functions.
Examples
SELECT bitfilters_duckdb_hash('v1.5.1', 42); Output
| bitfilters_duckdb_hash('v1.5.1', 42) |
|---|
| 7199933130570745587 |
SELECT bitfilters_duckdb_hash('v1.5.1', col1, col2) FROM tbl; Related Functions
quotient_filter
Signature
Parameters (Positional)
| Parameter | Type | Mode | Description |
|---|---|---|---|
col0 | INTEGER | Positional | |
col1 | INTEGER | Positional | |
col2 | UBIGINT | BLOB 2 concrete typesBLOBUBIGINT | Positional |
Returns
Description
Aggregate that builds a Quotient filter from hashed keys. q controls slot count (2^q slots), r controls remainder bits per slot. Quotient filters support deletion and resize β use them when the underlying set evolves over time.
Examples
Build a Quotient filter per partition
CREATE TABLE filters AS
SELECT id % 2 AS remainder,
quotient_filter(16, 4, hash(id)) AS filter
FROM series_data
GROUP BY id % 2; Related Functions
quotient_filter_contains
Signature
Parameters (Positional)
| Parameter | Type | Mode | Description |
|---|---|---|---|
col0 | BLOB | Positional | |
col1 | UBIGINT | Positional |
Returns
Description
Test whether a hash value is a member of a Quotient filter. Returns BOOLEAN β false means definitely-not-present; true means probably-present (within the filter's configured false-positive rate).
Examples
Membership test against a stored filter
SELECT quotient_filter_contains(filter, hash(12345)) AS might_exist
FROM filters WHERE remainder = 1; xor16_filter
Signature
Parameters (Positional)
| Parameter | Type | Mode | Description |
|---|---|---|---|
col0 | UBIGINT | Positional |
Returns
Description
Aggregate that builds a 16-bit XOR filter from hashed keys. ~18 bits per key, ~0.0015% false positive rate. Use when you need a much lower FPR than xor8 and can afford 2Γ the memory.
Examples
Build an xor16 filter for low FPR
SELECT xor16_filter(hash(id)) AS filter FROM ids; Related Functions
xor16_filter_contains
Signature
Parameters (Positional)
| Parameter | Type | Mode | Description |
|---|---|---|---|
col0 | BLOB | Positional | |
col1 | UBIGINT | Positional |
Returns
Description
Test whether a hash value is a member of a 16-bit XOR filter.
Examples
Membership test
SELECT xor16_filter_contains(filter, hash(:candidate)) FROM filter_table; Related Functions
xor8_filter
Signature
Parameters (Positional)
| Parameter | Type | Mode | Description |
|---|---|---|---|
col0 | UBIGINT | Positional |
Returns
Description
Aggregate that builds an 8-bit XOR filter from hashed keys. ~9 bits per key, ~0.39% false positive rate. Static β built once, queried many times. ~20% smaller than Bloom for the same FPR.
Examples
Build an xor8 filter from a column of hashed IDs
SELECT xor8_filter(hash(id)) AS filter FROM ids; Related Functions
- xor8_filter_contains() β Test whether a hash value is a member of an 8-bit XOR filter
- xor16_filter() β Aggregate that builds a 16-bit XOR filter from hashed keys
- binary_fuse8_filter() β Aggregate that builds an 8-bit Binary Fuse filter from hashed keys
- quotient_filter() β Aggregate that builds a Quotient filter from hashed keys
xor8_filter_contains
Signature
Parameters (Positional)
| Parameter | Type | Mode | Description |
|---|---|---|---|
col0 | BLOB | Positional | |
col1 | UBIGINT | Positional |
Returns
Description
Test whether a hash value is a member of an 8-bit XOR filter.
Examples
Membership test
SELECT xor8_filter_contains(filter, hash(:candidate)) FROM filter_table; Related Functions
Practical Examples
Cookbook
Real-world recipes and patterns for common use cases.
Pick the Right Filter
| Need | Pick |
|---|---|
| Static set, smallest possible memory | binary_fuse8_filter |
| Static set, very low FPR | binary_fuse16_filter or xor16_filter |
| Dynamic set with deletion / resize | quotient_filter |
| Maximum cross-system compatibility | xor8_filter |
| Plug into DuckDBβs join optimizer | bitfilters_duckdb_bloom_filter_* |
All filter constructors take UBIGINT hash values as input. Use DuckDBβs built-in hash() for general purpose, or pair with hashfuncs when you want a specific algorithm (xxh3_64, rapidhash, MurmurHash3).
Build a Filter
Filter constructors are aggregate functions β pass the hashes of every member of the set and the result is a single BLOB you can store anywhere DuckDB stores BLOBs:
-- Materialize a filter per partition
CREATE TABLE allowed_filters AS
SELECT bucket,
binary_fuse8_filter(hash(id)) AS filter
FROM allowed_ids
GROUP BY bucket;
The filter is dense and self-contained β no separate hash function metadata needed at probe time, just call the matching *_contains function with the same hash.
Pre-Filter an Expensive Join
The classic use case β avoid an expensive lookup or cross-partition join when you can probabilistically rule rows out:
-- 1. Build a filter from "interesting" keys, once
CREATE TABLE interesting_filter AS
SELECT xor8_filter(hash(key)) AS filter FROM interesting_keys;
-- 2. Probe the filter to skip the expensive operation entirely for non-members
SELECT e.*
FROM events e, interesting_filter f
WHERE xor8_filter_contains(f.filter, hash(e.key)) -- fast: maybe-member
AND e.key IN (SELECT key FROM interesting_keys); -- slow: confirm if maybe
Because filters never produce false negatives, the probe-then-verify pattern is exact β you only pay the expensive IN for rows that pass the cheap filter.
Per-Partition Data Skipping
Materialize one filter per logical partition; use it to skip whole partitions at query time:
-- Build filters at write time
CREATE TABLE partition_filters AS
SELECT day,
binary_fuse8_filter(hash(user_id)) AS users_in_partition
FROM events
GROUP BY day;
-- At query time, only scan partitions that might contain the user
SELECT day, COUNT(*) AS hits
FROM events
WHERE day IN (
SELECT day FROM partition_filters
WHERE binary_fuse8_filter_contains(users_in_partition, hash(:target_user))
)
AND user_id = :target_user
GROUP BY day;
For large fact tables this can skip 99%+ of partitions when looking for a specific user/key.
Verify the False-Positive Rate
If you want to confirm the FPR on your data, count probes that the filter says yes to but the source set doesnβt actually contain:
-- Real members
SELECT COUNT(*) AS true_positives
FROM source_set s, the_filter f
WHERE xor8_filter_contains(f.filter, hash(s.id));
-- Non-members that the filter incorrectly accepts
SELECT COUNT(*) AS false_positives
FROM unrelated_set u, the_filter f
WHERE xor8_filter_contains(f.filter, hash(u.id));
For xor8, expect ~0.39% false positives across enough probes; binary_fuse8 lower; binary_fuse16 / xor16 nearly zero.
Quotient Filter β Dynamic Sets
Quotient filters are the only family here that supports deletion and resize. The q and r parameters tune the size/FPR tradeoff:
qβ number of slot index bits. Filter holds up to ~2^q keys.rβ remainder bits per slot. Higherrβ lower FPR, more memory.
Approximate FPR β 2^(βr). At r = 4, expect ~6% FPR; at r = 8, ~0.4%.
SELECT bucket,
quotient_filter(20, 8, hash(id)) AS filter -- ~1M slots, ~0.4% FPR
FROM ids
GROUP BY bucket;
DuckDB-Compatible Bloom Filters
The bitfilters_duckdb_* family produces filters in DuckDBβs internal Bloom filter format β the same one the optimizer materializes during certain joins. Use these when you want to:
- Persist a filter and re-use it later in a join hint.
- Build per-zone Bloom filters for custom data-skipping logic.
All three functions take a constant version string (e.g. 'v1.5.1') as their first argument β bind-folded at parse time β so the produced hash and filter layout match the DuckDB build that will probe them later.
-- Build one DuckDB-shaped Bloom filter per bucket. The aggregate's middle
-- argument (16384 here) is the sector count β must be a power of two.
CREATE TABLE zone_filters AS
SELECT bucket,
bitfilters_duckdb_bloom_filter_create(
'v1.5.1',
16384,
bitfilters_duckdb_hash('v1.5.1', key)
) AS bf
FROM keys
GROUP BY bucket;
-- Probe the matching filter at scan time
SELECT *
FROM events e, zone_filters z
WHERE z.bucket = e.bucket
AND bitfilters_duckdb_bloom_filter_probe('v1.5.1', z.bf, e.key);
bitfilters_duckdb_hash and bitfilters_duckdb_bloom_filter_create produce hashes / sector layouts that match the DuckDB build named by the version string β pin it to your runtimeβs exact version.
Platform Support
Compatibility
Extension availability may vary by platform and DuckDB version. Check below to ensure this extension supports your environment before installation.
Quick Facts
Platforms
Supported platform architectures
Compiled binary sizes
| Platform | Architecture | Size |
|---|---|---|
| Linux | aarch64 | 2.72 MB |
| Linux | x86_64 | 3.08 MB |
| Linux (musl) | x86_64 | 2.92 MB |
| macOS | Apple Silicon | 1.88 MB |
| macOS | Intel | 2.19 MB |
| Windows | x86_64 | 7.55 MB |
| WASM | eh | 35.6 KB |
| WASM | mvp | 32.0 KB |
| WASM | threads | 32.3 KB |
Gzipped download size from the DuckDB community-extensions registry.
Skip Expensive Operations Probabilistically
Install Bitfilters to materialize compact, fast set-membership filters as DuckDB blobs. Pre-filter joins, dodge cache misses, skip data partitions β all with measurable false-positive bounds.