Bitfilters DuckDB Extension

A high-performance DuckDB extension providing probabilistic data structures for fast set membership testing and approximate duplicate detection. This extension implements state-of-the-art filter algorithms including Quotient filters, XOR filters, Binary Fuse filters, and soon Bloom filters.

bitfilters provides space-efficient probabilistic data structures that can answer β€œIs element X in set S?” with:

You may find it useful to use this module in combination with the hashfuncs extension to produce the hash outputs that these filters require. Most filters require a UBIGINT or unsigned 64-bit integer value as their input type. You can use the functions provided by hashfuncs or the DuckDB hash() function to produce those values from most DuckDB data types.

Installation

bitfilters is a DuckDB Community Extension.

You can install and use this extension with the following SQL commands:

INSTALL bitfilters FROM community;
LOAD bitfilters;

For more information about DuckDB extensions, see the official documentation.

What are bitfilters?

Bitfilters are probabilistic data structures that provide fast, memory-efficient approximate set membership testing. They are designed to answer the question β€œIs element X in set S?” with:

  • Space efficiency: Use significantly less memory than storing the actual set
  • Speed: Extremely fast lookups (typically O(1) or O(k) where k is small)
  • No false negatives: If a filter says β€œNO”, the element is definitely not in the set
  • Possible false positives: If a filter says β€œYES”, the element might be in the set

Common Use Cases

  • Pre-filtering expensive operations: Avoid costly disk I/O or network calls for non-existent data
  • Duplicate detection: Quickly identify potential duplicates in large datasets
  • Cache optimization: Determine if data might be in cache before expensive lookups
  • Data skipping: Skip irrelevant data partitions in analytical queries
  • Set operations: Approximate set intersections and unions on massive datasets
  • Database join optimization: Pre-filter join candidates to reduce computation

Performance Benefits

-- Without bitfilters: Expensive operation on every row
SELECT expensive_function(data)
FROM large_table
WHERE complex_condition(data);

-- With bitfilters: Pre-filter to reduce expensive operations by 90%+
SELECT expensive_function(lt.data)
FROM large_table lt
JOIN precomputed_filters pf ON lt.partition = pf.partition
WHERE filter_contains(pf.filter, lt.key)  -- Fast filter check
  AND complex_condition(lt.data);         -- Expensive check only when needed

Available Filters

1. Quotient Filters

Space-efficient filters that support deletion and resizing operations. Learn more about Quotient Filters.

Functions:

  • quotient_filter(q, r, hash_value) - Create filter from hash values
  • quotient_filter_contains(filter, hash_value) - Test membership

Characteristics:

  • Use case: Dynamic datasets, applications requiring deletion
  • Pros: Supports deletion, better cache locality, resizable
  • Cons: More complex implementation, slightly slower than Bloom filters
  • Memory: Similar to Bloom filters but with better cache performance
-- Create a quotient filter with q=16 and r=4
CREATE TABLE quotient_filters AS (
    SELECT id % 2 AS remainder,
           quotient_filter(16, 4, hash(id)) AS filter
    FROM series_data
    GROUP BY id % 2
);

-- Test membership
SELECT quotient_filter_contains(filter, hash(12345)) AS might_exist
FROM quotient_filters WHERE remainder = 1;

2. XOR Filters

Modern high-performance filters with better space efficiency than Bloom filters. Read the XOR Filters paper for technical details.

Functions:

  • xor16_filter(hash_value) - Create 16-bit XOR filter from hash values
  • xor8_filter(hash_value) - Create 8-bit XOR filter from hash values
  • xor16_filter_contains(filter, hash_value) - Test membership in 16-bit filter
  • xor8_filter_contains(filter, hash_value) - Test membership in 8-bit filter

Characteristics:

  • Use case: Read-heavy workloads, static datasets
  • Pros: Better space efficiency (~20% less memory), faster queries
  • Cons: Static size, more complex construction, no incremental updates
  • Memory: ~1.23 bits per element for 1% false positive rate
-- Create XOR filters (both 8-bit and 16-bit versions)
CREATE TABLE xor_filters AS (
    SELECT id % 2 AS remainder,
           xor16_filter(hash(id)) AS xor16_filter,
           xor8_filter(hash(id)) AS xor8_filter
    FROM series_data
    GROUP BY id % 2
);

-- Test membership
SELECT
    xor16_filter_contains(xor16_filter, hash(12345)) AS in_xor16,
    xor8_filter_contains(xor8_filter, hash(12345)) AS in_xor8
FROM xor_filters WHERE remainder = 1;

3. Binary Fuse Filters

Latest generation filters with optimal space usage and excellent performance. See the Binary Fuse Filters paper for implementation details.

Functions:

  • binary_fuse16_filter(hash_value) - Create 16-bit Binary Fuse filter from hash values
  • binary_fuse8_filter(hash_value) - Create 8-bit Binary Fuse filter from hash values
  • binary_fuse16_filter_contains(filter, hash_value) - Test membership in 16-bit filter
  • binary_fuse8_filter_contains(filter, hash_value) - Test membership in 8-bit filter

Characteristics:

  • Use case: Applications requiring minimal memory footprint
  • Pros: Best space efficiency, fast construction and queries
  • Cons: Static size, newer algorithm with less production experience
  • Memory: Significantly more space-efficient than other filters
-- Create Binary Fuse filters (both 8-bit and 16-bit versions)
CREATE TABLE binary_fuse_filters AS (
    SELECT id % 2 AS remainder,
           binary_fuse16_filter(hash(id)) AS binary_fuse16_filter,
           binary_fuse8_filter(hash(id)) AS binary_fuse8_filter
    FROM series_data
    GROUP BY id % 2
);

-- Test membership
SELECT
    binary_fuse16_filter_contains(binary_fuse16_filter, hash(12345)) AS in_fuse16,
    binary_fuse8_filter_contains(binary_fuse8_filter, hash(12345)) AS in_fuse8
FROM binary_fuse_filters WHERE remainder = 1;

Usage Examples

Basic Filter Operations

-- Create test data
CREATE TABLE series_data AS (
    SELECT * AS id FROM generate_series(1, 100000) AS id
);

-- Create quotient filters with q=16 and r=4
CREATE TABLE quotient_filters AS (
    SELECT id % 2 AS remainder,
           quotient_filter(16, 4, hash(id)) AS filter
    FROM series_data
    GROUP BY id % 2
);

-- Test membership (should find all matching elements)
SELECT remainder,
       COUNT(CASE WHEN quotient_filter_contains(filter, hash(id)) THEN 1 ELSE NULL END) AS matches
FROM series_data, quotient_filters
WHERE series_data.id % 2 = quotient_filters.remainder
GROUP BY remainder;
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ remainder β”‚ matches β”‚
β”‚   int64   β”‚  int64  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚         0 β”‚   50000 β”‚
β”‚         1 β”‚   50000 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

-- Check false positives (elements not in the filter that test positive)
SELECT remainder,
       COUNT(CASE WHEN quotient_filter_contains(filter, hash(id)) THEN 1 ELSE NULL END) AS false_positives
FROM series_data, quotient_filters
WHERE series_data.id % 2 != quotient_filters.remainder
GROUP BY remainder;
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ remainder β”‚ false_positives β”‚
β”‚   int64   β”‚      int64      β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚         0 β”‚            2264 β”‚
β”‚         1 β”‚            2273 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

XOR Filter Examples

-- Create XOR filters (both 8-bit and 16-bit)
CREATE TABLE xor_filters AS (
    SELECT id % 2 AS remainder,
           xor16_filter(hash(id)) AS xor16_filter,
           xor8_filter(hash(id)) AS xor8_filter
    FROM series_data
    GROUP BY id % 2
);

-- Verify all elements are found (no false negatives)
SELECT remainder,
       COUNT(CASE WHEN xor16_filter_contains(xor16_filter, hash(id)) THEN 1 ELSE NULL END) AS xor16_matches,
       COUNT(CASE WHEN xor8_filter_contains(xor8_filter, hash(id)) THEN 1 ELSE NULL END) AS xor8_matches
FROM series_data, xor_filters
WHERE series_data.id % 2 = xor_filters.remainder
GROUP BY remainder;
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ remainder β”‚ xor16_matches β”‚ xor8_matches β”‚
β”‚   int64   β”‚     int64     β”‚    int64     β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚         0 β”‚         50000 β”‚        50000 β”‚
β”‚         1 β”‚         50000 β”‚        50000 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

-- Compare filter performance
SELECT
    'XOR16' AS filter_type,
    octet_length(xor16_filter) AS size_bytes
FROM xor_filters WHERE remainder = 0
UNION ALL
SELECT
    'XOR8' AS filter_type,
    octet_length(xor8_filter) AS size_bytes
FROM xor_filters WHERE remainder = 0;
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ filter_type β”‚ size_bytes β”‚
β”‚   varchar   β”‚   int64    β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ XOR16       β”‚     123076 β”‚
β”‚ XOR8        β”‚      61546 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Binary Fuse Filter Examples

-- Create Binary Fuse filters
CREATE TABLE binary_fuse_filters AS (
    SELECT id % 2 AS remainder,
           binary_fuse16_filter(hash(id)) AS binary_fuse16_filter,
           binary_fuse8_filter(hash(id)) AS binary_fuse8_filter
    FROM series_data
    GROUP BY id % 2
);

-- Test all elements are found
SELECT remainder,
       COUNT(CASE WHEN binary_fuse16_filter_contains(binary_fuse16_filter, hash(id)) THEN 1 ELSE NULL END) AS fuse16_matches,
       COUNT(CASE WHEN binary_fuse8_filter_contains(binary_fuse8_filter, hash(id)) THEN 1 ELSE NULL END) AS fuse8_matches
FROM series_data, binary_fuse_filters
WHERE series_data.id % 2 = binary_fuse_filters.remainder
GROUP BY remainder;
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ remainder β”‚ fuse16_matches β”‚ fuse8_matches β”‚
β”‚   int64   β”‚     int64      β”‚     int64     β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚         0 β”‚          50000 β”‚         50000 β”‚
β”‚         1 β”‚          50000 β”‚         50000 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

-- Check false positive rates
SELECT remainder,
       COUNT(CASE WHEN binary_fuse16_filter_contains(binary_fuse16_filter, hash(id)) THEN 1 ELSE NULL END) AS fuse16_false_positives,
       COUNT(CASE WHEN binary_fuse8_filter_contains(binary_fuse8_filter, hash(id)) THEN 1 ELSE NULL END) AS fuse8_false_positives
FROM series_data, binary_fuse_filters
WHERE series_data.id % 2 != binary_fuse_filters.remainder
GROUP BY remainder;
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ remainder β”‚ fuse16_false_positives β”‚ fuse8_false_positives β”‚
β”‚   int64   β”‚         int64          β”‚         int64         β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚         0 β”‚                      1 β”‚                   171 β”‚
β”‚         1 β”‚                      1 β”‚                   199 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Practical Use Case: User Activity Tracking

-- Track active users by day using quotient filters
CREATE TABLE user_activity AS (
    SELECT
        (CURRENT_DATE - (random() * 30)::INTEGER) AS activity_date,
        (random() * 1000000)::INTEGER AS user_id
    FROM generate_series(1, 5000000)
);

-- Create daily quotient filters for active users
CREATE TABLE daily_user_filters AS (
    SELECT
        activity_date,
        quotient_filter(16, 4, hash(user_id)) AS user_filter,
        COUNT(DISTINCT user_id) AS actual_unique_users
    FROM user_activity
    GROUP BY activity_date
);

-- Check if specific users were active on specific dates
SELECT
    activity_date,
    quotient_filter_contains(user_filter, hash(12345)) AS user_12345_active,
    quotient_filter_contains(user_filter, hash(67890)) AS user_67890_active,
    actual_unique_users
FROM daily_user_filters
ORDER BY activity_date;

-- Find days when specific users might have been active
WITH target_users AS (
    SELECT unnest([12345, 67890, 11111, 99999]) AS user_id
)
SELECT
    tu.user_id,
    duf.activity_date,
    quotient_filter_contains(duf.user_filter, hash(tu.user_id)) AS possibly_active,
    EXISTS(
        SELECT 1 FROM user_activity ua
        WHERE ua.user_id = tu.user_id
        AND ua.activity_date = duf.activity_date
    ) AS actually_active
FROM target_users tu
CROSS JOIN daily_user_filters duf
WHERE quotient_filter_contains(duf.user_filter, hash(tu.user_id))
ORDER BY tu.user_id, duf.activity_date;

Filter Comparison Example

-- Compare all filter types side by side
WITH sample_data AS (
    SELECT hash(id) AS hash_value
    FROM generate_series(1, 1000000) AS id
),
all_filters AS (
    SELECT
        quotient_filter(20, 4, hash_value) AS qf,
        xor16_filter(hash_value) AS xor16,
        xor8_filter(hash_value) AS xor8,
        binary_fuse16_filter(hash_value) AS bf16,
        binary_fuse8_filter(hash_value) AS bf8
    FROM sample_data
)
SELECT
    'Quotient Filter' AS filter_type,
    octet_length(qf) AS size_bytes,
    octet_length(qf) / 1000000.0 AS bytes_per_element
FROM all_filters
UNION ALL
SELECT
    'XOR16 Filter',
    octet_length(xor16),
    octet_length(xor16) / 1000000.0
FROM all_filters
UNION ALL
SELECT
    'XOR8 Filter',
    octet_length(xor8),
    octet_length(xor8) / 1000000.0
FROM all_filters
UNION ALL
SELECT
    'Binary Fuse16 Filter',
    octet_length(bf16),
    octet_length(bf16) / 1000000.0
FROM all_filters
UNION ALL
SELECT
    'Binary Fuse8 Filter',
    octet_length(bf8),
    octet_length(bf8) / 1000000.0
FROM all_filters;
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚     filter_type      β”‚ size_bytes β”‚ bytes_per_element β”‚
β”‚       varchar        β”‚   int64    β”‚      double       β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Quotient Filter      β”‚     917544 β”‚          0.917544 β”‚
β”‚ XOR16 Filter         β”‚    2460076 β”‚          2.460076 β”‚
β”‚ XOR8 Filter          β”‚    1230046 β”‚          1.230046 β”‚
β”‚ Binary Fuse16 Filter β”‚    2261024 β”‚          2.261024 β”‚
β”‚ Binary Fuse8 Filter  β”‚    1130524 β”‚          1.130524 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Best Practices

βœ… Do’s

  • Use a hash function to create consistent hash values for filter operations
  • Store filters in materialized views for reuse across queries
  • Combine filters with exact checks for final results
  • Monitor actual false positive rates in production
  • Choose appropriate bit-width (8 vs 16) based on memory/accuracy tradeoffs

❌ Don’ts

  • Don’t rely on filters for exact results without confirmation
  • Don’t mix hash values from different sources in the same filter
  • Don’t rebuild filters frequently for dynamic data
  • Don’t use filters for very small datasets (overhead not worth it)
  • Don’t forget that quotient filter parameters (q, r) affect capacity and accuracy
-- Good: Use filter to pre-screen, then exact check
SELECT expensive_operation(data)
FROM large_table lt
JOIN precomputed_filter pf ON lt.partition = pf.partition
WHERE quotient_filter_contains(pf.filter, hash(lt.key))  -- Fast pre-filter
  AND exact_expensive_condition(lt.data);                -- Exact check

-- Bad: Using filter as final arbiter
SELECT data
FROM large_table lt
JOIN precomputed_filter pf ON lt.partition = pf.partition
WHERE quotient_filter_contains(pf.filter, hash(lt.key)); -- May have false positives!

API Reference

Quotient Filter Functions

quotient_filter(q, r, hash_value)

Creates a quotient filter with 2^q slots and r remainder bits.

Parameters:
  • q (INTEGER): Log2 of the number of slots (capacity = 2^q)
  • r (INTEGER): Number of remainder bits (affects accuracy)
  • hash_value (UBIGINT): Hash values to insert into the filter
Returns:

BLOB containing the serialized quotient filter

Example
-- Create filter with 2^16 = 65536 slots and 4 remainder bits
SELECT quotient_filter(16, 4, hash(user_id)) FROM users;

quotient_filter_contains(filter, hash_value)

Tests if a quotient filter may contain a hash value.

Parameters:
  • filter (BLOB): Serialized quotient filter
  • hash_value (UBIGINT): Hash value to test
Returns:

BOOLEAN

  • true: Hash value might be in the set (possible false positive)
  • false: Hash value is definitely not in the set (no false negatives)

XOR Filter Functions

xor16_filter(hash_value) / xor8_filter(hash_value)

Creates XOR filters with 16-bit or 8-bit fingerprints.

Parameters:
  • hash_value (UBIGINT): Hash values to insert into the filter
Returns:

BLOB containing the serialized XOR filter

Example:
SELECT xor16_filter(hash(product_id)) FROM products;
SELECT xor8_filter(hash(product_id)) FROM products;  -- Smaller but higher FPR

xor16_filter_contains(filter, hash_value) / xor8_filter_contains(filter, hash_value)

Tests if an XOR filter may contain a hash value.

Parameters:
  • filter (BLOB): Serialized XOR filter
  • hash_value (UBIGINT): Hash value to test
Returns:

BOOLEAN (same semantics as quotient filters)


Binary Fuse Filter Functions

binary_fuse16_filter(hash_value) / binary_fuse8_filter(hash_value)

Creates Binary Fuse filters with 16-bit or 8-bit fingerprints.

Parameters:
  • hash_value (UBIGINT): Hash values to insert into the filter
Returns:

BLOB containing the serialized Binary Fuse filter

Example:
SELECT binary_fuse16_filter(hash(transaction_id)) FROM transactions;
SELECT binary_fuse8_filter(hash(transaction_id)) FROM transactions;

binary_fuse16_filter_contains(filter, hash_value) / binary_fuse8_filter_contains(filter, hash_value)

Tests if a Binary Fuse filter may contain a hash value.

Parameters:
  • filter (BLOB): Serialized Binary Fuse filter
  • hash_value (BIGINT): Hash value to test
Returns:

BOOLEAN (same semantics as other filters)

Filter Characteristics Summary

Filter Type Bits per Element False Positive Rate Notes
Quotient Variable Depends on q,r Supports deletion
XOR16 ~9-10 bits ~0.4% High performance
XOR8 ~9-10 bits ~0.4% Smaller size
Binary Fuse16 ~9-10 bits ~0.4% Best space efficiency
Binary Fuse8 ~9-10 bits ~1.5% Smallest size

Integration with Other Extensions

Using with hashfuncs Extension

The hashfuncs extension provides additional hash functions that can improve filter performance and distribution.

-- Load both extensions
LOAD hashfuncs;
LOAD bitfilters;

-- Use specialized hash functions for better distribution
SELECT quotient_filter(16, 4, xxh64(complex_key || salt))
FROM my_table;

Hash Function Recommendations

For optimal performance, consider these hash functions from the hashfuncs extension:

Limitations

  1. Hash-based Input: All filters require hash values as input
  2. Static Size: XOR and Binary Fuse filters cannot be resized after creation
  3. No Direct Deletion: Only quotient filters support element removal
  4. False Positives: All filters may return false positives but never false negatives
  5. Type Consistency: Hash values must be consistent between creation and testing
  6. Memory Usage: Larger filters provide better accuracy but use more memory

Performance Characteristics

For detailed performance analysis, see the respective research papers: Quotient Filters, XOR Filters, and Binary Fuse Filters.

Operation Quotient Filter XOR Filter Binary Fuse Filter
Construction O(n) O(n) O(n)
Query O(1) average O(1) O(1)
Space Variable ~9.84 bits/element ~9.1 bits/element
False Positive Rate Configurable ~0.39% 8-bit: ~1.56%, 16-bit: ~0.39%
Supports Deletion βœ… ❌ ❌
Resizable βœ… ❌ ❌

License

MIT Licensed