Marisa DuckDB Extension

The Marisa extension, developed by Query.Farm, adds MARISA (Matching Algorithm with Recursively Implemented StorAge) trie functionality for DuckDB. MARISA is a static and space-efficient trie data structure that enables fast string lookups, prefix searches, and predictive text operations.

DuckDB extensions are plugins that expand the core DuckDB engine with new capabilities.

Installation

marisa is a DuckDB Community Extension.

You can now use this by using this SQL:

install marisa from community;
load marisa;

Extension API

The marisa extension provides several functions for working with MARISA tries:

Creating a Trie

Use the marisa_trie() aggregate function to create a trie from string data:

CREATE TABLE employees(name TEXT);
INSERT INTO employees VALUES('Alice'), ('Bob'), ('Charlie'), ('David'), ('Eve'), ('Frank'), ('Mallory'), ('Megan'), ('Oscar'), ('Melissa');

-- Create a trie from the employee names
CREATE TABLE employees_trie AS SELECT marisa_trie(name) AS trie FROM employees;

SELECT trie, octet_length(trie) FROM employees_trie;
┌─────────────────────────────────────────────────────────────────┬────────────────────┐
│                              trie                               │ octet_length(trie) │
blob                               │       int64        │
├─────────────────────────────────────────────────────────────────┼────────────────────┤
│ We love Marisa.\x00\x08\x00\x00\x00\x00\x00\x00\x00\xFD\x1B`\…  │        4160
└─────────────────────────────────────────────────────────────────┴────────────────────┘

Lookup Function

Check if a string exists in the trie using marisa_lookup():

-- Check if 'Alice' exists in the trie (returns true)
SELECT marisa_lookup(trie, 'Alice') FROM employees_trie;
┌──────────────────────────────┐
│ marisa_lookup(trie, 'Alice') │
boolean
├──────────────────────────────┤
true
└──────────────────────────────┘

-- Check if 'Unknown' exists in the trie (returns false)
SELECT marisa_lookup(trie, 'Unknown') FROM employees_trie;
┌────────────────────────────────┐
│ marisa_lookup(trie, 'Unknown') │
boolean
├────────────────────────────────┤
false
└────────────────────────────────┘

Function Reference

marisa_trie(column)

Type: Aggregate Function

Description: Creates a MARISA trie from string values in a column.

Parameters:

  • column (VARCHAR): Column containing strings to build the trie from

Example:

SELECT marisa_trie(name) FROM employees;

marisa_lookup(trie, search_string)

Type: Scalar Function

Description: Checks if a string exists in the trie.

Parameters:

  • trie (BLOB): The trie created by marisa_trie()
  • search_string (VARCHAR): String to search for

Returns: BOOLEAN (true if found, false otherwise)

Example:

SELECT marisa_lookup(trie, 'Alice') FROM employees_trie;

marisa_common_prefix(trie, search_string, max_results)

Type: Scalar Function

Description: Finds all strings in the trie that are prefixes of the search string.

Parameters:

  • trie (BLOB): The trie created by marisa_trie()
  • search_string (VARCHAR): String to find prefixes for
  • max_results (INTEGER): Maximum number of results to return

Returns: LIST(VARCHAR) - List of prefix matches

Example:

SELECT marisa_common_prefix(trie, 'USA', 10) FROM countries_trie;
-- Returns: ['U', 'US', 'USA']

marisa_predictive(trie, prefix, max_results)

Type: Scalar Function

Description: Finds all strings in the trie that start with the given prefix.

Parameters: - trie (BLOB): The trie created by marisa_trie() - prefix (VARCHAR): Prefix to search for - max_results (INTEGER): Maximum number of results to return

Returns: LIST(VARCHAR) - List of strings starting with the prefix

Example:

SELECT marisa_predictive(trie, 'Me', 10) FROM employees_trie;
-- Returns: ['Megan', 'Melissa']

Love ❤️ this DuckDB extension? You’ll Love This.

Get the best from Query.Farm — smart tips, powerful tools, and project updates sent directly to your inbox, but only when we’ve got something great to share.