Marisa DuckDB Extension
The Marisa extension, developed by Query.Farm, adds MARISA (Matching Algorithm with Recursively Implemented StorAge) trie functionality for DuckDB. MARISA is a static and space-efficient trie data structure that enables fast string lookups, prefix searches, and predictive text operations.
DuckDB extensions are plugins that expand the core DuckDB engine with new capabilities.
Installation
marisa
is a DuckDB Community Extension.
You can now use this by using this SQL:
from community;
install marisa load marisa;
Extension API
The marisa
extension provides several functions for working with MARISA tries:
Creating a Trie
Use the marisa_trie()
aggregate function to create a trie from string data:
CREATE TABLE employees(name TEXT);
INSERT INTO employees VALUES('Alice'), ('Bob'), ('Charlie'), ('David'), ('Eve'), ('Frank'), ('Mallory'), ('Megan'), ('Oscar'), ('Melissa');
-- Create a trie from the employee names
CREATE TABLE employees_trie AS SELECT marisa_trie(name) AS trie FROM employees;
SELECT trie, octet_length(trie) FROM employees_trie;
┌─────────────────────────────────────────────────────────────────┬────────────────────┐
│ trie │ octet_length(trie) │blob │ int64 │
│
├─────────────────────────────────────────────────────────────────┼────────────────────┤4160 │
│ We love Marisa.\x00\x08\x00\x00\x00\x00\x00\x00\x00\xFD\x1B`\… │ └─────────────────────────────────────────────────────────────────┴────────────────────┘
Lookup Function
Check if a string exists in the trie using marisa_lookup()
:
-- Check if 'Alice' exists in the trie (returns true)
SELECT marisa_lookup(trie, 'Alice') FROM employees_trie;
┌──────────────────────────────┐'Alice') │
│ marisa_lookup(trie, boolean │
│
├──────────────────────────────┤true │
│
└──────────────────────────────┘
-- Check if 'Unknown' exists in the trie (returns false)
SELECT marisa_lookup(trie, 'Unknown') FROM employees_trie;
┌────────────────────────────────┐'Unknown') │
│ marisa_lookup(trie, boolean │
│
├────────────────────────────────┤false │
│ └────────────────────────────────┘
Common Prefix Search
Find all strings in the trie that are prefixes of a given string using marisa_common_prefix()
:
CREATE TABLE countries(name TEXT);
INSERT INTO countries VALUES ('U'), ('US'), ('USA');
CREATE TABLE countries_trie AS SELECT marisa_trie(name) AS trie FROM countries;
-- Find all prefixes of 'USA' (returns ['U', 'US', 'USA'])
SELECT marisa_common_prefix(trie, 'USA', 10) FROM countries_trie;
┌───────────────────────────────────────┐'USA', 10) │
│ marisa_common_prefix(trie, varchar[] │
│
├───────────────────────────────────────┤
│ [U, US, USA] │ └───────────────────────────────────────┘
Predictive Search
Find all strings in the trie that start with a given prefix using marisa_predictive()
:
-- Find all names starting with 'Me' (returns ['Megan', 'Melissa'])
SELECT marisa_predictive(trie, 'Me', 10) FROM employees_trie;
┌───────────────────────────────────┐'Me', 10) │
│ marisa_predictive(trie, varchar[] │
│
├───────────────────────────────────┤
│ [Megan, Melissa] │ └───────────────────────────────────┘
Function Reference
marisa_trie(column)
Type: Aggregate Function
Description: Creates a MARISA trie from string values in a column.
Parameters:
column
(VARCHAR): Column containing strings to build the trie from
Example:
SELECT marisa_trie(name) FROM employees;
marisa_lookup(trie, search_string)
Type: Scalar Function
Description: Checks if a string exists in the trie.
Parameters:
trie
(BLOB): The trie created bymarisa_trie()
search_string
(VARCHAR): String to search for
Returns: BOOLEAN (true if found, false otherwise)
Example:
SELECT marisa_lookup(trie, 'Alice') FROM employees_trie;
marisa_common_prefix(trie, search_string, max_results)
Type: Scalar Function
Description: Finds all strings in the trie that are prefixes of the search string.
Parameters:
trie
(BLOB): The trie created bymarisa_trie()
search_string
(VARCHAR): String to find prefixes formax_results
(INTEGER): Maximum number of results to return
Returns: LIST(VARCHAR) - List of prefix matches
Example:
SELECT marisa_common_prefix(trie, 'USA', 10) FROM countries_trie;
-- Returns: ['U', 'US', 'USA']
marisa_predictive(trie, prefix, max_results)
Type: Scalar Function
Description: Finds all strings in the trie that start with the given prefix.
Parameters: - trie
(BLOB): The trie created by marisa_trie()
- prefix
(VARCHAR): Prefix to search for - max_results
(INTEGER): Maximum number of results to return
Returns: LIST(VARCHAR) - List of strings starting with the prefix
Example:
SELECT marisa_predictive(trie, 'Me', 10) FROM employees_trie;
-- Returns: ['Megan', 'Melissa']
Love ❤️ this DuckDB extension? You’ll Love This.
Get the best from Query.Farm — smart tips, powerful tools, and project updates sent directly to your inbox, but only when we’ve got something great to share.