Lindel DuckDB Extension

The Lindel extension, developed by Query.Farm, brings advanced spatial indexing capabilities to DuckDB by enabling linearization and delinearization of multi-dimensional data using space-filling curvesβ€”specifically, Hilbert, and Morton/Z-Order curves. These curves transform multi-dimensional coordinates into a single-dimensional value while preserving spatial locality, which improves data clustering, indexing, and performance for range queries and joins on multi-column keys.

DuckDB extensions are plugins that expand the core DuckDB engine with new capabilities.

Getting Started

Lindel is a DuckDB community extension maintained and supported by Query.Farm.

Install Lindel in DuckDB by running:

INSTALL lindel FROM community;

Then load it with:

LOAD lindel;

Functionality

Linearization maps multi-dimensional data into a one-dimensional sequence while preserving locality, enhancing the efficiency of data structures and algorithms for spatial data, such as in databases, GIS, and memory caches.

β€œThe principle of locality states that programs tend to reuse data and instructions they have used recently.”

In SQL, sorting by a single column (e.g., time or identifier) is often sufficient, but sometimes queries involve multiple fields, such as:

  • Time and identifier (historical trading data)
  • Latitude and Longitude (GIS applications)
  • Latitude, Longitude, and Altitude (flight tracking)
  • Latitude, Longitude, Altitude, and Time (flight history)

Sorting by a single field isn’t optimal for multi-field queries. Linearization maps multiple fields into a single value, while preserving localityβ€”meaning values close in the original representation remain close in the mapped representation.

Where has this been used before?

DataBricks has long supported Z-Ordering (they also now default to using the Hilbert curve for the ordering). This video explains how Delta Lake queries are faster when the data is Z-Ordered. This extension also allows DuckDB to write files with the same ordering optimization.

Numerous articles describe the benefits of applying a Z-Ordering/Hilbert ordering to data for query performance.

Your particular performance improvements will vary, but for some query patterns Z-Ordering and Hilbert ordering will make quite a big difference.

When would I use this?

For query patterns across multiple numeric or short text columns, consider sorting rows using Hilbert encoding when storing data in Parquet:

COPY (
  SELECT * FROM 'source.csv'
  order by
  hilbert_encode([source_data.time, source_data.symbol_id]::integer[2])
)
TO 'example.parquet' (FORMAT PARQUET)

-- or if dealing with latitude and longitude

COPY (
  SELECT * FROM 'source.csv'
  order by
  hilbert_encode([source_data.lat, source_data.lon]::double[2])
) TO 'example.parquet' (FORMAT PARQUET)

The Parquet file format stores statistics for each row group. Since rows are sorted with locality into these row groups the query execution may be able to skip row groups that contain no relevant rows, leading to faster query execution times.

Encoding Types

This extension offers two different encoding types, Hilbert and Morton encoding.

Hilbert Encoding

Hilbert encoding uses the Hilbert curve, a continuous fractal space-filling curve named after David Hilbert. It rearranges coordinates based on the Hilbert curve’s path, preserving spatial locality better than Morton encoding.

This video is good explaination of the [Hilbert curve]:

Morton Encoding (Z-order Curve)

Morton encoding, also known as the Z-order curve, interleaves the binary representations of coordinates into a single integer. It is named after Glenn K. Morton.

Locality: Hilbert encoding generally preserves locality better than Morton encoding, making it preferable for applications where spatial proximity matters.

Encoded Output is limited to a 128-bit UHUGEINT. The input array size is validated to ensure it fits within this limit.

Input Type Maximum Number of Elements Output Type (depends on number of elements)
UTINYINT 16 1: UTINYINT
2: USMALLINT
3-4: UINTEGER
4-8: UBIGINT
8-16: UHUGEINT
USMALLINT 8 1: USMALLINT
2: UINTEGER
3-4: UBIGINT
4-8: UHUGEINT
UINTEGER 4 1: UINTEGER
2: UBIGINT
3-4: UHUGEINT
UBIGINT 2 1: UBIGINT
2: UHUGEINT
FLOAT 4 1: UINTEGER
2: UBIGINT
3-4: UHUGEINT
DOUBLE 2 1: UBIGINT
2: UHUGEINT

Love ❀️ this DuckDB extension? You’ll Love This.

Get the best from Query.Farm β€” smart tips, powerful tools, and project updates sent directly to your inbox, but only when we’ve got something great to share.

API

The extension adds these functions:

  • hilbert_encode
  • hilbert_decode
  • morton_encode
  • morton_decode

The encode functions have the signature of

hilbert_encode(ANY[]) -> ANY
morton_encode(ANY[]) -> ANY

The input argument should be an array of any of the numeric types above, while respecting the maximum number of elements.

The decode functions reverse that encoding process, their arguments are:

hilbert_decode(UTINYINT | USMALLINT | UBIGINT | UHUGEINT, UTINYINT, BOOLEAN, BOOLEAN) -> ANY
morton_decode(UTINYINT | USMALLINT | UBIGINT | UHUGEINT, UTINYINT, BOOLEAN, BOOLEAN) -> ANY

The arguments of the decode function are:

  • The input value (one of UTINYINT, USMALLINT, UBIGINT or UHUGEINT)
  • The number of elements to decode
  • A boolean flag indicating the value should be returned as a float
  • A boolean flag indicating the value should be returned unsigned

Examples

SELECT hilbert_encode([1, 2, 3]::tinyint[3]) as encoded;
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ encoded β”‚
β”‚ uint32  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚   22    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

SELECT hilbert_decode(22::uinteger, 3, false, false) as decoded;
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  decoded   β”‚
β”‚ tinyint[3] β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ [1, 2, 3]  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

For floats:

SELECT hilbert_encode([1.5, 2.5, 3.5]::float[3]) as encoded;
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚           encoded            β”‚
β”‚           uint128            β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 8002398169070261008685072384 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

SELECT hilbert_decode(hilbert_encode([1.5, 2.5, 3.5]::float[3]), 3, true, false) as decoded;
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚     encoded     β”‚
β”‚    float[3]     β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ [1.5, 2.5, 3.5] β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

SELECT morton_encode([1.5, 2.5, 3.5]::float[3]) as encoded;
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚           encoded            β”‚
β”‚           uint128            β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 4421214485312321218706145280 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

 SELECT morton_decode(morton_encode([1.5, 2.5, 3.5]::float[3]), 3, true, false) as decoded;
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚     decoded     β”‚
β”‚    float[3]     β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ [1.5, 2.5, 3.5] β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

An additional showing the difference between Hilbert and Morton encoding:

with elements as (
  select * as id from range(3)
)
select
  a.id as a,
  b.id as b,
  hilbert_encode([a.id, b.id]::tinyint[2]) as hilbert,
  morton_encode([a.id, b.id]::tinyint[2]) as morton,
  hilbert_decode(hilbert_encode([a.id, b.id]::tinyint[2]), 2, false, false) as hilbert_decoded,
  morton_decode(morton_encode([a.id, b.id]::tinyint[2]), 2, false, false) as morton_decoded,
  from
elements as a cross join elements as b order by a, b;
β”Œβ”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   a   β”‚   b   β”‚ hilbert β”‚ morton β”‚ hilbert_decoded β”‚ morton_decoded β”‚
β”‚ int64 β”‚ int64 β”‚ uint16  β”‚ uint16 β”‚   tinyint[2]    β”‚   tinyint[2]   β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚     0 β”‚     0 β”‚       0 β”‚      0 β”‚ [0, 0]          β”‚ [0, 0]         β”‚
β”‚     0 β”‚     1 β”‚       3 β”‚      1 β”‚ [0, 1]          β”‚ [0, 1]         β”‚
β”‚     0 β”‚     2 β”‚       4 β”‚      4 β”‚ [0, 2]          β”‚ [0, 2]         β”‚
β”‚     1 β”‚     0 β”‚       1 β”‚      2 β”‚ [1, 0]          β”‚ [1, 0]         β”‚
β”‚     1 β”‚     1 β”‚       2 β”‚      3 β”‚ [1, 1]          β”‚ [1, 1]         β”‚
β”‚     1 β”‚     2 β”‚       7 β”‚      6 β”‚ [1, 2]          β”‚ [1, 2]         β”‚
β”‚     2 β”‚     0 β”‚      14 β”‚      8 β”‚ [2, 0]          β”‚ [2, 0]         β”‚
β”‚     2 β”‚     1 β”‚      13 β”‚      9 β”‚ [2, 1]          β”‚ [2, 1]         β”‚
β”‚     2 β”‚     2 β”‚       8 β”‚     12 β”‚ [2, 2]          β”‚ [2, 2]         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜