JSON Schema
JSON Schema validation and default-value enrichment in SQL. Validate that JSON columns conform to a schema, lint the schema itself, generate JSON-Patch operations that fill in missing defaults, and apply those defaults in place. Targets JSON Schema draft-07.
Install
-- Install the extension
INSTALL json_schema FROM community;
-- Load it into your session
LOAD json_schema;
-- Validate a JSON document against a schema
SELECT json_schema_validate(:schema, payload) AS ok
FROM events;
-- Verify the schema itself is well-formed
SELECT json_schema_validate_schema(:schema) AS schema_ok;
-- Compute a JSON-Patch that fills in missing defaults
SELECT json_schema_patch(:schema, payload) AS patch FROM events;
-- Or apply the defaults inline
SELECT json_schema_update(:schema, payload) AS enriched FROM events; Technical Overview
Why Use JSON Schema from DuckDB?
Validate JSON columns against a JSON Schema contract directly in SQL β no Pydantic / ajv / jsonschema-Python in a sidecar. Best fit: ingest-time data-quality gates, ETL contract checks, and one-query enrichment that fills in declared default values. Built on the pboettch/json-schema-validator C++ library; the validator's primary target is JSON Schema draft-07.
π What this extension is for
JSON Schema is the de facto contract language for JSON payloads. This extension makes those contracts executable from inside DuckDB so you don't need a separate Python or Node.js step to enforce them.
- β’ Ingest-time data quality: Filter rows that don't satisfy the schema before they reach a clean table β
json_schema_validatereturnsBOOLEAN, so it drops straight into aWHEREclause. - β’ Lint your contracts: Run
json_schema_validate_schemaagainst the schema itself before deploying. Catches typos in the schema before they cause every row to fail validation. - β’ Default-value enrichment: Use
json_schema_patchto compute the JSON Patch (RFC 6902) operations that bring an incomplete document into conformance with the schema's declareddefaultvalues, orjson_schema_updateto apply them inline. - β’ Audits and dashboards: Aggregate
valid/invalidcounts over a column of JSON in one query β much simpler than pulling rows out to a Python validator and back.
βοΈ How it works
Internally, this is a thin DuckDB scalar-function wrapper around pboettch/json-schema-validator and nlohmann/json. Both schema and data are passed as DuckDB JSON values. There's no ATTACH, no secret type, no network β pure local validation against an in-query schema string.
- β’ Four scalar functions:
json_schema_validateandjson_schema_validate_schemareturnBOOLEAN;json_schema_patchandjson_schema_updatereturnJSON. All four take their schema and data inline β no preregistration step. - β’ Draft compatibility: Primary target is JSON Schema draft-07. The
$schemakeyword in your document is honored where the underlying validator supports it; for newer drafts (2019-09 / 2020-12), test the keywords you actually use rather than assuming full coverage. - β’ Schema features in scope: Type validation (
string,number,integer,boolean,array,object,null),required,properties, numericminimum/maximum/multipleOf, stringminLength/maxLength/pattern/format, arrayminItems/maxItems/uniqueItems, and thedefaultkeyword used byjson_schema_patch/json_schema_update. - β’ Pairs with DuckDB JSON: Combine with DuckDB's JSON support β
read_json_auto,JSONtype,json_extract_*β so the validator slots into the existing JSON pipeline rather than replacing it.
π‘οΈ What to know before relying on it
Honest scoping for production use. The extension is small and focused; it doesn't try to be every JSON-Schema validator out there.
- β’ Boolean validation, not error reports:
json_schema_validatereturnsTRUE/FALSE. It does not return a structured list of which keywords or paths failed β for that level of diagnostic detail, validate at the application layer with ajv / Pydantic / jsonschema and use this extension as a coarse SQL gate. - β’ No
$refresolution across files: Schemas are passed as a singleJSONvalue per call β there is no filesystem or HTTP$refresolver. Inline any$reftargets into the same schema document before validating. - β’ Not a replacement for typed columns: If your data has a fixed shape, modeling it with proper DuckDB columns plus
CHECKconstraints is cheaper and clearer. JSON Schema validation is for the genuinely-JSON case where the shape is contract-defined but the storage stays opaque. - β’ Performance scales with schema complexity: Deeply-nested schemas with many keywords run the full validator per row. For very large tables, validate on ingest into a staging table rather than re-running the check on every read.
π― Common Use Cases
Quality gate on ingest
INSERT INTO events_clean SELECT * FROM events_raw WHERE json_schema_validate(:schema, payload) β non-conforming rows can be tee'd into a quarantine table for manual review.
Audit a JSON column
Aggregate valid and invalid counts with json_schema_validate inside COUNT(*) FILTER (...). Useful for retroactively checking historical data against a freshly-tightened contract.
Fill in missing defaults
Declare default in the schema, then use json_schema_update to produce the enriched JSON in one expression β or json_schema_patch when you need the diff for an audit log.
Lint schemas before deploying
json_schema_validate_schema over a table of candidate schemas in CI. Catches typos before they reject every production row.
Deep Dive
Technical Details
What you can do with one query
Validate every row of a JSON column against a schema in a single SELECT:
SELECT
COUNT(*) AS total,
COUNT(*) FILTER (WHERE json_schema_validate(:schema, payload)) AS valid,
COUNT(*) FILTER (WHERE NOT json_schema_validate(:schema, payload)) AS invalid
FROM events;
json_schema_validate returns a plain BOOLEAN, so it composes with WHERE, CASE, and aggregate FILTER clauses the way every other DuckDB predicate does β no UDF setup, no detour through Python, no extra round-trip.
This extension is built on the pboettch/json-schema-validator C++ library. Its primary target is JSON Schema draft-07; newer drafts (2019-09 / 2020-12) work for the keywords the underlying validator supports β test the keywords you actually use.
The validation result is a BOOLEAN. There is no structured per-keyword failure report today. There is also no $ref resolver for external files or HTTP β inline any $ref targets into the schema document you pass in.
For application-grade validation with detailed error reports, keep ajv / Pydantic / jsonschema at the API layer. This extension is the right tool when you want the gate to live in SQL β at ingest, in CI, or as part of an audit.
Architecture
Internally this is a thin DuckDB scalar-function wrapper around two C++ libraries: pboettch/json-schema-validator for the validation engine and nlohmann/json for parsing. Both schema and data are passed as DuckDB JSON values (or struct literals, or strings cast with ::JSON). There is no ATTACH, no secret type, no network call β the validator runs locally against the in-query schema.
Four scalar functions cover the surface:
| Group | Functions | Returns |
|---|---|---|
| Validate | json_schema_validate, json_schema_validate_schema | BOOLEAN |
| Enrich | json_schema_patch, json_schema_update | JSON |
Schema features in scope match what draft-07 covers: type validation (string, number, integer, boolean, array, object, null), required, properties, numeric minimum / maximum / multipleOf, string minLength / maxLength / pattern / format, array minItems / maxItems / uniqueItems, and the default keyword used by json_schema_patch and json_schema_update.
Pairs with DuckDBβs JSON support
Schema-side validation is one piece; the rest of the JSON pipeline is plain DuckDB. Read JSON files with read_json_auto, pull out fields with json_extract_*, or cast strings with ::JSON β and feed any of those into json_schema_validate. See DuckDB JSON overview for the surrounding toolkit.
Compared to alternatives
- DuckDB
CHECKconstraints β if your data has a fixed shape, modeling it as proper columns plusCHECKis faster, clearer, and gives you better error messages. Reach for JSON Schema when the payload is genuinely contract-defined JSON and the storage stays opaque. - Application-layer validators (ajv / Pydantic / jsonschema) β these give detailed per-keyword failure reports and richer draft coverage. Pair them at the API boundary; use this extension as the SQL-side gate inside DuckDB so the database doesnβt depend on a Python service to enforce its contracts.
- Manual SQL predicates β handcrafted
WHEREs overjson_extract_*get unwieldy fast for non-trivial schemas. Pointingjson_schema_validateat a versioned schema file keeps the contract in one place.
Default-value enrichment
The non-obvious capability: turn the schemaβs declared default values into actual data. json_schema_patch returns a JSON Patch (RFC 6902) array of add operations; json_schema_update applies them inline. Together they let you go from βincoming partial payloadβ to βfully populated recordβ in one SQL expression β no per-field COALESCE cascade, no application-layer enrichment code.
Reference
Extension Contents
Quick reference to all available functions and settings organized by category.
| Name | Description | |
|---|---|---|
| Enrich Default-value handling. [ | ||
| json_schema_patch() | Compute the JSON Patch (RFC 6902) diff that would bring json_data into conformance with the default values declared by the schema | |
| json_schema_update() | Apply the schema's default values inline | |
| Validate Conformance checks. [ | ||
| json_schema_validate() | Validate a JSON value against a JSON Schema | |
| json_schema_validate_schema() | Lint a JSON Schema before applying it | |
API Reference
Function Documentation
Detailed documentation for each function including signatures, parameters, and examples.
json_schema_patch
Signature
Parameters (Positional)
| Parameter | Type | Mode | Description |
|---|---|---|---|
schema | JSON | Positional | JSON Schema document. Properties whose default keyword is set are the ones that contribute to the patch. |
json_data | JSON | Positional | The (possibly incomplete) JSON value. Missing properties for which the schema declares a default will appear in the resulting patch. |
Returns
JSON β a JSON Patch (RFC 6902) array of add operations that, when applied, fill in the missing defaults.
Description
Compute the JSON Patch (RFC 6902) diff that would bring json_data into conformance with the default values declared by the schema. Returns the patch β does not apply it (use json_schema_update for the in-place variant).
Use the patch when you want to log or audit which fields were synthesized, or when you need to apply the same diff to multiple downstream representations of the document.
Examples
Generate a patch for a partial document
SELECT json_schema_patch('{
"$schema": "https://json-schema.org/draft-07/schema",
"type": "object",
"properties": {
"id": {"type": "integer", "default": 5},
"name": {"type": "string"}
}
}', {'name': 'George'}) AS patch;
-- [{"op":"add","path":"/id","value":5}] Materialize patches alongside the original payload for auditing
SELECT
payload,
json_schema_patch(:schema, payload) AS defaults_patch
FROM events_raw; Related Functions
json_schema_update
Signature
Parameters (Positional)
| Parameter | Type | Mode | Description |
|---|---|---|---|
schema | JSON | Positional | JSON Schema document. Properties with default declared are filled in where the input is missing them. |
json_data | JSON | Positional | The JSON value to enrich. Existing fields are preserved; only missing properties with declared default values are added. |
Returns
JSON β the input value with missing defaults filled in.
Description
Apply the schema's default values inline. Equivalent to running json_schema_patch and applying the resulting patch β but in a single function call you can use directly in a SELECT projection.
This is the function to reach for when you want enriched JSON, not a diff.
Examples
Fill in a missing default
SELECT json_schema_update('{
"$schema": "https://json-schema.org/draft-07/schema",
"type": "object",
"properties": {
"id": {"type": "integer", "default": 5},
"name": {"type": "string"}
}
}', {'name': 'George'}) AS updated;
-- {"id":5,"name":"George"} Build a clean table with defaults applied
CREATE TABLE events_enriched AS
SELECT json_schema_update(:schema, payload) AS payload
FROM events_raw
WHERE json_schema_validate(:schema, payload); Related Functions
json_schema_validate
Signature
Parameters (Positional)
| Parameter | Type | Mode | Description |
|---|---|---|---|
schema | JSON | Positional | The JSON Schema document to validate against. The validator's primary target is draft-07; newer drafts work for the keywords the underlying pboettch/json-schema-validator supports. |
json_data | JSON | Positional | The JSON value to validate. Pass a column of type JSON, a struct literal, or a string cast with ::JSON. |
Returns
BOOLEAN β TRUE if the value conforms, FALSE otherwise.
Description
Validate a JSON value against a JSON Schema. Returns a plain BOOLEAN, so it composes naturally with WHERE, CASE, and COUNT(*) FILTER (...).
This function does not return structured error detail β it answers conformance only. For a full per-keyword failure report, validate at the application layer with ajv / Pydantic / jsonschema and use this function as the coarse SQL gate.
Examples
Validate a single document
SELECT json_schema_validate('{
"$schema": "https://json-schema.org/draft-07/schema",
"type": "object",
"properties": {
"id": {"type": "integer"},
"name": {"type": "string"}
},
"required": ["id"]
}', {'id': 5, 'name': 'George'}) AS ok;
-- TRUE Filter a column on schema conformance
INSERT INTO events_clean
SELECT * FROM events_raw
WHERE json_schema_validate(:schema, payload); Aggregate valid / invalid counts
SELECT
COUNT(*) AS total,
COUNT(*) FILTER (WHERE json_schema_validate(:schema, payload)) AS valid,
COUNT(*) FILTER (WHERE NOT json_schema_validate(:schema, payload)) AS invalid
FROM events; Related Functions
json_schema_validate_schema
Signature
Parameters (Positional)
| Parameter | Type | Mode | Description |
|---|---|---|---|
schema | JSON | Positional | The candidate JSON Schema document. Will be checked against the validator's meta-schema. |
Returns
BOOLEAN β TRUE if the schema is well-formed, FALSE otherwise.
Description
Lint a JSON Schema before applying it. Catches typos and structural mistakes that would otherwise cause every json_schema_validate call to fail at runtime.
A reasonable habit: gate schema deployments through this function in CI, the same way you'd run a YAML or SQL linter.
Examples
Lint a schema before deploying
SELECT json_schema_validate_schema('{
"$schema": "https://json-schema.org/draft-07/schema",
"type": "object",
"properties": {
"id": {"type": "integer"}
}
}') AS schema_ok;
-- TRUE Find any malformed schemas in a registry table
SELECT name, version
FROM schema_registry
WHERE NOT json_schema_validate_schema(definition); Related Functions
Practical Examples
Cookbook
Real-world recipes and patterns for common use cases.
Validate one document
SELECT json_schema_validate('{
"$schema": "https://json-schema.org/draft-07/schema",
"type": "object",
"properties": {
"id": {"type": "integer"},
"name": {"type": "string"}
},
"required": ["id"]
}', {'id': 5, 'name': 'George'}) AS valid;
-- TRUE
The schema document follows the JSON Schema specification. See json_schema_validate.
Validate a column
Most real workloads point validation at a JSON column instead of a literal:
SELECT
COUNT(*) AS total,
COUNT(*) FILTER (WHERE json_schema_validate(:schema, payload)) AS valid,
COUNT(*) FILTER (WHERE NOT json_schema_validate(:schema, payload)) AS invalid
FROM events;
json_schema_validate returns BOOLEAN, so it slots into FILTER aggregates and WHERE clauses without a wrapper.
Lint the schema before deploying
Before pushing a schema to production, verify itβs well-formed:
SELECT json_schema_validate_schema('{
"$schema": "https://json-schema.org/draft-07/schema",
"type": "object",
"properties": { "id": { "type": "integer" } }
}') AS schema_ok;
-- TRUE
json_schema_validate_schema catches typos in the schema itself β useful as a CI step before a contract update lands. Run it across a registry table to find any bad entries:
SELECT name, version
FROM schema_registry
WHERE NOT json_schema_validate_schema(definition);
Quality-gate filter at ingest
Reject non-conforming payloads at the boundary; tee the rest into a quarantine table for review:
INSERT INTO events_clean
SELECT * FROM events_raw
WHERE json_schema_validate(:event_schema, payload);
INSERT INTO events_quarantine
SELECT * FROM events_raw
WHERE NOT json_schema_validate(:event_schema, payload);
Compute a defaults patch
Given a schema with default values declared, json_schema_patch returns the JSON Patch (RFC 6902) add operations needed to fill in the missing fields:
SELECT json_schema_patch('{
"$schema": "https://json-schema.org/draft-07/schema",
"type": "object",
"properties": {
"id": {"type": "integer", "default": 5},
"name": {"type": "string"}
}
}', {'name': 'George'}) AS patch;
-- [{"op":"add","path":"/id","value":5}]
Useful when you want to log or audit which fields were synthesized, or when the same diff has to apply to multiple downstream representations.
Apply defaults inline
When you want enriched JSON rather than the diff, reach for json_schema_update instead:
SELECT json_schema_update('{
"$schema": "https://json-schema.org/draft-07/schema",
"type": "object",
"properties": {
"id": {"type": "integer", "default": 5},
"name": {"type": "string"}
}
}', {'name': 'George'}) AS updated;
-- {"id":5,"name":"George"}
Build a clean enriched table in one step β validate, then enrich:
CREATE TABLE events_enriched AS
SELECT json_schema_update(:schema, payload) AS payload
FROM events_raw
WHERE json_schema_validate(:schema, payload);
Validate JSON files on ingest
Pair with DuckDBβs read_json_auto for file-based inputs:
SELECT *
FROM read_json_auto('events/*.json')
WHERE json_schema_validate(:schema, to_json(*));
For records arriving as already-typed JSON columns, pass the column directly without the to_json wrap.
Use a schema parameter
Keep the schema out of query text by binding it as a SQL variable β handy when the same query runs against different contracts:
SET VARIABLE event_schema = (SELECT definition FROM schema_registry WHERE name = 'events.v3');
SELECT COUNT(*) AS valid
FROM events
WHERE json_schema_validate(:event_schema, payload); Platform Support
Compatibility
Extension availability may vary by platform and DuckDB version. Check below to ensure this extension supports your environment before installation.
Platforms
Supported platform architectures
Compiled binary sizes
| Platform | Architecture | Size |
|---|---|---|
| Linux | aarch64 | 2.85 MB |
| Linux | x86_64 | 3.23 MB |
| Linux (musl) | x86_64 | 2.33 MB |
| macOS | Apple Silicon | 1.96 MB |
| macOS | Intel | 2.28 MB |
| Windows | x86_64 | 7.62 MB |
| WASM | eh | 35.7 KB |
| WASM | mvp | 32.1 KB |
| WASM | threads | 32.0 KB |
Gzipped download size from the DuckDB community-extensions registry.
Validate JSON in SQL
Install JSON Schema to enforce data contracts on JSON columns at query time β useful for API ingest checks, ETL data-quality gates, and config validation.