No Pre-conversion Required! Load GIS Data into DuckDB and Display Vector Tiles

2025-11-14

No Pre-conversion Required! Load GIS Data into DuckDB and Display Vector Tiles

I'm Hiroki Inoue, CTO. We're developing a new web-based BI tool that can work with various kinds of data, including geospatial data. As part of that effort, we're running a proof of concept (PoC) for GIS data visualization.

In web GIS, you can render data as-is in formats like GeoJSON, but performance degrades when the dataset gets large. That's why it's common to use the lighter and more efficient Vector Tile format. Among those, Mapbox Vector Tile (MVT) is widely used.

However, to use MVT you typically need to convert from sources like GeoJSON ahead of time, generate dedicated tile files, and serve them from some server. For a BI tool that should be able to ingest and visualize diverse datasets, mandatory pre-conversion becomes a real burden.

In this PoC, we explored an approach that solves this: generating vector tiles on demand directly from GIS data stored in DuckDB, and displaying them with MapLibre GL JS. Without pre-conversion, we use DuckDB query results directly as tiles to build a simple and flexible map rendering pipeline.

Try the implementation and demo here:

Pre-converting to MVT is a hassle for tile rendering

One of the simplest ways to visualize geospatial data on the web is to load it as GeoJSON and draw it directly on a mapping library. But this approach has clear drawbacks. For example, for nationwide administrative boundary polygons or hundreds of thousands of point features, large data volumes can make client-side rendering heavy, causing map display and interactions to slow down significantly.

To address this, the "tile" concept is commonly used. Tiles split the map into small squares by zoom level and position, and load only what is needed incrementally. There are raster tiles (images) and vector tiles (geometry information). Recently, vector tiles have become more popular because they enable flexible rendering and dynamic style changes.

The most widely adopted vector tile data format is MVT. It's a PBF (Protocol Buffers) binary format defined by Mapbox for efficient delivery and rendering, and it’s supported by many mapping libraries, especially Mapbox and MapLibre GL JS. MVT compresses geometry efficiently and is typically faster than rendering large GeoJSON directly.

On the flip side, using MVT tends to raise implementation and operations costs because you need pre-conversion and an environment to serve tiles. When you optimize for rendering performance by choosing MVT, you also face a trade-off where data mutability and immediacy can suffer.

In this PoC, we tried to bridge that gap with on-demand vector tile generation using DuckDB.

What is DuckDB?

DuckDB is a lightweight, high-performance, columnar analytical database engine often described as "SQLite for Analytics." Despite being a single-binary C++ engine, it delivers very fast query performance even on large datasets. Compared with systems like PostgreSQL, Snowflake, or BigQuery (often used for analytics), DuckDB can run locally with no external dependencies, minimal setup, and low resource usage.

One reason DuckDB is drawing attention is its ability to handle a wide range of file formats and external data sources transparently. You can query, aggregate, and join CSV, Parquet, and JSON directly via SQL. It can also work with PostGIS-like formats, GeoJSON, SQLite, and data in cloud storage.

This flexibility is attractive because it reduces the need to build dedicated ETL pipelines for analysis.

Also, with DuckDB-WASM, you can even run DuckDB directly in the browser. That enables web applications to use DuckDB like a backend within the browser, reading, transforming, and visualizing data completely client-side.

In this PoC, we leverage DuckDB’s "runs anywhere" lightweight nature and its flexibility with many formats. We store geospatial data in DuckDB running in the browser via DuckDB-WASM, and validate an approach that serves vector tiles without an intermediate conversion to MVT.

Storing geospatial data in DuckDB

We consider using DuckDB-WASM in-memory in the browser. First initialize DuckDB and load geospatial data. You can read various files with SQL like below. Creating a spatial index (R-Tree) after loading speeds up tile fetching later.

CREATE TABLE example AS SELECT * FROM st_read("
https://example.com/hoge.geojson
");
CREATE INDEX example_idx ON example USING RTREE (geom);

Here’s a TypeScript snippet that initializes DuckDB and reads GeoJSON over HTTP:

import * as duckdb from "@duckdb/duckdb-wasm";
import eh_worker from "@duckdb/duckdb-wasm/dist/duckdb-browser-eh.worker.js?url";
import mvp_worker from "@duckdb/duckdb-wasm/dist/duckdb-browser-mvp.worker.js?url";
import duckdb_wasm_eh from "@duckdb/duckdb-wasm/dist/duckdb-eh.wasm?url";
import duckdb_wasm from "@duckdb/duckdb-wasm/dist/duckdb-mvp.wasm?url";

const MANUAL_BUNDLES: duckdb.DuckDBBundles = {
  mvp: { mainModule: duckdb_wasm, mainWorker: mvp_worker },
  eh: { mainModule: duckdb_wasm_eh, mainWorker: eh_worker },
};

const bundle = await duckdb.selectBundle(MANUAL_BUNDLES);
const worker = new Worker(bundle.mainWorker!);
const logger = new duckdb.ConsoleLogger();
const db = new duckdb.AsyncDuckDB(logger, worker);
await db.instantiate(bundle.mainModule, bundle.pthreadWorker);

const conn = await db.connect();
await conn.execute("LOAD spatial;");

await conn.execute(`CREATE TABLE example AS SELECT * FROM st_read("
https://example.com/hoge.geojson);");`
);
await conn.execute(`CREATE INDEX example_idx ON example USING RTREE (geom);`);

A quick note on input file formats. DuckDB supports many, including GeoJSON, but some formats can be slow to read when the data is large. For example, loading an ~800 MB GeoJSON with over 100k features in DuckDB-WASM can take 30 seconds or more depending on the environment.

To speed up the initial load, beyond geometry optimization, it’s effective to convert GeoJSON to Parquet in advance. You can also do the conversion with DuckDB:

CREATE TABLE example AS SELECT * FROM st_read("
https://example.com/hoge.geojson
);");
COPY example TO "hoge.parquet";

Fetching tiles as GeoJSON from DuckDB

Next, fetch records within an arbitrary Z/X/Y tile range from a DuckDB table. Here’s SQL that takes Z, X, and Y as parameters, clips geometry to the tile, and returns GeoJSON:

SELECT
  ST_AsGeoJSON(
    ST_Intersection(
      ST_Transform(geom, 'EPSG:4326', 'EPSG:3857', true),
      ST_TileEnvelope(${z}, ${x}, ${y})
    )
  ) AS geojson,
  name,       -- example property
  description -- example property
FROM example
WHERE ST_Intersects(
  ST_Transform(geom, 'EPSG:4326', 'EPSG:3857', true),
  ST_TileEnvelope(${z}, ${x}, ${y})
);

Key spatial extension functions used here:

  • ST_TileEnvelope: Creates a tile bounding box (Web Mercator) from Z/X/Y
  • ST_Transform: Converts from WGS84 (EPSG:4326) to Web Mercator (EPSG:3857) with always_xy=true
  • ST_Intersects: Intersection test
  • ST_Intersection: Gets geometry within the tile (removes overflow outside the tile)
  • ST_AsGeoJSON: Converts geometry to a GeoJSON string

TypeScript example that returns a FeatureCollection:

async function queryTile(
  conn: any,
  table: string,
  z: number,
  x: number,
  y: number,
  columns: string[] = []
) {
  const props = columns.length
    ? 
columns.map
((c) => `"${c.replace(/\"/g, '""')}"`).join(', ')
    : '*';

  const sql = `SELECT
    ST_AsGeoJSON(
      ST_Intersection(
        ST_Transform(geom, 'EPSG:4326', 'EPSG:3857', true),
        ST_TileEnvelope(${z}, ${x}, ${y})
      )
    ) AS geojson,
    ${props}
  FROM ${table}
  WHERE ST_Intersects(
    ST_Transform(geom, 'EPSG:4326', 'EPSG:3857', true),
    ST_TileEnvelope(${z}, ${x}, ${y})
  );`;

  const result = await conn.query(sql);
  const rows = result.toArray();

  const features = rows
    .map((row: any) => {
      if (!row.geojson) return null;
      let geometry;
      try {
        geometry = JSON.parse(row.geojson as string);
      } catch (e) {
        console.error('Error parsing GeoJSON:', e);
        return null;
      }
      const properties = Object.fromEntries(
        Object.entries(row).filter(([k]) => k !== 'geom' && k !== 'geojson')
      );
      return { type: 'Feature' as const, geometry, properties };
    })
    .filter(Boolean);

  return { type: 'FeatureCollection' as const, features };
}

This is simple, but MapLibre GL JS custom protocols only accept MVT for vector sources, so you cannot feed raw GeoJSON directly. That means you’d still need a GeoJSON-to-MVT conversion step, which can become another performance bottleneck.

Fetching tiles as MVT from DuckDB

Good news: starting with DuckDB-WASM v1.30.1, STAsMVT and STAsMVTGeom are available!

Using these, DuckDB can generate MVT (PBF) directly. With the spatial extension enabled, you can complete the whole flow in SQL using PostGIS-compatible functions like ST_AsMVT, ST_AsMVTGeom, and ST_TileEnvelope, from tile-boundary clipping to PBF encoding.

Here’s an example SQL:

WITH tile_data AS (
  SELECT {
    'geometry': ST_AsMVTGeom(
      ST_Transform(
        ST_SimplifyPreserveTopology("geom", ${simplify}),
        'EPSG:4326', 'EPSG:3857', true
      ),
      ST_Extent(ST_TileEnvelope(${z}, ${x}, ${y})),
      4096,    -- extent (MVT standard)
      0,       -- buffer (consider 64 etc. as needed)
      false
    ),
    -- Cast properties to MVT-supported types
    'name': TRY_CAST("name" AS VARCHAR),
    'value': TRY_CAST("value" AS INTEGER)
  } AS feature
  FROM ${table}
  WHERE "geom" IS NOT NULL
    AND ST_Intersects(
      ST_Transform("geom", 'EPSG:4326', 'EPSG:3857', true),
      ST_TileEnvelope(${z}, ${x}, ${y})
    )
  LIMIT 50000
)
SELECT ST_AsMVT(
  feature,
  'default', -- layer name
  4096,
  'geometry'
) AS mvt
FROM tile_data
WHERE feature.geometry IS NOT NULL AND NOT ST_IsEmpty(feature.geometry);

Implementation notes:

  • To avoid coordinate-order pitfalls, set the 4th arg of ST_Transform (always_xy) to true, so it's always lon,lat
  • extent is the virtual coordinate resolution inside a tile. 4096 is effectively standard for MVT
  • buffer is extra margin to prevent seams across tile borders in the same coordinate system as extent. For example, with extent=4096, buffer=64
  • MapLibre GL JS's tileSize (default 512 px) is a screen rendering size and separate from the MVT extent/buffer concept
  • MVT property types are limited (string, integer, float, boolean). Use TRY_CAST to safely cast DuckDB’s richer types
  • If there are many features, use LIMIT and zoom-dependent simplification to cap tile size, for example ST_SimplifyPreserveTopology
  • Exclude empty geometries with ST_IsEmpty

From DuckDB-WASM you’ll get the query result as a Uint8Array (PBF). You can return it as-is from a MapLibre GL JS custom protocol handler. One caveat: to avoid ArrayBuffer detachment due to WASM memory relocation, take a safe copy before returning.

const result = await conn.query(/* SQL above */);
const raw = result.get(0)?.mvt as Uint8Array | undefined;
if (!raw || raw.length === 0) return new Uint8Array();
// Take a safe copy to avoid detachment
return new Uint8Array(raw.buffer.slice(raw.byteOffset, raw.byteOffset + raw.byteLength));

Displaying tiles in MapLibre GL JS

MapLibre GL JS supports "protocols," where you register a handler that returns raw bytes for your own URL scheme. For vector sources, return MVT PBF bytes. For raster sources, return image bytes.

This lets you treat non-HTTP paths and on-demand generation (like in-browser DuckDB-WASM) just like a normal tiles config.

function parseDuckDBTileUrl(url: string): { tableSpec: string; zxy: { z: number; x: number; y: number } } | null {
  const m = url.match(/^duckdb:\/\/([^\/]+)\/(\d+)\/(\d+)\/(\d+)\.pbf$/);
  if (!m) return null;
  const [, tableSpec, z, x, y] = m;
  return { tableSpec, zxy: { z: +z, x: +x, y: +y } };
}

async function generateVectorTile(
  conn: AsyncDuckDBConnection,
  tableName: string,
  zxy: { z: number; x: number; y: number },
  geomCol = 'geom',
  selectedProps: string[] = [],
  columnTypes?: Record<string, string | null>
): Promise<Uint8Array> {
  const simplify = calculateSimplifyTolerance(zxy.z);

	const safeTableName = tableName.replace(/"/g, '""');
  const propStruct = selectedProps
    .map((col) => {
      const type = columnTypes?.[col]?.toUpperCase() ?? '';
      const key = col.replace(/'/g, "''");
      const isComplex = /(STRUCT|LIST|\[\]|MAP|JSON|UNION)/.test(type);
      const intTarget = /^(TINYINT|SMALLINT|UTINYINT|USMALLINT)$/.test(type)
        ? 'INTEGER'
        : /^(HUGEINT|UHUGEINT|UINTEGER|UBIGINT)$/.test(type)
        ? 'BIGINT'
        : null;
      const expr = isComplex
        ? `TRY_CAST("${col}" AS VARCHAR)`
        : intTarget
        ? `TRY_CAST("${col}" AS ${intTarget})`
        : `"${col}"`;
      return `'${key}': ${expr}`;
    })
    .join(', ');

  const sql = `WITH tile_data AS (
    SELECT {
      'geometry': ST_AsMVTGeom(
        ST_Transform(ST_SimplifyPreserveTopology("${geomCol}", ${simplify}), 'EPSG:4326', 'EPSG:3857', true),
        ST_Extent(ST_TileEnvelope(${zxy.z}, ${zxy.x}, ${zxy.y})),
        4096,
        0,
        false
      )${propStruct ? `, ${propStruct}` : ''}
    } AS feature
    FROM "${safeTableName}"
    WHERE "${geomCol}" IS NOT NULL
      AND ST_Intersects(
        ST_Transform("${geomCol}", 'EPSG:4326', 'EPSG:3857', true),
        ST_TileEnvelope(${zxy.z}, ${zxy.x}, ${zxy.y})
      )
    LIMIT 50000
  )
  SELECT ST_AsMVT(feature, 'default', 4096, 'geometry') AS mvt
  FROM tile_data
  WHERE feature.geometry IS NOT NULL AND NOT ST_IsEmpty(feature.geometry);`;

  const result = await conn.query(sql);
  const tile = result.get(0)?.mvt as Uint8Array | undefined;
  if (!tile || tile.length === 0) return new Uint8Array();
  return new Uint8Array(tile.buffer.slice(tile.byteOffset, tile.byteOffset + tile.byteLength));
}

maplibregl.addProtocol('duckdb', async (params) => {
  const parsed = parseDuckDBTileUrl(params.url);
  if (!parsed) return { data: new Uint8Array() };
  const { tableSpec, zxy } = parsed;
  const mvt = await generateVectorTile(conn, tableSpec, zxy, 'geom', ['name', 'value']);
  return { data: mvt };
});

function calculateSimplifyTolerance(zoom: number): number {
  if (zoom >= 15) return 0;
  const maxSimplify = 0.001;
  const t = Math.max(0, Math.min(1, zoom / 15));
  return +(maxSimplify * (1 - t)).toFixed(6);
}

Results

With the above implemented and tested on prefecture polygon data from Japan's National Land Information dataset, we got results like this:

Try the implementation and demo here:

Summary

We showed a method that uses DuckDB-WASM in the browser, converts SQL results to MVT on the spot, and renders them via MapLibre GL JS protocol handlers. Without depending on pre-conversion or dedicated tile-serving infrastructure, you can visualize data on demand with both speed and flexibility.

By leveraging DuckDB’s spatial extension and STAsMVT / STAsMVTGeom directly, we eliminated the intermediate GeoJSON-to-MVT conversion and supplied MVT on demand via the

duckdb:// scheme with addProtocol. This enables:
  • Query changes to reflect directly in tiles for faster iterative exploratory analysis
  • Minimal overhead for tile generation/serving pipelines and storage operations
  • Arbitrary schemas, derived columns, spatial joins, and aggregations written directly in SQL

Open issues and next steps:

  • Because this depends on browser memory, very large datasets should be preprocessed first, e.g., Parquet conversion, column reduction, pre-aggregation. Complex geometries and dense clusters of tiny polygons can slow rendering dramatically and may need optimization
  • Custom protocols require attention to request cancellation and concurrency controls
  • For further performance, implementing an MVT cache at the protocol layer can help

References

Appendix: Converting GeoJSON to vector tiles

If you do want to convert GeoJSON fetched from DuckDB into MVT/PBF that MapLibre GL JS understands, geojson-vt and vt-pbf are handy.

import geojsonvt from 'geojson-vt';
import vtpbf from 'vt-pbf';

export function geojsonToVectorTile(
  features: Feature[],
  z: number,
  x: number,
  y: number
): Uint8Array {
  const tileIndex = geojsonvt({ type: 'FeatureCollection', features }, {
    generateId: true,
    indexMaxZoom: z,
    maxZoom: z,
    buffer: 0,
    tolerance: 0,
    extent: 4096
  });

  const tile = tileIndex.getTile(z, x, y);
  if (!tile) return new Uint8Array();
  return vtpbf.fromGeojsonVt({ default: tile });
}
English

Eukaryaでは様々な職種で採用を行っています!OSSにコントリビュートしていただける皆様からの応募をお待ちしております!

Eukarya 採用ページ

Eukarya is hiring for various positions! We are looking forward to your application from everyone who can contribute to OSS!

Eukarya Careers

Eukaryaは、Re:Earthと呼ばれるWebGISのSaaSの開発運営・研究開発を行っています。Web上で3Dを含むGIS(地図アプリの公開、データ管理、データ変換等)に関するあらゆる業務を完結できることを目指しています。ソースコードはほとんどOSSとしてGitHubで公開されています。

Re:Earth / ➔ Eukarya / ➔ note / ➔ GitHub

Eukarya is developing and operating a WebGIS SaaS called Re:Earth. We aim to complete all GIS-related tasks including 3D (such as publishing map applications, data management, and data conversion) on the web. Most of the source code is published on GitHub as OSS.

Re:Earth / ➔ Eukarya / ➔ Medium / ➔ GitHub