Metabase + DuckDB: Local-First Analytics Setup Guide [2026]

TL;DR: Pair Metabase (open-source BI) with DuckDB (in-process analytical database) and you get a serverless, file-based analytics stack that queries Parquet and CSV at warehouse speed without provisioning anything. Install the DuckDB community driver, point it at a .duckdb file or Parquet glob, and you have charts and dashboards in minutes. It's the right choice for solo data engineers, small teams and embedded analytics — and the wrong choice for high-concurrency production BI.

If you've ever wanted a real BI tool on top of Parquet files sitting on your laptop or in S3, the Metabase + DuckDB combo is the shortest path there. No warehouse, no ETL, no separate query engine: Metabase points at a single DuckDB file (or a directory of Parquet) and you get the same dashboarding experience you'd get on Snowflake or BigQuery — at zero cost and zero infrastructure.

This guide walks through why the combo works, how to wire it up end-to-end, the gotchas that bite people the first time, and when you should grow out of it.

Why Metabase + DuckDB Is a Powerful Combo

Both tools have built reputations independently — but they happen to fit each other unusually well.

What Metabase Brings

Metabase is the most popular open-source BI tool. It gives you:

A point-and-click query builder for non-technical users (no SQL required)
A SQL editor for analysts
Dashboards with drill-down, filters, and scheduled refresh
Embedding via signed URLs or full white-label
Native support for 20+ databases out of the box

The community-maintained DuckDB driver adds DuckDB to that list.

What DuckDB Brings

DuckDB is an embedded analytical database — think "SQLite for analytics." It runs in-process, has no server to manage, and reads Parquet/CSV/JSON natively without ingestion. Core strengths:

Vectorized columnar engine — comparable to ClickHouse/Spark on a single node
Reads Parquet and CSV in place via read_parquet() and read_csv(), including remote files such as read_parquet('s3://bucket/*.parquet')
ACID transactions on the local .duckdb file
Zero external dependencies, zero server

Why They Fit Together

Most BI tools assume a database server you connect to over the network. DuckDB doesn't have one — it's a library. The Metabase DuckDB driver bridges this by opening the DuckDB file as a connection inside the Metabase JVM. From Metabase's perspective DuckDB looks like any other database; from DuckDB's perspective Metabase is just another consumer of the file.

The result: warehouse-grade analytics on a single Parquet folder, with a dashboard layer on top, all running on a $5 VM (or your laptop).

Setting Up Metabase with DuckDB

There are two paths: Docker (recommended for production) and the Metabase JAR (handy for local exploration).

Option A: Docker Compose

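Create a docker-compose.yml that mounts your DuckDB file and ships Metabase with the DuckDB driver plugin. This file keeps Metabase's own application database on the default embedded H2 store, which is fine for a trial; Metabase recommends Postgres or MySQL for that database in production: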

services:
  metabase:
    image: metabase/metabase:latest
    ports:
      - "3000:3000"
    volumes:
      - ./data:/data:ro
      - ./plugins:/plugins
    environment:
      MB_DB_TYPE: h2
      MB_PLUGINS_DIR: /plugins

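The plugins/ directory holds the DuckDB driver JAR. Download the latest build from the community driver's releases page and drop it into plugins/ before starting. Check the driver's README first for the Metabase versions and base image it supports, since a community plugin can lag behind metabase:latest: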

mkdir -p plugins data
curl -L -o plugins/duckdb.metabase-driver.jar \
  https://github.com/AlexR2D2/metabase_duckdb_driver/releases/latest/download/duckdb.metabase-driver.jar

docker compose up -d

Wait ~60 seconds for Metabase to initialize, then open http://localhost:3000.

Option B: Native JAR (Local Dev)

If you don't want Docker, download the Metabase JAR and the DuckDB driver, and place the driver in a plugins/ folder next to the JAR:

curl -L -o metabase.jar https://downloads.metabase.com/latest/metabase.jar
mkdir plugins
curl -L -o plugins/duckdb.metabase-driver.jar \
  https://github.com/AlexR2D2/metabase_duckdb_driver/releases/latest/download/duckdb.metabase-driver.jar

java -jar metabase.jar

Same outcome: Metabase starts on port 3000.

Connecting DuckDB as a Data Source

Once Metabase is up:

Go to Admin → Databases → Add database
Select DuckDB as the database type (it shows up because of the driver)
Set the Database file to the path inside the container: /data/analytics.duckdb
Save

Metabase scans the schemas and tables. Within a minute you can build your first question against any table in the file.

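If you don't have a .duckdb file yet, create one with a few lines of Python:

import duckdb
# Create (or open) the database file that Metabase will read.
con = duckdb.connect("data/analytics.duckdb")
con.execute("CREATE TABLE events AS SELECT * FROM read_parquet('events/*.parquet')")
con.close()  # release the file lock so Metabase can attach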

Now Metabase can query events like any other table.

Common Gotchas

The combo works beautifully, but these four landmines catch most people the first time.

1. Read-Only vs Read-Write Mode

By default the Metabase driver opens DuckDB in read-write mode, which means only one process can attach to the file at a time. If you also have a Jupyter notebook or a dbt run touching the same file, Metabase will fail to connect.

Fix: open the connection in read-only mode by checking the Read-only option when you set up the data source, or by passing ?access_mode=read_only in the connection string.

For a serious setup, treat the file as immutable from Metabase's side and let your ETL job write to a fresh file then atomically swap it in.
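The write-then-swap step can be sketched in a few lines of Python. This is a minimal sketch, not the driver's own mechanism: the file names staging.duckdb and analytics.duckdb and the publish() helper are illustrative assumptions, and the demo uses placeholder bytes instead of real DuckDB files so it runs anywhere.

```python
import os
import tempfile
from pathlib import Path


def publish(staging: Path, live: Path) -> None:
    """Atomically replace `live` with `staging` (hypothetical helper).

    Both paths must sit on the same filesystem for the rename to be atomic.
    """
    # os.replace overwrites the destination in a single rename step, so a
    # reader opening `live` sees either the old build or the new one.
    os.replace(staging, live)


# Demo with placeholder bytes standing in for real DuckDB files.
with tempfile.TemporaryDirectory() as tmp:
    live = Path(tmp) / "analytics.duckdb"
    staging = Path(tmp) / "staging.duckdb"
    live.write_bytes(b"old build")
    staging.write_bytes(b"new build")
    publish(staging, live)
    print(live.read_bytes(), staging.exists())
```

On POSIX systems, a process that already has the old file open keeps reading the old contents until it reopens the path. When Metabase picks up the new file therefore depends on when the driver reconnects, which is worth verifying for your driver version.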

2. File Path Inside Containers

The Metabase container sees a different filesystem than your host. The path you put in the data source UI must be the path inside the container — i.e., the mount target, not the host path.

If you mount ./data:/data:ro, your analytics.duckdb file at ./data/analytics.duckdb on the host is at /data/analytics.duckdb inside the container.
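If you script your setup, the host-to-container translation is easy to make explicit. The helper below is a hypothetical illustration (container_path is not part of Metabase or Docker); it only applies the mount mapping described above.

```python
from pathlib import PurePosixPath


def container_path(host_path: str, host_mount: str, container_mount: str) -> str:
    """Translate a host path under `host_mount` to its path inside the container.

    Raises ValueError if `host_path` is not inside the mounted directory.
    """
    relative = PurePosixPath(host_path).relative_to(PurePosixPath(host_mount))
    return str(PurePosixPath(container_mount) / relative)


# The ./data:/data:ro mount from the Compose file above.
print(container_path("./data/analytics.duckdb", "./data", "/data"))
```

The ValueError is the useful part: a file that sits outside the mounted directory has no path inside the container, which is the usual reason Metabase reports that the database file cannot be found.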

3. Concurrency Limits

DuckDB lets either one process open a file read-write or several processes open it read-only; it does not mix the two. Metabase questions issued in parallel by different users are fine, because they all run through the same connection inside the one Metabase process. But if you also point a separate process (a scheduled dbt job, a Python notebook) at the same file in write mode, whichever process arrives second will fail to open it.

For >10 concurrent users on the same file, switch to read-only mode and run any writes through a dedicated job that produces a new file.

4. Memory Settings

DuckDB's default memory limit is 80% of system RAM, and it runs inside the same process as the Metabase JVM. On a small VM, a scan over a 5 GB Parquet dataset can squeeze the JVM until Metabase becomes sluggish. Cap DuckDB's memory in the connection string: ?memory_limit=2GB.

When to Use Metabase + DuckDB

This stack shines in three scenarios.

Local Analytics on Parquet Files

You have a folder of Parquet files (from a Spark job, a Fivetran sync to S3, or a daily CSV export) and you want to slice them without spinning up a warehouse. Point DuckDB at the directory, expose tables as views, and Metabase makes them queryable to non-technical users.

Compared to pulling the same data into BigQuery or Snowflake, you save the ingestion step entirely.

Embedded Analytics for Small SaaS

For a SaaS product with per-tenant analytics dashboards, a single DuckDB file per tenant + an embedded Metabase iframe is one of the cheapest production setups available. No per-tenant warehouse cost, instant cold-start (the file lives on disk), and the dashboards behave the same as on any production warehouse.

This is especially viable if your analytics data fits on a single disk and tolerates a few minutes of staleness — which is true for most SaaS tools below the ten-thousand-customer mark.

Replace Postgres for Analytics Workloads

Many companies run analytics on a Postgres replica with reporting queries that take minutes. Drop the same dataset into DuckDB (export tables as Parquet, load into a .duckdb file) and the same queries finish in seconds. Metabase doesn't care which database it's pointed at.

This is the fastest "before/after" demo to convince a team that columnar matters: same dashboard, 50x faster.

When Not to Use This Combo

The combo has hard limits.

High-concurrency production BI (50+ analysts running queries simultaneously). DuckDB is single-node; Metabase doesn't shard. Move to Snowflake/BigQuery/ClickHouse.
Real-time data. DuckDB doesn't subscribe to streams. If you need second-fresh data, you'll need an OLAP database that ingests continuously (Pinot, Druid, ClickHouse).
Datasets that don't fit on disk. DuckDB can read Parquet directly from S3, but heavy queries that scan terabytes will be limited by network bandwidth. Above ~500GB, a real warehouse pays off.
Strict uptime SLAs. A single DuckDB file is a single point of failure. Backup, replicate, or tier this with care.

For everything below those thresholds — solo data engineers, internal tools at startups, side projects, and embedded analytics for early-stage SaaS — the combo is hard to beat.

Alternatives

If Metabase + DuckDB is almost right but not quite, here's what else to look at.

DuckDB UI (built-in duckdb -ui): the official lightweight notebook. Great for exploration, no dashboarding.
Apache Superset + DuckDB: similar architecture, different UX. Superset is more flexible for custom viz; Metabase is friendlier for non-engineers.
dbt + DuckDB + Metabase: add dbt for transformation modelling. The dbt-duckdb adapter materializes models into the same DuckDB file Metabase reads from.
MotherDuck: managed DuckDB-as-a-service. Same SQL, plus collaboration and shared catalogs. The Metabase DuckDB driver also works against MotherDuck.

If you're picking a stack from scratch, Metabase + DuckDB + dbt is a strong default for the first two years of a data team — and we cover the modelling layer in our analytics engineering with dbt guide.

Frequently Asked Questions

Can Metabase connect to DuckDB?

Yes, via the community-maintained DuckDB driver. Drop the JAR into Metabase's plugins/ directory, restart, and DuckDB shows up as a database type in the admin UI. Setup takes 5–10 minutes.

Is DuckDB fast enough for BI dashboards?

For most dashboards, yes. On analytical queries DuckDB is comparable to a single node of a warehouse like Snowflake or BigQuery. Queries against datasets of up to ~100GB regularly return in under a second on modest hardware. Above that, query time depends heavily on disk speed and Parquet partitioning.

Can I use DuckDB and Postgres together in Metabase?

Yes. Metabase supports multiple databases simultaneously. A common pattern is to keep operational data in Postgres and analytical data in DuckDB, then build dashboards from each. You can't join across them inside Metabase, but DuckDB's postgres extension (formerly postgres_scanner) can pull Postgres tables into DuckDB queries directly.

Is this production-ready?

For internal tools, embedded analytics in early-stage SaaS, and small-team BI, yes. For high-concurrency public-facing analytics with strict SLAs, no — pair Metabase with a server-based warehouse. The driver is community-maintained, not officially supported by Metabase, so factor in that risk.

How does DuckDB handle concurrent writes?

Across processes, it doesn't. Only one process can hold the write lock on a file at a time. The standard pattern: an ETL job writes to a staging.duckdb file, then atomically renames it to analytics.duckdb (or symlinks to the latest) so Metabase always sees a consistent file.

Can DuckDB query data on S3 directly?

Yes. With the httpfs extension, DuckDB reads Parquet and CSV directly from S3, GCS, Azure or any HTTP endpoint: SELECT * FROM read_parquet('s3://bucket/path/*.parquet'). Combined with Metabase, you get serverless analytics directly on cloud storage — no copy, no ingestion.