TL;DR: Pair Metabase (open-source BI) with DuckDB (in-process analytical database) and you get a serverless, file-based analytics stack that queries Parquet and CSV at warehouse-like speed without provisioning a warehouse. Install the community DuckDB driver, point it at a .duckdb file or a Parquet glob, and you can have charts and dashboards in minutes. It's the right choice for solo data engineers, small teams and embedded analytics, and the wrong choice for high-concurrency production BI.
If you've ever wanted a real BI tool on top of Parquet files sitting on your laptop or in S3, the Metabase + DuckDB combo is the shortest path there. No warehouse, no ETL pipeline, no separate query engine: Metabase points at a single DuckDB file (or a directory of Parquet) and you get much the same dashboarding experience you'd get on Snowflake or BigQuery, with no licence cost and no infrastructure beyond the Metabase instance itself.
This guide walks through why the combo works, how to wire it up end-to-end, the gotchas that bite people the first time, and when you should grow out of it.
Why Metabase + DuckDB Is a Powerful Combo
Both tools have built reputations independently — but they happen to fit each other unusually well.
What Metabase Brings
Metabase is one of the most widely used open-source BI tools. It gives you:
- A no-code query builder for non-technical users (no SQL required)
- A SQL editor for analysts
- Dashboards with drill-down, filters, and scheduled refresh
- Embedding via signed URLs or full white-label
- Native support for 20+ databases out of the box
The community-maintained DuckDB driver adds DuckDB to that list.
What DuckDB Brings
DuckDB is an embedded analytical database — think "SQLite for analytics." It runs in-process, has no server to manage, and reads Parquet/CSV/JSON natively without ingestion. Core strengths:
- Vectorized columnar engine, often competitive with ClickHouse or Spark on single-node workloads
- Reads Parquet and CSV in place (read_parquet('s3://bucket/*.parquet'), read_csv), including remote files
- ACID transactions on the local .duckdb file
- Zero external dependencies, zero server
Why They Fit Together
Most BI tools assume a database server you connect to over the network. DuckDB doesn't have one — it's a library. The Metabase DuckDB driver bridges this by opening the DuckDB file as a connection inside the Metabase JVM. From Metabase's perspective DuckDB looks like any other database; from DuckDB's perspective Metabase is just another consumer of the file.
The result: warehouse-grade analytics on a single Parquet folder, with a dashboard layer on top, all running on a small VM (or your laptop).
Setting Up Metabase with DuckDB
There are two paths: Docker (recommended for production) and the Metabase JAR (handy for local exploration).
Option A: Docker Compose
Create a docker-compose.yml that mounts your DuckDB file and ships Metabase with the DuckDB driver plugin:
```yaml
services:
  metabase:
    image: metabase/metabase:latest
    ports:
      - "3000:3000"
    volumes:
      - ./data:/data:ro        # DuckDB file(s), mounted read-only
      - ./plugins:/plugins     # holds the DuckDB driver JAR
    environment:
      MB_DB_TYPE: h2           # Metabase's own app DB; Metabase recommends Postgres/MySQL for production
      MB_PLUGINS_DIR: /plugins
```

The plugins/ directory holds the DuckDB driver JAR. Download the latest release of the community driver and drop it into plugins/ before starting:
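```shell
mkdir -p plugins data
curl -L -o plugins/duckdb.metabase-driver.jar \
  https://github.com/AlexR2D2/metabase_duckdb_driver/releases/latest/download/duckdb.metabase-driver.jar
docker compose up -d
```

Wait ~60 seconds for Metabase to initialize, then open http://localhost:3000. If DuckDB doesn't appear as a database type, check the Metabase logs: the driver's README has warned that DuckDB's native library may not load in the Alpine-based official image, and suggests running Metabase on a Debian/Ubuntu-based Java image instead.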
Option B: Native JAR (Local Dev)
If you don't want Docker, download the Metabase JAR and the DuckDB driver, and place the driver in a plugins/ folder next to the JAR:
```shell
curl -L -o metabase.jar https://downloads.metabase.com/latest/metabase.jar
mkdir -p plugins
curl -L -o plugins/duckdb.metabase-driver.jar \
  https://github.com/AlexR2D2/metabase_duckdb_driver/releases/latest/download/duckdb.metabase-driver.jar
java -jar metabase.jar
```

Same outcome: Metabase starts on port 3000.
Connecting DuckDB as a Data Source
Once Metabase is up:
- Go to Admin → Databases → Add database
- Select DuckDB as the database type (it appears because of the driver)
- Set the Database file to the path inside the container: /data/analytics.duckdb
- Save
Metabase scans the schemas and tables. Within a minute you can build your first question against any table in the file.
If you don't have a .duckdb file yet, create one with a few lines of Python:

```python
import duckdb

con = duckdb.connect("data/analytics.duckdb")
con.execute("CREATE TABLE events AS SELECT * FROM read_parquet('events/*.parquet')")
con.close()  # release the file lock so Metabase can attach
```

Now Metabase can query events like any other table.
Common Gotchas
The combo works well, but these landmines catch most people the first time.
1. Read-Only vs Read-Write Mode
By default the Metabase driver opens DuckDB in read-write mode, which means only one process can attach to the file at a time. If you also have a Jupyter notebook or a dbt run touching the same file, Metabase will fail to connect.
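Fix: open the connection in read-only mode by enabling the driver's read-only option when you set up the data source (or, if your driver version accepts DuckDB options in the connection string, by passing access_mode=read_only). Several processes can hold read-only connections at once, but none of them can open the file while another process holds it in read-write mode.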
For a serious setup, treat the file as immutable from Metabase's side and let your ETL job write to a fresh file then atomically swap it in.
2. File Path Inside Containers
The Metabase container sees a different filesystem than your host. The path you put in the data source UI must be the path inside the container — i.e., the mount target, not the host path.
If you mount ./data:/data:ro, your analytics.duckdb file at ./data/analytics.duckdb on the host is at /data/analytics.duckdb inside the container.
3. Concurrency Limits
DuckDB allows either one read-write process or any number of read-only processes per file. Metabase questions issued in parallel by different users are fine: they all go through the same DuckDB instance inside the Metabase JVM. But if a separate process (a scheduled dbt job, a Python notebook) opens the same file in write mode, whichever process arrives second will fail to acquire the file lock.
For >10 concurrent users on the same file, switch to read-only mode and run any writes through a dedicated job that produces a new file.
4. Memory Settings
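DuckDB's default memory limit is 80% of the system's RAM. If Metabase runs on a small VM and a dashboard query scans a 5GB Parquet dataset, DuckDB and the Metabase JVM compete for the same memory and Metabase becomes sluggish. Cap DuckDB's memory with its memory_limit setting: for example memory_limit=2GB in the connection string if your driver version accepts it, or a SET memory_limit = '2GB' statement run when the connection opens.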
When to Use Metabase + DuckDB
This stack shines in three scenarios.
Local Analytics on Parquet Files
You have a folder of Parquet files (from a Spark job, a Fivetran sync to S3, or a daily CSV export) and you want to slice them without spinning up a warehouse. Point DuckDB at the directory, expose tables as views, and Metabase makes them queryable to non-technical users.
Compared to pulling the same data into BigQuery or Snowflake, you save the ingestion step entirely.
Embedded Analytics for Small SaaS
For a SaaS product with per-tenant analytics dashboards, a single DuckDB file per tenant + an embedded Metabase iframe is one of the cheapest production setups available. No per-tenant warehouse cost, instant cold-start (the file lives on disk), and the dashboards behave the same as on any production warehouse.
This is especially viable if each tenant's analytics data fits on a single disk and can tolerate a few minutes of staleness, which holds for many smaller SaaS products.
Replace Postgres for Analytics Workloads
Many companies run analytics on a Postgres replica, with reporting queries that take minutes. Load the same dataset into DuckDB (export the tables as Parquet, then load them into a .duckdb file) and the same queries often finish in seconds, because a columnar engine reads only the columns a query touches. Metabase doesn't care which database it's pointed at.
This is the fastest before/after demo to convince a team that columnar storage matters: same dashboard, dramatically faster.
When Not to Use This Combo
The combo has hard limits.
- High-concurrency production BI (50+ analysts running queries simultaneously). DuckDB is single-node; Metabase doesn't shard. Move to Snowflake/BigQuery/ClickHouse.
- Real-time data. DuckDB doesn't subscribe to streams. If you need second-fresh data, you'll need an OLAP database that ingests continuously (Pinot, Druid, ClickHouse).
- Datasets that don't fit on disk. DuckDB can read Parquet directly from S3, but heavy queries that scan terabytes will be limited by network bandwidth. Above ~500GB, a real warehouse pays off.
- Strict uptime SLAs. A single DuckDB file is a single point of failure. Backup, replicate, or tier this with care.
For everything below those thresholds — solo data engineers, internal tools at startups, side projects, and embedded analytics for early-stage SaaS — the combo is hard to beat.
Alternatives
If Metabase + DuckDB is almost right but not quite, here's what else to look at.
- DuckDB UI (launched with duckdb -ui): the official lightweight notebook interface. Great for exploration, no dashboarding.
- Apache Superset + DuckDB: similar architecture, different UX. Superset is more flexible for custom visualizations; Metabase is friendlier for non-engineers.
- dbt + DuckDB + Metabase: add dbt for transformation modelling. The dbt-duckdb adapter materializes models into the same DuckDB file Metabase reads from, so schedule dbt runs around Metabase's connection or use the staging-file swap described above.
- MotherDuck: managed DuckDB-as-a-service. Same SQL, plus collaboration and shared catalogs. The Metabase DuckDB driver also works against MotherDuck.
If you're picking a stack from scratch, Metabase + DuckDB + dbt is a strong default for the first two years of a data team — and we cover the modelling layer in our analytics engineering with dbt guide.
Frequently Asked Questions
Can Metabase connect to DuckDB?
Yes, via the community-maintained DuckDB driver. Drop the JAR into Metabase's plugins/ directory, restart, and DuckDB shows up as a database type in the admin UI. Setup takes 5–10 minutes.
Is DuckDB fast enough for BI dashboards?
For most dashboards, yes. On analytical queries, single-node DuckDB is often competitive with cloud warehouses such as Snowflake or BigQuery at small-to-medium data sizes. Dashboards over datasets up to roughly 100GB can stay interactive on modest hardware, especially when queries touch only a few columns; above that, query time depends heavily on disk speed and Parquet partitioning.
Can I use DuckDB and Postgres together in Metabase?
Yes. Metabase supports multiple databases simultaneously. A common pattern is to keep operational data in Postgres and analytical data in DuckDB, then build dashboards from each. You can't join across them inside Metabase, but DuckDB's postgres extension (formerly postgres_scanner) can attach a Postgres database and pull its tables into DuckDB queries directly.
Is this production-ready?
For internal tools, embedded analytics in early-stage SaaS, and small-team BI, yes. For high-concurrency public-facing analytics with strict SLAs, no — pair Metabase with a server-based warehouse. The driver is community-maintained, not officially supported by Metabase, so factor in that risk.
How does DuckDB handle concurrent writes?
It doesn't, in the same file. Only one process can hold a write lock. The standard pattern: an ETL job writes to a staging.duckdb file, then atomically renames it to analytics.duckdb (or symlinks to the latest) so Metabase always sees a consistent file.
Can DuckDB query data on S3 directly?
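Yes. With the httpfs extension, DuckDB reads Parquet and CSV directly from S3 (and S3-compatible stores such as GCS) or any HTTP endpoint, and the separate azure extension covers Azure Blob Storage: SELECT * FROM read_parquet('s3://bucket/path/*.parquet'). Combined with Metabase, you get serverless analytics directly on cloud storage, with no copy and no ingestion step (each query still reads the data over the network).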