DuckDB Resources
A curated collection of valuable DuckDB resources to help you get the most out of this analytical database. Most of the links are sourced from the great awesome-duckdb project. Thanks a lot!
Official Resources
- Official documentation - Comprehensive DuckDB documentation
- Official blog - Latest articles, news and updates
- DuckDB clients - Client APIs for DuckDB
- DuckDB documentation PDF - The documentation as a single PDF file
- DuckDB documentation in Markdown - The documentation as a single Markdown file
Client APIs
DuckDB offers client APIs (also known as "drivers") for several languages, categorized by support tier.
Primary Support Tier
These clients are the first to receive new features and are covered by community support.
- C - The foundational API maintained by the DuckDB team
- Command Line Interface (CLI) - Interactive shell for DuckDB
- Java (JDBC) - JDBC driver for Java applications
- Go - Go SQL driver for DuckDB
- Node.js (node-neo) - Modern Node.js driver
- Python - Python client for DuckDB
- R - R interface for DuckDB
- WebAssembly (Wasm) - Run DuckDB in the browser
Secondary Support Tier
These clients receive new features but are not covered by community support.
- ADBC (Arrow) - Arrow Database Connectivity
- C# (.NET) - .NET driver for DuckDB
- C++ - C++ API for DuckDB
- Dart - DuckDB for Dart applications
- Julia - DuckDB for Julia language
- Node.js (deprecated) - Original Node.js API
- ODBC - Open Database Connectivity driver
- Rust - Rust client for DuckDB
- Swift - Swift client for DuckDB
Tertiary Support Tier
These clients are maintained by third parties with no feature or support guarantees.
Extensions
DuckDB's functionality can be extended through extensions, which are organized into Core Extensions (maintained by the DuckDB team) and Community Extensions (contributed by the community).
Core Extensions
These extensions are maintained by the DuckDB team and can be installed via INSTALL <extension_name>.
- arrow - Zero-copy data integration with Apache Arrow
- autocomplete - Adds support for autocomplete in the shell
- aws - Provides features that depend on the AWS SDK
- azure - Adds filesystem abstraction for Azure blob storage
- delta - Adds support for Delta Lake
- excel - Adds support for Excel-like format strings
- fts - Adds support for Full-Text Search Indexes
- httpfs - Support for HTTP(S) or S3 connections
- iceberg - Adds support for Apache Iceberg
- icu - Support for time zones and collations using ICU
- inet - Support for IP-related data types and functions
- jemalloc - Overwrites system allocator with jemalloc
- json - Adds support for JSON operations
- mysql - Support for MySQL database connections
- parquet - Support for reading and writing Parquet files
- postgres - Support for PostgreSQL connections
- spatial - Geospatial functionality and processing
- sqlite - Support for SQLite database files
- tpcds - TPC-DS data generation and query support
- tpch - TPC-H data generation and query support
- vss - Support for vector similarity search queries
Community Extensions
These extensions are contributed by the community and can be installed via INSTALL <extension_name> FROM community.
- avro - Read Apache Avro files
- bigquery - Google BigQuery integration
- blockduck - Live SQL queries on blockchain
- cache_httpfs - Read cached filesystem for httpfs
- capi_quack - Hello world example from C/C++ C API template
- chsql - ClickHouse SQL dialect macros for DuckDB
- chsql_native - ClickHouse native client & file reader
- cronjob - DuckDB HTTP cronjob extension
- crypto - Cryptographic hash functions and HMAC
- datasketches - Apache DataSketches for approximate analytics
- duckpgq - Graph workloads supporting SQL/PGQ standard
- evalexpr_rhai - Evaluates Rhai scripting language in SQL
- flockmtl - LLM & RAG extension for analytics and semantic analysis
- fuzzycomplete - Fuzzy string matching for autocompletion
- geography - Global spatial data processing on the sphere
- gsheets - Read and write Google Sheets using SQL
- h3 - Hierarchical hexagonal indexing for geospatial data
- hdf5 - Read HDF5 files from DuckDB
- hostfs - Navigate and explore the filesystem using SQL
- http_client - DuckDB HTTP client extension
- httpserver - DuckDB HTTP API server and query interface
- lindel - Linearization/Delinearization, Z-Order, Hilbert curves
- magic - libmagic/file utilities ported to DuckDB
- netquack - Parse, extract, and analyze domains, URIs, and paths
- open_prompt - Interact with LLMs with a simple extension
- pcap_reader - Read PCAP files from DuckDB
- pivot_table - Provides a spreadsheet-style pivot_table function
- prql - Support for PRQL, the Pipelined Relational Query Language
- psql - Support for PSQL, a piped SQL dialect for DuckDB
- pyroscope - DuckDB Pyroscope extension for continuous profiling
- quack - Provides a hello world example demo
- rusty_quack - Hello world demo from Rust-based extension template
- scrooge - Financial data aggregation and scanners
- shellfs - Use shell commands for input and output
- sheetreader - Fast XLSX file importer
- substrait - Allows conversion and execution of Substrait query plans
- tsid - DuckDB Time-Sortable ID generator
- ulid - ULID data type for DuckDB (timestamped UUID-like identifiers)
- webmacro - Load DuckDB Macros from the web
- zipfs - Read files within zip archives
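As a rough illustration of the two install forms described in this section, the sketch below builds the SQL statements for a core extension and for a community extension. The helper name `install_statements` and the name check are hypothetical and exist only for this example. The follow-up `LOAD` statement is an addition not mentioned in the text above; it is included on the assumption that an installed extension usually needs to be loaded before use.

```python
import re

# Hypothetical helper for illustration only. The INSTALL statement shapes
# come from the section above; the LOAD statement is an added assumption.
_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def install_statements(name: str, community: bool = False) -> list[str]:
    """Return the SQL statements to install and then load one extension."""
    # Reject anything that is not a plain identifier, since the name is
    # interpolated directly into the SQL text.
    if not _NAME.match(name):
        raise ValueError(f"unexpected extension name: {name!r}")
    # Community extensions use the "FROM community" suffix; core ones do not.
    source = " FROM community" if community else ""
    return [f"INSTALL {name}{source};", f"LOAD {name};"]


if __name__ == "__main__":
    # Print the statements for one core and one community extension.
    for stmt in install_statements("spatial") + install_statements("h3", community=True):
        print(stmt)
```

The generated statements can be pasted into the DuckDB CLI. Assuming the DuckDB Python client is installed, they could also be passed one by one to `duckdb.connect().execute(...)`. Installing an extension typically requires network access the first time.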
Learning Resources
Links to talks, videos, podcasts, and books.
Talks & Videos
- DuckCon #6 playlist
- DuckCon #5 playlist
- DuckCon #4 playlist
- DuckCon #3 playlist
- DuckCon #2 playlist
- DuckDB: Crunching data anywhere from laptops to servers @ GOTO Amsterdam 2024 - Gábor Szárnyas
- In-Process Analytical Data Management with DuckDB @ PyData Amsterdam - Hannes Mühleisen
- DuckDB: The Power of a Data Warehouse in your Python Process @ PyData Yerevan - Gábor Szárnyas
- DuckDB: Bringing analytical SQL directly to your Python shell @ EuroPython - Pedro Holanda
- DuckDB keynote @ Data + AI Summit 2023 - Hannes Mühleisen
- DuckDB: Bringing Analytical SQL Directly To Your Python Shell @ FOSDEM - Pedro Holanda
- DuckDB Extensions @ DuckCon - Pedro Holanda & Sam Ansmink
- Developing Systems in Academia: The Good, the Bad, and the not-so-Ugly Duckling @ CIDR - Hannes Mühleisen
- DuckDB An Embeddable Analytical Database @ FOSDEM - Hannes Mühleisen
- DuckDB tutorials playlist by Learn Data with Mark - Mark Needham
- DuckDB tutorials playlist by MotherDuck - Mehdi Ouazza
- Nextflow and database uses: powering data engineering, exploring DuckDB, and beyond - Edmund Miller
- Why should you care about DuckDB? @ Dublin DuckDB meetup - Mihai Bojin
- Exploring Monte Carlo Simulations With DuckDB @ Dublin DuckDB meetup - James McNeill
- DuckDB and recommenders: a lightning fast synergy @ Dublin DuckDB meetup - Khalil Muhammad
Podcasts
- Developer Voices: Implementing Hardware-Friendly Databases - Hannes Mühleisen
- The Geek Narrator: DuckDB Internals - Mark Raasveldt
- Software Engineering Daily: DuckDB - Hannes Mühleisen
- Data Engineering Podcast: Move Your Database To The Data - Hannes Mühleisen
- The Analytics Engineering Podcast: The Personal Data Warehouse - Jordan Tigani
Books
- DuckDB in Action - Book by Manning Publications
- Getting Started with DuckDB - Practical guide for data workflows
Cloud & Serverless
- AWS Lambda Layers for DuckDB - Run DuckDB in AWS Lambda functions
- Serverless DuckDB - Use DuckDB as API with Amazon API Gateway and AWS Lambda
- Serverless Parquet Repartitioner - Use DuckDB to repartition data in S3-based Data Lakes
- DuckDB as API in Docker - A TypeScript-based Docker image containing DuckDB, and a Hono framework REST API with JSON or streaming Arrow responses
Tools based on DuckDB
- SQL Workbench - SQL Workbench for running queries on local or remote data, data visualizations, and sharing queries via URLs
- Rill Data - Tool for transforming data sets into powerful, opinionated dashboards using SQL
- Ibis Project - A DataFrame API for interacting with DuckDB and other compute engines
- Boiling Data - Serverless data analytics overlay on top of S3 Data Lakes
- Hex Dataframe SQL - Hex's Dataframe SQL cells powered by DuckDB
- Mode - Uses DuckDB for their in-memory data engine
- VulcanSQL - Data API framework for creating REST APIs by writing SQL templates
- Tad - A fast, free, cross-platform tabular data viewer application
- Honeycomb Maps - A browser-based geospatial analysis tool leveraging DuckDB-Wasm
- Malloy - Experimental language for describing data relationships and transformations
- Evidence - Generate reports using SQL and markdown
- Huey - Blazing-fast & intuitive pivot tables on Parquet, CSV, JSON files
- DatalakeStudio - Load, explore, transform datasets and expose them via API
- Spice.ai - A unified SQL query interface and portable runtime
- Definite - Analytics platform with managed DuckDB, ELT, and BI
- Amphi ETL - Low-code data pipelines for structured and unstructured data
- Quackpipe - Serverless OLAP API/UI with ClickHouse API compatibility
- UniverSQL - Implementation of Snowflake API for running queries locally
- Whereabouts - Fast, accurate, open-source geocoding in Python
- sqlglot - Python transpiler for 23 different SQL dialects
- yato - The smallest DuckDB SQL orchestrator on Earth
- SQLMesh - Next-generation data transformation and modeling framework
- Duck-UI - Web-based interface for interacting with DuckDB
SQL Clients
- Harlequin - The DuckDB IDE for your terminal
- qStudio - A free SQL tool specialized for data analysts
- DBeaver - Universal database access and development tool
- DataGrip - Paid SQL IDE by JetBrains
- Duckling - A fast viewer for CSV/Parquet files and DuckDB/SQLite
- SQL DATA LENS - A lightweight, commercial SQL IDE
- Dataflare - Simple easy-to-use database manager