Evaluating the Impact of Optimizations on DuckDB
DuckDB is an embedded analytical database system designed to support efficient analytical query processing through vectorized execution and parallelism. This paper investigates three execution and optimization strategies implemented in DuckDB: vectorized (batch) execution, multithreaded parallel execution, and optimizer-level enhancements based on bi-directional information passing (Parachute). The evaluation is conducted using the TPC-H benchmark at scale factor 1 (SF1), stored in Parquet format, and a fixed workload of five analytical queries that include scans, joins, filters, and aggregations. The study employs repeated query execution under controlled conditions and analyzes performance using median execution time and execution-time variability. The results provide insight into the performance characteristics and execution behavior of the evaluated strategies across different analytical query patterns.
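The measurement protocol described above (repeated execution, summarized by median time and variability) can be illustrated with a minimal sketch. It is not the harness used in this study: the function names `time_runs` and `summarize`, the warm-up count, the number of repetitions, and the choice of sample standard deviation and range as variability measures are all assumptions made for illustration.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_runs(run_query: Callable[[], object],
              repetitions: int = 5,
              warmup: int = 1) -> List[float]:
    """Return wall-clock times (seconds) for repeated calls to run_query.

    run_query is a hypothetical zero-argument callable that executes one
    query to completion; the defaults here are assumptions, not the
    paper's settings.
    """
    for _ in range(warmup):
        run_query()  # warm-up runs are executed but not recorded
    timings = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings: List[float]) -> Dict[str, float]:
    """Median plus two simple variability measures for a list of timings."""
    return {
        "median": statistics.median(timings),
        # Sample standard deviation needs at least two observations.
        "stdev": statistics.stdev(timings) if len(timings) > 1 else 0.0,
        "range": max(timings) - min(timings),
    }


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without a database installed.
    timings = time_runs(lambda: sum(range(100_000)), repetitions=5)
    print(summarize(timings))
```

With the `duckdb` Python package installed, `run_query` could be a callable such as `lambda: con.execute(sql).fetchall()`, where `con` comes from `duckdb.connect()` and `sql` is a placeholder for one of the workload queries. Reporting the median rather than the mean keeps a single slow run from dominating the summary.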
