Parallel Query Execution

Process large datasets in parallel using Rayon thread pools.

Overview

DBX’s parallel query executor processes multiple RecordBatches concurrently. Parallel execution activates only when the input is large enough (by default, at least 2 batches and 1,000 total rows), so small datasets avoid the overhead of parallel dispatch.

Small (< 1,000 rows):  Sequential    → No overhead
Large (≥ 1,000 rows):  Parallel      → Multi-core utilization

Supported Operations

Operation	Method	Description
Filter	`par_filter()`	Parallel row filtering by predicate
Aggregate	`par_aggregate()`	Parallel SUM, COUNT, AVG, MIN, MAX
Projection	`par_project()`	Parallel column extraction
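
Parallel aggregation of this kind is commonly built by computing one partial result per batch and then merging the partials. The sketch below illustrates that pattern only; it is not DBX's implementation. It uses standard-library scoped threads in place of Rayon so that it has no dependencies, models each batch as a plain `Vec<f64>` column rather than a RecordBatch, and the names `Partial`, `aggregate_batch`, and `par_aggregate_sketch` are hypothetical.

```rust
use std::thread;

/// Partial aggregate state for one batch (hypothetical type, not a DBX API).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Partial {
    pub sum: f64,
    pub count: usize,
    pub min: f64,
    pub max: f64,
}

impl Partial {
    /// Identity element: merging it with any partial leaves that partial unchanged.
    pub fn empty() -> Self {
        Self { sum: 0.0, count: 0, min: f64::INFINITY, max: f64::NEG_INFINITY }
    }

    /// Combine two partial results into one.
    pub fn merge(self, other: Self) -> Self {
        Self {
            sum: self.sum + other.sum,
            count: self.count + other.count,
            min: self.min.min(other.min),
            max: self.max.max(other.max),
        }
    }

    /// AVG is derived from SUM and COUNT; returns None for empty input.
    pub fn avg(&self) -> Option<f64> {
        if self.count == 0 { None } else { Some(self.sum / self.count as f64) }
    }
}

/// Sequentially fold one batch (a plain column of values) into a partial.
pub fn aggregate_batch(values: &[f64]) -> Partial {
    values.iter().fold(Partial::empty(), |acc, &v| {
        acc.merge(Partial { sum: v, count: 1, min: v, max: v })
    })
}

/// Spawn one scoped thread per batch, then merge the partials in batch order.
/// A real executor would hand the batches to a thread pool instead.
pub fn par_aggregate_sketch(batches: &[Vec<f64>]) -> Partial {
    thread::scope(|s| {
        let handles: Vec<_> = batches
            .iter()
            .map(|b| s.spawn(move || aggregate_batch(b)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .fold(Partial::empty(), Partial::merge)
    })
}
```

Because `merge` is associative and `empty()` is its identity, the partials can be combined in any grouping, which is what makes SUM, COUNT, MIN, and MAX straightforward to parallelize; AVG falls out of the merged SUM and COUNT.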

Usage

use dbx_core::sql::executor::parallel_query::{
    ParallelQueryExecutor, AggregateType
};

let executor = ParallelQueryExecutor::new();  // defaults: parallel at 1,000+ rows and 2+ batches

// Parallel aggregation
let result = executor.par_aggregate(&batches, 0, AggregateType::Sum)?;
println!("Sum: {}, Count: {}", result.value, result.count);

// Custom configuration
// `pool` is a thread pool created elsewhere; its construction is not shown here.
let executor = ParallelQueryExecutor::new()
    .with_min_rows(5000)         // parallel at 5,000+ rows
    .with_threshold(4)           // requires 4+ batches
    .with_thread_pool(pool);     // custom thread pool

Parallelization Criteria

Parallel execution activates when both conditions are met:

Batch count ≥ parallel_threshold (default 2)
Total rows ≥ min_rows_for_parallel (default 1,000)
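
The two criteria above can be sketched as a single predicate. This is a minimal illustration and not DBX's actual code: the `ParallelConfig` struct and the `should_parallelize` function are hypothetical names, and only the two default values (2 batches, 1,000 rows) come from this page.

```rust
/// Hypothetical mirror of the two thresholds described above.
/// Field names follow this page's wording; the struct itself is an assumption.
#[derive(Debug, Clone, Copy)]
pub struct ParallelConfig {
    /// Minimum number of batches before parallel execution is considered.
    pub parallel_threshold: usize,
    /// Minimum total row count before parallel execution is considered.
    pub min_rows_for_parallel: usize,
}

impl Default for ParallelConfig {
    fn default() -> Self {
        Self { parallel_threshold: 2, min_rows_for_parallel: 1_000 }
    }
}

/// Returns true only when BOTH criteria hold.
/// `batch_rows` holds the row count of each input batch.
pub fn should_parallelize(cfg: &ParallelConfig, batch_rows: &[usize]) -> bool {
    let total_rows: usize = batch_rows.iter().sum();
    batch_rows.len() >= cfg.parallel_threshold && total_rows >= cfg.min_rows_for_parallel
}
```

Under this reading, a single 5,000-row batch would still run sequentially because the batch-count condition fails, while two 500-row batches would qualify for parallel execution.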

Performance

Data Size	Sequential	Parallel	Note
150 rows	431 ns	32.5 µs	🚫 Parallel ~75x slower → sequential fallback
10,000 rows	~50 µs	~15 µs	✅ Parallel ~3x faster
1M rows	~5 ms	~1.2 ms	🔥 Parallel ~4x faster

Next Steps

Plan Cache Guide — Optimize repeated SQL execution
WAL Recovery Guide — WAL partitioning synergy