SQL API
SQL query execution for DBX.
Overview
DBX provides SQL support for complex queries using Apache Arrow and DataFusion-inspired query execution.
Supported SQL Features:
SELECT- Column projectionWHERE- Filtering with predicatesJOIN- Inner joinsGROUP BY- AggregationORDER BY- SortingLIMIT- Result limiting
Executing SQL
execute_sql(sql: &str) -> DbxResult<RecordBatch>
Executes a SQL query and returns results as an Arrow RecordBatch.
Parameters:
sql- SQL query string
Returns:
DbxResult<RecordBatch>- Query results in Arrow format
Example:
let result = db.execute_sql("SELECT name, age FROM users WHERE age > 18")?;
SQL Syntax
SELECT Statement
Basic SELECT:
SELECT column1, column2 FROM table_name
Example:
let result = db.execute_sql("SELECT name, email FROM users")?;
WHERE Clause
Filtering:
SELECT * FROM table_name WHERE condition
Supported Operators:
=- Equal!=or<>- Not equal>- Greater than>=- Greater than or equal<- Less than<=- Less than or equalAND- Logical ANDOR- Logical ORNOT- Logical NOT
Example:
let result = db.execute_sql(
"SELECT name FROM users WHERE age >= 18 AND city = 'Seoul'"
)?;
JOIN Clause
Inner Join:
SELECT t1.col1, t2.col2
FROM table1 t1
JOIN table2 t2 ON t1.id = t2.user_id
Example:
let result = db.execute_sql(
"SELECT u.name, o.amount
FROM users u
JOIN orders o ON u.id = o.user_id"
)?;
GROUP BY Clause
Aggregation:
SELECT column, AGG_FUNC(column)
FROM table_name
GROUP BY column
Supported Aggregate Functions:
COUNT(*)- Count rowsSUM(column)- Sum valuesAVG(column)- Average valuesMIN(column)- Minimum valueMAX(column)- Maximum value
Example:
let result = db.execute_sql(
"SELECT city, COUNT(*), AVG(age)
FROM users
GROUP BY city"
)?;
ORDER BY Clause
Sorting:
SELECT * FROM table_name ORDER BY column [ASC|DESC]
Example:
let result = db.execute_sql(
"SELECT name, age FROM users ORDER BY age DESC"
)?;
LIMIT Clause
Result Limiting:
SELECT * FROM table_name LIMIT n
Example:
let result = db.execute_sql(
"SELECT name FROM users ORDER BY created_at DESC LIMIT 10"
)?;
Working with Results
RecordBatch
Results are returned as Arrow RecordBatch, which provides:
- Zero-copy access to data
- Columnar format for efficient processing
- Type safety with Arrow schema
Example:
use arrow::array::StringArray;
let result = db.execute_sql("SELECT name FROM users")?;
// Access column
let name_col = result
.column(0)
.as_any()
.downcast_ref::<StringArray>()
.unwrap();
// Iterate rows
for i in 0..result.num_rows() {
println!("Name: {}", name_col.value(i));
}
Converting to Rust Types
Example: Extract to Vec:
use arrow::array::{Int64Array, StringArray};
let result = db.execute_sql("SELECT id, name FROM users")?;
let ids = result.column(0).as_any().downcast_ref::<Int64Array>().unwrap();
let names = result.column(1).as_any().downcast_ref::<StringArray>().unwrap();
let users: Vec<(i64, String)> = (0..result.num_rows())
.map(|i| (ids.value(i), names.value(i).to_string()))
.collect();
Query Optimization
Automatic Optimizations
DBX automatically applies:
- Projection Pushdown - Only read required columns
- Predicate Pushdown - Filter data early
- Column Pruning - Skip unnecessary columns
- Vectorized Execution - SIMD operations
Example:
-- Only reads 'age' column, filters early
SELECT COUNT(*) FROM users WHERE age > 18
GPU Acceleration
For large datasets (>1M rows), DBX can use GPU acceleration:
// GPU is automatically used for supported operations
let result = db.execute_sql(
"SELECT SUM(amount) FROM transactions WHERE amount > 500000"
)?;
GPU-Accelerated Operations:
SUM,AVG,MIN,MAXWHEREwith numeric predicates- Large table scans (>1M rows)
Performance Tips
Best Practices
- Use column projection - Select only needed columns
-- ✅ Good SELECT name, age FROM users -- ❌ Bad SELECT * FROM users - Filter early - Use WHERE clause to reduce data
-- ✅ Good SELECT name FROM users WHERE age > 18 -- ❌ Bad (filter in application) SELECT name, age FROM users - Use indexes - Create indexes for frequently queried columns
db.create_index("users", "age")?; - Batch queries - Combine multiple queries when possible
-- ✅ Good SELECT city, COUNT(*), AVG(age) FROM users GROUP BY city -- ❌ Bad (multiple queries) -- SELECT COUNT(*) FROM users WHERE city = 'Seoul' -- SELECT AVG(age) FROM users WHERE city = 'Seoul'
SQL Examples
Example 1: User Analytics
let result = db.execute_sql(
"SELECT
city,
COUNT(*) as user_count,
AVG(age) as avg_age,
MAX(created_at) as last_signup
FROM users
WHERE active = true
GROUP BY city
ORDER BY user_count DESC
LIMIT 10"
)?;
Example 2: Sales Report
let result = db.execute_sql(
"SELECT
p.name as product_name,
SUM(o.quantity) as total_sold,
SUM(o.amount) as total_revenue
FROM orders o
JOIN products p ON o.product_id = p.id
WHERE o.created_at >= '2026-01-01'
GROUP BY p.name
ORDER BY total_revenue DESC"
)?;
Example 3: Top Customers
let result = db.execute_sql(
"SELECT
u.name,
COUNT(o.id) as order_count,
SUM(o.amount) as total_spent
FROM users u
JOIN orders o ON u.id = o.user_id
GROUP BY u.name
HAVING total_spent > 10000
ORDER BY total_spent DESC
LIMIT 20"
)?;
Error Handling
SQL Errors
DbxError::SqlParse- SQL syntax errorDbxError::SqlExecution- Query execution errorDbxError::ColumnNotFound- Referenced column doesn’t existDbxError::TableNotFound- Referenced table doesn’t exist
Example:
match db.execute_sql("SELECT * FROM nonexistent_table") {
Ok(result) => println!("Success"),
Err(DbxError::TableNotFound(table)) => {
eprintln!("Table not found: {}", table);
}
Err(e) => eprintln!("Error: {}", e),
}
Limitations
Current Limitations
- ❌ No
UPDATEorDELETEvia SQL (use CRUD API) - ❌ No subqueries
- ❌ No window functions
- ❌ No
UNIONorINTERSECT - ❌ No
LEFT JOINorRIGHT JOIN(onlyINNER JOIN)
Workarounds
UPDATE via CRUD:
// Instead of: UPDATE users SET age = 30 WHERE id = 1
db.insert("users", b"user:1", b"30")?;
DELETE via CRUD:
// Instead of: DELETE FROM users WHERE id = 1
db.delete("users", b"user:1")?;
See Also
- Database API - Core database operations
- SQL Reference Guide - Detailed SQL syntax
- GPU Acceleration Guide - GPU query optimization