Storage Layers

Deep dive into DBX’s 5-Tier Hybrid Storage architecture.

Table of contents

  1. Overview
  2. Tier 1: Delta Store
    1. Purpose
    2. Implementation
    3. Characteristics
    4. Usage
    5. Performance
  3. Tier 2: Columnar Cache
    1. Purpose
    2. Implementation
    3. Characteristics
    4. Usage
    5. Optimizations
      1. Projection Pushdown
      2. Predicate Pushdown
      3. Vectorized Execution
  4. Tier 3: WOS (Write-Optimized Store)
    1. Purpose
    2. Implementation
    3. Characteristics
    4. MVCC Versioning
    5. Usage
  5. Tier 4: Index (Bloom Filter)
    1. Purpose
    2. Implementation
    3. Characteristics
    4. Usage
    5. Performance Impact
  6. Tier 5: ROS (Read-Optimized Store)
    1. Purpose
    2. Implementation
    3. Characteristics
    4. Compression Ratios
    5. Usage
    6. Parquet Features
      1. Column Pruning
      2. Predicate Pushdown
  7. Data Flow
    1. Write Path
    2. Read Path (OLTP)
    3. Read Path (OLAP)
  8. Configuration
    1. Delta Store Settings
    2. Compaction Settings
  9. Performance Tuning
    1. Hot Data in Delta Store
    2. Batch Compaction
    3. Columnar Cache Warming
  10. Monitoring
    1. Storage Statistics
  11. Best Practices
    1. 1. Flush Strategically
    2. 2. Compact Regularly
    3. 3. Use Columnar Cache for Analytics
  12. Next Steps

Overview

DBX uses a sophisticated 5-Tier Hybrid Storage architecture designed to optimize both OLTP (transactional) and OLAP (analytical) workloads.

┌─────────────────────────────────────────┐
│  Tier 1: Delta Store (BTreeMap)         │  ← In-memory write buffer
│     - Lock-free concurrency             │
│     - Hot data caching                  │
└─────────────────┬───────────────────────┘
                  │ Flush
┌─────────────────▼───────────────────────┐
│  Tier 2: Columnar Cache (Arrow)         │  ← OLAP optimization
│     - RecordBatch caching               │
│     - Projection Pushdown               │
└─────────────────┬───────────────────────┘
                  │
┌─────────────────▼───────────────────────┐
│  Tier 3: WOS (sled)                     │  ← Persistent storage
│     - Write-Optimized Store             │
│     - MVCC with Snapshot Isolation      │
└─────────────────┬───────────────────────┘
                  │ Compaction
┌─────────────────▼───────────────────────┐
│  Tier 4: Index (Bloom Filter)           │  ← Fast existence check
│     - Minimize false positives          │
└─────────────────┬───────────────────────┘
                  │
┌─────────────────▼───────────────────────┐
│  Tier 5: ROS (Parquet)                  │  ← Columnar compression
│     - Read-Optimized Store              │
│     - Apache Arrow/Parquet              │
└─────────────────────────────────────────┘

Tier 1: Delta Store

Purpose

The Delta Store is an in-memory write buffer that provides:

  • Ultra-fast writes
  • Hot data caching
  • Lock-free concurrent reads
  • Automatic flushing to persistent storage

Implementation

// Internal structure (conceptual)
struct DeltaStore {
    data: BTreeMap<Vec<u8>, Vec<u8>>,
    flush_threshold: usize,
}
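
The conceptual struct can be fleshed out into a self-contained, std-only sketch of the flush-on-threshold behavior. All names and method signatures here are hypothetical illustrations, not DBX internals:

```rust
use std::collections::BTreeMap;

// Hypothetical sketch of a flush-on-threshold write buffer.
struct DeltaStore {
    data: BTreeMap<Vec<u8>, Vec<u8>>,
    flush_threshold: usize,
}

impl DeltaStore {
    fn new(flush_threshold: usize) -> Self {
        DeltaStore { data: BTreeMap::new(), flush_threshold }
    }

    /// Buffer a write; returns true once the threshold is reached
    /// and the buffer should be flushed to the next tier.
    fn insert(&mut self, key: &[u8], value: &[u8]) -> bool {
        self.data.insert(key.to_vec(), value.to_vec());
        self.data.len() >= self.flush_threshold
    }

    /// Hot-path read straight from the in-memory buffer.
    fn get(&self, key: &[u8]) -> Option<&[u8]> {
        self.data.get(key).map(|v| v.as_slice())
    }

    /// Drain the buffer, handing its contents to the next tier (WOS).
    fn flush(&mut self) -> BTreeMap<Vec<u8>, Vec<u8>> {
        std::mem::take(&mut self.data)
    }
}
```

In the real engine the drained map would be written to the WOS; here `flush` simply returns it.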

Characteristics

Feature         Details
--------------  --------------------------------------
Data Structure  BTreeMap<Vec<u8>, Vec<u8>>
Concurrency     Lock-free reads, synchronized writes
Capacity        Configurable (default: 10,000 records)
Flush Trigger   Size threshold or manual flush
Persistence     Volatile (lost on crash without WAL)

Usage

use dbx_core::Database;

fn main() -> dbx_core::DbxResult<()> {
    let db = Database::open("./data")?;
    
    // Writes go to Delta Store first
    db.insert("users", b"user:1", b"Alice")?;
    
    // Reads check Delta Store first (fastest path)
    let value = db.get("users", b"user:1")?;
    
    // Manual flush to WOS
    db.flush_delta("users")?;
    
    Ok(())
}

Performance

  • Write: O(log n) - BTreeMap insertion
  • Read: O(log n) - BTreeMap lookup
  • Memory: ~100 bytes per record (overhead)

Tier 2: Columnar Cache

Purpose

The Columnar Cache provides OLAP optimization through:

  • Apache Arrow RecordBatch caching
  • Columnar data layout
  • Vectorized execution (SIMD)
  • Zero-copy operations

Implementation

// Internal structure (conceptual)
struct ColumnarCache {
    batches: Vec<RecordBatch>,
    schema: Arc<Schema>,
    last_sync: Instant,
}

Characteristics

Feature     Details
----------  ----------------------------
Format      Apache Arrow RecordBatch
Layout      Columnar (column-oriented)
Operations  Vectorized SIMD
Sync        Delta Store → Columnar Cache
Use Case    Analytical queries (SQL)

Usage

use dbx_core::Database;
use arrow::array::{Int32Array, StringArray};
use arrow::record_batch::RecordBatch;
use arrow::datatypes::{DataType, Field, Schema};
use std::sync::Arc;

fn main() -> dbx_core::DbxResult<()> {
    let db = Database::open_in_memory()?;
    
    // Create schema
    let schema = Arc::new(Schema::new(vec![
        Field::new("id", DataType::Int32, false),
        Field::new("name", DataType::Utf8, false),
    ]));
    
    // Create RecordBatch
    let batch = RecordBatch::try_new(
        schema.clone(),
        vec![
            Arc::new(Int32Array::from(vec![1, 2, 3])),
            Arc::new(StringArray::from(vec!["Alice", "Bob", "Charlie"])),
        ],
    ).unwrap();
    
    // Register table (goes to Columnar Cache)
    db.register_table("users", vec![batch])?;
    
    // SQL queries use Columnar Cache
    let results = db.execute_sql("SELECT * FROM users WHERE id > 1")?;
    
    Ok(())
}

Optimizations

Projection Pushdown

Only read required columns:

// Only reads 'name' column from cache
let results = db.execute_sql("SELECT name FROM users")?;

Predicate Pushdown

Filter during scan:

// Filter applied at cache level
let results = db.execute_sql("SELECT * FROM users WHERE id > 100")?;

Vectorized Execution

Process multiple rows simultaneously using SIMD:

// Vectorized aggregation
let results = db.execute_sql("SELECT SUM(amount) FROM orders")?;
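
The reason columnar layout enables SIMD can be seen in miniature: with row structs, the `amount` values are interleaved with other fields, while a contiguous column gives the compiler a tight loop it can auto-vectorize. A hypothetical, std-only illustration (not DBX code):

```rust
// Row layout: one struct per record; a scan of `amount` must stride
// over the other fields in memory.
struct OrderRow {
    id: u32,
    amount: i64,
}

fn sum_row_layout(rows: &[OrderRow]) -> i64 {
    rows.iter().map(|r| r.amount).sum()
}

// Columnar layout: one contiguous slice per column. This tight loop
// over a single slice is the shape compilers auto-vectorize with SIMD.
fn sum_column_layout(amounts: &[i64]) -> i64 {
    amounts.iter().sum()
}
```

Both functions compute the same sum; the columnar version simply hands the hardware friendlier memory access.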

Tier 3: WOS (Write-Optimized Store)

Purpose

WOS provides persistent transactional storage:

  • ACID guarantees
  • MVCC with Snapshot Isolation
  • Crash recovery
  • Compaction

Implementation

Built on sled embedded database:

// Internal structure (conceptual)
struct WOS {
    db: sled::Db,
    mvcc: MvccManager,
}

Characteristics

Feature       Details
------------  ------------------------
Backend       sled (embedded KV store)
Persistence   Disk-based
Transactions  MVCC Snapshot Isolation
Durability    fsync on commit
Compaction    Automatic background

MVCC Versioning

Each record stores multiple versions:

// Key format: table:key:version
// Value: serialized data + metadata
struct VersionedValue {
    value: Vec<u8>,
    version: u64,
    deleted: bool,
}
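
To make the versioning concrete, here is a minimal, std-only sketch of snapshot reads over such versioned values (names hypothetical, not DBX's actual layout): a reader at snapshot version v sees the newest version ≤ v, unless that version is a delete marker.

```rust
use std::collections::BTreeMap;

struct VersionedValue {
    value: Vec<u8>,
    deleted: bool,
}

// Hypothetical sketch: (key, version) composite keys, so BTreeMap
// ordering groups all versions of a key together.
struct MvccTable {
    data: BTreeMap<(Vec<u8>, u64), VersionedValue>,
}

impl MvccTable {
    fn new() -> Self {
        MvccTable { data: BTreeMap::new() }
    }

    fn put(&mut self, key: &[u8], version: u64, value: &[u8]) {
        self.data.insert(
            (key.to_vec(), version),
            VersionedValue { value: value.to_vec(), deleted: false },
        );
    }

    /// Deletes are just a new version carrying a tombstone marker.
    fn delete(&mut self, key: &[u8], version: u64) {
        self.data.insert(
            (key.to_vec(), version),
            VersionedValue { value: Vec::new(), deleted: true },
        );
    }

    /// Read `key` as of snapshot `snapshot`: newest version <= snapshot.
    fn get(&self, key: &[u8], snapshot: u64) -> Option<&[u8]> {
        self.data
            .range((key.to_vec(), 0)..=(key.to_vec(), snapshot))
            .next_back()
            .and_then(|(_, v)| if v.deleted { None } else { Some(v.value.as_slice()) })
    }
}
```

Older snapshots keep seeing their version even after later writes or deletes, which is the essence of snapshot isolation.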

Usage

use dbx_core::Database;

fn main() -> dbx_core::DbxResult<()> {
    let db = Database::open("./data")?;
    
    // Transaction writes go to WOS
    let tx = db.begin_transaction()?;
    tx.insert("users", b"user:1", b"Alice")?;
    tx.commit()?; // Persisted to WOS
    
    Ok(())
}

Tier 4: Index (Bloom Filter)

Purpose

Bloom Filter provides fast existence checks:

  • No false negatives, with a low, configurable false-positive rate
  • Reduce unnecessary ROS reads
  • Space-efficient

Implementation

// Internal structure (conceptual)
struct BloomIndex {
    filter: BloomFilter,
    false_positive_rate: f64,
}

Characteristics

Feature          Details
---------------  ----------------------------
Type             Probabilistic data structure
False Positives  Possible (configurable rate)
False Negatives  Never
Space            ~10 bits per element
Lookup           O(1)
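
The "no false negatives" property is easy to see in a minimal sketch: inserting a key sets all of its probe bits, so a lookup can only fail for keys that were never inserted. This std-only toy (hypothetical names; k probe positions derived by hashing `(key, i)`) is an illustration, not DBX's actual filter:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

struct BloomFilter {
    bits: Vec<bool>,
    k: u64, // number of hash probes per key
}

// Derive the i-th probe position for a key within an m-bit array.
fn probe(key: &[u8], i: u64, m: usize) -> usize {
    let mut h = DefaultHasher::new();
    (key, i).hash(&mut h);
    (h.finish() as usize) % m
}

impl BloomFilter {
    fn new(m: usize, k: u64) -> Self {
        BloomFilter { bits: vec![false; m], k }
    }

    fn insert(&mut self, key: &[u8]) {
        for i in 0..self.k {
            let idx = probe(key, i, self.bits.len());
            self.bits[idx] = true;
        }
    }

    /// `false` means the key is definitely absent, so the ROS read
    /// can be skipped. `true` means "possibly present".
    fn might_contain(&self, key: &[u8]) -> bool {
        (0..self.k).all(|i| self.bits[probe(key, i, self.bits.len())])
    }
}
```

The false-positive rate is tuned by choosing the bit-array size and probe count; around 10 bits per element yields roughly a 1% rate.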

Usage

use dbx_core::Database;

fn main() -> dbx_core::DbxResult<()> {
    let db = Database::open("./data")?;
    
    // Bloom filter checked before ROS lookup
    let value = db.get("users", b"user:1")?;
    
    // Rebuild index after bulk operations
    db.rebuild_index("users")?;
    
    Ok(())
}

Performance Impact

Without Bloom Filter:
  Read miss: Check Delta → WOS → ROS (slow)

With Bloom Filter:
  Read miss: Check Delta → WOS → Bloom → (skip ROS if not present)
  Speedup: ~3-5x for non-existent keys

Tier 5: ROS (Read-Optimized Store)

Purpose

ROS provides long-term columnar storage:

  • Apache Parquet format
  • High compression ratios
  • Efficient analytical scans
  • Schema evolution

Implementation

// Internal structure (conceptual)
struct ROS {
    parquet_files: Vec<PathBuf>,
    metadata: ParquetMetadata,
}

Characteristics

Feature      Details
-----------  ----------------------
Format       Apache Parquet
Compression  ZSTD, Snappy, Brotli
Layout       Columnar
Encoding     Dictionary, RLE, Delta
Statistics   Min/Max, Null count

Compression Ratios

Data Type   Typical Compression
----------  -------------------
Integers    5-10x
Strings     3-5x
Timestamps  10-20x
Floats      2-4x
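
One reason timestamps compress so well is delta encoding, one of the encodings listed above: regularly spaced values collapse to a near-constant stride, which dictionary/RLE stages then shrink dramatically. A simplified, std-only sketch of the idea (real Parquet delta encoding also bit-packs the deltas):

```rust
// Store each value as its difference from the previous one.
fn delta_encode(values: &[i64]) -> Vec<i64> {
    let mut out = Vec::with_capacity(values.len());
    let mut prev = 0i64;
    for &v in values {
        out.push(v - prev);
        prev = v;
    }
    out
}

// Rebuild the original values by running a prefix sum over the deltas.
fn delta_decode(deltas: &[i64]) -> Vec<i64> {
    let mut out = Vec::with_capacity(deltas.len());
    let mut acc = 0i64;
    for &d in deltas {
        acc += d;
        out.push(acc);
    }
    out
}
```

Timestamps sampled once per second encode as one large value followed by a run of 1s, which is trivially compressible.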

Usage

use dbx_core::Database;

fn main() -> dbx_core::DbxResult<()> {
    let db = Database::open("./data")?;
    
    // Compact WOS to ROS
    db.compact("users")?;
    
    // Reads from ROS (if not in Delta/WOS)
    let value = db.get("users", b"user:1")?;
    
    Ok(())
}

Parquet Features

Column Pruning

// Only reads 'name' column from Parquet
let results = db.execute_sql("SELECT name FROM users")?;

Predicate Pushdown

// Uses Parquet statistics to skip row groups
let results = db.execute_sql("SELECT * FROM users WHERE age > 30")?;

Data Flow

Write Path

Application
    ↓
1. Delta Store (in-memory)
    ↓ (auto-flush on threshold)
2. WOS (persistent, transactional)
    ↓ (manual/auto compaction)
3. ROS (compressed, columnar)

Example:

use dbx_core::Database;

fn main() -> dbx_core::DbxResult<()> {
    let db = Database::open("./data")?;
    
    // 1. Write to Delta Store
    db.insert("users", b"user:1", b"Alice")?;
    
    // 2. Flush to WOS
    db.flush_delta("users")?;
    
    // 3. Compact to ROS
    db.compact("users")?;
    
    Ok(())
}

Read Path (OLTP)

Application
    ↓
1. Check Delta Store → if found, return
    ↓
2. Check WOS → if found, return
    ↓
3. Check Bloom Filter → if not present, return None
    ↓
4. Read from ROS → return
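
The steps above can be sketched as a single tiered lookup. This std-only illustration models the tiers as plain maps and uses a closure in place of the real Bloom filter probe (all names hypothetical):

```rust
use std::collections::BTreeMap;

/// Hypothetical sketch of the OLTP read path across the tiers.
fn tiered_get<'a>(
    delta: &'a BTreeMap<Vec<u8>, Vec<u8>>,
    wos: &'a BTreeMap<Vec<u8>, Vec<u8>>,
    ros: &'a BTreeMap<Vec<u8>, Vec<u8>>,
    ros_might_contain: impl Fn(&[u8]) -> bool,
    key: &[u8],
) -> Option<&'a [u8]> {
    // 1-2. Hot tiers first: Delta Store, then WOS.
    if let Some(v) = delta.get(key).or_else(|| wos.get(key)) {
        return Some(v.as_slice());
    }
    // 3. Bloom probe: a negative answer is definitive, so skip the ROS read.
    if !ros_might_contain(key) {
        return None;
    }
    // 4. Fall through to the read-optimized store.
    ros.get(key).map(|v| v.as_slice())
}
```

The payoff is in step 3: for keys that exist nowhere, the expensive ROS read is skipped entirely, which is where the ~3-5x miss-path speedup comes from.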

Read Path (OLAP)

Application (SQL query)
    ↓
1. Query Optimizer
    ↓
2. Columnar Cache (if cached)
    ↓
3. Sync Delta Store to Cache
    ↓
4. Vectorized Execution (SIMD)
    ↓
5. Optional: GPU Acceleration
    ↓
Results

Configuration

Delta Store Settings

use dbx_core::{Database, Config};

fn main() -> dbx_core::DbxResult<()> {
    let config = Config::default()
        .delta_flush_threshold(10000)  // Flush after 10k records
        .delta_flush_interval(60);     // Or after 60 seconds
    
    let db = Database::open_with_config("./data", config)?;
    
    Ok(())
}

Compaction Settings

let config = Config::default()
    .compaction_threshold(100_000)  // Compact after 100k records
    .compaction_interval(3600);     // Or after 1 hour

let db = Database::open_with_config("./data", config)?;

Performance Tuning

Hot Data in Delta Store

Keep frequently accessed data in memory:

// Hot data stays in Delta Store
for i in 0..1000 {
    let key = format!("hot:key:{}", i);
    db.insert("cache", key.as_bytes(), b"value")?;
}
// Don't flush immediately

Batch Compaction

Compact multiple tables together:

db.compact("users")?;
db.compact("orders")?;
db.compact("products")?;

Columnar Cache Warming

Pre-load cache for analytical queries:

// Register frequently queried tables
db.register_table("users", batches)?;
db.register_table("orders", order_batches)?;

Monitoring

Storage Statistics

use dbx_core::Database;

fn main() -> dbx_core::DbxResult<()> {
    let db = Database::open("./data")?;
    
    // Get storage stats
    let stats = db.storage_stats("users")?;
    
    println!("Delta Store: {} records", stats.delta_count);
    println!("WOS: {} records", stats.wos_count);
    println!("ROS: {} files, {} MB", stats.ros_files, stats.ros_size_mb);
    
    Ok(())
}

Best Practices

1. Flush Strategically

// Good: Flush after batch operations
for i in 0..10000 {
    db.insert("users", format!("user:{}", i).as_bytes(), b"data")?;
}
db.flush_delta("users")?;

// Avoid: Flush after every write
db.insert("users", b"user:1", b"data")?;
db.flush_delta("users")?; // Too frequent

2. Compact Regularly

// Schedule compaction during low-traffic periods
if is_low_traffic_time() {
    db.compact("users")?;
}

3. Use Columnar Cache for Analytics

// Register tables for SQL queries
db.register_table("users", user_batches)?;
db.register_table("orders", order_batches)?;

// Now SQL queries are fast
let results = db.execute_sql(
    "SELECT u.name, COUNT(o.id) 
     FROM users u 
     JOIN orders o ON u.id = o.user_id 
     GROUP BY u.name"
)?;

Next Steps


Copyright © 2026 ByteLogicCore. MIT OR Apache-2.0 License.
