# JSONite: High-Performance Embedded Database for Semi-Structured Data

---

## The JSON Performance Crisis

**JSON is Everywhere:**
- Web APIs, IoT, logs, configurations
- Semi-structured, flexible, human-readable

**But Current Solutions Fail:**
- **Large Databases**: MongoDB and PostgreSQL's JSONB store JSON natively, but require running a database server
- **Embedded Databases**: RocksDB and PoloDB lack ACID and SQL support
- **Serialization to String**: JSON is serialized to text and stored in a SQLite column

Serialized JSON in SQL, for example:

```sql
INSERT INTO http_request_log (ip, headers) VALUES ('127.0.0.1', '{
  "Content-Type": "application/octet-stream",
  "X-Forwarded-For": "100.64.0.1"
}');
```

---

## Introducing JSONite

**Best of Both Worlds:**
- Built on SQLite
- Native JSON optimization

**Key Advantages:**
- ✅ ACID compliance
- ✅ SQL simplicity
- ✅ Serverless C library
- ✅ Lightning-fast JSON access

---

## Smart Key Optimization

**Key Sorting by Length:**

```
{
  "id": 1,
  "address": {...},
  "name": "John",
  "email": "john@example.com"
}
```

**Sorted as:**

```
{
  "id",      (2 chars)
  "name",    (4 chars)
  "email",   (5 chars)
  "address", (7 chars)
}
```

**Binary search on length → Fast lookups**

---

## Handling Massive Data: Smart TOAST

**The Oversized-Attribute Storage Technique**
- Standard approach: values are split at arbitrary byte boundaries
- JSONite's innovation: **Data-Type Aware TOAST**

**Intelligent Chunking:**
- Arrays split between elements
- Objects split between key-value pairs
- Text falls back to fixed-size chunks

**Enables "Slice Detoasting":**
- `$.logs[1000000:1000010]` fetches only 10 elements
- Not the entire multi-gigabyte array

Smart chunking example:

```json
{
  "id": 1,
  "title": "some text",
  "html": "...",
  "photos": [],
  "crawl_logs": []
}
```

---

## Query Power

**Full SQL + JSON Support:**
- PostgreSQL-compatible JSONB path operators
- GIN indexes for fast containment search

```sql
SELECT * FROM accounts WHERE data @> '{"status": "active"}';
```

---

## Performance Validation: Benchmark Datasets

**Three Specialized Workloads:**

1. **YCSB-Style Read Benchmark**
   - Modeled on the Yahoo! Cloud Serving Benchmark
   - 1M JSON documents (1 KB to 100 KB each)
2. **TPC-C Inspired Update Benchmark**
   - Modeled on the Transaction Processing Performance Council's TPC-C
   - 100K transactional JSON records
   - Frequent small field updates
3. **Large-Array Slice Benchmark**
   - Multi-gigabyte JSON documents
   - Massive arrays (10M+ elements)

**Comparison Targets:** SQLite JSONB vs MongoDB vs PostgreSQL vs JSONite

---

## JSONite: The Future of Embedded Data Storage

**Why It Matters Today:**
- **Edge Computing**: Lightweight, handles sensor data efficiently
- **Modern Apps**: SQL power + JSON flexibility, no schema migrations

**The Vision:**
- Open source implementation
- Community-driven development
- Becoming the default choice for embedded JSON storage
- Bridging SQL reliability with NoSQL flexibility

---

## Thank You

**Questions?**

*CHEN Yongyuan*
*2025-11-01*
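
---

## Appendix: Key Lookup Sketch

The "Smart Key Optimization" slide sorts object keys by length and binary-searches on that length. A minimal Python sketch of the idea follows. The function names, the in-memory list layout, and the tie-break by key text within one length are assumptions for illustration, not JSONite's actual C implementation or on-disk format.

```python
from bisect import bisect_left, bisect_right


def sort_keys_by_length(obj):
    """Return (key, value) pairs ordered by key length.

    Ties within one length are ordered by key text (an assumption;
    the slides only specify sorting by length).
    """
    return sorted(obj.items(), key=lambda kv: (len(kv[0]), kv[0]))


def lookup(entries, key):
    """Find `key` in entries produced by sort_keys_by_length."""
    # A real binary layout would keep lengths in a header; here we rebuild them.
    lengths = [len(k) for k, _ in entries]
    # Binary search narrows the candidates to keys of the same length.
    lo = bisect_left(lengths, len(key))
    hi = bisect_right(lengths, len(key))
    # Only that (usually short) run is compared character by character.
    for k, v in entries[lo:hi]:
        if k == key:
            return v
    return None


# Sample document with made-up values, mirroring the slide's key names.
doc = {"id": 1, "address": {"city": "Springfield"},
       "name": "John", "email": "john@example.com"}
entries = sort_keys_by_length(doc)
print([k for k, _ in entries])
print(lookup(entries, "email"))
```

Comparing lengths is cheaper than comparing strings, so most non-matching keys are rejected without touching their bytes.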
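
---

## Appendix: Slice Detoasting Sketch

The "Smart TOAST" slide splits arrays between elements so that a slice such as `$.logs[1000000:1000010]` reads only the chunks it overlaps. The Python sketch below illustrates that idea under stated assumptions: `chunk_array`, `slice_detoast`, the byte budget, and the `(start_index, elements)` chunk layout are all hypothetical and do not describe JSONite's real storage format.

```python
import json
from bisect import bisect_right


def chunk_array(elements, max_chunk_bytes):
    """Split a list into chunks, cutting only at element boundaries.

    Returns a list of (start_index, elements_in_chunk) tuples.
    """
    chunks, current, current_size, start = [], [], 0, 0
    for i, el in enumerate(elements):
        size = len(json.dumps(el))  # serialized size of this element
        if current and current_size + size > max_chunk_bytes:
            chunks.append((start, current))
            current, current_size, start = [], 0, i
        current.append(el)
        current_size += size
    if current:
        chunks.append((start, current))
    return chunks


def slice_detoast(chunks, lo, hi):
    """Return (elements[lo:hi], chunks_read), reading only overlapping chunks."""
    starts = [s for s, _ in chunks]
    # Binary search for the chunk that contains index `lo`.
    first = max(bisect_right(starts, lo) - 1, 0)
    out, touched = [], 0
    for start, items in chunks[first:]:
        if start >= hi:
            break  # every later chunk lies past the requested range
        touched += 1
        out.extend(items[max(lo - start, 0):hi - start])
    return out, touched


# Made-up log entries; each serializes to exactly 10 bytes.
logs = [f"log-{i:04d}" for i in range(1000)]
chunks = chunk_array(logs, max_chunk_bytes=100)
values, chunks_read = slice_detoast(chunks, 505, 515)
print(len(chunks), chunks_read, values[0], values[-1])
```

Because every chunk starts on an element boundary, the reader never has to reassemble a value that straddles two chunks.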