# JSONite: High-Performance Embedded Database for Semi-Structured Data

---

## The JSON Performance Crisis

**JSON Is Everywhere:**
- Web APIs, IoT, logs, configurations
- Semi-structured, flexible, human-readable

**But Current Solutions Fail:**
- **Large Databases**: People store JSON in MongoDB or PostgreSQL's JSONB, which means running a full database server
- **Embedded Databases**: RocksDB and PoloDB lack ACID and SQL support
- **Serialization to String**: Or people serialize JSON into strings and store them in SQLite

Example: JSON serialized into a string and stored through SQL

```sql
-- The whole JSON document is stored as one opaque text value.
INSERT INTO http_request_log (ip, headers)
VALUES ('127.0.0.1', '{
  "Content-Type": "application/octet-stream",
  "X-Forwarded-For": "100.64.0.1"
}');
```

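
To make the cost of the serialized-string approach concrete, here is a minimal Python sketch using only the standard `sqlite3` and `json` modules. The table and column names come from the SQL example above; the column types, the in-memory database, and the `read_header` helper are assumptions made for illustration. Reading a single header means fetching and parsing the entire string first.

```python
import json
import sqlite3

# In-memory database; the schema is an assumption based on the INSERT above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE http_request_log (ip TEXT, headers TEXT)")

headers = {
    "Content-Type": "application/octet-stream",
    "X-Forwarded-For": "100.64.0.1",
}

# The document is flattened to a string before it reaches SQLite.
conn.execute(
    "INSERT INTO http_request_log (ip, headers) VALUES (?, ?)",
    ("127.0.0.1", json.dumps(headers)),
)


def read_header(conn, ip, name):
    """Return one header value (hypothetical helper).

    The full serialized document is fetched and parsed even though
    only one field is needed.
    """
    row = conn.execute(
        "SELECT headers FROM http_request_log WHERE ip = ?", (ip,)
    ).fetchone()
    if row is None:
        return None
    return json.loads(row[0]).get(name)


forwarded_for = read_header(conn, "127.0.0.1", "X-Forwarded-For")
print(forwarded_for)
```
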
---

## Introducing JSONite

**Best of Both Worlds:**
- SQLite-based
- Native JSON optimization

**Key Advantages:**
- ✅ ACID compliance
- ✅ SQL simplicity
- ✅ Serverless C library
- ✅ Lightning-fast JSON access

---

## Smart Key Optimization

**Key Sorting by Length:**
```
{
  "id": 1,
  "address": {...},
  "name": "John",
  "email": "john@example.com"
}
```

**Sorted as:**
```
{
  "id",      (2 chars)
  "name",    (4 chars)
  "email",   (5 chars)
  "address"  (7 chars)
}
```

**Binary search on length → Fast lookups**

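
The lookup idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not JSONite's actual layout: the `SortedObject` class and `sort_key` helper are hypothetical, key length is measured in UTF-8 bytes, and keys of equal length are ordered by their bytes so that binary search lands on a unique position.

```python
from bisect import bisect_left


def sort_key(name):
    """Order keys by byte length first, then by raw bytes (tie-break is an assumption)."""
    raw = name.encode("utf-8")
    return (len(raw), raw)


class SortedObject:
    """Hypothetical in-memory model of an object whose keys are stored length-sorted."""

    def __init__(self, obj):
        items = sorted(obj.items(), key=lambda kv: sort_key(kv[0]))
        self.order = [sort_key(k) for k, _ in items]  # searchable (length, bytes) pairs
        self.keys = [k for k, _ in items]
        self.values = [v for _, v in items]

    def get(self, name, default=None):
        """Binary-search the sorted keys: O(log n) comparisons, most decided by length alone."""
        probe = sort_key(name)
        i = bisect_left(self.order, probe)
        if i < len(self.order) and self.order[i] == probe:
            return self.values[i]
        return default


doc = SortedObject(
    {"id": 1, "address": {}, "name": "John", "email": "john@example.com"}
)
print(doc.keys)
print(doc.get("email"))
```
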
---

## Handling Massive Data: Smart TOAST

**The Oversized-Attribute Storage Technique**
- Standard approach: chunking at arbitrary byte offsets
- JSONite's innovation: **Data-Type Aware TOAST**

**Intelligent Chunking:**
- Arrays split between elements
- Objects split between key-value pairs
- Text falls back to fixed chunks

**Enables "Slice Detoasting":**
- `$.logs[1000000:1000010]` fetches only 10 elements
- Not the entire multi-gigabyte array

Smart Chunking Example

```json
{
  "id": 1,
  "title": "some text",
  "html": <pointer to TOAST of 200k text>,
  "photos": [<pointer to TOAST of binary data>],
  "crawl_logs": [<pointer to TOAST of array of texts>]
}
```

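
The following Python sketch illustrates why splitting arrays between elements makes slice detoasting possible. It is a simplified model, not JSONite's storage format: the `chunk_array` and `slice_detoast` functions, the byte budget, and the per-chunk start index are all assumptions for illustration, and negative indices are not supported. Because every chunk starts on an element boundary and records the index of its first element, a slice only needs the chunks that overlap the requested range.

```python
import json
from bisect import bisect_right


def chunk_array(elements, max_chunk_bytes):
    """Split a list into chunks at element boundaries.

    Returns (chunks, starts), where starts[i] is the array index of the
    first element stored in chunks[i].
    """
    chunks, starts = [], []
    current, size = [], 0
    for index, element in enumerate(elements):
        encoded = len(json.dumps(element).encode("utf-8"))
        # Close the current chunk rather than cutting an element in half.
        if current and size + encoded > max_chunk_bytes:
            chunks.append(current)
            current, size = [], 0
        if not current:
            starts.append(index)
        current.append(element)
        size += encoded
    if current:
        chunks.append(current)
    return chunks, starts


def slice_detoast(chunks, starts, lo, hi):
    """Return (elements[lo:hi], number of chunks read)."""
    result, touched = [], 0
    if lo >= hi or not chunks:
        return result, touched
    # Binary-search for the chunk that holds index `lo`.
    first = max(bisect_right(starts, lo) - 1, 0)
    for c in range(first, len(chunks)):
        start = starts[c]
        if start >= hi:
            break
        touched += 1
        result.extend(chunks[c][max(lo - start, 0):hi - start])
    return result, touched


logs = [f"line {i}" for i in range(1000)]
chunks, starts = chunk_array(logs, max_chunk_bytes=64)
window, touched = slice_detoast(chunks, starts, 500, 510)
print(len(window), "elements read from", touched, "of", len(chunks), "chunks")
```
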
---

## Query Power

**Full SQL + JSON Support:**
- PostgreSQL-compatible JSONB path operators
- GIN indexes for fast search

```sql
SELECT *
FROM accounts
WHERE data @> '{"status": "active"}';
```

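
For readers unfamiliar with `@>`, the sketch below models what the containment test checks. It follows the general rules PostgreSQL documents for JSONB containment (objects match key by key, arrays match regardless of order, scalars must be equal). The `contains` function is a hypothetical helper, it omits PostgreSQL's special case of an array containing a bare scalar, and whether JSONite matches every corner case is an assumption.

```python
def contains(doc, pattern):
    """Return True if `doc` contains `pattern`, in the spirit of JSONB `@>`."""
    if isinstance(pattern, dict):
        # Every key in the pattern must exist in doc with a contained value.
        return isinstance(doc, dict) and all(
            key in doc and contains(doc[key], value)
            for key, value in pattern.items()
        )
    if isinstance(pattern, list):
        # Every pattern element must be contained in some doc element.
        return isinstance(doc, list) and all(
            any(contains(item, wanted) for item in doc) for wanted in pattern
        )
    # Scalars: keep JSON booleans distinct from numbers (Python treats True == 1).
    if isinstance(doc, bool) or isinstance(pattern, bool):
        return isinstance(doc, bool) and isinstance(pattern, bool) and doc == pattern
    if isinstance(doc, (dict, list)):
        return False
    return doc == pattern


accounts = [
    {"id": 1, "status": "active", "tags": ["a", "b"]},
    {"id": 2, "status": "closed", "tags": []},
]
# Equivalent in spirit to: WHERE data @> '{"status": "active"}'
active_ids = [row["id"] for row in accounts if contains(row, {"status": "active"})]
print(active_ids)
```
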
---

## Performance Validation: Benchmark Datasets

**Three Specialized Workloads:**

1. **YCSB-Style Read Benchmark**
   - Modeled on the Yahoo! Cloud Serving Benchmark
   - 1M JSON documents (1 KB to 100 KB each)

2. **TPC-C-Inspired Update Benchmark**
   - Modeled on the Transaction Processing Performance Council's TPC-C
   - 100K transactional JSON records
   - Frequent small field updates

3. **Large-Array Slice Benchmark**
   - Multi-gigabyte JSON documents
   - Massive arrays (10M+ elements)

**Comparison Targets:** SQLite JSONB vs MongoDB vs PostgreSQL vs JSONite

---

## JSONite: The Future of Embedded Data Storage

**Why It Matters Today:**
- **Edge Computing**: Lightweight, handles sensor data efficiently
- **Modern Apps**: SQL power + JSON flexibility, no schema migrations

**The Vision:**
- Open source implementation
- Community-driven development
- Becoming the default choice for embedded JSON storage
- Bridging SQL reliability with NoSQL flexibility

---

## Thank You

**Questions?**

*CHEN Yongyuan*
*2025-11-01*