Compare commits
1 Commits
it-writing
...
advance-to
| Author | SHA1 | Date | |
|---|---|---|---|
|
cf351be434
|
BIN
examples/images/Screenshot_20251125_010053.jpeg
Normal file
Binary file not shown (size after change: 28 KiB).
BIN
examples/images/Screenshot_20251125_011029.jpeg
Normal file
Binary file not shown (size after change: 111 KiB).
BIN
examples/images/Screenshot_20251125_011348.jpeg
Normal file
Binary file not shown (size after change: 92 KiB).
BIN
examples/images/Screenshot_20251125_014355.jpeg
Normal file
Binary file not shown (size after change: 79 KiB).
BIN
examples/images/Screenshot_20251125_015416.jpeg
Normal file
Binary file not shown (size after change: 36 KiB).
BIN
examples/images/Screenshot_20251125_015705.jpeg
Normal file
Binary file not shown (size after change: 40 KiB).
@@ -7,7 +7,7 @@
|
|||||||
<title>reveal.js - Markdown Example</title>
|
<title>reveal.js - Markdown Example</title>
|
||||||
|
|
||||||
<link rel="stylesheet" href="../dist/reveal.css">
|
<link rel="stylesheet" href="../dist/reveal.css">
|
||||||
<link rel="stylesheet" href="../dist/theme/white.css" id="theme">
|
<link rel="stylesheet" href="../dist/theme/black.css" id="theme">
|
||||||
|
|
||||||
<link rel="stylesheet" href="../plugin/highlight/monokai.css">
|
<link rel="stylesheet" href="../plugin/highlight/monokai.css">
|
||||||
</head>
|
</head>
|
||||||
@@ -19,7 +19,7 @@
|
|||||||
<div class="slides">
|
<div class="slides">
|
||||||
|
|
||||||
<!-- Use external markdown resource, separate slides by three newlines; vertical slides by two newlines -->
|
<!-- Use external markdown resource, separate slides by three newlines; vertical slides by two newlines -->
|
||||||
<section style="text-align: left;" data-markdown="markdown.md" data-separator="---" data-separator-vertical="^\n\n"></section>
|
<section data-markdown="markdown.md" data-separator="^\n\n\n" data-separator-vertical="^\n\n"></section>
|
||||||
|
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
|
|||||||
@@ -1,153 +1,133 @@
|
|||||||
# JSONite: High-Performance Embedded Database for Semi-Structured Data
|
# RUG: Turbo LLM for Rust Unit Test Generation
|
||||||
|
|
||||||
---
|
Keywords: LLM, Rust, Unit Test
|
||||||
|
|
||||||
## The JSON Performance Crisis
|
Research date: 2022, published date: 2025
|
||||||
|
|
||||||
**JSON is Everywhere:**
|
|
||||||
- Web APIs, IoT, logs, configurations
|
|
||||||
- Semi-structured, flexible, human-readable
|
|
||||||
|
|
||||||
**But Current Solutions Fail:**
|
|
||||||
- **Large Databases**: People use MongoDB or PostgreSQL's JSONB to store data
|
|
||||||
- **Embedded Databases**: RocksDB and PoloDB lack ACID and SQL support
|
|
||||||
- **Serialization to String**: JSON is serialized to a string and stored in SQLite
|
|
||||||
|
|
||||||
|
|
||||||
Serialized JSON with SQL example
|
|
||||||
|
|
||||||
```sql
|
#### Introduction
|
||||||
insert into http_request_log (ip, headers)
|
|
||||||
values ('127.0.0.1', '{
|
|
||||||
"Content-Type": "application/octet-stream",
|
|
||||||
"X-Forwarded-For": "100.64.0.1"
|
|
||||||
}');
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
* Unit testing is crucial but costly.
|
||||||
|
|
||||||
## Introducing JSONite
|
* Rust's strict type system.
|
||||||
|
|
||||||
**Best of Both Worlds:**
|
* Existing LLM approaches often fail.
|
||||||
- SQLite-based
|
|
||||||
- Native JSON optimization
|
|
||||||
|
|
||||||
**Key Advantages:**
|
|
||||||
- ✅ ACID compliance
|
|
||||||
- ✅ SQL simplicity
|
|
||||||
- ✅ Serverless C library
|
|
||||||
- ✅ Lightning-fast JSON access
|
|
||||||
|
|
||||||
---
|
#### Rust Unit Test
|
||||||
|
|
||||||
## Smart Key Optimization
|
```rust
|
||||||
|
/// Returns the sum of two numbers
|
||||||
**Key Sorting by Length:**
|
///
|
||||||
```
|
/// # Examples
|
||||||
{
|
///
|
||||||
"id": 1,
|
/// ```
|
||||||
"address": {...}
|
/// assert_eq!(add(2, 3), 5);
|
||||||
"name": "John",
|
/// assert_eq!(add(-1, 1), 0);
|
||||||
"email": "john@example.com",
|
/// ```
|
||||||
|
fn add(a: i32, b: i32) -> i32 {
|
||||||
|
a + b
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
**Sorted as:**
|
|
||||||
```
|
#### Challenge
|
||||||
{
|
|
||||||
"id", (2 chars)
|
```rust
|
||||||
"name", (4 chars)
|
fn encode<E: Encoder>(&self, encoder: E) -> Result<(), EncodeError> // target function; `self` is a `char`
|
||||||
"email", (5 chars)
|
|
||||||
"address", (7 chars)
|
impl<W: Writer, C: Config> Encoder for EncoderImpl
|
||||||
}
|
|
||||||
|
pub struct EncoderImpl<W: Writer, C: Config>
|
||||||
|
impl Writer for SliceWriter
|
||||||
|
impl Writer for IoWriter
|
||||||
|
|
||||||
|
impl<T> Config for T where T: R1 + R2 + R3
|
||||||
|
pub struct Configuration<R1, R2, R3>
|
||||||
```
|
```
|
||||||
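A minimal sketch of how a lookup over length-sorted keys could work. It assumes keys are ordered by length first and by text within the same length, so a single binary search on `(length, text)` finds the key. The names `sort_keys` and `find_key` are hypothetical and are not taken from JSONite.

```python
from bisect import bisect_left


def sort_keys(keys):
    """Order keys by length first, then by text within one length."""
    return sorted(keys, key=lambda k: (len(k), k))


def find_key(sorted_keys, target):
    """Binary-search a (length, text)-ordered key list; return index or -1."""
    # Assumption: the key list is small enough to build this probe index
    # on the fly; a real store would keep it alongside the keys.
    index = [(len(k), k) for k in sorted_keys]
    i = bisect_left(index, (len(target), target))
    if i < len(sorted_keys) and sorted_keys[i] == target:
        return i
    return -1


keys = sort_keys(["id", "address", "name", "email"])
position = find_key(keys, "email")
missing = find_key(keys, "phone")
```

Comparing lengths first means most probes are decided by an integer comparison, and the string comparison only matters among keys of equal length.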
|
|
||||||
**Binary search on length → Fast lookups**
|
Simplified Python version
|
||||||
|
|
||||||
---
|
```python
|
||||||
|
def encode(char_data, encoder):
|
||||||
|
result = encoder.process(char_data)
|
||||||
|
return result
|
||||||
|
|
||||||
## Handling Massive Data: Smart TOAST
|
class Encoder:
|
||||||
|
def __init__(self, writer, config):
|
||||||
|
self.writer = writer  # stored so process() can call it
self.config = config
|
||||||
|
|
||||||
**The Oversized-Attribute Storage Technique**
|
def process(self, data):
|
||||||
- Standard approach: arbitrary chunking
|
output = self.writer.write(data, self.config)
|
||||||
- JSONite's innovation: **Data-Type Aware TOAST**
|
return output
|
||||||
|
|
||||||
**Intelligent Chunking:**
|
class Config:
|
||||||
- Arrays split between elements
|
def __init__(self):
|
||||||
- Objects split between key-value pairs
|
self.settings = {}
|
||||||
- Text falls back to fixed chunks
|
|
||||||
|
|
||||||
**Enables "Slice Detoasting":**
|
config = Config()
|
||||||
- `$.logs[1000000:1000010]` fetches only 10 elements
|
encoder = Encoder(stdout, config)
|
||||||
- Not the entire multi-gigabyte array
|
|
||||||
|
|
||||||
|
# Test code
|
||||||
Smart Chunking Example
|
result = encode('A', encoder)
|
||||||
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"id": 1,
|
|
||||||
"title": "some text",
|
|
||||||
"html": <pointer to TOAST of 200k text>,
|
|
||||||
"photos": [<pointer to TOAST of binary data>],
|
|
||||||
"crawl_logs": [<pointer to TOAST of array of texts>]
|
|
||||||
}
|
|
||||||
```
|
```
|
||||||
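A minimal sketch of slice detoasting, assuming an array is chunked at element boundaries with a fixed number of elements per chunk and that the first element index of each chunk is recorded. A real store would more likely chunk by byte size. `chunk_array` and `fetch_slice` are hypothetical names used only for illustration.

```python
from bisect import bisect_right


def chunk_array(items, chunk_size):
    """Split a list into chunks at element boundaries.

    Returns (chunks, starts), where starts[i] is the index of the first
    element stored in chunks[i].
    """
    starts = list(range(0, len(items), chunk_size))
    chunks = [items[s:s + chunk_size] for s in starts]
    return chunks, starts


def fetch_slice(chunks, starts, lo, hi):
    """Return (items[lo:hi], chunks_touched) for 0 <= lo, reading only
    the chunks that overlap the requested range."""
    if lo >= hi or not chunks:
        return [], 0
    first = bisect_right(starts, lo) - 1  # chunk that holds element lo
    result, touched = [], 0
    for c in range(first, len(chunks)):
        base = starts[c]
        if base >= hi:
            break
        touched += 1
        result.extend(chunks[c][max(lo - base, 0):hi - base])
    return result, touched


chunks, starts = chunk_array(list(range(1000)), 100)
values, touched = fetch_slice(chunks, starts, 295, 305)
```

Because chunk boundaries coincide with element boundaries, a ten-element slice touches at most two chunks here, regardless of how long the whole array is.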
|
|
||||||
---
|
LLM-generated code often fails to compile.
|
||||||
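A runnable variant of the simplified Python version, given as a sketch only. It assumes an in-memory `io.StringIO` stands in for the writer and that the encoder ignores its configuration. The point is that even a one-line target such as `encode` can only be tested after a writer, a config, and an encoder have been constructed in the right order.

```python
import io


class Config:
    """Hypothetical settings holder, mirroring the slide's Config."""

    def __init__(self):
        self.settings = {}


class Encoder:
    def __init__(self, writer, config):
        self.writer = writer  # stored so process() can reach it
        self.config = config

    def process(self, data):
        # A real encoder would consult self.config; this sketch just writes.
        return self.writer.write(data)


def encode(char_data, encoder):
    """Target function: delegates to the encoder."""
    return encoder.process(char_data)


# A test has to build every dependency before it can call the target.
buffer = io.StringIO()
encoder = Encoder(buffer, Config())
written = encode("A", encoder)
```

In Python a wrongly built dependency only fails at run time. In Rust the corresponding trait bounds are checked by the compiler, so a generated test that picks the wrong concrete types is rejected before it runs.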
|
|
||||||
## Query Power
|
|
||||||
|
|
||||||
**Full SQL + JSON Support:**
|
|
||||||
- PostgreSQL-compatible JSONB path operators
|
|
||||||
- GIN indexes for instant search
|
|
||||||
|
|
||||||
```sql
|
#### RUG design
|
||||||
SELECT *
|
|
||||||
FROM accounts
|
|
||||||
WHERE data @> '{"status": "active"}'
|
|
||||||
```
|
|
||||||
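A simplified sketch of what a containment test in the spirit of `data @> '{"status": "active"}'` checks. It is an approximation for illustration and does not reproduce every PostgreSQL edge case. The sample rows are invented, and `contains` is a hypothetical helper.

```python
def contains(doc, pattern):
    """Return True if every part of `pattern` also appears in `doc`."""
    if isinstance(pattern, dict):
        # Each key of the pattern must exist in doc with a contained value.
        return isinstance(doc, dict) and all(
            key in doc and contains(doc[key], value)
            for key, value in pattern.items()
        )
    if isinstance(pattern, list):
        # Each pattern element must be contained in some element of doc.
        return isinstance(doc, list) and all(
            any(contains(item, wanted) for item in doc) for wanted in pattern
        )
    return doc == pattern


# Hypothetical contents of the `data` column in `accounts`.
accounts = [
    {"id": 1, "status": "active", "tags": ["vip", "beta"]},
    {"id": 2, "status": "closed", "tags": []},
]
active = [row for row in accounts if contains(row, {"status": "active"})]
```

A linear scan like this is what an index is meant to avoid: with a GIN index the database can look up candidate rows by key and value instead of testing every document.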
|
|
||||||
---
|
<img src="./images/Screenshot_20251125_010053.jpeg"
|
||||||
|
width="75%">
|
||||||
|
|
||||||
## Performance Validation: Benchmark Datasets
|
|
||||||
|
|
||||||
**Three Specialized Workloads:**
|
|
||||||
|
|
||||||
1. **YCSB-Style Read Benchmark**
|
<img src="./images/Screenshot_20251125_011029.jpeg" width="80%">
|
||||||
- Yahoo! Cloud Serving Benchmark
|
|
||||||
- 1M JSON documents (1KB-100KB each)
|
|
||||||
|
|
||||||
2. **TPC-C Inspired Update Benchmark**
|
<img src="./images/Screenshot_20251125_011348.jpeg" width="80%">
|
||||||
- Transaction Processing Performance Council
|
|
||||||
- 100K transactional JSON records
|
|
||||||
- Frequent small field updates
|
|
||||||
|
|
||||||
3. **Large-Array Slice Benchmark**
|
|
||||||
- Multi-gigabyte JSON documents
|
|
||||||
- Massive arrays (10M+ elements)
|
|
||||||
|
|
||||||
**Comparison Targets:** SQLite JSONB vs MongoDB vs PostgreSQL vs JSONite
|
|
||||||
|
|
||||||
---
|
#### Implementation
|
||||||
|
|
||||||
## JSONite: The Future of Embedded Data Storage
|
- gpt-3.5-turbo-16k-0613
|
||||||
|
- gpt-4-1106
|
||||||
|
- presence_penalty set to -1
|
||||||
|
- frequency_penalty set to 0.5
|
||||||
|
- temperature set to 1 (the default)
|
||||||
|
|
||||||
**Why It Matters Today:**
|
|
||||||
- **Edge Computing**: Lightweight, handles sensor data efficiently
|
|
||||||
- **Modern Apps**: SQL power + JSON flexibility, no schema migrations
|
|
||||||
|
|
||||||
**The Vision:**
|
|
||||||
- Open source implementation
|
|
||||||
- Community-driven development
|
|
||||||
- Becoming the default choice for embedded JSON storage
|
|
||||||
- Bridging SQL reliability with NoSQL flexibility
|
|
||||||
|
|
||||||
---
|
#### Eval: Comparison with Traditional Tools
|
||||||
|
|
||||||
## Thank You
|
<img src="./images/Screenshot_20251125_014355.jpeg" width="75%">
|
||||||
|
|
||||||
**Questions?**
|
|
||||||
|
|
||||||
*CHEN Yongyuan*
|
#### Token Consumption
|
||||||
*2025-11-01*
|
|
||||||
|
- GPT-4 cost $1000 with the baseline method (sending the whole context)
|
||||||
|
- RUG saved 51.3% of tokens (each unique dependency is processed only once)
|
||||||
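One way to read "processed only once" is as a cache over dependencies. The sketch below compares the tokens sent when every target re-sends its whole context with the tokens sent when each unique dependency is sent a single time. All names and token counts are invented for illustration and are not meant to reproduce the reported 51.3%.

```python
def baseline_tokens(targets, token_cost):
    """Every target re-sends its whole dependency context."""
    return sum(token_cost[dep] for deps in targets.values() for dep in deps)


def cached_tokens(targets, token_cost):
    """Each unique dependency is sent once and reused afterwards."""
    seen = set()
    total = 0
    for deps in targets.values():
        for dep in deps:
            if dep not in seen:
                seen.add(dep)
                total += token_cost[dep]
    return total


# Hypothetical data: dependency names echo the encoder example,
# token counts are made up.
token_cost = {"Config": 120, "Writer": 80, "Encoder": 200}
targets = {
    "encode_char": ["Encoder", "Writer", "Config"],
    "encode_u8": ["Encoder", "Writer", "Config"],
    "write_bytes": ["Writer"],
}
base = baseline_tokens(targets, token_cost)
cached = cached_tokens(targets, token_cost)
saving = 1 - cached / base
```

The saving grows with the amount of sharing: the more targets depend on the same types, the larger the gap between the two totals.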
|
|
||||||
|
|
||||||
|
|
||||||
|
#### Real-World Usability
|
||||||
|
|
||||||
|
> We directly leverage RUG's generated tests, without changing test bodies and send them as PRs to the open source projects.
|
||||||
|
> To our surprise, the developers are happy to merge these machine generated tests.
|
||||||
|
> RUG generated a total of 248 unit tests, of which we submitted 113 to the corresponding crates based on their quality and priority.
|
||||||
|
> So far, 53 of these unit tests have been merged with positive feedback.
|
||||||
|
|
||||||
|
> Developers chose not to merge 17 tests for two main reasons:
|
||||||
|
> first, the target functions are imported from external libraries(16),
|
||||||
|
> and the developers do not intend to include tests
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
#### 2025 Situation
|
||||||
|
|
||||||
|
<img src="./images/Screenshot_20251125_015416.jpeg" width="60%">
|
||||||
|
|
||||||
|
|
||||||
|
<img src="./images/Screenshot_20251125_015705.jpeg" width="60%">
|
||||||
|
|||||||