1 Commit

Author SHA1 Message Date
cf351be434 Update markdown example: add RUG Rust unit-test generation demo content
- Changed theme to black for better readability
- Replaced markdown.md with the RUG paper presentation
- Added the related images to the images directory
- Set the presentation size to 1920x1080 to suit modern displays
- Removed the original example slides to focus on the academic presentation
2025-11-25 10:02:44 +08:00
8 changed files with 94 additions and 114 deletions

6 binary files (new images) not shown. Sizes after: 28 KiB, 111 KiB, 92 KiB, 79 KiB, 36 KiB, 40 KiB.

View File

@@ -7,7 +7,7 @@
 <title>reveal.js - Markdown Example</title>
 <link rel="stylesheet" href="../dist/reveal.css">
-<link rel="stylesheet" href="../dist/theme/white.css" id="theme">
+<link rel="stylesheet" href="../dist/theme/black.css" id="theme">
 <link rel="stylesheet" href="../plugin/highlight/monokai.css">
 </head>
@@ -19,7 +19,7 @@
 <div class="slides">
 <!-- Use external markdown resource, separate slides by three newlines; vertical slides by two newlines -->
-<section style="text-align: left;" data-markdown="markdown.md" data-separator="---" data-separator-vertical="^\n\n"></section>
+<section data-markdown="markdown.md" data-separator="^\n\n\n" data-separator-vertical="^\n\n"></section>
 </div>
 </div>

View File

@@ -1,153 +1,133 @@

Before (removed):

# JSONite: High-Performance Embedded Database for Semi-Structured Data
---
## The JSON Performance Crisis
**JSON is Everywhere:**
- Web APIs, IoT, logs, configurations
- Semi-structured, flexible, human-readable

**But Current Solutions Fail:**
- **Large Databases**: People use MongoDB or PostgreSQL's JSONB to store data
- **Embedded Databases**: RocksDB and PoloDB lack ACID and SQL support
- **Serialization to String**: Or serialize JSON into strings and store them in SQLite

Serialized JSON with SQL example
```sql
insert into http_request_log (ip, headers)
values ('127.0.0.1', '{
  "Content-Type": "application/oct-stream",
  "X-Forwarded-For": "100.64.0.1",
}');
```
---
## Introducing JSONite
**Best of Both Worlds:**
- SQLite-based
- Native JSON optimization

**Key Advantages:**
- ✅ ACID compliance
- ✅ SQL simplicity
- ✅ Serverless C library
- ✅ Lightning-fast JSON access
---
## Smart Key Optimization
**Key Sorting by Length:**
```
{
  "id": 1,
  "address": {...},
  "name": "John",
  "email": "john@example.com",
}
```
**Sorted as:**
```
{
  "id",      (2 chars)
  "name",    (4 chars)
  "email",   (5 chars)
  "address", (7 chars)
}
```
**Binary search on length → Fast lookups**
---
## Handling Massive Data: Smart TOAST
**The Oversized-Attribute Storage Technique**
- Standard approach: arbitrary chunking
- JSONite's innovation: **Data-Type Aware TOAST**

**Intelligent Chunking:**
- Arrays split between elements
- Objects split between key-value pairs
- Text falls back to fixed chunks

**Enables "Slice Detoasting":**
- `$.logs[1000000:1000010]` fetches only 10 elements
- Not the entire multi-gigabyte array

Smart Chunking Example
```json
{
  "id": 1,
  "title": "some text",
  "html": <pointer to TOAST of 200k text>,
  "photos": [<pointer to TOAST of binary data>],
  "crawl_logs": [<pointer to TOAST of array of texts>]
}
```
---
## Query Power
**Full SQL + JSON Support:**
- PostgreSQL-compatible JSONB path operators
- GIN indexes for instant search
```sql
SELECT *
FROM accounts
WHERE data @> '{"status": "active"}'
```
---
## Performance Validation: Benchmark Datasets
**Three Specialized Workloads:**
1. **YCSB-Style Read Benchmark**
   - Yahoo! Cloud Serving Benchmark
   - 1M JSON documents (1KB-100KB each)
2. **TPC-C Inspired Update Benchmark**
   - Transaction Processing Performance Council
   - 100K transactional JSON records
   - Frequent small field updates
3. **Large-Array Slice Benchmark**
   - Multi-gigabyte JSON documents
   - Massive arrays (10M+ elements)

**Comparison Targets:** SQLite JSONB vs MongoDB vs PostgreSQL vs JSONite
---
## JSONite: The Future of Embedded Data Storage

After (added):

# RUG: Turbo LLM for Rust Unit Test Generation
Keywords: LLM, Rust, Unit Test
Research date: 2022; published: 2025


#### Introduction

* Unit testing is crucial but costly.
* Rust's strict type system makes generated tests hard to compile.
* Existing LLM approaches often fail.


#### Rust Unit Test
```rust
/// Returns the sum of two numbers
///
/// # Examples
///
/// ```
/// assert_eq!(add(2, 3), 5);
/// assert_eq!(add(-1, 1), 0);
/// ```
fn add(a: i32, b: i32) -> i32 {
    a + b
}
```


#### Challenge
```rust
fn encode<E: Encoder>(&self, encoder: E) -> Result<(), EncodeError> // target function (self: char)

impl<W: Writer, C: Config> Encoder for EncoderImpl
pub struct EncoderImpl<W: Writer, C: Config>
impl Writer for SliceWriter
impl Writer for IoWriter
impl<T> Config for T where T: R1 + R2 + R3
pub struct Configuration<R1, R2, R3>
```
Simplified Python version
```python
import sys

def encode(char_data, encoder):
    return encoder.process(char_data)

class Encoder:
    def __init__(self, writer, config):
        self.writer = writer  # was missing in the slide version
        self.config = config
    def process(self, data):
        return self.writer.write(data, self.config)

class Writer:  # wrapper added so the example runs; mirrors the Writer trait
    def __init__(self, stream):
        self.stream = stream
    def write(self, data, config):
        return self.stream.write(data)

class Config:
    def __init__(self):
        self.settings = {}

config = Config()
encoder = Encoder(Writer(sys.stdout), config)

# Test code
result = encode('A', encoder)
```
LLM-generated code often fails to pass the compiler.


#### RUG design

<img src="./images/Screenshot_20251125_010053.jpeg" width="75%">

<img src="./images/Screenshot_20251125_011029.jpeg" width="80%">

<img src="./images/Screenshot_20251125_011348.jpeg" width="80%">


#### Implementation
- gpt-3.5-turbo-16k-0613
- gpt-4-1106
- presence_penalty set to -1
- frequency_penalty set to 0.5
- temperature set to 1 (the default)
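The sampling setup listed above can be sketched as a request payload. This is illustrative only: `build_request` is a hypothetical helper, not RUG's actual code; only the model names and parameter values come from the slides.

```python
# Hypothetical sketch of the decoding configuration listed above.
# Only the model names and parameter values come from the slides.
def build_request(model: str, prompt: str) -> dict:
    return {
        "model": model,            # e.g. "gpt-4-1106" or "gpt-3.5-turbo-16k-0613"
        "messages": [{"role": "user", "content": prompt}],
        "presence_penalty": -1,    # negative value favors tokens already present
        "frequency_penalty": 0.5,  # damps verbatim repetition
        "temperature": 1,          # the API default
    }

req = build_request("gpt-4-1106", "Generate a unit test for `add`.")
```

A negative presence penalty nudges the model toward reusing identifiers it has already emitted, which helps keep generated test code consistent with the names in the prompt.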

Before (removed, continued):

**Why It Matters Today:**
- **Edge Computing**: Lightweight, handles sensor data efficiently
- **Modern Apps**: SQL power + JSON flexibility, no schema migrations

**The Vision:**
- Open source implementation
- Community-driven development
- Becoming the default choice for embedded JSON storage
- Bridging SQL reliability with NoSQL flexibility
---
## Thank You
**Questions?**
*CHEN Yongyuan*
*2025-11-01*

After (added, continued):

#### Eval: Comparison with Traditional Tools
<img src="./images/Screenshot_20251125_014355.jpeg" width="75%">


#### Token Consumption
- GPT-4 cost about $1,000 with the baseline method (which sends the whole context every time)
- RUG saved 51.3% of tokens (each unique dependency is processed only once)
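The saving comes from not re-processing shared dependency context. A minimal memoization sketch of that idea (illustrative; `make_context_builder` and `summarize` are hypothetical names, not RUG's API):

```python
# Illustrative sketch: each unique dependency is summarized once and the
# cached summary is reused across test-generation requests.
def make_context_builder(summarize):
    cache = {}
    def context_for(dependencies):
        parts = []
        for dep in dependencies:
            if dep not in cache:
                cache[dep] = summarize(dep)  # paid for only once per dependency
            parts.append(cache[dep])
        return "\n".join(parts)
    return context_for

calls = []
def summarize(dep):
    calls.append(dep)  # track how often the expensive step runs
    return f"summary of {dep}"

build_context = make_context_builder(summarize)
build_context(["Encoder", "Writer"])
build_context(["Writer", "Config"])  # "Writer" is served from the cache
# summarize ran once each for Encoder, Writer, Config
```

In RUG the expensive step is sending a dependency's code to the model, so deduplicating it directly cuts token spend on targets that share dependencies.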
#### Real-World Usability
> We directly leverage RUG's generated tests, without changing the test bodies, and send them as PRs to the open-source projects.
> To our surprise, the developers are happy to merge these machine-generated tests.
> RUG generated a total of 248 unit tests, of which we submitted 113 to the corresponding crates based on their quality and priority.
> So far, 53 of these unit tests have been merged with positive feedback.
> Developers chose not to merge 17 tests for two main reasons:
> first, the target functions are imported from external libraries (16),
> and the developers do not intend to include tests
#### 2025 Situation
<img src="./images/Screenshot_20251125_015416.jpeg" width="60%">
<img src="./images/Screenshot_20251125_015705.jpeg" width="60%">