Files
reveal.js/examples/markdown.md
heimoshuiyu cf351be434 更新markdown示例:添加RUG Rust单元测试生成演示内容
- 修改主题为黑色样式以提高可读性
- 更新markdown.md为RUG论文演示内容
- 添加相关图片资源到images目录
- 调整演示尺寸为1920x1080以适应现代显示器
- 移除原有的示例幻灯片,专注于学术演示内容
2025-11-25 10:02:44 +08:00

134 lines
2.7 KiB
Markdown

# RUG: Turbo LLM for Rust Unit Test Generation
Keywords: LLM, Rust, Unit Test
Research date: 2022, published date: 2025
#### Introduction
* Unit testing is crucial but costly.
* Rust's strict type system.
* Existing LLM approaches often fail.
#### Rust Unit Test
```rust
/// Returns the sum of two numbers
///
/// # Examples
///
/// ```
/// assert_eq!(add(2, 3), 5);
/// assert_eq!(add(-1, 1), 0);
/// ```
fn add(a: i32, b: i32) -> i32 {
a + b
}
```
#### Challenge
```rust
fn encode<E: Encoder>(&self: char, encoder: E) -> Result<EncodeError> // target function
impl<W: Writer, C: Config> Encoder for EncoderImpl
pub struct EncoderImpl<W: Writer, C: Config>
impl Writer for SliceWriter
impl Writer for IoWriter
impl<T> Config for T where T: R1 + R2 + R3
pub struct Configuration<R1, R2, R3>
```
Simplified python version
```python
def encode(char_data, encoder):
result = encoder.process(char_data)
return result
class Encoder:
def __init__(self, writer, config):
self.config = config
def process(self, data):
output = self.writer.write(data, self.config)
return output
class Config:
def __init__(self):
self.settings = {}
config = Config()
encoder = Encoder(stdout, config)
# Test code
result = encode('A', encoder)
```
LLM generated code are hard to pass the compiler.
#### RUG design
<img src="./images/Screenshot_20251125_010053.jpeg"
width="75%">
<img src="./images/Screenshot_20251125_011029.jpeg" width="80%">
<img src="./images/Screenshot_20251125_011348.jpeg" width="80%">
#### Implementation
- gpt-3.5-turbo-16k-0613
- gpt-4-1106
- presence penalty set to -1
- frequency_penalty set to 0.5
- temperature set to 1 (by default)
#### Eval: Comparison with Traditional Tools
<img src="./images/Screenshot_20251125_014355.jpeg" width="75%">
#### Token Consumption
- GPT-4 cost 1000$ in baseline method (send the whole context)
- RUG saved 51.3% tokens (process unique dependency only once)
#### Real-World Usability
> We directly leverage RUG's generated tests, without changing test bodies and send them as PRs to the open source projects.
> To our surprise, the developers are happy to merge these machine generated tests.
> RUG generated a total of 248 unit tests, of which we submitted 113 to the corresponding crates based on their quality and priority.
> So far, 53 of these unit tests have been merged with positive feedback.
> Developers chose not to merge 17 tests for two main reasons:
> first, the target functions are imported from external libraries(16),
> and the developers do not intend to include tests
#### 2025 Situation
<img src="./images/Screenshot_20251125_015416.jpeg" width="60%">
<img src="./images/Screenshot_20251125_015705.jpeg" width="60%">