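- Switch the theme to the black style to improve readability
- Update markdown.md with the RUG paper presentation content
- Add the related image assets to the images directory
- Set the presentation size to 1920x1080 to suit modern displays
- Remove the original sample slides to focus on the academic presentation content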
# RUG: Turbo LLM for Rust Unit Test Generation

Keywords: LLM, Rust, Unit Test

Research date: 2022, published date: 2025



#### Introduction

* Unit testing is crucial but costly.

* Rust's strict type system makes valid test inputs hard to construct.

* Existing LLM approaches often fail to produce tests that compile.


#### Rust Unit Test

```rust
/// Returns the sum of two numbers
///
/// # Examples
///
/// ```
/// assert_eq!(add(2, 3), 5);
/// assert_eq!(add(-1, 1), 0);
/// ```
fn add(a: i32, b: i32) -> i32 {
    a + b
}
```
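
The doc comment above is a doctest. A conventional Rust unit test for the same function usually sits in a `#[cfg(test)]` module next to the code. A minimal sketch, where the module and test names are illustrative and not taken from the paper:

```rust
/// Returns the sum of two numbers.
fn add(a: i32, b: i32) -> i32 {
    a + b
}

// Compiled only when running `cargo test`.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn adds_positive_numbers() {
        assert_eq!(add(2, 3), 5);
    }

    #[test]
    fn adds_negative_and_positive() {
        assert_eq!(add(-1, 1), 0);
    }
}
```

Run it with `cargo test`.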


#### Challenge

```rust
fn encode<E: Encoder>(&self: char, encoder: E) -> Result<EncodeError> // target function

impl<W: Writer, C: Config> Encoder for EncoderImpl

pub struct EncoderImpl<W: Writer, C: Config>
impl Writer for SliceWriter
impl Writer for IoWriter

impl<T> Config for T where T: R1 + R2 + R3
pub struct Configuration<R1, R2, R3>
```

Simplified Python version

```python
import sys


def encode(char_data, encoder):
    result = encoder.process(char_data)
    return result


class Encoder:
    def __init__(self, writer, config):
        self.writer = writer  # any object with a write() method
        self.config = config

    def process(self, data):
        # Duck typing: no trait bounds to satisfy, unlike the Rust version
        output = self.writer.write(data)
        return output


class Config:
    def __init__(self):
        self.settings = {}


config = Config()
encoder = Encoder(sys.stdout, config)

# Test code
result = encode('A', encoder)
```

LLM-generated code often fails to compile.
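
To make the difficulty concrete, here is a compilable sketch of the same dependency chain in Rust. It is a heavily simplified stand-in, not the real library code: `VecWriter`, `DefaultConfig`, `emit`, and `encode_to_bytes` are hypothetical names, and the traits are reduced to one method each. The point is that a test cannot call `encode` until it has built a concrete `Writer`, a concrete `Config`, and an `EncoderImpl` that combines them.

```rust
// Hypothetical, simplified stand-ins for the types named on the slide.
#[derive(Debug, PartialEq)]
struct EncodeError;

trait Writer {
    fn write(&mut self, bytes: &[u8]) -> Result<(), EncodeError>;
}

trait Config {}

trait Encoder {
    fn emit(&mut self, bytes: &[u8]) -> Result<(), EncodeError>;
}

// One concrete Writer: collects the bytes in memory.
struct VecWriter {
    buf: Vec<u8>,
}

impl Writer for VecWriter {
    fn write(&mut self, bytes: &[u8]) -> Result<(), EncodeError> {
        self.buf.extend_from_slice(bytes);
        Ok(())
    }
}

// One concrete Config.
struct DefaultConfig;
impl Config for DefaultConfig {}

// The Encoder implementation is generic over both Writer and Config.
struct EncoderImpl<W: Writer, C: Config> {
    writer: W,
    _config: C,
}

impl<W: Writer, C: Config> Encoder for EncoderImpl<W, C> {
    fn emit(&mut self, bytes: &[u8]) -> Result<(), EncodeError> {
        self.writer.write(bytes)
    }
}

// Target function: generic over any Encoder.
fn encode<E: Encoder>(value: char, encoder: &mut E) -> Result<(), EncodeError> {
    let mut tmp = [0u8; 4];
    encoder.emit(value.encode_utf8(&mut tmp).as_bytes())
}

// Everything a test has to construct before it can call `encode`.
fn encode_to_bytes(value: char) -> Result<Vec<u8>, EncodeError> {
    let mut encoder = EncoderImpl {
        writer: VecWriter { buf: Vec::new() },
        _config: DefaultConfig,
    };
    encode(value, &mut encoder)?;
    Ok(encoder.writer.buf)
}
```

If any link in that chain is wrong (a missing trait impl, a wrong generic argument), the test does not compile at all, whereas the Python version would only fail at run time.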



#### RUG design

<img src="./images/Screenshot_20251125_010053.jpeg" width="75%">



<img src="./images/Screenshot_20251125_011029.jpeg" width="80%">

<img src="./images/Screenshot_20251125_011348.jpeg" width="80%">



#### Implementation

- gpt-3.5-turbo-16k-0613
- gpt-4-1106
- presence_penalty set to -1
- frequency_penalty set to 0.5
- temperature set to 1 (the default)
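
As a rough illustration, the settings above could be collected in a small struct and rendered as the sampling part of a chat-completion request body. This is a sketch under assumptions: `SamplingParams` and `rug_gpt4_params` are hypothetical names, the JSON is hand-rolled to avoid dependencies, and a real request would also need a `messages` field.

```rust
/// Sampling settings listed on the slide (struct name is hypothetical).
struct SamplingParams {
    model: &'static str,
    presence_penalty: f32,
    frequency_penalty: f32,
    temperature: f32,
}

impl SamplingParams {
    /// Renders the settings as a JSON object; `messages` is intentionally omitted.
    fn to_json(&self) -> String {
        format!(
            r#"{{"model":"{}","presence_penalty":{},"frequency_penalty":{},"temperature":{}}}"#,
            self.model, self.presence_penalty, self.frequency_penalty, self.temperature
        )
    }
}

/// The GPT-4 configuration from the slide.
fn rug_gpt4_params() -> SamplingParams {
    SamplingParams {
        model: "gpt-4-1106",
        presence_penalty: -1.0,
        frequency_penalty: 0.5,
        temperature: 1.0,
    }
}
```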



#### Eval: Comparison with Traditional Tools

<img src="./images/Screenshot_20251125_014355.jpeg" width="75%">


#### Token Consumption

- GPT-4 cost $1000 with the baseline method (sending the whole context)
- RUG saved 51.3% of tokens (each unique dependency is processed only once)
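
The "process each unique dependency only once" idea can be pictured as a cache keyed by type name. The sketch below is not RUG's implementation: `DependencyCache`, `get_or_generate`, and `count_llm_calls` are hypothetical names, and the closure merely stands in for an LLM request.

```rust
use std::collections::HashMap;

/// Caches one generated construction snippet per dependency type.
struct DependencyCache {
    snippets: HashMap<String, String>,
    llm_calls: usize,
}

impl DependencyCache {
    fn new() -> Self {
        DependencyCache { snippets: HashMap::new(), llm_calls: 0 }
    }

    /// Returns the cached snippet; calls `generate` (the stand-in for an
    /// LLM request) only when the type has not been seen before.
    fn get_or_generate<F>(&mut self, type_name: &str, generate: F) -> &str
    where
        F: FnOnce(&str) -> String,
    {
        if !self.snippets.contains_key(type_name) {
            self.llm_calls += 1;
            let snippet = generate(type_name);
            self.snippets.insert(type_name.to_string(), snippet);
        }
        &self.snippets[type_name]
    }
}

/// Counts simulated LLM calls for target functions and their dependencies.
fn count_llm_calls(targets: &[(&str, &[&str])]) -> usize {
    let mut cache = DependencyCache::new();
    for (_target, deps) in targets {
        for dep in deps.iter() {
            cache.get_or_generate(dep, |name| format!("let v = {}::default();", name));
        }
    }
    cache.llm_calls
}
```

With two targets that share `SliceWriter` and `Configuration` and one that also needs `IoWriter`, the cache issues 3 simulated calls instead of 5. The savings grow as more target functions share the same dependencies.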



#### Real-World Usability

> We directly leverage RUG's generated tests, without changing test bodies and send them as PRs to the open source projects.
> To our surprise, the developers are happy to merge these machine generated tests.
> RUG generated a total of 248 unit tests, of which we submitted 113 to the corresponding crates based on their quality and priority.
> So far, 53 of these unit tests have been merged with positive feedback.

> Developers chose not to merge 17 tests for two main reasons:
> first, the target functions are imported from external libraries (16),
> and the developers do not intend to include tests
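
The ratios implied by the quoted figures can be checked with a few lines of arithmetic. The helper names below are illustrative. The status of the tests that were neither merged nor declined is not stated in the excerpt, so the sketch only counts them.

```rust
/// Share of `part` in `whole` as a percentage, rounded to one decimal place.
fn percent(part: u32, whole: u32) -> f64 {
    (f64::from(part) / f64::from(whole) * 1000.0).round() / 10.0
}

/// Submitted tests that were neither merged nor declined.
fn unresolved(submitted: u32, merged: u32, declined: u32) -> u32 {
    submitted - merged - declined
}
```

With the numbers from the quote, `percent(53, 113)` gives a merge rate of 46.9% of submitted tests, and `unresolved(113, 53, 17)` leaves 43 tests unaccounted for.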



#### 2025 Situation

<img src="./images/Screenshot_20251125_015416.jpeg" width="60%">


<img src="./images/Screenshot_20251125_015705.jpeg" width="60%">