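- Change the theme to a black style to improve readability
- Update markdown.md with the RUG paper presentation content
- Add related image resources to the images directory
- Resize the presentation to 1920x1080 to fit modern displays
- Remove the original sample slides to focus on the academic presentation content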

2025-11-25 10:02:44 +08:00

RUG: Turbo LLM for Rust Unit Test Generation

Keywords: LLM, Rust, Unit Test

Research date: 2022, published date: 2025

Introduction

Unit testing is crucial but costly.
Rust's strict type system makes tests hard to generate automatically.
Existing LLM approaches often fail to produce tests that compile.

Rust Unit Test

/// Returns the sum of two numbers
///
/// # Examples
///
/// ```
/// assert_eq!(add(2, 3), 5);
/// assert_eq!(add(-1, 1), 0);
/// ```
fn add(a: i32, b: i32) -> i32 {
    a + b
}
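
The example above is a documentation test. For comparison, the same two checks can be written as a conventional unit test module. This is a minimal sketch: the module name `tests` and the test function names are illustrative choices, not taken from the slides.

```rust
/// Returns the sum of two numbers.
pub fn add(a: i32, b: i32) -> i32 {
    a + b
}

// Compiled only when running `cargo test`.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn adds_two_positive_numbers() {
        assert_eq!(add(2, 3), 5);
    }

    #[test]
    fn adds_negative_and_positive_number() {
        assert_eq!(add(-1, 1), 0);
    }
}
```

Running `cargo test` in the crate executes both test functions.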

Challenge

fn encode<E: Encoder>(&self: char, encoder: E) -> Result<EncodeError> // target function

impl<W: Writer, C: Config> Encoder for EncoderImpl

pub struct EncoderImpl<W: Writer, C: Config>
impl Writer for SliceWriter
impl Writer for IoWriter

impl<T> Config for T where T: R1 + R2 + R3
pub struct Configuration<R1, R2, R3>
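
To make the difficulty concrete, here is a minimal, self-contained sketch of what a test has to build before it can call a function like `encode`. The traits and types are simplified stand-ins for the signatures above. The methods `write`, `limit` and `emit`, the `VecWriter` type, the non-generic `Configuration`, and the functions `encode_char` and `encode_to_vec` are all assumptions made for illustration, not the real library's API.

```rust
/// Error type for the sketch (hypothetical variant).
#[derive(Debug, PartialEq)]
pub enum EncodeError {
    LimitExceeded,
}

/// Simplified stand-in for the `Writer` trait.
pub trait Writer {
    fn write(&mut self, bytes: &[u8]) -> Result<(), EncodeError>;
}

/// Simplified stand-in for the `Config` trait.
pub trait Config {
    fn limit(&self) -> usize;
}

/// Simplified stand-in for the `Encoder` trait.
pub trait Encoder {
    fn emit(&mut self, bytes: &[u8]) -> Result<(), EncodeError>;
}

/// The only `Encoder` implementation; it needs a writer and a config.
pub struct EncoderImpl<W: Writer, C: Config> {
    pub writer: W,
    pub config: C,
}

impl<W: Writer, C: Config> Encoder for EncoderImpl<W, C> {
    fn emit(&mut self, bytes: &[u8]) -> Result<(), EncodeError> {
        if bytes.len() > self.config.limit() {
            return Err(EncodeError::LimitExceeded);
        }
        self.writer.write(bytes)
    }
}

/// Hypothetical in-memory writer.
pub struct VecWriter {
    pub bytes: Vec<u8>,
}

impl Writer for VecWriter {
    fn write(&mut self, bytes: &[u8]) -> Result<(), EncodeError> {
        self.bytes.extend_from_slice(bytes);
        Ok(())
    }
}

/// Hypothetical configuration with a single setting.
pub struct Configuration {
    pub max_bytes: usize,
}

impl Config for Configuration {
    fn limit(&self) -> usize {
        self.max_bytes
    }
}

/// Target function: encodes one `char` as UTF-8 through any `Encoder`.
pub fn encode_char<E: Encoder>(value: char, encoder: &mut E) -> Result<(), EncodeError> {
    let mut buf = [0u8; 4];
    let encoded = value.encode_utf8(&mut buf);
    encoder.emit(encoded.as_bytes())
}

/// What a test must do: build writer, config and encoder, then call the target.
pub fn encode_to_vec(value: char, max_bytes: usize) -> Result<Vec<u8>, EncodeError> {
    let mut encoder = EncoderImpl {
        writer: VecWriter { bytes: Vec::new() },
        config: Configuration { max_bytes },
    };
    encode_char(value, &mut encoder)?;
    Ok(encoder.writer.bytes)
}
```

Even in this reduced form, calling the target function requires choosing a concrete type for every trait bound, and a wrong choice fails at compile time rather than at run time.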

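Simplified Python version

import sys

def encode(char_data, encoder):
    # Target function: delegates the work to the encoder
    result = encoder.process(char_data)
    return result

class Writer:
    # Thin wrapper so that write() accepts a config argument
    def __init__(self, stream):
        self.stream = stream

    def write(self, data, config):
        # config is accepted but unused in this simplified version
        return self.stream.write(data)

class Encoder:
    def __init__(self, writer, config):
        self.writer = writer
        self.config = config

    def process(self, data):
        output = self.writer.write(data, self.config)
        return output

class Config:
    def __init__(self):
        self.settings = {}

config = Config()
encoder = Encoder(Writer(sys.stdout), config)

# Test code
result = encode('A', encoder)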

LLM-generated code often fails to compile.

RUG design

Implementation

gpt-3.5-turbo-16k-0613
gpt-4-1106
presence_penalty set to -1
frequency_penalty set to 0.5
temperature set to 1 (the default)
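
The settings above can be collected in one place, as in this small sketch. The `SamplingParams` struct and its methods `for_model` and `to_json` are hypothetical helpers for illustration. Only the parameter names and values come from the list above, and this is not RUG's actual code.

```rust
/// Hypothetical container for the sampling settings listed above.
#[derive(Debug, Clone, PartialEq)]
pub struct SamplingParams {
    pub model: String,
    pub presence_penalty: f32,
    pub frequency_penalty: f32,
    pub temperature: f32,
}

impl SamplingParams {
    /// Builds the settings from the list above for the given model name.
    pub fn for_model(model: &str) -> Self {
        Self {
            model: model.to_string(),
            presence_penalty: -1.0,
            frequency_penalty: 0.5,
            temperature: 1.0,
        }
    }

    /// Renders the settings as a JSON object.
    /// The model name is inserted verbatim, so it must not contain quotes.
    pub fn to_json(&self) -> String {
        format!(
            "{{\"model\":\"{}\",\"presence_penalty\":{:?},\"frequency_penalty\":{:?},\"temperature\":{:?}}}",
            self.model, self.presence_penalty, self.frequency_penalty, self.temperature
        )
    }
}
```

For example, `SamplingParams::for_model("gpt-4-1106").to_json()` yields a single JSON object holding all four settings.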

Eval: Comparison with Traditional Tools

Token Consumption

With the baseline method (sending the whole context), GPT-4 cost $1000.
RUG saved 51.3% of tokens by processing each unique dependency only once.
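
The idea of processing each unique dependency only once can be illustrated with a simple cache. This is a sketch under stated assumptions, not RUG's implementation: `DependencyCache`, `resolve` and `count_expensive_calls` are hypothetical names, and the `format!` call stands in for an expensive LLM query.

```rust
use std::collections::HashMap;

/// Hypothetical cache from a dependency (type) name to the snippet that builds it.
pub struct DependencyCache {
    resolved: HashMap<String, String>,
    /// Number of times the expensive step actually ran.
    pub expensive_calls: usize,
}

impl DependencyCache {
    pub fn new() -> Self {
        Self {
            resolved: HashMap::new(),
            expensive_calls: 0,
        }
    }

    /// Returns the snippet for `type_name`, running the expensive step
    /// (a stand-in for an LLM query) only on a cache miss.
    pub fn resolve(&mut self, type_name: &str) -> String {
        if let Some(snippet) = self.resolved.get(type_name) {
            return snippet.clone();
        }
        self.expensive_calls += 1;
        let snippet = format!("let value = {}::default();", type_name);
        self.resolved.insert(type_name.to_string(), snippet.clone());
        snippet
    }
}

/// Resolves the dependencies of every target function with one shared cache
/// and reports how many expensive calls were needed in total.
pub fn count_expensive_calls(targets: &[Vec<&str>]) -> usize {
    let mut cache = DependencyCache::new();
    for dependencies in targets {
        for dependency in dependencies {
            cache.resolve(dependency);
        }
    }
    cache.expensive_calls
}
```

With three target functions that use five dependencies in total but only three distinct types, the expensive step runs three times instead of five.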

Real-World Usability

We submitted RUG's generated tests, without changing the test bodies, as PRs to open source projects. To our surprise, the developers were happy to merge these machine-generated tests. RUG generated a total of 248 unit tests, of which we submitted 113 to the corresponding crates based on their quality and priority. So far, 53 of these unit tests have been merged with positive feedback.

Developers chose not to merge 17 tests for two main reasons: first, the target functions are imported from external libraries (16), and the developers do not intend to include tests
