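- Change the theme to a black style to improve readability
- Update markdown.md to the RUG paper presentation content
- Add related image resources to the images directory
- Set the presentation size to 1920x1080 to suit modern displays
- Remove the original example slides to focus on the academic presentation content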
RUG: Turbo LLM for Rust Unit Test Generation
Keywords: LLM, Rust, Unit Test
Research date: 2022, published date: 2025
Introduction
- Unit testing is crucial but costly.
- Rust's strict type system.
- Existing LLM approaches often fail.
Rust Unit Test
/// Returns the sum of two numbers
///
/// # Examples
///
/// ```
/// assert_eq!(add(2, 3), 5);
/// assert_eq!(add(-1, 1), 0);
/// ```
fn add(a: i32, b: i32) -> i32 {
    a + b
}
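Besides doc tests like the one in the comment above, Rust unit tests conventionally live in a `#[cfg(test)]` module next to the code under test. A minimal sketch for the same `add` function follows; the module and test function names are illustrative choices, not taken from the paper.

```rust
/// Returns the sum of two numbers.
pub fn add(a: i32, b: i32) -> i32 {
    a + b
}

// Compiled only for test builds, so it adds nothing to release binaries.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn adds_positive_numbers() {
        assert_eq!(add(2, 3), 5);
    }

    #[test]
    fn adds_negative_and_positive() {
        assert_eq!(add(-1, 1), 0);
    }
}
```

Running `cargo test` builds the `tests` module and executes each function marked `#[test]`.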
Challenge
fn encode<E: Encoder>(&self, encoder: E) -> Result<EncodeError> // target function; self is a char
impl<W: Writer, C: Config> Encoder for EncoderImpl
pub struct EncoderImpl<W: Writer, C: Config>
impl Writer for SliceWriter
impl Writer for IoWriter
impl<T> Config for T where T: R1 + R2 + R3
pub struct Configuration<R1, R2, R3>
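To call `encode` in a test, the generic parameter `E` must be filled with a concrete type, which in turn needs concrete `Writer` and `Config` values. The sketch below reuses the slide's type names with heavily simplified, assumed definitions to show the bottom-up construction a test has to perform. `VecWriter`, `encode_char`, and `encode_to_bytes` are hypothetical helpers introduced for illustration, and none of the bodies reflect the real library.

```rust
// Simplified stand-ins for the slide's types; all bodies are assumptions.
#[derive(Debug, PartialEq)]
pub struct EncodeError;

pub trait Writer {
    fn write(&mut self, bytes: &[u8]) -> Result<(), EncodeError>;
}

pub trait Config {}

pub trait Encoder {
    type W: Writer;
    fn writer(&mut self) -> &mut Self::W;
}

// Hypothetical in-memory writer, playing the role of SliceWriter/IoWriter.
#[derive(Default)]
pub struct VecWriter {
    pub bytes: Vec<u8>,
}

impl Writer for VecWriter {
    fn write(&mut self, bytes: &[u8]) -> Result<(), EncodeError> {
        self.bytes.extend_from_slice(bytes);
        Ok(())
    }
}

// Simplified: the slide's Configuration has three type parameters.
pub struct Configuration;
impl Config for Configuration {}

pub struct EncoderImpl<W: Writer, C: Config> {
    pub writer: W,
    pub config: C,
}

impl<W: Writer, C: Config> Encoder for EncoderImpl<W, C> {
    type W = W;
    fn writer(&mut self) -> &mut Self::W {
        &mut self.writer
    }
}

// Hypothetical free-function version of the target: encodes a char as UTF-8.
pub fn encode_char<E: Encoder>(c: char, encoder: &mut E) -> Result<(), EncodeError> {
    let mut buf = [0u8; 4];
    let encoded = c.encode_utf8(&mut buf);
    encoder.writer().write(encoded.as_bytes())
}

// What a test must do: build Writer and Config first, then the Encoder,
// and only then call the target function.
pub fn encode_to_bytes(c: char) -> Result<Vec<u8>, EncodeError> {
    let writer = VecWriter::default();
    let config = Configuration;
    let mut encoder = EncoderImpl { writer, config };
    encode_char(c, &mut encoder)?;
    Ok(encoder.writer.bytes)
}
```

Every step in `encode_to_bytes` has to type-check against the trait bounds, so a generated test that skips or mistypes one dependency is rejected by the compiler.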
Simplified Python version
def encode(char_data, encoder):
    result = encoder.process(char_data)
    return result

class Encoder:
    def __init__(self, writer, config):
        self.writer = writer  # used by process()
        self.config = config

    def process(self, data):
        output = self.writer.write(data, self.config)
        return output

class Config:
    def __init__(self):
        self.settings = {}

# Hypothetical minimal writer; any object with write(data, config) works
class StdoutWriter:
    def write(self, data, config):
        print(data)
        return data

config = Config()
encoder = Encoder(StdoutWriter(), config)

# Test code
result = encode('A', encoder)
LLM-generated Rust test code often fails to compile.
RUG design
Implementation
- gpt-3.5-turbo-16k-0613
- gpt-4-1106
- presence_penalty set to -1
- frequency_penalty set to 0.5
- temperature set to 1 (by default)
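As a rough illustration, the settings listed above can be collected into one request configuration. This sketch assumes the field names match the OpenAI Chat Completions request parameters; `SamplingParams`, `slide_params`, and `to_json` are hypothetical names, and the hand-written JSON formatting is only for demonstration.

```rust
/// Sampling settings from the slide, grouped into one value.
pub struct SamplingParams {
    pub model: &'static str,
    pub presence_penalty: f32,
    pub frequency_penalty: f32,
    pub temperature: f32,
}

/// Builds the slide's settings for a given model name.
pub fn slide_params(model: &'static str) -> SamplingParams {
    SamplingParams {
        model,
        presence_penalty: -1.0,
        frequency_penalty: 0.5,
        temperature: 1.0, // the default value
    }
}

/// Renders the settings as a JSON object fragment (no message list included).
pub fn to_json(p: &SamplingParams) -> String {
    format!(
        "{{\"model\":\"{}\",\"presence_penalty\":{},\"frequency_penalty\":{},\"temperature\":{}}}",
        p.model, p.presence_penalty, p.frequency_penalty, p.temperature
    )
}
```

A real client would also send the prompt messages and would typically use a JSON library rather than `format!`.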
Eval: Comparison with Traditional Tools
Token Consumption
- GPT-4 costs $1000 with the baseline method (sending the whole context)
- RUG saves 51.3% of tokens (each unique dependency is processed only once)
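The idea of processing each unique dependency only once can be sketched as a simple cache keyed by dependency name. This is not the authors' implementation: `DependencyCache`, `get_or_generate`, and `fake_llm` are hypothetical names, and `fake_llm` merely stands in for a model query.

```rust
use std::collections::HashMap;

/// Hypothetical cache from a dependency (type) name to the snippet
/// generated for constructing it.
pub struct DependencyCache {
    built: HashMap<String, String>,
    /// Number of times the generator was actually invoked.
    pub llm_calls: usize,
}

impl DependencyCache {
    pub fn new() -> Self {
        Self { built: HashMap::new(), llm_calls: 0 }
    }

    /// Returns the snippet for `dep`, calling `generate` only the first
    /// time this dependency is seen.
    pub fn get_or_generate<F>(&mut self, dep: &str, generate: F) -> String
    where
        F: FnOnce(&str) -> String,
    {
        if let Some(snippet) = self.built.get(dep) {
            return snippet.clone();
        }
        self.llm_calls += 1;
        let snippet = generate(dep);
        self.built.insert(dep.to_string(), snippet.clone());
        snippet
    }
}

/// Stand-in for an LLM query; a real system would send a prompt here.
pub fn fake_llm(dep: &str) -> String {
    format!("let v = {}::default();", dep)
}
```

If several target functions share `SliceWriter` and `Configuration`, only the first request for each name reaches the generator, and later requests reuse the stored snippet.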
Real-World Usability
We submitted RUG's generated tests as PRs to the open source projects without changing the test bodies. To our surprise, the developers were happy to merge these machine-generated tests. RUG generated a total of 248 unit tests, of which we submitted 113 to the corresponding crates based on their quality and priority. So far, 53 of these unit tests have been merged with positive feedback.
Developers chose not to merge 17 tests for two main reasons: first, the target functions are imported from external libraries (16), and the developers do not intend to include tests
2025 Situation
