
# RUG: Turbo LLM for Rust Unit Test Generation

Keywords: LLM, Rust, Unit Test

Research date: 2022, published date: 2025


#### Introduction

*   Unit testing is crucial but costly.

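*   Rust's strict type system makes tests hard to write: every argument needs a concrete, well-typed value.

*   Existing LLM approaches often fail to produce tests that compile.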


#### Rust Unit Test

```rust
/// Returns the sum of two numbers
///
/// # Examples
///
/// ```
/// assert_eq!(add(2, 3), 5);
/// assert_eq!(add(-1, 1), 0);
/// ```
fn add(a: i32, b: i32) -> i32 {
    a + b
}
```
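
The block above keeps its assertions in a doc comment (a documentation test). A conventional unit test usually lives in a `#[cfg(test)]` module next to the code. A minimal sketch reusing the same `add` function; the module and test names are illustrative:

```rust
/// Returns the sum of two numbers.
fn add(a: i32, b: i32) -> i32 {
    a + b
}

// Compiled only when running `cargo test`.
#[cfg(test)]
mod tests {
    use super::*; // bring `add` into scope, even though it is private

    #[test]
    fn add_handles_positive_and_negative() {
        assert_eq!(add(2, 3), 5);
        assert_eq!(add(-1, 1), 0);
    }
}
```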


#### Challenge

```rust
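// Simplified signatures (not compilable as written): the dependency
// chain a test must satisfy before it can call the target function.
fn encode<E: Encoder>(&self: char, encoder: E) -> Result<EncodeError> // target function

// `encode` needs a concrete type that implements `Encoder`...
impl<W: Writer, C: Config> Encoder for EncoderImpl

// ...which in turn needs a concrete `Writer`...
pub struct EncoderImpl<W: Writer, C: Config>
impl Writer for SliceWriter
impl Writer for IoWriter

// ...and a concrete `Config`.
impl<T> Config for T where T: R1 + R2 + R3
pub struct Configuration<R1, R2, R3>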
```

Simplified Python version:

```python
import sys

def encode(char_data, encoder):
    result = encoder.process(char_data)
    return result

class Encoder:
    def __init__(self, writer, config):
        self.writer = writer
        self.config = config

    def process(self, data):
        # Duck typing: any object with a write() method works as a writer
        output = self.writer.write(data)
        return output

class Config:
    def __init__(self):
        self.settings = {}

config = Config()
encoder = Encoder(sys.stdout, config)

# Test code
result = encode('A', encoder)
```

LLM-generated tests often fail to compile: unlike the duck-typed Python version, a Rust test must construct a concrete type for every trait bound.
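
To make the dependency chain concrete, here is a small self-contained sketch. All traits and types in it (`Writer`, `Config`, `Encoder`, `VecWriter`, `Endian`, `EncoderImpl`, `encode_char`, `encode_to_vec`) are hypothetical stand-ins that only mirror the shape of the signatures above; they are not the API of any real crate.

```rust
// Hypothetical traits, loosely mirroring the signatures shown earlier.
trait Writer {
    fn write(&mut self, bytes: &[u8]);
}

trait Config {
    fn big_endian(&self) -> bool;
}

trait Encoder {
    fn emit_u32(&mut self, value: u32);
}

// Concrete writer: collects bytes in memory.
struct VecWriter {
    buf: Vec<u8>,
}

impl Writer for VecWriter {
    fn write(&mut self, bytes: &[u8]) {
        self.buf.extend_from_slice(bytes);
    }
}

// Concrete config: only chooses the byte order.
struct Endian {
    big: bool,
}

impl Config for Endian {
    fn big_endian(&self) -> bool {
        self.big
    }
}

// The encoder is generic over both dependencies.
struct EncoderImpl<W: Writer, C: Config> {
    writer: W,
    config: C,
}

impl<W: Writer, C: Config> Encoder for EncoderImpl<W, C> {
    fn emit_u32(&mut self, value: u32) {
        let bytes = if self.config.big_endian() {
            value.to_be_bytes()
        } else {
            value.to_le_bytes()
        };
        self.writer.write(&bytes);
    }
}

// Target function: callable only once a full `Encoder` exists.
fn encode_char<E: Encoder>(c: char, encoder: &mut E) {
    encoder.emit_u32(c as u32);
}

// What a test has to do: build every dependency bottom-up, then call.
fn encode_to_vec(c: char, big: bool) -> Vec<u8> {
    let mut encoder = EncoderImpl {
        writer: VecWriter { buf: Vec::new() },
        config: Endian { big },
    };
    encode_char(c, &mut encoder);
    encoder.writer.buf
}
```

Even in this toy version, a test cannot call `encode_char` until it has picked and constructed a `Writer`, a `Config`, and an `EncoderImpl` over both; a wrong choice at any level is a compile error rather than a runtime failure.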


#### RUG design

<img src="./images/Screenshot_20251125_010053.jpeg"
  width="75%">


<img src="./images/Screenshot_20251125_011029.jpeg" width="80%">

<img src="./images/Screenshot_20251125_011348.jpeg" width="80%">


#### Implementation

- gpt-3.5-turbo-16k-0613
- gpt-4-1106
- `presence_penalty` set to -1
- `frequency_penalty` set to 0.5
- `temperature` left at its default of 1
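
As a rough illustration, the settings above could be gathered into one structure and rendered as request fields. This is only a sketch: `SamplingConfig`, `rug_sampling_config`, and `to_request_fields` are hypothetical names, the field names are assumed to match the OpenAI Chat Completions request parameters, and the JSON is assembled by hand to avoid external crates.

```rust
/// Hypothetical container for the sampling settings listed above.
struct SamplingConfig {
    model: &'static str,
    presence_penalty: f32,
    frequency_penalty: f32,
    temperature: f32,
}

/// Builds the configuration with the values from the list above.
fn rug_sampling_config(model: &'static str) -> SamplingConfig {
    SamplingConfig {
        model,
        presence_penalty: -1.0,
        frequency_penalty: 0.5,
        temperature: 1.0, // the default value
    }
}

/// Renders the settings as a JSON object (assumed field names).
fn to_request_fields(cfg: &SamplingConfig) -> String {
    format!(
        "{{\"model\":\"{}\",\"presence_penalty\":{},\"frequency_penalty\":{},\"temperature\":{}}}",
        cfg.model, cfg.presence_penalty, cfg.frequency_penalty, cfg.temperature
    )
}
```

A real request would also carry the messages and authentication, which are omitted here.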


#### Eval: Comparison with Traditional Tools

<img src="./images/Screenshot_20251125_014355.jpeg" width="75%">


#### Token Consumption

- With the baseline method (sending the whole context), GPT-4 cost $1000.
- RUG saved 51.3% of tokens by processing each unique dependency only once.
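
The "process each unique dependency only once" idea can be sketched as a simple cache. This is an illustration under stated assumptions, not RUG's actual implementation: `DependencyCache`, `stub_model_query`, and `count_model_queries` are hypothetical names, and the model call is replaced by a stub.

```rust
use std::collections::HashMap;

/// Hypothetical cache from a dependency name to the code that constructs it.
struct DependencyCache {
    resolved: HashMap<String, String>,
    model_queries: usize,
}

impl DependencyCache {
    fn new() -> Self {
        DependencyCache { resolved: HashMap::new(), model_queries: 0 }
    }

    /// Returns the cached snippet, or queries the (stubbed) model once and stores the result.
    fn resolve(&mut self, dep: &str) -> String {
        if let Some(snippet) = self.resolved.get(dep) {
            return snippet.clone();
        }
        self.model_queries += 1;
        let snippet = stub_model_query(dep);
        self.resolved.insert(dep.to_string(), snippet.clone());
        snippet
    }
}

/// Stand-in for a model query; a real pipeline would send a prompt here.
fn stub_model_query(dep: &str) -> String {
    format!("let value = {}::default();", dep)
}

/// Counts how many model queries a set of target functions needs when
/// dependencies shared between targets are resolved only once.
fn count_model_queries(targets: &[Vec<&str>]) -> usize {
    let mut cache = DependencyCache::new();
    for deps in targets {
        for dep in deps {
            cache.resolve(dep);
        }
    }
    cache.model_queries
}
```

With three targets that depend on `Writer` and `Config`, on `Writer` alone, and on `Config` and `Encoder`, the cache issues three queries instead of five, since the repeated dependencies are served from the map.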


#### Real-World Usability

> We directly leverage RUG's generated tests, without changing test bodies and send them as PRs to the open source projects.
> To our surprise, the developers are happy to merge these machine generated tests.
> RUG generated a total of 248 unit tests, of which we submitted 113 to the corresponding crates based on their quality and priority.
> So far, 53 of these unit tests have been merged with positive feedback.

> Developers chose not to merge 17 tests for two main reasons:
> first, the target functions are imported from external libraries (16),
> and the developers do not intend to include tests


#### 2025 Situation

<img src="./images/Screenshot_20251125_015416.jpeg" width="60%">


<img src="./images/Screenshot_20251125_015705.jpeg" width="60%">