Add documentation for autocompressor (#70)
107
docs/algorithm.md
Normal file
@@ -0,0 +1,107 @@
# Compression algorithm

## What is state?
State covers things such as who is in a room, what the room topic and name are, and who has
which privilege levels. Synapse keeps track of it for various reasons, such as
spotting invalid events (e.g. ones sent by banned users) and providing room membership
information to clients.

## What is a state group?

Synapse needs to keep track of the state at the moment of each event. A state group
corresponds to a unique state. The database table `event_to_state_groups` keeps track
of the mapping from event ids to state group ids.

Consider the following simplified example:
```
State group id | State
_____________________________________________
1              | Alice in room
2              | Alice in room, Bob in room
3              | Bob in room


Event id | What the event was
______________________________________
1        | Alice sends a message
3        | Bob joins the room
4        | Bob sends a message
5        | Alice leaves the room
6        | Bob sends a message


Event id | State group id
_________________________
1        | 1
2        | 1
3        | 2
4        | 2
5        | 3
6        | 3
```

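The lookup described above can be sketched in a few lines of Python. This is only an illustration of the example tables: the dictionaries and the `state_at_event` helper are hypothetical in-memory stand-ins, not Synapse's actual schema or API.

```python
# Hypothetical in-memory stand-ins for the example tables above.
state_groups = {
    1: {"Alice in room"},
    2: {"Alice in room", "Bob in room"},
    3: {"Bob in room"},
}

# Plays the role of `event_to_state_groups`: event id -> state group id.
event_to_state_group = {1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 3}


def state_at_event(event_id):
    """Return the full state that applied at the given event."""
    return state_groups[event_to_state_group[event_id]]
```

Many events can share one state group, which is why the mapping is stored separately from the state itself.
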
## What are deltas and predecessors?
When a new state event happens (e.g. Bob joins the room), a new state group is created.
However, instead of copying all of the state from the previous state group, we just store
the change from the previous group (saving lots of storage space!). The difference
from the previous state group is called the "delta", and the previous state group is
called the "predecessor".

So for the previous example, we would have the following (note that only rows 1 and 2 will
make sense at this point):

```
State group id | Previous state group id | Delta
____________________________________________________________
1              | NONE                    | Alice in room
2              | 1                       | Bob in room
3              | NONE                    | Bob in room
```

So why is state group 3's previous state group NONE and not 2? The way that deltas
work in Synapse is that they can only add new state or overwrite old state; they
cannot remove it. (Changing the room topic just overwrites state,
but removing Alice from the room is neither an addition nor an overwrite.) If it is
impossible to find a delta, then you just start from scratch again with a "snapshot" of
the entire state.

(NOTE: this is not documentation on how Synapse handles leaving rooms; it is purely
illustrative.)

The state of a state group is worked out by following the chain of previous state groups and adding
together all of the deltas (with the most recent taking precedence).

The mapping from state group to previous state group is stored in `state_group_edges`,
and the deltas are stored in `state_groups_state`.

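As a minimal sketch of how the full state can be rebuilt from deltas, the following walks back through the predecessors and then applies the deltas oldest first. The dictionaries stand in for `state_group_edges` and `state_groups_state`, and the keys such as `"member:alice"` are made up for illustration; none of this is Synapse's real code.

```python
# Hypothetical stand-in for `state_group_edges`: state group -> predecessor.
previous_group = {1: None, 2: 1, 3: None}

# Hypothetical stand-in for `state_groups_state`: state group -> delta.
deltas = {
    1: {"member:alice": "join"},
    2: {"member:bob": "join"},
    3: {"member:bob": "join"},
}


def resolve_state(group):
    """Rebuild the full state of `group` by following its predecessors."""
    chain = []
    while group is not None:
        chain.append(group)
        group = previous_group[group]

    state = {}
    # Apply the oldest delta first so that more recent deltas take precedence.
    for ancestor in reversed(chain):
        state.update(deltas[ancestor])
    return state
```

Because a delta can only add or overwrite keys, state group 3 has to be a snapshot: there is no way to express "Alice is no longer in the room" on top of state group 2.
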
## What are we compressing then?
In order to speed up the conversion from state group id to state, Synapse sets a limit of 100
hops (that is, we will only ever have to look up the deltas for a maximum of
100 state groups). It does this by taking another "snapshot" every 100 state groups.

However, it is these snapshots that take up the bulk of the storage in a Synapse database,
so we want to find a way to reduce the number of them without dramatically increasing the
maximum number of hops needed to do lookups.

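To get a feel for why snapshots dominate, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that every state group changes exactly one entry and that a snapshot stores one row per entry of the room state; the function name and parameters are hypothetical.

```python
def rows_stored(num_groups, state_size, max_hops=100):
    """Count stored rows when a full snapshot is taken every `max_hops` groups.

    Illustrative assumptions: each delta is a single row and each snapshot
    stores `state_size` rows.
    """
    snapshot_rows = 0
    delta_rows = 0
    for index in range(num_groups):
        if index % max_hops == 0:
            snapshot_rows += state_size  # the chain restarts with a full copy
        else:
            delta_rows += 1  # only the changed entry is stored
    return snapshot_rows, delta_rows
```

Under these assumptions, 1000 state groups in a room with 5000 state entries give `rows_stored(1000, 5000) == (50000, 990)`: the ten snapshots account for almost all of the rows.
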
## Compression Algorithm

The algorithm works by attempting to create a *tree* of deltas, produced by
appending state groups to different "levels". Each level has a maximum size, and
each state group is appended to the lowest level that is not full. This tool calls a
state group "compressed" once it has been added to one of these levels.

This produces a graph that looks approximately like the following, in the case
of having two levels with the bottom level (L1) having a maximum size of 3:

```
L2 <-------------------- L2 <---------- ...
^--- L1 <--- L1 <--- L1  ^--- L1 <--- L1 <--- L1

NOTE: A <--- B means that state group B's predecessor is A
```
The structure that Synapse creates by default would be equivalent to having one level with
a maximum length of 100.

**Note**: Increasing the sum of the sizes of levels will increase the time it
takes to query the full state of a given state group.

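The level scheme can be sketched as follows. This is one reading of the description and diagram above, not the tool's actual implementation: the function name is hypothetical, and two details are assumptions, namely that the first state group of a tree sits on the top level, and that a new tree (a fresh snapshot) is started once every level is full.

```python
def assign_predecessors(state_groups, level_sizes):
    """Return {state_group: predecessor or None}.

    `level_sizes` lists the maximum size of each level, lowest level first.
    """
    top = len(level_sizes) - 1
    heads = [None] * len(level_sizes)  # most recent group on each level
    lengths = [0] * len(level_sizes)   # number of groups currently on each level
    predecessors = {}

    for index, group in enumerate(state_groups):
        # Lowest level that still has room, or None if every level is full.
        level = next(
            (i for i, size in enumerate(level_sizes) if lengths[i] < size), None
        )
        if index == 0 or level is None:
            # Assumption: start a new tree with a snapshot on the top level.
            level = top
            predecessors[group] = None
            lengths[top] = 0
        else:
            predecessors[group] = heads[level]
        heads[level] = group
        lengths[level] += 1
        # Lower levels start again, hanging off the group just added.
        for lower in range(level):
            heads[lower] = group
            lengths[lower] = 0
    return predecessors
```

With `level_sizes=[3, 2]` and state groups 1 to 10, groups 2, 3 and 4 chain off group 1, group 5 goes onto L2 with predecessor 1, and group 9 starts a new tree. The longest chain is 8, 7, 6, 5, 1: five lookups, which is the sum of the level sizes.
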
54
docs/python.md
Normal file
@@ -0,0 +1,54 @@
# Running the compressor tools from Python

Both the automatic and manual tools use PyO3 to allow the compressor
to be run from Python.

To see any output from the tools, logging must be set up in Python before
the compressor is run.

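A minimal way to set up logging with the standard library is sketched below. The `INFO` level and the format string are assumptions chosen for illustration; adjust them to whatever verbosity you need.

```python
import logging

# Install a handler that writes to stderr if the root logger has none yet.
logging.basicConfig(format="%(asctime)s %(levelname)s %(name)s: %(message)s")

# Assumed level: show INFO and above. Raise or lower as needed.
logging.getLogger().setLevel(logging.INFO)
```

Run this before importing and calling either compressor module so that messages emitted during the run have somewhere to go.
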
## Setting things up

1. Create a virtual environment in the place you want to use the compressor from
(if it doesn't already exist)
`$ virtualenv -p python3 venv`

2. Activate the virtual environment and install `maturin` (if you haven't already)
`$ source venv/bin/activate`
`$ pip install maturin`

3. Navigate to the correct location
For the automatic tool:
`$ cd /home/synapse/rust-synapse-compress-state/auto_compressor`
For the manual tool:
`$ cd /home/synapse/rust-synapse-compress-state`

4. Build and install the library
`$ maturin develop`

This will install the relevant compressor tool into the activated virtual environment.

## Automatic tool example:

```python
import auto_compressor

auto_compressor.compress_state_events_table(
    db_url="postgresql://localhost/synapse",  # Synapse's Postgres database
    chunk_size=500,                           # state groups handled per chunk
    default_levels="100,50,25",               # comma-separated level sizes
    number_of_chunks=100                      # how many chunks to compress
)
```

## Manual tool example:

```python
import synapse_compress_state

synapse_compress_state.run_compression(
    db_url="postgresql://localhost/synapse",  # Synapse's Postgres database
    room_id="!some_room:example.com",         # room whose state is compressed
    output_file="out.sql",                    # file the generated SQL is written to
    transactions=True                         # wrap the generated changes in transactions
)
```