What is the Tarantool write ahead buffer size, and how often is it synced? Is there any specific size and can it be tuned depending on the disk type (SSD or HDD)?
Tarantool doesn't fsync transactions to disk by default, since the
default wal_mode setting is “write”, which means “pass the data to the
file system”. Memtx snapshot files have the O_DIRECT flag set:
it instructs the filesystem to bypass its cache for this data.
Since xlog files are relayed to replicas, the flag is not set for them.
If wal_mode is “fsync”, the entire file is opened with the O_SYNC flag. In a
nutshell, it instructs the operating system to write each chunk directly to
disk. In our benchmarks, this mode is ~2-3 times slower than “write” mode
in a typical workload.
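
To make the difference between a plain write and an O_SYNC write concrete, here is a minimal Python sketch of the underlying open(2) flags. It is not Tarantool code: the helper names `open_log` and `append_record` and the file name `example.xlog` are made up for illustration, and the sketch assumes a platform where the `os` module exposes `O_SYNC` (it falls back to a plain open otherwise).

```python
import os
import tempfile


def open_log(path, sync):
    """Open an append-only log file, optionally with O_SYNC.

    Hypothetical helper for illustration; not part of Tarantool.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND
    if sync:
        # With O_SYNC each write() returns only after the data has been
        # handed to the storage device. Fall back to 0 on platforms
        # that do not define the flag.
        flags |= getattr(os, "O_SYNC", 0)
    return os.open(path, flags, 0o644)


def append_record(fd, payload):
    """Write one record and return the number of bytes written."""
    return os.write(fd, payload)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.xlog")
    # First append without O_SYNC, then with it; the file contents are
    # the same, only the durability guarantee of each write() differs.
    for sync in (False, True):
        fd = open_log(path, sync)
        try:
            append_record(fd, b"row\n")
        finally:
            os.close(fd)
    with open(path, "rb") as f:
        written = f.read()
```

Both appends produce the same bytes on disk. What changes is when `write()` returns, which is where the slowdown in the benchmark figure above comes from.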
Tarantool transparently compresses all data that is written to disk.
It also uses a group commit feature to group all transactions committed
in the current event loop iteration into a single batch. The batch is
compressed, checksummed and passed to the operating system in a single
write.
The buffer size is selected automatically depending on the intensity of the write workload, so that as many transactions as possible are written to disk in a single batch. Two built-in constants configure this algorithm:
XLOG_TX_AUTOCOMMIT_THRESHOLD, set to 128K: if the current size of transactions in the buffer reaches this limit, the buffer is flushed to disk. A single transaction can be larger than 128K.
XLOG_TX_COMPRESS_THRESHOLD, set to 2K: the buffer is compressed before flushing if it is at least that big. For smaller sizes, compression takes up CPU but doesn't yield sizable gains.
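
The flush policy described above can be modelled in a few lines of Python. This is a toy sketch, not Tarantool's implementation: the `BatchWriter` class and its `sink` callback are hypothetical, `zlib` stands in for whatever codec and checksum Tarantool actually uses, and computing the checksum over the (possibly compressed) body is an assumption. Only the two threshold values come from the text.

```python
import zlib

# Threshold values as quoted in the text above.
XLOG_TX_AUTOCOMMIT_THRESHOLD = 128 * 1024
XLOG_TX_COMPRESS_THRESHOLD = 2 * 1024


class BatchWriter:
    """Toy model of the batching policy; hypothetical, not Tarantool code."""

    def __init__(self, sink):
        self.sink = sink          # callable that receives each flushed batch
        self.buffer = bytearray()

    def add_tx(self, tx):
        # A transaction is appended whole, so the buffer may grow past
        # the threshold when a single transaction is larger than 128K.
        self.buffer += tx
        if len(self.buffer) >= XLOG_TX_AUTOCOMMIT_THRESHOLD:
            self.flush()

    def flush(self):
        """Flush at the threshold or at the end of an event loop iteration."""
        if not self.buffer:
            return
        raw = bytes(self.buffer)
        # Compress only batches that are big enough to benefit.
        compressed = len(raw) >= XLOG_TX_COMPRESS_THRESHOLD
        body = zlib.compress(raw) if compressed else raw
        self.sink({
            "compressed": compressed,
            "crc32": zlib.crc32(body),  # assumption: checksum covers the body
            "raw_size": len(raw),
            "body": body,
        })
        self.buffer.clear()


batches = []
writer = BatchWriter(batches.append)
writer.add_tx(b"x" * 100)        # small transaction, stays in the buffer
writer.flush()                   # end of the event loop iteration
writer.add_tx(b"y" * 200_000)    # larger than 128K, flushed immediately
```

In this run the first batch is written uncompressed because it is below 2K, while the second is flushed as soon as it is added and is compressed first.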