Log-based histograms provide lower estimation error for the same number of buckets compared to log-linear histograms.
For example, the current Histogram implementation splits each decimal range (10^n .. 10^(n+1)] into 18 buckets.
These buckets have the following bounds:
- for log-linear histogram: (1 .. 1.5], (1.5 .. 2], (2 .. 2.5], ... (9.5 .. 10]
- for log-based histogram: (1 .. 1.136], (1.136 .. 1.292], ... (8.799 ... 10]
The maximum estimated error for log-linear histogram is reached in the first bucket per each decimal range and equals to 1.5-1=0.5 or 50%.
The maximum estimated error for log-based histogram is constant across buckets and equals to 1.136-1=0.136 or 13.6%.
This means that log-based histogram improves histogram accuracy by up to 50%/13.6% = 3.6 times when using the same number of buckets.
Further reading - https://linuxczar.net/blog/2020/08/13/histogram-error/