bsda2: pkg_validate Performance Tweaks
I am currently updating the bsda2 code for pkg_validate with LST.sh. This adds some overhead (however small), and to counter that I decided to try to tweak the performance a little.
Two approaches have shown benefits.
Tweaking Checksum Verification
Checksum verification is performed in two steps. The checksum binary (currently only sha256 is supported) is passed a set of files that it checks in one go. The resulting list is checked against the reference checksums and mismatches are inspected individually in order to allow providing a reason (e.g. file missing, insufficient privileges etc.).
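The second step can be pictured roughly as follows. This is a minimal sketch, not the bsda2 code: the function names mismatch_reason and report_mismatches, the reason strings and the "expected|actual|path" input format are all assumptions made up for illustration.

```sh
# Hypothetical helper: name a likely reason for a failed comparison.
mismatch_reason() {
	if [ ! -e "$1" ]; then
		echo "file missing"
	elif [ ! -r "$1" ]; then
		echo "insufficient privileges"
	else
		echo "checksum mismatch"
	fi
}

# Hypothetical driver: stdin carries "expected|actual|path" lines, i.e.
# the reference checksum next to the batch result for the same file.
# An empty "actual" field stands for "the checksum tool printed nothing".
report_mismatches() {
	while IFS='|' read -r expected actual path; do
		# Matching checksums need no further inspection.
		[ "$expected" = "$actual" ] && continue
		# Only mismatches pay for the individual file tests.
		printf '%s: %s\n' "$path" "$(mismatch_reason "$path")"
	done
}
```

The point of the split is that the expensive per-file inspection only runs for the (hopefully few) files that failed the bulk comparison.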
One important case is symlinks: the checksum tool scans the file referred to by a symlink, whereas the reference checksum is a checksum of the path referred to by the symlink. This has to be reproduced (including reproducing a bug in pkg, which cannot be fixed without altering checksums).
The performance tweak performed is substituting symlinks with /dev/null in the file list in order to trigger the checksum mismatch without actually scanning a file.
An alternative approach would be to substitute an invalid file name, but it turns out that checksumming /dev/null is faster than failing on a missing file.
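The substitution itself can be sketched in a few lines of sh. This is an illustration under assumptions, not the actual bsda2 implementation: the function name substitute_symlinks and the one-path-per-line input are hypothetical.

```sh
# Hypothetical helper: read one path per line from stdin and print the
# list to hand to the checksum tool, one output line per input line.
substitute_symlinks() {
	while IFS= read -r path; do
		if [ -L "$path" ]; then
			# Symlink: let the tool read /dev/null instead of the target.
			printf '%s\n' /dev/null
		else
			printf '%s\n' "$path"
		fi
	done
}
```

Because every input line yields exactly one output line, the checksum results can still be matched to the reference list by position.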
The other tweak is changing the batch size. Finding the correct batch size is simply a logarithmic search (doubling the batch size at each step) with a performance metric. The metric I used is the validate all packages benchmark.
A small batch size improves CPU utilisation at the beginning and end of the process.
A larger batch size reduces overhead (fewer calls of the sha256 executable, fewer locking operations on the task queue).
The original batch size was 64. Runtime improved up to 1024, beyond which it stalled and then degraded, so the new batch size is 1024.
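To get a feel for the overhead side of this trade-off, the number of checksum tool invocations for a file list is the file count divided by the batch size, rounded up. The helper name batch_count is hypothetical; the file count is the 85605 files of texlive-texmf reported in the benchmark section of this post.

```sh
# Hypothetical helper: invocations needed for $1 files at batch size $2,
# i.e. ceil(files / batch_size) in integer arithmetic.
batch_count() {
	echo $(( ($1 + $2 - 1) / $2 ))
}

batch_count 85605 64    # prints 1338
batch_count 85605 1024  # prints 84
```

Fewer invocations also means fewer batches to distribute across the available threads, which is why the utilisation argument for small batches pulls in the opposite direction.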
Benchmark
The test system is an Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz with 6 cores / 12 threads, running FreeBSD stable/13-n248234-3218666bd082. The maximum turbo clock is 4.5 GHz for a single core and 4.0 GHz for all cores. The CPU clock is controlled by the hwpstate driver, but as far as I can tell single core turbo does not work for this model; hwpstate always sets the same clock speed for all cores.
pkg_validate (1207 packages)
This benchmark was performed seven times for each pkg_validate version:
for i in $(jot 7); do time pkg_validate; done
Benchmark validating all packages seven times in a row.
The first two runs show additional turbo boost benefits, whereas by the third run the system has reached thermal equilibrium and performance is fairly stable from that point onwards. The median runtime of pkg_validate 0.4.2 is 51.42 s and that of the tweaked version is 45.79 s, a 10.9 % runtime reduction.
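For anyone repeating the measurement, the median and the relative reduction can be computed with sort and awk. This is a minimal sketch; the function names median and reduction are hypothetical, and the input is assumed to be one runtime in seconds per line.

```sh
# Hypothetical helper: median of the numbers on stdin, one per line.
median() {
	sort -n | awk '
		{ v[NR] = $1 }
		END {
			if (NR == 0) exit 1
			if (NR % 2) print v[(NR + 1) / 2]
			else        print (v[NR / 2] + v[NR / 2 + 1]) / 2
		}'
}

# Hypothetical helper: runtime reduction from $1 to $2 in percent.
reduction() {
	awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (old - new) / old * 100 }'
}

reduction 51.42 45.79  # prints 10.9
```

The median is a sensible choice here because the first two runs are skewed by the extra turbo boost and would pull a mean towards an unrepresentative value.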
Usually a reduction in real time is achieved by improving the utilisation of cores, but in this case we managed to reduce the actual work done (the user + sys measurements).
[Figures: real time and user + sys time per run, scale 1 pt/s]
pkg_validate texlive-*
To verify that there are no regressions I also ran a smaller test case validating the texlive packages:
for i in $(jot 9); do time pkg_validate texlive-\*; done
Benchmark validating all texlive packages nine times in a row.
This benchmark is dominated by the texlive-texmf package, which contributes 85605 out of 117570 files (72.8 %). This is the reason why the simple one job per package approach does not scale well.
Luckily even this use case gets away with a net win, where I expected at least a small performance regression from the tweaks.
It is noteworthy that this benchmark does not seem to be thermally limited; increasing the number of runs to 25 did not make a difference either. Monitoring the system during the runs suggests that CPU utilisation is too low to reach a state where thermal throttling limits the turbo boost.
It might mean there is some untapped performance potential - or we are constrained by the limits of file system IO.
[Figures: real time and user + sys time per run, scale 5 pt/s]
Conclusion
It’s always pleasant to find some low hanging fruit. If you want to play with the batch size yourself, you can do so with the latest commit:
$ for i in $(jot 14 0); do time src/pkg_validate -b$((1 << i)) texlive-\* || break; done
54.06 real 183.56 user 444.20 sys
30.73 real 114.92 user 239.29 sys
18.74 real 72.86 user 123.50 sys
14.27 real 49.49 user 60.66 sys
12.19 real 36.52 user 38.02 sys
11.35 real 31.05 user 25.05 sys
10.69 real 26.42 user 18.47 sys
10.20 real 23.14 user 15.78 sys
9.75 real 22.41 user 13.26 sys
9.64 real 20.63 user 12.17 sys
9.40 real 18.70 user 12.06 sys
9.22 real 18.02 user 11.95 sys
9.17 real 17.70 user 11.26 sys
9.22 real 17.00 user 11.52 sys
Verify texlive packages with batch sizes from 1 to 8192.