==================================
Benchmarking tips
==================================

Introduction
============

When benchmarking a patch, we want to reduce every source of noise as
far as possible. How to do that is very OS dependent.

Note that low noise is necessary, but not sufficient: it does not
exclude measurement bias. See
https://www.cis.upenn.edu/~cis501/papers/producing-wrong-data.pdf for
an example.

General
=======

* Use a high-resolution timer, e.g. perf on Linux.

* Run the benchmark multiple times to be able to recognize noise.

* Disable as many processes and services as possible on the target system.

* Disable frequency scaling, turbo boost and address space
  randomization (see the OS-specific sections).

* Link statically if the OS supports it. That avoids any variation that
  might be introduced by loading dynamic libraries. This can be done
  by passing ``-DLLVM_BUILD_STATIC=ON`` to cmake.

* Try to avoid storage. On some systems you can use tmpfs. Putting the
  program, inputs and outputs on tmpfs avoids touching a real storage
  system, whose latency can vary considerably.

  To mount it (on Linux and FreeBSD at least)::

    mount -t tmpfs -o size=<XX>g none dir_to_mount

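
Running the benchmark several times only helps if the spread of the
results is actually examined. As a rough sketch, not part of any LLVM
tooling, the hypothetical helper ``summarize_runs`` below reads one
measurement per line from standard input and prints the run count, the
mean and the sample standard deviation. It assumes a POSIX shell with
awk and numeric input in one consistent unit.

.. code-block:: bash

    # summarize_runs: hypothetical helper, reads one number per line on stdin.
    # Prints "n=<count> mean=<mean> stddev=<sample standard deviation>".
    summarize_runs() {
      awk '
        # Welford update: keeps a running mean and sum of squared deviations.
        NF { n++; d = $1 - mean; mean += d / n; m2 += d * ($1 - mean) }
        END {
          if (n == 0) { print "n=0"; exit }
          # Sample variance; taken as 0 when there is only one run.
          var = (n > 1) ? m2 / (n - 1) : 0
          printf "n=%d mean=%.3f stddev=%.3f\n", n, mean, sqrt(var)
        }'
    }

For example, if ``times.txt`` (a hypothetical file) holds the three values
1, 2 and 3, one per line, ``summarize_runs < times.txt`` prints
``n=3 mean=2.000 stddev=1.000``. A standard deviation that is large
relative to the difference being measured suggests the setup is still
too noisy.
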
Linux
=====

* Disable address space randomization::

    echo 0 > /proc/sys/kernel/randomize_va_space

* Set scaling_governor to performance::

    for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
    do
      echo performance > "$i"
    done

* Use https://github.com/lpechacek/cpuset to reserve cpus for just the
  program you are benchmarking. If using perf, reserve at least 2 cores
  so that perf runs on one and your program on another::

    cset shield -c N1,N2 -k on

  This will move all threads out of N1 and N2. The ``-k on`` means
  that even kernel threads are moved out.

* Disable the SMT pair of the cpus you will use for the benchmark. The
  pair of cpu N can be found in
  ``/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`` and
  disabled with (where X is the number of the sibling cpu)::

    echo 0 > /sys/devices/system/cpu/cpuX/online

* Run the program with::

    cset shield --exec -- perf stat -r 10 <cmd>

  This will run the command after ``--`` on the isolated cpus. The
  particular perf command runs ``<cmd>`` 10 times and reports
  statistics.

With these in place you can expect perf variations of less than 0.1%.

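
Looking up the SMT pair of a cpu, as described in the list above, can be
scripted. The following is a minimal sketch under stated assumptions:
``smt_siblings`` is a hypothetical helper name, and it assumes that
``thread_siblings_list`` holds a comma-separated list of cpu numbers
or ranges such as ``0,4`` or ``2-3``. The optional second argument
replaces the sysfs cpu directory and exists only to make the helper
testable.

.. code-block:: bash

    # smt_siblings: hypothetical helper. Prints the SMT siblings of cpu $1
    # (excluding $1 itself), one per line. $2 optionally overrides the
    # sysfs cpu directory. The body runs in a subshell so that its
    # variables do not leak into the caller.
    smt_siblings() (
      cpu=$1
      base=${2:-/sys/devices/system/cpu}
      # Split the list on commas, then expand each "first-last" range.
      tr ',' '\n' < "$base/cpu$cpu/topology/thread_siblings_list" |
      while IFS=- read -r first last || [ -n "$first" ]; do
        [ -n "$first" ] || continue
        i=$first
        while [ "$i" -le "${last:-$first}" ]; do
          [ "$i" -eq "$cpu" ] || echo "$i"
          i=$((i + 1))
        done
      done
    )

Each number printed by ``smt_siblings N`` is a candidate X for the
``echo 0 > /sys/devices/system/cpu/cpuX/online`` step. The helper only
reads sysfs, so it does not need root privileges.
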
Linux Intel
-----------

* Disable turbo mode::

    echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo

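
Before a benchmarking session it can be useful to confirm that the
settings from the Linux sections are still in effect, since they do not
survive a reboot. The sketch below is an illustration only:
``check_noise_settings`` is a hypothetical, read-only helper that
compares the address space randomization, scaling governor and turbo
files against the values used on this page. Its optional argument
replaces the filesystem root and exists only to make the helper
testable.

.. code-block:: bash

    # check_noise_settings: hypothetical read-only helper. Exits with
    # status 1 if any readable setting differs from the expected value.
    # $1 optionally overrides the filesystem root (default: the real one).
    check_noise_settings() (
      root=${1:-}
      status=0

      # report FILE WANTED: compare the first line of FILE with WANTED.
      report() {
        if [ ! -r "$1" ]; then
          echo "skip $1 (not readable)"
          return 0
        fi
        read -r value < "$1" || true
        if [ "$value" = "$2" ]; then
          echo "ok   $1"
        else
          echo "BAD  $1: found '$value', expected '$2'"
          status=1
        fi
      }

      report "$root/proc/sys/kernel/randomize_va_space" 0
      for g in "$root"/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
        report "$g" performance
      done
      # Only present when the intel_pstate driver is in use.
      report "$root/sys/devices/system/cpu/intel_pstate/no_turbo" 1
      exit "$status"
    )

Files that do not exist on a given machine are reported as skipped
rather than treated as failures, so the helper can also be run on
systems without ``intel_pstate``.
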