..
   docs/Benchmarking.rst @ 121:803732b1fca8, compared against 120:1172e4bd9c6f
   LLVM 5.0
   author: kono
   date: Fri, 27 Oct 2017 17:07:41 +0900

==================================
Benchmarking tips
==================================


Introduction
============

When benchmarking a patch, we want to reduce every source of noise as
much as possible. How to do that depends heavily on the OS.

Note that low noise is necessary but not sufficient: it does not rule
out measurement bias. See
https://www.cis.upenn.edu/~cis501/papers/producing-wrong-data.pdf for
an example.

General
=======

* Use a high resolution timer, e.g. ``perf`` on Linux.

* Run the benchmark multiple times to be able to recognize noise.

* Disable as many processes or services as possible on the target system.

* Disable frequency scaling, turbo boost and address space
  randomization (see the OS specific sections).

* Link statically if the OS supports it. That avoids any variation that
  might be introduced by loading dynamic libraries. This can be done
  by passing ``-DLLVM_BUILD_STATIC=ON`` to cmake.

* Try to avoid storage. On some systems you can use tmpfs. Putting the
  program, inputs and outputs on tmpfs avoids touching a real storage
  system, whose timing can vary considerably.

  To mount it (on Linux and FreeBSD at least)::

    mount -t tmpfs -o size=<XX>g none dir_to_mount

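To make the "run multiple times" advice concrete, the following is a
minimal sketch, not part of the original tips, of how the spread across
runs could be summarized. The function name ``summarize`` is
hypothetical; it assumes one measurement per line on standard input and
uses only POSIX ``awk``.

.. code-block:: shell

  # Hypothetical helper: print the count, mean and sample standard
  # deviation of the numbers read from stdin (one per line).
  summarize() {
    awk '{ n++; s += $1; ss += $1 * $1 }
         END {
           if (n == 0) { print "no samples"; exit 1 }
           mean = s / n
           # Sample variance; clamp tiny negative rounding errors to 0.
           var = (n > 1) ? (ss - n * mean * mean) / (n - 1) : 0
           if (var < 0) var = 0
           printf "n=%d mean=%.3f stddev=%.3f\n", n, mean, sqrt(var)
         }'
  }

  # Example: three made-up timings, in seconds.
  printf '1\n2\n3\n' | summarize

A standard deviation that is large relative to the difference you are
trying to measure suggests the setup is still too noisy.
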
Linux
=====

* Disable address space randomization::

    echo 0 > /proc/sys/kernel/randomize_va_space

* Set scaling_governor to performance::

    for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
    do
      echo performance > "$i"
    done

* Use https://github.com/lpechacek/cpuset to reserve cpus for just the
  program you are benchmarking. If using perf, reserve at least 2 cores
  so that perf runs on one and your program on another::

    cset shield -c N1,N2 -k on

  This will move all threads out of N1 and N2. The ``-k on`` means
  that even kernel threads are moved out.

* Disable the SMT sibling of each cpu you will use for the benchmark.
  The siblings of cpu N are listed in
  ``/sys/devices/system/cpu/cpuN/topology/thread_siblings_list``, and a
  sibling X can be disabled with::

    echo 0 > /sys/devices/system/cpu/cpuX/online

* Run the program with::

    cset shield --exec -- perf stat -r 10 <cmd>

  This will run the command after ``--`` in the isolated cpus. The
  particular perf command runs the ``<cmd>`` 10 times and reports
  statistics.

With these in place you can expect perf variations of less than 0.1%.

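Before a benchmarking session it can help to confirm that the settings
above actually took effect. The following is a read-only sketch, not
part of the original tips: the function name ``check_noise_settings``
and its optional root-prefix argument are hypothetical, and it only
reads the ``/proc`` and ``/sys`` files already mentioned in this
section.

.. code-block:: shell

  # Hypothetical helper: report ASLR, governor and SMT sibling settings
  # without changing anything. The optional argument is a root prefix,
  # intended only for pointing the function at a test directory tree.
  check_noise_settings() {
    root="${1:-}"
    aslr="$root/proc/sys/kernel/randomize_va_space"
    if [ -r "$aslr" ]; then
      echo "randomize_va_space: $(cat "$aslr")"
    fi
    for gov in "$root"/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
      [ -r "$gov" ] || continue   # skip cpus without cpufreq support
      cpu="${gov%/cpufreq/scaling_governor}"
      echo "${cpu##*/} governor: $(cat "$gov")"
    done
    for sib in "$root"/sys/devices/system/cpu/cpu*/topology/thread_siblings_list; do
      [ -r "$sib" ] || continue   # skip entries without topology info
      cpu="${sib%/topology/thread_siblings_list}"
      echo "${cpu##*/} siblings: $(cat "$sib")"
    done
  }

  check_noise_settings

For a low-noise setup you would expect ``randomize_va_space: 0``, a
``performance`` governor on every cpu, and no online sibling sharing a
core with the cpus reserved for the benchmark.
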
Linux Intel
-----------

* Disable turbo mode::

    echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo

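
The ``no_turbo`` file only exists when the ``intel_pstate`` driver is in
use, so it can be worth checking its state rather than assuming the
write succeeded. The following is a read-only sketch, not part of the
original tips: the function name ``turbo_state`` and its optional
root-prefix argument are hypothetical.

.. code-block:: shell

  # Hypothetical helper: report whether turbo mode is disabled through
  # intel_pstate, without changing it. The optional argument is a root
  # prefix, intended only for testing against a fake directory tree.
  turbo_state() {
    f="${1:-}/sys/devices/system/cpu/intel_pstate/no_turbo"
    if [ ! -r "$f" ]; then
      echo "intel_pstate not available"
    elif [ "$(cat "$f")" = "1" ]; then
      echo "turbo disabled"
    else
      echo "turbo enabled"
    fi
  }

  turbo_state
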