annotate polly/www/performance.html @ 240:ca573705d418

merge
author matac
date Fri, 28 Jul 2023 20:50:09 +0900
parents c4bab56944e8
children
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<!-- Material used from: HTML 4.01 specs: http://www.w3.org/TR/html401/ -->
<html>
<head> <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>Polly - Performance</title>
<link type="text/css" rel="stylesheet" href="menu.css">
<link type="text/css" rel="stylesheet" href="content.css">
</head>
<body>
<div id="box">
<!--#include virtual="menu.html.incl"-->
<div id="content">
<h1>Performance</h1>

<p>To evaluate the performance benefits Polly currently provides, we compiled the
<a href="https://web.cse.ohio-state.edu/~pouchet.2/software/polybench/">Polybench
2.0</a> benchmark suite. Each benchmark was run with double-precision
floating-point values on an Intel Xeon X5670 CPU @ 2.93GHz (12 cores, 24 threads)
system. We used <a href="https://sourceforge.net/projects/pocc/files/">PoCC</a> and the included <a
href="http://pluto-compiler.sf.net">Pluto</a> transformations to optimize the
code. The source code of Polly and LLVM/clang was checked out on
25/03/2011.</p>

<p>The results shown were produced fully automatically, without manual
interaction. We have not yet spent any time tuning the results. Hence,
further improvements may be achieved by tuning the code generated by Polly, by
tuning the heuristics used by Pluto, or by investigating whether more code could
be optimized. As Pluto was never used at such a low level, its heuristics are
probably far from perfect. Another area where we expect larger performance
improvements is SIMD vector code generation. At the moment, it rarely yields
performance improvements, as we have not yet included vectorization in our
heuristics. By changing this we should be able to significantly increase the
number of test cases that show improvements.</p>

<p>The Polybench test suite contains computation kernels from linear algebra
routines, stencil computations, image processing and data mining. Polly
recognizes the majority of them and is able to show good speedups. However,
to show similar speedups on larger examples like the SPEC CPU benchmarks, Polly
still lacks support for integer casts, variable-sized multi-dimensional arrays
and probably several other constructs. This support is necessary because such
constructs appear in larger programs, but not in our limited test suite.</p>

<h2> Sequential runs</h2>

For the sequential runs we used Polly to create a program structure that is
optimized for data locality. One of the major optimizations performed is tiling.
The speedups shown do not use any multi-core parallelism. No
additional hardware is used; the single available core is simply used more
efficiently.
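<p>As a rough illustration of what tiling means, the sketch below shows a
hand-tiled matrix multiplication next to the untiled loop nest. It is not the
code Polly generates: the function names <code>matmul_naive</code> and
<code>matmul_tiled</code> and the <code>tile</code> parameter are hypothetical
and chosen only for this example. The tiled version visits the same iterations,
but in blocks small enough that the data touched by one block is more likely to
stay in cache.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Reference kernel: C += A * B for n x n row-major matrices. */
void matmul_naive(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            for (size_t k = 0; k < n; k++)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
}

/* Same computation with all three loops tiled by hand.
 * "tile" is a hypothetical tuning parameter, not a Polly setting. */
void matmul_tiled(size_t n, size_t tile,
                  const double *A, const double *B, double *C)
{
    assert(tile > 0);
    for (size_t ii = 0; ii < n; ii += tile)
        for (size_t jj = 0; jj < n; jj += tile)
            for (size_t kk = 0; kk < n; kk += tile) {
                /* Clamp the tile bounds so that n need not be a
                 * multiple of the tile size. */
                size_t i_end = ii + tile < n ? ii + tile : n;
                size_t j_end = jj + tile < n ? jj + tile : n;
                size_t k_end = kk + tile < n ? kk + tile : n;
                for (size_t i = ii; i < i_end; i++)
                    for (size_t j = jj; j < j_end; j++)
                        for (size_t k = kk; k < k_end; k++)
                            C[i * n + j] += A[i * n + k] * B[k * n + j];
            }
}
```

<p>Both functions add the products for each element of <code>C</code> in the
same order of <code>k</code>, so they compute the same result. Only the order
in which memory is traversed differs.</p>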
<h3> Small data size</h3>
<img src="images/performance/sequential-small.png" /><br />
<h3> Large data size</h3>
<img src="images/performance/sequential-large.png" />
<h2> Parallel runs</h2>
For the parallel runs we used Polly to expose parallelism and to add calls to an
OpenMP runtime library. With OpenMP we can use all 12 hardware cores
instead of the single core that was used before. In several
cases we obtain more than linear speedup. This additional speedup is due to
improved data locality.
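<p>A source-level analogue of this transformation is sketched below. It is an
assumption-laden illustration only: Polly works on LLVM IR and inserts the
runtime calls itself, whereas this example uses an OpenMP pragma in C, and the
function name <code>matvec_parallel</code> is hypothetical. The point is that
the iterations of the outer loop are independent, so they can be distributed
across the available cores.</p>

```c
#include <assert.h>

/* y = A * x for an n x n row-major matrix.
 * Each row is computed independently, so the outer loop carries no
 * dependence and can be split across threads. */
void matvec_parallel(long n, const double *A, const double *x, double *y)
{
    assert(n >= 0);
    /* Without OpenMP support enabled, the pragma is ignored and the
     * loop simply runs sequentially with the same result. */
    #pragma omp parallel for
    for (long i = 0; i < n; i++) {
        double sum = 0.0;   /* private to each iteration */
        for (long j = 0; j < n; j++)
            sum += A[i * n + j] * x[j];
        y[i] = sum;
    }
}
```

<p>With GCC or Clang, building with <code>-fopenmp</code> enables the pragma
and links an OpenMP runtime library.</p>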
<h3> Small data size</h3>
<img src="images/performance/parallel-small.png" /><br />
<h3> Large data size</h3>
<img src="images/performance/parallel-large.png" />
</div>
</div>
</body>
</html>