CbC/CbC_llvm: polly/www/performance.html annotate

annotate polly/www/performance.html @ 240:ca573705d418

merge

author	matac
date	Fri, 28 Jul 2023 20:50:09 +0900
parents	c4bab56944e8
children

rev	line source
150 1d019706d866 LLVM10 anatofuz parents: diff changeset	1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
1d019706d866 LLVM10 anatofuz parents: diff changeset	2 "http://www.w3.org/TR/html4/strict.dtd">
1d019706d866 LLVM10 anatofuz parents: diff changeset	3 <!-- Material used from: HTML 4.01 specs: http://www.w3.org/TR/html401/ -->
1d019706d866 LLVM10 anatofuz parents: diff changeset	4 <html>
1d019706d866 LLVM10 anatofuz parents: diff changeset	5 <head> <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
1d019706d866 LLVM10 anatofuz parents: diff changeset	6 <title>Polly - Performance</title>
1d019706d866 LLVM10 anatofuz parents: diff changeset	7 <link type="text/css" rel="stylesheet" href="menu.css">
1d019706d866 LLVM10 anatofuz parents: diff changeset	8 <link type="text/css" rel="stylesheet" href="content.css">
1d019706d866 LLVM10 anatofuz parents: diff changeset	9 </head>
1d019706d866 LLVM10 anatofuz parents: diff changeset	10 <body>
1d019706d866 LLVM10 anatofuz parents: diff changeset	11 <div id="box">
1d019706d866 LLVM10 anatofuz parents: diff changeset	12 <!--#include virtual="menu.html.incl"-->
1d019706d866 LLVM10 anatofuz parents: diff changeset	13 <div id="content">
1d019706d866 LLVM10 anatofuz parents: diff changeset	14 <h1>Performance</h1>
1d019706d866 LLVM10 anatofuz parents: diff changeset	15
1d019706d866 LLVM10 anatofuz parents: diff changeset	16 <p>To evaluate the performance benefits Polly currently provides we compiled the
1d019706d866 LLVM10 anatofuz parents: diff changeset	17 <a href="https://web.cse.ohio-state.edu/~pouchet.2/software/polybench/">Polybench
1d019706d866 LLVM10 anatofuz parents: diff changeset	18 2.0</a> benchmark suite. Each benchmark was run with double precision floating
1d019706d866 LLVM10 anatofuz parents: diff changeset	19 point values on an Intel Xeon X5670 CPU @ 2.93GHz (12 cores, 24 threads)
1d019706d866 LLVM10 anatofuz parents: diff changeset	20 system. We used <a href="https://sourceforge.net/projects/pocc/files/">PoCC</a> and the included <a
1d019706d866 LLVM10 anatofuz parents: diff changeset	21 href="http://pluto-compiler.sf.net">Pluto</a> transformations to optimize the
1d019706d866 LLVM10 anatofuz parents: diff changeset	22 code. The source code of Polly and LLVM/clang was checked out on
1d019706d866 LLVM10 anatofuz parents: diff changeset	23 25/03/2011.</p>
1d019706d866 LLVM10 anatofuz parents: diff changeset	24
1d019706d866 LLVM10 anatofuz parents: diff changeset	25 <p>The results shown were created fully automatically without manual
1d019706d866 LLVM10 anatofuz parents: diff changeset	26 interaction. We have not yet spent any time tuning the results. Hence
236 c4bab56944e8 LLVM 16 kono parents: 150 diff changeset	27 further improvements may be achieved by tuning the code generated by Polly, the
150 1d019706d866 LLVM10 anatofuz parents: diff changeset	28 heuristics used by Pluto or by investigating if more code could be optimized.
1d019706d866 LLVM10 anatofuz parents: diff changeset	29 As Pluto was never used at such a low level, its heuristics are probably
1d019706d866 LLVM10 anatofuz parents: diff changeset	30 far from perfect. Another area where we expect larger performance improvements
1d019706d866 LLVM10 anatofuz parents: diff changeset	31 is SIMD vector code generation. At the moment, it rarely yields
1d019706d866 LLVM10 anatofuz parents: diff changeset	32 performance improvements, as we have not yet included vectorization in our
1d019706d866 LLVM10 anatofuz parents: diff changeset	33 heuristics. By changing this we should be able to significantly increase the
1d019706d866 LLVM10 anatofuz parents: diff changeset	34 number of test cases that show improvements.</p>
1d019706d866 LLVM10 anatofuz parents: diff changeset	35
1d019706d866 LLVM10 anatofuz parents: diff changeset	36 <p>The Polybench test suite contains computation kernels from linear algebra
1d019706d866 LLVM10 anatofuz parents: diff changeset	37 routines, stencil computations, image processing and data mining. Polly
236 c4bab56944e8 LLVM 16 kono parents: 150 diff changeset	38 recognizes the majority of them and is able to show good speedup. However,
150 1d019706d866 LLVM10 anatofuz parents: diff changeset	39 to show similar speedup on larger examples like the SPEC CPU benchmarks, Polly
1d019706d866 LLVM10 anatofuz parents: diff changeset	40 still lacks support for integer casts, variable-sized multi-dimensional arrays
236 c4bab56944e8 LLVM 16 kono parents: 150 diff changeset	41 and probably several other constructs. This support is necessary as such
150 1d019706d866 LLVM10 anatofuz parents: diff changeset	42 constructs appear in larger programs, but not in our limited test suite.</p>
1d019706d866 LLVM10 anatofuz parents: diff changeset	43
1d019706d866 LLVM10 anatofuz parents: diff changeset	44 <h2> Sequential runs</h2>
1d019706d866 LLVM10 anatofuz parents: diff changeset	45
1d019706d866 LLVM10 anatofuz parents: diff changeset	46 For the sequential runs we used Polly to create a program structure that is
1d019706d866 LLVM10 anatofuz parents: diff changeset	47 optimized for data-locality. One of the major optimizations performed is tiling.
1d019706d866 LLVM10 anatofuz parents: diff changeset	48 The speedups shown are obtained without any multi-core parallelism. No
1d019706d866 LLVM10 anatofuz parents: diff changeset	49 additional hardware is used; the single core in use is simply used more
1d019706d866 LLVM10 anatofuz parents: diff changeset	50 efficiently.
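<p>As a rough illustration of what tiling does, the sketch below applies it by
hand to a matrix multiplication in C. The kernel, the function names
<code>matmul_naive</code> and <code>matmul_tiled</code>, and the
<code>tile</code> parameter are assumptions chosen for this example. They are
not taken from Polybench or from the code Polly generates, and Polly performs
the transformation automatically on LLVM IR rather than on C source.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Reference kernel (hypothetical example): C += A * B for n x n
 * row-major matrices, visiting the iteration space row by row. */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            for (size_t k = 0; k < n; k++)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
}

/* Smaller of two sizes; clamps a tile at the matrix boundary. */
static size_t min_size(size_t x, size_t y)
{
    return x < y ? x : y;
}

/* Tiled variant: the same iterations are executed, but grouped into
 * tile x tile x tile blocks so that each block's data can be reused
 * while it is still in cache. The tile size is an illustrative
 * parameter, not a value used by Polly or Pluto. */
void matmul_tiled(size_t n, size_t tile,
                  const double *a, const double *b, double *c)
{
    assert(tile > 0);
    for (size_t ii = 0; ii < n; ii += tile)
        for (size_t jj = 0; jj < n; jj += tile)
            for (size_t kk = 0; kk < n; kk += tile)
                for (size_t i = ii; i < min_size(ii + tile, n); i++)
                    for (size_t j = jj; j < min_size(jj + tile, n); j++)
                        for (size_t k = kk; k < min_size(kk + tile, n); k++)
                            c[i * n + j] += a[i * n + k] * b[k * n + j];
}
```

<p>Both functions compute the same result. Which tile size works well depends
on the cache sizes of the target machine and is not specified here.</p>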
1d019706d866 LLVM10 anatofuz parents: diff changeset	51 <h3> Small data size</h3>
1d019706d866 LLVM10 anatofuz parents: diff changeset	52 <img src="images/performance/sequential-small.png" /><br />
1d019706d866 LLVM10 anatofuz parents: diff changeset	53 <h3> Large data size</h3>
1d019706d866 LLVM10 anatofuz parents: diff changeset	54 <img src="images/performance/sequential-large.png" />
1d019706d866 LLVM10 anatofuz parents: diff changeset	55 <h2> Parallel runs</h2>
1d019706d866 LLVM10 anatofuz parents: diff changeset	56 For the parallel runs we used Polly to expose parallelism and to add calls to an
1d019706d866 LLVM10 anatofuz parents: diff changeset	57 OpenMP runtime library. With OpenMP we can use all 12 hardware cores
1d019706d866 LLVM10 anatofuz parents: diff changeset	58 instead of the single core that was used before. We can see that in several
1d019706d866 LLVM10 anatofuz parents: diff changeset	59 cases we obtain more than linear speedup. This additional speedup is due to
1d019706d866 LLVM10 anatofuz parents: diff changeset	60 improved data-locality.
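<p>The effect of the parallel runs can be approximated at the source level
with an OpenMP pragma. The following C sketch is only an analogy: Polly adds
the calls to the OpenMP runtime library itself, and the matrix-vector kernel
and the name <code>matvec_parallel</code> are hypothetical choices made for
this example.</p>

```c
#include <assert.h>

/* Hypothetical example kernel: y = A * x for an n x n row-major matrix.
 * Each row of the result is independent of the others, so the outer
 * loop can be distributed across threads. */
void matvec_parallel(long n, const double *a, const double *x, double *y)
{
    assert(n >= 0);
    /* Source-level stand-in for the runtime calls Polly would insert. */
    #pragma omp parallel for
    for (long i = 0; i < n; i++) {
        double sum = 0.0; /* private to each iteration, so no data race */
        for (long j = 0; j < n; j++)
            sum += a[i * n + j] * x[j];
        y[i] = sum;
    }
}
```

<p>When built with OpenMP enabled (for example with <code>-fopenmp</code> in
GCC or Clang), the rows are distributed over the available threads. Without
OpenMP the pragma is ignored and the loop runs sequentially with the same
result.</p>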
1d019706d866 LLVM10 anatofuz parents: diff changeset	61 <h3> Small data size</h3>
1d019706d866 LLVM10 anatofuz parents: diff changeset	62 <img src="images/performance/parallel-small.png" /><br />
1d019706d866 LLVM10 anatofuz parents: diff changeset	63 <h3> Large data size</h3>
1d019706d866 LLVM10 anatofuz parents: diff changeset	64 <img src="images/performance/parallel-large.png" />
1d019706d866 LLVM10 anatofuz parents: diff changeset	65 </div>
1d019706d866 LLVM10 anatofuz parents: diff changeset	66 </div>
1d019706d866 LLVM10 anatofuz parents: diff changeset	67 </body>
1d019706d866 LLVM10 anatofuz parents: diff changeset	68 </html>
