<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
          "http://www.w3.org/TR/html4/strict.dtd">
<!-- Material used from: HTML 4.01 specs: http://www.w3.org/TR/html401/ -->
<html>
<head>
  <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
  <title>Polly - GPGPU Code Generation</title>
  <link type="text/css" rel="stylesheet" href="../menu.css">
  <link type="text/css" rel="stylesheet" href="../content.css">
</head>
<body>
<div id="box">
<!--#include virtual="../menu.html.incl"-->
<div id="content">
<!--*********************************************************************-->
<h1>Polly - GPGPU Code Generation</h1>
<!--*********************************************************************-->
<p><em>WARNING: This project was part of the Google Summer of Code 2012.
It is not finished; it is still in the design and implementation stage.
The ideas/plans described here may not yet be implemented in Polly and may
change later on.</em></p>

<p>This project adds a GPGPU code generation feature to Polly.</p>

<h2>Objective</h2>
<p>The overall objective of this GSoC project is to create a preliminary
implementation of GPGPU code generation for Polly. With this addition, users
can parallelize some perfectly nested loops with Polly to execute on a
heterogeneous platform composed of a CPU and a GPU.</p>

<p>There are several successful projects on automatic source-to-source GPU
code transformation. C-to-CUDA[1] uses the standard Pluto algorithms to
compute an affine schedule and then applies a wavefront transformation to
obtain one sequential and n-1 parallel loops. The parallel loops are then
mapped onto the blocks and threads of the GPU. PPCG[2] introduces more
advanced algorithms which can expose much more parallelism than other
methods, together with affine partition heuristics and code generation
algorithms for locality enhancement in registers and shared memory.</p>
<p>Since automatic GPGPU code generation is quite a complex problem, and since
we target a low-level intermediate representation, LLVM IR, rather than a
high-level language source, it is important to set a proper objective as a
first step towards a complete GPGPU code generation solution for LLVM
IR.</p>

<p>Firstly, we plan to target two kinds of relatively simple test cases. The
first consists of purely parallel and perfectly nested loops, like the
following code.</p>
<pre>
parfor(int i=0 to M)
  parfor(int j=0 to N)
    LoopBody(i, j);
</pre>
<p>In the second, all loops are parallel except the innermost one, just like
this:</p>
<pre>
parfor(int i=0 to M)
  parfor(int j=0 to N)
    non-parfor(int k=0 to K)
      LoopBody(i, j, k);
</pre>
<p>The LoopBody part should be limited to instructions or function calls
(intrinsics) which can be handled by LLVM's NVPTX backend.</p>
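<p>To make the intended mapping concrete, here is a minimal CUDA sketch (our
own illustration, not code Polly generates) of how the two patterns above are
commonly mapped: each parallel loop becomes a block/thread dimension, while a
non-parallel innermost loop stays sequential inside the kernel. The bounds
and the LoopBody stub are placeholders.</p>
<pre>
#define M 128
#define N 128
#define K 64

__device__ void LoopBody(int i, int j, int k) { /* loop body */ }

/* Case 1: both loops parallel; one thread per (i, j) iteration. */
__global__ void kernel1(void) {
  int i = blockIdx.y * blockDim.y + threadIdx.y;
  int j = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < M && j < N)
    LoopBody(i, j, 0);
}

/* Case 2: the innermost loop is not parallel, so it remains a
 * sequential loop inside each thread. */
__global__ void kernel2(void) {
  int i = blockIdx.y * blockDim.y + threadIdx.y;
  int j = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < M && j < N)
    for (int k = 0; k < K; k++)
      LoopBody(i, j, k);
}
</pre>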
<p>On the other hand, we focus on building a preliminary and scalable framework
for GPGPU code generation in Polly. We therefore plan to employ relatively
simple tiling and mapping algorithms first and to optimize them later.</p>

<h2>Work Flow</h2>

<h3>GPGPU Code Generation In General</h3>
<p>C-to-CUDA[1] and PPCG[2] propose similar steps to solve the automatic GPGPU
code generation problem:</p>
<ol>
<li>Look for parallel loops.</li>
<li>Create a polyhedral model from the loops.</li>
<li>Tile and map the loops to GPU blocks and threads.</li>
<li>Determine where to place the data.</li>
</ol>
<h3>What has been done in Polly</h3>
<p>Polly has implemented steps 1 and 2 and part of step 3 above, as well as
many other analysis and transformation passes.</p>

<h3>What to do in Polly</h3>
<p>Unlike source-to-source optimizers such as C-to-CUDA and PPCG, Polly is
a low-level optimizer, which means we can't use a source-level compiler
(e.g. NVCC) to generate the final assembly for the device. We need to
manually insert device driver API calls to execute the generated kernel
assembly text.</p>
<p>In this project, we assume that the device driver library provides an
interface to launch kernels in the form of assembly text (a minimal launch
sequence is sketched after the list below). Fortunately, most of the
mainstream GPU vendors provide such a feature in their products (see
ptxjit of NVIDIA GPUs and CAL of AMD GPUs). Generally speaking, what we
are going to do in Polly is:</p>
<ul>
<li>Find a way to tile the parallel loops.</li>
<li>Find a way to extract the loop body and transform it into thread-centric
    parallel code.</li>
<li>Find a way to store/load the thread-centric code into/from a device
    module.</li>
<li>Find a way to pass the target machine information and generate code of
    the device module for the target.</li>
<li>Find a way to map the tiled loop to GPU blocks and threads.</li>
<li>Find a way to insert CUDA synchronization operations on-demand.</li>
<li>Find a way to generate the memory copy operations between a host and a
    device.</li>
<li>Implement/Wrap a runtime library to serve as the execution engine for the
    generated device code.</li>
</ul>
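<p>As a reference for the runtime-library work, the following is a minimal
sketch of such a launch sequence, using the CUDA Driver API in the style of
NVIDIA's ptxjit sample. The helper and kernel names are our placeholders, and
error handling is omitted.</p>
<pre>
#include &lt;cuda.h&gt;

/* Hypothetical helper: JIT-compile a PTX string at run time and launch
 * the kernel named KernelName on an 8x8 grid of 16x16 thread blocks. */
static void launchPTX(const char *PTXString, const char *KernelName,
                      void **KernelParams) {
  CUdevice Device;
  CUcontext Context;
  CUmodule Module;
  CUfunction Kernel;

  cuInit(0);
  cuDeviceGet(&Device, 0);
  cuCtxCreate(&Context, 0, Device);

  /* The driver JIT-compiles the PTX assembly text here. */
  cuModuleLoadData(&Module, PTXString);
  cuModuleGetFunction(&Kernel, Module, KernelName);

  cuLaunchKernel(Kernel, 8, 8, 1,   /* grid dimensions       */
                 16, 16, 1,         /* block dimensions      */
                 0, NULL,           /* shared memory, stream */
                 KernelParams, NULL);
  cuCtxSynchronize();

  cuModuleUnload(Module);
  cuCtxDestroy(Context);
}
</pre>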

<h3>The Work Flow</h3>
<p>In this section, we assume that the host CPU is X86 and the device is NVIDIA
CUDA-compatible. We will use the following test case to describe our work
flow.</p>
<pre>
for(i = 0; i < 128; i++)
  for(j = 0; j < 128; j++)
    A[i][j] = i*128 + j;
</pre>
<p>The work flow of our code generator is as follows.</p>

<p>1. We first use Polly's jscop file importer to obtain the desired 4-level
parallel tiled code.</p>
<p>The "schedule" part of the pre-optimization jscop file is the following:</p>
<pre>
"schedule" : "{ Stmt_for_body3[i0, i1] -> schedule[0, i0, 0, i1, 0] }"
</pre>
<p>The jscop file describing the tiling transformation is:</p>
<pre>
"schedule" : "{ Stmt_for_body3[i0, i1] -> schedule[0, o0, o1, o2, o3]:
                o0 >= 0 and o0 <= 7 and o1 >= 0 and o1 <= 15 and
                o2 >= 0 and o2 <= 7 and o3 >= 0 and o3 <= 15 and
                i0 = 16o0 + o1 and i1 = 16o2 + o3 }"
</pre>
<p>The constraints i0 = 16o0 + o1 and i1 = 16o2 + o3 split each of the two
128-iteration loops into 8 tiles of 16 iterations.</p>
<p>We can test the schedule with the following command line:</p>
<pre>
opt -load /path/to/polly/build/LLVMPolly.so -basicaa -polly-import-jscop \
    -polly-import-jscop-postfix=transformed+gpu -polly-ast -analyze -q \
    ./test.ll
</pre>
<p>The output of this schedule is:</p>
<pre>
for (c2=0;c2<=7;c2++) {
  for (c3=0;c3<=15;c3++) {
    for (c4=0;c4<=7;c4++) {
      for (c5=0;c5<=15;c5++) {
        Stmt_for_body3(16*c2+c3,16*c4+c5);
      }
    }
  }
}
</pre>
<p>We now have a 4-dimensional parallel loop nest with a single SCoP statement
in it.</p>
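<p>Conceptually, the two outer tile loops (c2, c4) will be mapped onto the
block grid and the two inner point loops (c3, c5) onto the threads of each
block. In CUDA terms the kernel would look roughly like this (our
illustration, not Polly's actual output):</p>
<pre>
/* An 8x8 grid of 16x16 thread blocks covers the 128x128 iteration space;
 * launch configuration: grid = dim3(8, 8), block = dim3(16, 16). */
__global__ void kernel(int *A) {
  int i = 16 * blockIdx.y + threadIdx.y;  /* i0 = 16*c2 + c3 */
  int j = 16 * blockIdx.x + threadIdx.x;  /* i1 = 16*c4 + c5 */
  A[i * 128 + j] = i * 128 + j;           /* Stmt_for_body3(i0, i1) */
}
</pre>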
<p>2. We then extract the loop body (or the innermost non-parallel loop) into
an LLVM function and tag it with the PTX_Kernel calling convention.</p>
<p>3. We extract the PTX_Kernel function into a temporary module, set the
target triple (e.g. nvptx64-unknown-linux) for the module, transform the
temporary module into a string, store it in the original module and erase the
PTX_Kernel function.</p>
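<p>A rough C++ sketch of the module-to-string part of this step, written
against a recent LLVM API (details differ in the 2012-era tree) and
simplified to cloning the whole module instead of extracting only the
PTX_Kernel function:</p>
<pre>
#include "llvm/IR/Module.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Transforms/Utils/Cloning.h"

/* Clone the module, retarget the clone at NVPTX, and print it to a
 * string that can be embedded into the original (host) module. */
std::string serializeDeviceModule(const llvm::Module &HostModule) {
  auto DeviceModule = llvm::CloneModule(HostModule);
  DeviceModule->setTargetTriple("nvptx64-unknown-linux");

  std::string DeviceIR;
  llvm::raw_string_ostream OS(DeviceIR);
  DeviceModule->print(OS, /*AnnotationWriter=*/nullptr);
  return OS.str();
}
</pre>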
<p>4. We replace the loops with their GPGPU counterpart. The GPGPU part of the
code consists of a call to the llvm.codegen intrinsic and function calls to
our GPU runtime library.</p>
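<p>In C terms, the host-side code generated in this step behaves roughly like
the sketch below. All polly_* names are hypothetical placeholders for the GPU
runtime library interface, which is still in review (see the TODO list):</p>
<pre>
/* Hypothetical GPU runtime interface; the real names may differ. */
void *polly_allocateDeviceMemory(unsigned long Size);
void polly_copyHostToDevice(const void *Host, void *Dev, unsigned long Size);
void polly_copyDeviceToHost(const void *Dev, void *Host, unsigned long Size);
void polly_launchKernel(const char *PTX, int GridX, int GridY,
                        int BlockX, int BlockY, void **Params);
void polly_freeDeviceMemory(void *Dev);

/* PTX string embedded in the module via the llvm.codegen intrinsic
 * (placeholder name). */
extern const char KernelPTX[];

void host_part(int A[128][128]) {
  void *DeviceA = polly_allocateDeviceMemory(128 * 128 * sizeof(int));
  void *Params[] = { &DeviceA };

  polly_copyHostToDevice(A, DeviceA, 128 * 128 * sizeof(int));
  polly_launchKernel(KernelPTX, 8, 8, 16, 16, Params);
  polly_copyDeviceToHost(DeviceA, A, 128 * 128 * sizeof(int));
  polly_freeDeviceMemory(DeviceA);
}
</pre>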
<p>5. Finally, we generate the executable program with <em>llc</em> or run the
optimized LLVM IR with a JIT compiler like <em>lli</em>.</p>
<h2>Usage</h2>
<p>1. Apply the llvm.codegen intrinsic patch to the LLVM code base.</p>
<pre>cd /path/to/llvm/source
git am /path/to/polly/source/utils/0001-Add-llvm.codegen-intrinsic.patch</pre>
<p>2. Build the test case.</p>
<pre>/path/to/polly/source/test/create_ll.sh test.c</pre>
<p>3. Get and edit the jscop file (take the function "gpu_codegen" as an
example).</p>
<pre>opt -load /path/to/polly/build/lib/LLVMPolly.so -basicaa \
    -polly-export-jscop ./test.ll
cp gpu_codegen___%for.cond---%for.end8.jscop \
   gpu_codegen___%for.cond---%for.end8.jscop.transformed+gpu
vi gpu_codegen___%for.cond---%for.end8.jscop.transformed+gpu</pre>
<p><em>(Please refer to the section "The Work Flow" on how to edit the
"schedule" part of a statement.)</em></p>
<p>4. Optimize the code with GPGPU code generation.</p>
<pre>opt -load /path/to/polly/build/lib/LLVMPolly.so -basicaa \
    -polly-import-jscop-postfix=transformed+gpu -enable-polly-gpgpu \
    -polly-gpgpu-triple=nvptx64-unknown-unknown -polly-codegen ./test.ll -S \
    -o test.gpued.ll</pre>
<p>5. Build the final assembly and executable.</p>
<pre>llc test.gpued.ll -o test.s
gcc test.s -lGPURuntime -o test</pre>
<p><em>(Please make sure that LD_LIBRARY_PATH is set properly so that
/path/to/polly/build/lib/libGPURuntime.so is visible to gcc.)</em></p>
<h2>TODO List</h2>

<table class="wikitable" cellpadding="2">
<tbody>
<tr style="background: rgb(239, 239, 239)">
  <th width="400px"> Tasks </th>
  <th width="150px"> Status </th>
  <th> Owner </th>
</tr>
<tr>
  <th align="left">Tiling the Parallel Loops with An External Jscop File</th>
  <td align="center" class='open'>Open, In Design</td>
  <td>Yabin Hu</td>
</tr>
<tr>
  <th align="left">GPU Runtime Library Implementation</th>
  <td align="center" class='inprogress'>Coding Finished, In Review</td>
  <td></td>
</tr>
<tr>
  <th align="left">llvm.codegen Intrinsic Implementation</th>
  <td align="center" class='inprogress'>Coding Finished, To Be Reviewed</td>
  <td></td>
</tr>
<tr>
  <th align="left">Code Generation For Host</th>
  <td align="center" class='inprogress'>50% Done</td>
  <td></td>
</tr>
</tbody>
</table>

<h2>References</h2>
<ol>
<li>
<em>Automatic C-to-CUDA Code Generation for Affine Programs.</em><br />
Muthu Manikandan Baskaran, J. Ramanujam and P. Sadayappan.<br />
International Conference on Compiler Construction (CC) 2010.
</li>
<li>
<em>PPCG Project.</em><br />
<a href="http://freecode.com/projects/ppcg">http://freecode.com/projects/ppcg</a>
</li>
<li>
<em>Where is the Data? Why You Cannot Debate GPU vs. CPU Performance Without
the Answer.</em><br />
Chris Gregg and Kim Hazelwood.<br />
International Symposium on Performance Analysis of Systems and Software
(ISPASS) 2011.
</li>
</ol>
</div>
</div>
</body>
</html>