comparison docs/Vectorizers.rst @ 148:63bd29f05246
merged
author | Shinji KONO <kono@ie.u-ryukyu.ac.jp> |
---|---|
date | Wed, 14 Aug 2019 19:46:37 +0900 |
parents | c2174574ed3a |
children |
comparison
146:3fc4d5c3e21e | 148:63bd29f05246 |
---|---|
309 } | 309 } |
310 | 310 |
311 Vectorization of function calls | 311 Vectorization of function calls |
312 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | 312 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
313 | 313 |
314 The Loop Vectorize can vectorize intrinsic math functions. | 314 The Loop Vectorizer can vectorize intrinsic math functions. |
315 See the table below for a list of these functions. | 315 See the table below for a list of these functions. |
316 | 316 |
317 +-----+-----+---------+ | 317 +-----+-----+---------+ |
318 | pow | exp | exp2 | | 318 | pow | exp | exp2 | |
319 +-----+-----+---------+ | 319 +-----+-----+---------+ |
325 +-----+-----+---------+ | 325 +-----+-----+---------+ |
326 |fma |trunc|nearbyint| | 326 |fma |trunc|nearbyint| |
327 +-----+-----+---------+ | 327 +-----+-----+---------+ |
328 | | | fmuladd | | 328 | | | fmuladd | |
329 +-----+-----+---------+ | 329 +-----+-----+---------+ |
330 | |
331 Note that the optimizer may not be able to vectorize math library functions | |
332 that correspond to these intrinsics if the library calls access external state | |
333 such as "errno". To allow better optimization of C/C++ math library functions, | |
334 use "-fno-math-errno". | |
330 | 335 |
331 The loop vectorizer knows about special instructions on the target and will | 336 The loop vectorizer knows about special instructions on the target and will |
332 vectorize a loop containing a function call that maps to the instructions. For | 337 vectorize a loop containing a function call that maps to the instructions. For |
333 example, the loop below will be vectorized on Intel x86 if the SSE4.1 roundps | 338 example, the loop below will be vectorized on Intel x86 if the SSE4.1 roundps |
334 instruction is available. | 339 instruction is available. |
367 | 372 |
368 Performance | 373 Performance |
369 ----------- | 374 ----------- |
370 | 375 |
371 This section shows the execution time of Clang on a simple benchmark: | 376 This section shows the execution time of Clang on a simple benchmark: |
372 `gcc-loops <http://llvm.org/viewvc/llvm-project/test-suite/trunk/SingleSource/UnitTests/Vectorizer/>`_. | 377 `gcc-loops <https://github.com/llvm/llvm-test-suite/tree/master/SingleSource/UnitTests/Vectorizer>`_. |
373 This benchmarks is a collection of loops from the GCC autovectorization | 378 This benchmarks is a collection of loops from the GCC autovectorization |
374 `page <http://gcc.gnu.org/projects/tree-ssa/vectorization.html>`_ by Dorit Nuzman. | 379 `page <http://gcc.gnu.org/projects/tree-ssa/vectorization.html>`_ by Dorit Nuzman. |
375 | 380 |
376 The chart below compares GCC-4.7, ICC-13, and Clang-SVN with and without loop vectorization at -O3, tuned for "corei7-avx", running on a Sandybridge iMac. | 381 The chart below compares GCC-4.7, ICC-13, and Clang-SVN with and without loop vectorization at -O3, tuned for "corei7-avx", running on a Sandybridge iMac. |
377 The Y-axis shows the time in msec. Lower is better. The last column shows the geomean of all the kernels. | 382 The Y-axis shows the time in msec. Lower is better. The last column shows the geomean of all the kernels. |