//===- DivergenceAnalysis.cpp --------- Divergence Analysis Implementation -==//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This file implements a general divergence analysis for loop vectorization
// and GPU programs. It determines which branches and values in a loop or GPU
// program are divergent. It can help branch optimizations such as jump
// threading and loop unswitching to make better decisions.
//
// GPU programs typically use the SIMD execution model, where multiple threads
// in the same execution group have to execute in lock-step. Therefore, if the
// code contains divergent branches (i.e., threads in a group do not agree on
// which path of the branch to take), the group of threads has to execute all
// the paths from that branch with different subsets of threads enabled until
// they re-converge.
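//
// For example, in CUDA-like pseudocode (an illustrative sketch; threadIdx.x
// is the CUDA thread index within a block):
//
//   if (threadIdx.x < 10) { // divergent: threads disagree on the condition
//     A();                  // executed with only threads 0..9 enabled
//   } else {
//     B();                  // executed with the remaining threads enabled
//   }
//   C();                    // all threads re-converge here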
//
// Due to this execution model, some optimizations such as jump
// threading and loop unswitching can interfere with thread re-convergence.
// Therefore, an analysis that computes which branches in a GPU program are
// divergent can help the compiler to selectively run these optimizations.
//
// This implementation is derived from the Vectorization Analysis of the
// Region Vectorizer (RV). That implementation in turn is based on the approach
// described in
//
//   Improving Performance of OpenCL on CPUs
//   Ralf Karrenberg and Sebastian Hack
//   CC '12
//
// This DivergenceAnalysis implementation is generic in the sense that it does
// not itself identify original sources of divergence. Instead, specialized
// adapter classes (LoopDivergenceAnalysis for loops and GPUDivergenceAnalysis
// for GPU programs) identify the sources of divergence (e.g., special
// variables that hold the thread ID or the iteration variable).
//
// The generic implementation propagates divergence to variables that are data
// or sync dependent on a source of divergence.
//
// While data dependency is a well-known concept, the notion of sync dependency
// is worth more explanation. Sync dependence characterizes the control flow
// aspect of the propagation of branch divergence. For example,
//
//   %cond = icmp slt i32 %tid, 10
//   br i1 %cond, label %then, label %else
// then:
//   br label %merge
// else:
//   br label %merge
// merge:
//   %a = phi i32 [ 0, %then ], [ 1, %else ]
//
// Suppose %tid holds the thread ID. Although %a is not data dependent on %tid
// because %tid is not on its use-def chains, %a is sync dependent on %tid
// because the branch "br i1 %cond" depends on %tid and affects which value %a
// is assigned to.
//
// The sync dependence detection (which branch induces divergence in which join
// points) is implemented in the SyncDependenceAnalysis.
//
// The current DivergenceAnalysis implementation has the following limitations:
// 1. It is intra-procedural. It conservatively considers the arguments of a
//    non-kernel-entry function and the return value of a function call as
//    divergent.
// 2. It treats memory as a black box. It conservatively considers values
//    loaded from generic or local address spaces as divergent. This can be
//    improved by leveraging pointer analysis and/or by modelling non-escaping
//    memory objects in SSA as done in RV.
//
//===----------------------------------------------------------------------===//

#include "llvm/Analysis/DivergenceAnalysis.h"
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/Passes.h"
#include "llvm/Analysis/PostDominators.h"
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/Dominators.h"
#include "llvm/IR/InstIterator.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Value.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"
#include <vector>

using namespace llvm;

#define DEBUG_TYPE "divergence-analysis"

// class DivergenceAnalysis
DivergenceAnalysis::DivergenceAnalysis(
    const Function &F, const Loop *RegionLoop, const DominatorTree &DT,
    const LoopInfo &LI, SyncDependenceAnalysis &SDA, bool IsLCSSAForm)
    : F(F), RegionLoop(RegionLoop), DT(DT), LI(LI), SDA(SDA),
      IsLCSSAForm(IsLCSSAForm) {}

void DivergenceAnalysis::markDivergent(const Value &DivVal) {
  assert(isa<Instruction>(DivVal) || isa<Argument>(DivVal));
  assert(!isAlwaysUniform(DivVal) && "cannot be divergent");
  DivergentValues.insert(&DivVal);
}

void DivergenceAnalysis::addUniformOverride(const Value &UniVal) {
  UniformOverrides.insert(&UniVal);
}

bool DivergenceAnalysis::updateTerminator(const Instruction &Term) const {
  if (Term.getNumSuccessors() <= 1)
    return false;
  if (auto *BranchTerm = dyn_cast<BranchInst>(&Term)) {
    assert(BranchTerm->isConditional());
    return isDivergent(*BranchTerm->getCondition());
  }
  if (auto *SwitchTerm = dyn_cast<SwitchInst>(&Term)) {
    return isDivergent(*SwitchTerm->getCondition());
  }
  if (isa<InvokeInst>(Term)) {
    return false; // ignore abnormal executions through landingpad
  }

  llvm_unreachable("unexpected terminator");
}

bool DivergenceAnalysis::updateNormalInstruction(const Instruction &I) const {
  // TODO: function calls with side effects, etc.
  for (const auto &Op : I.operands()) {
    if (isDivergent(*Op))
      return true;
  }
  return false;
}

bool DivergenceAnalysis::isTemporalDivergent(const BasicBlock &ObservingBlock,
                                             const Value &Val) const {
  const auto *Inst = dyn_cast<const Instruction>(&Val);
  if (!Inst)
    return false;
  // check whether any divergent loop carrying Val terminates before control
  // proceeds to ObservingBlock
  for (const auto *Loop = LI.getLoopFor(Inst->getParent());
       Loop != RegionLoop && !Loop->contains(&ObservingBlock);
       Loop = Loop->getParentLoop()) {
    if (DivergentLoops.find(Loop) != DivergentLoops.end())
      return true;
  }

  return false;
}

bool DivergenceAnalysis::updatePHINode(const PHINode &Phi) const {
  // join of divergent disjoint paths in Phi's parent block
  if (!Phi.hasConstantOrUndefValue() && isJoinDivergent(*Phi.getParent())) {
    return true;
  }

  // An incoming value could be divergent by itself.
  // Otherwise, an incoming value could be uniform within the loop
  // that carries its definition but it may appear divergent
  // from outside the loop. This happens when divergent loop exits
  // drop definitions of that uniform value in different iterations.
  //
  //   for (int i = 0; i < n; ++i) {  // 'i' is uniform inside the loop
  //     if (i % thread_id == 0) break;     // divergent loop exit
  //   }
  //   int divI = i;                  // divI is divergent
  for (size_t i = 0; i < Phi.getNumIncomingValues(); ++i) {
    const auto *InVal = Phi.getIncomingValue(i);
    if (isDivergent(*InVal) ||
        isTemporalDivergent(*Phi.getParent(), *InVal)) {
      return true;
    }
  }
  return false;
}

bool DivergenceAnalysis::inRegion(const Instruction &I) const {
  return I.getParent() && inRegion(*I.getParent());
}

bool DivergenceAnalysis::inRegion(const BasicBlock &BB) const {
  // Without a RegionLoop the region is the whole function; guard against
  // dereferencing a null RegionLoop.
  if (!RegionLoop)
    return BB.getParent() == &F;
  return RegionLoop->contains(&BB);
}

// marks all users of loop-carried values of the loop headed by LoopHeader as
// divergent
void DivergenceAnalysis::taintLoopLiveOuts(const BasicBlock &LoopHeader) {
  auto *DivLoop = LI.getLoopFor(&LoopHeader);
  assert(DivLoop && "loopHeader is not actually part of a loop");

  SmallVector<BasicBlock *, 8> TaintStack;
  DivLoop->getExitBlocks(TaintStack);

  // Potential users of loop-carried values could be anywhere in the dominance
  // region of DivLoop (including its fringes, for phi nodes).
  DenseSet<const BasicBlock *> Visited;
  for (auto *Block : TaintStack) {
    Visited.insert(Block);
  }
  Visited.insert(&LoopHeader);

  while (!TaintStack.empty()) {
    auto *UserBlock = TaintStack.back();
    TaintStack.pop_back();

    // don't spread divergence beyond the region
    if (!inRegion(*UserBlock))
      continue;

    assert(!DivLoop->contains(UserBlock) &&
           "irreducible control flow detected");

    // phi nodes at the fringes of the dominance region
    if (!DT.dominates(&LoopHeader, UserBlock)) {
      // all PHI nodes of UserBlock become divergent
      for (auto &Phi : UserBlock->phis()) {
        Worklist.push_back(&Phi);
      }
      continue;
    }

    // taint outside users of values carried by DivLoop
    for (auto &I : *UserBlock) {
      if (isAlwaysUniform(I))
        continue;
      if (isDivergent(I))
        continue;

      for (auto &Op : I.operands()) {
        auto *OpInst = dyn_cast<Instruction>(&Op);
        if (!OpInst)
          continue;
        if (DivLoop->contains(OpInst->getParent())) {
          markDivergent(I);
          pushUsers(I);
          break;
        }
      }
    }

    // visit all blocks in the dominance region
    for (auto *SuccBlock : successors(UserBlock)) {
      if (!Visited.insert(SuccBlock).second) {
        continue;
      }
      TaintStack.push_back(SuccBlock);
    }
  }
}

void DivergenceAnalysis::pushPHINodes(const BasicBlock &Block) {
  for (const auto &Phi : Block.phis()) {
    if (isDivergent(Phi))
      continue;
    Worklist.push_back(&Phi);
  }
}

void DivergenceAnalysis::pushUsers(const Value &V) {
  for (const auto *User : V.users()) {
    const auto *UserInst = dyn_cast<const Instruction>(User);
    if (!UserInst)
      continue;

    if (isDivergent(*UserInst))
      continue;

    // only compute divergence inside the region
    if (!inRegion(*UserInst))
      continue;
    Worklist.push_back(UserInst);
  }
}

bool DivergenceAnalysis::propagateJoinDivergence(const BasicBlock &JoinBlock,
                                                 const Loop *BranchLoop) {
  LLVM_DEBUG(dbgs() << "\tpropJoinDiv " << JoinBlock.getName() << "\n");

  // ignore divergence outside the region
  if (!inRegion(JoinBlock)) {
    return false;
  }

  // push non-divergent phi nodes in JoinBlock to the worklist
  pushPHINodes(JoinBlock);

  // JoinBlock is a divergent loop exit
  if (BranchLoop && !BranchLoop->contains(&JoinBlock)) {
    return true;
  }

  // disjoint-paths divergent at JoinBlock
  markBlockJoinDivergent(JoinBlock);
  return false;
}

void DivergenceAnalysis::propagateBranchDivergence(const Instruction &Term) {
  LLVM_DEBUG(dbgs() << "propBranchDiv " << Term.getParent()->getName() << "\n");

  markDivergent(Term);

  // Don't propagate divergence from unreachable blocks.
  if (!DT.isReachableFromEntry(Term.getParent()))
    return;

  const auto *BranchLoop = LI.getLoopFor(Term.getParent());

  // whether there is a divergent loop exit from BranchLoop (if any)
  bool IsBranchLoopDivergent = false;

  // iterate over all blocks reachable by disjoint paths from Term within the
  // loop; this also covers loop exits that become divergent due to Term.
  for (const auto *JoinBlock : SDA.join_blocks(Term)) {
    IsBranchLoopDivergent |= propagateJoinDivergence(*JoinBlock, BranchLoop);
  }

  // BranchLoop is a divergent loop due to the divergent branch in Term
  if (IsBranchLoopDivergent) {
    assert(BranchLoop);
    if (!DivergentLoops.insert(BranchLoop).second) {
      return;
    }
    propagateLoopDivergence(*BranchLoop);
  }
}

void DivergenceAnalysis::propagateLoopDivergence(const Loop &ExitingLoop) {
  LLVM_DEBUG(dbgs() << "propLoopDiv " << ExitingLoop.getName() << "\n");

  // don't propagate beyond the region
  if (!inRegion(*ExitingLoop.getHeader()))
    return;

  const auto *BranchLoop = ExitingLoop.getParentLoop();

  // Uses of loop-carried values could occur anywhere
  // within the dominance region of the definition. All loop-carried
  // definitions are dominated by the loop header (reducible control).
  // Thus all users have to be in the dominance region of the loop header,
  // except PHI nodes that can also live at the fringes of the dom region
  // (incoming defining value).
  if (!IsLCSSAForm)
    taintLoopLiveOuts(*ExitingLoop.getHeader());

  // whether there is a divergent loop exit from BranchLoop (if any)
  bool IsBranchLoopDivergent = false;

  // iterate over all blocks reachable by disjoint paths from exits of
  // ExitingLoop; this also covers loop exits (of BranchLoop) that in turn
  // become divergent.
  for (const auto *JoinBlock : SDA.join_blocks(ExitingLoop)) {
    IsBranchLoopDivergent |= propagateJoinDivergence(*JoinBlock, BranchLoop);
  }

  // BranchLoop is divergent due to a divergent loop exit in ExitingLoop
  if (IsBranchLoopDivergent) {
    assert(BranchLoop);
    if (!DivergentLoops.insert(BranchLoop).second) {
      return;
    }
    propagateLoopDivergence(*BranchLoop);
  }
}

void DivergenceAnalysis::compute() {
  for (auto *DivVal : DivergentValues) {
    pushUsers(*DivVal);
  }

  // propagate divergence
  while (!Worklist.empty()) {
    const Instruction &I = *Worklist.back();
    Worklist.pop_back();

    // maintain uniformity of overrides
    if (isAlwaysUniform(I))
      continue;

    bool WasDivergent = isDivergent(I);
    if (WasDivergent)
      continue;

    // propagate divergence caused by terminator
    if (I.isTerminator()) {
      if (updateTerminator(I)) {
        // propagate control divergence to affected instructions
        propagateBranchDivergence(I);
        continue;
      }
    }

    // update divergence of I due to divergent operands
    bool DivergentUpd = false;
    const auto *Phi = dyn_cast<const PHINode>(&I);
    if (Phi) {
      DivergentUpd = updatePHINode(*Phi);
    } else {
      DivergentUpd = updateNormalInstruction(I);
    }

    // propagate value divergence to users
    if (DivergentUpd) {
      markDivergent(I);
      pushUsers(I);
    }
  }
}

bool DivergenceAnalysis::isAlwaysUniform(const Value &V) const {
  return UniformOverrides.find(&V) != UniformOverrides.end();
}

bool DivergenceAnalysis::isDivergent(const Value &V) const {
  return DivergentValues.find(&V) != DivergentValues.end();
}

bool DivergenceAnalysis::isDivergentUse(const Use &U) const {
  Value &V = *U.get();
  Instruction &I = *cast<Instruction>(U.getUser());
  return isDivergent(V) || isTemporalDivergent(*I.getParent(), V);
}
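
// Note (illustrative sketch): a use can be divergent even when the used value
// is not. If %v is uniform inside a loop with a divergent exit, different
// threads leave the loop in different iterations, so a use of %v outside the
// loop observes different values per thread:
//
//   loop:
//     %v = ...                                  ; uniform within the loop
//     br i1 %div.exit, label %exit, label %loop ; divergent loop exit
//   exit:
//     %u = add i32 %v, 1                        ; divergent use of %v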

void DivergenceAnalysis::print(raw_ostream &OS, const Module *) const {
  if (DivergentValues.empty())
    return;
  // iterate instructions using instructions() to ensure a deterministic order.
  for (auto &I : instructions(F)) {
    if (isDivergent(I))
      OS << "DIVERGENT:" << I << '\n';
  }
}

// class GPUDivergenceAnalysis
GPUDivergenceAnalysis::GPUDivergenceAnalysis(Function &F,
                                             const DominatorTree &DT,
                                             const PostDominatorTree &PDT,
                                             const LoopInfo &LI,
                                             const TargetTransformInfo &TTI)
    : SDA(DT, PDT, LI), DA(F, nullptr, DT, LI, SDA, false) {
  for (auto &I : instructions(F)) {
    if (TTI.isSourceOfDivergence(&I)) {
      DA.markDivergent(I);
    } else if (TTI.isAlwaysUniform(&I)) {
      DA.addUniformOverride(I);
    }
  }
  for (auto &Arg : F.args()) {
    if (TTI.isSourceOfDivergence(&Arg)) {
      DA.markDivergent(Arg);
    }
  }

  DA.compute();
}

bool GPUDivergenceAnalysis::isDivergent(const Value &val) const {
  return DA.isDivergent(val);
}

bool GPUDivergenceAnalysis::isDivergentUse(const Use &use) const {
  return DA.isDivergentUse(use);
}

void GPUDivergenceAnalysis::print(raw_ostream &OS, const Module *mod) const {
  OS << "Divergence of kernel " << DA.getFunction().getName() << " {\n";
  DA.print(OS, mod);
  OS << "}\n";
}
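
// Example usage (a hypothetical sketch; DT, PDT, LI, and TTI are assumed to
// have been computed for F by the surrounding pass):
//
//   GPUDivergenceAnalysis GDA(F, DT, PDT, LI, TTI);
//   for (const Instruction &I : instructions(F)) {
//     if (GDA.isDivergent(I)) {
//       // e.g., keep I out of scalar (uniform) registers
//     }
//   }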