annotate docs/GetElementPtr.rst @ 107:a03ddd01be7e

resolve warnings

author	Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
date	Sun, 31 Jan 2016 17:34:49 +0900

=======================================
The Often Misunderstood GEP Instruction
=======================================

.. contents::
   :local:

Introduction
============

This document seeks to dispel the mystery and confusion surrounding LLVM's
`GetElementPtr <LangRef.html#i_getelementptr>`_ (GEP) instruction. Questions
about the wily GEP instruction are probably the most frequently occurring
questions once a developer gets down to coding with LLVM. Here we lay out the
sources of confusion and show that the GEP instruction is really quite simple.

Address Computation
===================

When people are first confronted with the GEP instruction, they tend to relate
it to known concepts from other programming paradigms, most notably C array
indexing and field selection. GEP closely resembles C array indexing and field
selection; however, it is a little different, and this leads to the following
questions.

What is the first index of the GEP instruction?
-----------------------------------------------

Quick answer: The index stepping through the first operand.

The confusion with the first index usually arises from thinking about the
GetElementPtr instruction as if it were a C index operator. They aren't the
same. For example, when we write, in "C":

.. code-block:: c++

  AType *Foo;
  ...
  X = &Foo->F;

it is natural to think that there is only one index, the selection of the field
``F``. However, in this example, ``Foo`` is a pointer. That pointer
must be indexed explicitly in LLVM. C, on the other hand, indexes through it
transparently. To arrive at the same address location as the C code, you would
provide the GEP instruction with two index operands. The first operand indexes
through the pointer; the second operand indexes the field ``F`` of the
structure, just as if you wrote:

.. code-block:: c++

  X = &Foo[0].F;

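The equivalence of the two spellings can be checked directly. The following is
a minimal sketch, not part of the original text: ``AType`` is given a
hypothetical layout (a padding member plus the field ``F``), and the helper
names ``fieldViaArrow`` and ``fieldViaIndex`` are made up for illustration.

```cpp
#include <cstdint>

// Hypothetical stand-in for the document's AType; only the field F matters.
struct AType {
  std::int32_t Pad;
  std::int32_t F;
};

// What C writes as Foo->F: the step through the pointer is implicit.
std::int32_t *fieldViaArrow(AType *Foo) { return &Foo->F; }

// What GEP spells out: index 0 through the pointer, then select field F.
std::int32_t *fieldViaIndex(AType *Foo) { return &Foo[0].F; }
```

Both helpers return the same address for any valid ``Foo``; the explicit ``[0]``
corresponds to the first index of the GEP instruction.
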
Sometimes this question gets rephrased as:

.. _GEP index through first pointer:

  *Why is it okay to index through the first pointer, but subsequent pointers
  won't be dereferenced?*

The answer is simply that memory does not have to be accessed to perform the
computation. The first operand to the GEP instruction must be a value of a
pointer type. The value of the pointer is provided directly to the GEP
instruction as an operand without any need for accessing memory. It must,
therefore, be indexed and requires an index operand. Consider this example:

.. code-block:: c++

  struct munger_struct {
    int f1;
    int f2;
  };
  void munge(struct munger_struct *P) {
    P[0].f1 = P[1].f1 + P[2].f2;
  }
  ...
  struct munger_struct Array[3];
  ...
  munge(Array);

In this "C" example, the front end compiler (Clang) will generate three GEP
instructions for the three indices through ``P`` in the assignment statement. The
function argument ``P`` will be the first operand of each of these GEP
instructions. The second operand indexes through that pointer. The third
operand will be the field offset into the ``struct munger_struct`` type, for
either the ``f1`` or ``f2`` field. So, in LLVM assembly the ``munge`` function
looks like:

.. code-block:: llvm

  %struct.munger_struct = type { i32, i32 }

  define void @munge(%struct.munger_struct* %P) {
  entry:
    %tmp = getelementptr %struct.munger_struct, %struct.munger_struct* %P, i32 1, i32 0
    %tmp1 = load i32, i32* %tmp
    %tmp6 = getelementptr %struct.munger_struct, %struct.munger_struct* %P, i32 2, i32 1
    %tmp7 = load i32, i32* %tmp6
    %tmp8 = add i32 %tmp7, %tmp1
    %tmp9 = getelementptr %struct.munger_struct, %struct.munger_struct* %P, i32 0, i32 0
    store i32 %tmp8, i32* %tmp9
    ret void
  }

In each case the first operand is the pointer through which the GEP instruction
starts. The same is true whether the first operand is an argument, allocated
memory, or a global variable.

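The arithmetic behind those three GEP instructions can be sketched in C++.
This is an illustration under assumptions, not a description of how LLVM
implements the instruction: the helpers ``gepByteOffset`` and ``gepAddress`` are
hypothetical names, and the layout is whatever the host compiler chooses for
``munger_struct``.

```cpp
#include <cstddef>

struct munger_struct {
  int f1;
  int f2;
};

// Byte offset a two-index GEP on munger_struct* would add to the base pointer:
// the first index scales by the struct size, the second picks a field offset.
std::size_t gepByteOffset(std::size_t ElementIndex, std::size_t FieldOffset) {
  return ElementIndex * sizeof(munger_struct) + FieldOffset;
}

// Apply the offset to a base pointer without reading any memory.
// Intended for int fields that lie inside the array P points into.
int *gepAddress(munger_struct *P, std::size_t ElementIndex,
                std::size_t FieldOffset) {
  char *Base = reinterpret_cast<char *>(P);
  return reinterpret_cast<int *>(Base + gepByteOffset(ElementIndex, FieldOffset));
}
```

For example, ``gepAddress(Array, 2, offsetof(munger_struct, f2))`` yields the
same address as ``&Array[2].f2``, mirroring the ``i32 2, i32 1`` index pair.
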
To make this clear, let's consider a more contrived example:

.. code-block:: llvm

  @MyVar = external global i32
  ...
  %idx1 = getelementptr i32, i32* @MyVar, i64 0
  %idx2 = getelementptr i32, i32* @MyVar, i64 1
  %idx3 = getelementptr i32, i32* @MyVar, i64 2

These GEP instructions are simply making address computations from the base
address of ``@MyVar``. They compute the following (using C syntax):

.. code-block:: c++

  idx1 = (char*) &MyVar + 0;
  idx2 = (char*) &MyVar + 4;
  idx3 = (char*) &MyVar + 8;

Since the type ``i32`` is known to be four bytes long, the indices 0, 1 and 2
translate into memory offsets of 0, 4, and 8, respectively. No memory is
accessed to make these computations because the address of ``@MyVar`` is passed
directly to the GEP instructions.

The contrived part of this example is in the cases of ``%idx2`` and ``%idx3``. They
result in the computation of addresses that point to memory past the end of the
``@MyVar`` global, which is only one ``i32`` long, not three ``i32``\s long.
While this is legal in LLVM, it is inadvisable because any load or store with
the pointer that results from these GEP instructions would produce undefined
results.

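The index-to-offset scaling described above can be written as a compile-time
check. This is a small sketch in which ``i32GepOffset`` is a hypothetical helper
and ``std::int32_t`` stands in for LLVM's ``i32``:

```cpp
#include <cstddef>
#include <cstdint>

// Byte offset produced by a single-index GEP on an i32 pointer:
// the index is scaled by the size of the pointee type (i32, 4 bytes).
constexpr std::ptrdiff_t i32GepOffset(std::ptrdiff_t Index) {
  return Index * static_cast<std::ptrdiff_t>(sizeof(std::int32_t));
}

static_assert(i32GepOffset(0) == 0, "idx1 stays at the base address");
static_assert(i32GepOffset(1) == 4, "idx2 is one i32 past the base");
static_assert(i32GepOffset(2) == 8, "idx3 is two i32s past the base");
```

Nothing in the computation depends on what is stored at the base address,
which is why the offsets can be evaluated entirely at compile time.
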
Why is the extra 0 index required?
----------------------------------

Quick answer: there are no superfluous indices.

This question arises most often when the GEP instruction is applied to a global
variable, which is always of pointer type. For example, consider this:

.. code-block:: llvm

  @MyStruct = external global { float*, i32 }
  ...
  %idx = getelementptr { float*, i32 }, { float*, i32 }* @MyStruct, i64 0, i32 1

The GEP above yields an ``i32*`` by indexing the ``i32`` typed field of the
structure ``@MyStruct``. When people first look at it, they wonder why the ``i64
0`` index is needed. However, a closer inspection of how globals and GEPs work
reveals the need. Becoming aware of the following facts will dispel the
confusion:

#. The type of ``@MyStruct`` is not ``{ float*, i32 }`` but rather ``{ float*,
   i32 }*``. That is, ``@MyStruct`` is a pointer to a structure containing a
   pointer to a ``float`` and an ``i32``.

#. Point #1 is evidenced by noticing the type of the first operand of the GEP
   instruction (``@MyStruct``), which is ``{ float*, i32 }*``.

#. The first index, ``i64 0``, is required to step over the global variable
   ``@MyStruct``. Since the first argument to the GEP instruction must always
   be a value of pointer type, the first index steps through that pointer. A
   value of 0 means 0 elements offset from that pointer.

#. The second index, ``i32 1``, selects the second field of the structure (the
   ``i32``).

What is dereferenced by GEP?
----------------------------

Quick answer: nothing.

The GetElementPtr instruction dereferences nothing. That is, it doesn't access
memory in any way. That's what the Load and Store instructions are for. GEP is
only involved in the computation of addresses. For example, consider this:

.. code-block:: llvm

  @MyVar = external global { [40 x i32]* }
  ...
  %idx = getelementptr { [40 x i32]* }, { [40 x i32]* }* @MyVar, i64 0, i32 0, i64 0, i64 17

In this example, we have a global variable, ``@MyVar``, that is a pointer to a
structure containing a pointer to an array of 40 ints. The GEP instruction seems
to be accessing the 18th integer of the structure's array of ints. However, this
is actually an illegal GEP instruction. It won't compile. The reason is that the
pointer in the structure must be dereferenced in order to index into the
array of 40 ints. Since the GEP instruction never accesses memory, it is
illegal.

In order to access the 18th integer in the array, you would need to do the
following:

.. code-block:: llvm

  %idx1 = getelementptr { [40 x i32]* }, { [40 x i32]* }* @MyVar, i64 0, i32 0
  %arr = load [40 x i32]*, [40 x i32]** %idx1
  %idx2 = getelementptr [40 x i32], [40 x i32]* %arr, i64 0, i64 17

In this case, we have to load the pointer in the structure with a load
instruction before we can index into the array. If the example were changed to:

.. code-block:: llvm

  @MyVar = external global { [40 x i32] }
  ...
  %idx = getelementptr { [40 x i32] }, { [40 x i32] }* @MyVar, i64 0, i32 0, i64 17

then everything works fine. In this case, the structure does not contain a
pointer and the GEP instruction can index through the global variable, into the
first field of the structure and access the 18th ``i32`` in the array there.

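The difference between the two layouts can be mirrored in C++. The sketch below
is an analogy only: ``Indirect`` and ``Inline`` are hypothetical struct names
standing in for ``{ [40 x i32]* }`` and ``{ [40 x i32] }``, and the helper
functions are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Analogue of { [40 x i32]* }: the struct holds only a pointer to the array.
struct Indirect {
  std::int32_t (*Arr)[40];
};

// Analogue of { [40 x i32] }: the array is stored inline in the struct.
struct Inline {
  std::int32_t Arr[40];
};

// Inline case: the element address is base + constant offset; S is never read.
std::int32_t *elem17Inline(Inline *S) { return &S->Arr[17]; }

// Indirect case: S->Arr must be loaded from memory before indexing,
// which is the step a single GEP cannot perform.
std::int32_t *elem17Indirect(Indirect *S) {
  std::int32_t (*Loaded)[40] = S->Arr; // the load
  return &(*Loaded)[17];               // the second address computation
}
```

In the inline case the result is always ``offsetof(Inline, Arr) + 17 *
sizeof(std::int32_t)`` bytes past ``S``; in the indirect case the result depends
on a value stored in memory, so a load has to sit between the two address
computations.
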
Why don't GEP x,0,0,1 and GEP x,1 alias?
----------------------------------------

Quick answer: They compute different address locations.

If you look at the first indices in these GEP instructions you find that they
are different (0 and 1), therefore the address computation diverges with that
index. Consider this example:

.. code-block:: llvm

  @MyVar = external global { [10 x i32] }
  %idx1 = getelementptr { [10 x i32] }, { [10 x i32] }* @MyVar, i64 0, i32 0, i64 1
  %idx2 = getelementptr { [10 x i32] }, { [10 x i32] }* @MyVar, i64 1

In this example, ``%idx1`` computes the address of the second integer in the
array that is in the structure in ``@MyVar``, that is, ``@MyVar+4``. The type of
``%idx1`` is ``i32*``. However, ``%idx2`` computes the address of the *next*
structure after ``@MyVar``. The type of ``%idx2`` is ``{ [10 x i32] }*`` and its
value is equivalent to ``@MyVar+40`` because it indexes past the ten 4-byte
integers in ``@MyVar``. Obviously, in such a situation, the pointers don't
alias.

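The two offsets can be reproduced with ordinary C++ pointer arithmetic. This is
a sketch under the assumption that the host lays out ten ``std::int32_t`` values
in 40 bytes without padding; ``TenInts``, ``byteOffset``, ``offsetOfGep001`` and
``offsetOfGep1`` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>

// Analogue of { [10 x i32] }.
struct TenInts {
  std::int32_t Arr[10];
};
static_assert(sizeof(TenInts) == 40,
              "sketch assumes ten 4-byte integers, no padding");

// Byte distance from Base to P.
std::ptrdiff_t byteOffset(const void *Base, const void *P) {
  return static_cast<const char *>(P) - static_cast<const char *>(Base);
}

// GEP x, 0, 0, 1: stay in the first struct, field 0, array element 1.
std::ptrdiff_t offsetOfGep001(TenInts *X) { return byteOffset(X, &X[0].Arr[1]); }

// GEP x, 1: step over one whole struct. X + 1 is computed, never dereferenced.
std::ptrdiff_t offsetOfGep1(TenInts *X) { return byteOffset(X, X + 1); }
```

Under the stated assumption, the first helper returns 4 and the second returns
40, matching the two addresses discussed above.
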
95c75e76d11b LLVM 3.4 Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp> parents: diff changeset	240 Why do GEP x,1,0,0 and GEP x,1 alias?
95c75e76d11b LLVM 3.4 Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp> parents: diff changeset	241 -------------------------------------
95c75e76d11b LLVM 3.4 Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp> parents: diff changeset	242
95c75e76d11b LLVM 3.4 Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp> parents: diff changeset	243 Quick Answer: They compute the same address location.
95c75e76d11b LLVM 3.4 Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp> parents: diff changeset	244
95c75e76d11b LLVM 3.4 Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp> parents: diff changeset	245 These two GEP instructions will compute the same address because indexing
95c75e76d11b LLVM 3.4 Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp> parents: diff changeset	246 through the 0th element does not change the address. However, it does change the
95c75e76d11b LLVM 3.4 Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp> parents: diff changeset	247 type. Consider this example:
95c75e76d11b LLVM 3.4 Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp> parents: diff changeset	248
95c75e76d11b LLVM 3.4 Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp> parents: diff changeset	249 .. code-block:: llvm
95c75e76d11b LLVM 3.4 Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp> parents: diff changeset	250
95 afa8332a0e37 LLVM 3.8 Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp> parents: 77 diff changeset	251 %MyVar = global { [10 x i32] }
afa8332a0e37 LLVM 3.8 Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp> parents: 77 diff changeset	252 %idx1 = getelementptr { [10 x i32] }, { [10 x i32] }* %MyVar, i64 1, i32 0, i64 0
afa8332a0e37 LLVM 3.8 Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp> parents: 77 diff changeset	253 %idx2 = getelementptr { [10 x i32] }, { [10 x i32] }* %MyVar, i64 1
0 95c75e76d11b LLVM 3.4 Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp> parents: diff changeset	254
95c75e76d11b LLVM 3.4 Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp> parents: diff changeset	255 In this example, the value of ``%idx1`` is ``%MyVar+40`` and its type is
95c75e76d11b LLVM 3.4 Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp> parents: diff changeset	256 ``i32*``. The value of ``%idx2`` is also ``MyVar+40`` but its type is ``{ [10 x
95c75e76d11b LLVM 3.4 Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp> parents: diff changeset	257 i32] }*``.
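
As a minimal sketch of this point, the two results can be converted to integers
and compared. The global ``@MyVar`` and the function ``@same_address`` are
names assumed here for illustration only. Because the two GEPs compute the same
address, the comparison is expected to yield true.

.. code-block:: llvm

  ; Hypothetical global and function names, used only for illustration.
  @MyVar = global { [10 x i32] } zeroinitializer

  define i1 @same_address() {
  entry:
    %idx1 = getelementptr { [10 x i32] }, { [10 x i32] }* @MyVar, i64 1, i32 0, i64 0
    %idx2 = getelementptr { [10 x i32] }, { [10 x i32] }* @MyVar, i64 1
    ; The results have different pointer types, so convert both to integers
    ; before comparing them.
    %a1 = ptrtoint i32* %idx1 to i64
    %a2 = ptrtoint { [10 x i32] }* %idx2 to i64
    %eq = icmp eq i64 %a1, %a2
    ret i1 %eq
  }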

Can GEP index into vector elements?
-----------------------------------

This hasn't always been forcefully disallowed, though it's not recommended. It
leads to awkward special cases in the optimizers, and fundamental inconsistency
in the IR. In the future, it will probably be outright disallowed.

What effect do address spaces have on GEPs?
-------------------------------------------

None, except that the address space qualifier on the first operand pointer type
always matches the address space qualifier on the result type.
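
For instance, in the following sketch the operand pointer lives in address
space 1, and so does the result. The function name ``@second_elt`` and the
choice of address space 1 are assumptions made for illustration.

.. code-block:: llvm

  ; Hypothetical function: the address space of %p carries over to %q.
  define i32 addrspace(1)* @second_elt([4 x i32] addrspace(1)* %p) {
  entry:
    %q = getelementptr [4 x i32], [4 x i32] addrspace(1)* %p, i64 0, i64 1
    ret i32 addrspace(1)* %q
  }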

How is GEP different from ``ptrtoint``, arithmetic, and ``inttoptr``?
---------------------------------------------------------------------

It's very similar; there are only subtle differences.

With ptrtoint, you have to pick an integer type. One approach is to pick i64;
this is safe on everything LLVM supports (LLVM internally assumes pointers are
never wider than 64 bits in many places), and in most cases the optimizer will
narrow the i64 arithmetic down to the actual pointer size on targets which don't
support 64-bit arithmetic. However, there are some cases where it doesn't do
this. With GEP you can avoid this problem.

Also, GEP carries additional pointer aliasing rules. It's invalid to take a GEP
from one object, address into a different separately allocated object, and
dereference it. IR producers (front-ends) must follow this rule, and consumers
(optimizers, specifically alias analysis) benefit from being able to rely on
it. See the `Rules`_ section for more information.

Finally, GEP is more concise in common cases.

However, for the underlying integer computation implied, there is no
difference.
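
The two forms can be put side by side in a small sketch. The function names are
hypothetical, and the explicit version assumes that an ``i32`` occupies 4 bytes
and that pointers fit in an ``i64``. Both functions compute the same integer
address, but only the GEP form carries the aliasing rules described above.

.. code-block:: llvm

  ; Address of the %i-th i32 after %p, computed with a GEP.
  define i32* @with_gep(i32* %p, i64 %i) {
  entry:
    %q = getelementptr i32, i32* %p, i64 %i
    ret i32* %q
  }

  ; The same address computed with explicit integer instructions.
  define i32* @with_integers(i32* %p, i64 %i) {
  entry:
    %addr = ptrtoint i32* %p to i64
    %off = mul i64 %i, 4           ; assumes a 4-byte i32
    %sum = add i64 %addr, %off
    %q = inttoptr i64 %sum to i32*
    ret i32* %q
  }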

I'm writing a backend for a target which needs custom lowering for GEP. How do I do this?
-----------------------------------------------------------------------------------------

You don't. The integer computation implied by a GEP is target-independent.
Typically what you'll need to do is make your backend pattern-match expression
trees involving ADD, MUL, etc., which are what GEP is lowered into. This has the
advantage of letting your code work correctly in more cases.

GEP does use target-dependent parameters for the size and layout of data types,
which targets can customize.

If you require support for addressing units which are not 8 bits, you'll need to
fix a lot of code in the backend, with GEP lowering being only a small piece of
the overall picture.

How does VLA addressing work with GEPs?
---------------------------------------

GEPs don't natively support VLAs. LLVM's type system is entirely static, and GEP
address computations are guided by an LLVM type.

VLA indices can be implemented as linearized indices. For example, an expression
like ``X[a][b][c]`` must be effectively lowered into a form like
``X[a*m+b*n+c]``, so that it appears to the GEP as a single-dimensional array
reference.

This means if you want to write an analysis which understands array indices and
you want to support VLAs, your code will have to be prepared to reverse-engineer
the linearization. One way to solve this problem is to use the ScalarEvolution
library, which always presents VLA and non-VLA indexing in the same manner.
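
A linearized access could look roughly like the sketch below. The function name
and the parameters ``%m`` and ``%n`` are assumptions: they stand for the
run-time strides, in elements, of the first and second dimensions. The GEP sees
only a single index into a flat sequence of ``i32`` values.

.. code-block:: llvm

  ; Hypothetical lowering of X[a][b][c] for an array with run-time dimensions.
  define i32* @vla_element(i32* %X, i64 %a, i64 %b, i64 %c, i64 %m, i64 %n) {
  entry:
    %am = mul i64 %a, %m
    %bn = mul i64 %b, %n
    %t = add i64 %am, %bn
    %idx = add i64 %t, %c          ; a*m + b*n + c
    %p = getelementptr i32, i32* %X, i64 %idx
    ret i32* %p
  }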

.. _Rules:

Rules
=====

What happens if an array index is out of bounds?
------------------------------------------------

There are two senses in which an array index can be out of bounds.

First, there's the array type which comes from the (static) type of the first
operand to the GEP. Indices greater than the number of elements in the
corresponding static array type are valid. There is no problem with out of
bounds indices in this sense. Indexing into an array only depends on the size of
the array element, not the number of elements.

A common example of how this is used is arrays where the size is not known.
It's common to use array types with zero length to represent these. The fact
that the static type says there are zero elements is irrelevant; it's perfectly
valid to compute arbitrary element indices, as the computation only depends on
the size of the array element, not the number of elements. Note that zero-sized
arrays are not a special case here.
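
As an illustration, the sketch below indexes into a zero-length array that ends
a structure. The type ``%struct.buf`` and the function ``@nth`` are
hypothetical names. The address depends only on the offset of the array within
the structure and on the size of ``i32``, not on the declared length of zero.

.. code-block:: llvm

  ; Hypothetical header followed by an array of unknown length.
  %struct.buf = type { i32, [0 x i32] }

  define i32* @nth(%struct.buf* %b, i64 %n) {
  entry:
    ; Valid for any %n as far as the static array type is concerned.
    %p = getelementptr %struct.buf, %struct.buf* %b, i64 0, i32 1, i64 %n
    ret i32* %p
  }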

This sense is unconnected with the ``inbounds`` keyword. The ``inbounds``
keyword is designed to describe low-level pointer arithmetic overflow
conditions, rather than high-level array indexing rules.

Analysis passes which wish to understand array indexing should not assume that
the static array type bounds are respected.

The second sense of being out of bounds is computing an address that's beyond
the actual underlying allocated object.

With the ``inbounds`` keyword, the result value of the GEP is undefined if the
address is outside the actual underlying allocated object and not the address
one-past-the-end.

Without the ``inbounds`` keyword, there are no restrictions on computing
out-of-bounds addresses. Obviously, performing a load or a store requires an
address of allocated and sufficiently aligned memory. But the GEP itself is only
concerned with computing addresses.
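
The difference can be sketched as follows, under the assumption that ``%a``
points to an allocated object holding exactly four ``i32`` values. The function
names are hypothetical.

.. code-block:: llvm

  ; One-past-the-end of the assumed four-element object: fine with inbounds.
  define i32* @end_of([4 x i32]* %a) {
  entry:
    %end = getelementptr inbounds [4 x i32], [4 x i32]* %a, i64 0, i64 4
    ret i32* %end
  }

  ; Far outside the assumed object: no inbounds keyword, so this is a plain
  ; address computation. The result must not be used for a load or store.
  define i32* @far_from([4 x i32]* %a) {
  entry:
    %far = getelementptr [4 x i32], [4 x i32]* %a, i64 0, i64 100
    ret i32* %far
  }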

Can array indices be negative?
------------------------------

Yes. This is basically a special case of array indices being out of bounds.
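
For example, the following sketch (with a hypothetical function name) computes
the address of the ``i32`` immediately before ``%p``:

.. code-block:: llvm

  define i32* @previous(i32* %p) {
  entry:
    ; Indices are signed, so -1 steps back by one element.
    %q = getelementptr i32, i32* %p, i64 -1
    ret i32* %q
  }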

Can I compare two values computed with GEPs?
--------------------------------------------

Yes. If both addresses are within the same allocated object, or
one-past-the-end, you'll get the comparison result you expect. If either is
outside of it, integer arithmetic wrapping may occur, so the comparison may not
be meaningful.
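
A minimal sketch of such a comparison, assuming both indices stay within (or
one past the end of) the object that ``%base`` points into; the function name
is hypothetical:

.. code-block:: llvm

  define i1 @comes_before(i32* %base, i64 %i, i64 %j) {
  entry:
    %p = getelementptr inbounds i32, i32* %base, i64 %i
    %q = getelementptr inbounds i32, i32* %base, i64 %j
    ; Pointers are compared as unsigned addresses.
    %lt = icmp ult i32* %p, %q
    ret i1 %lt
  }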

Can I do GEP with a different pointer type than the type of the underlying object?
----------------------------------------------------------------------------------

Yes. There are no restrictions on bitcasting a pointer value to an arbitrary
pointer type. The types in a GEP serve only to define the parameters for the
underlying integer computation. They need not correspond with the actual type of
the underlying object.

Furthermore, loads and stores don't have to use the same types as the type of
the underlying object. Types in this context serve only to specify memory size
and alignment. Beyond that they are merely a hint to the optimizer indicating
how the value will likely be used.
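
The sketch below treats memory reached through an ``i32*`` as a sequence of
bytes. The function name is hypothetical, and the load is only valid if the
computed address lies within allocated memory.

.. code-block:: llvm

  define i8 @byte_at(i32* %p, i64 %n) {
  entry:
    ; Reinterpret the pointer, then index in units of one byte.
    %bytes = bitcast i32* %p to i8*
    %addr = getelementptr i8, i8* %bytes, i64 %n
    ; The loaded type need not match the type of the underlying object.
    %v = load i8, i8* %addr
    ret i8 %v
  }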

Can I cast an object's address to integer and add it to null?
-------------------------------------------------------------

You can compute an address that way, but if you use GEP to do the add, you can't
use that pointer to actually access the object, unless the object is managed
outside of LLVM.

The underlying integer computation is sufficiently defined; null has a defined
value --- zero --- and you can add whatever value you want to it.

However, it's invalid to access (load from or store to) an LLVM-aware object
with such a pointer. This includes ``GlobalVariables``, ``Allocas``, and objects
pointed to by noalias pointers.

If you really need this functionality, you can do the arithmetic with explicit
integer instructions, and use inttoptr to convert the result to an address. Most
of GEP's special aliasing rules do not apply to pointers computed from ptrtoint,
arithmetic, and inttoptr sequences.
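
Both approaches are sketched below with hypothetical function names, assuming
pointers fit in an ``i64``. The two functions return the same address, but only
the pointer produced by ``inttoptr`` may be used to access an LLVM-aware
object.

.. code-block:: llvm

  ; Adds the object's address to null with a GEP. The result is based on
  ; null, so it must not be used to access an LLVM-aware object.
  define i32* @via_gep_on_null(i32* %obj) {
  entry:
    %n = ptrtoint i32* %obj to i64
    %p = getelementptr i8, i8* null, i64 %n
    %q = bitcast i8* %p to i32*
    ret i32* %q
  }

  ; Uses integer instructions only, then converts back with inttoptr.
  define i32* @via_inttoptr(i32* %obj) {
  entry:
    %n = ptrtoint i32* %obj to i64
    %sum = add i64 0, %n
    %q = inttoptr i64 %sum to i32*
    ret i32* %q
  }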

Can I compute the distance between two objects, and add that value to one address to compute the other address?
---------------------------------------------------------------------------------------------------------------

As with arithmetic on null, you can use GEP to compute an address that way, but
you can't use that pointer to actually access the object if you do, unless the
object is managed outside of LLVM.

Also as above, ptrtoint and inttoptr provide an alternative way to do this which
does not have this restriction.
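
A sketch of the two variants, with hypothetical function names and assuming
that ``%a`` and ``%b`` point into two separately allocated objects and that
pointers fit in an ``i64``:

.. code-block:: llvm

  ; Computes the address of %b, but the result is still based on %a, so it
  ; must not be used to access the object that %b points to.
  define i8* @other_via_gep(i8* %a, i8* %b) {
  entry:
    %ia = ptrtoint i8* %a to i64
    %ib = ptrtoint i8* %b to i64
    %dist = sub i64 %ib, %ia
    %p = getelementptr i8, i8* %a, i64 %dist
    ret i8* %p
  }

  ; The same address computed with integer arithmetic and inttoptr.
  define i8* @other_via_integers(i8* %a, i8* %b) {
  entry:
    %ia = ptrtoint i8* %a to i64
    %ib = ptrtoint i8* %b to i64
    %dist = sub i64 %ib, %ia
    %sum = add i64 %ia, %dist
    %p = inttoptr i64 %sum to i8*
    ret i8* %p
  }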

Can I do type-based alias analysis on LLVM IR?
----------------------------------------------

You can't do type-based alias analysis using LLVM's built-in type system,
because LLVM has no restrictions on mixing types in addressing, loads or stores.

LLVM's type-based alias analysis pass uses metadata to describe a different type
system (such as the C type system), and performs type-based aliasing on top of
that. Further details are in the `language reference <LangRef.html#tbaa>`_.
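
The sketch below shows roughly what such metadata can look like. It is modeled
on the kind of ``!tbaa`` tags a C front-end might attach; the function name and
the exact node names are assumptions, and the language reference remains the
authority on the format. The two stores carry tags for different scalar types,
which is the information the pass needs to conclude that they do not alias.

.. code-block:: llvm

  define void @store_both(i32* %pi, float* %pf) {
  entry:
    store i32 1, i32* %pi, !tbaa !0
    store float 2.0, float* %pf, !tbaa !4
    ret void
  }

  !0 = !{!1, !1, i64 0}                   ; access tag: "int" accessed as "int"
  !1 = !{!"int", !2, i64 0}               ; scalar type node, child of !2
  !2 = !{!"omnipotent char", !3, i64 0}
  !3 = !{!"Simple C/C++ TBAA"}            ; root of this type system
  !4 = !{!5, !5, i64 0}                   ; access tag: "float" as "float"
  !5 = !{!"float", !2, i64 0}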

What happens if a GEP computation overflows?
--------------------------------------------

If the GEP lacks the ``inbounds`` keyword, the value is the result from
evaluating the implied two's complement integer computation. However, since
there's no guarantee of where an object will be allocated in the address space,
such values have limited meaning.

If the GEP has the ``inbounds`` keyword, the result value is undefined (a "trap
value") if the GEP overflows (i.e. wraps around the end of the address space).

As such, there are some ramifications of this for inbounds GEPs: scales implied
by array/vector/pointer indices are always known to be "nsw" since they are
signed values that are scaled by the element size. These values are also
allowed to be negative (e.g. "``gep i32 *%P, i32 -1``") but the pointer itself
is logically treated as an unsigned value. This means that GEPs have an
asymmetric relation between the pointer base (which is treated as unsigned) and
the offset applied to it (which is treated as signed). The result of the
additions within the offset calculation cannot have signed overflow, but when
applied to the base pointer, there can be signed overflow.

How can I tell if my front-end is following the rules?
------------------------------------------------------

There is currently no checker for the getelementptr rules. The only way to do
this today is to manually check each place in your front-end where
GetElementPtr operators are created.

It's not possible to write a checker which could find all rule violations
statically. It would be possible to write a checker which works by instrumenting
the code with dynamic checks though. Alternatively, it would be possible to
write a static checker which catches a subset of possible problems. However, no
such checker exists today.

Rationale
=========

Why is GEP designed this way?
-----------------------------

The design of GEP has the following goals, in rough unofficial order of
priority:

* Support C, C-like languages, and languages which can be conceptually lowered
  into C (this covers a lot).

* Support optimizations such as those that are common in C compilers. In
  particular, GEP is a cornerstone of LLVM's `pointer aliasing
  model <LangRef.html#pointeraliasing>`_.

* Provide a consistent method for computing addresses so that address
  computations don't need to be a part of load and store instructions in the IR.

* Support non-C-like languages, to the extent that it doesn't interfere with
  other goals.

* Minimize target-specific information in the IR.

Why do struct member indices always use ``i32``?
------------------------------------------------

The specific type ``i32`` is probably just a historical artifact; however, it's
wide enough for all practical purposes, so there's been no need to change it. It
doesn't necessarily imply ``i32`` address arithmetic; it's just an identifier
which identifies a field in a struct. Requiring that all struct indices be the
same type reduces the range of possibilities for cases where two GEPs are
effectively the same but have distinct operand types.
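The point that a struct index is an identifier rather than an arithmetic
operand can be sketched in C. The struct ``packet`` and the function
``packet_field_offset`` below are hypothetical names chosen for illustration.
The field number is only used to look up an offset in the struct's layout and
is never multiplied or added itself.

.. code-block:: c

   #include <stddef.h>

   struct packet {
     char tag;     /* field 0 */
     double value; /* field 1 */
     int count;    /* field 2 */
   };

   /* Maps a field number to the byte offset of that field. The number
      selects a layout entry; it takes no part in the arithmetic. */
   size_t packet_field_offset(unsigned field) {
     switch (field) {
     case 0:
       return offsetof(struct packet, tag);
     case 1:
       return offsetof(struct packet, value);
     default:
       return offsetof(struct packet, count);
     }
   }

Because the offsets depend on padding and alignment, they are generally not a
simple function of the field number, which is why the width of the index type
says nothing about the width of the address computation.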

What's an uglygep?
------------------

Some LLVM optimizers operate on GEPs by internally lowering them into more
primitive integer expressions, which allows them to be combined with other
integer expressions and/or split into multiple separate integer expressions. If
they've made non-trivial changes, translating back into LLVM IR can involve
reverse-engineering the structure of the addressing in order to fit it into the
static type of the original first operand. It isn't always possible to fully
reconstruct this structure; sometimes the underlying addressing doesn't
correspond with the static type at all. In such cases the optimizer instead will
emit a GEP with the base pointer cast to a simple address-unit pointer, using
the name "uglygep". This isn't pretty, but it's just as valid, and it's
sufficient to preserve the pointer aliasing guarantees that GEP provides.
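A loose C analogy may help here. It is not the IR that an optimizer emits, and
the names ``point``, ``y_typed`` and ``y_bytes`` are hypothetical. The first
function follows the static type of the base pointer, while the second casts
the base to a byte pointer and adds a single flat offset, which is roughly the
shape of an "uglygep".

.. code-block:: c

   #include <stddef.h>

   struct point {
     int x;
     int y;
   };

   /* Structured form: the address is described in terms of the
      static type (element i, then member y). */
   int *y_typed(struct point *pts, size_t i) {
     return &pts[i].y;
   }

   /* Flat form: the base is cast to a byte pointer and one
      precomputed byte offset is added. */
   int *y_bytes(struct point *pts, size_t i) {
     char *raw = (char *)pts;
     size_t offset = i * sizeof(struct point) + offsetof(struct point, y);
     return (int *)(raw + offset);
   }

Both functions yield the same address for the same arguments; only the way the
computation is expressed differs.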

Summary
=======

In summary, here are some things to always remember about the GetElementPtr
instruction:

#. The GEP instruction never accesses memory; it only provides pointer
   computations.

#. The first operand to the GEP instruction is always a pointer and it must be
   indexed.

#. There are no superfluous indices for the GEP instruction.

#. Trailing zero indices are superfluous for pointer aliasing, but not for the
   types of the pointers.

#. Leading zero indices are not superfluous for either pointer aliasing or the
   types of the pointers.
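Two of these points have a close C counterpart, sketched below with the
hypothetical names ``record``, ``addr_of_array`` and ``addr_of_first``. Taking
the address of the array member and taking the address of its first element
differ by what would be one trailing zero index in a GEP: the address is the
same, but the pointer type is not. Neither expression reads memory.

.. code-block:: c

   struct record {
     int id;
     int data[4];
   };

   /* Address of the whole array member; the expression has type
      int (*)[4] before conversion to void *. */
   void *addr_of_array(struct record *r) {
     return &r->data;
   }

   /* Address of the first element; the expression has type int *.
      This corresponds to one additional trailing zero index. */
   void *addr_of_first(struct record *r) {
     return &r->data[0];
   }

Calling both functions on a pointer to an uninitialized ``struct record`` is
fine, because only addresses are computed and nothing is loaded.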
