annotate clang/docs/analyzer/developer-docs/InitializerLists.rst @ 173:0572611fdcc8 llvm10 llvm12

reorgnization done
author Shinji KONO <kono@ie.u-ryukyu.ac.jp>
date Mon, 25 May 2020 11:55:54 +0900
parents 1d019706d866
children c4bab56944e8
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
150
anatofuz
parents:
diff changeset
1 ================
anatofuz
parents:
diff changeset
2 Initializer List
anatofuz
parents:
diff changeset
3 ================
anatofuz
parents:
diff changeset
4 This discussion took place in https://reviews.llvm.org/D35216
anatofuz
parents:
diff changeset
5 "Escape symbols when creating std::initializer_list".
anatofuz
parents:
diff changeset
6
anatofuz
parents:
diff changeset
7 It touches problems of modelling C++ standard library constructs in general,
anatofuz
parents:
diff changeset
8 including modelling implementation-defined fields within C++ standard library
anatofuz
parents:
diff changeset
9 objects, in particular constructing objects into pointers held by such fields,
anatofuz
parents:
diff changeset
10 and separation of responsibilities between analyzer's core and checkers.
anatofuz
parents:
diff changeset
11
anatofuz
parents:
diff changeset
12 **Artem:**
anatofuz
parents:
diff changeset
13
anatofuz
parents:
diff changeset
14 I've seen a few false positives that appear because we construct
anatofuz
parents:
diff changeset
15 C++11 std::initializer_list objects with brace initializers, and such
anatofuz
parents:
diff changeset
16 construction is not properly modeled. For instance, if a new object is
anatofuz
parents:
diff changeset
17 constructed on the heap only to be put into a brace-initialized STL container,
anatofuz
parents:
diff changeset
18 the object is reported to be leaked.
anatofuz
parents:
diff changeset
19
anatofuz
parents:
diff changeset
20 Approach (0): This can be trivially fixed by this patch, which causes pointers
anatofuz
parents:
diff changeset
21 passed into initializer list expressions to immediately escape.
anatofuz
parents:
diff changeset
22
anatofuz
parents:
diff changeset
23 This fix is overly conservative though. So i did a bit of investigation as to
anatofuz
parents:
diff changeset
24 how model std::initializer_list better.
anatofuz
parents:
diff changeset
25
anatofuz
parents:
diff changeset
26 According to the standard, ``std::initializer_list<T>`` is an object that has
anatofuz
parents:
diff changeset
27 methods ``begin(), end(), and size()``, where ``begin()`` returns a pointer to continuous
anatofuz
parents:
diff changeset
28 array of ``size()`` objects of type T, and end() is equal to begin() plus size().
anatofuz
parents:
diff changeset
29 The standard does hint that it should be possible to implement
anatofuz
parents:
diff changeset
30 ``std::initializer_list<T>`` as a pair of pointers, or as a pointer and a size
anatofuz
parents:
diff changeset
31 integer, however specific fields that the object would contain are an
anatofuz
parents:
diff changeset
32 implementation detail.
anatofuz
parents:
diff changeset
33
anatofuz
parents:
diff changeset
34 Ideally, we should be able to model the initializer list's methods precisely.
anatofuz
parents:
diff changeset
35 Or, at least, it should be possible to explain to the analyzer that the list
anatofuz
parents:
diff changeset
36 somehow "takes hold" of the values put into it. Initializer lists can also be
anatofuz
parents:
diff changeset
37 copied, which is a separate story that i'm not trying to address here.
anatofuz
parents:
diff changeset
38
anatofuz
parents:
diff changeset
39 The obvious approach to modeling ``std::initializer_list`` in a checker would be to
anatofuz
parents:
diff changeset
40 construct a SymbolMetadata for the memory region of the initializer list object,
anatofuz
parents:
diff changeset
41 which would be of type ``T*`` and represent ``begin()``, so we'd trivially model ``begin()``
anatofuz
parents:
diff changeset
42 as a function that returns this symbol. The array pointed to by that symbol
anatofuz
parents:
diff changeset
43 would be ``bindLoc()``ed to contain the list's contents (probably as a ``CompoundVal``
anatofuz
parents:
diff changeset
44 to produce less bindings in the store). Extent of this array would represent
anatofuz
parents:
diff changeset
45 ``size()`` and would be equal to the length of the list as written.
anatofuz
parents:
diff changeset
46
anatofuz
parents:
diff changeset
47 So this sounds good, however apparently it does nothing to address our false
anatofuz
parents:
diff changeset
48 positives: when the list escapes, our ``RegionStoreManager`` is not magically
anatofuz
parents:
diff changeset
49 guessing that the metadata symbol attached to it, together with its contents,
anatofuz
parents:
diff changeset
50 should also escape. In fact, it's impossible to trigger a pointer escape from
anatofuz
parents:
diff changeset
51 within the checker.
anatofuz
parents:
diff changeset
52
anatofuz
parents:
diff changeset
53 Approach (1): If only we enabled ``ProgramState::bindLoc(..., notifyChanges=true)``
anatofuz
parents:
diff changeset
54 to cause pointer escapes (not only region changes) (which sounds like the right
anatofuz
parents:
diff changeset
55 thing to do anyway) such checker would be able to solve the false positives by
anatofuz
parents:
diff changeset
56 triggering escapes when binding list elements to the list. However, it'd be as
anatofuz
parents:
diff changeset
57 conservative as the current patch's solution. Ideally, we do not want escapes to
anatofuz
parents:
diff changeset
58 happen so early. Instead, we'd prefer them to be delayed until the list itself
anatofuz
parents:
diff changeset
59 escapes.
anatofuz
parents:
diff changeset
60
anatofuz
parents:
diff changeset
61 So i believe that escaping metadata symbols whenever their base regions escape
anatofuz
parents:
diff changeset
62 would be the right thing to do. Currently we didn't think about that because we
anatofuz
parents:
diff changeset
63 had neither pointer-type metadatas nor non-pointer escapes.
anatofuz
parents:
diff changeset
64
anatofuz
parents:
diff changeset
65 Approach (2): We could teach the Store to scan itself for bindings to
anatofuz
parents:
diff changeset
66 metadata-symbolic-based regions during scanReachableSymbols() whenever a region
anatofuz
parents:
diff changeset
67 turns out to be reachable. This requires no work on checker side, but it sounds
anatofuz
parents:
diff changeset
68 performance-heavy.
anatofuz
parents:
diff changeset
69
anatofuz
parents:
diff changeset
70 Approach (3): We could let checkers maintain the set of active metadata symbols
anatofuz
parents:
diff changeset
71 in the program state (ideally somewhere in the Store, which sounds weird but
anatofuz
parents:
diff changeset
72 causes the smallest amount of layering violations), so that the core knew what
anatofuz
parents:
diff changeset
73 to escape. This puts a stress on the checkers, but with a smart data map it
anatofuz
parents:
diff changeset
74 wouldn't be a problem.
anatofuz
parents:
diff changeset
75
anatofuz
parents:
diff changeset
76 Approach (4): We could allow checkers to trigger pointer escapes in arbitrary
anatofuz
parents:
diff changeset
77 moments. If we allow doing this within ``checkPointerEscape`` callback itself, we
anatofuz
parents:
diff changeset
78 would be able to express facts like "when this region escapes, that metadata
anatofuz
parents:
diff changeset
79 symbol attached to it should also escape". This sounds like an ultimate freedom,
anatofuz
parents:
diff changeset
80 with maximum stress on the checkers - still not too much stress when we have
anatofuz
parents:
diff changeset
81 smart data maps.
anatofuz
parents:
diff changeset
82
anatofuz
parents:
diff changeset
83 I'm personally liking the approach (2) - it should be possible to avoid
anatofuz
parents:
diff changeset
84 performance overhead, and clarity seems nice.
anatofuz
parents:
diff changeset
85
anatofuz
parents:
diff changeset
86 **Gabor:**
anatofuz
parents:
diff changeset
87
anatofuz
parents:
diff changeset
88 At this point, I am a bit wondering about two questions.
anatofuz
parents:
diff changeset
89
anatofuz
parents:
diff changeset
90 * When should something belong to a checker and when should something belong to the engine?
anatofuz
parents:
diff changeset
91 Sometimes we model library aspects in the engine and model language constructs in checkers.
anatofuz
parents:
diff changeset
92
anatofuz
parents:
diff changeset
93 * What is the checker programming model that we are aiming for? Maximum freedom or more easy checker development?
anatofuz
parents:
diff changeset
94
anatofuz
parents:
diff changeset
95 I think if we aim for maximum freedom, we do not need to worry about the
anatofuz
parents:
diff changeset
96 potential stress on checkers, and we can introduce abstractions to mitigate that
anatofuz
parents:
diff changeset
97 later on.
anatofuz
parents:
diff changeset
98 If we want to simplify the API, then maybe it makes more sense to move language
anatofuz
parents:
diff changeset
99 construct modeling to the engine when the checker API is not sufficient instead
anatofuz
parents:
diff changeset
100 of complicating the API.
anatofuz
parents:
diff changeset
101
anatofuz
parents:
diff changeset
102 Right now I have no preference or objections between the alternatives but there
anatofuz
parents:
diff changeset
103 are some random thoughts:
anatofuz
parents:
diff changeset
104
anatofuz
parents:
diff changeset
105 * Maybe it would be great to have a guideline how to evolve the analyzer and
anatofuz
parents:
diff changeset
106 follow it, so it can help us to decide in similar situations
anatofuz
parents:
diff changeset
107
anatofuz
parents:
diff changeset
108 * I do care about performance in this case. The reason is that we have a
anatofuz
parents:
diff changeset
109 limited performance budget. And I think we should not expect most of the checker
anatofuz
parents:
diff changeset
110 writers to add modeling of language constructs. So, in my opinion, it is ok to
anatofuz
parents:
diff changeset
111 have less nice/more verbose API for language modeling if we can have better
anatofuz
parents:
diff changeset
112 performance this way, since it only needs to be done once, and is done by the
anatofuz
parents:
diff changeset
113 framework developers.
anatofuz
parents:
diff changeset
114
anatofuz
parents:
diff changeset
115 **Artem:** These are some great questions, i guess it'd be better to discuss
anatofuz
parents:
diff changeset
116 them more openly. As a quick dump of my current mood:
anatofuz
parents:
diff changeset
117
anatofuz
parents:
diff changeset
118 * To me it seems obvious that we need to aim for a checker API that is both
anatofuz
parents:
diff changeset
119 simple and powerful. This can probably by keeping the API as powerful as
anatofuz
parents:
diff changeset
120 necessary while providing a layer of simple ready-made solutions on top of it.
anatofuz
parents:
diff changeset
121 Probably a few reusable components for assembling checkers. And this layer
anatofuz
parents:
diff changeset
122 should ideally be pleasant enough to work with, so that people would prefer to
anatofuz
parents:
diff changeset
123 extend it when something is lacking, instead of falling back to the complex
anatofuz
parents:
diff changeset
124 omnipotent API. I'm thinking of AST matchers vs. AST visitors as a roughly
anatofuz
parents:
diff changeset
125 similar situation: matchers are not omnipotent, but they're so nice.
anatofuz
parents:
diff changeset
126
anatofuz
parents:
diff changeset
127 * Separation between core and checkers is usually quite strange. Once we have
anatofuz
parents:
diff changeset
128 shared state traits, i generally wouldn't mind having region store or range
anatofuz
parents:
diff changeset
129 constraint manager as checkers (though it's probably not worth it to transform
anatofuz
parents:
diff changeset
130 them - just a mood). The main thing to avoid here would be the situation when
anatofuz
parents:
diff changeset
131 the checker overwrites stuff written by the core because it thinks it has a
anatofuz
parents:
diff changeset
132 better idea what's going on, so the core should provide a good default behavior.
anatofuz
parents:
diff changeset
133
anatofuz
parents:
diff changeset
134 * Yeah, i totally care about performance as well, and if i try to implement
anatofuz
parents:
diff changeset
135 approach, i'd make sure it's good.
anatofuz
parents:
diff changeset
136
anatofuz
parents:
diff changeset
137 **Artem:**
anatofuz
parents:
diff changeset
138
anatofuz
parents:
diff changeset
139 > Approach (2): We could teach the Store to scan itself for bindings to
anatofuz
parents:
diff changeset
140 > metadata-symbolic-based regions during scanReachableSymbols() whenever
anatofuz
parents:
diff changeset
141 > a region turns out to be reachable. This requires no work on checker side,
anatofuz
parents:
diff changeset
142 > but it sounds performance-heavy.
anatofuz
parents:
diff changeset
143
anatofuz
parents:
diff changeset
144 Nope, this approach is wrong. Metadata symbols may become out-of-date: when the
anatofuz
parents:
diff changeset
145 object changes, metadata symbols attached to it aren't changing (because symbols
anatofuz
parents:
diff changeset
146 simply don't change). The same metadata may have different symbols to denote its
anatofuz
parents:
diff changeset
147 value in different moments of time, but at most one of them represents the
anatofuz
parents:
diff changeset
148 actual metadata value. So we'd be escaping more stuff than necessary.
anatofuz
parents:
diff changeset
149
anatofuz
parents:
diff changeset
150 If only we had "ghost fields"
anatofuz
parents:
diff changeset
151 (https://lists.llvm.org/pipermail/cfe-dev/2016-May/049000.html), it would have
anatofuz
parents:
diff changeset
152 been much easier, because the ghost field would only contain the actual
anatofuz
parents:
diff changeset
153 metadata, and the Store would always know about it. This example adds to my
anatofuz
parents:
diff changeset
154 belief that ghost fields are exactly what we need for most C++ checkers.
anatofuz
parents:
diff changeset
155
anatofuz
parents:
diff changeset
156 **Devin:**
anatofuz
parents:
diff changeset
157
anatofuz
parents:
diff changeset
158 In this case, I would be fine with some sort of
anatofuz
parents:
diff changeset
159 AbstractStorageMemoryRegion that meant "here is a memory region and somewhere
anatofuz
parents:
diff changeset
160 reachable from here exists another region of type T". Or even multiple regions
anatofuz
parents:
diff changeset
161 with different identifiers. This wouldn't specify how the memory is reachable,
anatofuz
parents:
diff changeset
162 but it would allow for transfer functions to get at those regions and it would
anatofuz
parents:
diff changeset
163 allow for invalidation.
anatofuz
parents:
diff changeset
164
anatofuz
parents:
diff changeset
165 For ``std::initializer_list`` this reachable region would the region for the backing
anatofuz
parents:
diff changeset
166 array and the transfer functions for begin() and end() yield the beginning and
anatofuz
parents:
diff changeset
167 end element regions for it.
anatofuz
parents:
diff changeset
168
anatofuz
parents:
diff changeset
169 In my view this differs from ghost variables in that (1) this storage does
anatofuz
parents:
diff changeset
170 actually exist (it is just a library implementation detail where that storage
anatofuz
parents:
diff changeset
171 lives) and (2) it is perfectly valid for a pointer into that storage to be
anatofuz
parents:
diff changeset
172 returned and for another part of the program to read or write from that storage.
anatofuz
parents:
diff changeset
173 (Well, in this case just read since it is allowed to be read-only memory).
anatofuz
parents:
diff changeset
174
anatofuz
parents:
diff changeset
175 What I'm not OK with is modeling abstract analysis state (for example, the count
anatofuz
parents:
diff changeset
176 of a NSMutableArray or the typestate of a file handle) as a value stored in some
anatofuz
parents:
diff changeset
177 ginned up region in the store. This takes an easy problem that the analyzer does
anatofuz
parents:
diff changeset
178 well at (modeling typestate) and turns it into a hard one that the analyzer is
anatofuz
parents:
diff changeset
179 bad at (reasoning about the contents of the heap).
anatofuz
parents:
diff changeset
180
anatofuz
parents:
diff changeset
181 I think the key criterion here is: "is the region accessible from outside the
anatofuz
parents:
diff changeset
182 library". That is, does the library expose the region as a pointer that can be
anatofuz
parents:
diff changeset
183 read to or written from in the client program? If so, then it makes sense for
anatofuz
parents:
diff changeset
184 this to be in the store: we are modeling reachable storage as storage. But if
anatofuz
parents:
diff changeset
185 we're just modeling arbitrary analysis facts that need to be invalidated when a
anatofuz
parents:
diff changeset
186 pointer escapes then we shouldn't try to gin up storage for them just to get
anatofuz
parents:
diff changeset
187 invalidation for free.
anatofuz
parents:
diff changeset
188
anatofuz
parents:
diff changeset
189 **Artem:**
anatofuz
parents:
diff changeset
190
anatofuz
parents:
diff changeset
191 > In this case, I would be fine with some sort of ``AbstractStorageMemoryRegion``
anatofuz
parents:
diff changeset
192 > that meant "here is a memory region and somewhere reachable from here exists
anatofuz
parents:
diff changeset
193 > another region of type T". Or even multiple regions with different
anatofuz
parents:
diff changeset
194 > identifiers. This wouldn't specify how the memory is reachable, but it would
anatofuz
parents:
diff changeset
195 > allow for transfer functions to get at those regions and it would allow for
anatofuz
parents:
diff changeset
196 > invalidation.
anatofuz
parents:
diff changeset
197
anatofuz
parents:
diff changeset
198 Yeah, this is what we can easily implement now as a
anatofuz
parents:
diff changeset
199 symbolic-region-based-on-a-metadata-symbol (though we can make a new region
anatofuz
parents:
diff changeset
200 class for that if we eg. want it typed). The problem is that the relation
anatofuz
parents:
diff changeset
201 between such storage region and its parent object region is essentially
anatofuz
parents:
diff changeset
202 immaterial, similarly to the relation between ``SymbolRegionValue`` and its parent
anatofuz
parents:
diff changeset
203 region. Region contents are mutable: today the abstract storage is reachable
anatofuz
parents:
diff changeset
204 from its parent object, tomorrow it's not, and maybe something else becomes
anatofuz
parents:
diff changeset
205 reachable, something that isn't even abstract. So the parent region for the
anatofuz
parents:
diff changeset
206 abstract storage is most of the time at best a "nice to know" thing - we cannot
anatofuz
parents:
diff changeset
207 rely on it to do any actual work. We'd anyway need to rely on the checker to do
anatofuz
parents:
diff changeset
208 the job.
anatofuz
parents:
diff changeset
209
anatofuz
parents:
diff changeset
210 > For std::initializer_list this reachable region would the region for the
anatofuz
parents:
diff changeset
211 > backing array and the transfer functions for begin() and end() yield the
anatofuz
parents:
diff changeset
212 > beginning and end element regions for it.
anatofuz
parents:
diff changeset
213
anatofuz
parents:
diff changeset
214 So maybe in fact for std::initializer_list it may work fine because you cannot
anatofuz
parents:
diff changeset
215 change the data after the object is constructed - so this region's contents are
anatofuz
parents:
diff changeset
216 essentially immutable. For the future, i feel as if it is a dead end.
anatofuz
parents:
diff changeset
217
anatofuz
parents:
diff changeset
218 I'd like to consider another funny example. Suppose we're trying to model
anatofuz
parents:
diff changeset
219
anatofuz
parents:
diff changeset
220 .. code-block:: cpp
anatofuz
parents:
diff changeset
221
anatofuz
parents:
diff changeset
222 std::unique_ptr. Consider::
anatofuz
parents:
diff changeset
223
anatofuz
parents:
diff changeset
224 void bar(const std::unique_ptr<int> &x);
anatofuz
parents:
diff changeset
225
anatofuz
parents:
diff changeset
226 void foo(std::unique_ptr<int> &x) {
anatofuz
parents:
diff changeset
227 int *a = x.get(); // (a, 0, direct): &AbstractStorageRegion
anatofuz
parents:
diff changeset
228 *a = 1; // (AbstractStorageRegion, 0, direct): 1 S32b
anatofuz
parents:
diff changeset
229 int *b = new int;
anatofuz
parents:
diff changeset
230 *b = 2; // (SymRegion{conj_$0<int *>}, 0 ,direct): 2 S32b
anatofuz
parents:
diff changeset
231 x.reset(b); // Checker map: x -> SymRegion{conj_$0<int *>}
anatofuz
parents:
diff changeset
232 bar(x); // 'a' doesn't escape (the pointer was unique), 'b' does.
anatofuz
parents:
diff changeset
233 clang_analyzer_eval(*a == 1); // Making this true is up to the checker.
anatofuz
parents:
diff changeset
234 clang_analyzer_eval(*b == 2); // Making this unknown is up to the checker.
anatofuz
parents:
diff changeset
235 }
anatofuz
parents:
diff changeset
236
anatofuz
parents:
diff changeset
237 The checker doesn't totally need to ensure that ``*a == 1`` passes - even though the
anatofuz
parents:
diff changeset
238 pointer was unique, it could theoretically have ``.get()``-ed above and the code
anatofuz
parents:
diff changeset
239 could of course break the uniqueness invariant (though we'd probably want it).
anatofuz
parents:
diff changeset
240 The checker can say that "even if ``*a`` did escape, it was not because it was
anatofuz
parents:
diff changeset
241 stuffed directly into bar()".
anatofuz
parents:
diff changeset
242
anatofuz
parents:
diff changeset
243 The checker's direct responsibility, however, is to solve the ``*b == 2`` thing
anatofuz
parents:
diff changeset
244 (which is in fact the problem we're dealing with in this patch - escaping the
anatofuz
parents:
diff changeset
245 storage region of the object).
anatofuz
parents:
diff changeset
246
anatofuz
parents:
diff changeset
247 So we're talking about one more operation over the program state (scanning
anatofuz
parents:
diff changeset
248 reachable symbols and regions) that cannot work without checker support.
anatofuz
parents:
diff changeset
249
anatofuz
parents:
diff changeset
250 We can probably add a new callback "checkReachableSymbols" to solve this. This
anatofuz
parents:
diff changeset
251 is in fact also related to the dead symbols problem (we're scanning for live
anatofuz
parents:
diff changeset
252 symbols in the store and in the checkers separately, but we need to do so
anatofuz
parents:
diff changeset
253 simultaneously with a single worklist). Hmm, in fact this sounds like a good
anatofuz
parents:
diff changeset
254 idea; we can replace checkLiveSymbols with checkReachableSymbols.
anatofuz
parents:
diff changeset
255
anatofuz
parents:
diff changeset
256 Or we could just have ghost member variables, and no checker support required at
anatofuz
parents:
diff changeset
257 all. For ghost member variables, the relation with their parent region (which
anatofuz
parents:
diff changeset
258 would be their superregion) is actually useful, the mutability of their contents
anatofuz
parents:
diff changeset
259 is expressed naturally, and the store automagically sees reachable symbols, live
anatofuz
parents:
diff changeset
260 symbols, escapes, invalidations, whatever.
anatofuz
parents:
diff changeset
261
anatofuz
parents:
diff changeset
262 > In my view this differs from ghost variables in that (1) this storage does
anatofuz
parents:
diff changeset
263 > actually exist (it is just a library implementation detail where that storage
anatofuz
parents:
diff changeset
264 > lives) and (2) it is perfectly valid for a pointer into that storage to be
anatofuz
parents:
diff changeset
265 > returned and for another part of the program to read or write from that
anatofuz
parents:
diff changeset
266 > storage. (Well, in this case just read since it is allowed to be read-only
anatofuz
parents:
diff changeset
267 > memory).
anatofuz
parents:
diff changeset
268
anatofuz
parents:
diff changeset
269 > What I'm not OK with is modeling abstract analysis state (for example, the
anatofuz
parents:
diff changeset
270 > count of a NSMutableArray or the typestate of a file handle) as a value stored
anatofuz
parents:
diff changeset
271 > in some ginned up region in the store.This takes an easy problem that the
anatofuz
parents:
diff changeset
272 > analyzer does well at (modeling typestate) and turns it into a hard one that
anatofuz
parents:
diff changeset
273 > the analyzer is bad at (reasoning about the contents of the heap).
anatofuz
parents:
diff changeset
274
anatofuz
parents:
diff changeset
275 Yeah, i tend to agree on that. For simple typestates, this is probably an
anatofuz
parents:
diff changeset
276 overkill, so let's definitely put aside the idea of "ghost symbolic regions"
anatofuz
parents:
diff changeset
277 that i had earlier.
anatofuz
parents:
diff changeset
278
anatofuz
parents:
diff changeset
279 But, to summarize a bit, in our current case, however, the typestate we're
anatofuz
parents:
diff changeset
280 looking for is the contents of the heap. And when we try to model such
anatofuz
parents:
diff changeset
281 typestates (complex in this specific manner, i.e. heap-like) in any checker, we
anatofuz
parents:
diff changeset
282 have a choice between re-doing this modeling in every such checker (which is
anatofuz
parents:
diff changeset
283 something analyzer is indeed good at, but at a price of making checkers heavy)
anatofuz
parents:
diff changeset
284 or instead relying on the Store to do exactly what it's designed to do.
anatofuz
parents:
diff changeset
285
anatofuz
parents:
diff changeset
286 > I think the key criterion here is: "is the region accessible from outside
anatofuz
parents:
diff changeset
287 > the library". That is, does the library expose the region as a pointer that
anatofuz
parents:
diff changeset
288 > can be read to or written from in the client program? If so, then it makes
anatofuz
parents:
diff changeset
289 > sense for this to be in the store: we are modeling reachable storage as
anatofuz
parents:
diff changeset
290 > storage. But if we're just modeling arbitrary analysis facts that need to be
anatofuz
parents:
diff changeset
291 > invalidated when a pointer escapes then we shouldn't try to gin up storage
anatofuz
parents:
diff changeset
292 > for them just to get invalidation for free.
anatofuz
parents:
diff changeset
293
anatofuz
parents:
diff changeset
294 As a metaphor, i'd probably compare it to body farms - the difference between
anatofuz
parents:
diff changeset
295 ghost member variables and metadata symbols seems to me like the difference
anatofuz
parents:
diff changeset
296 between body farms and evalCall. Both are nice to have, and body farms are very
anatofuz
parents:
diff changeset
297 pleasant to work with, even if not omnipotent. I think it's fine for a
anatofuz
parents:
diff changeset
298 FunctionDecl's body in a body farm to have a local variable, even if such
anatofuz
parents:
diff changeset
299 variable doesn't actually exist, even if it cannot be seen from outside the
anatofuz
parents:
diff changeset
300 function call. I'm not seeing immediate practical difference between "it does
anatofuz
parents:
diff changeset
301 actually exist" and "it doesn't actually exist, just a handy abstraction".
anatofuz
parents:
diff changeset
302 Similarly, i think it's fine if we have a ``CXXRecordDecl`` with
anatofuz
parents:
diff changeset
303 implementation-defined contents, and try to farm up a member variable as a handy
anatofuz
parents:
diff changeset
304 abstraction (we don't even need to know its name or offset, only that it's there
anatofuz
parents:
diff changeset
305 somewhere).
anatofuz
parents:
diff changeset
306
anatofuz
parents:
diff changeset
307 **Artem:**
anatofuz
parents:
diff changeset
308
anatofuz
parents:
diff changeset
309 We've discussed it in person with Devin, and he provided more points to think
anatofuz
parents:
diff changeset
310 about:
anatofuz
parents:
diff changeset
311
anatofuz
parents:
diff changeset
312 * If the initializer list consists of non-POD data, constructors of list's
anatofuz
parents:
diff changeset
313 objects need to take the sub-region of the list's region as this-region In the
anatofuz
parents:
diff changeset
314 current (v2) version of this patch, these objects are constructed elsewhere and
anatofuz
parents:
diff changeset
315 then trivial-copied into the list's metadata pointer region, which may be
anatofuz
parents:
diff changeset
316 incorrect. This is our overall problem with C++ constructors, which manifests in
anatofuz
parents:
diff changeset
317 this case as well. Additionally, objects would need to be constructed in the
anatofuz
parents:
diff changeset
318 analyzer's core, which would not be able to predict that it needs to take a
anatofuz
parents:
diff changeset
319 checker-specific region as this-region, which makes it harder, though it might
anatofuz
parents:
diff changeset
320 be mitigated by sharing the checker state traits.
anatofuz
parents:
diff changeset
321
anatofuz
parents:
diff changeset
322 * Because "ghost variables" are not material to the user, we need to somehow
anatofuz
parents:
diff changeset
323 make super sure that they don't make it into the diagnostic messages.
anatofuz
parents:
diff changeset
324
anatofuz
parents:
diff changeset
325 So, because this needs further digging into overall C++ support and rises too
anatofuz
parents:
diff changeset
326 many questions, i'm delaying a better approach to this problem and will fall
anatofuz
parents:
diff changeset
327 back to the original trivial patch.