Mercurial > hg > CbC > CbC_llvm
comparison clang/docs/LibASTMatchersTutorial.rst @ 150:1d019706d866
LLVM10
author | anatofuz |
---|---|
date | Thu, 13 Feb 2020 15:10:13 +0900 |
parents | |
children | 2e18cbf3894f |
comparison
equal
deleted
inserted
replaced
147:c2174574ed3a | 150:1d019706d866 |
---|---|
1 =============================================================== | |
2 Tutorial for building tools using LibTooling and LibASTMatchers | |
3 =============================================================== | |
4 | |
5 This document is intended to show how to build a useful source-to-source | |
6 translation tool based on Clang's `LibTooling <LibTooling.html>`_. It is | |
7 explicitly aimed at people who are new to Clang, so all you should need | |
8 is a working knowledge of C++ and the command line. | |
9 | |
10 In order to work on the compiler, you need some basic knowledge of the | |
11 abstract syntax tree (AST). To this end, the reader is encouraged to | |
12 skim the :doc:`Introduction to the Clang | |
13 AST <IntroductionToTheClangAST>` | |
14 | |
15 Step 0: Obtaining Clang | |
16 ======================= | |
17 | |
18 As Clang is part of the LLVM project, you'll need to download LLVM's | |
19 source code first. Both Clang and LLVM are in the same git repository, | |
20 under different directories. For further information, see the `getting | |
21 started guide <https://llvm.org/docs/GettingStarted.html>`_. | |
22 | |
23 .. code-block:: console | |
24 | |
25 cd ~/clang-llvm | |
26 git clone https://github.com/llvm/llvm-project.git | |
27 | |
28 Next you need to obtain the CMake build system and Ninja build tool. | |
29 | |
30 .. code-block:: console | |
31 | |
32 cd ~/clang-llvm | |
33 git clone https://github.com/martine/ninja.git | |
34 cd ninja | |
35 git checkout release | |
36 ./bootstrap.py | |
37 sudo cp ninja /usr/bin/ | |
38 | |
39 cd ~/clang-llvm | |
40 git clone git://cmake.org/stage/cmake.git | |
41 cd cmake | |
42 git checkout next | |
43 ./bootstrap | |
44 make | |
45 sudo make install | |
46 | |
47 Okay. Now we'll build Clang! | |
48 | |
49 .. code-block:: console | |
50 | |
51 cd ~/clang-llvm | |
52 mkdir build && cd build | |
53 cmake -G Ninja ../llvm -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra" -DLLVM_BUILD_TESTS=ON # Enable tests; default is off. | |
54 ninja | |
55 ninja check # Test LLVM only. | |
56 ninja clang-test # Test Clang only. | |
57 ninja install | |
58 | |
59 And we're live. | |
60 | |
61 All of the tests should pass. | |
62 | |
63 Finally, we want to set Clang as its own compiler. | |
64 | |
65 .. code-block:: console | |
66 | |
67 cd ~/clang-llvm/build | |
68 ccmake ../llvm | |
69 | |
70 The second command will bring up a GUI for configuring Clang. You need | |
71 to set the entry for ``CMAKE_CXX_COMPILER``. Press ``'t'`` to turn on | |
72 advanced mode. Scroll down to ``CMAKE_CXX_COMPILER``, and set it to | |
73 ``/usr/bin/clang++``, or wherever you installed it. Press ``'c'`` to | |
74 configure, then ``'g'`` to generate CMake's files. | |
75 | |
76 Finally, run ninja one last time, and you're done. | |
77 | |
78 Step 1: Create a ClangTool | |
79 ========================== | |
80 | |
81 Now that we have enough background knowledge, it's time to create the | |
82 simplest productive ClangTool in existence: a syntax checker. While this | |
83 already exists as ``clang-check``, it's important to understand what's | |
84 going on. | |
85 | |
86 First, we'll need to create a new directory for our tool and tell CMake | |
87 that it exists. As this is not going to be a core clang tool, it will | |
88 live in the ``clang-tools-extra`` repository. | |
89 | |
90 .. code-block:: console | |
91 | |
92 cd ~/clang-llvm | |
93 mkdir clang-tools-extra/loop-convert | |
94 echo 'add_subdirectory(loop-convert)' >> clang-tools-extra/CMakeLists.txt | |
95 vim clang-tools-extra/loop-convert/CMakeLists.txt | |
96 | |
97 CMakeLists.txt should have the following contents: | |
98 | |
99 :: | |
100 | |
101 set(LLVM_LINK_COMPONENTS support) | |
102 | |
103 add_clang_executable(loop-convert | |
104 LoopConvert.cpp | |
105 ) | |
106 target_link_libraries(loop-convert | |
107 PRIVATE | |
108 clangTooling | |
109 clangBasic | |
110 clangASTMatchers | |
111 ) | |
112 | |
113 With that done, Ninja will be able to compile our tool. Let's give it | |
114 something to compile! Put the following into | |
115 ``clang-tools-extra/loop-convert/LoopConvert.cpp``. A detailed explanation of | |
116 why the different parts are needed can be found in the `LibTooling | |
117 documentation <LibTooling.html>`_. | |
118 | |
119 .. code-block:: c++ | |
120 | |
121 // Declares clang::SyntaxOnlyAction. | |
122 #include "clang/Frontend/FrontendActions.h" | |
123 #include "clang/Tooling/CommonOptionsParser.h" | |
124 #include "clang/Tooling/Tooling.h" | |
125 // Declares llvm::cl::extrahelp. | |
126 #include "llvm/Support/CommandLine.h" | |
127 | |
128 using namespace clang::tooling; | |
129 using namespace llvm; | |
130 | |
131 // Apply a custom category to all command-line options so that they are the | |
132 // only ones displayed. | |
133 static llvm::cl::OptionCategory MyToolCategory("my-tool options"); | |
134 | |
135 // CommonOptionsParser declares HelpMessage with a description of the common | |
136 // command-line options related to the compilation database and input files. | |
137 // It's nice to have this help message in all tools. | |
138 static cl::extrahelp CommonHelp(CommonOptionsParser::HelpMessage); | |
139 | |
140 // A help message for this specific tool can be added afterwards. | |
141 static cl::extrahelp MoreHelp("\nMore help text...\n"); | |
142 | |
143 int main(int argc, const char **argv) { | |
144 CommonOptionsParser OptionsParser(argc, argv, MyToolCategory); | |
145 ClangTool Tool(OptionsParser.getCompilations(), | |
146 OptionsParser.getSourcePathList()); | |
147 return Tool.run(newFrontendActionFactory<clang::SyntaxOnlyAction>().get()); | |
148 } | |
149 | |
150 And that's it! You can compile our new tool by running ninja from the | |
151 ``build`` directory. | |
152 | |
153 .. code-block:: console | |
154 | |
155 cd ~/clang-llvm/build | |
156 ninja | |
157 | |
158 You should now be able to run the syntax checker, which is located in | |
159 ``~/clang-llvm/build/bin``, on any source file. Try it! | |
160 | |
161 .. code-block:: console | |
162 | |
163 echo "int main() { return 0; }" > test.cpp | |
164 bin/loop-convert test.cpp -- | |
165 | |
166 Note the two dashes after we specify the source file. The additional | |
167 options for the compiler are passed after the dashes rather than loading | |
168 them from a compilation database - there just aren't any options needed | |
169 right now. | |
170 | |
171 Intermezzo: Learn AST matcher basics | |
172 ==================================== | |
173 | |
174 Clang recently introduced the :doc:`ASTMatcher | |
175 library <LibASTMatchers>` to provide a simple, powerful, and | |
176 concise way to describe specific patterns in the AST. Implemented as a | |
177 DSL powered by macros and templates (see | |
178 `ASTMatchers.h <../doxygen/ASTMatchers_8h_source.html>`_ if you're | |
179 curious), matchers offer the feel of algebraic data types common to | |
180 functional programming languages. | |
181 | |
182 For example, suppose you wanted to examine only binary operators. There | |
183 is a matcher to do exactly that, conveniently named ``binaryOperator``. | |
184 I'll give you one guess what this matcher does: | |
185 | |
186 .. code-block:: c++ | |
187 | |
188 binaryOperator(hasOperatorName("+"), hasLHS(integerLiteral(equals(0)))) | |
189 | |
190 Shockingly, it will match against addition expressions whose left hand | |
191 side is exactly the literal 0. It will not match against other forms of | |
192 0, such as ``'\0'`` or ``NULL``, but it will match against macros that | |
193 expand to 0. The matcher will also not match against calls to the | |
194 overloaded operator ``'+'``, as there is a separate ``operatorCallExpr`` | |
195 matcher to handle overloaded operators. | |
196 | |
197 There are AST matchers to match all the different nodes of the AST, | |
198 narrowing matchers to only match AST nodes fulfilling specific criteria, | |
199 and traversal matchers to get from one kind of AST node to another. For | |
200 a complete list of AST matchers, take a look at the `AST Matcher | |
201 References <LibASTMatchersReference.html>`_ | |
202 | |
203 All matcher that are nouns describe entities in the AST and can be | |
204 bound, so that they can be referred to whenever a match is found. To do | |
205 so, simply call the method ``bind`` on these matchers, e.g.: | |
206 | |
207 .. code-block:: c++ | |
208 | |
209 variable(hasType(isInteger())).bind("intvar") | |
210 | |
211 Step 2: Using AST matchers | |
212 ========================== | |
213 | |
214 Okay, on to using matchers for real. Let's start by defining a matcher | |
215 which will capture all ``for`` statements that define a new variable | |
216 initialized to zero. Let's start with matching all ``for`` loops: | |
217 | |
218 .. code-block:: c++ | |
219 | |
220 forStmt() | |
221 | |
222 Next, we want to specify that a single variable is declared in the first | |
223 portion of the loop, so we can extend the matcher to | |
224 | |
225 .. code-block:: c++ | |
226 | |
227 forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl())))) | |
228 | |
229 Finally, we can add the condition that the variable is initialized to | |
230 zero. | |
231 | |
232 .. code-block:: c++ | |
233 | |
234 forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl( | |
235 hasInitializer(integerLiteral(equals(0)))))))) | |
236 | |
237 It is fairly easy to read and understand the matcher definition ("match | |
238 loops whose init portion declares a single variable which is initialized | |
239 to the integer literal 0"), but deciding that every piece is necessary | |
240 is more difficult. Note that this matcher will not match loops whose | |
241 variables are initialized to ``'\0'``, ``0.0``, ``NULL``, or any form of | |
242 zero besides the integer 0. | |
243 | |
244 The last step is giving the matcher a name and binding the ``ForStmt`` | |
245 as we will want to do something with it: | |
246 | |
247 .. code-block:: c++ | |
248 | |
249 StatementMatcher LoopMatcher = | |
250 forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl( | |
251 hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop"); | |
252 | |
253 Once you have defined your matchers, you will need to add a little more | |
254 scaffolding in order to run them. Matchers are paired with a | |
255 ``MatchCallback`` and registered with a ``MatchFinder`` object, then run | |
256 from a ``ClangTool``. More code! | |
257 | |
258 Add the following to ``LoopConvert.cpp``: | |
259 | |
260 .. code-block:: c++ | |
261 | |
262 #include "clang/ASTMatchers/ASTMatchers.h" | |
263 #include "clang/ASTMatchers/ASTMatchFinder.h" | |
264 | |
265 using namespace clang; | |
266 using namespace clang::ast_matchers; | |
267 | |
268 StatementMatcher LoopMatcher = | |
269 forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl( | |
270 hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop"); | |
271 | |
272 class LoopPrinter : public MatchFinder::MatchCallback { | |
273 public : | |
274 virtual void run(const MatchFinder::MatchResult &Result) { | |
275 if (const ForStmt *FS = Result.Nodes.getNodeAs<clang::ForStmt>("forLoop")) | |
276 FS->dump(); | |
277 } | |
278 }; | |
279 | |
280 And change ``main()`` to: | |
281 | |
282 .. code-block:: c++ | |
283 | |
284 int main(int argc, const char **argv) { | |
285 CommonOptionsParser OptionsParser(argc, argv, MyToolCategory); | |
286 ClangTool Tool(OptionsParser.getCompilations(), | |
287 OptionsParser.getSourcePathList()); | |
288 | |
289 LoopPrinter Printer; | |
290 MatchFinder Finder; | |
291 Finder.addMatcher(LoopMatcher, &Printer); | |
292 | |
293 return Tool.run(newFrontendActionFactory(&Finder).get()); | |
294 } | |
295 | |
296 Now, you should be able to recompile and run the code to discover for | |
297 loops. Create a new file with a few examples, and test out our new | |
298 handiwork: | |
299 | |
300 .. code-block:: console | |
301 | |
302 cd ~/clang-llvm/llvm/llvm_build/ | |
303 ninja loop-convert | |
304 vim ~/test-files/simple-loops.cc | |
305 bin/loop-convert ~/test-files/simple-loops.cc | |
306 | |
307 Step 3.5: More Complicated Matchers | |
308 =================================== | |
309 | |
310 Our simple matcher is capable of discovering for loops, but we would | |
311 still need to filter out many more ourselves. We can do a good portion | |
312 of the remaining work with some cleverly chosen matchers, but first we | |
313 need to decide exactly which properties we want to allow. | |
314 | |
315 How can we characterize for loops over arrays which would be eligible | |
316 for translation to range-based syntax? Range based loops over arrays of | |
317 size ``N`` that: | |
318 | |
319 - start at index ``0`` | |
320 - iterate consecutively | |
321 - end at index ``N-1`` | |
322 | |
323 We already check for (1), so all we need to add is a check to the loop's | |
324 condition to ensure that the loop's index variable is compared against | |
325 ``N`` and another check to ensure that the increment step just | |
326 increments this same variable. The matcher for (2) is straightforward: | |
327 require a pre- or post-increment of the same variable declared in the | |
328 init portion. | |
329 | |
330 Unfortunately, such a matcher is impossible to write. Matchers contain | |
331 no logic for comparing two arbitrary AST nodes and determining whether | |
332 or not they are equal, so the best we can do is matching more than we | |
333 would like to allow, and punting extra comparisons to the callback. | |
334 | |
335 In any case, we can start building this sub-matcher. We can require that | |
336 the increment step be a unary increment like this: | |
337 | |
338 .. code-block:: c++ | |
339 | |
340 hasIncrement(unaryOperator(hasOperatorName("++"))) | |
341 | |
342 Specifying what is incremented introduces another quirk of Clang's AST: | |
343 Usages of variables are represented as ``DeclRefExpr``'s ("declaration | |
344 reference expressions") because they are expressions which refer to | |
345 variable declarations. To find a ``unaryOperator`` that refers to a | |
346 specific declaration, we can simply add a second condition to it: | |
347 | |
348 .. code-block:: c++ | |
349 | |
350 hasIncrement(unaryOperator( | |
351 hasOperatorName("++"), | |
352 hasUnaryOperand(declRefExpr()))) | |
353 | |
354 Furthermore, we can restrict our matcher to only match if the | |
355 incremented variable is an integer: | |
356 | |
357 .. code-block:: c++ | |
358 | |
359 hasIncrement(unaryOperator( | |
360 hasOperatorName("++"), | |
361 hasUnaryOperand(declRefExpr(to(varDecl(hasType(isInteger()))))))) | |
362 | |
363 And the last step will be to attach an identifier to this variable, so | |
364 that we can retrieve it in the callback: | |
365 | |
366 .. code-block:: c++ | |
367 | |
368 hasIncrement(unaryOperator( | |
369 hasOperatorName("++"), | |
370 hasUnaryOperand(declRefExpr(to( | |
371 varDecl(hasType(isInteger())).bind("incrementVariable")))))) | |
372 | |
373 We can add this code to the definition of ``LoopMatcher`` and make sure | |
374 that our program, outfitted with the new matcher, only prints out loops | |
375 that declare a single variable initialized to zero and have an increment | |
376 step consisting of a unary increment of some variable. | |
377 | |
378 Now, we just need to add a matcher to check if the condition part of the | |
379 ``for`` loop compares a variable against the size of the array. There is | |
380 only one problem - we don't know which array we're iterating over | |
381 without looking at the body of the loop! We are again restricted to | |
382 approximating the result we want with matchers, filling in the details | |
383 in the callback. So we start with: | |
384 | |
385 .. code-block:: c++ | |
386 | |
387 hasCondition(binaryOperator(hasOperatorName("<")) | |
388 | |
389 It makes sense to ensure that the left-hand side is a reference to a | |
390 variable, and that the right-hand side has integer type. | |
391 | |
392 .. code-block:: c++ | |
393 | |
394 hasCondition(binaryOperator( | |
395 hasOperatorName("<"), | |
396 hasLHS(declRefExpr(to(varDecl(hasType(isInteger()))))), | |
397 hasRHS(expr(hasType(isInteger()))))) | |
398 | |
399 Why? Because it doesn't work. Of the three loops provided in | |
400 ``test-files/simple.cpp``, zero of them have a matching condition. A | |
401 quick look at the AST dump of the first for loop, produced by the | |
402 previous iteration of loop-convert, shows us the answer: | |
403 | |
404 :: | |
405 | |
406 (ForStmt 0x173b240 | |
407 (DeclStmt 0x173afc8 | |
408 0x173af50 "int i = | |
409 (IntegerLiteral 0x173afa8 'int' 0)") | |
410 <<>> | |
411 (BinaryOperator 0x173b060 '_Bool' '<' | |
412 (ImplicitCastExpr 0x173b030 'int' | |
413 (DeclRefExpr 0x173afe0 'int' lvalue Var 0x173af50 'i' 'int')) | |
414 (ImplicitCastExpr 0x173b048 'int' | |
415 (DeclRefExpr 0x173b008 'const int' lvalue Var 0x170fa80 'N' 'const int'))) | |
416 (UnaryOperator 0x173b0b0 'int' lvalue prefix '++' | |
417 (DeclRefExpr 0x173b088 'int' lvalue Var 0x173af50 'i' 'int')) | |
418 (CompoundStatement ... | |
419 | |
420 We already know that the declaration and increments both match, or this | |
421 loop wouldn't have been dumped. The culprit lies in the implicit cast | |
422 applied to the first operand (i.e. the LHS) of the less-than operator, | |
423 an L-value to R-value conversion applied to the expression referencing | |
424 ``i``. Thankfully, the matcher library offers a solution to this problem | |
425 in the form of ``ignoringParenImpCasts``, which instructs the matcher to | |
426 ignore implicit casts and parentheses before continuing to match. | |
427 Adjusting the condition operator will restore the desired match. | |
428 | |
429 .. code-block:: c++ | |
430 | |
431 hasCondition(binaryOperator( | |
432 hasOperatorName("<"), | |
433 hasLHS(ignoringParenImpCasts(declRefExpr( | |
434 to(varDecl(hasType(isInteger())))))), | |
435 hasRHS(expr(hasType(isInteger()))))) | |
436 | |
437 After adding binds to the expressions we wished to capture and | |
438 extracting the identifier strings into variables, we have array-step-2 | |
439 completed. | |
440 | |
441 Step 4: Retrieving Matched Nodes | |
442 ================================ | |
443 | |
444 So far, the matcher callback isn't very interesting: it just dumps the | |
445 loop's AST. At some point, we will need to make changes to the input | |
446 source code. Next, we'll work on using the nodes we bound in the | |
447 previous step. | |
448 | |
449 The ``MatchFinder::run()`` callback takes a | |
450 ``MatchFinder::MatchResult&`` as its parameter. We're most interested in | |
451 its ``Context`` and ``Nodes`` members. Clang uses the ``ASTContext`` | |
452 class to represent contextual information about the AST, as the name | |
453 implies, though the most functionally important detail is that several | |
454 operations require an ``ASTContext*`` parameter. More immediately useful | |
455 is the set of matched nodes, and how we retrieve them. | |
456 | |
457 Since we bind three variables (identified by ConditionVarName, | |
458 InitVarName, and IncrementVarName), we can obtain the matched nodes by | |
459 using the ``getNodeAs()`` member function. | |
460 | |
461 In ``LoopConvert.cpp`` add | |
462 | |
463 .. code-block:: c++ | |
464 | |
465 #include "clang/AST/ASTContext.h" | |
466 | |
467 Change ``LoopMatcher`` to | |
468 | |
469 .. code-block:: c++ | |
470 | |
471 StatementMatcher LoopMatcher = | |
472 forStmt(hasLoopInit(declStmt( | |
473 hasSingleDecl(varDecl(hasInitializer(integerLiteral(equals(0)))) | |
474 .bind("initVarName")))), | |
475 hasIncrement(unaryOperator( | |
476 hasOperatorName("++"), | |
477 hasUnaryOperand(declRefExpr( | |
478 to(varDecl(hasType(isInteger())).bind("incVarName")))))), | |
479 hasCondition(binaryOperator( | |
480 hasOperatorName("<"), | |
481 hasLHS(ignoringParenImpCasts(declRefExpr( | |
482 to(varDecl(hasType(isInteger())).bind("condVarName"))))), | |
483 hasRHS(expr(hasType(isInteger())))))).bind("forLoop"); | |
484 | |
485 And change ``LoopPrinter::run`` to | |
486 | |
487 .. code-block:: c++ | |
488 | |
489 void LoopPrinter::run(const MatchFinder::MatchResult &Result) { | |
490 ASTContext *Context = Result.Context; | |
491 const ForStmt *FS = Result.Nodes.getNodeAs<ForStmt>("forLoop"); | |
492 // We do not want to convert header files! | |
493 if (!FS || !Context->getSourceManager().isWrittenInMainFile(FS->getForLoc())) | |
494 return; | |
495 const VarDecl *IncVar = Result.Nodes.getNodeAs<VarDecl>("incVarName"); | |
496 const VarDecl *CondVar = Result.Nodes.getNodeAs<VarDecl>("condVarName"); | |
497 const VarDecl *InitVar = Result.Nodes.getNodeAs<VarDecl>("initVarName"); | |
498 | |
499 if (!areSameVariable(IncVar, CondVar) || !areSameVariable(IncVar, InitVar)) | |
500 return; | |
501 llvm::outs() << "Potential array-based loop discovered.\n"; | |
502 } | |
503 | |
504 Clang associates a ``VarDecl`` with each variable to represent the variable's | |
505 declaration. Since the "canonical" form of each declaration is unique by | |
506 address, all we need to do is make sure neither ``ValueDecl`` (base class of | |
507 ``VarDecl``) is ``NULL`` and compare the canonical Decls. | |
508 | |
509 .. code-block:: c++ | |
510 | |
511 static bool areSameVariable(const ValueDecl *First, const ValueDecl *Second) { | |
512 return First && Second && | |
513 First->getCanonicalDecl() == Second->getCanonicalDecl(); | |
514 } | |
515 | |
516 If execution reaches the end of ``LoopPrinter::run()``, we know that the | |
517 loop shell that looks like | |
518 | |
519 .. code-block:: c++ | |
520 | |
521 for (int i= 0; i < expr(); ++i) { ... } | |
522 | |
523 For now, we will just print a message explaining that we found a loop. | |
524 The next section will deal with recursively traversing the AST to | |
525 discover all changes needed. | |
526 | |
527 As a side note, it's not as trivial to test if two expressions are the same, | |
528 though Clang has already done the hard work for us by providing a way to | |
529 canonicalize expressions: | |
530 | |
531 .. code-block:: c++ | |
532 | |
533 static bool areSameExpr(ASTContext *Context, const Expr *First, | |
534 const Expr *Second) { | |
535 if (!First || !Second) | |
536 return false; | |
537 llvm::FoldingSetNodeID FirstID, SecondID; | |
538 First->Profile(FirstID, *Context, true); | |
539 Second->Profile(SecondID, *Context, true); | |
540 return FirstID == SecondID; | |
541 } | |
542 | |
543 This code relies on the comparison between two | |
544 ``llvm::FoldingSetNodeIDs``. As the documentation for | |
545 ``Stmt::Profile()`` indicates, the ``Profile()`` member function builds | |
546 a description of a node in the AST, based on its properties, along with | |
547 those of its children. ``FoldingSetNodeID`` then serves as a hash we can | |
548 use to compare expressions. We will need ``areSameExpr`` later. Before | |
549 you run the new code on the additional loops added to | |
550 test-files/simple.cpp, try to figure out which ones will be considered | |
551 potentially convertible. |