# HG changeset patch
# User Kaito Tokumori
# Date 1435921617 -32400
# Node ID 41fe2e188445ee9fa6a1bda6b929092f821b9547
# Parent a6907650e3e1a704ac55defebec505f59d472ef3
fix

diff -r a6907650e3e1 -r 41fe2e188445 presentation/presen.html
--- a/presentation/presen.html Fri Jul 03 13:38:40 2015 +0900
+++ b/presentation/presen.html Fri Jul 03 20:06:57 2015 +0900
@@ -165,20 +165,38 @@
+
-
+              
+
+
    +
  • Part of a list program. +
  • Code segments are like C functions. +
  • CbC transitions use goto, so code segments do not return to the previous one. +
  • There are no return values. +
+
+ + +
+

CbC sample

+ +
+
 __code code(struct Context* context, struct Allocate* allocate, struct Element* element) {
     allocate->after_append = Code2;
     element ->value        = 10;
     goto meta(context, Append);
 }
 
-__code append(struct Context* context, struct Allocate* allocate, struct List* list, struct Element* element) {
+__code append(struct Context* context, struct Allocate* allocate,
+              struct List* list, struct Element* element) {
     if(list->head) {
         list->tail->next = element;
     } else {
@@ -217,7 +235,7 @@
       
 
       
-

What is LLVM?

+

What are LLVM and Clang?

  • Compiler frameworks.
  • Has an intermediate language called LLVM IR, LLVM language, or LLVM bitcode.
@@ -229,20 +247,12 @@
-

What is Clang?

-
    -
  • C, C++ and Obj-C compiler frontend. -
  • Uses LLVM for compiler backend. -
-
- -
-

Why LLVM?

+

Why?

  • Apple supported.
  • OS X default compiler. -
  • LLVM IR's documentation is useful and readable. -
  • LLVM and Clang have readable documentation of their source code. +
  • LLVM IR has readable documentation. +
  • More readable and modifiable than GCC.
@@ -258,49 +268,45 @@

LLVM and Clang's intermediate representations

- - - - - - - -
- Name - - Description -
- clang AST - - Abstract Syntax Tree. It is a representation of the structure of source code. -
- LLVM IR - - The main intermediate representation of LLVM. It has three different forms: as an in-memory compiler IR, as an on-disk bitcode representation, and as a human-readable assembly language representation. -
- SelectionDAG - - Directed Acyclic Graph. Its nodes indicate what operation the node performs and the operands to the operation. -
- Machine Code - - This representation is designed to support both an SSA representation for machine code, as well as register allocated, non-SSA form. -
- MC Layer - - It is used to represent and process code at the raw machine code level. Users can handle several kinds of files (.s, .o, .ll, a.out) through the same API. -
-
-

Intermediate representations are not modified.

+
    +
  • clang AST +
  • LLVM IR +
  • SelectionDAG +
  • Machine Code +
  • MC Layer +
+

Intermediate representations are not modified.

+
+ +
+

Clang AST

+
    +
  • Abstract Syntax Tree. +
  • Representation of the source code's structure. +
  • Basic node type: Stmt, Decl, Expr. +
-

Implementation problems

+

LLVM IR

    -
  • How to implement code segments and data segments? -
  • How to implement jmp instruction based transition? -
  • How to implement goto with environment syntax? +
  • The main intermediate representation. +
  • LLVM translates it into assembly code. +
  • Three forms: in-memory compiler IR, on-disk bitcode, assembly language.
+ + + + +
+
+define fastcc void @factorial(i32 %x) #0 {
+  entry:
+  tail call fastcc void @factorial0(i32 1, i32 %x)
+  ret void
+}
+              
+
@@ -316,49 +322,77 @@

Implementing the CbC compiler in LLVM and Clang

    -
  • add __code type for code segment. -
  • add goto syntax for transition. -
  • force tail call elimination. -
  • goto with environment. -
  • automatic prototype declaration generation. +
  • __code type. +
  • Goto syntax. +
  • Forced tail call elimination. +
  • Goto with environment. +
  • Automatic prototype declaration generation.
-

__code type

-

modify parser.

+

Parser

+
    +
  • __code type +
  • Prototype declaration generating +
  • Goto syntax for transitions +

__code type

- -
-
    -
  • Clang and LLVM handle code segments as __code type functions. -
  • Code segments do not have return value so they are handled like void functions. -
  • The following code is where Clang parses a __code type. -
  • DS.SetTypeSpecType() sets the AST node's type to __code. -
-
-
-  case tok::kw___code: {
-    LangOptions* LOP;
-    LOP = const_cast<LangOptions*>(&getLangOpts());
-    LOP->HasCodeSegment = 1;
-    isInvalid = DS.SetTypeSpecType(DeclSpec::TST___code, Loc, PrevSpec, DiagID);
-    break;
-  }
+
    +
  • Code segments as __code type functions. +
  • Handled like void functions. +
-

goto syntax for transition

-

modify parser.

-
+

Prototype declaration generating

+
    +
  • In CbC, programmers write a lot of code segments. +
  • When a function pointer's arguments were omitted, TCE sometimes failed. +
  • Automatic prototype declaration generation saves a lot of effort. +
  • When the parser meets a code segment call, it stops the current parsing and searches for the called code segment's declaration. +
  • If the declaration is not found, it searches for the definition and generates a declaration. +
      +
    • Of course, you can also write the declaration yourself. +
    +
+ + + + + +
original input code + Clang generates it +
+__code code1(int a, int b) {
+     :
+  goto code2(a,b);
+}
+
+__code code2(int a, int b){
+     :
+}
+              
+
+__code code2(int a, int b);
+__code code1(int a, int b) {
+     :
+  goto code2(a,b);
+}
+
+__code code2(int a, int b){
+     :
+}
+              
+
@@ -366,28 +400,11 @@ - -
    -
  • Add new goto syntax for transition. -
  • Jmp instruction based transition is enabled by tail call elimination. -
  • In this part, Clang creates an AST for a normal function call and forces tail call elimination later. -
  • The following code is where the CbC goto is parsed. -
  • If the goto is not C syntax, we judge it to be CbC syntax. +
  • New goto syntax for transition. +
  • Generates a normal function call. +
  • Tail call elimination is forced later.
-
-case tok::kw_goto:
-#ifndef noCbC
-  if (!(NextToken().is(tok::identifier) && PP.LookAhead(1).is(tok::semi)) &&
-    NextToken().isNot(tok::star)) {
-      SemiError = "goto code segment";
-      return ParseCbCGotoStatement(Attrs, Stmts);
-    }
-#endif
-  Res = ParseGotoStatement();
-  SemiError = "goto";
-  break;
-
@@ -421,18 +438,6 @@
-

Jmp instruction based transition

-
    -
  • It is implemented by Tail Call Elimination (TCE). -
  • TCE is one of the optimizations. -
  • If a function call is immediately followed by return, it is a tail call. -
  • TCE replaces a tail call's call instruction with a jmp instruction. -
  • Code segment transitions are implemented by forced tail call elimination. -
-
-
- -

Forcing Tail Call Elimination

TCE is enabled at CodeGen.

TCE is performed at SelectionDAGISel.

@@ -440,6 +445,17 @@
+

Jmp instruction based transition

+
    +
  • A tail call is a call immediately followed by return. +
  • Tail call elimination replaces a tail call's call instruction with a jmp instruction. +
  • Transitions are implemented by forced tail call elimination. +
+
+
+ + +
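As a rough illustration of the tail-call shape described above, here is a minimal plain C sketch (not CbC). The names `factorial` and `factorial_acc` are hypothetical and chosen only to mirror the factorial example in the LLVM IR slide. Whether a given compiler actually emits a jmp for these calls depends on the optimization level and target, which is why CbC forces the elimination instead of relying on it.

```c
#include <assert.h>

/* Hypothetical helper: carries the running product in acc. */
static int factorial_acc(int n, int acc) {
    assert(n >= 0); /* precondition for this sketch */
    if (n <= 1) {
        return acc;
    }
    /* Tail call: the call is immediately followed by return, so
       nothing in this frame is needed after the call. A compiler
       that performs tail call elimination may replace call + ret
       with a single jmp here. */
    return factorial_acc(n - 1, acc * n);
}

/* Entry point: also ends in a tail call. */
int factorial(int n) {
    return factorial_acc(n, 1);
}
```

Calling `factorial(5)` walks through five tail calls and yields 120; with elimination applied, the stack does not grow during those transitions.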

Forcing Tail Call Elimination

  • LLVM IR has function call flags. @@ -464,40 +480,32 @@

    Forcing Tail Call Elimination

    -

    We have to meet the following requirements.

    +

    Tail Call Elimination requirements

      -
    • set the tail flag on code segment calls. -
    • tailcallopt is enabled. -
    • the caller's and callee's calling conventions must be the same and their types should be cc10, cc11 or fastcc. -
    • return value type has to be the same as the caller's. +
    • Set the tail flag on code segment calls. +
    • Tailcallopt is enabled. +
    • The caller's and callee's calling conventions must be the same and their types should be cc10, cc11 or fastcc. +
    • The return value type has to be the same as the caller's.

    Forcing Tail Call Elimination

    -

    We met them in the following ways.

      -
    • Always add tail call elimination pass and set flag at the code segments call. -
    • If the input code contains code segment, tailcallopt is enabled automatically. +
    • Always add tail call elimination pass. +
    • Tailcallopt is enabled in CbC.
    • Fastcc is used consistently for code segment calls.
    • The return value type of all code segments is void.
    -

    Goto with environment

    -

    Goto with environment is enabled by modifying the parser.

    -
    -
    - -

    What is a Goto with environment?

      -
    • Code segments do not have an environment, but functions do. -
    • Usually, code segments can't return to functions. -
    • Goto with environment enables it. -
    • In GCC, it is implemented with nested functions. -
    • In LLVM and Clang, it is implemented with setjmp and longjmp. +
    • Code segments do not have an environment, but C functions do. +
    • Code segments can return to C functions by goto with environment. +
    • In GCC, it is implemented with nested functions. +
    • In LLVM and Clang, it is implemented with setjmp and longjmp.
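The setjmp/longjmp idea can be sketched in plain C as follows. This is only an illustration of the mechanism, not the code the CbC compiler generates: `struct env`, `code_segment`, and `c_function` are hypothetical names, and an ordinary function stands in for a code segment.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical environment record: where to resume in the C
   function, plus a slot for the value handed back to it. */
struct env {
    jmp_buf      buf;
    volatile int ret; /* volatile: written between setjmp and longjmp */
};

/* Stands in for a code segment: it never returns normally, it
   jumps back into the saved environment instead. */
static void code_segment(struct env *e, int value) {
    assert(e != NULL);
    e->ret = value * 2;
    longjmp(e->buf, 1); /* resume at the setjmp in c_function */
}

/* An ordinary C function that saves its environment, transfers
   control away, and gets its result back through the environment. */
int c_function(int x) {
    struct env e;
    if (setjmp(e.buf) == 0) {
        code_segment(&e, x); /* control does not come back here */
        return -1;           /* not reached */
    }
    return e.ret; /* reached via longjmp */
}
```

In this sketch `c_function(21)` returns 42: the result travels through the environment record rather than through a normal return from `code_segment`.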
    @@ -551,55 +559,6 @@
-
-

Prototype declaration generating

-

modify parser.

-
-
- -
-

Prototype declaration generating

-
    -
  • In CbC, programmers write a lot of code segments. -
  • When a function pointer's arguments were omitted, TCE sometimes failed. -
  • Automatic prototype declaration generation saves a lot of effort. -
  • When the parser meets a code segment call, it stops the current parsing and searches for the called code segment's declaration. -
  • If the declaration is not found, it searches for the definition and generates a declaration. -
      -
    • Of course, you can also write the declaration yourself. -
    - -
- - - - - -
original input code - Clang generates it -
-__code code1(int a, int b) {
-     :
-  goto code2(a,b);
-}
-
-__code code2(int a, int b){
-     :
-}
-              
-
-__code code2(int a, int b);
-__code code1(int a, int b) {
-     :
-  goto code2(a,b);
-}
-
-__code code2(int a, int b){
-     :
-}
-              
-
-

Compiling result

@@ -638,7 +597,7 @@
  • If tail call elimination fails, the compiler outputs error messages.
  • - +

    Conclusion

      @@ -715,6 +674,44 @@
    +
    +

    LLVM and Clang's intermediate representations

    + + + + + + + +
    + Name + + Description +
    + clang AST + + Abstract Syntax Tree. It is a representation of the structure of source code. +
    + LLVM IR + + The main intermediate representation of LLVM. It has three different forms: as an in-memory compiler IR, as an on-disk bitcode representation, and as a human-readable assembly language representation. +
    + SelectionDAG + + Directed Acyclic Graph. Its nodes indicate what operation the node performs and the operands to the operation. +
    + Machine Code + + This representation is designed to support both an SSA representation for machine code, as well as register allocated, non-SSA form. +
    + MC Layer + + It is used to represent and process code at the raw machine code level. Users can handle several kinds of files (.s, .o, .ll, a.out) through the same API. +
    +
    +

    Intermediate representations are not modified.

    +
    +