Implementating Continuation based language in Clang and LLVM

Kaito Tokumori, Shinji Kono

Objective

Introducing new units of programming

Traditional units of programming

What we want to do with programming units?

It is not easy in the traditional units.

New programing units

Code segments

It is easy to divide or combine.

Data segments

It is easy to divide or combine.

Meta code / data segments

Meta code segments are executed right after the goto.

Meta data segments are kinds of process data.

Continuation based C (CbC)

CbC sample

__code f() {
    goto g();
}

__code g() {
    goto h();
}
              
  • Code segments like C functions.
  • CbC transition is goto.
  • Code segments do not return to previous.
  • There are no return values.

CbC sample with data segments

__code code(struct Context* context, struct Allocate* allocate,
            struct Element* element) {
    allocate->after_append = Code2;
    element ->value        = 10;
    goto meta(context, Append);
}

__code append(struct Context* context, struct Allocate* allocate,
              struct List* list, struct Element* element) {
    if(list->head) {
        list->tail->next = element;
    } else {
        list->head = element;
    }
    list->tail       = element;
    list->tail->next = 0;
    goto meta(context, allocate->after_append);
}

__code meta(struct Context* context, enum Code next) {
    goto (context->code[next])(context);
}
              
  • A part of list program.
  • Code segment transition into next one via meta code segment.
  • Context has code segments name.
  • Context give meta code segments next code segment pointer.

CbC compilers

What are LLVM and Clang?

Why?

LLVM and Clang's compilation flow

LLVM and Clang's intermidiate representations

Intermidiate representations are not modified.

Clang AST

LLVM IR

define fastcc void @factorial(i32 %x) #0 {
  entry:
  tail call fastcc void @factorial0(i32 1, i32 %x)
  ret void
}
              

Basic strategy of implementating

Parser

__code type

Prototype declaration generating

original input code Clang genarates it
__code code1(int a, int b) {
     :
  goto code2(a,b);
}

__code code2(int a, int b){
     :
}
              
__code code2(int a, int b);
__code code1(int a, int b) {
     :
  goto code2(a,b);
}

__code code2(int a, int b){
     :
}
              

goto syntax for transition

  • New goto syntax for transition.
  • Generate normal function call.
  • Tail call elimination is forced later.

goto syntax for transition

original input code Clang genarates it
__code code1() {
     :
  goto code2();
}
              
void code1() {
     :
  code2();
  return;
}
              

Forcing Tail Call Elimination

TCE is enabled at CodeGen.

TCE is act at SelectionDAGISel.

What is tail call elimination?

Forcing Tail Call Elimination

define fastcc void @factorial(i32 %x) #0 {
  entry:
  tail call fastcc void @factorial0(i32 1, i32 %x)
  ret void
}
              

Use them for force to tail call elimination.

Forcing Tail Call Elimination

Tail Call Elimination requirements

Forcing Tail Call Elimination

What is a Goto with environment?

Sample code of Goto with environment

  • Use new keywords __return and __environment.
  • __return is a code segment pointer for C functions.
  • __environment is a envitonment for C functions.
  • Code1 use a continuation with environments to return main function.
__code code1(int n,__code(*exit_code)(int,void *),void *exit_env){ printf("code1 : code entry1\n"); goto exit_code(n,exit_env); } int caller(){ printf("caller : main1 entry\n"); __code (*__ret)(int, void *) = __return; struct __CbC_env *__env = __environment; goto code1(1, __ret, __env); return 0; } int main(){ int n; n = caller(); printf("return = %d\n",n); return 0; }

Implementing goto with environment

Compiling result


__code caller(int x)
{
  goto code1(1, x); // should be jmp
}
              
_caller:                             ## @factorial
        .cfi_startproc
## BB#0:                                ## %entry
        subq    $24, %rsp
Ltmp5:
        .cfi_def_cfa_offset 32
        movl    $1, %eax
        movl    %edi, 20(%rsp)          ## 4-byte Spill
        movl    %eax, %edi
        movl    20(%rsp), %esi          ## 4-byte Reload
        addq    $24, %rsp
        jmp     _code1             ## TAILCALL
        .cfi_endproc
              

Conclusion

Future works

LLVM and Clang's intermidiate representations

Name Desctiption
clang AST Abstract Syntax Tree. It is a representation of the structure source codes.
LLVM IR The main intermidiate representation of LLVM. It has three diffirent forms: as an in-memory compiler IR, as an on-disk bitcode representation, and as a human readable assembly language representation.
SelectionDAG Directed Acyclic Graph. Its nodes indicate what operation the node performs and the operands to the operation.
Machine Code This representation is designed to support both an SSA representation for machine code, as well as register allocated, non-SSA form.
MC Layer It is used to represent and process code at the raw machine code level. User can some kinds of file (.s, .o, .ll, a.out) by same API.