Implementing Continuation based language in Clang and LLVM

Kaito Tokumori, Shinji Kono
University of the Ryukyus

Objective

Introducing new units of programming

Traditional units of programming

What we want to do with programming units?

It is not easy to do this in the traditional units.

New programming units

Code segments

It is easy to divide or combine.

Data segments

It is easy to divide or combine.

Meta code / data segments

A thread of code segments has a context as a meta data segment

Continuation based C (CbC)

code segment syntax

__code f() {
    goto g();
}

__code g() {
    goto h();
}
              
  • code segment transition is goto.
  • Code segments have no return statement.
  • There are no return values.

code segment syntax with data segments

data segments are arguments of a code segment and a goto statement.

__code code1(struct Allocate* allocate,
            struct Element* element) {
    element ->value        = 10;
    struct List* list = (struct List *)malloc(sizeof( struct List));
    goto append(allocate,list, element);
}

__code append(struct Allocate* allocate, struct List* list, struct Element* element) {
    if(list->head) {
        list->tail->next = element;
    } else {
        list->head = element;
    }
    list->tail       = element;
    list->tail->next = 0;
    goto code2(allocate,list, element);
}
              

CbC meta computation

add meta part to both code and data, keep object level computation

meta computation example

list with meta data segment

In cons cell, pointer part is visible to object level.

Hide pointer part in meta data segment.

Handle pointer operation in meta code segment.

CbC compilers

LLVM and Clang's compilation flow

Advantage of LLVM implementation

CbC implementation strategy

LLVM IR

define fastcc void @factorial(i32 %x) #0 {
  entry:
  tail call fastcc void @factorial0(i32 1, i32 %x)
  ret void
}
              

CbC implementation strategy

Prototype declaration generating

original input code Clang generates it internally
__code code1(int a, int b) {
     :
  goto code2(a,b);
}

__code code2(int a, int b){
     :
}
              
__code code2(int a, int b);
__code code1(int a, int b) {
     :
  goto code2(a,b);
}

__code code2(int a, int b){
     :
}
              

goto syntax for transition

original input code Clang generates it
__code code1() {
     :
  goto code2();
}
              
void code1() {
     :
  code2();
  return;
}
              

Forcing Tail Call Elimination

Tail call elimination pass is added in CodeGen.

Ensure TCE in SelectionDAGISel.

What is tail call elimination?

Forcing Tail Call Elimination

These are the reason why we need __code type for code segments.

goto a code segment from a normal C function

How to return to C from a code segment?

Goto with environment

Syntax of Goto with environment

  • Use new keywords __return and __environment.
  • __return is a code segment pointer for C functions caller.
  • __environment is a environment for C functions caller.
  • G use a continuation with environments to return main function (f's caller).
__code g(int n,__code(*exit_code)(int,void *),void *exit_env){ printf("code1 : code entry1\n"); goto exit_code(n,exit_env); } int f(){ printf("caller : main1 entry\n"); __code (*__ret)(int, void *) = __return; void *__env = __environment; goto g(1, __ret, __env); return 0; } int main(){ int n; n = caller(); printf("return = %d\n",n); return 0; }

Implementation of goto with environment

Several ways to implementation

LLVM Implementation of goto with environment

Compiling result


__code caller(int x)
{
  goto code1(1, x); // should be jmp
}
              
_caller:                             ## @factorial
        .cfi_startproc
## BB#0:                                ## %entry
        subq    $24, %rsp
Ltmp5:
        .cfi_def_cfa_offset 32
        movl    $1, %eax
        movl    %edi, 20(%rsp)          ## 4-byte Spill
        movl    %eax, %edi
        movl    20(%rsp), %esi          ## 4-byte Reload
        addq    $24, %rsp
        jmp     _code1             ## TAILCALL
        .cfi_endproc
              

Usage of CbC : as an instruction description

CbC can be used as a hardware description language (RTL level)

Usage of CbC : Parallel Task representation

CbC can be used as a parallel programming language

Usage of CbC : OS API description

Usage of CbC : meta computation

Usage of CbC : Model checking

Conclusion

Future works

LLVM and Clang's intermediate representations

Name Description
clang AST Abstract Syntax Tree. It is a representation of the structure source codes.
LLVM IR The main intermediate representation of LLVM. It has three different forms: as an in-memory compiler IR, as an on-disk bitcode representation, and as a human readable assembly language representation.
SelectionDAG Directed Acyclic Graph. Its nodes indicate what operation the node performs and the operands to the operation.
Machine Code This representation is designed to support both an SSA representation for machine code, as well as register allocated, non-SSA form.
MC Layer It is used to represent and process code at the raw machine code level. User can some kinds of file (.s, .o, .ll, a.out) by same API.

Calling Convention

Name Description
ccc C calling conbentions.
This calling convention supports varargs function calls and tolerates some mismatch in the declared prototype and implemented declaration of the function.
cc10 This calling convention has been implemented specifically for use by the Glasgow Haskell Compiler (GHC).
cc11 This calling convention has been implemented specifically for use by the High-Performance Erlang (HiPE) compiler.
fastcc This calling convention attempts to make calls as fast as possible.

Execution Result

Argument 1 Argument 2 Argument 3
Micro-C 6.875 2.4562 3.105
GCC -O2 2.9438 0.955 1.265
LLVM and Clang -O0 5.835 4.1887 5.0625
LLVM and Clang -O2 3.3875 2.29 2.5087
unit : seconds