Implementating Continuation based language in Clang and LLVM

Kaito Tokumori, Shinji Kono
University of the Ryukyus

Objective

Introducing new units of programming

Traditional units of programming

What we want to do with programming units?

It is not easy to do this in the traditional units.

New programming units

Code segments

It is easy to divide or combine.

Data segments

It is easy to divide or combine.

Meta code / data segments

A thread of code segments has a context as a meta data segment

Continuation based C (CbC)

code segment syntax

__code f() {
    goto g();
}

__code g() {
    goto h();
}
              
  • code segment transition is goto.
  • Code segments have no return statement.
  • There are no return values.

code segment syntax with data segments

__code code1(struct Allocate* allocate,
            struct Element* element) {
    element ->value        = 10;
    struct List* list = (struct List *)malloc(sizeof( struct List));
    goto append(allocate,list, element);
}

__code append(struct Allocate* allocate, struct List* list, struct Element* element) {
    if(list->head) {
        list->tail->next = element;
    } else {
        list->head = element;
    }
    list->tail       = element;
    list->tail->next = 0;
    goto code2(allocate,list, element);
}
              
  • data segments are arguments of a code segment and a goto statement.

CbC compilers

LLVM and Clang's compilation flow

Advantage of LLVM implementation

CbC implementation strategy

LLVM IR

define fastcc void @factorial(i32 %x) #0 {
  entry:
  tail call fastcc void @factorial0(i32 1, i32 %x)
  ret void
}
              

CbC implementation strategy

Prototype declaration generating

original input code Clang generates it internally
__code code1(int a, int b) {
     :
  goto code2(a,b);
}

__code code2(int a, int b){
     :
}
              
__code code2(int a, int b);
__code code1(int a, int b) {
     :
  goto code2(a,b);
}

__code code2(int a, int b){
     :
}
              

goto syntax for transition

original input code Clang generates it
__code code1() {
     :
  goto code2();
}
              
void code1() {
     :
  code2();
  return;
}
              

Forcing Tail Call Elimination

Tail call flag is set in CodeGen.

Ensure TCE in SelectionDAGISel.

What is tail call elimination?

Tail Call Elimination requirements

Forcing Tail Call Elimination

This is the reason why we need __code type for code segments.

goto a code segment from a normal C function

How to return to C from a code segment?

Goto with environment

We want provide continuation of function f. The continuation is not a simple code segment, because code segment has not state. We represent the continuation with a code segment (__return) and a meta data segment (__environment).

Syntax of Goto with environment

  • Use new keywords __return and __environment.
  • __return is a code segment pointer for C functions.
  • __environment is a environment for C functions.
  • Code1 use a continuation with environments to return main function.
__code code1(int n,__code(*exit_code)(int,void *),void *exit_env){ printf("code1 : code entry1\n"); goto exit_code(n,exit_env); } int caller(){ printf("caller : main1 entry\n"); __code (*__ret)(int, void *) = __return; struct __CbC_env *__env = __environment; goto code1(1, __ret, __env); return 0; } int main(){ int n; n = caller(); printf("return = %d\n",n); return 0; }

Implementation of goto with environment

Several ways to implementation

LLVM Implementation of goto with environment

Compiling result


__code caller(int x)
{
  goto code1(1, x); // should be jmp
}
              
_caller:                             ## @factorial
        .cfi_startproc
## BB#0:                                ## %entry
        subq    $24, %rsp
Ltmp5:
        .cfi_def_cfa_offset 32
        movl    $1, %eax
        movl    %edi, 20(%rsp)          ## 4-byte Spill
        movl    %eax, %edi
        movl    20(%rsp), %esi          ## 4-byte Reload
        addq    $24, %rsp
        jmp     _code1             ## TAILCALL
        .cfi_endproc
              

Conclusion

Future works

LLVM and Clang's intermediate representations

Name Description
clang AST Abstract Syntax Tree. It is a representation of the structure source codes.
LLVM IR The main intermediate representation of LLVM. It has three different forms: as an in-memory compiler IR, as an on-disk bitcode representation, and as a human readable assembly language representation.
SelectionDAG Directed Acyclic Graph. Its nodes indicate what operation the node performs and the operands to the operation.
Machine Code This representation is designed to support both an SSA representation for machine code, as well as register allocated, non-SSA form.
MC Layer It is used to represent and process code at the raw machine code level. User can some kinds of file (.s, .o, .ll, a.out) by same API.