HW4: code generation

Better viewed as markdown

VTables

A vtable is generated when its class is visited. The exceptions are the vtables of Object, Object[], int[] and boolean[], which are generated during bootstrapping.

When the vtable for a type is generated, so is the vtable for its array type. The array vtable contains only a pointer to its supertype (in our case Object) and an unused pointer to the vtable of the declared element type.

supertype pointer (Object) <-- vtable pointer
element vtable pointer (unused)
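
Keeping the supertype pointer in the first slot is what allows a cast to be checked at run time by following that pointer upwards. The following is a minimal sketch of such a check, not the compiler's actual code: vtables are modelled as Python lists, the class names `Animal` and `Dog` are hypothetical, and using `None` as Object's supertype is an assumption.

```python
# Hypothetical model: each vtable is a list whose slot 0 references the
# supertype's vtable. Using None for Object's supertype is an assumption.
OBJECT_VTABLE = [None]
OBJECT_ARRAY_VTABLE = [OBJECT_VTABLE, None]  # supertype, unused element slot
ANIMAL_VTABLE = [OBJECT_VTABLE]              # hypothetical class Animal
DOG_VTABLE = [ANIMAL_VTABLE]                 # hypothetical class Dog extends Animal


def is_subtype(vtable, target):
    """Walk the supertype chain from `vtable`, looking for `target`."""
    current = vtable
    while current is not None:
        if current is target:   # compare by identity, like comparing addresses
            return True
        current = current[0]    # slot 0 holds the supertype pointer
    return False
```

Because array vtables share the same first slot, the same loop handles a cast from an array reference to Object without a special case.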

Each type's vtable stores a pointer to the supertype's vtable as its first element (a trait shared with array vtables, which simplifies type comparisons in casts). It then lists every method available to an object of that type, including inherited ones. Because the dynamic type of the receiver is not known at compile time, neither the number of methods nor their positions in the vtable are known at a call site, so each method pointer is preceded by a hash of the method's name (similar to fields in an object).

supertype pointer <-- vtable pointer
hash("method0")
method0 pointer
hash("method1")
method1 pointer
...
hash("methodN")
methodN pointer
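
A method call against this layout amounts to a linear scan over the (hash, pointer) pairs. The sketch below illustrates the idea under several assumptions: `name_hash` is a stand-in (the real hash function is not described here), method pointers are represented by label strings, and the scan stops at the end of the Python list, whereas the generated code would need its own way to know where the table ends.

```python
def name_hash(name):
    """Stand-in 32-bit string hash; the compiler's real hash is not specified."""
    h = 0
    for ch in name:
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF
    return h


def build_vtable(supertype, methods):
    """Lay out [supertype, hash0, ptr0, hash1, ptr1, ...] as a flat list."""
    table = [supertype]
    for name, pointer in methods:
        table.append(name_hash(name))
        table.append(pointer)
    return table


def lookup_method(vtable, name):
    """Scan the (hash, pointer) pairs that follow slot 0 for a matching hash."""
    wanted = name_hash(name)
    for i in range(1, len(vtable), 2):
        if vtable[i] == wanted:
            return vtable[i + 1]
    raise LookupError(f"no method {name!r} in vtable")


# Hypothetical classes: Derived overrides bar, inherits foo and adds baz.
BASE = build_vtable(None, [("foo", "Base_foo"), ("bar", "Base_bar")])
DERIVED = build_vtable(BASE, [("bar", "Derived_bar"),
                              ("foo", "Base_foo"),
                              ("baz", "Derived_baz")])
```

Note that `DERIVED` lists its methods in a different order than `BASE`; the lookup still finds the right entry because it matches on the hash rather than on a fixed slot index.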

Method calls

Method calls and the stack frame layout follow the cdecl calling convention (__cdecl) as used by GCC. In short, a method call executes as follows:

  1. The caller pushes the arguments onto the stack from right to left, so that the 0th argument (the reference to this) ends up at the top of the stack.
  2. The caller executes call, which pushes the return address (the instruction pointer) onto the stack.
  3. The callee takes control. Its first action is to save the base pointer on the stack and set the base pointer to the current stack pointer, so the arguments and the return address are reachable at fixed offsets from the base pointer.
  4. Space for the local variables is reserved on the stack and zeroed.
  5. The CALLEE_SAVED registers are saved on the stack.
  6. The method executes its code.
  7. The CALLEE_SAVED registers are restored to their original values.
  8. The callee restores the stack pointer and base pointer to their initial values with a leave instruction, then returns with ret. This pops the return address off the stack, leaving only the arguments.
  9. The caller removes the arguments from the stack and reads the return value from EAX.
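
The nine steps above can be traced with a small simulation of a downward-growing stack. This is only an illustrative model, not generated code: the `Machine` class, the initial stack address, the dummy return address and the choice of EBX, ESI and EDI as the callee-saved set are all assumptions made for the sketch.

```python
CALLEE_SAVED = ("ebx", "esi", "edi")  # assumed set; not listed in this README


class Machine:
    """Tiny model of a downward-growing stack of 4-byte words."""

    def __init__(self):
        self.mem = {}        # address -> word
        self.esp = 0x1000    # arbitrary initial stack pointer
        self.ebp = 0
        self.regs = {name: 0 for name in CALLEE_SAVED}

    def push(self, value):
        self.esp -= 4
        self.mem[self.esp] = value

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4
        return value


def call_method(m, this, args, n_locals, body, return_address=0xCAFE):
    for arg in reversed(args):           # 1. arguments, right to left
        m.push(arg)
    m.push(this)                         #    `this` is pushed last
    m.push(return_address)               # 2. call
    m.push(m.ebp)                        # 3. save EBP, then EBP = ESP
    m.ebp = m.esp
    for _ in range(n_locals):            # 4. reserve and zero the locals
        m.push(0)
    for name in CALLEE_SAVED:            # 5. save callee-saved registers
        m.push(m.regs[name])
    eax = body(m)                        # 6. method body, result in EAX
    for name in reversed(CALLEE_SAVED):  # 7. restore callee-saved registers
        m.regs[name] = m.pop()
    m.esp = m.ebp                        # 8. leave: ESP = EBP, pop EBP
    m.ebp = m.pop()
    assert m.pop() == return_address     #    ret pops the return address
    m.esp += 4 * (len(args) + 1)         # 9. caller removes the arguments
    return eax


def add_args(m):
    """Example body: clobbers EBX and returns arg0 + arg1 read via EBP."""
    m.regs["ebx"] = 99
    return m.mem[m.ebp + 12] + m.mem[m.ebp + 16]
```

Calling `call_method(Machine(), 0x2000, [3, 4], 2, add_args)` leaves the stack pointer where it started and restores EBX, which is the invariant the convention is meant to guarantee.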

The resulting stack frame has the following structure (arguments arg0 to argN and local variables local0 to localM):

top of stack <-- ESP
...
saved registers
localM <-- EBP - 4 - M * 4
...
local1 <-- EBP - 8
local0 <-- EBP - 4
Old EBP <-- EBP
Return EIP
this <-- EBP + 8
arg0 <-- EBP + 12
...
argN <-- EBP + 12 + N * 4
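
The offsets in this diagram follow a simple pattern, which a code generator could compute as shown below. The helper names are hypothetical, and the AT&T operand syntax produced by `operand` is an assumption about the assembler dialect rather than something this README states.

```python
WORD = 4  # bytes per stack slot


def this_offset():
    """Offset of `this`: just above the saved EBP and the return address."""
    return 2 * WORD


def arg_offset(i):
    """Offset of the i-th explicit argument (arg0 sits right after `this`)."""
    return 12 + i * WORD


def local_offset(j):
    """Offset of the j-th local variable (local0 sits right below saved EBP)."""
    return -4 - j * WORD


def operand(offset):
    """Format an EBP-relative memory operand, e.g. -8(%ebp)."""
    return f"{offset}(%ebp)"
```

For example, `operand(local_offset(1))` yields `-8(%ebp)` and `operand(arg_offset(0))` yields `12(%ebp)`, matching the positions of local1 and arg0 in the diagram.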

Variable and object representation

Primitive variables

Primitive variables are stored by value. Both integers and booleans occupy 4 bytes, and booleans are encoded as TRUE = 1, FALSE = 0, so that zero-initialized memory reads as false.

Reference variables

Objects and arrays are stored on the heap and allocated with calloc, so their memory is zero-initialized. A reference variable holds a pointer to that memory.

Arrays

In addition to the elements, an array stores two header entries: a pointer to the array vtable, used for casts, and the size of the array, used for bounds checks on reads and writes. A reference to the array points to the vtable pointer (the first element of the structure):

array vtable pointer <-- var pointer
array size
element0
element1
...
elementN
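
To make the role of the two header entries concrete, here is a rough model of array allocation and bounds-checked access. The array is a Python list standing in for the zeroed heap block, `INT_ARRAY_VTABLE` is a placeholder, and raising `IndexError` is an assumption: this README does not say how the generated code reports an out-of-bounds access.

```python
WORD = 4  # bytes per slot, assuming 4-byte pointers and elements

INT_ARRAY_VTABLE = ["Object vtable", None]  # placeholder: supertype, unused slot


def new_array(vtable, size):
    """Model a zero-initialized block: [vtable pointer, size, elements...]."""
    return [vtable, size] + [0] * size


def element_offset(index):
    """Byte offset of an element from the array reference (two header slots)."""
    return 2 * WORD + index * WORD


def checked_slot(array, index):
    """Bounds check against the stored size, then map index to a list slot."""
    size = array[1]
    if index < 0 or index >= size:
        raise IndexError(f"index {index} out of bounds for size {size}")
    return 2 + index


def array_load(array, index):
    return array[checked_slot(array, index)]


def array_store(array, index, value):
    array[checked_slot(array, index)] = value
```

Since the size lives in the array itself, the check works for any array reference without consulting static type information.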

Objects

An object stores a pointer to the vtable of its type, followed by all of its fields. Because the dynamic type of an object cannot be determined at compile time, the fields cannot be given static positions. Each field therefore occupies 8 bytes: the first 4 hold the hash of the VariableSymbol that represents the field, and the remaining 4 hold its value or reference.

The hash makes field hiding possible, but it adds overhead to every field access, since the field's position has to be searched for.

vtable pointer <-- var pointer
field0's hash
field0
field1's hash
field1
...
fieldN's hash
fieldN
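
A field access against this layout can be sketched as a scan over the (hash, value) pairs. Everything named here is hypothetical: `symbol_hash` is a stand-in for hashing a VariableSymbol and keys each field by its declaring class plus its name, which is one way two fields with the same name in a class and its superclass could end up with different hashes.

```python
def symbol_hash(owner, name):
    """Stand-in for the VariableSymbol hash: 32-bit hash of 'Owner.name'."""
    h = 0
    for ch in f"{owner}.{name}":
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF
    return h


def new_object(vtable, symbols):
    """Model a zeroed block: [vtable pointer, hash0, field0, hash1, field1, ...]."""
    obj = [vtable]
    for owner, name in symbols:
        obj.append(symbol_hash(owner, name))
        obj.append(0)
    return obj


def find_slot(obj, owner, name):
    """Scan the (hash, value) pairs and return the slot holding the value."""
    wanted = symbol_hash(owner, name)
    for i in range(1, len(obj), 2):
        if obj[i] == wanted:
            return i + 1
    raise LookupError(f"no field {owner}.{name}")


def get_field(obj, owner, name):
    return obj[find_slot(obj, owner, name)]


def set_field(obj, owner, name, value):
    obj[find_slot(obj, owner, name)] = value


def object_size_bytes(n_fields):
    """4 bytes for the vtable pointer plus 8 bytes per field."""
    return 4 + 8 * n_fields
```

With an object built from `[("A", "x"), ("B", "x")]`, writing `A.x` leaves `B.x` untouched, which is the hiding behaviour the hash is meant to support. The cost is visible in `find_slot`: every access is a linear search instead of a fixed offset.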