50.054 - Code Generation¶
Learning Outcomes¶
- Name the differences among the target code platforms
- Apply SSA-based register allocation to generate 3-address code from Pseudo Assembly
- Handle register spilling
- Implement the target code generation to WASM bytecode given a Pseudo Assembly Program
Recap Compiler Pipeline¶
graph LR
A[Lexing] -->B[Parsing] --> C[Semantic Analysis] --> D[Optimization] --> E[Target Code Generation]
D --> C
For Target Code Generation, we consider some IR as the input and the target code (executable) as the output.
Instruction Selection¶
Instruction selection is the process of choosing the target platform on which the language is to be executed.
There are mainly 3 kinds of target platforms.
- 3-address instruction
- RISC (Reduced Instruction Set Computer) architecture. E.g. Apple PowerPC, ARM, Pseudo Assembly
- 2-address instruction
- CISC (Complex Instruction Set Computer) architecture. E.g. Intel x86
- 1-address instruction
- Stack machine. E.g. JVM
Assembly code vs Machine code¶
Note that the instruction formats mentioned here are the human-readable representations of the target code. The actual target code (machine code) is in binary format.
3-address instruction¶
In a 3-address instruction target platform, each instruction uses at most 3 addresses. For instance, the Pseudo Assembly we studied earlier is a kind of 3-address instruction language without the hardware restrictions.
For instance, in a 3-address instruction set, we have instructions that look like
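the following (a sketch using the PA-style infix syntax; the names `r`, `x` and `y` are illustrative):

```
r <- x + y
```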
where `r`, `x` and `y` are registers. Alternatively, in some other 3-address instruction formats, we express the code fragment above in a prefix notation,
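for instance an assumed RISC-style rendering:

```
add r x y    ; r := x + y
```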
The advantage of having more registers (addresses) per instruction is that it gives us more room for code optimization while keeping the instruction set relatively simple and small (for instance, our Pseudo Assembly has a simple instruction set).
2-address instruction¶
In a 2-address instruction target platform, each instruction has at most 2 addresses. As a result, some single instructions in the 3-address setting have to be encoded as multiple instructions on a 2-address platform. For example, to add x and y and store the result in r, we have to write
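something like the following (a sketch, assuming a mov/add-style 2-address syntax in which the first operand is also the destination):

```
mov x, 1    ; load constant 1 into register x
mov y, 2    ; load constant 2 into register y
add x, y    ; x := x + y
mov r, x    ; move the result from x to r
```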
In the 3rd instruction, we add the values stored in registers x and y. The sum is stored in x. In the last instruction, we move the result from x to r.
As a result, we need fewer registers (at minimum) to carry out the operations. On the other hand, the instruction set of a 2-address platform is often more complex.
1-address instruction¶
In the extreme case, we find some target platforms with only 1-address instructions. This kind of target is also known as P-code (P for Pascal) or stack machine code.
For example, we would encode the same program in 1-address instructions as follows
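(a sketch, assuming a stack-machine-style push/pop syntax):

```
push 1    ; load constant 1 into the 1st (left operand) register
push 2    ; load constant 2 into the 2nd (right operand) register
add       ; sum the two registers; result in the 1st, 2nd popped
pop r     ; pop the result and store it in temporary variable r
```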
In the first instruction, we push the constant 1 into the left operand register (the 1st register). In the second instruction, we push the constant 2 into the right operand register (the 2nd register). In the 3rd instruction, we apply the add operation to sum up the two registers; the result is stored in the first register and the 2nd register is cleared (popped). In the last instruction, we pop the result from the first register and store it in a temporary variable r.
The benefit of 1-address instructions is a minimal and uniform requirement on the hardware. It requires the least number of registers; for example, the JVM has only 3 registers. On the other hand, its instruction set is the most complex.
From PA to 3-address target platform¶
In this section, we consider generating code for a target platform that uses 3-address instructions.
Register Allocation Problem¶
Let's consider the register allocation problem. Recall that in Pseudo Assembly, we have unlimited temporary variables and registers. Among all the examples of PA we have seen so far, we did not use any register except for the return register rret.
Such an assumption is no longer valid in the code generation phase. We face two major constraints.
- Most of the operations can only be applied to registers, not to temporary variables. Operands held in temporary variables need to be loaded into registers before the operation is applied.
- The number of registers is finite and often limited. This implies that we can't possibly load all the temporary variables to registers. At some point, we need to unload the content of some register to the temporary variable to make room for the next operation.
For example, the following PA program
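A sketch of such a program (an assumed example, consistent with the discussion that follows; call it PA1):

```
1: x <- input
2: y <- x + 1
3: z <- y * 2
4: y <- y + z
5: w <- y * z
6: rret <- w
7: ret
```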
has to be translated into target code that uses registers only. Assuming we have 4 other registers r0, r1, r2 and r3 besides rret, we can map the PA variables as {x : r0, y : r1, z : r2, w : r3}.
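Under this mapping, the generated target code might be as follows (a sketch based on the assumed PA1 above):

```
1: r0 <- input
2: r1 <- r0 + 1
3: r2 <- r1 * 2
4: r1 <- r1 + r2
5: r3 <- r1 * r2
6: rret <- r3
7: ret
```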
When we only have 3 other registers excluding rret, we need to offload some results into temporary variables. The offloading of results from registers to temporary variables is also known as register spilling.
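A sketch of a possible translation under the assumed PA1, where the unlabeled move offloads x to its temporary variable:

```
1: r0 <- input
2: r1 <- r0 + 1
3: r2 <- r1 * 2
4: r1 <- r1 + r2
x <- r0           ; offload (spill) x from r0 to its temporary variable
5: r0 <- r1 * r2  ; w now takes r0
6: rret <- r0
7: ret
```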
The above program works within the hardware constraint (3 extra registers besides rret). Now the register allocation {x : r0, y : r1, z : r2} is only valid for instructions 1-4, and the allocation for instructions 5-7 is {w : r0, y : r1, z : r2}.
As one can argue, we could avoid the offloading by mapping w to rret, since it is the variable being returned. However, w might not be the returned variable in other examples.
We could also avoid the offloading by exploiting the liveness analysis: x is not live from instruction 3 onwards, hence we do not even need to save the content of r0 to the temporary variable x. However, this does not hold in general, as x might be needed later in other examples.
The Register Allocation Problem is then defined as follows.
Given a program \(p\), and \(k\) registers, find an optimal register assignment so that the register spilling is minimized.
Interference Graph¶
To solve the register allocation problem, we define a data structure called the interference graph.
Two temporary variables interfere with each other when they are both "live" at the same time in a program. In the following, we include the liveness analysis results as comments in the program PA1.
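On the assumed PA1:

```
1: x <- input    // {input}
2: y <- x + 1    // {x}
3: z <- y * 2    // {y}
4: y <- y + z    // {y, z}
5: w <- y * z    // {y, z}
6: rret <- w     // {w}
7: ret           // {}
```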
We conclude that y and z interfere with each other. Hence they should not share the same register.
graph TD;
input
x
y --- z
w
From the graph we can tell that "at peak" we need two registers concurrently; hence the above program can be translated to target code using 2 registers, excluding the rret register.
For example, we annotate the graph with the mapped registers r0 and r1.
graph TD;
input["input(r0)"]
x["x(r0)"]
y["y(r0)"] --- z["z(r1)"]
w["w(r0)"]
And we can generate the following output
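A sketch of the output under the annotated register mapping:

```
1: r0 <- input    ; input and x share r0
2: r0 <- r0 + 1   ; y <- x + 1
3: r1 <- r0 * 2   ; z <- y * 2
4: r0 <- r0 + r1  ; y <- y + z
5: r0 <- r0 * r1  ; w <- y * z
6: rret <- r0
7: ret
```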
Graph Coloring Problem¶
From the above example, we find that we can recast the register allocation problem into a graph coloring problem.
The graph coloring problem is defined as follows.
Given an undirected graph and \(k\) colors, find a coloring plan in which no two adjacent vertices share the same color, if possible.
Unfortunately, this problem is NP-complete in general. No efficient algorithm is known.
Fortunately, we do know a subset of graphs for which a polynomial-time coloring algorithm exists.
Chordal Graph¶
A graph \(G = (V,E)\) is chordal if, for every cycle \(v_1,...,v_n\) in \(G\) with \(n > 3\), there exist \(i, j \in \{1,...,n\}\) such that \((v_i,v_j) \in E\) and \((v_i, v_j)\) is not part of the cycle.
For example, the following graph
graph TD
v1 --- v2 --- v3 --- v4 --- v1
v2 --- v4
is chordal, because of \((v_2,v_4)\).
The following graph
graph TD
v1 --- v2 --- v3 --- v4 --- v1
is not chordal, i.e., it is chordless.
It is a known result that the coloring problem for chordal graphs can be solved in polynomial time.
An Example¶
Consider the following PA program with the variable liveness result as comments
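A sketch of such a program (an assumed example whose liveness results, shown as comments, yield the interference graph below; call it PA2):

```
1: a <- input     // {input}
2: b <- a + 1     // {a}
3: c <- a * b     // {a, b}
4: d <- b + c     // {b, c}
5: a <- c - d     // {c, d}
6: e <- a * 2     // {a}
7: d <- a + e     // {a, e}
8: rret <- d + e  // {d, e}
9: ret            // {}
```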
We observe the interference graph
graph TD
a --- b --- c --- d
a --- e --- d
and find that it is chordless.
SSA saves the day!¶
With some research breakthroughs in 2002-2006, it was proven that programs in SSA form always have chordal interference graphs.
For example, if we apply SSA conversion to PA2, we have the following.
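A sketch of the resulting PA_SSA2 (renaming only, since the assumed PA2 is branch-free), with liveness results as comments:

```
1: a1 <- input1     // {input1}
2: b1 <- a1 + 1     // {a1}
3: c1 <- a1 * b1    // {a1, b1}
4: d1 <- b1 + c1    // {b1, c1}
5: a2 <- c1 - d1    // {c1, d1}
6: e1 <- a2 * 2     // {a2}
7: d2 <- a2 + e1    // {a2, e1}
8: rret <- d2 + e1  // {d2, e1}
9: ret              // {}
```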
The liveness analysis algorithm can be adapted to SSA with the following adjustment.
We define the \(join(s_i)\) function as follows
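\(join(s_i) = \bigcup_{j \in succ(i)} s_j\Theta_{i,j}\)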
where \(\Theta_{i,j}\) is a variable substitution derived from the phi assignments of the labeled instruction at \(j : \overline{\phi}\ instr\).
The monotonic functions can be defined by the following cases.
- case \(l: \overline{\phi}\ ret\), \(s_l = \{\}\)
- case \(l: \overline{\phi}\ t \leftarrow src\), \(s_l = join(s_l) - \{ t \} \cup var(src)\)
- case \(l: \overline{\phi}\ t \leftarrow src_1\ op\ src_2\), \(s_l = join(s_l) - \{t\} \cup var(src_1) \cup var(src_2)\)
- case \(l: \overline{\phi}\ r \leftarrow src\), \(s_l = join(s_l) \cup var(src)\)
- case \(l: \overline{\phi}\ r \leftarrow src_1\ op\ src_2\), \(s_l = join(s_l) \cup var(src_1) \cup var(src_2)\)
- case \(l: \overline{\phi}\ ifn\ t\ goto\ l'\), \(s_l = join(s_l) \cup \{ t \}\)
- other cases: \(s_l = join(s_l)\)
Now the interference graph of PA_SSA2 is as follows
graph TD;
a1 --- b1 --- c1 --- d1
a2 --- e1 --- d2
which is chordal.
Coloring Interference Graph generated from SSA¶
According to the findings of Budimlic's work and Hack's work, coloring the interference graph generated from an SSA program in the in-order traversal of the dominator tree gives us the optimal coloring.
In Hack's paper, it was discussed that the elimination step should be done in the post-order traversal of the dominator tree. From the graph coloring problem, we know that the order of coloring is the reverse of the vertex elimination order.
In the context of PA, the in-order traversal of the dominator tree follows the order in which the instructions are labeled (assuming we generate the PA using the maximal munch algorithm introduced in the earlier lesson).
Therefore we can color the above graph as follows,
graph TD;
a1("a1(r0)") --- b1("b1(r1)") --- c1("c1(r0)") --- d1("d1(r1)")
a2("a2(r0)") --- e1("e1(r1)") --- d2("d2(r0)")
From now onwards until the next section (WASM bytecode generation), we assume that the program to be register-allocated is in SSA form.
Given that the program's interference graph is chordal, the register allocation can be computed in polynomial time.
Instead of building the interference graph, we consider using the live range table of an SSA program.
In the following table (of PA_SSA2), the first row contains the program labels, the first column lists the variables, and the last column records the allocated register. An * in cell (x, l) indicates that variable x is live at program location l.
| var | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | reg |
|-----|---|---|---|---|---|---|---|---|---|-----|
| a1  |   | * | * |   |   |   |   |   |   | r0  |
| b1  |   |   | * | * |   |   |   |   |   | r1  |
| c1  |   |   |   | * | * |   |   |   |   | r0  |
| d1  |   |   |   |   | * |   |   |   |   | r1  |
| a2  |   |   |   |   |   | * | * |   |   | r0  |
| e1  |   |   |   |   |   |   | * | * |   | r1  |
| d2  |   |   |   |   |   |   |   | * |   | r0  |
At any point (i.e., any column), the number of * entries denotes the number of concurrently live variables. The above table shows that at any point in time, the peak register usage is 2 (in some literature, this is known as the chromatic number of the interference graph). Therefore, we need a minimum of 2 registers to allocate the above program without spilling.
Register Spilling¶
However, register spilling is sometimes unavoidable due to program complexity and hardware limits.
Let's consider another example
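A sketch of such a program (an assumed example, call it PA3), with liveness results as comments:

```
1: x <- input   // {input}
2: y <- x + 1   // {x}
3: z <- x * x   // {x, y}
4: w <- x + y   // {x, y, z}
5: u <- w + z   // {w, z}
6: rret <- u    // {u}
7: ret          // {}
```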
Its SSA form is identical to the above, since there is no variable re-assignment. In the comments, we include the result of the liveness analysis.
| var | 1 | 2 | 3 | 4 | 5 | 6 | 7 | reg |
|-----|---|---|---|---|---|---|---|-----|
| x   |   | * | * | * |   |   |   |     |
| y   |   |   | * | * |   |   |   |     |
| z   |   |   |   | * | * |   |   |     |
| w   |   |   |   |   | * |   |   |     |
| u   |   |   |   |   |   | * |   |     |
From the live range table above, we find that at the peak, i.e. instruction 4, there are 3 concurrently live variables. We would need three registers for the allocation.
What if we only have two registers? Clearly, we need to "sacrifice" some live variable at instruction 4, by spilling it back to its temporary variable and reloading it before it is needed again. But which one shall we "sacrifice"? There are a few options here.
- Spill the least urgently needed live variable. Recall that the liveness analysis is a may analysis; its result is an over-approximation. Some live variables might not actually be needed at this point.
- Spill the live variable that interferes the most. This option works for the brute-force search coloring algorithm; the idea is to reduce the level of interference so that the remaining graph without this variable can be colored.
For now let's take the first option. Suppose we extend the liveness analysis to keep track of the label where a variable is marked live.
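On the assumed PA3, the extended analysis might produce the following annotations, where x(l) means that x is marked live at label l:

```
1: x <- input   // {input(1)}
2: y <- x + 1   // {x(2)}
3: z <- x * x   // {x(3), y(4)}
4: w <- x + y   // {x(4), y(4), z(5)}
5: u <- w + z   // {w(5), z(5)}
6: rret <- u    // {u(6)}
7: ret          // {}
```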
From the above results, we can conclude that at instruction 4, we should sacrifice the live variable z, because z is marked live at label 5, i.e., it is needed by an instruction one hop away in the CFG, compared to x and y, which are marked live at label 4. In other words, z is not as urgently needed as x and y.
| var | 1 | 2 | 3 | 4 | 5 | 6 | 7 | reg |
|-----|---|---|---|---|---|---|---|-----|
| x   |   | * | * | * |   |   |   | r0  |
| y   |   |   | * | * |   |   |   | r1  |
| z   |   |   |   | - | * |   |   |     |
| w   |   |   |   |   | * |   |   |     |
| u   |   |   |   |   |   | * |   |     |
From the above, we find that the graph is colorable again. However, register spilling requires some extra steps. First, at label 3, variable z has to be placed in some register, either r0 or r1,
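as illustrated by the partially generated target code (a sketch; ?? stands for the register yet to be decided),

```
1: r0 <- input
2: r1 <- r0 + 1
3: ?? <- r0 * r0   ; ?? is either r0 or r1, both of which are occupied
```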
assuming that in the target code the operation * can use the same register for both operands and the result. We encounter another problem: to spill z (from the register) to the temporary variable, we need to figure out which other live variable should be swapped out so that the spilling can be done. Let's illustrate using the same example.
There are two options here.

- `??` is `r1`. It implies that we need to spill `r1` to `y` first after instruction `2`, then spill `r1` to `z` after instruction `3`, and load `y` back into `r1` after instruction `3`, before instruction `4`.
- `??` is `r0`. It implies that we need to spill `r0` to `x` first after instruction `2`, then spill `r0` to `z` after instruction `3`, and load `x` back into `r0` after instruction `3`, before instruction `4`.
In this particular example, both options are equally good (or equally bad). In general, we can apply the heuristic of choosing the conflicting variable whose live range ends earlier, hoping that the main subject of spilling (z in this example) is not needed until then.
Now let's say we pick the first option; the register allocation continues as follows.
| var | 1 | 2 | 3 | 4 | 5 | 6 | 7 | reg |
|-----|---|---|---|---|---|---|---|-----|
| x   |   | * | * | * |   |   |   | r0  |
| y   |   |   | * | * |   |   |   | r1  |
| z   |   |   |   | - | * |   |   | r1  |
| w   |   |   |   |   | * |   |   | r0  |
| u   |   |   |   |   |   | * |   | r1  |
where - indicates that z is spilled from r1 before label 4 and needs to be loaded back into r1 before label 5.
And the complete code of PA3_REG is as follows.
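A sketch, with the spills and reloads written as unlabeled PA-style moves:

```
1: r0 <- input     ; x in r0
2: r1 <- r0 + 1    ; y in r1
y <- r1            ; spill y from r1 to make room for z
3: r1 <- r0 * r0   ; z computed into r1
z <- r1            ; spill z from r1
r1 <- y            ; reload y into r1
4: r0 <- r0 + r1   ; w <- x + y, with w in r0
r1 <- z            ; reload z into r1
5: r1 <- r0 + r1   ; u <- w + z, with u in r1
6: rret <- r1
7: ret
```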
As an exercise, work out what happens if we save x temporarily instead of y at label 2.
Register allocation for phi assignments¶
What remains to address is the treatment of the phi assignments.
Let's consider a slightly bigger example.
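A sketch of such a program (an assumed example that sums up the values 0 to input - 1; call it PA4):

```
1: x <- input
2: s <- 0
3: c <- 0
4: b <- c < x
5: ifn b goto 9
6: s <- c + s
7: c <- c + 1
8: goto 4
9: rret <- s
10: ret
```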
Let's convert it into SSA.
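A sketch of the SSA form, assuming a phi(label: operand, ...) notation attached to the labeled instruction:

```
1: x1 <- input1
2: s1 <- 0
3: c1 <- 0
4: s2 <- phi(3: s1, 8: s3)
   c2 <- phi(3: c1, 8: c3)
   b1 <- c2 < x1
5: ifn b1 goto 9
6: s3 <- c2 + s2
7: c3 <- c2 + 1
8: goto 4
9: rret <- s2
10: ret
```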
There are a few options for handling phi assignments.
- Treat them like normal assignments, i.e., translate them back into move instructions (refer to "SSA back to Pseudo Assembly" in the name analysis lesson). This is the most conservative approach; it definitely works, but it does not necessarily give us optimized code.
- Ensure that the variables in the phi assignments share the same registers.
Let's consider the first approach
Conservative approach¶
When we translate the SSA back to PA
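A sketch, with the phi assignments translated back into moves at the new labels 3.1 and 7.1:

```
1: x1 <- input1
2: s1 <- 0
3: c1 <- 0
3.1: s2 <- s1; c2 <- c1
4: b1 <- c2 < x1
5: ifn b1 goto 9
6: s3 <- c2 + s2
7: c3 <- c2 + 1
7.1: s2 <- s3; c2 <- c3
8: goto 4
9: rret <- s2
10: ret
```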
It is clear that the program is allocatable with 4 registers without spilling. Let's challenge ourselves with just 3 registers.
| var | 1 | 2 | 3 | 3.1 | 4 | 5 | 6 | 7 | 7.1 | 8 | 9 | 10 | reg |
|-----|---|---|---|-----|---|---|---|---|-----|---|---|----|-----|
| input1 | * |  |  |  |  |  |  |  |  |  |  |  | r0 |
| x1 |  | * | * | * | * | - | - | - | - | - |  |  | r1 |
| s1 |  |  | * | * |  |  |  |  |  |  |  |  | r2 |
| c1 |  |  |  | * |  |  |  |  |  |  |  |  | r0 |
| s2 |  |  |  |  | * | * | * |  |  | * | * |  | r2 |
| c2 |  |  |  |  | * | * | * | * |  | * |  |  | r0 |
| b1 |  |  |  |  |  | * |  |  |  |  |  |  | r1 |
| s3 |  |  |  |  |  |  |  | * | * |  |  |  | r2 |
| c3 |  |  |  |  |  |  |  |  | * |  |  |  | r0 |
At the peak of the live variables, i.e. instruction 5, we realize that x1 is live but not urgently needed until label 4, which is 5 hops away from the current location. Hence we spill it from register r1 to its temporary variable to free up r1. Registers are otherwise allocated next-available, in a round-robin manner.
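A sketch of the resulting PA4_REG1 (spills and reloads are written as unlabeled moves; the phi-induced moves at 3.1 and 7.1 vanish because their sources and destinations share registers):

```
1: r1 <- r0          ; x1 <- input1, with input1 in r0
2: r2 <- 0           ; s1 <- 0
3: r0 <- 0           ; c1 <- 0
3.1:                 ; s2 <- s1 and c2 <- c1: no moves needed
x1 <- r1             ; spill x1 from r1
4: r1 <- x1          ; reload x1
   r1 <- r0 < r1     ; b1 <- c2 < x1, with b1 in r1
5: ifn r1 goto 9
6: r2 <- r0 + r2     ; s3 <- c2 + s2, with s3 in r2
7: r0 <- r0 + 1      ; c3 <- c2 + 1, with c3 in r0
7.1:                 ; s2 <- s3 and c2 <- c3: no moves needed
8: goto 4
9: rret <- r2        ; rret <- s2
10: ret
```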
What if at instruction 7, we allocate r1 to s3 instead of r2? Due to this nondeterminism, we could have a slightly different register allocation as follows.
| var | 1 | 2 | 3 | 3.1 | 4 | 5 | 6 | 7 | 7.1 | 8 | 9 | 10 | reg |
|-----|---|---|---|-----|---|---|---|---|-----|---|---|----|-----|
| input1 | * |  |  |  |  |  |  |  |  |  |  |  | r0 |
| x1 |  | * | * | * | * | - | - | - | - | - |  |  | r1 |
| s1 |  |  | * | * |  |  |  |  |  |  |  |  | r2 |
| c1 |  |  |  | * |  |  |  |  |  |  |  |  | r0 |
| s2 |  |  |  |  | * | * | * |  |  | * | * |  | r2 |
| c2 |  |  |  |  | * | * | * | * |  | * |  |  | r0 |
| b1 |  |  |  |  |  | * |  |  |  |  |  |  | r1 |
| s3 |  |  |  |  |  |  |  | * | * |  |  |  | r1 |
| c3 |  |  |  |  |  |  |  |  | * |  |  |  | r2 |
In this case, we have to introduce some additional register shuffling at 7.1. Compared to PA4_REG1, this result is less efficient.
Register coalesced approach - Ensure the variables in the phi assignments share the same registers¶
Note that we should not enforce that the variable on the LHS of a phi assignment shares the same register as the operands on the RHS. Otherwise, we could lose the chordal graph property of SSA.
Instead, we could construct the live range table as follows.
| var | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | reg |
|-----|---|---|---|---|---|---|---|---|---|----|-----|
| input1 | * |  |  |  |  |  |  |  |  |  | r0 |
| x1 |  | * | * | * | - | - | - | - |  |  | r1 |
| s1 |  |  | * |  |  |  |  |  |  |  | r2 |
| c1 |  |  |  |  |  |  |  |  |  |  | r0 |
| s2 |  |  |  | * | * | * |  |  | * |  | r2 |
| c2 |  |  |  | * | * | * | * |  |  |  | r0 |
| b1 |  |  |  |  | * |  |  |  |  |  | r1 |
| s3 |  |  |  |  |  |  | * | * |  |  | r2 |
| c3 |  |  |  |  |  |  |  | * |  |  | r0 |
Although from the above c1 seems to be always dead, it is not, because its value is merged into c2 at label 4. This is because in our SSA language the phi assignment is not a standalone instruction, while liveness analysis is performed at the per-instruction level.
We also note that we want c1 and c3 to share the same register, and s1 and s3 to share the same register. Hence we can allocate the 3 registers according to the above plan. In this case, we obtain the same result as the first attempt of the conservative approach, PA4_REG1.
Note that this approach is not guaranteed to produce more efficient results than the conservative approach.
Summary so far¶
To sum up, the code generation process from PA to a 3-address target can be carried out as follows.
- Convert the PA program into SSA form.
- Perform Liveness Analysis on the SSA.
- Generate the live range table based on the liveness analysis results.
- Allocate registers based on the live range table. Detect potential spilling.
- Depending on the approach taken for the phi assignments, either
- convert SSA back to PA and generate the target code according to the live range table, or
- generate the target code from SSA with register coalesced for the phi assignment operands.
Further Reading for SSA-based Register Allocation¶
- https://compilers.cs.uni-saarland.de/papers/ssara.pdf
- https://dl.acm.org/doi/10.1145/512529.512534
WASM bytecode (reduced set)¶
In this section, we consider generating Web Assembly (WASM) code from PA.
As mentioned, WASM has 3 registers:
- a register for the first operand and the result
- a register for the second operand
- a register for controlling the state of the stack operation (which we cannot use)
Technically speaking, we only have 2 registers.
An example of WASM bytecode is illustrated as follows.
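A minimal sketch using the reduced instruction set described in this section (the local variable name x is illustrative):

```
const 1   ;; load constant 1 into the 1st register
const 2   ;; load constant 2 into the 2nd register
add       ;; r0 <- r0 + r1; the 2nd register becomes vacant
set x     ;; pop r0 and store its content into variable x
```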
Suppose we have a PA program as follows.
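A sketch of such a PA program (an assumed example, reused in the translation below):

```
1: x <- input
2: y <- x + 1
3: rret <- y
4: ret
```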
For ease of reasoning, we assume that we map PA temporary variables to WASM variables with the same names.
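Under this mapping, the corresponding WASM bytecode might be as follows:

```
get input   ;; load input into the 1st register
set x       ;; x <- input
get x
const 1
add
set y       ;; y <- x + 1
get y
return      ;; rret <- y; ret
```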
WASM bytecode operational semantics¶
To describe the operational semantics of WASM bytecodes, we define the following meta symbols.
\(\Delta\) is the local environment that maps WASM variables to constants. \(S\) is a 2-slot stack where the left slot is the bottom (\(r_0\)) and the right slot is the top (\(r_1\)). \(\_\) denotes that a slot is vacant. \(B\) is a many-slot stack; each stack frame captures the current block instruction and its following instructions. We assume that the size of \(B\) is unbounded.
We can describe the operational semantics of WASM bytecode using the following rule form.
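\((\Delta, S, wis, B) \longrightarrow (\Delta', S', wis', B')\)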
The rule rewrites a configuration \((\Delta, S, wis, B)\) to the next configuration \((\Delta', S', wis', B')\), where \(\Delta\) and \(\Delta'\) are the current and next states of the local environment, \(S\) and \(S'\) are the current and next states of the value stack, \(wis\) and \(wis'\) are the current and next sequences of instructions to be processed, and \(B\) and \(B'\) are the current and next states of the block stack.
The rules \((\tt get1)\) and \((\tt get2)\) handle loading a variable's content into the stack registers. The rules \((\tt const1)\) and \((\tt const2)\) handle loading a constant into the stack registers.
The rule \((\tt set)\) processes the \(set\ n\) instruction by popping the register \(r_0\) from the stack and storing its content with variable \(n\) in \(\Delta\).
The rules \((\tt add)\), \((\tt sub)\) and \((\tt mul)\) process the binary operations, assuming both registers in the stack hold constants. The result of the computation is stored in \(r_0\) while \(r_1\) becomes vacant.
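For illustration, the \((\tt get1)\), \((\tt const1)\), \((\tt set)\) and \((\tt add)\) rules might be rendered as follows (a sketch reconstructed from the descriptions above; \(\Delta \oplus (n,c)\) denotes updating \(\Delta\) with the entry \(n \mapsto c\)):

\((\tt get1)\ \ (\Delta, (\_, \_), get\ n; wis, B) \longrightarrow (\Delta, (\Delta(n), \_), wis, B)\)

\((\tt const1)\ \ (\Delta, (\_, \_), const\ c; wis, B) \longrightarrow (\Delta, (c, \_), wis, B)\)

\((\tt set)\ \ (\Delta, (c, \_), set\ n; wis, B) \longrightarrow (\Delta \oplus (n, c), (\_, \_), wis, B)\)

\((\tt add)\ \ (\Delta, (c_1, c_2), add; wis, B) \longrightarrow (\Delta, (c_1 + c_2, \_), wis, B)\)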
The rules \((\tt eq1)\), \((\tt eq2)\), \((\tt lt1)\) and \((\tt lt2)\) process the boolean operations, assuming both registers in the stack hold constants. The result of the comparison is stored in \(r_0\) while \(r_1\) becomes vacant.
The next set of rules evaluate by pushing block instructions to the block instruction stack.
- The rule \((\tt block)\) processes a sequence of instructions starting with a block instruction. It proceeds by evaluating the body of the block instruction and pushing the block instruction and its following instructions into the block instruction stack.
- The rule \({\tt loop}\) works in a similar manner.
- The rules \((\tt ifT)\) and \((\tt ifF)\) handle the if-instruction. Depending on the value residing in register 0 of the value stack, the computation proceeds with evaluating the then-instructions or the else-instructions, by pushing the to-be-evaluated instructions and the following instructions into the block instruction stack as a "block instruction".
The next set of rules evaluate by popping the top of the block instruction stack.
- The rules \((\tt br0Block)\) and \((\tt br0Loop)\) handle the case in which a branch instruction is applied with 0.
    - When the top of the block instruction stack is a block instruction, the computation proceeds with the "continuation" of the block instruction, namely \(wis''\).
    - When the top of the stack is a loop instruction, the computation proceeds by going through another iteration of the loop.
- The rule \((\tt brN)\) handles the case in which the branch instruction's operand is a positive integer \(n\); the computation proceeds by evaluating the branch operation with \(n-1\) and with the top of the block instruction stack removed.
- The rules \((\tt brIfT0Block)\) and \((\tt brIfT0Loop)\) process the conditional branch instructions in a similar way to \((\tt br0Block)\) and \((\tt br0Loop)\), except that they are applicable only when register 0 is storing the value 1.
- The rule \((\tt brIfTN)\) is similar to \((\tt brN)\), except that it is applicable when register 0 holds the value 1.
- The rule \((\tt brIfF)\) is applied when register 0 holds the value 0; it skips the conditional branch.
The last set of rules handle the no-op and empty instruction sequence.
The \((\tt Nop)\) rule skips the \(nop\) instruction. The \((\tt empty)\) rule denotes the end of the current sequence and proceeds with the "following" instructions in the block instruction stack.
Conversion from PA to WASM bytecodes¶
A simple conversion from PA to WASM bytecodes can be described using the following deduction system.
Let \(M\) be a mapping from PA temporary variables to WASM local variables.
We have three types of rules.
- \(M \vdash lis \Rightarrow wis\), converts a sequence of PA labeled instructions to a sequence of WASM bytecode instructions.
- \(M \vdash_{src} s \Rightarrow wis\), converts a PA (source) operand into a sequence of WASM bytecode instructions.
- \(M(t)\), converts a PA variable into a WASM variable.
Converting PA labeled instructions¶
The rule \({\tt (wReturn)}\) converts the PA return instruction. It first converts the operand \(s\) into a sequence of WASM instructions \(wis_1\). At this stage, the content of \(s\) must have been loaded into register 0. Next we invoke the WASM \(return\).
The rule \({\tt (wMove)}\) handles the case of a move instruction. In this case we make use of the auxiliary rule \(M \vdash_{src} s \Rightarrow wis_1\) to convert the operand \(s\) into a loading instruction in WASM bytecode. The content of \(s\) should now be in the register. We make use of \(set\ M(t)\) to transfer the value into the WASM variable \(M(t)\). Details of these auxiliary rules can be found in the next subsection. Using recursion, we convert the instruction sequence \(lis\) into \(wis_2\).
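Based on this description, the \({\tt (wMove)}\) rule might be rendered as follows (a sketch; \(++\) denotes sequence concatenation, an assumed notation):

\({\tt (wMove)}\ \ \dfrac{M \vdash_{src} s \Rightarrow wis_1 \quad M \vdash lis \Rightarrow wis_2}{M \vdash l: t \leftarrow s; lis \Rightarrow wis_1 ++ [set\ M(t)] ++ wis_2}\)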
The rules \({\tt (wEqLoop)}\) and \({\tt (wEqIf)}\) deal with the scenarios in which the leading PA instructions are an equality test followed by a conditional jump. There are two sub-cases.
- When the target of the conditional jump, \(l_3\), has a preceding instruction \(goto\ l_4\) where \(l_4\) is the label of the equality test, we conclude that this PA sequence was translated from a loop in the source SIMP program. We apply an auxiliary function \(split(l_3, lis')\) to split \(lis'\) into two sub-sequences \(lis_1\) and \(lis_2\) at \(l_3\); \(lis_1\) must be the body of the loop, and \(lis_2\) contains the instructions following the loop. We compile the operands of the equality test into \(wis_1\) and \(wis_2\). We concatenate an \(eq\) instruction with an \(if\)-nested \(loop\) block. \(wis_3\) is the compiled WASM code of the loop body, and \(wis_4\) is the compiled code of the instructions following the loop.
- When the target of the conditional jump, \(l_3\), has a preceding instruction \(goto\ l_4\) where \(l_4\) is not the label of the equality test, we conclude that this PA sequence was translated from an if-else statement in the source SIMP program, and \(l_4\) should be the end of the if-else statement. This is because our maximal munch algorithm is structure-preserving, i.e., we insert jump labels at the end of the then and else branches.
The rule \((\tt wEq)\) handles a normal equality test which is not followed by a conditional jump.
The above rules \({\tt (wLtLoop)}\), \({\tt (wLtIf)}\) and \({\tt (wLt)}\) handle the less-than test. They are similar to their equality test counterparts described earlier.
Lastly, the rules \({\tt (wPlus)}\), \({\tt (wMinus)}\) and \({\tt (wMult)}\) convert the binary arithmetic operations. \({\tt (wGoto)}\) skips the goto instruction, which should be handled by the other rules.
Converting PA Source Operands¶
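The operand conversion rules might look as follows (a sketch based on the rule types above, where \(c\) ranges over constants and \(t\) over PA temporary variables):

\({\tt (wConst)}\ \ M \vdash_{src} c \Rightarrow [const\ c]\)

\({\tt (wVar)}\ \ M \vdash_{src} t \Rightarrow [get\ M(t)]\)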
Optimizing WASM bytecode¶
Though the room is limited, there are opportunities to optimize the WASM bytecode. For example,
From the following SIMP program
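for instance (an assumed SIMP fragment):

```
y = x + 1;
return y;
```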
we generate the following PA code via the Maximal Munch
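A sketch of the generated PA, where t is the temporary variable introduced by maximal munch:

```
1: t <- x + 1
2: y <- t
3: rret <- y
4: ret
```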
In turn, if we apply the above PA to WASM bytecode conversion, we obtain the following.
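A sketch of the conversion output:

```
get x
const 1
add
set t    ;; store into t
get t    ;; immediately load t back
set y
get y
return
```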
As observed, the set t followed by get t is redundant, because t is not needed afterwards (it is dead).
This elimination can be done via either
- liveness analysis at the PA level, or
- generating WASM bytecode directly from SIMP.
    - This requires the expressions of SIMP assignments to be left-nested.
    - The conversion is beyond the scope of this module.
Prettier WASM syntax - Folded Expression¶
WASM supports a prettier syntax, known as folded expression.
The earlier WASM example can be rewritten as the following in folded expression form.
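In the reduced syntax used here, the earlier example might be folded as follows (a sketch; real WASM folded expressions use forms such as (local.set $x (i32.add ...))):

```
(set x (get input))
(set y (add (get x) (const 1)))
(return (get y))
```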
In the project, we'll find the PA to WASM conversion rules re-phrased into the folded expression form.
Further Reading for WASM bytecode generation¶
- https://webassembly.org/
- https://developer.mozilla.org/en-US/docs/WebAssembly/Reference
Summary for WASM bytecode generation¶
- Generating WASM bytecode without optimization can be done via a deduction system.
- To optimize WASM bytecode, we could apply liveness analysis to eliminate redundant store-then-load sequence.