50.054 - Pseudo Assembly¶
Learning Outcomes¶
By the end of this lesson, you should be able to
- Describe the syntax of the source language SIMP.
- Describe the syntax of the intermediate representation language pseudo-assembly.
- Describe how a pseudo-assembly program is executed.
- Apply the Maximal Munch algorithm to generate pseudo-assembly code from a given SIMP program.
Recap the compiler pipeline¶
Recall the compiler pipeline
graph LR
A[Lexing] -->B[Parsing] --> C[Semantic Analysis] --> D[Optimization] --> E[Target Code Generation]
- Lexing
- Input: Source file in String
- Output: A sequence of valid tokens according to the language specification (grammar)
- Parsing
- Input: Output from the Lexer
- Output: A parse tree representing parsed result according to the parse derivation
And recall that a parse tree can be considered the first intermediate representation (IR). However, the parse tree is too close to the source level, which makes it hard to use for code generation. For now, let's fast forward to another IR that is closer to the target code; we refer to it as pseudo assembly. In this unit, we skip the semantic analysis and consider a direct translation from the source language (SIMP) to pseudo assembly.
The SIMP Language¶
We consider the syntax of SIMP as follows
For simplicity, we ignore functions and procedures. We assume a special variable \(input\) serving as the input argument to the program. We write \(\overline{S}\) to denote a sequence of statements. The \(return\) statement takes a variable instead of an expression. \(nop\) stands for a "no-op" statement, which performs no action. The rest of the syntax is very similar to Java and C, except that type annotations are omitted.
For example (Example SIMP1)
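The example program itself is rendered as a figure in the original notes. A SIMP program consistent with the execution trace given later in this unit (it sums the integers from 0 up to \(input - 1\)) would look like:

```
x = input;
s = 0;
c = 0;
while c < x {
    s = c + s;
    c = c + 1;
}
return s;
```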
Pseudo Assembly¶
We consider the Pseudo Assembly language as follows.
where \(li\), a labeled instruction, is a label \(l\) associated with an instruction \(i\). For simplicity, we use positive integers as labels.
An instruction is either a move operation (moving a value from a source operand \(s\) to a destination operand \(d\)), a binary move operation, a return instruction, a conditional jump instruction, or a jump instruction. Some non-syntactic restrictions exist, e.g. a constant cannot be used in a destination position. In Pseudo Assembly, we use 0 to denote false and 1 to denote true.
\(r_{ret}\) is a special register for the return statement.
Example (PA1)
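The PA listing is likewise rendered as a figure in the original notes. A pseudo-assembly program consistent with the execution trace below (a translation of the SIMP1 loop) would be:

```
1: x <- input
2: s <- 0
3: c <- 0
4: t <- c < x
5: ifn t goto 9
6: s <- c + s
7: c <- c + 1
8: goto 4
9: rret <- s
10: ret
```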
Informal Specification of Pseudo Assembly¶
We assume that the instructions of a pseudo-assembly program are stored in a list. There exists a mapping from labels to the corresponding instructions. When we execute a pseudo-assembly program, we use a program counter to keep track of the current execution context (i.e. the current labeled instruction being considered) and a local memory to keep track of the variable-to-value mapping.
For example, when we execute the above program with input = 2:
Program Counter | Local Memory | Next Instr |
---|---|---|
1 | {input: 2, x : 2} | 2 |
2 | {input: 2, x : 2, s : 0} | 3 |
3 | {input: 2, x : 2, s : 0, c : 0} | 4 |
4 | {input: 2, x : 2, s : 0, c : 0, t : 1} | 5 |
5 | {input: 2, x : 2, s : 0, c : 0, t : 1} | 6 |
6 | {input: 2, x : 2, s : 0, c : 0, t : 1} | 7 |
7 | {input: 2, x : 2, s : 0, c : 1, t : 1} | 8 |
8 | {input: 2, x : 2, s : 0, c : 1, t : 1} | 4 |
4 | {input: 2, x : 2, s : 0, c : 1, t : 1} | 5 |
5 | {input: 2, x : 2, s : 0, c : 1, t : 1} | 6 |
6 | {input: 2, x : 2, s : 1, c : 1, t : 1} | 7 |
7 | {input: 2, x : 2, s : 1, c : 2, t : 1} | 8 |
8 | {input: 2, x : 2, s : 1, c : 2, t : 1} | 4 |
4 | {input: 2, x : 2, s : 1, c : 2, t : 0} | 5 |
5 | {input: 2, x : 2, s : 1, c : 2, t : 0} | 9 |
9 | {input: 2, x : 2, s : 1, c : 2, t : 0, rret : 1} | 10 |
10 | {input: 2, x : 2, s : 1, c : 2, t : 0, rret : 1} | - |
For the time being, we use a table to illustrate the execution of a PA program. Each entry in the table has 3 fields: the program counter, the current local memory (a mapping from operands to values), and the next instruction. Move and binary operations update the local memory. For non-jump instructions, the next instruction is the current instruction's label incremented by 1. For goto, the next instruction is the one associated with the destination label. For a conditional jump, the next instruction depends on the source operand's value. We study the formal specification in the upcoming lessons.
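The informal semantics above can be sketched as a small interpreter. This is a minimal sketch, not the course's formal specification: the tuple encoding of instructions and the names `run_pa`, `mov`, `binop`, and `ifn` are illustrative assumptions.

```python
def run_pa(instrs, env):
    """Execute a mapping from labels to instructions until `ret`,
    returning the final local memory (operand -> value mapping)."""
    def val(s):
        # A source operand is either a variable name or a constant.
        return env[s] if isinstance(s, str) else s

    pc = 1                                    # program counter starts at label 1
    while True:
        instr = instrs[pc]
        op = instr[0]
        if op == "ret":                       # l: ret
            return env
        elif op == "mov":                     # l: d <- s
            _, d, s = instr
            env[d] = val(s)
            pc += 1
        elif op == "binop":                   # l: d <- s1 OP s2
            _, d, f, s1, s2 = instr
            env[d] = f(val(s1), val(s2))
            pc += 1
        elif op == "goto":                    # l: goto l'
            pc = instr[1]
        elif op == "ifn":                     # l: ifn s goto l' (jump when s is false/0)
            _, s, l = instr
            pc = l if val(s) == 0 else pc + 1

# A PA program consistent with the execution trace above.
add = lambda a, b: a + b
lt = lambda a, b: int(a < b)                  # comparisons yield 0/1
prog = {
    1: ("mov", "x", "input"),
    2: ("mov", "s", 0),
    3: ("mov", "c", 0),
    4: ("binop", "t", lt, "c", "x"),
    5: ("ifn", "t", 9),
    6: ("binop", "s", add, "c", "s"),
    7: ("binop", "c", add, "c", 1),
    8: ("goto", 4),
    9: ("mov", "rret", "s"),
    10: ("ret",),
}
print(run_pa(prog, {"input": 2})["rret"])     # prints 1, matching the final trace row
```

Running with input = 2 reproduces the last row of the trace above, where rret holds 1.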
Maximal Munch Algorithm¶
To convert a SIMP program into pseudo assembly, we consider the Maximal Munch Algorithm, which is described in terms of the set of deduction rules below.
In case we have an assignment statement \(X = E\), we call a helper function \(G_a\) to generate the Pseudo Assembly (PA) labeled instructions.
In case we have a return statement \(return\ E\), we make use of the same helper function \(G_a\) to generate the instructions of assigning \(E\) to the special register \(r_{ret}\). We then generate a new label \(l\), and append \(l:ret\) to the instructions.
In case we have a sequence of statements, we apply \(G_s\) recursively to the individual statements in order, then we merge all the results by concatenation.
In case we have an if-else statement, we
1. generate a fresh variable \(t\), and call \(G_a(t)(E)\) to convert the conditional expression into PA instructions.
2. generate a new label \(l_{IfCondJ}\) (conditional jump).
3. call \(G_s(S_2)\) to generate the PA instructions for the then branch.
4. generate a new label \(l_{EndThen}\) which is associated with the "end-of-then-branch" goto instruction.
5. peek into the label generator to find out what the next upcoming label is and refer to it as \(l_{Else}\).
6. call \(G_s(S_3)\) to generate the PA instructions for the else branch.
7. generate a new label \(l_{EndElse}\), which is associated with the "end-of-else-branch" goto instruction. (Note that we can assume the next instruction after this is the end of If, in case of nested if-else.)
8. peek into the label generator to find out what the next upcoming label is and refer to it as \(l_{EndIf}\)
In case we have a while statement, we
1. peek into the label generator to find out what the next upcoming label is and refer to it as \(l_{While}\), which can be used later as the reference for the backward jump.
2. generate a fresh variable \(t\), and call \(G_a(t)(E)\) to convert the conditional expression into PA instructions.
3. generate a new label \(l_{WhileCondJ}\) (conditional jump).
4. call \(G_s(S)\) to generate the PA instructions for the body.
5. generate a new label \(l_{EndBody}\) which is associated with the "end-of-loop-body" goto instruction.
6. peek into the label generator to find out what the next upcoming label is and refer to it as \(l_{EndWhile}\). (Note that we can assume the next instruction after this is the end of the while loop, in case of nested while statements.)
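The label bookkeeping in the steps above can be sketched as follows. `LabelGen`, `gen_while`, and the two callbacks are hypothetical helpers, not the lesson's formal rules; the condition and body generators are stubbed in the usage at the bottom.

```python
class LabelGen:
    """Fresh-label generator supporting the 'peek' the algorithm relies on."""
    def __init__(self):
        self.n = 1
    def next(self):
        l, self.n = self.n, self.n + 1
        return l
    def peek(self):
        return self.n                         # the next label, without consuming it

def gen_while(gen_cond, gen_body, lg):
    """Wire up a while loop, where gen_cond(t, lg) emits instructions
    computing the condition into temp t, and gen_body(lg) emits the body."""
    l_while = lg.peek()                       # 1. backward-jump target
    t = "t_cond"                              # 2. fresh condition temp (name illustrative)
    cond_instrs = gen_cond(t, lg)
    l_condj = lg.next()                       # 3. conditional-jump label
    body_instrs = gen_body(lg)                # 4. loop body
    l_endbody = lg.next()                     # 5. end-of-loop-body goto
    l_endwhile = lg.peek()                    # 6. first label after the loop
    return (cond_instrs
            + [(l_condj, ("ifn", t, l_endwhile))]
            + body_instrs
            + [(l_endbody, ("goto", l_while))])

# Stubbed condition (c < x) and body (s = c + s; c = c + 1):
lg = LabelGen()
lis = gen_while(
    lambda t, lg: [(lg.next(), ("binop", t, "<", "c", "x"))],
    lambda lg: [(lg.next(), ("binop", "s", "+", "c", "s")),
                (lg.next(), ("binop", "c", "+", "c", 1))],
    lg,
)
# lis lays out labels 1..5, with the conditional jump to 6 and a goto back to 1.
```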
The above summarizes all the cases of \(G_s(S)\). We now consider the sub-algorithm \(G_a(d)(E)\), which takes a destination operand \(d\) and a SIMP expression \(E\) and returns a list of labeled instructions.
The case of the \(nop\) statement is straightforward.
In the above rule, given a SIMP variable \(X\) and a constant \(C\), we generate a labeled instruction \(X \leftarrow c\), where \(c\) is the PA constant converted from its SIMP counterpart through the \(conv()\) function.
For simplicity, we omitted the conversion from the SIMP variable to the IR temp variable.
In the above rule, we generate a labeled instruction for the case of assigning a SIMP variable \(Y\) to another variable \(X\). The treatment is similar to the case of \({\tt (mConst)}\).
In the rule \({\tt (mParen)}\), we generate the IR labeled instructions by calling the generation algorithm recursively with the inner expression.
The above rule handles the case where the RHS of the SIMP assignment statement is a binary operation \(E_1\ OP\ E_2\). We generate two temp variables \(t_1\) and \(t_2\), and apply the generation function recursively to \(E_1\) and \(E_2\). Finally, we concatenate the results \(lis_1\) and \(lis_2\) with the binary move operation \(X \leftarrow t_1\ OP\ t_2\).
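Putting the \(G_a\) rules together, a minimal sketch in Python might look like this. The tuple encoding of expressions and the helper `fresh_gens` are assumptions for illustration, not the course's notation.

```python
import itertools

def fresh_gens():
    """Hypothetical fresh-name suppliers: labels 1, 2, 3, ... and temps t1, t2, ..."""
    labels = itertools.count(1)
    temps = (f"t{i}" for i in itertools.count(1))
    return labels, temps

def g_a(dest, expr, labels, temps):
    """G_a(d)(E): labeled instructions that assign E's value to dest.
    Expressions: ("const", n) | ("var", x) | ("paren", e) | ("op", op, e1, e2)."""
    tag = expr[0]
    if tag in ("const", "var"):               # (mConst)/(mVar): l: d <- c or l: d <- y
        return [(next(labels), ("mov", dest, expr[1]))]
    if tag == "paren":                        # (mParen): recurse on the inner expression
        return g_a(dest, expr[1], labels, temps)
    if tag == "op":                           # (mOp): fresh t1, t2, recurse, then binop
        _, op, e1, e2 = expr
        t1, t2 = next(temps), next(temps)
        lis1 = g_a(t1, e1, labels, temps)
        lis2 = g_a(t2, e2, labels, temps)
        return lis1 + lis2 + [(next(labels), ("binop", dest, op, t1, t2))]
    raise ValueError(f"unknown expression: {expr!r}")

labels, temps = fresh_gens()
lis = g_a("y", ("op", "+", ("var", "x"), ("const", 1)), labels, temps)
# Each operand costs its own move instruction before the binary move:
# [(1, ("mov", "t1", "x")), (2, ("mov", "t2", 1)), (3, ("binop", "y", "+", "t1", "t2"))]
```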
For example, given the source in Example SIMP1, we apply the above algorithm and observe the following derivation.
- Firstly we apply the \({\tt (mSequence)}\) rule to the individual statements,
The derivation for \(G_s(x = input)\) is trivial; we apply the \({\tt (mAssign)}\) rule.
and
- Next we consider the while statement
- Finally we convert the return statement
Putting 1,2,3 together
As we observe, we don't quite get the exact output as in example PA1. The main reason is that we generate extra steps due to the \({\tt (mOp)}\) rule (in which each operand of the binary operator takes up a new instruction).
Maximal Munch Algorithm V2¶
Since the \({\tt (mOp)}\) rule is the culprit behind the extra move steps in the generated IR, we consider a variant of the Maximal Munch Algorithm. Instead of using \(G_a(X)(E)\) to generate labeled instructions \(lis\), we use a different sub-system \(G_e(E)\) to generate a pair of results \((\hat{e}, \check{e})\), where \(\check{e}\) is a sequence of labeled instructions generated from \(E\) and \(\hat{e}\) is the "result" operand storing the final result of \(\check{e}\).
The adjusted \({\tt (mConst)}\), \({\tt (mVar)}\) and \({\tt (mOp)}\) rules are as follows,
The rules \({\tt (m2Const)}\) and \({\tt (m2Var)}\) are simple. We just return the constant (variable) as \(\hat{e}\) with an empty list of labeled instructions.
In the rule \({\tt (m2Paren)}\), we generate the results by recursively applying the algorithm to the inner expression.
In the \({\tt (m2Op)}\) rule, we call \(G_e(\cdot)\) recursively to generate the results for \(E_1\) and \(E_2\), namely \((\hat{e}_1, \check{e}_1)\) and \((\hat{e}_2, \check{e}_2)\). We then use them to synthesize the final output.
The \(G_s(S)\) rules are now calling \(G_e(E)\) instead of \(G_a(X)(E)\).
By comparing this version with the first one, we note that it reduces the number of labels as well as the number of temp variables created throughout the conversion from SIMP to PA.
For example, if we apply the optimized version of Maximal Munch to the example SIMP1, we should obtain example PA1 as result.
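The \(G_e\) variant can be sketched under the same illustrative tuple encoding as before. Note how constants and variables no longer cost an instruction, so `x + 1` compiles to a single binary move instead of three instructions.

```python
import itertools

def fresh_gens():
    """Hypothetical fresh-name suppliers: labels 1, 2, 3, ... and temps t1, t2, ..."""
    labels = itertools.count(1)
    temps = (f"t{i}" for i in itertools.count(1))
    return labels, temps

def g_e(expr, labels, temps):
    """G_e(E): returns (e_hat, e_check) — the result operand and the
    labeled instructions that compute it."""
    tag = expr[0]
    if tag in ("const", "var"):               # (m2Const)/(m2Var): no instructions
        return expr[1], []
    if tag == "paren":                        # (m2Paren): recurse on the inner expression
        return g_e(expr[1], labels, temps)
    if tag == "op":                           # (m2Op): one temp, one binary move
        _, op, e1, e2 = expr
        r1, lis1 = g_e(e1, labels, temps)
        r2, lis2 = g_e(e2, labels, temps)
        t = next(temps)
        return t, lis1 + lis2 + [(next(labels), ("binop", t, op, r1, r2))]
    raise ValueError(f"unknown expression: {expr!r}")

labels, temps = fresh_gens()
res, lis = g_e(("op", "+", ("var", "x"), ("const", 1)), labels, temps)
# res == "t1" and lis == [(1, ("binop", "t1", "+", "x", 1))] — one instruction
# where the first version produced three.
```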