50.054 - Dynamic Semantics¶
Learning Outcomes¶
- Explain the small step operational semantics of a programming language.
- Explain the big step operational semantics of a programming language.
- Formalize the run-time behavior of a programming language using small step operational semantics.
- Formalize the run-time behavior of a programming language using big step operational semantics.
Recall that by formalizing the dynamic semantics of a program we are keen to find out
- How does the program get executed?
- What does the program compute / return?
Operational Semantics¶
Operational Semantics specifies how a program get executed.
For example, in the earlier unit, when studying lambada expression, we made use of the \(\beta\)-reduction, the substitution and alpha renaming rules to formalize the execution of a simple lambda expression. As the language grows to include let-binding, conditional expression, we extend the set of rules to include \({\tt (Let)}\), \({\tt (IfI)}\), \({\tt (IfT)}\) and \({\tt (IfF)}\). The set of rules in this example defines the operational semantics of the programming language lambda expression. We can apply these rules to "evaluate" a lambda expression by rewriting it by picking a matching rule (w.r.t to the LHS) and turn it into the form of the RHS. This style of semantics specification is called the small step operational semantics as we only specify the intermediate result when we apply a rule.
As we are going to design and implement a compiler for the SIMP language, it is essential to find out how a SIMP program gets executed.
To formalize the execution of SIMP program, we can define a set of rewriting rules similar to those for lambda calculus. We need to consider different cases.
Small-Step Operational Semantics of SIMP¶
Let's try to formalize the Operational Semantics of SIMP language,
$$ \begin{array}{rccl} (\tt SIMP Environment) & \Delta & \subseteq & (X \times C) \end{array} $$ We model the memory environment of a SIMP program as pair of variable and values. We write \(dom(\Delta)\) to denote the domain of \(\Delta\), i.e. \(\{ X \mid (X,C) \in \Delta \}\). We assume for all \(X \in dom(\Delta)\), there exists only one entry of \((X,C) \in \Delta\).
Given \(S\) is a set of pairs, we write \(S(x)\) to denote \(a\) if \((x,a) \in S\), an error otherwise. We write \(S \oplus (x,a)\) to denote \(S - \{(x, S(x))\} \cup \{(x, a)\}\).
We define the operational semantics of SIMP with two sets of rules.
The first set of rules deal with expression.
Small Step Operational Semantics of SIMP Expression¶
The set of small step operational semantics for expressions is defined in a relation \(\Delta \vdash E \longrightarrow E'\).
The \({\tt (sVar)}\) rule looks up the value of variable \(X\) from the memory environment. If the variable is not found, it gets stuck and an error is returned.
The above three rules handle the binary operation expression.
- \({\tt (sOp1)}\) matches with the case where both operands are not constant values. It evalues the first operand by one step.
- \({\tt (sOp2)}\) matches with the case where the first operand becomes constant, it evaluates the second operand by one step.
- \({\tt (sOp3)}\) matches with the case where both operands are constant. It returns the result by applying the binary operation to the two constant values.
The rules \({\tt (sParen1)}\) and \({\tt (sParent2)}\) evaluate an expression enclosed by parantheses.
Small Step Operational Semantics of SIMP statement¶
The small step operational semantics of statements are defined by the relation \((\Delta, S) \longrightarrow (\Delta', S')\). The pair of a environment and a statement is called a program configuration.
The rules \({\tt (sAssign1)}\) and \({\tt (sAssign2)}\) handle the assignment statements.
- \({\tt (sAssign1)}\) matches with the case that the RHS of the assignment is not a constant, it evaluates the RHS expression by one step.
- \({\tt (sAssign2)}\) matches with the case that the RHS is a constant, it updates the environment by setting \(C\) as the new value of variable \(X\). The statement of the resulting configuration a \(nop\).
The rules \({\tt (sIf1)}\), \({\tt (sIf2)}\) and \({\tt (sIf3)}\) handle the if-else statement.
- \({\tt (sIf1)}\) matches with the case where the condition expression \(E\) is not a constant value. It evaluates \(E\) to \(E'\) one step.
- \({\tt (sIf2)}\) matches with the case where the condition expression is \(true\), it proceeds to evaluate the statements in the then clauses.
- \({\tt (sIf3)}\) matches with the case where the condition expression is \(false\), it proceeds to evaluate the statements in the else clauses.
The rule \({\tt (sWhile)}\) evaluates the while statement by reiwrting it into a if-else statement.
- In the then branch, we unroll the while loop body once followed by the while loop.
- In the else branch, we should exit the while loop thus, a \(nop\) statement is used.
The rules \({\tt (sNopSeq)}\) and \({\tt (sSeq)}\) handle a sequence of statements.
- \({\tt (sNopSeq)}\) rule handles the special case where the leading statement is a \(nop\).
- \({\tt (Seq)}\) rule handles the case where the leading statement is not a \(nop\). It evalues \(S\) by one step.
For example,
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 |
|
At last the derivation stop at the return statement. We can return the value 0
as result.
Big Step Operational Semantics¶
Small step operational semantics defines the run-time behavior of programs step by step (kinda like slow motion.) Some times we want to define the run-time behaviors by "fast-forwarding" to the result. This leads us to the big step operatinal semantics. Big step operatinal semantics in some literature is also called the structural operational semantics as it leverages on the syntactic structure of the program.
Big Step Operational Semantics for SIMP expressions¶
We define the big step oeprational semantics for SIMP expressions via a relation \(\Delta \vdash E \Downarrow C\), which reads under the memory environment \(\Delta\) the expression \(E\) is evaluated constant \(C\).
We consider the following three rules
In case that the expression is a constant, we return the constant itself.
In case that the expression is a variable \(X\), we return the value associated with \(X\) in \(\Delta\).
in case that the expression is a binary operation, we evaluate the two operands to values and apply the binary operation to the constant values.
the last rule \({\tt (bParen)}\) evaluates an expression enclosed by parantheses.
Big Step Operational Semantics for SIMP statements¶
We define the big step operational semantics for SIMP statement using a relation \((\Delta, S) \Downarrow \Delta'\), which says the program configuration \((\Delta, S)\) is evaluated to result memory environment \(\Delta'\) assuming \(S\) is terminating under \(\Delta\). Note that big step operational semantics for SIMP statement can only defines the behavior of terminating program configurations.
We consider the following rules
In case that the statement is an assignment, we evaluate the RHS expression to a constant value \(c\) and update the memory environment.
In case that the statement is an if-else statement, we evaluate \(\overline{S_1}\) if the conditional expression is \(true\), otherwise evaluate \(\overline{S_2}\).
In case that the statment is a while loop. We evaluate the body followed by the while loop again when the loop condition expression is \(true\), otherwise, we exit the while loop and return the existing memory environment.
In case that the statement is a nop
statement, there is no change to the memory environment.
In case of a sequence of statement, we evaluate the leading statement to an updated environment and use the updated environment to evaluate the following statements.
For example, the following derivation (tree) is the evaluate of our running example using the big step operational semantics. The reason of having a tree derivation as we are evaluating the SIMP program to the final result directly by evaluating its sub components recursively / inductively.
where sub derivation[sub tree 1]
is as follows
where
[sub tree 2]
is
where
[sub tree 3]
is
where
[sub tree 4]
is
where
[sub tree 5]
is
Quick Summary: Small step vs Big Step operational semantics¶
Small step operational semantics | Big step operational semantics | |
---|---|---|
mode | one step of change at a time | many steps of changes at a time |
derivation | it is linear | it is a tree |
cons | it is slow-paced and lengthy, requires more rules | it is a fast-forward version, requirews fewer rules |
pros | it is expressive, supports non-terminiating program | it assumes program is terminating |
Formal Results¶
We use \(\longrightarrow^*\) to denote multiple steps of derivation with \(\longrightarrow\).
Lemma 1 (Agreement of Small Step and Big Step Operational Semantics of SIMP)¶
Let \(\overline{S}\) be a SIMP program, \(\Delta\) be a memory environment. Then \(\Delta, \overline{S} \Downarrow \Delta'\) iff \((\Delta, \overline{S}) \longrightarrow^* (\Delta', return\ X)\) for some \(X\).
Proof of this lemma requires some knowledge which will be discussed in the upcoming classes.
Operational Semantics of Pseudo Assembly¶
Next we consider the operational semantics of pseudo assembly.
Let's define the environments required for the rules.
We use \(P\) to denote a PA program, which is a mapping from label to labeled instructions. We use \(L\) to denote a memory environment which is a mapping from temp variable or register to constant values.
Small Step Operational Semantics of Pseudo Assembly¶
The dynamic semantics of the pseudo assembly program can be defined using a rule of shape \(P \vdash (L, li) \longrightarrow (L', li')\), which reads, given a PA program \(P\), the current program context \((L,li)\) is evaluated to \((L', li')\). Note that we use a memory environment and program label instruction pair to denote a program context.
In the \({\tt (pConst)}\) rule, we evaluate an assignment instruction of which the RHS is a constant. We update the value of the LHS in the memory environment as \(c\) and move on to the next instruction.
In the \({\tt (pRegister)}\) and the \({\tt (pTempVar)}\) rules, we evaluate an assignment instruction of which the RHS is a register (or a temp variable). We look up the value of the register (or the temp variable) from the memory environment and use it as the updated value of the LHS in the memory environment. We move on to the next label instruction.
The \({\tt (pOp)}\) rule handles the case where the RHS of the assignment is a binary operation. We first look up the values of the operands from the memory environment. We then apply the binary operation to the values. The result will be used to update the value of the LHS in the memory environment.
The rules \({\tt (pIfn0)}\) and \({\tt (pIfnNot0)}\) deal with the conditional jump instruction. We first look up the conditional operand's value in the memory environment. If it is 0, we ignore the jump and move on to the next instruction, otherwiwse, we perform a jump but changing the program context to the target label instruction.
The rule \({\tt (pGoto)}\) jumps to to the target label instruction.
Note that there is no rule for \(ret\) as the program execution will stop there. Further more, the set of rules does not mention the scenario in which the look up of a register (or a temp variable) in the environment fails. In these cases, the program exit with an error.
For example, let \(P\) be
and \(input = 1\).
We have the following derivation
Formal Results¶
Definition: Consistency of the memory environments¶
Let \(\Delta\) be a SIMP memory environment and \(L\) be a pseudo assembly memory environment. We say \(\Delta\) is consistent with \(L\) (written \(\Delta \Vdash L\)), iff
- \(\forall (x,v) \in \Delta\), \((x,conv(v)) \in L\), and
- \(\forall (y,u) \in L\), \((y, v) \in \Delta\) where \(u=conv(v)\).
Lemma: Correctness of the Maximal Munch Algorithm¶
Let \(S\) and \(S'\) be SIMP program statements. Let \(\Delta\) and \(\Delta'\) be SIMP memory environments such that \((\Delta, S) \longrightarrow (\Delta', S')\). Let \(P\) be a pseudo assembly program such that \(G_s(S) = P\). Let \(L\) and \(L'\) be pseudo assembly memory enviornments. Let \(\Delta \Vdash L\). Then we have \(P \vdash (L, l:i) \longrightarrow (L', l':i')\) and \(\Delta' \Vdash L'\)
Proof¶
Since the \(S\) could be a non-terminating program, the derivation via small step operational semantics could be infinite. We need a co-inductive proof, which is beyond the scope of this module. We will only discuss about this when we have time.
What about big step operational semantics of Pseudo Assembly?¶
As Pseudo Assembly is a flatten language with goto statement, there is no nesting of statement or expression. There is no much value in defining the big step operatnal semantics, i.e. there is no way to "fast-forward" a sub statement / a sub expression per se.
If you are interested in details of big step operational semantics, you may refer to this paper, which presented the operational and denotational semantics with a language with GOTO (more structures than our Pseudo Assembly.)
Denotational Semantics (Optional Materials)¶
Next we briefly discuss another form of dynamic semantics specification. Denotational Semantics aims to provide a meaning to a program. The "meaning" here is to find the result returned by the program. Now we may argue that is it the same as the big step operational semantics? There is some difference between the denotational semantics and big step operational semantics. We will defer the discussion and comparison towards the end of this unit.
In denotational semantics, the "meaning" of a program is given by a set of semantic functions. These functions are mapping program objects from the syntactic domain to math objects in the semantic domain.
Syntactic Domains¶
In many cases, the syntactic domains are defined by the grammar rules.
For SIMP program, we have the following syntactic domains.
- \(S\) denotes the domain of all valid single statement
- \(E\) denotes the domain of all valid expressions
- \(\overline{S}\) denotes the domain of all valid sequence statements
- \(OP\) denotes the domain of all valid operators.
- \(C\) denotes the domain of all constant values.
- \(X\) denotes the domain of all variables.
Semantic Domains¶
- \(Int\) denotes the set of all integers values
- \(Bool\) denotes the set of \(\{true, false\}\)
- Given that \(D_1\) and \(D_2\) are domains, \(D_1 \times D_2\) denotes the cartesian product of the two.
- Given that \(D_1\) and \(D_2\) are domains, \(D_1 \cup D_2\) denotes the union and \(D_1 \cap D_2\) denotes the intersection.
- Given that \(D_1\) and \(D_2\) are domains, \(D_1 \rightarrow D_2\) denotes a functional mapping from domain \(D_1\) to domain \(D_2\).
- Note that \(D_1 \rightarrow D_2 \rightarrow D_3\) is intepreted as \(D_1 \rightarrow (D_2 \rightarrow D_3)\).
- Given that \(D\) is a domain, \({\cal P}(D)\) denots the power set of \(D\).
Denotational Semantics for SIMP expressions¶
The denotational semantics for the SIMP expression is defined by the following semantic functions.
Let \(\Sigma = {\cal P} (X \times (Int\cup Bool))\)
The signature of the semantic function indicates that we map a SIMP expression into a function that takes a memory environment and returns a contant value.
Implicitly, we assume that there exists a builtin semantic function that maps operator symbols to the (actual) semantic operators, i.e., \([\![ + ]\!]\) gives us the sum operation among two integers. Sometimes we omit the parenthesis for function application when there is no ambiguity, e.g. \({\mathbb E}[\![ E]\!]\sigma\) is the short hand for \(({\mathbb E}[\![ E]\!])(\sigma)\)
As we observe, \({\mathbb E}[\![ \cdot ]\!]\) takes an object from the expression syntactic domain and a memory store object from the domain of \(\Sigma\), returns a value frmo the union of \(Int\) and \(Bool\) semantic domains.
Denotational Semantics for SIMP statements¶
To define the denotational semantics, we need some extra preparation, in order to support non-terminating programs.
Let \(\bot\) be a special element, called undefined, that denotes failure or divergence. Let \(f\) and \(g\) be functions, we define
which is a function composition that propogates \(\bot\) if present. Now we define the semantic function for SIMP statements.
The signature of the semantic function indicates that we map a SIMP statement into a function that takes a memory environment and returns another memory environment or divergence.
In case of \(nop\) and return statement, the semantic function returns an identiy function. In case of an assignment, the semantic function takes an memory environment object and update the binding of \(X\) to the meaning of \(E\). In case of sequence statements, the semantic function returns a \(\bot\)-function composition of the semantic function of the leading statement and the semantic function of the the trailing statements. In case of if-else statement, the semantic function returns the semantics of the then or the else branch statement depending on the meaning of the condition expression. In case of while statement, the semantic function returns a fixed point function. This is due to the fact that the underlying domain theory framework we are using does not support recursion. Hence a fixed point operator \(fix\) is used, which is kind of like recursion, (as we learnd in lambda caluclus), and it is more expresive as it gives a fixed term notiation for a sequence of infinitely many function objects applications. To help our understanding, we give a cheating version as if recursive function is supported in the underlying domain theory framework and we are allow to refer to a function application as a name function, we would have
which means the function \(g\) in the earlier version is a recursive reference to \({\mathbb S} [\![ while \ E\ \{\overline{S}\} ]\!]\)
For example, let \(\sigma = \{ (input, 1)\}\)
where
Let's consider another example of a non-terminating program, we can't use the cheating version here as it would gives the infinite sequence of function compositions. Let \(\sigma = \{(input, true)\}\)
where
Since \({\mathbb E}[\![ input ]\!]\sigma\) is always \(true\),
With some math proof, we find that \(fix(F)\) is function of type \(\Sigma \rightarrow \bot\). We won't be able to discuss the proof until we look into lattice theory in the upcoming classes.
In simple term, using the \(fix\) operator to define the while statement denotational semantics allows us to "collapse" the infinite sequence of function composition/application into a fixed point, which is a non-terminating function.
Denotational Semantics vs Big Step operational Semantics vs Small Step Semantics¶
support non-terminating programs | don't support non-terminating programs | |
---|---|---|
focused on the step by step derivation | Small Step Operational Semantics | |
focused on the returned results | Denotational Semantics | Big Step Operational Semantics |
Denotational Semantics is often used characterizing programming language model in a compositional way. It allows us to relates syntax objects to semantic objects. For example, if we want to argue that two languages are equivalent, we can map their syntax objects into the same semantic objects. We could also use denotational semantics to reason about concurrency.