50.054 Static Semantics For SIMP¶
Learning Outcomes¶
- Explain what static semantics is.
- Apply type checking rules to verify the type correctness property of a SIMP program.
- Explain the relation between type system and operational semantics.
- Apply type inference algorithm to generate a type environment given a SIMP program.
What is static semantics?¶
While dynamic semantics defines the run-time behavior of the given program, static semantics defines the compile-time properties of the given program.
For example, a statically correct program, must satisfy some properties
- all uses of variables in it must be defined somewhere earlier.
- all the use of variables, the types must be matching with the expected type in the context.
- ...
Here is a statically correct SIMP program,
because it satifies the first two properties.
The following program is not statically correct.
Static checking is to rule out the statically incorrect programs.
Type Checking for SIMP¶
We consider the type checking for SIMP programs.
Recall the syntax rules for SIMP
We use the symbol \(\Gamma\) to denote a type environments mapping SIMP variables to types. \(T\) to denote a type. We write \(dom(\Gamma)\) to denote the domain of \(\Gamma\), i.e. \(\{ X \mid (x,T) \in \Gamma \}\). We assume for all \(x \in dom(\Gamma)\), there exists only one entry of \((x,T) \in \Gamma\).
We define two different relations,
- \(\Gamma \vdash E : T\), which type-checks a SIMP expresion \(E\) against a type \(T\) under \(\Gamma\).
- \(\Gamma \vdash \overline{S}\), which type-checks a SIMP statement sequence \(\overline{S}\) under \(\Gamma\).
Type checking rules for SIMP Expressions¶
In the rule \({\tt (tVar)}\), we type check the variable \(X\) having type \(T\) under the type environment \(\Gamma\) if we can find the entry \((X,T)\) in \(\Gamma\).
In the rule \({\tt (tInt)}\), we type check an integer constant having type \(int\). Similarly, we type check a boolean constant having type \(bool\).
In the rule \({\tt (tOp1)}\), we type check an integer arithmetic operation having type \(int\), if both operands can be type-checked against \(int\). In the rule \({\tt (tOp2)}\), we type check an integer comparison operation having type \(bool\), if both operands can be type-checked against \(int\). In the rule \({\tt (tOp3)}\), we type check a boolean comparison operation having type \(bool\), if both operands can be type-checked against \(bool\).
Lastly in rule \({\tt (tParen)}\), we type check a parenthesized expression by type-checking the inner expression.
Type Checking rules for SIMP Statements¶
The typing rules for statement is in form of \(\Gamma \vdash \overline{S}\) instead of \(\Gamma \vdash \overline{S} : T\), this is because statements do not return a value (except for return statement, which returns a value for the entire program.)
The \({\tt (tSeq)}\) rule type checks a non empty sequence of statement \(S \overline{S}\) under the type environment \(\Gamma\). It is typeable (a proof exists) iff if \(S\) is typeable under \(\Gamma\) and \(\overline{S}\) is typeable under \(\Gamma\).
The \({\tt (tAssign)}\) rule type checks an assignment statement \(X = E\) under \(\Gamma\). It is typeable if both \(X\) and \(E\) are typeable under \(\Gamma\) respectively and their types agree.
The \({\tt (tReturn)}\) rule type checks the return statement. It is typeable, if the variable \(X\) is typeable. The \({\tt (tNop)}\) rule type checks the nop statement, which is always typeable.
The \({\tt (tIf)}\) rule type checks the if-else statement, \(if\ E\ \{\overline{S_1}\}\ else\ \{ \overline{S_2} \}\). It is typeable if \(E\) has type \(bool\) under \(\Gamma\) and both then- and else- branches are typeable under the \(\Gamma\). The \({\tt (tWhile)}\) rule type checks the while statement in a similar way.
We say that a SIMP program \(\overline{S}\) is typeable under \(\Gamma\), i.e. it type checks with \(\Gamma\) iff \(\Gamma \vdash \overline{S}\). On the other hand, we say that a SIMP program \(\overline{S}\) is not typeable, i.e. it does not type check, iff there exists no \(\Gamma\) such that \(\Gamma \vdash \overline{S}\).
Let \(\Gamma = \{ (input, int), (x,int), (s,int) \}\), we consider the type checking derivation of
Where [sub tree 1] is
Note that the following two programs are not typeable.
The above is untypeable because we use x of typeint
in a context where it is also expected as bool
.
The above is unteable because we can't find a type environment which has both (y,int)
and (y,bool)
.
So far these two "counter" examples are bad programs. However we also note that our type system is too conservative.
Even though we note that when x > 1
, we have x * x * x < x * x == false
hence the statement y = true
is not executed. Our type system still rejects this program.
We will discuss this issue in details in the upcoming units.
Let's connect the type-checking rules for SIMP with it dynamic semantics.
Definition 1 - Type and Value Environments Consistency¶
We say \(\Gamma \vdash \Delta\) iff for all \((X,C) \in \Delta\) we have \((X,T) \in \Gamma\) and \(\Gamma \vdash C : T\).
It means the type environments and value environments are consistent.
Property 2 - Progress¶
The following property says that a well typed SIMP program must not be stuck until it reaches the return statement.
Let \(\overline{S}\) be a SIMP statement sequence. Let \(\Gamma\) be a type environment such that \(\Gamma \vdash \overline{S}\). Then \(\overline{S}\) is either 1. a return statement, or 1. a sequence of statements, and there exist \(\Delta\), \(\Delta'\) and \(\overline{S'}\) such that \(\Gamma \vdash \Delta\) and \((\Delta, \overline{S}) \longrightarrow (\Delta', \overline{S'})\).
Property 3 - Preservation¶
The following property says that the evaluation of a SIMP program does not change its typeability.
Let \(\Delta\), \(\Delta'\) be value environments. Let \(\overline{S}\) and \(\overline{S'}\) be SIMP statement sequences such that \((\Delta, \overline{S}) \longrightarrow (\Delta', \overline{S'})\). Let \(\Gamma\) be a type environment such that \(\Gamma \vdash \Delta\) and \(\Gamma \vdash \overline{S}\). Then \(\Gamma \vdash \Delta'\) and \(\Gamma \vdash \overline{S'}\).
What is Type Inference¶
Type inference is also known as type reconstruction is a static semantics analysis process that aims to reconstruct the missing (or omitted) typing info from the source programs.
For example, given the Haskell function
the compiler is able to deduce that the type of f
is Int -> Int
.
Likewise for the following SIMP program
y
is a of type int
.
What we aim to achieve is a sound and systematic process to deduce the omitted type information.
Type inference for SIMP program¶
Given a SIMP program \(\overline{S}\), the goal of type inference is to find the "best" type environment \(\Gamma\) such that \(\Gamma \vdash \overline{S}\).
Given that \(\Gamma\) is a set of variable to type mappings, the "best" can be defined as the smallest possible set that make \(\overline{S}\) typeable. This is also called the most general solution.
Definition - Most general type (envrionment)¶
Let \(\Gamma\) be type environment and \(\overline{S}\) be a sequence of SIMP statements, such that \(\Gamma \vdash \overline{S}\). \(\Gamma\) is the most general type environment iff for all \(\Gamma'\) such that \(\Gamma' \vdash \overline{S}\) we have \(\Gamma \subseteq \Gamma'\).
Type Inference Rules¶
We would like to design type inference process using a deduction system. First of all, let's introduce some extra meta syntax terms that serve as intermediate data structures.
Where \(\alpha\) denotes a type variable. \(\kappa\) define a set of pairs of extended types that are supposed to be equal, e.g. \(\{ (\alpha, \beta), (\beta, int) \}\) means \(\alpha = \beta \wedge \beta = int\).
Type substititution replace type variable to some other type.
Type substiution can be compositional.
The SIMP type inference rules are defined in terms of a deduction system consists of two type of rule forms.
Type Inference Rules for SIMP statements¶
The type inference rules for SIMP statements are described in a form of \(\overline{S} \vDash \kappa\), which reads give a sequence of statements \(\overline{S}\), we generate a set of type constraints \(\kappa\).
The \({\tt (tiNOP)}\) rule handles the \(nop\) statement, an empty constraint set is returned. Similar observation applies to the return statement.
The \({\tt (tiSeq)}\) rule generates the type constraints of a sequence statement \(S\overline{S}\). We can do so by first generate the constraints \(\kappa_1\) from \(S\) and \(\kappa_2\) from \(\overline{S}\) and union \(\kappa_1\) and \(\kappa_2\).
The inference rule for assignment statement requires the premise \(E \vDash \hat{T}, \kappa\), the inference for the expression \(E\) returning the type of \(E\) and a constraint set \(\kappa\), which will be discussed shortly. The \({\tt (tiAssign)}\) rule "calls" the expression inference rule to generate the type \(\hat{T}\) and the constraints \(\kappa\), it prepends an entry \((\alpha_X,\hat{T})\) to \(\kappa\) to ensure that \(X\)'s type and the type of the assignment's RHS must agree.
The inference rule for if-else statatement first infers the type of the conditional expression \(E\)'s type has \(\hat{T_1}\) and the constraints \(\kappa_1\). \(\kappa_2\) and \(\kappa_3\) are the constraints inferred from the then- and else-branches. The final result is forming a union of \(\kappa_1\), \(\kappa_2\) and \(\kappa_3\), in addition, requiring \(E\)'s type must be \(bool\).
The inference for while statement is very similar to if-else statement. We skip the explanation.
Type Inference Rules for SIMP expressions¶
The type inference rules for the SIMP expressions are defined in a form of \(E \vDash \hat{T}, \kappa\).
When the expression is an integer constant, we return \(int\) as the inferred type and an empty constraint set. Likewise for boolean constant, we return \(bool\) and \(\{\}\).
The \({\tt (tiVar)}\) rule just generates a "skolem" type variable \(\alpha_X\) which is specifically "reserved" for variable \(X\). A skolem type variable is a type variable that is free in the current context but it has a specific "purpose".
For detailed explanation of skolem variable, refer to https://stackoverflow.com/questions/12719435/what-are-skolems and https://en.wikipedia.org/wiki/Skolem_normal_form.
The rules \({\tt (tiOp1)}\) and \({\tt (tiOp2)}\) infer the type of binary operation expressions. Note that they can be broken into 6 different rules to be syntax-directed. \({\tt (tiOp1)}\) is applied when the operator is an arithmethic operation, the returned type is \(int\) and the inferred constraint set is the union of the constraints inferred from the operands plus the entries of enforcing both \(\hat{T_1}\) and \(\hat{T_2}\) are \(int\). \({\tt (tiOp2)}\) supports the case where the operator is a boolean comparison.
The inference ruel for parenthesis expression is trivial, we infer the type from the inner expression.
Unification¶
To solve the set of generated type constraints from the above inference rules, we need to use a unification algorithm.
The \(mgu(\cdot, \cdot)\) function generates a type substitution that unifies the two arguments. \(mgu\) is a short hand for most general unifier. Note that \(mgu\) function is a partial function, cases that are not mentioned in the above will result in a unification failure.
At the moment \(mgu\) only unifies two extended types. We overload \(mgu()\) to apply to a set of constraints as follows
There are two cases.
- the constraint set is empty, we return the empty (identity) substitution.
- the constriant set is non-empty, we apply the first version of \(mgu\) to unify one entry \((\hat{T_1}, \hat{T_2})\), which yields a subsitution \(\Psi_1\). We apply \(\Psi_1\) to the rest of the constraints \(\kappa\) to obtain \(\kappa'\). Next we apply \(mgu\) to \(\kappa'\) recursively to generate another type substitution \(\Psi_2\). The final result is a composition of \(\Psi_2\) with \(\Psi_1\).
Note that the choice of the particular entry \((\hat{T_1}, \hat{T_2})\) does not matter, the algorithm will always produce the same result when we apply the final subsitution to all the skolem type variable \(\alpha_X\). We see that in an example shortly.
An Example¶
Consider the following SIMP program
For the ease of access we put the inferred constraint entry as comments next to the statements. The detail derivation of the inference algorithm is as follows
Where [subtree 1] is as follows
Where [subtree 2] is as follows
From Type Substitution to Type Environment¶
To derive the inferred type environment, we apply the type substitution to all the type variabales we created.
Let \(V(\overline{S})\) denote all the variables used in a SIMP program \(\overline{S}\).
Given a type substitution \(\Psi\) obtained from the unification step, the type environment \(\Gamma\) can be computed as follows,
Recall that the set of constraints generated from the running example is
Unification from left to right¶
Suppose the unification progress pick the entries from left to right
Where derivation of \(mgu(\kappa_1)\)
Hence the final result is
We apply this type substitution to all the variables in the program.
So we have computed the inferred type environment
Unification from right to left¶
Now let's consider a different of order of applying the \(mgu\) function to the constraint set. Instead of going from left to right, we solve the constraints from right to left.
Where derivation of \(mgu(\kappa_1)\)
Hence the final result is
We apply this type substitution to all the variables in the program.
So we have computed the inferred the same type environment
In face regardless the order of picking entries from the constraint sets, we compute the same \(\Gamma\).
If you have time, you can try another order.
Input's type¶
In our running example, our inference algorithm is able to infer the program's input type i.e. \(\alpha_{input}\).
This is not always possible. Let's consider the following program.
In the genereated constraints, our algorithm can construct the subtitution
Which fails to "ground" type variables \(\alpha_{input}\) and \(\alpha_x\).
We may argue that this is an ill-defined program as input
and x
are not used in the rest of the program, which should be rejected if we employ some name analysis, (which we will learn in the upcoming lesson). Hence we simply reject this kind of programs.
Alternatively, we can preset the type of the program, which is a common practice for many program languages. When generating the set of constraint \(\kappa\), we manually add an entry \((\alpha_{input}, int)\) assuming the input's type is expected to be \(int\).
Uninitialized Variable¶
There is another situatoin in which the inference algorithm fails to ground all the type variables.
z
is not initialized before use. In this case we argue that such a program should be rejected either by the type inference or the name analysis.
Property 4: Type Inference Soundness¶
The following property states that the type environment generated from a SIMP program by the type inference algorithm is able to type check the SIMP program.
Let \(\overline{S}\) be a SIMP program and \(\Gamma\) is a type environment inferred using the described inference algorithm. Then \(\Gamma \vdash \overline{S}\).
Property 5: Principality¶
The following property states that the type environment generated from a SIMP program by the type inference algorithm is a principal type environment.
Let \(\overline{S}\) be a SIMP program and \(\Gamma\) is a type environment inferred using the described inference algorithm. Then \(\Gamma\) is the most general type environment that can type-check \(\overline{S}\).