TypeScript and Flow are extensions of JavaScript that allow the programmer to specify types.

(typeof(x) === "number")? x++ : x.length

\}

\end{alltt}


Apart from the type annotation (in red) of the function parameter, the above is

standard JavaScript code defining a function that checks whether

its argument is an integer; if so, it returns the argument's successor

(\code{x++}), otherwise it returns the property \code{length} of the

argument. The annotation specifies that the parameter is either a

number or a string (the vertical bar denotes a union type). If this annotation is respected and the function


is applied to either an integer or a string, then the application

cannot fail because of a type error (\code{length} is a property defined for strings), and the type-checkers of both TypeScript and Flow rightly accept this function. This is possible because both type-checkers

implement a specific type discipline called \emph{occurrence typing} or \emph{flow

typing}:\footnote{%

TypeScript calls it ``type guard recognition'' while Flow uses the terminology ``type

refinements''.}


as a matter of fact, standard type disciplines would reject this function.

The reason is that standard type disciplines would try to type the whole body of the function under the assumption that \code{x} has type \code{number\,|\,string} and they would fail, since the successor is

not defined for strings and the property \code{length} is not defined

for numbers. This is so because standard disciplines do not take into account the type

test performed on \code{x}. Occurrence typing is the typing technique

that uses the information provided by the test to specialize the type


of \code{x} in the branches of the conditional: since the program tested that

\code{x} is of type \code{number}, then we can safely assume that

\code{x} is of type \code{number} in the ``then'' branch, and that it


is \emph{not} of type \code{number} (and thus deduce from the type annotation that it must be of type

\code{string}) in the ``else'' branch.
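The narrowing just described can be observed directly in TypeScript. The following is our transcription of the example above (the name \code{foo} is ours, and we write \code{x + 1} for the successor to make the returned value explicit):

```typescript
// A TypeScript transcription of the function discussed above.
function foo(x: number | string): number {
  // Occurrence typing at work: `x` is narrowed to `number` in the "then"
  // branch and to `string` (number|string minus number) in the "else" branch.
  return typeof x === "number" ? x + 1 : x.length;
}
```

Both branches type-check without any cast precisely because of this narrowing.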

Occurrence typing was first defined and formally studied by \citet{THF08} to statically type-check untyped Scheme programs, where it refines the types of the variables occurring in tests, also accounting for user-defined predicates and for the propagation of the information they provide. TypeScript applies the technique to variables, while Flow also refines more general paths. A distinguishing aspect of our approach is the use of set-theoretic types: union types account for the alternatives under test, negation types capture the type of the tested expression in the ``else'' branch, and by combining unions with negations we obtain intersections for free.

...




We focus our study on conditionals that test types and consider the following syntax:

\(

\ifty{e}{t}{e}{e}

\)

...


and therefore, for instance, in our syntax the body of the function above will be

\[\ifty{x}{\Int}{x+1}{(\textsf{length } x)}\]

Here we want to establish a formal framework to extract as much static information as possible from a type test. In particular, we initially concentrate on applications, since many other cases can be reduced to them. A typical example is the expression

\[\ifty{x_1x_2}{t}{e_1}{e_2}\]

where $x_i$'s denote variables and $e_i$'s are generic expressions.

Depending on the actual $t$ and the static types of $x_1$ and $x_2$, we can make type assumptions for $x_1$, for $x_2$, \emph{and} for $x_1x_2$

when typing $e_1$ that are different from those we can make when typing

$e_2$. If for instance instead of $x_1$ we use the identity function,

$e_2$. For instance, if $x_1$ is bound to the identity function,

then it is not hard to see that the term

%

\[\texttt{let }x_1\texttt{\,=\,}\lambda x.x\texttt{ in }\ifty{x_1x_2}{\Int}{((x_1x_2)+x_2)}{\texttt{42}}\]

...



Of course, such reasoning holds not only when $x_1$ is bound to the identity

function, but also when the static type of $x_1$ is the same as the


one of the identity function (e.g., $\forall\alpha.\alpha\to\alpha$).

Let us forget polymorphism and type

variables. We can mimic a finite form of polymorphism with

intersection types. So for instance we can give the identity function

the type $(\Int\to\Int)\wedge(\Bool\to\Bool)$. If the static type of

...

Let us recap: if $e$ is an expression of type $t_0$, then when typing

%

\[\ifty{e}{t}{e_1}{e_2}\]

%


then we can assume that $e$ has type $t_0\wedge t$ when typing $e_1$

and type $t_0\setminus t$ when typing $e_2$. If furthermore $e$ is of

the form $e'e''$, then we may also specialize the types for $e'$ (in

particular if its static type is a union of arrows) and for $e''$ (in

particular if the static type of $e'$ is an intersection of

arrows). Additionally, we can repeat the reasoning for all subterms of $e'$ and $e''$ as long as they are applications, and deduce distinct types for all subexpressions of $e$ that

form applications. How to do it precisely is explained in the rest of

the paper, but the key ideas are pretty simple and are explained next.

\subsection{Key ideas} First of all, in a strict language we can consider a type as denoting the set of values of that type and subtyping as set-containment of the denoted values.
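Under this reading, the refinement of the tested expression recalled above, $t_0\wedge t$ in the ``then'' branch and $t_0\setminus t$ in the ``else'' branch, is plain set intersection and set difference. The following is a minimal sketch of ours, purely illustrative, in which types are modeled extensionally as finite sets of base-type tags:

```typescript
// Toy model (illustrative only): a type is a finite set of base-type tags,
// so intersection and difference of types are ordinary set operations.
type Ty = Set<string>;
const inter = (a: Ty, b: Ty): Ty => new Set([...a].filter(v => b.has(v)));
const diff = (a: Ty, b: Ty): Ty => new Set([...a].filter(v => !b.has(v)));

// If e has static type t0 = Int ∨ String and we test whether e has type Int:
const t0: Ty = new Set(["Int", "String"]);
const t: Ty = new Set(["Int"]);
const thenType = inter(t0, t); // type assumed for e in the "then" branch
const elseType = diff(t0, t);  // type assumed for e in the "else" branch
```

Here \code{thenType} is $\{\Int\}$ and \code{elseType} is $\{\String\}$, matching $t_0\wedge t$ and $t_0\setminus t$.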

Imagine we are testing whether the result of an application $e_1e_2$

is of type $t$ or not, and suppose we know that the static types of

$e_1$ and $e_2$ are $t_1$ and $t_2$ respectively. If the application $e_1e_2$ is

well typed, then there is a lot of useful information that we can deduce from it:

first, that $t_1$ is a functional type (i.e., it denotes a set of

well-typed lambdas) whose domain, denoted by $\dom{t_1}$, is the type denoting the set of all values that are accepted by any function in $t_1$; second, that $t_2$ must be a subtype of the domain of $t_1$; third, we also know the type of the application, that is, the type

that denotes all the values resulting from the application of a

function in $t_1$ to an argument in $t_2$ and that we denote by

$t_1\circ t_2$. For instance, if $t_1=\Int\to\Bool$ and $t_2=\Int$,

then $\dom{t_1}=\Int$ and $t_1\circ t_2=\Bool$. What we want to do

is to refine the types of $e_1$ and $e_2$ (i.e., $t_1$ and $t_2$) for the cases where the test

succeeds or fails.
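To make these two operators concrete, here is a toy encoding of ours (not the paper's formal development): a first-order type is a finite set of base-type tags, a function type is an intersection of arrows represented as an array, and, for simplicity, we assume the arrows have pairwise disjoint domains.

```typescript
// Toy encoding: base types are finite sets of tags; a function type t1 is an
// intersection of first-order arrows, kept as an array.
type Ty = Set<string>;
interface Arrow { dom: Ty; cod: Ty }
type FunTy = Arrow[];

const union = (a: Ty, b: Ty): Ty => new Set([...a, ...b]);
const inter = (a: Ty, b: Ty): Ty => new Set([...a].filter(v => b.has(v)));

// dom(t1): everything accepted by any function in t1 = union of the domains.
function dom(t1: FunTy): Ty {
  return t1.reduce((acc, a) => union(acc, a.dom), new Set<string>());
}

// t1 ∘ t2: possible results of applying a function in t1 to an argument in
// t2 = union of the codomains of the arrows whose domain meets t2
// (exact only under our disjoint-domains assumption).
function apply(t1: FunTy, t2: Ty): Ty {
  return t1.reduce(
    (acc, a) => inter(a.dom, t2).size > 0 ? union(acc, a.cod) : acc,
    new Set<string>());
}

// The example from the text: t1 = Int → Bool, t2 = Int.
const t1: FunTy = [{ dom: new Set(["Int"]), cod: new Set(["Bool"]) }];
const t2: Ty = new Set(["Int"]);
```

With these definitions, \code{dom(t1)} is $\{\Int\}$ and \code{apply(t1, t2)} is $\{\Bool\}$, matching the example.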

Let us start with refining the type $t_2$ of $e_2$ for the case in

which the test succeeds. Intuitively, we want to remove from $t_2$ all

the values for which the application will surely return a result not

in $t$, thus making the test fail. Consider $t_1$ and let $s$ be the

largest subtype of $\dom{t_1}$ such that

\begin{equation}\label{eq1}

t_1\circ s\leq\neg t

\end{equation}

In other words, $s$ contains all the arguments that make any function

in $t_1$ return a result not in $t$. Then we can safely remove from

$t_2$ all the values in $s$ or, equivalently, keep in $t_2$ all the

values of $\dom{t_1}$ that are not in $s$. Let us implement the second

solution: the set of all elements of $\dom{t_1}$ for which an

application \emph{does not} surely give a result in $\neg t$ is

denoted $\worra{t_1}t$ and defined as $\min\{u\leq\dom{t_1}\alt

t_1\circ(\dom{t_1}\setminus u)\leq\neg t\}$: it is easy to see that

according to this definition $\dom{t_1}\setminus(\worra{t_1} t)$ is

the largest subset of $\dom{t_1}$ satisfying \eqref{eq1}. Then we can

refine the type of $e_2$ for when the test is successful by using the

type $t_2\wedge(\worra{t_1} t)$: we intersect all the possible results

of $e_2$, that is $t_2$, with the elements of the domain that

\emph{may} yield a result in $t$, that is $\worra{t_1} t$. It is now

easy to see how to refine the type of $e_2$ for when the test fails:

simply use all the other possible results of $e_2$, that is

$t_2\setminus(\worra{t_1} t)$. To sum up, to refine the type of the argument in the test of an application, all we need is to define $\worra{t_1} t$, the set of arguments on which a function of type $t_1$ \emph{may} return a result in $t$; then we can refine the type of $e_2$ as $t_2^+= t_2\wedge(\worra{t_1} t)$ in the ``then'' branch and as $t_2^-= t_2\setminus(\worra{t_1} t)$ in the ``else'' branch.

As a side remark, note that the set $\worra{t_1} t$ is different from the set of arguments on which every function in $t_1$ returns a result in $t$ (though it is a supertype of it). To see why, consider

for $t_1$ the type $(\Bool\to\Bool)\wedge(\Int\to(\String\vee\Int))$,

that is, the type of functions that when applied to a Boolean return a

Boolean and when applied to an integer return either an integer or a

string; then $\dom{t_1}=\Int\vee\Bool$ and $\worra{t_1}\String=\Int$,

but there is no (non-empty) type that ensures that an application of a

function in $t_1$ will surely yield a $\String$ result.
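This refinement of the argument, and the example of the side remark, can be checked on a toy encoding of ours (types as finite sets of base-type tags, function types as intersections of first-order arrows with pairwise disjoint domains), in which $\worra{t_1}t$ boils down to the union of the domains of the arrows whose codomain intersects $t$:

```typescript
// Toy encoding: base types are finite sets of tags; a function type is an
// intersection of first-order arrows with pairwise disjoint domains.
type Ty = Set<string>;
interface Arrow { dom: Ty; cod: Ty }
type FunTy = Arrow[];

const union = (a: Ty, b: Ty): Ty => new Set([...a, ...b]);
const inter = (a: Ty, b: Ty): Ty => new Set([...a].filter(v => b.has(v)));
const diff = (a: Ty, b: Ty): Ty => new Set([...a].filter(v => !b.has(v)));

// worra(t1, t): arguments on which a function in t1 *may* return a result
// in t = union of the domains of the arrows whose codomain intersects t.
function worra(t1: FunTy, t: Ty): Ty {
  return t1.reduce(
    (acc, a) => inter(a.cod, t).size > 0 ? union(acc, a.dom) : acc,
    new Set<string>());
}

// The side-remark example: t1 = (Bool → Bool) ∧ (Int → (String ∨ Int)).
const t1: FunTy = [
  { dom: new Set(["Bool"]), cod: new Set(["Bool"]) },
  { dom: new Set(["Int"]), cod: new Set(["String", "Int"]) },
];
const stringTy: Ty = new Set(["String"]);
const w = worra(t1, stringTy);  // = {Int}, as stated in the text

// Refinements of the argument type t2 = Int ∨ Bool:
const t2: Ty = new Set(["Int", "Bool"]);
const t2plus = inter(t2, w);    // "then" branch: Int
const t2minus = diff(t2, w);    // "else" branch: Bool
```

Here \code{w} is $\{\Int\}$, so the argument is refined to \Int{} when the test succeeds and to \Bool{} when it fails.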

Once we have determined $t_2^+$, it is easy to refine the type $t_1$,

too. If the test succeeded then we know that the function was applied

to a value in $t_2^+$ and thus returned a result in $t_1\!\circ

t_2^+$. Therefore we can exclude from $t_1$ all the functions that do not accept values in $t_2^+$ (though, for technical reasons, there aren't

any) or that return results not in $t_1\!\circ t_2^+$. This can simply

be obtained by intersecting $t_1$ with the type $t_2^+\to(t_1\!\circ

t_2^+)$. Therefore for what concerns $e_1$ we can refine its type as

$t_1^+=t_1\wedge(t_2^+\to(t_1\!\circ t_2^+))$ in the ``then'' branch

and as $t_1^-=t_1\setminus(t_2^+\to(t_1\!\circ t_2^+))$ in the

``else'' branch.

This is essentially what we formalize in Section~\ref{sec:language}.


Let $e$ be an expression and $\pi\in\{0,1\}^*$ a \emph{path}; we denote $\occ e\pi$ the occurrence of $e$ reached by the path $\pi$, that is

\[

\begin{array}{r@{\downarrow}l@{\quad=\quad}l}

e&\epsilon& e\\

...


A type environment $\Gamma$ is a mapping from occurrences (i.e., expressions) to types.

We suppose w.l.o.g.\ that all variables abstracted in $\lambda$-abstractions are distinct (otherwise we should add an extra parameter in our definitions for the variables).


Let $e$ be an expression, $t$ a type, $\Gamma$ a type environment, $\pi\in\{0,1\}^*$ and $p\in\{+,-\}$, we define $\typep p \pi{\Gamma,e,t}$ and $\Gp p {\Gamma,e,t}(\pi)$ as follows