Commit 4447a6e1 authored by Giuseppe Castagna
parent 9319ca46

added explanation of the ideas
TypeScript and Flow are extensions of JavaScript that allow the programmer to specify type annotations; for instance, both languages accept the following function definition:
\begin{alltt}
function foo(x : number | string) \{
  (typeof(x) === "number")? x++ : x.length
\}
\end{alltt}
Apart from the type annotation of the function parameter, the above is
standard JavaScript code defining a function that checks whether
its argument is an integer; if so, it returns the argument's successor
(\code{x++}), otherwise it returns the property \code{length} of the
argument. The annotation specifies that the parameter is either a
number or a string (the vertical bar denotes a union type). If this annotation is respected and the function
is applied to either an integer or a string, then the application
cannot fail because of a type error (\code{length} is a property
defined for strings), and both the TypeScript and the Flow type-checkers
rightly accept this function. This is possible because both type-checkers
implement a specific type discipline called \emph{occurrence typing} or \emph{flow
typing}:\footnote{%
TypeScript calls it ``type guard recognition'' while Flow uses the terminology ``type
refinements''.}
as a matter of fact, standard type disciplines would reject this function.
The reason is that standard type disciplines would try to
type the whole body of the function under the assumption that \code{x} has
type \code{number\,|\,string} and would fail, since the successor is
not defined for strings and the property \code{length} is not defined
for numbers. This is so because standard disciplines do not take into account the type
test performed on \code{x}. Occurrence typing is the typing technique
that uses the information provided by the test to specialize the type
of \code{x} in the branches of the conditional: since the program tested that
\code{x} is of type \code{number}, then we can safely assume that
\code{x} is of type \code{number} in the ``then'' branch, and that it
is \emph{not} of type \code{number} (and thus deduce from the type annotation that it must be of type
\code{string}) in the ``else'' branch.
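For readers who wish to experiment, the example above can be run as the following self-contained TypeScript sketch (the call sites are ours, and \code{x++} is replaced by \code{x + 1} since the post-increment expression would return the value of \code{x} \emph{before} the increment):

```typescript
// The typeof test narrows x to number in the "then" branch and,
// by negating number against the declared union, to string in the
// "else" branch.
function foo(x: number | string): number {
  return typeof x === "number"
    ? x + 1      // x : number here
    : x.length;  // x : string here (number was negated away)
}

console.log(foo(41));      // 42
console.log(foo("hello")); // 5
```

Removing the `typeof` test makes both branches ill-typed, exactly as the discussion above predicts.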
Occurrence typing was first defined and formally studied by \citet{THF08} to statically type-check untyped Scheme programs. In that work the technique applies to variables and paths and supports user-defined predicates, propagating the information learned from a test; TypeScript refines only the types of variables, while Flow also refines paths. A key ingredient of our approach is the systematic use of set-theoretic types: union types to express the alternatives of a test and negation types to type the ``else'' branch, from which intersection types come for free.
We focus our study on conditionals that test types and consider the following syntax:
\(
\ifty{e}{t}{e}{e}
\)
and therefore, for instance, in our syntax the body of the function above will be
\[\ifty{x}{\Int}{x+1}{(\textsf{length } x)}\]
Here we want to establish a formal framework to
extract as much static information as possible from a type test. In particular, we initially concentrate on applications, since many other cases can be reduced to this one. A typical example is the expression
\[\ifty{x_1x_2}{t}{e_1}{e_2}\]
where $x_i$'s denote variables and $e_i$'s are generic expressions.
Depending on the actual $t$ and the static types of $x_1$ and $x_2$, we
can make type assumptions for $x_1$, for $x_2$ \emph{and} for $x_1x_2$
when typing $e_1$ that are different from those we can make when typing
$e_2$. For instance, if $x_1$ is bound to the identity function,
then it is not hard to see that the term
%
\[\texttt{let }x_1 \texttt{\,=\,}\lambda x.x\texttt{ in } \ifty{x_1x_2}{\Int}{((x_1x_2)+x_2)}{\texttt{42}}\]
Of course such reasoning holds not only when $x_1$ is bound to the identity
function, but also when the static type of $x_1$ is the same as the
one of the identity function (e.g., $\forall\alpha.\alpha\to\alpha$).
Let us forget polymorphism and type
variables. We can mimic a finite form of polymorphism with
intersection types. So for instance we can give the identity function
the type $(\Int\to\Int)\wedge(\Bool\to\Bool)$. If the static type of
$x_1$ is that intersection and the static type of $x_2$ is $\Int\vee\Bool$, then it is not hard to see that the expression
\begin{equation}
\ifty{x_1x_2}{\Int}{((x_1x_2)+x_2)}{((x_1x_2)\texttt{\,\&\&\,}x_2)}
\end{equation}
has type $\Int\vee\Bool$, as we show next.
All these type deductions require making different type assumptions for
the typing of the ``then'' and of the ``else'' branches. Let us
examine this in more detail for the last expression. This expression is
typed in a type environment where $x_1:(\Int\to\Int)\wedge(\Bool\to\Bool)$ and $x_2:\Int\vee\Bool$.
The most we can deduce is then that the application $x_1x_2$ has type
$\Int\vee\Bool$, which is not enough to type either the ``then'' branch
or the ``else'' branch. In order to type the ``then'' branch
$(x_1x_2)+x_2$ we must be able to deduce that both $x_1x_2$ and $x_2$
are of type \Int. Since we are in the ``then'' branch, we know that
the type test succeeded and that therefore $x_1x_2$ has type \Int. Thus
we can assume in typing this branch that $x_1x_2$ has both its static
type and type \Int, and thus their intersection: $(\Int\vee\Bool)\wedge\Int$, that is, \Int. Likewise, a function of type $(\Int\to\Int)\wedge(\Bool\to\Bool)$
returns an \Int\ only if its argument is of type \Int. Reasoning as
above we thus deduce that in the ``then'' branch the type of $x_2$ is
the intersection of its static type with \Int:
$(\Int\vee\Bool)\wedge\Int$, that is, \Int. To type the ``else'' branch
we reason exactly in the same way, with the only difference that, since
the type test has failed, we know that the type of the tested expression is
\emph{not} \Int. That is, the expression $x_1x_2$ can produce any possible value
barring an \Int. If we denote by \Any\ the type of all values and by
$\setminus$ the set difference, this means that in the ``else'' branch we
know that $x_1x_2$ has type $\Any{\setminus}\Int$ ---written
$\neg\Int$---. Reasoning as for the ``then'' branch, we then deduce that
$x_1x_2$ has type $(\Int\vee\Bool)\wedge\neg\Int$ (i.e., $(\Int\vee\Bool)\setminus\Int$, that is, \Bool), that
$x_2$ must be of type \Bool\ for the application to have type
$\neg\Int$ and therefore we assume that $x_2$ has type
$(\Int\vee\Bool)\wedge\Bool$ (i.e., again \Bool).
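As an aside, the finite form of polymorphism-by-intersection used for $x_1$ above can be approximated in TypeScript itself by an overloaded signature; in the sketch below (names are ours) each arrow of the intersection $(\Int\to\Int)\wedge(\Bool\to\Bool)$ becomes one overload clause:

```typescript
// Overloads approximate the intersection (Int→Int)∧(Bool→Bool):
function id(x: number): number;
function id(x: boolean): boolean;
function id(x: number | boolean): number | boolean {
  return x;
}

const n: number = id(3);     // resolves to the number→number arrow
const b: boolean = id(true); // resolves to the boolean→boolean arrow
console.log(n, b);
```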
but what about the type of $x_1$? Well, this depends on the type of
$x_1$ itself. In particular, if instead of an intersection type $x_1$
is typed by a union type, then the test may give us information about
the type of the function in the various branches. So for instance, if
$x_1$ is of type, say, $(\Int\to\Int)\vee(\Bool\to\Bool)$, then we can
assume that $x_1$ has type $\Int\to\Int$ in the ``then'' branch and
$\Bool\to\Bool$ in the ``else'' branch. (This example is not ideal, though, since such an $x_1$ can safely be applied only to diverging expressions.)
So for instance if
$x_1:(\Int{\vee}\String\to\Int)\vee(\Bool{\vee}\String\to\Bool)$ and
$x_1x_2$ is well-typed, then we can deduce for
\[
\ifty{x_1x_2}{\Int}{(x_1(x_1x_2)+42)}{\texttt{not}(x_1(x_1x_2))}
\]
Let us recap: if $e$ is an expression of type $t_0$, then when typing
%
\[\ifty{e}{t}{e_1}{e_2}\]
%
then we can assume that $e$ has type $t_0\wedge t$ when typing $e_1$
and type $t_0\setminus t$ when typing $e_2$. If furthermore $e$ is of
the form $e'e''$, then we may also specialize the types for $e'$ (in
particular if its static type is a union of arrows) and for $e''$ (in
particular if the static type of $e'$ is an intersection of
arrows. Additionnally we can repeat the reasoning for subterms of $e'$
and $e''$ and deduce distinct types for all subexpressions of $e$ that
arrows). Additionally, we can repeat the reasoning for all subterms of $e'$
and $e''$ as long as they are applications, and deduce distinct types for all subexpressions of $e$ that
form applications. How to do it precisely is explained in the rest of
the paper, but the key ideas are pretty simple and are explained next.
\subsection{Key ideas} First of all, in a strict language we can consider a type as denoting the set of values of that type and subtyping as set-containment of the denoted values.
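This set-theoretic reading of subtyping is directly observable in TypeScript through assignability; the following snippet (a hedged illustration of ours, not part of the formal development) type-checks precisely because \code{number} denotes a subset of the values denoted by \code{number | string}:

```typescript
// number ≤ number | string: widening in the subset direction is accepted
const n: number = 3;
const ns: number | string = n;

// The converse direction would be rejected by the type-checker:
// const m: number = ns; // error: string is not assignable to number
console.log(ns);
```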
Imagine we are testing whether the result of an application $e_1e_2$
is of type $t$ or not, and suppose we know that the static types of
$e_1$ and $e_2$ are $t_1$ and $t_2$ respectively. If the application $e_1e_2$ is
well typed, then there is a lot of useful information that we can deduce from it:
first, that $t_1$ is a functional type (i.e., it denotes a set of
well-typed lambdas) whose domain, denoted by $\dom{t_1}$, is the type
denoting the set of all values that are accepted by any function in
$t_1$; second, that $t_2$ must be a subtype of the domain of $t_1$;
third, we also know the type of the application, that is, the type
that denotes all the values resulting from the application of a
function in $t_1$ to an argument in $t_2$ and that we denote by
$t_1\circ t_2$. For instance, if $t_1=\Int\to\Bool$ and $t_2=\Int$,
then $\dom{t_1}=\Int$ and $t_1\circ t_2 = \Bool$. What we want to do
is to refine the types of $e_1$ and $e_2$ (i.e., $t_1$ and $t_2$) for the cases where the test
succeeds or fails.
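For instance, instantiating these operators on the intersection type used earlier for the identity function:
\[
t_1 = (\Int\to\Int)\wedge(\Bool\to\Bool) \qquad
\dom{t_1} = \Int\vee\Bool \qquad
t_1\circ\Int = \Int \qquad
t_1\circ(\Int\vee\Bool) = \Int\vee\Bool
\]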
Let us start with refining the type $t_2$ of $e_2$ for the case in
which the test succeeds. Intuitively, we want to remove from $t_2$ all
the values for which the application will surely return a result not
in $t$, thus making the test fail. Consider $t_1$ and let $s$ be the
largest subtype of $\dom{t_1}$ such that
\begin{equation}\label{eq1}
t_1\circ s\leq \neg t
\end{equation}
In other words, $s$ contains all the arguments that make any function
in $t_1$ return a result not in $t$. Then we can safely remove from
$t_2$ all the values in $s$ or, equivalently, keep in $t_2$ all the
values of $\dom{t_1}$ that are not in $s$. Let us implement the second
solution: the set of all elements of $\dom{t_1}$ for which an
application \emph{does not} surely give a result in $\neg t$ is
denoted $\worra{t_1}t$ and defined as $\min\{u\leq \dom{t_1}\alt
t_1\circ(\dom{t_1}\setminus u)\leq\neg t\}$: it is easy to see that
according to this definition $\dom{t_1}\setminus(\worra{t_1} t)$ is
the largest subset of $\dom{t_1}$ satisfying \eqref{eq1}. Then we can
refine the type of $e_2$ for when the test is successful by using the
type $t_2\wedge(\worra{t_1} t)$: we intersect all the possible results
of $e_2$, that is $t_2$, with the elements of the domain that
\emph{may} yield a result in $t$, that is $\worra{t_1} t$. It is now
easy to see how to refine the type of $e_2$ for when the test fails:
simply use all the other possible results of $e_2$, that is
$t_2\setminus(\worra{t_1} t)$. To sum up, to refine the type of the
argument in the test of an application, all we need is to define
$\worra{t_1} t$, the set of arguments that when applied to a function
of type $t_1$ \emph{may} return a result in $t$; then we can refine the type of $e_2$ as $t_2^+ = t_2\wedge(\worra{t_1} t)$ in the ``then'' branch and as $t_2^- = t_2\setminus(\worra{t_1} t)$ in the ``else'' branch.
As a side remark note
that the set $\worra{t_1} t$ is different from the set of arguments that surely yield a
result in $t$ (though it is a supertype of it). To see that, consider
for $t_1$ the type $(\Bool \to\Bool)\wedge(\Int\to(\String\vee\Int))$,
that is, the type of functions that when applied to a Boolean return a
Boolean and when applied to an integer return either an integer or a
string; then $\dom{t_1}=\Int\vee\Bool$ and $\worra{t_1}\String=\Int$,
but there is no (non-empty) type that assures that an application of a
function in $t_1$ will surely yield a $\String$ result.
Once we have determined $t_2^+$, it is easy to refine the type $t_1$,
too. If the test succeeded then we know that the function was applied
to a value in $t_2^+$ and thus returned a result in $t_1\!\circ
t_2^+$. Therefore we can exclude from $t_1$ all the functions that do
not accept values in $t_2^+$ (though, for technical reasons, there aren't
any) or that return results not in $t_1\!\circ t_2^+$. This can simply
be obtained by intersecting $t_1$ with the type $t_2^+\to (t_1\!\circ
t_2^+)$. Therefore for what concerns $e_1$ we can refine its type as
$t_1^+=t_1\wedge (t_2^+\to(t_1\!\circ t_2^+))$ in the ``then'' branch
and as $t_1^-=t_1\setminus (t_2^+\to(t_1\!\circ t_2^+))$ in the
``else'' branch.
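To see all these definitions at work, we can instantiate them on the expression of the previous section, where $x_1:(\Int\to\Int)\wedge(\Bool\to\Bool)$, $x_2:\Int\vee\Bool$, and the tested type is $\Int$:
\[
\worra{t_1}\Int = \Int \qquad
t_2^+ = (\Int\vee\Bool)\wedge\Int = \Int \qquad
t_2^- = (\Int\vee\Bool)\setminus\Int = \Bool
\]
and $t_1\circ t_2^+ = \Int$, so that $t_1^+ = t_1\wedge(\Int\to\Int)$, which is equivalent to $t_1$ itself: exactly the assumptions we used to type the ``then'' and ``else'' branches above.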
This is essentially what we formalize in Section~\ref{sec:language}.
\subsection{Outline}
Let $e$ be an expression and $\pi\in\{0,1\}^*$ a \emph{path}; we denote $\occ e\pi$ the occurrence of $e$ reached by the path $\pi$, that is
\[
\begin{array}{r@{\downarrow}l@{\quad=\quad}l}
e&\epsilon & e\\
e_0e_1&0\pi & e_0{\downarrow}\pi\\
e_0e_1&1\pi & e_1{\downarrow}\pi
\end{array}
\]
A type environment $\Gamma$ is a mapping from occurrences (i.e., expressions) to types.
We suppose w.l.o.g.\ that all variables abstracted in $\lambda$-abstractions are distinct (otherwise we would have to add an extra parameter to our definitions for the variables).
Let $e$ be an expression, $t$ a type, $\Gamma$ a type environment, $\pi\in\{0,1\}^*$ and $p\in\{+,-\}$, we define $\typep p \pi {\Gamma,e,t}$ and $\Gp p {\Gamma,e,t}(\pi)$ as follows
\label{sec:extensions}
\input{extensions}
\section{Towards a practical implementation}
\label{sec:practical}
\input{practical}
%% Acknowledgments
\begin{acks} %% acks environment is optional
%% contents suppressed with 'anonymous'
\end{acks}
\begin{itemize}
\item selection of occurrences whose specific typing is useful
\item when refinement for a variable is useful
\item handling of side effects
\item generic boolean expressions in the \texttt{if} condition
\item encoding of other constructions (\texttt{isInt}, \texttt{typeof}, \ldots)
\end{itemize}