TypeScript and Flow are extensions of JavaScript that allow the programmer to specify types.

(typeof(x) === "number")? x++ : x.length

\}

\end{alltt}


Apart from the type annotation (in red) of the function parameter, the above is

standard JavaScript code defining a function that checks whether

its argument is an integer; if so, it returns the argument's successor

(\code{x++}), otherwise it returns the property \code{length} of the

argument. The annotation specifies that the parameter is either a

number or a string (the vertical bar denotes a union type). If this annotation is respected and the function


is applied to either an integer or a string, then the application

cannot fail because of a type error (\code{length} is a property defined for strings), and the type-checkers of both TypeScript and Flow rightly accept this function. This is possible because both type-checkers

implement a specific type discipline called \emph{occurrence typing} or \emph{flow

typing}:\footnote{%

TypeScript calls it ``type guard recognition'' while Flow uses the terminology ``type

refinements''.}


as a matter of fact, standard type disciplines would reject this function.

The reason is that standard type disciplines would try to type the whole body of the function under the assumption that \code{x} has type \code{number\,|\,string} and they would fail, since the successor is

not defined for strings and the property \code{length} is not defined

for numbers. This is so because standard disciplines do not take into account the type

test performed on \code{x}. Occurrence typing is the typing technique

that uses the information provided by the test to specialize the type


of \code{x} in the branches of the conditional: since the program tested that

\code{x} is of type \code{number}, then we can safely assume that

\code{x} is of type \code{number} in the ``then'' branch, and that it


is \emph{not} of type \code{number} (and thus deduce from the type annotation that it must be of type

\code{string}) in the ``else'' branch.
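The narrowing just described can be observed directly in TypeScript. The following is our transcription of the example above (the name \code{foo} is ours, and we write \code{x + 1} for the successor to make the returned value explicit):

```typescript
// A TypeScript transcription of the function discussed above.
function foo(x: number | string): number {
  // Occurrence typing at work: `x` is narrowed to `number` in the "then"
  // branch and to `string` (number|string minus number) in the "else" branch.
  return typeof x === "number" ? x + 1 : x.length;
}
```

Both branches type-check without any cast precisely because of this narrowing.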

Occurrence typing was first defined and formally studied by \citet{THF08} to statically type-check untyped Scheme programs, where it refines the types of the variables occurring in tests, also accounting for user-defined predicates and for the propagation of the information they provide. TypeScript applies the technique to variables, while Flow also refines more general paths. A distinguishing aspect of our approach is the use of set-theoretic types: union types account for the alternatives under test, negation types capture the type of the tested expression in the ``else'' branch, and by combining unions with negations we obtain intersections for free.

...




We focus our study on conditionals that test types and consider the following syntax:

\(

\ifty{e}{t}{e}{e}

\)

...


and therefore, for instance, in our syntax the body of the function above will be

\[\ifty{x}{\Int}{x+1}{(\textsf{length } x)}\]

Here we want to establish a formal framework to extract as much static information as possible from a type test. In particular, we initially concentrate on applications, since many other cases can be reduced to them. A typical example is the expression

\[\ifty{x_1x_2}{t}{e_1}{e_2}\]

where $x_i$'s denote variables and $e_i$'s are generic expressions.

Depending on the actual $t$ and the static types of $x_1$ and $x_2$, we can make type assumptions for $x_1$, for $x_2$, \emph{and} for $x_1x_2$

when typing $e_1$ that are different from those we can make when typing

$e_2$. If for instance instead of $x_1$ we use the identity function,

$e_2$. For instance, if $x_1$ is bound to the identity function,

then it is not hard to see that the term

%

\[\texttt{let }x_1\texttt{\,=\,}\lambda x.x\texttt{ in }\ifty{x_1x_2}{\Int}{((x_1x_2)+x_2)}{\texttt{42}}\]

...



Of course, such reasoning holds not only when $x_1$ is bound to the identity

function, but also when the static type of $x_1$ is the same as the


one of the identity function (e.g., $\forall\alpha.\alpha\to\alpha$).

Let us forget polymorphism and type

variables. We can mimic a finite form of polymorphism with

intersection types. So for instance we can give the identity function

the type $(\Int\to\Int)\wedge(\Bool\to\Bool)$. If the static type of

...

Let us recap: if $e$ is an expression of type $t_0$, then when typing

%

\[\ifty{e}{t}{e_1}{e_2}\]

%


then we can assume that $e$ has type $t_0\wedge t$ when typing $e_1$

and type $t_0\setminus t$ when typing $e_2$. If furthermore $e$ is of

the form $e'e''$, then we may also specialize the types for $e'$ (in

particular if its static type is a union of arrows) and for $e''$ (in

particular if the static type of $e'$ is an intersection of

arrows). Additionally, we can repeat the reasoning for all subterms of $e'$ and $e''$ as long as they are applications, and deduce distinct types for all subexpressions of $e$ that

form applications. How to do it precisely is explained in the rest of

the paper, but the key ideas are pretty simple and are explained next.

\subsection{Key ideas} First of all, in a strict language we can consider a type as denoting the set of values of that type and subtyping as set-containment of the denoted values.
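Under this reading, the refinement of the tested expression recalled above, $t_0\wedge t$ in the ``then'' branch and $t_0\setminus t$ in the ``else'' branch, is plain set intersection and set difference. The following is a minimal sketch of ours, purely illustrative, in which types are modeled extensionally as finite sets of base-type tags:

```typescript
// Toy model (illustrative only): a type is a finite set of base-type tags,
// so intersection and difference of types are ordinary set operations.
type Ty = Set<string>;
const inter = (a: Ty, b: Ty): Ty => new Set([...a].filter(v => b.has(v)));
const diff = (a: Ty, b: Ty): Ty => new Set([...a].filter(v => !b.has(v)));

// If e has static type t0 = Int ∨ String and we test whether e has type Int:
const t0: Ty = new Set(["Int", "String"]);
const t: Ty = new Set(["Int"]);
const thenType = inter(t0, t); // type assumed for e in the "then" branch
const elseType = diff(t0, t);  // type assumed for e in the "else" branch
```

Here \code{thenType} is $\{\Int\}$ and \code{elseType} is $\{\String\}$, matching $t_0\wedge t$ and $t_0\setminus t$.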

Imagine we are testing whether the result of an application $e_1e_2$

is of type $t$ or not, and suppose we know that the static types of

$e_1$ and $e_2$ are $t_1$ and $t_2$ respectively. If the application $e_1e_2$ is

well typed, then there is a lot of useful information that we can deduce from it:

first, that $t_1$ is a functional type (i.e., it denotes a set of

well-typed lambdas) whose domain, denoted by $\dom{t_1}$, is the type denoting the set of all values that are accepted by any function in $t_1$; second, that $t_2$ must be a subtype of the domain of $t_1$; third, we also know the type of the application, that is, the type

that denotes all the values resulting from the application of a

function in $t_1$ to an argument in $t_2$ and that we denote by

$t_1\circ t_2$. For instance, if $t_1=\Int\to\Bool$ and $t_2=\Int$,

then $\dom{t_1}=\Int$ and $t_1\circ t_2=\Bool$. What we want to do

is to refine the types of $e_1$ and $e_2$ (i.e., $t_1$ and $t_2$) for the cases where the test

succeeds or fails.
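To make these two operators concrete, here is a toy encoding of ours (not the paper's formal development): a first-order type is a finite set of base-type tags, a function type is an intersection of arrows represented as an array, and, for simplicity, we assume the arrows have pairwise disjoint domains.

```typescript
// Toy encoding: base types are finite sets of tags; a function type t1 is an
// intersection of first-order arrows, kept as an array.
type Ty = Set<string>;
interface Arrow { dom: Ty; cod: Ty }
type FunTy = Arrow[];

const union = (a: Ty, b: Ty): Ty => new Set([...a, ...b]);
const inter = (a: Ty, b: Ty): Ty => new Set([...a].filter(v => b.has(v)));

// dom(t1): everything accepted by any function in t1 = union of the domains.
function dom(t1: FunTy): Ty {
  return t1.reduce((acc, a) => union(acc, a.dom), new Set<string>());
}

// t1 ∘ t2: possible results of applying a function in t1 to an argument in
// t2 = union of the codomains of the arrows whose domain meets t2
// (exact only under our disjoint-domains assumption).
function apply(t1: FunTy, t2: Ty): Ty {
  return t1.reduce(
    (acc, a) => inter(a.dom, t2).size > 0 ? union(acc, a.cod) : acc,
    new Set<string>());
}

// The example from the text: t1 = Int → Bool, t2 = Int.
const t1: FunTy = [{ dom: new Set(["Int"]), cod: new Set(["Bool"]) }];
const t2: Ty = new Set(["Int"]);
```

With these definitions, \code{dom(t1)} is $\{\Int\}$ and \code{apply(t1, t2)} is $\{\Bool\}$, matching the example.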

Let us start with refining the type $t_2$ of $e_2$ for the case in

which the test succeeds. Intuitively, we want to remove from $t_2$ all

the values for which the application will surely return a result not

in $t$, thus making the test fail. Consider $t_1$ and let $s$ be the

largest subtype of $\dom{t_1}$ such that

\begin{equation}\label{eq1}

t_1\circ s\leq\neg t

\end{equation}

In other words, $s$ contains all the arguments that make any function

in $t_1$ return a result not in $t$. Then we can safely remove from

$t_2$ all the values in $s$ or, equivalently, keep in $t_2$ all the

values of $\dom{t_1}$ that are not in $s$. Let us implement the second

solution: the set of all elements of $\dom{t_1}$ for which an

application \emph{does not} surely give a result in $\neg t$ is

denoted $\worra{t_1}t$ and defined as $\min\{u\leq\dom{t_1}\alt

t_1\circ(\dom{t_1}\setminus u)\leq\neg t\}$: it is easy to see that

according to this definition $\dom{t_1}\setminus(\worra{t_1} t)$ is

the largest subset of $\dom{t_1}$ satisfying \eqref{eq1}. Then we can

refine the type of $e_2$ for when the test is successful by using the

type $t_2\wedge(\worra{t_1} t)$: we intersect all the possible results

of $e_2$, that is $t_2$, with the elements of the domain that

\emph{may} yield a result in $t$, that is $\worra{t_1} t$. It is now

easy to see how to refine the type of $e_2$ for when the test fails:

simply use all the other possible results of $e_2$, that is

$t_2\setminus(\worra{t_1} t)$. To sum up, to refine the type of the argument in the test of an application, all we need is to define $\worra{t_1} t$, the set of arguments on which a function of type $t_1$ \emph{may} return a result in $t$; then we can refine the type of $e_2$ as $t_2^+= t_2\wedge(\worra{t_1} t)$ in the ``then'' branch and as $t_2^-= t_2\setminus(\worra{t_1} t)$ in the ``else'' branch.

As a side remark, note that the set $\worra{t_1} t$ is different from the set of arguments on which every function in $t_1$ returns a result in $t$ (though it is a supertype of it). To see why, consider

for $t_1$ the type $(\Bool\to\Bool)\wedge(\Int\to(\String\vee\Int))$,

that is, the type of functions that when applied to a Boolean return a

Boolean and when applied to an integer return either an integer or a

string; then $\dom{t_1}=\Int\vee\Bool$ and $\worra{t_1}\String=\Int$,

but there is no (non-empty) type that ensures that an application of a

function in $t_1$ will surely yield a $\String$ result.
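This refinement of the argument, and the example of the side remark, can be checked on a toy encoding of ours (types as finite sets of base-type tags, function types as intersections of first-order arrows with pairwise disjoint domains), in which $\worra{t_1}t$ boils down to the union of the domains of the arrows whose codomain intersects $t$:

```typescript
// Toy encoding: base types are finite sets of tags; a function type is an
// intersection of first-order arrows with pairwise disjoint domains.
type Ty = Set<string>;
interface Arrow { dom: Ty; cod: Ty }
type FunTy = Arrow[];

const union = (a: Ty, b: Ty): Ty => new Set([...a, ...b]);
const inter = (a: Ty, b: Ty): Ty => new Set([...a].filter(v => b.has(v)));
const diff = (a: Ty, b: Ty): Ty => new Set([...a].filter(v => !b.has(v)));

// worra(t1, t): arguments on which a function in t1 *may* return a result
// in t = union of the domains of the arrows whose codomain intersects t.
function worra(t1: FunTy, t: Ty): Ty {
  return t1.reduce(
    (acc, a) => inter(a.cod, t).size > 0 ? union(acc, a.dom) : acc,
    new Set<string>());
}

// The side-remark example: t1 = (Bool → Bool) ∧ (Int → (String ∨ Int)).
const t1: FunTy = [
  { dom: new Set(["Bool"]), cod: new Set(["Bool"]) },
  { dom: new Set(["Int"]), cod: new Set(["String", "Int"]) },
];
const stringTy: Ty = new Set(["String"]);
const w = worra(t1, stringTy);  // = {Int}, as stated in the text

// Refinements of the argument type t2 = Int ∨ Bool:
const t2: Ty = new Set(["Int", "Bool"]);
const t2plus = inter(t2, w);    // "then" branch: Int
const t2minus = diff(t2, w);    // "else" branch: Bool
```

Here \code{w} is $\{\Int\}$, so the argument is refined to \Int{} when the test succeeds and to \Bool{} when it fails.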

Once we have determined $t_2^+$, it is easy to refine the type $t_1$,

too. If the test succeeded then we know that the function was applied

to a value in $t_2^+$ and thus returned a result in $t_1\!\circ

t_2^+$. Therefore we can exclude from $t_1$ all the functions that do not accept values in $t_2^+$ (though, for technical reasons, there aren't

any) or that return results not in $t_1\!\circ t_2^+$. This can simply

be obtained by intersecting $t_1$ with the type $t_2^+\to(t_1\!\circ

t_2^+)$. Therefore for what concerns $e_1$ we can refine its type as

$t_1^+=t_1\wedge(t_2^+\to(t_1\!\circ t_2^+))$ in the ``then'' branch

and as $t_1^-=t_1\setminus(t_2^+\to(t_1\!\circ t_2^+))$ in the

``else'' branch.

This is essentially what we formalize in Section~\ref{sec:language}.


Let $e$ be an expression and $\pi\in\{0,1\}^*$ a \emph{path}; we denote $\occ e\pi$ the occurrence of $e$ reached by the path $\pi$, that is

\[

\begin{array}{r@{\downarrow}l@{\quad=\quad}l}

e&\epsilon& e\\

...


A type environment $\Gamma$ is a mapping from occurrences (i.e., expressions) to types.

We suppose w.l.o.g.\ that all variables abstracted in $\lambda$-abstractions are distinct (otherwise we should add an extra parameter in our definitions for the variables).


Let $e$ be an expression, $t$ a type, $\Gamma$ a type environment, $\pi\in\{0,1\}^*$ and $p\in\{+,-\}$, we define $\typep p \pi{\Gamma,e,t}$ and $\Gp p {\Gamma,e,t}(\pi)$ as follows