intro.tex 32.5 KB
Newer Older
1
TypeScript and Flow are extensions of JavaScript that allow the programmer to specify in the code type annotations used to statically type-check the program. For instance, the following function definition is valid in both languages  
Giuseppe Castagna's avatar
Giuseppe Castagna committed
2
3
\begin{alltt}\color{darkblue}
  function foo(x\textcolor{darkred}{ : number | string}) \{
4
      return (typeof(x) === "number")? x+1 : x.trim();  \refstepcounter{equation}                               \mbox{\color{black}\rm(\theequation)}\label{foo}
Giuseppe Castagna's avatar
Giuseppe Castagna committed
5
  \}
Giuseppe Castagna's avatar
Giuseppe Castagna committed
6
\end{alltt}
7
Apart from the type annotation (in red) of the function parameter, the above is
Giuseppe Castagna's avatar
Giuseppe Castagna committed
8
standard JavaScript code defining a function that checks whether
Giuseppe Castagna's avatar
Giuseppe Castagna committed
9
its argument is an integer; if it is so, then it returns the argument's successor
Kim Nguyễn's avatar
Kim Nguyễn committed
10
(\code{x+1}), otherwise it calls the method \code{trim()} of the
Giuseppe Castagna's avatar
Giuseppe Castagna committed
11
argument. The annotation specifies that the parameter is either a
Giuseppe Castagna's avatar
Giuseppe Castagna committed
12
number or a string (the vertical bar denotes a union type).  If this annotation is respected and the function
13
is applied to either an integer or a string, then the application
Giuseppe Castagna's avatar
Giuseppe Castagna committed
14
15
16
cannot fail because of a type error (\code{trim()} is a string method of the ECMAScript 5 standard that
trims whitespaces from the beginning and end of the string) and  both the type-checker of TypeScript and the one of Flow
rightly accept this function. This is possible because both type-checkers
Giuseppe Castagna's avatar
Giuseppe Castagna committed
17
implement a specific type discipline called \emph{occurrence typing} or \emph{flow
Giuseppe Castagna's avatar
Giuseppe Castagna committed
18
typing}:\footnote{%
Giuseppe Castagna's avatar
Giuseppe Castagna committed
19
20
  TypeScript calls it ``type guard recognition'' while Flow uses the terminology ``type
  refinements''.}
21
as a matter of fact, standard type disciplines would reject this function.
Giuseppe Castagna's avatar
Giuseppe Castagna committed
22
The reason for that is that standard type disciplines would try to
Giuseppe Castagna's avatar
Giuseppe Castagna committed
23
type every part of the body of the function under the assumption that \code{x} has
24
type \code{number\,|\,string} and they would fail, since the successor is
Giuseppe Castagna's avatar
Giuseppe Castagna committed
25
not defined for strings and the method \code{trim()} is not defined
Giuseppe Castagna's avatar
Giuseppe Castagna committed
26
for numbers.  This is so because standard disciplines do not take into account the type
Giuseppe Castagna's avatar
Giuseppe Castagna committed
27
test performed on \code{x}. Occurrence typing is the typing technique
Giuseppe Castagna's avatar
Giuseppe Castagna committed
28
that uses the information provided by the test to specialize---precisely, to \emph{refine}---the type
29
of the occurrences of \code{x} in the branches of the conditional: since the program tested that
Giuseppe Castagna's avatar
Giuseppe Castagna committed
30
31
\code{x} is of type \code{number}, then we can safely assume that
\code{x} is of type \code{number} in the ``then'' branch, and that it
32
is \emph{not} of type \code{number} (and thus deduce from the type annotation that it must be of type
Giuseppe Castagna's avatar
Giuseppe Castagna committed
33
34
\code{string}) in the ``else'' branch.

Giuseppe Castagna's avatar
Giuseppe Castagna committed
35
Occurrence typing was first defined and formally studied by
Kim Nguyễn's avatar
Kim Nguyễn committed
36
\citet{THF08} to statically type-check untyped Scheme programs, and later extended by \citet{THF10}
Victor Lanvin's avatar
Fixes    
Victor Lanvin committed
37
yielding the development of Typed Racket. From its inception,
Giuseppe Castagna's avatar
Giuseppe Castagna committed
38
39
occurrence typing was intimately tied to type systems with
set-theoretic types: unions, intersections, and negation of
Victor Lanvin's avatar
Fixes    
Victor Lanvin committed
40
types. Union was the first type connective to appear, since it was already used by
Giuseppe Castagna's avatar
Giuseppe Castagna committed
41
42
43
44
45
46
\citet{THF08} where its presence was needed to characterize the
different control flows of a type test, as our \code{foo} example
shows: one flow for integer arguments and another for
strings. Intersection types appear (in limited forms) combined with
occurrence typing both in TypeScript and in Flow and serve to give, among other,
more precise types to functions such as \code{foo}. For instance,
Kim Nguyễn's avatar
Reword.    
Kim Nguyễn committed
47
since \code{x\,+\,1} evaluates to an integer and \code{x.trim()} to a string, then our
Giuseppe Castagna's avatar
Giuseppe Castagna committed
48
function \code{foo} has type
49
50
51
52
53
54
\code{(number|string)$\to$(number|string)}.
% \footnote{\label{null}Actually,
%   both Flow and TypeScript deduce as return type
%   \code{number|string|void}, since they track when operations may yield \code{void} results. Considering this would be easy but also clutter our presentation, which is why we omit
%   such details.}
But it is clear that a more precise type would be
Giuseppe Castagna's avatar
Giuseppe Castagna committed
55
one that states that \code{foo} returns a number when it is applied to
Victor Lanvin's avatar
Fixes    
Victor Lanvin committed
56
a number and returns a string when it is applied to a string, so that the type deduced for, say, \code{foo(42)} would be \code{number} rather than \code{number|string}. This is
Giuseppe Castagna's avatar
Giuseppe Castagna committed
57
58
59
60
61
exactly what the \emph{intersection type}
\begin{equation}\label{eq:inter}
  \code{(number$\to$number) \&  (string$\to$string)}
\end{equation}
states (intuitively, an expression has an intersection of types, noted \code{\&}, if and only if it has all the types of the intersection) and corresponds in Flow to declaring \code{foo} as follows:
Giuseppe Castagna's avatar
Giuseppe Castagna committed
62
63
\begin{alltt}\color{darkblue}
  var foo : \textcolor{darkred}{(number => number) & (string => string)} =  x => \{
64
      return (typeof(x) === "number")? x+1 : x.trim();  \refstepcounter{equation}                               \mbox{\color{black}\rm(\theequation)}\label{foo2}
Giuseppe Castagna's avatar
bho?    
Giuseppe Castagna committed
65
  \} 
Giuseppe Castagna's avatar
Giuseppe Castagna committed
66
67
68
69
70
71
\end{alltt}
For what concerns negation types, they are pervasive in the occurrence
typing approach, even though they are used only at meta-theoretic
level,\footnote{At the moment of writing there is a pending pull
  request to add negation types to the syntax of TypeScript, but that is all.} in
particular to determine the type environment when the type case
72
fails. We already saw negation types at work when we informally typed the ``else''
Giuseppe Castagna's avatar
Giuseppe Castagna committed
73
branch in \code{foo}, for which we assumed that $x$ did \emph{not} have
Giuseppe Castagna's avatar
typos    
Giuseppe Castagna committed
74
75
type \code{number}---i.e., it had the (negation) type \code{$\neg$number}---and
deduced from it that $x$ then had type \code{string}---i.e.,
76
77
\code{(number|string)\&$\neg$number} which is equivalent to the set-theoretic difference
\code{(number|string)\textbackslash\,number} and, thus, to \code{string}.
Giuseppe Castagna's avatar
Giuseppe Castagna committed
78

79
The approaches cited above essentially focus on refining the type
Kim Nguyễn's avatar
Kim Nguyễn committed
80
of variables that occur in an expression whose type is being tested. They
Giuseppe Castagna's avatar
typos    
Giuseppe Castagna committed
81
do it when the variable occurs at top-level in the test (i.e., the variable is the
Giuseppe Castagna's avatar
Giuseppe Castagna committed
82
83
84
expression being tested) or under some specific positions such as in
nested pairs or at the end of a path of selectors. \beppe{Not precise,
  please check which are the cases that are handled in the cited
Giuseppe Castagna's avatar
space    
Giuseppe Castagna committed
85
  papers}  In this work we aim at removing this limitation on the
Giuseppe Castagna's avatar
Giuseppe Castagna committed
86
contexts and develop a general theory to refine the type of variables
Victor Lanvin's avatar
Fixes    
Victor Lanvin committed
87
that occur in tested expressions under generic contexts, such as variables occurring in the
Giuseppe Castagna's avatar
Giuseppe Castagna committed
88
89
90
91
left or the right expressions of an application. In other words, we aim at establishing a formal framework to
extract as much static information as possible from a type test. We leverage our
analysis on the presence of full-fledged set-theoretic types
connectives provided by the theory of semantic subtyping. Our analysis
92
will also yield two important byproducts. First, to refine the type of the
Giuseppe Castagna's avatar
Giuseppe Castagna committed
93
94
95
96
variables we have to refine the type of the expressions they occur in and
we can use this information to improve our analysis.  Therefore our
occurrence typing approach will refine not only the types of variables
but also the types of generic expressions (bypassing usual type
97
inference). Second, and most importantly, the result of our analysis can be used to infer
Giuseppe Castagna's avatar
Giuseppe Castagna committed
98
intersection types for functions, even in the absence of precise type
Giuseppe Castagna's avatar
Giuseppe Castagna committed
99
annotations such as the one in the definition of \code{foo} in~\eqref{foo2}: to put it simply, we are able to
100
infer the type~\eqref{eq:inter} for the unannotated pure JavaScript
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
code of \code{foo} (i.e., no type annotation at all), while in TypeScript and Flow (and any other
formalism we are aware of) 
this requires an explicit and full type annotation as the one given in~\eqref{foo2}.

Finally, the natural target for occurrence typing are languages with
dynamic type tests, in particular, dynamic languages. To type such
languages occurrence typing is often combined not only, as discussed
above, with set-theoretic types, but also with extensible record types
(to type objects) and gradual type system (to combine static and
dynamic typing) two features that we study in
Section~\ref{sec:extensions} as two extensions of our core formalism. Of
particular interest is the latter. \citet{Gre19} singles out
occurrence typing and gradual typing as \emph{the} two ``lineages'' that
partition the research on combining static and dynamic typing: he
identifies the former as the ``pragmatic,
implementation-oriented dynamic-first'' lineage and the latter as the
``formal, type-theoretic, static-first'' lineage.  Here we demonstrate that
these two ``lineages'' are not orthogonal or mutually independent, and 
we combine occurrence and gradual typing  showing, in particular, how
the former can be used to optimize the compilation of the latter.

\subsection{Motivating examples}
123
We focus our study on conditionals that  test types and consider the following syntax:
Giuseppe Castagna's avatar
Giuseppe Castagna committed
124
125
\(
\ifty{e}{t}{e}{e}
126
\) (e.g., in this syntax the body of \code{foo} in \eqref{foo} and \eqref{foo2} is rendered as
Giuseppe Castagna's avatar
Giuseppe Castagna committed
127
128
129
\(
  \ifty{x}{\Int}{x+1}{(\textsf{trim } x)}
\)).
130
In particular, in this introduction we concentrate on applications, since they constitute the most difficult case and many other cases can be reduced to them. A typical example is the expression
131
132
133
\begin{equation}\label{typical}
\ifty{x_1x_2}{t}{e_1}{e_2}
\end{equation}
Giuseppe Castagna's avatar
Wording    
Giuseppe Castagna committed
134
where $x_i$'s denote variables, $t$ is some type, and $e_i$'s are generic expressions. 
Victor Lanvin's avatar
Fixes    
Victor Lanvin committed
135
Depending on the actual $t$ and on the static types of $x_1$ and $x_2$, we
136
can make type assumptions for $x_1$, for $x_2$, \emph{and} for the application $x_1x_2$
Giuseppe Castagna's avatar
Giuseppe Castagna committed
137
when typing $e_1$ that are different from those we can make when typing
Giuseppe Castagna's avatar
spaces    
Giuseppe Castagna committed
138
$e_2$. For instance, suppose $x_1$ is bound to the function \code{foo} defined in \eqref{foo2}. Thus $x_1$ has type $(\Int\to\Int)\wedge(\String\to\String)$ (we used the syntax of the types of Section~\ref{sec:language} where unions and intersections are denoted by $\vee$ and $\wedge$ and have priority over $\to$ and $\times$, but not over $\neg$).
Giuseppe Castagna's avatar
Giuseppe Castagna committed
139
Then, it is not hard to see that if $x_2:\Int{\vee}\String$, then the expression\footnote{This and most of the following expressions are just given for the sake of example. Determining  the type \emph{in each branch} of expressions other than variables is interesting for constructors but less so for destructors such as applications, projections, and selections: any reasonable programmer would not repeat the same application twice, (s)he would store its result in a variable. This becomes meaningful with constructor such as pairs, as we do for instance in the expression in~\eqref{pair}.}
Giuseppe Castagna's avatar
Giuseppe Castagna committed
140
%
Giuseppe Castagna's avatar
Giuseppe Castagna committed
141
142
143
\begin{equation}\label{mezzo}
\texttt{let }x_1 \texttt{\,=\,}\code{foo}\texttt{ in } \ifty{x_1x_2}{\Int}{((x_1x_2)+x_2)}{\texttt{42}}
\end{equation}
Giuseppe Castagna's avatar
Giuseppe Castagna committed
144
145
%
is well typed with type $\Int$: when typing the branch ``then'' we
146
know that the test $x_1x_2\in \Int$
147
succeeded and that, therefore, not
Giuseppe Castagna's avatar
Giuseppe Castagna committed
148
149
150
only $x_1x_2$ is of type \Int, but also that $x_2$ is of type $\Int$: the other possibility,
$x_2:\String$, would have made the test fail.
For~\eqref{mezzo} we reasoned only on the type of the variables in the ``then'' branch but we can do the same
Giuseppe Castagna's avatar
typos    
Giuseppe Castagna committed
151
on the ``else'' branch as shown by the following expression, where \code{@} denotes string concatenation
Giuseppe Castagna's avatar
Giuseppe Castagna committed
152
\begin{equation}\label{two}
Giuseppe Castagna's avatar
Giuseppe Castagna committed
153
  \ifty{x_1x_2}{\Int}{((x_1x_2)+x_2)}{((x_1x_2)\code{\,@\,}x_2)}
Giuseppe Castagna's avatar
Giuseppe Castagna committed
154
\end{equation}
Giuseppe Castagna's avatar
Giuseppe Castagna committed
155
156
157
158
159
If the static type of
$x_1$ is $(\Int\to\Int)\wedge(\String\to\String)$ then $x_1x_2$ is well
typed only if the static type of $x_2$ is (a subtype of)
$\Int\vee\String$ and from that it is not hard to deduce that~\eqref{two}
has type $\Int\vee\String$. Let us see this in detail. The expression in~\eqref{two} is
Giuseppe Castagna's avatar
Giuseppe Castagna committed
160
typed in the following type environment:
Giuseppe Castagna's avatar
Giuseppe Castagna committed
161
$x_1:(\Int\to\Int)\wedge(\String\to\String), x_2:\Int\vee\String$. All we
Giuseppe Castagna's avatar
Giuseppe Castagna committed
162
163
can deduce, then, is that the application $x_1x_2$ has type
$\Int\vee\String$, which is not enough to type either the ``then'' branch
Giuseppe Castagna's avatar
Giuseppe Castagna committed
164
165
or the ``else'' branch. In order to type the ``then'' branch
$(x_1x_2)+x_2$ we must be able to deduce that both $x_1x_2$ and $x_2$
166
are of type \Int. Since we are in the ``then'' branch, then we know that
Giuseppe Castagna's avatar
typos    
Giuseppe Castagna committed
167
the type test succeeded and that, therefore, $x_1x_2$ has type \Int. Thus
Giuseppe Castagna's avatar
Giuseppe Castagna committed
168
we can assume in typing this branch that $x_1x_2$ has both its static
Giuseppe Castagna's avatar
typo    
Giuseppe Castagna committed
169
type and type \Int{} and, thus, their intersection:
Giuseppe Castagna's avatar
Giuseppe Castagna committed
170
$(\Int\vee\String)\wedge\Int$, that is \Int. For what concerns $x_2$ we
Giuseppe Castagna's avatar
Giuseppe Castagna committed
171
use the static type of $x_1$, that is
Giuseppe Castagna's avatar
Giuseppe Castagna committed
172
$(\Int\to\Int)\wedge(\String\to\String)$, and notice that this function
Giuseppe Castagna's avatar
Giuseppe Castagna committed
173
returns an \Int\ only if its argument is of type \Int. Reasoning as
Giuseppe Castagna's avatar
Giuseppe Castagna committed
174
175
above we thus deduce that in the ``then'' branch the type of $x_2$ is
the intersection of its static type with \Int:
Giuseppe Castagna's avatar
Giuseppe Castagna committed
176
$(\Int\vee\String)\wedge\Int$ that is \Int. To type the ``else'' branch
177
we reason exactly in the same way, with the only difference that, since
Kim Nguyễn's avatar
Kim Nguyễn committed
178
the type test has failed, then we know that the type of the tested expression is
179
\emph{not} \Int. That is, the expression $x_1x_2$ can produce any possible value
Giuseppe Castagna's avatar
typo    
Giuseppe Castagna committed
180
181
barring an \Int. If we denote by \Any\ the type of all values (i.e., the
type \code{any} of TypeScript and Flow) and by
182
$\setminus$ the set difference, then this means that in the else branch we
Giuseppe Castagna's avatar
Wording    
Giuseppe Castagna committed
183
184
know that $x_1x_2$ has type $\Any{\setminus}\Int$---written
$\neg\Int$---, that is, it can return values of any type barred \Int. Reasoning as for the ``then'' branch we then assume that
Giuseppe Castagna's avatar
Giuseppe Castagna committed
185
186
$x_1x_2$ has type $(\Int\vee\String)\wedge\neg\Int$ (i.e., $(\Int\vee\String)\setminus\Int$, that is, \String), that
$x_2$ must be of type \String\ for the application to have type
Giuseppe Castagna's avatar
Giuseppe Castagna committed
187
$\neg\Int$ and therefore we assume that $x_2$ has type
Giuseppe Castagna's avatar
Giuseppe Castagna committed
188
$(\Int\vee\String)\wedge\String$ (i.e., again \String).
Giuseppe Castagna's avatar
Giuseppe Castagna committed
189

Victor Lanvin's avatar
Fixes    
Victor Lanvin committed
190
191
We have seen that we can specialize in both branches the type of the
whole expression $x_1x_2$, the type of the argument $x_2$,
Giuseppe Castagna's avatar
Giuseppe Castagna committed
192
but what about the type of the function $x_1$? Well, this depends on the type of
Kim Nguyễn's avatar
Kim Nguyễn committed
193
$x_1$ itself. In particular, if instead of an intersection type $x_1$
194
195
is typed by a union type (e.g., when the function bound to $x_1$ is
the result of a branching expression), then the test may give us information about
Giuseppe Castagna's avatar
Giuseppe Castagna committed
196
197
the type of the function in the various branches. So for instance if in the expression
in~\eqref{typical} $x_1$ is of type, say, $(s_1\to t)\vee(s_2\to\neg t)$, then we can
Giuseppe Castagna's avatar
Giuseppe Castagna committed
198
assume for the expression~\eqref{typical} that $x_1$ has type $(s_1\to t)$ in the branch ``then'' and
Giuseppe Castagna's avatar
Giuseppe Castagna committed
199
200
$(s_2\to \neg t)$ in the branch ``else''. 
As a more concrete example, if
Giuseppe Castagna's avatar
Giuseppe Castagna committed
201
$x_1:(\Int{\vee}\String\to\Int)\vee(\Bool{\vee}\String\to\Bool)$ and
202
$x_1x_2$ is well-typed, then we can deduce for
Giuseppe Castagna's avatar
Giuseppe Castagna committed
203
\begin{equation}\label{exptre}
Giuseppe Castagna's avatar
Giuseppe Castagna committed
204
\ifty{x_1x_2}{\Int}{(x_1(x_1x_2)+42)}{\texttt{not}(x_1(x_1x_2))}
Giuseppe Castagna's avatar
Giuseppe Castagna committed
205
\end{equation}
Giuseppe Castagna's avatar
Giuseppe Castagna committed
206
207
the type $\Int\vee\Bool$: in the ``then'' branch $x_1$ has type
$\Int{\vee}\String\to\Int$ and $x_1x_2$ is of type $\Int$; in the
Giuseppe Castagna's avatar
Giuseppe Castagna committed
208
``else'' branch $x_1$ has type $\Bool{\vee}\String\to\Bool$ and
Giuseppe Castagna's avatar
Giuseppe Castagna committed
209
210
$x_1x_2$ is of type $\Bool$.

Giuseppe Castagna's avatar
Wording    
Giuseppe Castagna committed
211
Let us recap. If $e$ is an expression of type $t_0$ and we are trying to type
Giuseppe Castagna's avatar
Giuseppe Castagna committed
212
%
213
\(\ifty{e}{t}{e_1}{e_2}\),
Giuseppe Castagna's avatar
Giuseppe Castagna committed
214
%
215
then we can assume that $e$ has type $t_0\wedge t$ when typing $e_1$
Giuseppe Castagna's avatar
Giuseppe Castagna committed
216
and type $t_0\setminus t$ when typing $e_2$. If furthermore $e$ is of
Giuseppe Castagna's avatar
typo    
Giuseppe Castagna committed
217
the form $e'e''$, then we may also be able to specialize the types for $e'$ (in
Giuseppe Castagna's avatar
Giuseppe Castagna committed
218
219
particular if its static type is a union of arrows) and for $e''$ (in
particular if the static type of $e'$ is an intersection of
220
221
arrows). Additionally, we can repeat the reasoning for all subterms of $e'$
and $e''$ as long as they are applications, and deduce distinct types for all subexpressions of $e$ that
Giuseppe Castagna's avatar
Giuseppe Castagna committed
222
form applications. How to do it precisely---not only for applications, but also for other terms such as pairs, projections, records etc---is explained in the rest of
Giuseppe Castagna's avatar
Giuseppe Castagna committed
223
the paper but the key ideas are pretty simple and are presented next.
224

225
226
227
\subsection{Key ideas}\label{sec:ideas}

First of all, in a strict language we can consider a type as denoting the set of values of that type and subtyping as set-containment of the denoted values.
228
229
230
231
232
Imagine we are testing whether the result of an application $e_1e_2$
is of type $t$ or not, and suppose we know that the static types of
$e_1$ and $e_2$ are $t_1$ and $t_2$ respectively. If the application $e_1e_2$ is
well typed, then there is a lot of useful information that we can deduce from it:
first, that $t_1$ is a functional type (i.e., it denotes a set of
Giuseppe Castagna's avatar
Giuseppe Castagna committed
233
well-typed $\lambda$-abstractions, the values of functional type) whose domain, denoted by $\dom{t_1}$, is a type denoting the set of all values that are accepted by any function in
234
$t_1$; second that $t_2$ must be a subtype of the domain of $t_1$;
Giuseppe Castagna's avatar
Giuseppe Castagna committed
235
third, we also know the type of the application, that is the type
236
that denotes all the values that may result from the application of a
237
function in $t_1$ to an argument in $t_2$, type that we denote by
238
$t_1\circ t_2$. For instance, if $t_1=\Int\to\Bool$ and $t_2=\Int$,
239
240
241
then $\dom{t_1}=\Int$ and $t_1\circ t_2 = \Bool$. Notice that, introducing operations such as $\dom{}$ and $\circ$ is redundant when working with simple types, but becomes necessary in the presence of set-theoretic types. If for instance $t_1$ is the type of \eqref{foo2}, that is, \(t_1=(\Int{\to}\Int)\) \(\wedge\) \((\String{\to}\String)\),
%---which  denotes functions that return an integer if applied to an integer and a string if applied to a string---
then
242
\( \dom{t} = \Int \vee \String \), 
243
that is the union of all the possible
244
245
input types, while the precise return type of such a 
function depends on the type of the argument the function is applied to:
Giuseppe Castagna's avatar
Giuseppe Castagna committed
246
either an integer, or a string, or both (i.e., the union type
247
248
249
\(\Int\vee\String\)). So we have \( t_1 \circ \Int = \Int \),
\( t_1 \circ \String = \String \), and 
\( t_1 \circ (\Int\vee\String) = \Int \vee \String \) (see Section~\ref{sec:typeops} for the formal definition of $\circ$). 
250

251
252
What we want to do
is to refine the types of $e_1$ and $e_2$ (i.e., $t_1$ and $t_2$) for the cases where the test
Giuseppe Castagna's avatar
Giuseppe Castagna committed
253
that $e_1e_2$ has type $t$ succeeds or fails. Let us start with refining the type $t_2$ of $e_2$ for the case in
254
255
256
which the test succeeds. Intuitively, we want to remove from $t_2$ all
the values for which the application will surely return a result not
in $t$, thus making the test fail. Consider $t_1$ and let $s$ be the
Kim Nguyễn's avatar
Kim Nguyễn committed
257
258
largest subtype of $\dom{t_1}$ such that%
\svvspace{-1.29mm}
259
260
261
\begin{equation}\label{eq1}
  t_1\circ s\leq \neg t
\end{equation}
Giuseppe Castagna's avatar
Giuseppe Castagna committed
262
In other terms, $s$ contains all the legal arguments that make any function
263
264
265
in $t_1$ return a result not in $t$. Then we can safely remove from
$t_2$ all the values in $s$ or, equivalently, keep in $t_2$ all the
values of $\dom{t_1}$ that are not in $s$. Let us implement the second
266
viewpoint: the set of all elements of $\dom{t_1}$ for which an
267
application \emph{does not} surely give a result in $\neg t$ is
268
denoted $\worra{t_1}t$ (read, ``$t_1$ worra $t$'') and defined as $\min\{u\leq \dom{t_1}\alt
269
270
271
t_1\circ(\dom{t_1}\setminus u)\leq\neg t\}$: it is easy to see that
according to this definition $\dom{t_1}\setminus(\worra{t_1} t)$ is
the largest subset of $\dom{t_1}$ satisfying \eqref{eq1}. Then we can
Giuseppe Castagna's avatar
Giuseppe Castagna committed
272
refine the type of $e_2$ for when the test is successful by using the
273
274
type $t_2\wedge(\worra{t_1} t)$: we intersect all the possible results
of $e_2$, that is $t_2$, with the elements of the domain that
275
276
277
\emph{may} yield a result in $t$, that is $\worra{t_1} t$. When the test fails,
the type of $e_2$ can be refined in a similar way just by replacing $t$ by $\neg t$:
we get the refined type $t_2\land(\worra{t_1}{\neg t})$. To sum up, to refine the type of an
Victor Lanvin's avatar
Fixes    
Victor Lanvin committed
278
argument in the test of an application, all we need is to define
279
$\worra{t_1} t$, the set of arguments that when applied to a function
Giuseppe Castagna's avatar
Giuseppe Castagna committed
280
of type $t_1$ \emph{may} return a result in $t$; then we can refine the 
Giuseppe Castagna's avatar
Giuseppe Castagna committed
281
282
type of $e_2$ as $t_2^+ \eqdeftiny t_2\wedge(\worra{t_1} t)$ in the ``then'' branch (we call it the \emph{positive} branch)  
and as  $t_2^- \eqdeftiny t_2\setminus(\worra{t_1} t)$ in the ``else'' branch (we call it the \emph{negative} branch).
283
As a side remark note\marginpar{\tiny\beppe{Remove if space is needed}}
284
that the set $\worra{t_1} t$ is different from the set of elements that return a
Giuseppe Castagna's avatar
Wording    
Giuseppe Castagna committed
285
result in $t$ (though it is a supertype of it). To see that, consider
286
for $t$ the type \String{} and for $t_1$ the type $(\Bool \to\Bool)\wedge(\Int\to(\String\vee\Int))$,
287
288
that is, the type of functions that when applied to a Boolean return a
Boolean and when applied to an integer return either an integer or a
289
290
string; then we have that $\dom{t_1}=\Int\vee\Bool$ and $\worra{t_1}\String=\Int$,
but there is no (non-empty) type that ensures that an application of a
291
function in $t_1$ will surely yield a $\String$ result.
Giuseppe Castagna's avatar
Giuseppe Castagna committed
292

293

Giuseppe Castagna's avatar
Giuseppe Castagna committed
294
Once we have determined $t_2^+$, it is then not very difficult to refine the
Giuseppe Castagna's avatar
Giuseppe Castagna committed
295
type $t_1$ for the positive branch, too. If the test succeeded, then we know two facts: first,
296
that the function was applied to a value in $t_2^+$ and, second, that
297
298
299
300
301
the application did not diverge and returned a result in
$t$. Therefore, we can exclude from $t_1$ all the functions that, when applied to an argument
in $t_2^+$, yield a result not in $t$.
It can be obtained simply by removing from $t_1$ the functions in
$t_2^+\to \neg t$, that is, we refine the type of $e_1$ in the ``then'' branch as
Giuseppe Castagna's avatar
Giuseppe Castagna committed
302
$t_1^+=t_1\setminus (t_2^+\to\neg t)$.
303
304
Note that this also removes functions diverging on $t_2^+$ arguments.
In particular, the interpretation of a type $t\to s$ is the set of all functions that when applied to an argument of type $t$
Giuseppe Castagna's avatar
Giuseppe Castagna committed
305
either diverge or return a value in $s$. As such the interpretation of $t\to s$ contains
306
307
all the functions that diverge (at least) on $t$. Therefore removing $t\to s$ from a type $u$ removes from $u$
not only all the functions that when applied to a $t$ argument return a result in $s$, but also all the functions that diverge on $t$.
Giuseppe Castagna's avatar
Giuseppe Castagna committed
308
Ergo $t_1\setminus (t_2^+\to
309
310
311
312
\neg t)$ removes, among others, all functions in $t_1$ that diverge on $t_2^+$.
Let us see all this on our example \eqref{exptre}, in particular, by showing how this technique deduces that the type of $x_1$ in the positive branch is (a subtype of) $\Int{\vee}\String\to\Int$.
Take the static type of $x_1$, that is $(\Int{\vee}\String\to\Int)\vee(\Bool{\vee}\String\to\Bool)$ and intersect it with
$(t_2^+\to \neg t)$, that is, $\String\to\neg\Int$. Since intersection distributes over unions we
Kim Nguyễn's avatar
Kim Nguyễn committed
313
314
obtain
\svvspace{-1mm}
315
%
Kim Nguyễn's avatar
Kim Nguyễn committed
316
\[((\Int{\vee}\String{\to}\Int)\wedge\neg(\String{\to}\neg\Int))\vee((\Bool{\vee}\String{\to}\Bool)\wedge\neg(\String{\to}\neg\Int))\svvspace{-1mm}\]
317
318
%
and since
Giuseppe Castagna's avatar
Giuseppe Castagna committed
319
$(\Bool{\vee}\String{\to}\Bool)\wedge\neg(\String{\to}\neg\Int)$ is empty
Giuseppe Castagna's avatar
Giuseppe Castagna committed
320
(because $\String\to\neg\Int$ contains $\Bool{\vee}\String\to\Bool$),
321
then what we obtain is the left summand, a strict subtype of $(\Int{\vee}\String)\to\Int$, namely the functions of type $\Int{\vee}\String{\to}\Int$ minus those
Giuseppe Castagna's avatar
Giuseppe Castagna committed
322
that diverge on all \String{} arguments.
Giuseppe Castagna's avatar
Giuseppe Castagna committed
323
324


Giuseppe Castagna's avatar
Giuseppe Castagna committed
325
326


327

328
This is essentially what we formalize in Section~\ref{sec:language}, in the type system by the rule \Rule{PAppL} and in the typing algorithm with the case \eqref{due} of the definition of the function \constrf. 
329
330


331
\subsection{Technical challenges}\label{sec:challenges}
332

Giuseppe Castagna's avatar
Giuseppe Castagna committed
333
In the previous section we outlined the main ideas of our approach to occurrence typing. However, devil is in the details. So the formalization we give in Section~\ref{sec:language} is not so smooth as we just outlined: we must introduce several auxiliary definitions to handle some corner cases. This section presents by tiny examples the main technical difficulties we had to overcome and the definitions we introduced to handle them. As such it provides a kind of road-map for the technicalities of  Section~\ref{sec:language}.
334

Kim Nguyễn's avatar
Kim Nguyễn committed
335
336
337
338
339
340
341
342
343
344
\paragraph{Typing occurrences} As it should be clear by now, not only variables
but also generic expressions are given different types in the ``then'' and
``else'' branches of type tests. For instance, in \eqref{two} the expression
$x_1x_2$ has type \Int{} in the positive branch and type \Bool{} in the negative
one. In this specific case it is possible to deduce these typings from the
refined types of the variables (in particular, thanks to the fact that $x_2$ has
type \Int{} the positive branch and \Bool{} in the negative one), but this is
not possible in general. For instance, consider $x_1:\Int\to(\Int\vee\Bool)$,
$x_2:\Int$, and the expression
\svvspace{-1mm}
Giuseppe Castagna's avatar
Giuseppe Castagna committed
345
\begin{equation}\label{twobis}
Kim Nguyễn's avatar
Kim Nguyễn committed
346
  \ifty{x_1x_2}{\Int}{...x_1x_2...}{...x_1x_2...}\svvspace{-1mm}
Giuseppe Castagna's avatar
Giuseppe Castagna committed
347
348
\end{equation}
It is not possible to specialize the type of the variables in the
Giuseppe Castagna's avatar
Giuseppe Castagna committed
349
branches. Nevertheless, we want to be able to deduce that $x_1x_2$ has
Giuseppe Castagna's avatar
Giuseppe Castagna committed
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
type \Int{} in the positive branch and type \Bool{} in the negative
one. In order to do so in Section~\ref{sec:language} we will use
special type environments that map not only variables but also generic
expressions to types. So to type, say, the positive branch of
\eqref{twobis} we extend the current type environment with the
hypothesis that the expression $x_1x_2$ has type \Int{}.

When we test the type of an expression we try to deduce the type of
some subexpressions occurring in it. Therefore we must cope with
subexpressions occurring multiple times. A simple example is given by
using product types and pairs as in
$\ifty{(x,x)}{\pair{t_1}{t_2}}{e_1}{e_2}$. It is easy to see that the
positive branch $e_1$ is selected only if $x$ has type $t_1$
\emph{and} type $t_2$ and deduce from that that $x$ must be typed in
$e_1$ by their intersection, $t_1\wedge t_2$. To deal with multiple
Giuseppe Castagna's avatar
Giuseppe Castagna committed
365
366
occurrences of a same subexpression the type inference system of
Section~\ref{sec:language} will use the classic rule for introducing intersections \Rule{Inter}, while the algorithmic counterpart will use the operator $\Refine{}{}{}$ that
Giuseppe Castagna's avatar
Giuseppe Castagna committed
367
368
369
370
371
intersects the static type of an expression with all the types deduced
for the multiple occurrences of it.



Giuseppe Castagna's avatar
spaces    
Giuseppe Castagna committed
372
\paragraph{Type preservation} We want our type system to be sound in the sense of~\citet{Wright1994}, that is, that it
Giuseppe Castagna's avatar
Giuseppe Castagna committed
373
satisfies progress and type preservation. The latter property is
Giuseppe Castagna's avatar
Giuseppe Castagna committed
374
challenging because, as explained just above, our type
Giuseppe Castagna's avatar
Giuseppe Castagna committed
375
assumptions are not only about variables but also about 
Mickael Laurent's avatar
Mickael Laurent committed
376
expressions. Two corner cases are particularly difficult. The first is
Kim Nguyễn's avatar
Kim Nguyễn committed
377
shown by the following example\svvspace{-.9mm}
Giuseppe Castagna's avatar
Giuseppe Castagna committed
378
\begin{equation}\label{bistwo}
Kim Nguyễn's avatar
Kim Nguyễn committed
379
  \ifty{e(42)}{\Bool}{e}{...} \svvspace{-.9mm}
Giuseppe Castagna's avatar
Giuseppe Castagna committed
380
381
382
\end{equation}
If $e$ is an expression of type $\Int\to t$, then, as discussed before,
the positive branch will have type $(\Int\to t)\setminus(\Int\to\neg
Giuseppe Castagna's avatar
Giuseppe Castagna committed
383
\Bool)$. If furthermore the negative branch is of the same type (or of a subtype), then this
Giuseppe Castagna's avatar
Giuseppe Castagna committed
384
will also be the type of the whole expression in \eqref{bistwo}. Now imagine
Giuseppe Castagna's avatar
Giuseppe Castagna committed
385
that the application $e(42)$ reduces to a Boolean value, then the whole
Giuseppe Castagna's avatar
Giuseppe Castagna committed
386
  expression in \eqref{bistwo} reduces to $e$; but this has type
Giuseppe Castagna's avatar
spaces    
Giuseppe Castagna committed
387
  $\Int\to t$ which, in general, is \emph{not} a subtype of $(\Int\to
Giuseppe Castagna's avatar
Giuseppe Castagna committed
388
  t)\setminus(\Int\to\neg\Bool)$, and therefore type is not preserved by the reduction. To cope with this problem, the proof
Giuseppe Castagna's avatar
Giuseppe Castagna committed
389
  of type preservation (see \Appendix\ref{app:subject-reduction}) resorts to \emph{type schemes}, a
Giuseppe Castagna's avatar
spaces    
Giuseppe Castagna committed
390
  technique introduced by~\citet{Frisch2008} to type expressions by
Giuseppe Castagna's avatar
Giuseppe Castagna committed
391
  sets of types, so that the expression in \eqref{bistwo} will have both the types at issue. 
Giuseppe Castagna's avatar
Giuseppe Castagna committed
392

Giuseppe Castagna's avatar
Giuseppe Castagna committed
393
The second corner case is a modification of the example above
Giuseppe Castagna's avatar
Giuseppe Castagna committed
394
where the positive branch is $e(42)$, e.g.,
Giuseppe Castagna's avatar
Giuseppe Castagna committed
395
396
397
398
$\ifty{e(42)}{\Bool}{e(42)}{\textsf{true}}$. In this case the type deduced for the
whole expression is \Bool, while after reduction we would obtain
the expression $e(42)$ which is not of type \Bool{} but of type $t$ (even though it will
eventually reduce to a \Bool). This problem will be handled in the
Giuseppe Castagna's avatar
Giuseppe Castagna committed
399
proof of type preservation by considering parallel reductions (e.g, if
Giuseppe Castagna's avatar
Giuseppe Castagna committed
400
401
$e(42)$ reduces in a step to, say, $\textsf{false}$, then
$\ifty{e(42)}{\Bool}{e(42)}{\textsf{true}}$ reduces in one step to
Giuseppe Castagna's avatar
Giuseppe Castagna committed
402
$\ifty{\textsf{false}}{\Bool}{\textsf{false}}{\textsf{true}}$): see \Appendix\ref{app:parallel}.
Giuseppe Castagna's avatar
Giuseppe Castagna committed
403
404


Giuseppe Castagna's avatar
Giuseppe Castagna committed
405
406
\paragraph{Interdependence of checks} The last class of technical problems arise
from the mutual dependence of different type checks. In particular, there are two cases
Giuseppe Castagna's avatar
Giuseppe Castagna committed
407
that pose a problem. The first can be shown by two functions $f$ and $g$ both of type $(\Int\to\Int)\wedge(\Any\to\Bool)$, $x$ of type $\Any$ and the test:
Giuseppe Castagna's avatar
Giuseppe Castagna committed
408
\begin{equation}\label{nest1}
Kim Nguyễn's avatar
.    
Kim Nguyễn committed
409
\ifty{(f\,x,g\,x)}{\pair\Int\Bool}{\,...\,}{\,...}
Giuseppe Castagna's avatar
Giuseppe Castagna committed
410
\end{equation}
Kim Nguyễn's avatar
.    
Kim Nguyễn committed
411
If we independently check $f\,x$ against $\Int$ and $g\,x$ against $\Bool$
Giuseppe Castagna's avatar
Giuseppe Castagna committed
412
413
414
we deduce $\Int$ for the first occurrence of $x$ and $\Any$ for the
second. Thus we would type the positive branch of \eqref{nest1} under the hypothesis
that $x$ is of type $\Int$. But if we use the hypothesis generated by
Kim Nguyễn's avatar
.    
Kim Nguyễn committed
415
the test of $f\,x$, that is, that $x$ is of type \Int, to check $g\,x$ against \Bool,
Giuseppe Castagna's avatar
Giuseppe Castagna committed
416
then the type deduced for $x$ is $\Empty$---i.e., the branch is never selected. In other words, we want to produce type
417
environments for occurrence typing by taking into account  all
Victor Lanvin's avatar
Fixes    
Victor Lanvin committed
418
the available hypotheses, even when these hypotheses are formulated later
Giuseppe Castagna's avatar
Giuseppe Castagna committed
419
in the flow of control. This will be done in the type systems of
Giuseppe Castagna's avatar
Giuseppe Castagna committed
420
421
422
Section~\ref{sec:language} by the rule \Rule{Path} and will require at
algorithmic level to look for a fix-point solution of a function, or
an approximation thereof.
Giuseppe Castagna's avatar
Giuseppe Castagna committed
423

424
425
Finally, a nested check may help refining
the type assumptions on some outer expressions. For instance, when typing
Kim Nguyễn's avatar
Kim Nguyễn committed
426
the positive branch $e$ of\svvspace{-.9mm}
427
\begin{equation}\label{pair}
Kim Nguyễn's avatar
Kim Nguyễn committed
428
\ifty{(x,y)}{(\pair{(\Int\vee\Bool)}\Int)}{e}{...}\svvspace{-.9mm}
429
\end{equation}
Giuseppe Castagna's avatar
Giuseppe Castagna committed
430
431
432
we can assume that the expression $(x,y)$ is of type
$\pair{(\Int\vee\Bool)}\Int$ and put it in the type environment. But
if in $e$ there is a test like
433
$\ifty{x}{\Int}{{\color{darkred}(x,y)}}{(...)}$ then we do not want use
Victor Lanvin's avatar
Fixes    
Victor Lanvin committed
434
the assumption in the type environment to type the expression $(x,y)$
Giuseppe Castagna's avatar
Giuseppe Castagna committed
435
436
occurring in the inner test (in red). Instead we want to give to that occurrence of the expression
$(x,y)$ the type $\pair{\Int}\Int$. This will be done by temporary
Victor Lanvin's avatar
Fixes    
Victor Lanvin committed
437
438
removing the type assumption about $(x,y)$ from the type environment and
by retyping the expression without that assumption (see rule
Giuseppe Castagna's avatar
Giuseppe Castagna committed
439
\Rule{Env\Aa} in Section~\ref{sec:algorules}).
440
441


Giuseppe Castagna's avatar
Giuseppe Castagna committed
442
443
444
445
446
447
448
\iflongversion
\subsubsection*{Outline}
\else
\subsubsection*{Outline and Contributions}
\fi
In Section~\ref{sec:language} we formalize the
ideas we just presented: we define the types and
Giuseppe Castagna's avatar
Giuseppe Castagna committed
449
450
451
expressions of our system, their dynamic semantics and a type system that
implements occurrence typing together with the algorithms that decide
whether an expression is well typed or not. Section~\ref{sec:extensions} extends our formalism to record
452
453
454
types and presents two applications of our analysis: the inference of
arrow types for functions and a static analysis to reduce the number
of casts inserted by a compiler of a gradually-typed
Giuseppe Castagna's avatar
Giuseppe Castagna committed
455
language. Practical aspects are discussed in
Giuseppe Castagna's avatar
Giuseppe Castagna committed
456
457
458
459
460
Section~\ref{sec:practical} where we give several paradigmatic examples of code typed by our prototype implementation, that can be interactively tested at
\ifsubmission
the (anonymized) site
\fi
\url{https://occtyping.github.io/}. Section~\ref{sec:related} presents
Giuseppe Castagna's avatar
Giuseppe Castagna committed
461
462
463
related work.
\iflongversion
A discussion of future work concludes this
464
presentation.
Giuseppe Castagna's avatar
Giuseppe Castagna committed
465
\fi
Giuseppe Castagna's avatar
typos    
Giuseppe Castagna committed
466
%
Giuseppe Castagna's avatar
Giuseppe Castagna committed
467
468
To ease the presentation all the proofs are
omitted from the main text and can be found in the appendix
Giuseppe Castagna's avatar
Giuseppe Castagna committed
469
\ifsubmission
Giuseppe Castagna's avatar
Giuseppe Castagna committed
470
joint to the submission as supplemental material.
Giuseppe Castagna's avatar
Giuseppe Castagna committed
471
\else
472
available online.
Giuseppe Castagna's avatar
Giuseppe Castagna committed
473
\fi
Giuseppe Castagna's avatar
Giuseppe Castagna committed
474

Giuseppe Castagna's avatar
Giuseppe Castagna committed
475
476
477
\iflongversion
\subsubsection*{Contributions}
\fi
478
\noindent The main contributions of our work can be summarized as follows:
Giuseppe Castagna's avatar
Giuseppe Castagna committed
479
\begin{itemize}[left=0pt .. \parindent,nosep]
480
 \item We provide a theoretical framework to refine the type of
Giuseppe Castagna's avatar
Giuseppe Castagna committed
481
   expressions occurring in type tests, thus removing the limitations
482
   of current occurrence typing approaches which require both the tests and the refinement
Giuseppe Castagna's avatar
Giuseppe Castagna committed
483
484
   to take place on variables.

Giuseppe Castagna's avatar
Giuseppe Castagna committed
485
 \item We define a type-theoretic approach alternative to the
Giuseppe Castagna's avatar
Giuseppe Castagna committed
486
487
488
489
490
491
   current flow-based approaches. As such it provides different
   results and it can be thus profitably combined with flow-based techniques.
   
 \item We use our analysis for defining a formal framework that
   reconstructs intersection types for unannotated or
   partially-annotated functions, something that, in our ken, no
492
   other current system can do.
Giuseppe Castagna's avatar
Giuseppe Castagna committed
493
494
495
   
 \item We prove the soundness of our system. We define algorithms to
   infer the types that we prove to be sound and show different completeness results
496
497
498
499
   which in practice yield the completeness of any reasonable implementation.

 \item We show how to extend our approach to records with field
   addition, update, and deletion operations.  
Giuseppe Castagna's avatar
Giuseppe Castagna committed
500
   
501
502
 \item We show how  occurrence typing can be extended to and combined with
   gradual typing and apply our results to optimize the compilation of the
Giuseppe Castagna's avatar
Giuseppe Castagna committed
503
504
   latter.

Giuseppe Castagna's avatar
Giuseppe Castagna committed
505
506
% \item We show how it is possible to extend our framework to
%   exploit the richer information provided by polymorphic types.
Giuseppe Castagna's avatar
Giuseppe Castagna committed
507
\end{itemize}
Giuseppe Castagna's avatar
Giuseppe Castagna committed
508
  We end this introduction by stressing the practical
Giuseppe Castagna's avatar
Giuseppe Castagna committed
509
  implications of our work: a perfunctory inspection may give the
510
  wrong impression that the only interest of the heavy formalization
Giuseppe Castagna's avatar
Giuseppe Castagna committed
511
  that follows is to have generic expressions, rather than just variables, in type
Giuseppe Castagna's avatar
Giuseppe Castagna committed
512
513
514
515
516
  cases: this would be a bad trade-off. The important point is,
  instead, that our formalization is what makes analyses such as those
  presented in Section~\ref{sec:extensions} possible (e.g., the
  reconstruction of the type~\eqref{eq:inter} for the unannotated pure
  JavaScript code of \code{foo}), which is where the actual added
517
  practical value and potential of our work resides.
Giuseppe Castagna's avatar
Giuseppe Castagna committed
518