intro.tex 32.5 KB
 Giuseppe Castagna committed May 11, 2021 1 TypeScript and Flow are extensions of JavaScript that allow the programmer to specify in the code type annotations used to statically type-check the program. For instance, the following function definition is valid in both languages  Giuseppe Castagna committed Dec 07, 2018 2 3 \begin{alltt}\color{darkblue} function foo(x\textcolor{darkred}{ : number | string}) \{  Giuseppe Castagna committed May 05, 2021 4  return (typeof(x) === "number")? x+1 : x.trim(); \refstepcounter{equation} \mbox{\color{black}\rm(\theequation)}\label{foo}  Giuseppe Castagna committed Dec 07, 2018 5  \}  Giuseppe Castagna committed Dec 06, 2018 6 \end{alltt}  Giuseppe Castagna committed Jan 03, 2019 7 Apart from the type annotation (in red) of the function parameter, the above is  Giuseppe Castagna committed Dec 07, 2018 8 standard JavaScript code defining a function that checks whether  Giuseppe Castagna committed May 29, 2019 9 its argument is an integer; if it is so, then it returns the argument's successor  Kim Nguyễn committed Mar 03, 2020 10 (\code{x+1}), otherwise it calls the method \code{trim()} of the  Giuseppe Castagna committed Dec 07, 2018 11 argument. The annotation specifies that the parameter is either a  Giuseppe Castagna committed Dec 07, 2018 12 number or a string (the vertical bar denotes a union type). If this annotation is respected and the function  Giuseppe Castagna committed Jan 03, 2019 13 is applied to either an integer or a string, then the application  Giuseppe Castagna committed Jul 03, 2019 14 15 16 cannot fail because of a type error (\code{trim()} is a string method of the ECMAScript 5 standard that trims whitespaces from the beginning and end of the string) and both the type-checker of TypeScript and the one of Flow rightly accept this function. This is possible because both type-checkers  Giuseppe Castagna committed Dec 07, 2018 17 implement a specific type discipline called \emph{occurrence typing} or \emph{flow  Giuseppe Castagna committed Dec 10, 2018 18 typing}:\footnote{%  Giuseppe Castagna committed Dec 07, 2018 19 20  TypeScript calls it type guard recognition'' while Flow uses the terminology type refinements''.}  Giuseppe Castagna committed Jan 03, 2019 21 as a matter of fact, standard type disciplines would reject this function.  Giuseppe Castagna committed Dec 10, 2018 22 The reason for that is that standard type disciplines would try to  Giuseppe Castagna committed Jul 02, 2019 23 type every part of the body of the function under the assumption that \code{x} has  Giuseppe Castagna committed Jan 03, 2019 24 type \code{number\,|\,string} and they would fail, since the successor is  Giuseppe Castagna committed Jul 02, 2019 25 not defined for strings and the method \code{trim()} is not defined  Giuseppe Castagna committed Dec 07, 2018 26 for numbers. This is so because standard disciplines do not take into account the type  Giuseppe Castagna committed Dec 07, 2018 27 test performed on \code{x}. Occurrence typing is the typing technique  Giuseppe Castagna committed Feb 25, 2020 28 that uses the information provided by the test to specialize---precisely, to \emph{refine}---the type  Giuseppe Castagna committed Jul 08, 2020 29 of the occurrences of \code{x} in the branches of the conditional: since the program tested that  Giuseppe Castagna committed Dec 07, 2018 30 31 \code{x} is of type \code{number}, then we can safely assume that \code{x} is of type \code{number} in the then'' branch, and that it  Giuseppe Castagna committed Jan 03, 2019 32 is \emph{not} of type \code{number} (and thus deduce from the type annotation that it must be of type  Giuseppe Castagna committed Dec 07, 2018 33 34 \code{string}) in the else'' branch.  Giuseppe Castagna committed Jul 03, 2019 35 Occurrence typing was first defined and formally studied by  Kim Nguyễn committed Jul 03, 2019 36 \citet{THF08} to statically type-check untyped Scheme programs, and later extended by \citet{THF10}  Victor Lanvin committed Jul 10, 2019 37 yielding the development of Typed Racket. From its inception,  Giuseppe Castagna committed Jul 03, 2019 38 39 occurrence typing was intimately tied to type systems with set-theoretic types: unions, intersections, and negation of  Victor Lanvin committed Jul 10, 2019 40 types. Union was the first type connective to appear, since it was already used by  Giuseppe Castagna committed Jul 03, 2019 41 42 43 44 45 46 \citet{THF08} where its presence was needed to characterize the different control flows of a type test, as our \code{foo} example shows: one flow for integer arguments and another for strings. Intersection types appear (in limited forms) combined with occurrence typing both in TypeScript and in Flow and serve to give, among other, more precise types to functions such as \code{foo}. For instance,  Kim Nguyễn committed Mar 03, 2020 47 since \code{x\,+\,1} evaluates to an integer and \code{x.trim()} to a string, then our  Giuseppe Castagna committed Jul 03, 2019 48 function \code{foo} has type  Kim Nguyễn committed Feb 28, 2020 49 50 51 52 53 54 \code{(number|string)$\to$(number|string)}. % \footnote{\label{null}Actually, % both Flow and TypeScript deduce as return type % \code{number|string|void}, since they track when operations may yield \code{void} results. Considering this would be easy but also clutter our presentation, which is why we omit % such details.} But it is clear that a more precise type would be  Giuseppe Castagna committed Jul 03, 2019 55 one that states that \code{foo} returns a number when it is applied to  Victor Lanvin committed Jul 10, 2019 56 a number and returns a string when it is applied to a string, so that the type deduced for, say, \code{foo(42)} would be \code{number} rather than \code{number|string}. This is  Giuseppe Castagna committed Feb 25, 2020 57 58 59 60 61 exactly what the \emph{intersection type} \label{eq:inter} \code{(number$\to$number) \& (string$\to$string)} states (intuitively, an expression has an intersection of types, noted \code{\&}, if and only if it has all the types of the intersection) and corresponds in Flow to declaring \code{foo} as follows:  Giuseppe Castagna committed Jul 03, 2019 62 63 \begin{alltt}\color{darkblue} var foo : \textcolor{darkred}{(number => number) & (string => string)} = x => \{  Giuseppe Castagna committed May 05, 2021 64  return (typeof(x) === "number")? x+1 : x.trim(); \refstepcounter{equation} \mbox{\color{black}\rm(\theequation)}\label{foo2}  Giuseppe Castagna committed Jul 11, 2019 65  \}  Giuseppe Castagna committed Jul 03, 2019 66 67 68 69 70 71 \end{alltt} For what concerns negation types, they are pervasive in the occurrence typing approach, even though they are used only at meta-theoretic level,\footnote{At the moment of writing there is a pending pull request to add negation types to the syntax of TypeScript, but that is all.} in particular to determine the type environment when the type case  Giuseppe Castagna committed Jul 08, 2019 72 fails. We already saw negation types at work when we informally typed the else''  Giuseppe Castagna committed Jul 03, 2019 73 branch in \code{foo}, for which we assumed that $x$ did \emph{not} have  Giuseppe Castagna committed Jul 04, 2019 74 75 type \code{number}---i.e., it had the (negation) type \code{$\neg$number}---and deduced from it that $x$ then had type \code{string}---i.e.,  Giuseppe Castagna committed Jul 05, 2019 76 77 \code{(number|string)\&$\neg$number} which is equivalent to the set-theoretic difference \code{(number|string)\textbackslash\,number} and, thus, to \code{string}.  Giuseppe Castagna committed Dec 07, 2018 78   Giuseppe Castagna committed Jul 08, 2019 79 The approaches cited above essentially focus on refining the type  Kim Nguyễn committed Jul 03, 2019 80 of variables that occur in an expression whose type is being tested. They  Giuseppe Castagna committed Jul 04, 2019 81 do it when the variable occurs at top-level in the test (i.e., the variable is the  Giuseppe Castagna committed Jul 03, 2019 82 83 84 expression being tested) or under some specific positions such as in nested pairs or at the end of a path of selectors. \beppe{Not precise, please check which are the cases that are handled in the cited  Giuseppe Castagna committed Jul 10, 2019 85  papers} In this work we aim at removing this limitation on the  Giuseppe Castagna committed Jul 03, 2019 86 contexts and develop a general theory to refine the type of variables  Victor Lanvin committed Jul 10, 2019 87 that occur in tested expressions under generic contexts, such as variables occurring in the  Giuseppe Castagna committed Jul 03, 2019 88 89 90 91 left or the right expressions of an application. In other words, we aim at establishing a formal framework to extract as much static information as possible from a type test. We leverage our analysis on the presence of full-fledged set-theoretic types connectives provided by the theory of semantic subtyping. Our analysis  Giuseppe Castagna committed May 11, 2021 92 will also yield two important byproducts. First, to refine the type of the  Giuseppe Castagna committed Jul 03, 2019 93 94 95 96 variables we have to refine the type of the expressions they occur in and we can use this information to improve our analysis. Therefore our occurrence typing approach will refine not only the types of variables but also the types of generic expressions (bypassing usual type  Giuseppe Castagna committed Jul 08, 2020 97 inference). Second, and most importantly, the result of our analysis can be used to infer  Giuseppe Castagna committed Jul 03, 2019 98 intersection types for functions, even in the absence of precise type  Giuseppe Castagna committed Mar 02, 2020 99 annotations such as the one in the definition of \code{foo} in~\eqref{foo2}: to put it simply, we are able to  Giuseppe Castagna committed Jul 08, 2020 100 infer the type~\eqref{eq:inter} for the unannotated pure JavaScript  Giuseppe Castagna committed May 11, 2021 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 code of \code{foo} (i.e., no type annotation at all), while in TypeScript and Flow (and any other formalism we are aware of) this requires an explicit and full type annotation as the one given in~\eqref{foo2}. Finally, the natural target for occurrence typing are languages with dynamic type tests, in particular, dynamic languages. To type such languages occurrence typing is often combined not only, as discussed above, with set-theoretic types, but also with extensible record types (to type objects) and gradual type system (to combine static and dynamic typing) two features that we study in Section~\ref{sec:extensions} as two extensions of our core formalism. Of particular interest is the latter. \citet{Gre19} singles out occurrence typing and gradual typing as \emph{the} two lineages'' that partition the research on combining static and dynamic typing: he identifies the former as the pragmatic, implementation-oriented dynamic-first'' lineage and the latter as the formal, type-theoretic, static-first'' lineage. Here we demonstrate that these two lineages'' are not orthogonal or mutually independent, and we combine occurrence and gradual typing showing, in particular, how the former can be used to optimize the compilation of the latter. \subsection{Motivating examples}  Giuseppe Castagna committed Jan 03, 2019 123 We focus our study on conditionals that test types and consider the following syntax:  Giuseppe Castagna committed Dec 07, 2018 124 125 $$\ifty{e}{t}{e}{e}  Giuseppe Castagna committed Jul 08, 2020 126 $$ (e.g., in this syntax the body of \code{foo} in \eqref{foo} and \eqref{foo2} is rendered as  Giuseppe Castagna committed Jul 03, 2019 127 128 129 $$\ifty{x}{\Int}{x+1}{(\textsf{trim } x)}$$).  Giuseppe Castagna committed Jul 08, 2019 130 In particular, in this introduction we concentrate on applications, since they constitute the most difficult case and many other cases can be reduced to them. A typical example is the expression  Giuseppe Castagna committed Apr 01, 2019 131 132 133 \label{typical} \ifty{x_1x_2}{t}{e_1}{e_2}  Giuseppe Castagna committed May 28, 2019 134 where $x_i$'s denote variables, $t$ is some type, and $e_i$'s are generic expressions.  Victor Lanvin committed Jul 10, 2019 135 Depending on the actual $t$ and on the static types of $x_1$ and $x_2$, we  Giuseppe Castagna committed Jul 08, 2019 136 can make type assumptions for $x_1$, for $x_2$, \emph{and} for the application $x_1x_2$  Giuseppe Castagna committed Dec 10, 2018 137 when typing $e_1$ that are different from those we can make when typing  Giuseppe Castagna committed Jul 10, 2019 138 $e_2$. For instance, suppose $x_1$ is bound to the function \code{foo} defined in \eqref{foo2}. Thus $x_1$ has type $(\Int\to\Int)\wedge(\String\to\String)$ (we used the syntax of the types of Section~\ref{sec:language} where unions and intersections are denoted by $\vee$ and $\wedge$ and have priority over $\to$ and $\times$, but not over $\neg$).  Giuseppe Castagna committed Mar 02, 2020 139 Then, it is not hard to see that if $x_2:\Int{\vee}\String$, then the expression\footnote{This and most of the following expressions are just given for the sake of example. Determining the type \emph{in each branch} of expressions other than variables is interesting for constructors but less so for destructors such as applications, projections, and selections: any reasonable programmer would not repeat the same application twice, (s)he would store its result in a variable. This becomes meaningful with constructor such as pairs, as we do for instance in the expression in~\eqref{pair}.}  Giuseppe Castagna committed Dec 07, 2018 140 %  Giuseppe Castagna committed Jul 03, 2019 141 142 143 \label{mezzo} \texttt{let }x_1 \texttt{\,=\,}\code{foo}\texttt{ in } \ifty{x_1x_2}{\Int}{((x_1x_2)+x_2)}{\texttt{42}}  Giuseppe Castagna committed Dec 06, 2018 144 145 % is well typed with type $\Int$: when typing the branch then'' we  Giuseppe Castagna committed Jul 09, 2019 146 know that the test $x_1x_2\in \Int$  Giuseppe Castagna committed Jul 08, 2020 147 succeeded and that, therefore, not  Giuseppe Castagna committed Jul 03, 2019 148 149 150 only $x_1x_2$ is of type \Int, but also that $x_2$ is of type $\Int$: the other possibility, $x_2:\String$, would have made the test fail. For~\eqref{mezzo} we reasoned only on the type of the variables in the then'' branch but we can do the same  Giuseppe Castagna committed Jul 04, 2019 151 on the else'' branch as shown by the following expression, where \code{@} denotes string concatenation  Giuseppe Castagna committed Apr 17, 2019 152 \label{two}  Giuseppe Castagna committed Jul 03, 2019 153  \ifty{x_1x_2}{\Int}{((x_1x_2)+x_2)}{((x_1x_2)\code{\,@\,}x_2)}  Giuseppe Castagna committed Dec 06, 2018 154   Giuseppe Castagna committed Jul 03, 2019 155 156 157 158 159 If the static type of $x_1$ is $(\Int\to\Int)\wedge(\String\to\String)$ then $x_1x_2$ is well typed only if the static type of $x_2$ is (a subtype of) $\Int\vee\String$ and from that it is not hard to deduce that~\eqref{two} has type $\Int\vee\String$. Let us see this in detail. The expression in~\eqref{two} is  Giuseppe Castagna committed Dec 07, 2018 160 typed in the following type environment:  Giuseppe Castagna committed Jul 03, 2019 161 $x_1:(\Int\to\Int)\wedge(\String\to\String), x_2:\Int\vee\String$. All we  Giuseppe Castagna committed Feb 25, 2020 162 163 can deduce, then, is that the application $x_1x_2$ has type $\Int\vee\String$, which is not enough to type either the then'' branch  Giuseppe Castagna committed Dec 07, 2018 164 165 or the else'' branch. In order to type the then'' branch $(x_1x_2)+x_2$ we must be able to deduce that both $x_1x_2$ and $x_2$  Giuseppe Castagna committed Jan 03, 2019 166 are of type \Int. Since we are in the then'' branch, then we know that  Giuseppe Castagna committed Mar 25, 2019 167 the type test succeeded and that, therefore, $x_1x_2$ has type \Int. Thus  Giuseppe Castagna committed Dec 07, 2018 168 we can assume in typing this branch that $x_1x_2$ has both its static  Giuseppe Castagna committed Mar 25, 2019 169 type and type \Int{} and, thus, their intersection:  Giuseppe Castagna committed Jul 03, 2019 170 $(\Int\vee\String)\wedge\Int$, that is \Int. For what concerns $x_2$ we  Giuseppe Castagna committed Dec 07, 2018 171 use the static type of $x_1$, that is  Giuseppe Castagna committed Jul 03, 2019 172 $(\Int\to\Int)\wedge(\String\to\String)$, and notice that this function  Giuseppe Castagna committed May 29, 2019 173 returns an \Int\ only if its argument is of type \Int. Reasoning as  Giuseppe Castagna committed Dec 07, 2018 174 175 above we thus deduce that in the then'' branch the type of $x_2$ is the intersection of its static type with \Int:  Giuseppe Castagna committed Jul 03, 2019 176 $(\Int\vee\String)\wedge\Int$ that is \Int. To type the else'' branch  Giuseppe Castagna committed Jan 03, 2019 177 we reason exactly in the same way, with the only difference that, since  Kim Nguyễn committed Jul 03, 2019 178 the type test has failed, then we know that the type of the tested expression is  Giuseppe Castagna committed Jan 03, 2019 179 \emph{not} \Int. That is, the expression $x_1x_2$ can produce any possible value  Giuseppe Castagna committed Sep 17, 2019 180 181 barring an \Int. If we denote by \Any\ the type of all values (i.e., the type \code{any} of TypeScript and Flow) and by  Giuseppe Castagna committed Jan 03, 2019 182 $\setminus$ the set difference, then this means that in the else branch we  Giuseppe Castagna committed May 28, 2019 183 184 know that $x_1x_2$ has type $\Any{\setminus}\Int$---written $\neg\Int$---, that is, it can return values of any type barred \Int. Reasoning as for the then'' branch we then assume that  Giuseppe Castagna committed Jul 03, 2019 185 186 $x_1x_2$ has type $(\Int\vee\String)\wedge\neg\Int$ (i.e., $(\Int\vee\String)\setminus\Int$, that is, \String), that $x_2$ must be of type \String\ for the application to have type  Giuseppe Castagna committed Dec 07, 2018 187 $\neg\Int$ and therefore we assume that $x_2$ has type  Giuseppe Castagna committed Jul 03, 2019 188 $(\Int\vee\String)\wedge\String$ (i.e., again \String).  Giuseppe Castagna committed Dec 07, 2018 189   Victor Lanvin committed Jul 10, 2019 190 191 We have seen that we can specialize in both branches the type of the whole expression $x_1x_2$, the type of the argument $x_2$,  Giuseppe Castagna committed Feb 25, 2020 192 but what about the type of the function $x_1$? Well, this depends on the type of  Kim Nguyễn committed Jul 03, 2019 193 $x_1$ itself. In particular, if instead of an intersection type $x_1$  Giuseppe Castagna committed Jul 08, 2020 194 195 is typed by a union type (e.g., when the function bound to $x_1$ is the result of a branching expression), then the test may give us information about  Giuseppe Castagna committed Apr 17, 2019 196 197 the type of the function in the various branches. So for instance if in the expression in~\eqref{typical} $x_1$ is of type, say, $(s_1\to t)\vee(s_2\to\neg t)$, then we can  Giuseppe Castagna committed Feb 25, 2020 198 assume for the expression~\eqref{typical} that $x_1$ has type $(s_1\to t)$ in the branch then'' and  Giuseppe Castagna committed Apr 17, 2019 199 200 $(s_2\to \neg t)$ in the branch else''. As a more concrete example, if  Giuseppe Castagna committed Dec 07, 2018 201 $x_1:(\Int{\vee}\String\to\Int)\vee(\Bool{\vee}\String\to\Bool)$ and  Giuseppe Castagna committed Jan 03, 2019 202 $x_1x_2$ is well-typed, then we can deduce for  Giuseppe Castagna committed Jul 03, 2019 203 \label{exptre}  Giuseppe Castagna committed Dec 07, 2018 204 \ifty{x_1x_2}{\Int}{(x_1(x_1x_2)+42)}{\texttt{not}(x_1(x_1x_2))}  Giuseppe Castagna committed Apr 17, 2019 205   Giuseppe Castagna committed Dec 07, 2018 206 207 the type $\Int\vee\Bool$: in the then'' branch $x_1$ has type $\Int{\vee}\String\to\Int$ and $x_1x_2$ is of type $\Int$; in the  Giuseppe Castagna committed Apr 17, 2019 208 else'' branch $x_1$ has type $\Bool{\vee}\String\to\Bool$ and  Giuseppe Castagna committed Dec 07, 2018 209 210 $x_1x_2$ is of type $\Bool$.  Giuseppe Castagna committed May 28, 2019 211 Let us recap. If $e$ is an expression of type $t_0$ and we are trying to type  Giuseppe Castagna committed Dec 07, 2018 212 %  Giuseppe Castagna committed Jul 08, 2019 213 $$\ifty{e}{t}{e_1}{e_2}$$,  Giuseppe Castagna committed Dec 07, 2018 214 %  Giuseppe Castagna committed Jan 03, 2019 215 then we can assume that $e$ has type $t_0\wedge t$ when typing $e_1$  Giuseppe Castagna committed Dec 07, 2018 216 and type $t_0\setminus t$ when typing $e_2$. If furthermore $e$ is of  Giuseppe Castagna committed Mar 25, 2019 217 the form $e'e''$, then we may also be able to specialize the types for $e'$ (in  Giuseppe Castagna committed Dec 10, 2018 218 219 particular if its static type is a union of arrows) and for $e''$ (in particular if the static type of $e'$ is an intersection of  Giuseppe Castagna committed Jan 03, 2019 220 221 arrows). Additionally, we can repeat the reasoning for all subterms of $e'$ and $e''$ as long as they are applications, and deduce distinct types for all subexpressions of $e$ that  Giuseppe Castagna committed Jul 03, 2019 222 form applications. How to do it precisely---not only for applications, but also for other terms such as pairs, projections, records etc---is explained in the rest of  Giuseppe Castagna committed Mar 04, 2020 223 the paper but the key ideas are pretty simple and are presented next.  Giuseppe Castagna committed Jan 03, 2019 224   Giuseppe Castagna committed Jun 26, 2019 225 226 227 \subsection{Key ideas}\label{sec:ideas} First of all, in a strict language we can consider a type as denoting the set of values of that type and subtyping as set-containment of the denoted values.  Giuseppe Castagna committed Jan 03, 2019 228 229 230 231 232 Imagine we are testing whether the result of an application $e_1e_2$ is of type $t$ or not, and suppose we know that the static types of $e_1$ and $e_2$ are $t_1$ and $t_2$ respectively. If the application $e_1e_2$ is well typed, then there is a lot of useful information that we can deduce from it: first, that $t_1$ is a functional type (i.e., it denotes a set of  Giuseppe Castagna committed Feb 25, 2020 233 well-typed $\lambda$-abstractions, the values of functional type) whose domain, denoted by $\dom{t_1}$, is a type denoting the set of all values that are accepted by any function in  Giuseppe Castagna committed Jan 03, 2019 234 $t_1$; second that $t_2$ must be a subtype of the domain of $t_1$;  Giuseppe Castagna committed Apr 17, 2019 235 third, we also know the type of the application, that is the type  Giuseppe Castagna committed Apr 21, 2019 236 that denotes all the values that may result from the application of a  Giuseppe Castagna committed Jul 05, 2019 237 function in $t_1$ to an argument in $t_2$, type that we denote by  Giuseppe Castagna committed Jan 03, 2019 238 $t_1\circ t_2$. For instance, if $t_1=\Int\to\Bool$ and $t_2=\Int$,  Giuseppe Castagna committed Jul 08, 2019 239 240 241 then $\dom{t_1}=\Int$ and $t_1\circ t_2 = \Bool$. Notice that, introducing operations such as $\dom{}$ and $\circ$ is redundant when working with simple types, but becomes necessary in the presence of set-theoretic types. If for instance $t_1$ is the type of \eqref{foo2}, that is, $$t_1=(\Int{\to}\Int)$$ $$\wedge$$ $$(\String{\to}\String)$$, %---which denotes functions that return an integer if applied to an integer and a string if applied to a string--- then  Giuseppe Castagna committed Jul 05, 2019 242 $$\dom{t} = \Int \vee \String$$,  Giuseppe Castagna committed Jul 04, 2019 243 that is the union of all the possible  Giuseppe Castagna committed Jul 05, 2019 244 245 input types, while the precise return type of such a function depends on the type of the argument the function is applied to:  Giuseppe Castagna committed Jul 03, 2020 246 either an integer, or a string, or both (i.e., the union type  Giuseppe Castagna committed Jul 05, 2019 247 248 249 $$\Int\vee\String$$). So we have $$t_1 \circ \Int = \Int$$, $$t_1 \circ \String = \String$$, and $$t_1 \circ (\Int\vee\String) = \Int \vee \String$$ (see Section~\ref{sec:typeops} for the formal definition of $\circ$).  Giuseppe Castagna committed Jan 03, 2019 250   Giuseppe Castagna committed Jul 04, 2019 251 252 What we want to do is to refine the types of $e_1$ and $e_2$ (i.e., $t_1$ and $t_2$) for the cases where the test  Giuseppe Castagna committed Jul 04, 2019 253 that $e_1e_2$ has type $t$ succeeds or fails. Let us start with refining the type $t_2$ of $e_2$ for the case in  Giuseppe Castagna committed Jan 03, 2019 254 255 256 which the test succeeds. Intuitively, we want to remove from $t_2$ all the values for which the application will surely return a result not in $t$, thus making the test fail. Consider $t_1$ and let $s$ be the  Kim Nguyễn committed Feb 22, 2021 257 258 largest subtype of $\dom{t_1}$ such that% \svvspace{-1.29mm}  Giuseppe Castagna committed Jan 03, 2019 259 260 261 \label{eq1} t_1\circ s\leq \neg t  Giuseppe Castagna committed Mar 02, 2020 262 In other terms, $s$ contains all the legal arguments that make any function  Giuseppe Castagna committed Jan 03, 2019 263 264 265 in $t_1$ return a result not in $t$. Then we can safely remove from $t_2$ all the values in $s$ or, equivalently, keep in $t_2$ all the values of $\dom{t_1}$ that are not in $s$. Let us implement the second  Giuseppe Castagna committed Apr 21, 2019 266 viewpoint: the set of all elements of $\dom{t_1}$ for which an  Giuseppe Castagna committed Jan 03, 2019 267 application \emph{does not} surely give a result in $\neg t$ is  Giuseppe Castagna committed Jul 08, 2019 268 denoted $\worra{t_1}t$ (read, $t_1$ worra $t$'') and defined as $\min\{u\leq \dom{t_1}\alt  Giuseppe Castagna committed Jan 03, 2019 269 270 271 t_1\circ(\dom{t_1}\setminus u)\leq\neg t\}$: it is easy to see that according to this definition $\dom{t_1}\setminus(\worra{t_1} t)$ is the largest subset of $\dom{t_1}$ satisfying \eqref{eq1}. Then we can  Giuseppe Castagna committed Apr 17, 2019 272 refine the type of $e_2$ for when the test is successful by using the  Giuseppe Castagna committed Jan 03, 2019 273 274 type $t_2\wedge(\worra{t_1} t)$: we intersect all the possible results of $e_2$, that is $t_2$, with the elements of the domain that  Mickael Laurent committed Sep 09, 2020 275 276 277 \emph{may} yield a result in $t$, that is $\worra{t_1} t$. When the test fails, the type of $e_2$ can be refined in a similar way just by replacing $t$ by $\neg t$: we get the refined type $t_2\land(\worra{t_1}{\neg t})$. To sum up, to refine the type of an  Victor Lanvin committed Jul 10, 2019 278 argument in the test of an application, all we need is to define  Giuseppe Castagna committed Jan 03, 2019 279 $\worra{t_1} t$, the set of arguments that when applied to a function  Giuseppe Castagna committed Jan 03, 2019 280 of type $t_1$ \emph{may} return a result in $t$; then we can refine the  Giuseppe Castagna committed May 29, 2019 281 282 type of $e_2$ as $t_2^+ \eqdeftiny t_2\wedge(\worra{t_1} t)$ in the then'' branch (we call it the \emph{positive} branch) and as $t_2^- \eqdeftiny t_2\setminus(\worra{t_1} t)$ in the else'' branch (we call it the \emph{negative} branch).  Giuseppe Castagna committed Jul 08, 2019 283 As a side remark note\marginpar{\tiny\beppe{Remove if space is needed}}  Giuseppe Castagna committed Jan 03, 2019 284 that the set $\worra{t_1} t$ is different from the set of elements that return a  Giuseppe Castagna committed May 28, 2019 285 result in $t$ (though it is a supertype of it). To see that, consider  Giuseppe Castagna committed Apr 21, 2019 286 for $t$ the type \String{} and for $t_1$ the type $(\Bool \to\Bool)\wedge(\Int\to(\String\vee\Int))$,  Giuseppe Castagna committed Jan 03, 2019 287 288 that is, the type of functions that when applied to a Boolean return a Boolean and when applied to an integer return either an integer or a  Giuseppe Castagna committed Apr 21, 2019 289 290 string; then we have that $\dom{t_1}=\Int\vee\Bool$ and $\worra{t_1}\String=\Int$, but there is no (non-empty) type that ensures that an application of a  Giuseppe Castagna committed Jan 03, 2019 291 function in $t_1$ will surely yield a $\String$ result.  Giuseppe Castagna committed Jul 09, 2019 292   Giuseppe Castagna committed Jan 03, 2019 293   Giuseppe Castagna committed Apr 21, 2019 294 Once we have determined $t_2^+$, it is then not very difficult to refine the  Giuseppe Castagna committed Jun 04, 2019 295 type $t_1$ for the positive branch, too. If the test succeeded, then we know two facts: first,  Giuseppe Castagna committed Apr 21, 2019 296 that the function was applied to a value in $t_2^+$ and, second, that  Mickael Laurent committed Sep 15, 2019 297 298 299 300 301 the application did not diverge and returned a result in $t$. Therefore, we can exclude from $t_1$ all the functions that, when applied to an argument in $t_2^+$, yield a result not in $t$. It can be obtained simply by removing from $t_1$ the functions in $t_2^+\to \neg t$, that is, we refine the type of $e_1$ in the then'' branch as  Giuseppe Castagna committed Feb 25, 2020 302 $t_1^+=t_1\setminus (t_2^+\to\neg t)$.  Mickael Laurent committed Sep 15, 2019 303 304 Note that this also removes functions diverging on $t_2^+$ arguments. In particular, the interpretation of a type $t\to s$ is the set of all functions that when applied to an argument of type $t$  Giuseppe Castagna committed Jul 04, 2019 305 either diverge or return a value in $s$. As such the interpretation of $t\to s$ contains  Mickael Laurent committed Sep 15, 2019 306 307 all the functions that diverge (at least) on $t$. Therefore removing $t\to s$ from a type $u$ removes from $u$ not only all the functions that when applied to a $t$ argument return a result in $s$, but also all the functions that diverge on $t$.  Giuseppe Castagna committed Jul 04, 2019 308 Ergo $t_1\setminus (t_2^+\to  Mickael Laurent committed Sep 15, 2019 309 310 311 312 \neg t)$ removes, among others, all functions in $t_1$ that diverge on $t_2^+$. Let us see all this on our example \eqref{exptre}, in particular, by showing how this technique deduces that the type of $x_1$ in the positive branch is (a subtype of) $\Int{\vee}\String\to\Int$. Take the static type of $x_1$, that is $(\Int{\vee}\String\to\Int)\vee(\Bool{\vee}\String\to\Bool)$ and intersect it with $(t_2^+\to \neg t)$, that is, $\String\to\neg\Int$. Since intersection distributes over unions we  Kim Nguyễn committed Feb 22, 2021 313 314 obtain \svvspace{-1mm}  Giuseppe Castagna committed Apr 21, 2019 315 %  Kim Nguyễn committed Feb 22, 2021 316 $((\Int{\vee}\String{\to}\Int)\wedge\neg(\String{\to}\neg\Int))\vee((\Bool{\vee}\String{\to}\Bool)\wedge\neg(\String{\to}\neg\Int))\svvspace{-1mm}$  Giuseppe Castagna committed Apr 21, 2019 317 318 % and since  Giuseppe Castagna committed Jul 05, 2019 319 $(\Bool{\vee}\String{\to}\Bool)\wedge\neg(\String{\to}\neg\Int)$ is empty  Giuseppe Castagna committed Apr 21, 2019 320 (because $\String\to\neg\Int$ contains $\Bool{\vee}\String\to\Bool$),  Mickael Laurent committed Sep 15, 2019 321 then what we obtain is the left summand, a strict subtype of $(\Int{\vee}\String)\to\Int$, namely the functions of type $\Int{\vee}\String{\to}\Int$ minus those  Giuseppe Castagna committed Jul 04, 2019 322 that diverge on all \String{} arguments.  Giuseppe Castagna committed Apr 18, 2019 323 324   Giuseppe Castagna committed Apr 17, 2019 325 326   Giuseppe Castagna committed Jan 03, 2019 327   Giuseppe Castagna committed Jul 08, 2019 328 This is essentially what we formalize in Section~\ref{sec:language}, in the type system by the rule \Rule{PAppL} and in the typing algorithm with the case \eqref{due} of the definition of the function \constrf.  Giuseppe Castagna committed May 28, 2019 329 330   Giuseppe Castagna committed Jun 25, 2019 331 \subsection{Technical challenges}\label{sec:challenges}  Giuseppe Castagna committed May 28, 2019 332   Giuseppe Castagna committed Jul 04, 2019 333 In the previous section we outlined the main ideas of our approach to occurrence typing. However, devil is in the details. So the formalization we give in Section~\ref{sec:language} is not so smooth as we just outlined: we must introduce several auxiliary definitions to handle some corner cases. This section presents by tiny examples the main technical difficulties we had to overcome and the definitions we introduced to handle them. As such it provides a kind of road-map for the technicalities of Section~\ref{sec:language}.  Giuseppe Castagna committed May 28, 2019 334   Kim Nguyễn committed Feb 22, 2021 335 336 337 338 339 340 341 342 343 344 \paragraph{Typing occurrences} As it should be clear by now, not only variables but also generic expressions are given different types in the then'' and else'' branches of type tests. For instance, in \eqref{two} the expression $x_1x_2$ has type \Int{} in the positive branch and type \Bool{} in the negative one. In this specific case it is possible to deduce these typings from the refined types of the variables (in particular, thanks to the fact that $x_2$ has type \Int{} the positive branch and \Bool{} in the negative one), but this is not possible in general. For instance, consider $x_1:\Int\to(\Int\vee\Bool)$, $x_2:\Int$, and the expression \svvspace{-1mm}  Giuseppe Castagna committed May 29, 2019 345 \label{twobis}  Kim Nguyễn committed Feb 22, 2021 346  \ifty{x_1x_2}{\Int}{...x_1x_2...}{...x_1x_2...}\svvspace{-1mm}  Giuseppe Castagna committed May 29, 2019 347 348  It is not possible to specialize the type of the variables in the  Giuseppe Castagna committed Jul 04, 2019 349 branches. Nevertheless, we want to be able to deduce that $x_1x_2$ has  Giuseppe Castagna committed May 29, 2019 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 type \Int{} in the positive branch and type \Bool{} in the negative one. In order to do so in Section~\ref{sec:language} we will use special type environments that map not only variables but also generic expressions to types. So to type, say, the positive branch of \eqref{twobis} we extend the current type environment with the hypothesis that the expression $x_1x_2$ has type \Int{}. When we test the type of an expression we try to deduce the type of some subexpressions occurring in it. Therefore we must cope with subexpressions occurring multiple times. A simple example is given by using product types and pairs as in $\ifty{(x,x)}{\pair{t_1}{t_2}}{e_1}{e_2}$. It is easy to see that the positive branch $e_1$ is selected only if $x$ has type $t_1$ \emph{and} type $t_2$ and deduce from that that $x$ must be typed in $e_1$ by their intersection, $t_1\wedge t_2$. To deal with multiple  Giuseppe Castagna committed Jul 03, 2019 365 366 occurrences of a same subexpression the type inference system of Section~\ref{sec:language} will use the classic rule for introducing intersections \Rule{Inter}, while the algorithmic counterpart will use the operator $\Refine{}{}{}$ that  Giuseppe Castagna committed May 29, 2019 367 368 369 370 371 intersects the static type of an expression with all the types deduced for the multiple occurrences of it.  Giuseppe Castagna committed Jul 10, 2019 372 \paragraph{Type preservation} We want our type system to be sound in the sense of~\citet{Wright1994}, that is, that it  Giuseppe Castagna committed May 29, 2019 373 satisfies progress and type preservation. The latter property is  Giuseppe Castagna committed Jul 08, 2019 374 challenging because, as explained just above, our type  Giuseppe Castagna committed May 29, 2019 375 assumptions are not only about variables but also about  Mickael Laurent committed Jun 05, 2019 376 expressions. Two corner cases are particularly difficult. The first is  Kim Nguyễn committed Feb 22, 2021 377 shown by the following example\svvspace{-.9mm}  Giuseppe Castagna committed May 29, 2019 378 \label{bistwo}  Kim Nguyễn committed Feb 22, 2021 379  \ifty{e(42)}{\Bool}{e}{...} \svvspace{-.9mm}  Giuseppe Castagna committed May 29, 2019 380 381 382  If $e$ is an expression of type $\Int\to t$, then, as discussed before, the positive branch will have type $(\Int\to t)\setminus(\Int\to\neg  Giuseppe Castagna committed Jul 04, 2019 383 \Bool)$. If furthermore the negative branch is of the same type (or of a subtype), then this  Giuseppe Castagna committed May 29, 2019 384 will also be the type of the whole expression in \eqref{bistwo}. Now imagine  Giuseppe Castagna committed Jul 04, 2019 385 that the application $e(42)$ reduces to a Boolean value, then the whole  Giuseppe Castagna committed May 29, 2019 386  expression in \eqref{bistwo} reduces to $e$; but this has type  Giuseppe Castagna committed Jul 10, 2019 387  $\Int\to t$ which, in general, is \emph{not} a subtype of $(\Int\to  Giuseppe Castagna committed Feb 25, 2020 388  t)\setminus(\Int\to\neg\Bool)$, and therefore type is not preserved by the reduction. To cope with this problem, the proof  Giuseppe Castagna committed May 05, 2021 389  of type preservation (see \Appendix\ref{app:subject-reduction}) resorts to \emph{type schemes}, a  Giuseppe Castagna committed Jul 10, 2019 390  technique introduced by~\citet{Frisch2008} to type expressions by  Giuseppe Castagna committed Feb 25, 2020 391  sets of types, so that the expression in \eqref{bistwo} will have both the types at issue.  Giuseppe Castagna committed May 29, 2019 392   Giuseppe Castagna committed Jul 04, 2019 393 The second corner case is a modification of the example above  Giuseppe Castagna committed May 29, 2019 394 where the positive branch is $e(42)$, e.g.,  Giuseppe Castagna committed Jul 04, 2019 395 396 397 398 $\ifty{e(42)}{\Bool}{e(42)}{\textsf{true}}$. In this case the type deduced for the whole expression is \Bool, while after reduction we would obtain the expression $e(42)$ which is not of type \Bool{} but of type $t$ (even though it will eventually reduce to a \Bool). This problem will be handled in the  Giuseppe Castagna committed Jul 03, 2019 399 proof of type preservation by considering parallel reductions (e.g, if  Giuseppe Castagna committed Jul 04, 2019 400 401 $e(42)$ reduces in a step to, say, $\textsf{false}$, then $\ifty{e(42)}{\Bool}{e(42)}{\textsf{true}}$ reduces in one step to  Giuseppe Castagna committed May 05, 2021 402 $\ifty{\textsf{false}}{\Bool}{\textsf{false}}{\textsf{true}}$): see \Appendix\ref{app:parallel}.  Giuseppe Castagna committed May 29, 2019 403 404   Giuseppe Castagna committed Jul 08, 2019 405 406 \paragraph{Interdependence of checks} The last class of technical problems arise from the mutual dependence of different type checks. In particular, there are two cases  Giuseppe Castagna committed Jul 10, 2019 407 that pose a problem. The first can be shown by two functions $f$ and $g$ both of type $(\Int\to\Int)\wedge(\Any\to\Bool)$, $x$ of type $\Any$ and the test:  Giuseppe Castagna committed May 29, 2019 408 \label{nest1}  Kim Nguyễn committed Jul 11, 2019 409 \ifty{(f\,x,g\,x)}{\pair\Int\Bool}{\,...\,}{\,...}  Giuseppe Castagna committed May 29, 2019 410   Kim Nguyễn committed Jul 11, 2019 411 If we independently check $f\,x$ against $\Int$ and $g\,x$ against $\Bool$  Giuseppe Castagna committed Jul 08, 2019 412 413 414 we deduce $\Int$ for the first occurrence of $x$ and $\Any$ for the second. Thus we would type the positive branch of \eqref{nest1} under the hypothesis that $x$ is of type $\Int$. But if we use the hypothesis generated by  Kim Nguyễn committed Jul 11, 2019 415 the test of $f\,x$, that is, that $x$ is of type \Int, to check $g\,x$ against \Bool,  Giuseppe Castagna committed Jul 10, 2019 416 then the type deduced for $x$ is $\Empty$---i.e., the branch is never selected. In other words, we want to produce type  Giuseppe Castagna committed May 11, 2021 417 environments for occurrence typing by taking into account all  Victor Lanvin committed Jul 10, 2019 418 the available hypotheses, even when these hypotheses are formulated later  Giuseppe Castagna committed Jul 08, 2019 419 in the flow of control. This will be done in the type systems of  Giuseppe Castagna committed Jul 03, 2019 420 421 422 Section~\ref{sec:language} by the rule \Rule{Path} and will require at algorithmic level to look for a fix-point solution of a function, or an approximation thereof.  Giuseppe Castagna committed May 29, 2019 423   Giuseppe Castagna committed Jul 08, 2019 424 425 Finally, a nested check may help refining the type assumptions on some outer expressions. For instance, when typing  Kim Nguyễn committed Feb 22, 2021 426 the positive branch $e$ of\svvspace{-.9mm}  Giuseppe Castagna committed Jul 04, 2019 427 \label{pair}  Kim Nguyễn committed Feb 22, 2021 428 \ifty{(x,y)}{(\pair{(\Int\vee\Bool)}\Int)}{e}{...}\svvspace{-.9mm}  Giuseppe Castagna committed Jul 04, 2019 429   Giuseppe Castagna committed Jul 03, 2019 430 431 432 we can assume that the expression $(x,y)$ is of type $\pair{(\Int\vee\Bool)}\Int$ and put it in the type environment. But if in $e$ there is a test like  Giuseppe Castagna committed Jul 04, 2019 433 $\ifty{x}{\Int}{{\color{darkred}(x,y)}}{(...)}$ then we do not want use  Victor Lanvin committed Jul 10, 2019 434 the assumption in the type environment to type the expression $(x,y)$  Giuseppe Castagna committed Jul 03, 2019 435 436 occurring in the inner test (in red). Instead we want to give to that occurrence of the expression $(x,y)$ the type $\pair{\Int}\Int$. This will be done by temporary  Victor Lanvin committed Jul 10, 2019 437 438 removing the type assumption about $(x,y)$ from the type environment and by retyping the expression without that assumption (see rule  Giuseppe Castagna committed Jul 03, 2019 439 \Rule{Env\Aa} in Section~\ref{sec:algorules}).  Giuseppe Castagna committed Jan 03, 2019 440 441   Giuseppe Castagna committed Mar 01, 2020 442 443 444 445 446 447 448 \iflongversion \subsubsection*{Outline} \else \subsubsection*{Outline and Contributions} \fi In Section~\ref{sec:language} we formalize the ideas we just presented: we define the types and  Giuseppe Castagna committed Jul 08, 2019 449 450 451 expressions of our system, their dynamic semantics and a type system that implements occurrence typing together with the algorithms that decide whether an expression is well typed or not. Section~\ref{sec:extensions} extends our formalism to record  Giuseppe Castagna committed Jul 08, 2019 452 453 454 types and presents two applications of our analysis: the inference of arrow types for functions and a static analysis to reduce the number of casts inserted by a compiler of a gradually-typed  Giuseppe Castagna committed Jul 09, 2019 455 language. Practical aspects are discussed in  Giuseppe Castagna committed Mar 02, 2020 456 457 458 459 460 Section~\ref{sec:practical} where we give several paradigmatic examples of code typed by our prototype implementation, that can be interactively tested at \ifsubmission the (anonymized) site \fi \url{https://occtyping.github.io/}. Section~\ref{sec:related} presents  Giuseppe Castagna committed Mar 01, 2020 461 462 463 related work. \iflongversion A discussion of future work concludes this  Giuseppe Castagna committed Jul 08, 2019 464 presentation.  Giuseppe Castagna committed Mar 01, 2020 465 \fi  Giuseppe Castagna committed Jul 10, 2019 466 %  Giuseppe Castagna committed May 05, 2021 467 468 To ease the presentation all the proofs are omitted from the main text and can be found in the appendix  Giuseppe Castagna committed Jul 10, 2019 469 \ifsubmission  Giuseppe Castagna committed Mar 02, 2020 470 joint to the submission as supplemental material.  Giuseppe Castagna committed Jul 10, 2019 471 \else  Giuseppe Castagna committed Jul 08, 2019 472 available online.  Giuseppe Castagna committed Jul 10, 2019 473 \fi  Giuseppe Castagna committed Feb 25, 2020 474   Giuseppe Castagna committed Mar 01, 2020 475 476 477 \iflongversion \subsubsection*{Contributions} \fi  Giuseppe Castagna committed May 11, 2021 478 \noindent The main contributions of our work can be summarized as follows:  Giuseppe Castagna committed Feb 27, 2020 479 \begin{itemize}[left=0pt .. \parindent,nosep]  Giuseppe Castagna committed May 11, 2021 480  \item We provide a theoretical framework to refine the type of  Giuseppe Castagna committed Feb 25, 2020 481  expressions occurring in type tests, thus removing the limitations  Giuseppe Castagna committed May 11, 2021 482  of current occurrence typing approaches which require both the tests and the refinement  Giuseppe Castagna committed Feb 25, 2020 483 484  to take place on variables.  Giuseppe Castagna committed Feb 27, 2020 485  \item We define a type-theoretic approach alternative to the  Giuseppe Castagna committed Feb 25, 2020 486 487 488 489 490 491  current flow-based approaches. As such it provides different results and it can be thus profitably combined with flow-based techniques. \item We use our analysis for defining a formal framework that reconstructs intersection types for unannotated or partially-annotated functions, something that, in our ken, no  Giuseppe Castagna committed May 11, 2021 492  other current system can do.  Giuseppe Castagna committed Feb 25, 2020 493 494 495  \item We prove the soundness of our system. We define algorithms to infer the types that we prove to be sound and show different completeness results  Giuseppe Castagna committed May 11, 2021 496 497 498 499  which in practice yield the completeness of any reasonable implementation. \item We show how to extend our approach to records with field addition, update, and deletion operations.  Giuseppe Castagna committed Feb 25, 2020 500   Giuseppe Castagna committed May 11, 2021 501 502  \item We show how occurrence typing can be extended to and combined with gradual typing and apply our results to optimize the compilation of the  Giuseppe Castagna committed Feb 25, 2020 503 504  latter.  Giuseppe Castagna committed Feb 27, 2020 505 506 % \item We show how it is possible to extend our framework to % exploit the richer information provided by polymorphic types.  Giuseppe Castagna committed Feb 25, 2020 507 \end{itemize}  Giuseppe Castagna committed Jul 03, 2020 508  We end this introduction by stressing the practical  Giuseppe Castagna committed Jul 03, 2020 509  implications of our work: a perfunctory inspection may give the  Giuseppe Castagna committed May 11, 2021 510  wrong impression that the only interest of the heavy formalization  Giuseppe Castagna committed Jul 03, 2020 511  that follows is to have generic expressions, rather than just variables, in type  Giuseppe Castagna committed Jul 03, 2020 512 513 514 515 516  cases: this would be a bad trade-off. The important point is, instead, that our formalization is what makes analyses such as those presented in Section~\ref{sec:extensions} possible (e.g., the reconstruction of the type~\eqref{eq:inter} for the unannotated pure JavaScript code of \code{foo}), which is where the actual added  Giuseppe Castagna committed Jul 03, 2020 517  practical value and potential of our work resides.  Giuseppe Castagna committed Feb 25, 2020 518