\newlength{\sk} \setlength{\sk}{-1.9pt} \iflongversion In this section we formalize the ideas we outlined in the introduction. We start by the definition of types followed by the language and its reduction semantics. The static semantics is the core of our work: we first present a declarative type system that deduces (possibly many) types for well-typed expressions and then the algorithms to decide whether an expression is well typed or not. \fi \subsection{Types} \begin{definition}[Types]\label{def:types} %\iflongversion%%%%%%% The set of types \types{} is formed by the terms $t$ coinductively produced by the grammar:\svvspace{-1.45mm} $\begin{array}{lrcl} \textbf{Types} & t & ::= & b\alt t\to t\alt t\times t\alt t\vee t \alt \neg t \alt \Empty \end{array}$ and that satisfy the following conditions \begin{itemize}[nosep] \item (regularity) every term has a finite number of different sub-terms; \item (contractivity) every infinite branch of a term contains an infinite number of occurrences of the arrow or product type constructors.\svvspace{-1mm} \end{itemize} \iffalse%%%%%%%%%%%%%%%%%%%%%%%%%% A type $t\in\types{}$ is a term coinductively produced by the grammar:\svvspace{-1.45mm} $\begin{array}{lrcl} \textbf{Types} & t & ::= & b\alt t\to t\alt t\times t\alt t\vee t \alt \neg t \alt \Empty \end{array}$ that satisfies the following conditions: $(1)$\emph{Regularity}: the term has a finite number of different sub-terms; $(2)$ \emph{Contractivity}: every infinite branch of the term contains an infinite number of occurrences of the arrow or product type constructors.\svvspace{-1mm} \fi%%%%%%%%%%%%%%%%%%% \end{definition} We use the following abbreviations: $t_1 \land t_2 \eqdef \neg (\neg t_1 \vee \neg t_2)$, $t_ 1 \setminus t_2 \eqdef t_1 \wedge \neg t_2$, $\Any \eqdef \neg \Empty$. $b$ ranges over basic types (e.g., \Int, \Bool), $\Empty$ and $\Any$ respectively denote the empty (that types no value) and top (that types all values) types. Coinduction accounts for recursive types and the condition on infinite branches bars out ill-formed types such as $t = t \lor t$ (which does not carry any information about the set denoted by the type) or $t = \neg t$ (which cannot represent any set). \iflongversion%%%%%%%%%%%%%%%%%% It also ensures that the binary relation $\vartriangleright \,\subseteq\!\types{\times}\types$ defined by $t_1 \lor t_2 \vartriangleright t_i$, $t_1 \land t_2 \vartriangleright t_i$, $\neg t \vartriangleright t$ is Noetherian. This gives an induction principle on $\types$ that we will use without any further explicit reference to the relation.\footnote{In a nutshell, we can do proofs by induction on the structure of unions and negations---and, thus, intersections---but arrows, products, and basic types are the base cases for the induction.} \fi%%%%%%%%%%%%%%%%%%%%%% We refer to $b$, $\times$, and $\to$ as \emph{type constructors} and to $\lor$, $\land$, $\lnot$, and $\setminus$ as \emph{type connectives}. The subtyping relation for these types, noted $\leq$, is the one defined by~\citet{Frisch2008} to which the reader may refer for the formal definition (we recall it in Appendix~\ref{sec:subtyping} for the reader's convenience). A detailed description of the algorithm to decide this relation can be found in~\cite{Cas15}. For this presentation it suffices to consider that types are interpreted as sets of \emph{values} ({i.e., either constants, $\lambda$-abstractions, or pairs of values: see Section~\ref{sec:syntax} right below) that have that type, and that subtyping is set containment (i.e., a type $s$ is a subtype of a type $t$ if and only if $t$ contains all the values of type $s$). In particular, $s\to t$ contains all $\lambda$-abstractions that when applied to a value of type $s$, if their computation terminates, then they return a result of type $t$ (e.g., $\Empty\to\Any$ is the set of all functions\footnote{\label{allfunctions}Actually, for every type $t$, all types of the form $\Empty{\to}t$ are equivalent and each of them denotes the set of all functions.} and $\Any\to\Empty$ is the set of functions that diverge on every argument). Type connectives (i.e., union, intersection, negation) are interpreted as the corresponding set-theoretic operators (e.g.,~$s\vee t$ is the union of the values of the two types). We use $\simeq$ to denote the symmetric closure of $\leq$: thus $s\simeq t$ (read, $s$ is equivalent to $t$) means that $s$ and $t$ denote the same set of values and, as such, they are semantically the same type. \subsection{Syntax}\label{sec:syntax} The expressions $e$ and values $v$ of our language are inductively generated by the following grammars:\svvspace{-1mm} $$\label{expressions} \begin{array}{lrclr} \textbf{Expr} &e &::=& c\alt x\alt ee\alt\lambda^{\wedge_{i\in I}s_i\to t_i} x.e\alt \pi_j e\alt(e,e)\alt\tcase{e}{t}{e}{e}\$.3mm] \textbf{Values} &v &::=& c\alt\lambda^{\wedge_{i\in I}s_i\to t_i} x.e\alt (v,v)\\[-1mm] \end{array}$$ for j=1,2. In~\eqref{expressions}, c ranges over constants (e.g., \texttt{true}, \texttt{false}, \texttt{1}, \texttt{2}, ...) which are values of basic types (we use \basic{c} to denote the basic type of the constant c); x ranges over variables; (e,e) denotes pairs and \pi_i e their projections; \tcase{e}{t}{e_1}{e_2} denotes the type-case expression that evaluates either e_1 or e_2 according to whether the value returned by e (if any) is of type t or not; \lambda^{\wedge_{i\in I}s_i\to t_i} x.e is a value of type \wedge_{i\in I}s_i\to t_i and denotes the function of parameter x and body e. An expression has an intersection type if and only if it has all the types that compose the intersection. Therefore, intuitively, \lambda^{\wedge_{i\in I}s_i\to t_i} x.e is a well-typed value if for all i{\in} I the hypothesis that x is of type s_i implies that the body e has type t_i, that is to say, it is well typed if \lambda^{\wedge_{i\in I}s_i\to t_i} x.e has type s_i\to t_i for all i\in I. Every value is associated to a most specific type (mst): the mst of c is \basic c; the mst of \lambda^{\wedge_{i\in I}s_i\to t_i} x.e is \wedge_{i\in I}s_i\to t_i; and, inductively, the mst of a pair of values is the product of the mst's of the values. We write v\in t if the most specific type of v is a subtype of t (see Appendix~\ref{app:typeschemes} for the formal definition of v\in t which deals with some corner cases for negated arrow types). \subsection{Dynamic semantics}\label{sec:opsem} The dynamic semantics is defined as a classic left-to-right call-by-value reduction for a \lambda-calculus with pairs, enriched with specific rules for type-cases. We have the following notions of reduction:\svvspace{-1.2mm} \[ \begin{array}{rcll} (\lambda^{\wedge_{i\in I}s_i\to t_i} x.e)\,v &\reduces& e\subst x v\\[-.4mm] \pi_i(v_1,v_2) &\reduces& v_i & i=1,2\\[-.4mm] \tcase{v}{t}{e_1}{e_2} &\reduces& e_1 &v\in t\\[-.4mm] \tcase{v}{t}{e_1}{e_2} &\reduces& e_2 &v\not\in t\\[-1.3mm] \end{array}$ Contextual reductions are defined by the following evaluation contexts:\$1mm] \centerline{$$%\[ \Cx[] ::= [\,]\alt \Cx e\alt v\Cx \alt (\Cx,e)\alt (v,\Cx)\alt \pi_i\Cx\alt \tcase{\Cx}tee %$$$}\$1mm] As usual we denote by \Cx[e] the term obtained by replacing e for the hole in the context \Cx and we have that e\reduces e' implies \Cx[e]\reduces\Cx[e']. \subsection{Static semantics}\label{sec:static} While the syntax and reduction semantics are, on the whole, pretty standard, for what concerns the type system we will have to introduce several unconventional features that we anticipated in Section~\ref{sec:challenges} and are at the core of our work. Let us start with the standard part, that is the typing of the functional core and the use of subtyping, given by the following typing rules:\svvspace{-1mm} \begin{mathpar} \Infer[Const] { } {\Gamma\vdash c:\basic{c}} { } \quad \Infer[App] { \Gamma \vdash e_1: \arrow {t_1}{t_2}\quad \Gamma \vdash e_2: t_1 } { \Gamma \vdash {e_1}{e_2}: t_2 } { } \quad \Infer[Abs+] {{\scriptstyle\forall i\in I}\quad\Gamma,x:s_i\vdash e:t_i} { \Gamma\vdash\lambda^{\wedge_{i\in I}\arrow {s_i} {t_i}}x.e:\textstyle \bigwedge_{i\in I}\arrow {s_i} {t_i} } { } % \Infer[If] % {\Gamma\vdash e:t_0\\ % %t_0\not\leq \neg t \Rightarrow % \Gamma \cvdash + e t e_1:t'\\ % %t_0\not\leq t \Rightarrow % \Gamma \cvdash - e t e_2:t'} % {\Gamma\vdash \ite {e} t {e_1}{e_2}: t'} % { } \end{mathpar} \begin{mathpar} \Infer[Sel] {\Gamma \vdash e:\pair{t_1}{t_2}} {\Gamma \vdash \pi_i e:t_i} { } \qquad \Infer[Pair] {\Gamma \vdash e_1:t_1 \and \Gamma \vdash e_2:t_2} {\Gamma \vdash (e_1,e_2):\pair {t_1} {t_2}} { } \qquad \Infer[Subs] { \Gamma \vdash e:t\\t\leq t' } { \Gamma \vdash e: t' } { } \qquad\svvspace{-3mm} \end{mathpar} These rules are quite standard and do not need any particular explanation besides those already given in Section~\ref{sec:syntax}. Just notice subtyping is embedded in the system by the classic \Rule{Subs} subsumption rule. Next we focus on the unconventional aspects of our system, from the simplest to the hardest. The first unconventional aspect is that, as explained in Section~\ref{sec:challenges}, our type assumptions are about expressions. Therefore, in our rules the type environments, ranged over by \Gamma, map \emph{expressions}---rather than just variables---into types. This explains why the classic typing rule for variables is replaced by a more general \Rule{Env} rule defined below:\svvspace{-1mm} \begin{mathpar} \Infer[Env] { } { \Gamma \vdash e: \Gamma(e) } { e\in\dom\Gamma } \qquad \Infer[Inter] { \Gamma \vdash e:t_1\\\Gamma \vdash e:t_2 } { \Gamma \vdash e: t_1 \wedge t_2 } { }\svvspace{-3mm} \end{mathpar} The \Rule{Env} rule is coupled with the standard intersection introduction rule \Rule{Inter} which allows us to deduce for a complex expression the intersection of the types recorded by the occurrence typing analysis in the environment \Gamma with the static type deduced for the same expression by using the other typing rules. This same intersection rule is also used to infer the second unconventional aspect of our system, that is, the fact that \lambda-abstractions can have negated arrow types, as long as these negated types do not make the type deduced for the function empty:\svvspace{-.5mm} \begin{mathpar} \Infer[Abs-] {\Gamma \vdash \lambda^{\wedge_{i\in I}\arrow {s_i} {t_i}}x.e:t} { \Gamma \vdash\lambda^{\wedge_{i\in I}\arrow {s_i} {t_i}}x.e:\neg(t_1\to t_2) } { ((\wedge_{i\in I}\arrow {s_i} {t_i})\wedge\neg(t_1\to t_2))\not\simeq\Empty }\svvspace{-1.2mm} \end{mathpar} %\beppe{I have doubt: is this safe or should we play it safer and % deduce t\wedge\neg(t_1\to t_2)? In other terms is is possible to % deduce two separate negation of arrow types that when intersected % with the interface are non empty, but by intersecting everything % makes the type empty? It should be safe since otherwise intersection % would not be admissible in semantic subtyping (see Theorem 6.15 in % JACM), but I think we should doube ckeck it.} As explained in Section~\ref{sec:challenges}, we need to be able to deduce for, say, the function \lambda^{\Int\to\Int} x.x a type such as (\Int\to\Int)\wedge\neg(\Bool\to\Bool) (in particular, if this is the term e in equation \eqref{bistwo} we need to deduce for it the type (\Int\to t)\wedge\neg(\Int\to\neg\Bool), that is, (\Int\to t)\setminus(\Int\to\neg\Bool) ). But the sole rule \Rule{Abs+} above does not allow us to deduce negations of arrows for abstractions: the rule \Rule{Abs-} makes this possible. As an aside, note that this kind of deduction is already present in the system by~\citet{Frisch2008} though in that system this presence was motivated by the semantics of types rather than, as in our case, by the soundness of the type system. Rules \Rule{Abs+} and \Rule{Abs-} are not enough to deduce for \lambda-abstractions all the types we wish. In particular, these rules alone are not enough to type general overloaded functions. For instance, consider this simple example of a function that applied to an integer returns its successor and applied to anything else returns \textsf{true}:\\[1mm] \centerline{$$%\[ \lambda^{(\Int\to\Int)\wedge(\neg\Int\to\Bool)} x\,.\,\tcase{x}{\Int}{x+1}{\textsf{true}} %$$$}\$.6mm] Clearly, the expression above is well typed, but the rule \Rule{Abs+} alone is not enough to type it. In particular, according to \Rule{Abs+} we have to prove that under the hypothesis that x is of type \Int the expression (\tcase{x}{\Int}{x+1}{\textsf{true}}) is of type \Int, too. That is, that under the hypothesis that x has type \Int\wedge\Int (we apply occurrence typing) the expression x+1 is of type \Int{} (which holds) and that under the hypothesis that x has type \Int\setminus\Int, that is \Empty (we apply once more occurrence typing), \textsf{true} is of type \Int{} (which \emph{does not} hold). The problem is that we are trying to type the second case of a type-case even if we know that there is no chance that, when x is bound to an integer, that case will be ever selected. The fact that it is never selected is witnessed by the presence of a type hypothesis with \Empty type. To avoid this problem (and type the term above) we add the rule \Rule{Efq} (\emph{ex falso quodlibet}) that allows the system to deduce any type for an expression that will never be selected, that is, for an expression whose type environment contains an empty assumption: \begin{mathpar} \Infer[Efq] { } { \Gamma, (e:\Empty) \vdash e': t } { }\svvspace{-3mm} \end{mathpar} Once more, this kind of deduction was already present in the system by~\citet{Frisch2008} to type full fledged overloaded functions, though it was embedded in the typing rule for the type-case. Here we need the rule \Rule{Efq}, which is more general, to ensure the property of subject reduction. %\beppe{Example?} Finally, there remains one last rule in our type system, the one that implements occurrence typing, that is, the rule for the type-case:\svvspace{-1mm} \begin{mathpar} \Infer[Case] {\Gamma\vdash e:t_0\\ %t_0\not\leq \neg t \Rightarrow \Gamma \evdash e t \Gamma_1 \\ \Gamma_1 \vdash e_1:t'\\ %t_0\not\leq t \Rightarrow \Gamma \evdash e {\neg t} \Gamma_2 \\ \Gamma_2 \vdash e_2:t'} {\Gamma\vdash \tcase {e} t {e_1}{e_2}: t'} { }\svvspace{-3mm} \end{mathpar} The rule \Rule{Case} checks whether the expression e, whose type is being tested, is well-typed and then performs the occurrence typing analysis that produces the environments \Gamma_i's under whose hypothesis the expressions e_i's are typed. The production of these environments is represented by the judgments \Gamma \evdash e {(\neg)t} \Gamma_i. The intuition is that when \Gamma \evdash e t \Gamma_1 is provable then \Gamma_1 is a version of \Gamma extended with type hypotheses for all expressions occurring in e, type hypotheses that can be deduced assuming that the test e\in t succeeds. Likewise, \Gamma \evdash e {\neg t} \Gamma_2 (notice the negation on t) extends \Gamma with the hypothesis deduced assuming that e\in\neg t, that is, for when the test e\in t fails. All it remains to do is to show how to deduce judgments of the form \Gamma \evdash e t \Gamma'. For that we first define how to denote occurrences of an expression. These are identified by paths in the syntax tree of the expressions, that is, by possibly empty strings of characters denoting directions starting from the root of the tree (we use \epsilon for the empty string/path, which corresponds to the root of the tree). Let e be an expression and \varpi\in\{0,1,l,r,f,s\}^* a \emph{path}; we denote \occ e\varpi the occurrence of e reached by the path \varpi, that is (for i=0,1, and undefined otherwise)\svvspace{-.4mm} %% \[ %% \begin{array}{l} %% \begin{array}{r@{\downarrow}l@{\quad=\quad}l} %% e&\epsilon & e\\ %% e_0e_1& i.\varpi & \occ{e_i}\varpi\qquad i=0,1\\ %% (e_0,e_1)& l.\varpi & \occ{e_0}\varpi\\ %% (e_0,e_1)& r.\varpi & \occ{e_1}\varpi\\ %% \pi_1 e& f.\varpi & \occ{e}\varpi\\ %% \pi_2 e& s.\varpi & \occ{e}\varpi\\ %% \end{array}\\ %% \text{undefined otherwise} %% \end{array} %%$ $\begin{array}{r@{\downarrow}l@{\quad=\quad}lr@{\downarrow}l@{\quad=\quad}lr@{\downarrow}l@{\quad=\quad}l} e&\epsilon & e & (e_1,e_2)& l.\varpi & \occ{e_1}\varpi &\pi_1 e& f.\varpi & \occ{e}\varpi\\ e_0\,e_1& i.\varpi & \occ{e_i}\varpi \quad\qquad& (e_1,e_2)& r.\varpi & \occ{e_2}\varpi \quad\qquad& \pi_2 e& s.\varpi & \occ{e}\varpi\\[-.4mm] \end{array}$ To ease our analysis we used different directions for each kind of term. So we have $0$ and $1$ for the function and argument of an application, $l$ and $r$ for the $l$eft and $r$ight expressions forming a pair, and $f$ and $s$ for the argument of a $f$irst or of a $s$econd projection. Note also that we do not consider occurrences under $\lambda$'s (since their type is frozen in their annotations) and type-cases (since they reset the analysis). % The judgments $\Gamma \evdash e t \Gamma'$ are then deduced by the following two rules:\svvspace{-1mm} \begin{mathpar} % \Infer[Base] % { \Gamma \vdash e':t' } % { \Gamma \cvdash p e t e':t' } % { } % \qquad % \Infer[Path] % { \pvdash \Gamma p e t \varpi:t_1 \\ \Gamma,(\occ e \varpi:t_1) \cvdash p e t e':t_2 } % { \Gamma \cvdash p e t e':t_2 } % { } \Infer[Base] { } { \Gamma \evdash e t \Gamma } { } \qquad \Infer[Path] { \pvdash {\Gamma'} e t \varpi:t' \\ \Gamma \evdash e t \Gamma' } { \Gamma \evdash e t \Gamma',(\occ e \varpi:t') } { }\svvspace{-1.5mm} \end{mathpar} These rules describe how to produce by occurrence typing the type environments while checking that an expression $e$ has type $t$. They state that $(i)$ we can deduce from $\Gamma$ all the hypothesis already in $\Gamma$ (rule \Rule{Base}) and that $(ii)$ if we can deduce a given type $t'$ for a particular occurrence $\varpi$ of the expression $e$ being checked, then we can add this hypothesis to the produced type environment (rule \Rule{Path}). The rule \Rule{Path} uses a (last) auxiliary judgement $\pvdash {\Gamma} e t \varpi:t'$ to deduce the type $t'$ of the occurrence $\occ e \varpi$ when checking $e$ against $t$ under the hypotheses $\Gamma$. This rule \Rule{Path} is subtler than it may appear at first sight, insofar as the deduction of the type for $\varpi$ may already use some hypothesis on $\occ e \varpi$ (in $\Gamma'$) and, from an algorithmic viewpoint, this will imply the computation of a fix-point (see Section~\ref{sec:typenv}). The last ingredient for our type system is the deduction of the judgements of the form $\pvdash {\Gamma} e t \varpi:t'$ where $\varpi$ is a path to an expression occurring in $e$. This is given by the following set of rules. \begin{mathpar} \Infer[PSubs] { \pvdash \Gamma e t \varpi:t_1 \\ t_1\leq t_2 } { \pvdash \Gamma e t \varpi:t_2 } { } \quad \Infer[PInter] { \pvdash \Gamma e t \varpi:t_1 \\ \pvdash \Gamma e t \varpi:t_2 } { \pvdash \Gamma e t \varpi:t_1\land t_2 } { } \quad \Infer[PTypeof] { \Gamma \vdash \occ e \varpi:t' } { \pvdash \Gamma e t \varpi:t' } { } \svvspace{-1.2mm}\\ \Infer[PEps] { } { \pvdash \Gamma e t \epsilon:t } { } \qquad \Infer[PAppR] { \pvdash \Gamma e t \varpi.0:\arrow{t_1}{t_2} \\ \pvdash \Gamma e t \varpi:t_2'} { \pvdash \Gamma e t \varpi.1:\neg t_1 } { t_2\land t_2' \simeq \Empty } \end{mathpar}\begin{mathpar}\svvspace{-2mm} \Infer[PAppL] { \pvdash \Gamma e t \varpi.1:t_1 \\ \pvdash \Gamma e t \varpi:t_2 } { \pvdash \Gamma e t \varpi.0:\neg (\arrow {t_1} {\neg t_2}) } { } \qquad \Infer[PPairL] { \pvdash \Gamma e t \varpi:\pair{t_1}{t_2} } { \pvdash \Gamma e t \varpi.l:t_1 } { } \svvspace{-1.2mm}\\ \Infer[PPairR] { \pvdash \Gamma e t \varpi:\pair{t_1}{t_2} } { \pvdash \Gamma e t \varpi.r:t_2 } { } \qquad \Infer[PFst] { \pvdash \Gamma e t \varpi:t' } { \pvdash \Gamma e t \varpi.f:\pair {t'} \Any } { } \qquad \Infer[PSnd] { \pvdash \Gamma e t \varpi:t' } { \pvdash \Gamma e t \varpi.s:\pair \Any {t'} } { }\svvspace{-0.9mm} \end{mathpar} These rules implement the analysis described in Section~\ref{sec:ideas} for functions and extend it to products. Let us comment each rule in detail. \Rule{PSubs} is just subsumption for the deduction $\vdashp$. The rule \Rule{PInter} combined with \Rule{PTypeof} allows the system to deduce for an occurrence $\varpi$ the intersection of the static type of $\occ e \varpi$ (deduced by \Rule{PTypeof}) with the type deduced for $\varpi$ by the other $\vdashp$ rules. The rule \Rule{PEps} is the starting point of the analysis: if we are assuming that the test $e\in t$ succeeds, then we can assume that $e$ (i.e., $\occ e\epsilon$) has type $t$ (recall that assuming that the test $e\in t$ fails corresponds to having $\neg t$ at the index of the turnstyle). The rule \Rule{PAppR} implements occurrence typing for the arguments of applications, since it states that if a function maps arguments of type $t_1$ in results of type $t_2$ and an application of this function yields results (in $t'_2$) that cannot be in $t_2$ (since $t_2\land t_2' \simeq \Empty$), then the argument of this application cannot be of type $t_1$. \Rule{PAppL} performs the occurrence typing analysis for the function part of an application, since it states that if an application has type $t_2$ and the argument of this application has type $t_1$, then the function in this application cannot have type $t_1\to\neg t_2$. Rules \Rule{PPair\_} are straightforward since they state that the $i$-th projection of a pair that is of type $\pair{t_1}{t_2}$ must be of type $t_i$. So are the last two rules that essentially state that if $\pi_1 e$ (respectively, $\pi_2 e$) is of type $t'$, then the type of $e$ must be of the form $\pair{t'}\Any$ (respectively, $\pair\Any{t'}$). This concludes the presentation of all the rules of our type system (they are summarized for the reader's convenience in Appendix~\ref{sec:declarative}), which satisfies the property of safety, deduced, as customary, from the properties of progress and subject reduction (\emph{cf.} Appendix~\ref{app:soundness}).\svvspace{-.5mm} \begin{theorem}[type safety] For every expression $e$ such that $\varnothing\vdash e:t$ either $e$ diverges or there exists a value $v$ of type $t$ such that $e\reduces^* v$. \end{theorem} \svvspace{-2.1mm}