\newlength{\sk} \setlength{\sk}{-1.9pt} In this section we formalize the ideas we outlined in the introduction. We start with the definition of types, followed by the language and its reduction semantics. The static semantics is the core of our work: we first present a declarative type system that deduces (possibly many) types for well-typed expressions and then the algorithms to decide whether an expression is well typed or not. \subsection{Types} \begin{definition}[Types]\label{def:types} The set of types \types{} is formed by the terms $t$ coinductively produced by the grammar:\vspace{-1.45mm} $\begin{array}{lrcl} \textbf{Types} & t & ::= & b\alt t\to t\alt t\times t\alt t\vee t \alt \neg t \alt \Empty \end{array}$ and that satisfy the following conditions\vspace{-.85mm} \begin{itemize} \item (regularity) every term has a finite number of different sub-terms; \item (contractivity) every infinite branch of a term contains an infinite number of occurrences of the arrow or product type constructors.\vspace{-1mm} \end{itemize} \end{definition} We use the following abbreviations: $t_1 \land t_2 \eqdef \neg (\neg t_1 \vee \neg t_2)$, $t_1 \setminus t_2 \eqdef t_1 \wedge \neg t_2$, $\Any \eqdef \neg \Empty$. $b$ ranges over basic types (e.g., \Int, \Bool); $\Empty$ and $\Any$ respectively denote the empty type (which types no value) and the top type (which types all values). Coinduction accounts for recursive types, and the condition on infinite branches rules out ill-formed types such as $t = t \lor t$ (which does not carry any information about the set denoted by the type) or $t = \neg t$ (which cannot represent any set). It also ensures that the binary relation $\vartriangleright \,\subseteq\!\types^{2}$ defined by $t_1 \lor t_2 \vartriangleright t_i$, $t_1 \land t_2 \vartriangleright t_i$, $\neg t \vartriangleright t$ is Noetherian.
This gives an induction principle on $\types$ that we will use without any further explicit reference to the relation.\footnote{In a nutshell, we can do proofs by induction on the structure of unions and negations---and, thus, intersections---but arrows, products, and basic types are the base cases for the induction.} We refer to $b$, $\times$, and $\to$ as \emph{type constructors} and to $\lor$, $\land$, $\lnot$, and $\setminus$ as \emph{type connectives}. The subtyping relation for these types, noted $\leq$, is the one defined by~\citet{Frisch2008}, to which the reader may refer. A detailed description of the algorithm to decide it can be found in~\cite{Cas15}. For this presentation it suffices to consider that types are interpreted as sets of \emph{values} (i.e., either constants, $\lambda$-abstractions, or pairs of values: see Section~\ref{sec:syntax} right below) that have that type, and that subtyping is set containment (i.e., a type $s$ is a subtype of a type $t$ if and only if $t$ contains all the values of type $s$). In particular, $s\to t$ contains all $\lambda$-abstractions that, when applied to a value of type $s$, return a result of type $t$ whenever their computation terminates (e.g., $\Empty\to\Any$ is the set of all functions\footnote{\label{allfunctions}Actually, for every type $t$, all types of the form $\Empty{\to}t$ are equivalent and each of them denotes the set of all functions.} and $\Any\to\Empty$ is the set of functions that diverge on every argument). Type connectives (i.e., union, intersection, negation) are interpreted as the corresponding set-theoretic operators (e.g.,~$s\vee t$ is the union of the values of the two types). We use $\simeq$ to denote the symmetric closure of $\leq$: thus $s\simeq t$ (read, $s$ is equivalent to $t$) means that $s$ and $t$ denote the same set of values and, as such, they are semantically the same type.
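For instance, the set-theoretic reading of arrows and intersections validates
\[(\Int\to\Int)\wedge(\Bool\to\Bool)\;\leq\;(\Int\vee\Bool)\to(\Int\vee\Bool)\]
since a function that maps integers to integers and Booleans to Booleans maps, in particular, every value of type $\Int\vee\Bool$ to a value of type $\Int\vee\Bool$; the converse inequality fails, since a function of the right-hand type may map an integer to a Boolean.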
\subsection{Syntax}\label{sec:syntax} The expressions $e$ and values $v$ of our language are inductively generated by the following grammars:\vspace{-1mm} $$\label{expressions} \begin{array}{lrclr} \textbf{Expr} &e &::=& c\alt x\alt ee\alt\lambda^{\wedge_{i\in I}s_i\to t_i} x.e\alt \pi_j e\alt(e,e)\alt\tcase{e}{t}{e}{e}\\[.3mm] \textbf{Values} &v &::=& c\alt\lambda^{\wedge_{i\in I}s_i\to t_i} x.e\alt (v,v)\\[-1mm] \end{array}$$ for $j=1,2$. In~\eqref{expressions}, $c$ ranges over constants (e.g., \texttt{true}, \texttt{false}, \texttt{1}, \texttt{2}, ...) which are values of basic types (we use $\basic{c}$ to denote the basic type of the constant $c$); $x$ ranges over variables; $(e,e)$ denotes pairs and $\pi_j e$ their projections; $\tcase{e}{t}{e_1}{e_2}$ denotes the type-case expression that evaluates either $e_1$ or $e_2$ according to whether the value returned by $e$ (if any) is of type $t$ or not; $\lambda^{\wedge_{i\in I}s_i\to t_i} x.e$ is a value of type $\wedge_{i\in I}s_i\to t_i$ and denotes the function of parameter $x$ and body $e$. An expression has an intersection type if and only if it has all the types that compose the intersection. Therefore, intuitively, $\lambda^{\wedge_{i\in I}s_i\to t_i} x.e$ is a well-typed value if for all $i{\in} I$ the hypothesis that $x$ is of type $s_i$ implies that the body $e$ has type $t_i$, that is to say, it is well typed if $\lambda^{\wedge_{i\in I}s_i\to t_i} x.e$ has type $s_i\to t_i$ for all $i\in I$. Every value is associated with a type: the type of $c$ is $\basic c$; the type of $\lambda^{\wedge_{i\in I}s_i\to t_i} x.e$ is $\wedge_{i\in I}s_i\to t_i$; and, inductively, the type of a pair of values is the product of the types of the values. \subsection{Dynamic semantics}\label{sec:opsem} The dynamic semantics is defined as a classic left-to-right call-by-value reduction for a $\lambda$-calculus with pairs, enriched with specific rules for type-cases.
We have the following notions of reduction:\vspace{-1.2mm} \[ \begin{array}{rcll} (\lambda^{\wedge_{i\in I}s_i\to t_i} x.e)v &\reduces& e\subst x v\\[-.4mm] \pi_i(v_1,v_2) &\reduces& v_i & i=1,2\\[-.4mm] \tcase{v}{t}{e_1}{e_2} &\reduces& e_1 &v\in t\\[-.4mm] \tcase{v}{t}{e_1}{e_2} &\reduces& e_2 &v\not\in t\\[-1.3mm] \end{array} \] The semantics of type-cases uses the relation $v\in t$ that we informally defined in the previous section. We delay its formal definition to Section~\ref{sec:type-schemes} (where it deals with some corner cases for negated arrow types). Contextual reductions are defined by the following evaluation contexts:\\[1mm] \centerline{$\Cx[] ::= [\,]\alt \Cx e\alt v\Cx \alt (\Cx,e)\alt (v,\Cx)\alt \pi_i\Cx\alt \tcase{\Cx}tee$}\\[1mm] As usual, we denote by $\Cx[e]$ the term obtained by replacing $e$ for the hole in the context $\Cx$, and we have that $e\reduces e'$ implies $\Cx[e]\reduces\Cx[e']$: for instance, $\pi_1((\lambda^{\Int\to\Int}x.x)\,3,\textsf{true})\reduces\pi_1(3,\textsf{true})\reduces 3$. \subsection{Static semantics}\label{sec:static} While the syntax and reduction semantics are, on the whole, pretty standard, for the type system we will have to introduce several unconventional features that we anticipated in Section~\ref{sec:challenges} and that are at the core of our work.
Let us start with the standard part, that is, the typing of the functional core and the use of subtyping, given by the following typing rules:\vspace{-1mm} \begin{mathpar} \Infer[Const] { } {\Gamma\vdash c:\basic{c}} { } \quad \Infer[App] { \Gamma \vdash e_1: \arrow {t_1}{t_2}\quad \Gamma \vdash e_2: t_1 } { \Gamma \vdash {e_1}{e_2}: t_2 } { } \quad \Infer[Abs+] {{\scriptstyle\forall i\in I}\quad\Gamma,x:s_i\vdash e:t_i} { \Gamma\vdash\lambda^{\wedge_{i\in I}\arrow {s_i} {t_i}}x.e:\textstyle \bigwedge_{i\in I}\arrow {s_i} {t_i} } { } \vspace{-2mm} \\ \Infer[Sel] {\Gamma \vdash e:\pair{t_1}{t_2}} {\Gamma \vdash \pi_i e:t_i} { } \qquad \Infer[Pair] {\Gamma \vdash e_1:t_1 \and \Gamma \vdash e_2:t_2} {\Gamma \vdash (e_1,e_2):\pair {t_1} {t_2}} { } \qquad \Infer[Subs] { \Gamma \vdash e:t\\t\leq t' } { \Gamma \vdash e: t' } { } \qquad\vspace{-3mm} \end{mathpar} These rules are quite standard and do not need any particular explanation besides those already given in Section~\ref{sec:syntax}. Just notice that we used a classic subsumption rule (i.e., \Rule{Subs}) to embed subtyping in the type system. Let us next focus on the unconventional aspects of our system, from the simplest to the hardest. The first unconventional aspect is that, as explained in Section~\ref{sec:challenges}, our type assumptions are about expressions. Therefore, in our rules the type environments, ranged over by $\Gamma$, map \emph{expressions}---rather than just variables---into types.
This explains why the classic typing rule for variables is replaced by a more general \Rule{Env} rule defined below:\vspace{-1mm} \begin{mathpar} \Infer[Env] { } { \Gamma \vdash e: \Gamma(e) } { e\in\dom\Gamma } \qquad \Infer[Inter] { \Gamma \vdash e:t_1\\\Gamma \vdash e:t_2 } { \Gamma \vdash e: t_1 \wedge t_2 } { }\vspace{-3mm} \end{mathpar} The \Rule{Env} rule is coupled with the standard intersection introduction rule \Rule{Inter}, which allows us to deduce for a complex expression the intersection of the types recorded by the occurrence typing analysis in the environment $\Gamma$ with the static type deduced for the same expression by using the other typing rules. This same intersection rule is also used to infer the second unconventional aspect of our system, that is, the fact that $\lambda$-abstractions can have negated arrow types, as long as these negated types do not make the type deduced for the function empty:\vspace{-.5mm} \begin{mathpar} \Infer[Abs-] {\Gamma \vdash \lambda^{\wedge_{i\in I}\arrow {s_i} {t_i}}x.e:t} { \Gamma \vdash\lambda^{\wedge_{i\in I}\arrow {s_i} {t_i}}x.e:\neg(t_1\to t_2) } { ((\wedge_{i\in I}\arrow {s_i} {t_i})\wedge\neg(t_1\to t_2))\not\simeq\Empty }\vspace{-1mm} \end{mathpar}
As explained in Section~\ref{sec:challenges}, we need to be able to deduce for, say, the function $\lambda^{\Int\to\Int} x.x$ a type such as $(\Int\to\Int)\wedge\neg(\Bool\to\Bool)$ (in particular, if this is the term $e$ in equation \eqref{bistwo} we need to deduce for it the type $(\Int\to t)\wedge\neg(\Int\to\neg\Bool)$, that is, $(\Int\to t)\setminus(\Int\to\neg\Bool)$). But the sole rule \Rule{Abs+} above does not allow us to deduce negations of arrows for abstractions: the rule \Rule{Abs-} makes this possible. As an aside, note that this kind of deduction was already present in the system by~\citet{Frisch2008}, though in that system this presence was motivated by the semantics of types rather than, as in our case, by the soundness of the type system. Rules \Rule{Abs+} and \Rule{Abs-} are not enough to deduce for $\lambda$-abstractions all the types we wish. In particular, these rules alone are not enough to type general overloaded functions. For instance, consider this simple example of a function that applied to an integer returns its successor and applied to anything else returns \textsf{true}:\\[1mm] \centerline{$\lambda^{(\Int\to\Int)\wedge(\neg\Int\to\Bool)} x\,.\,\tcase{x}{\Int}{x+1}{\textsf{true}}$}\\[1mm] Clearly, the expression above is well typed, but the rule \Rule{Abs+} alone is not enough to type it. In particular, according to \Rule{Abs+} we have to prove that under the hypothesis that $x$ is of type $\Int$ the expression $(\tcase{x}{\Int}{x+1}{\textsf{true}})$ is of type $\Int$, too.
That is, that under the hypothesis that $x$ has type $\Int\wedge\Int$ (we apply occurrence typing) the expression $x+1$ is of type $\Int$ (which holds) and that under the hypothesis that $x$ has type $\Int\setminus\Int$, that is $\Empty$ (we apply once more occurrence typing), \textsf{true} is of type $\Int$ (which \emph{does not} hold). The problem is that we are trying to type the second case of a type-case even if we know that there is no chance that, when $x$ is bound to an integer, that case will ever be selected. The fact that it is never selected is witnessed by the presence of a type hypothesis with $\Empty$ type. To avoid this problem (and type the term above) we add the rule \Rule{Efq} (\emph{ex falso quodlibet}) that allows the system to deduce any type for an expression that will never be selected, that is, for an expression whose type environment contains an empty assumption: \begin{mathpar} \Infer[Efq] { } { \Gamma, (e:\Empty) \vdash e': t } { }\vspace{-3mm} \end{mathpar} Once more, this kind of deduction was already present in the system by~\citet{Frisch2008} to type full-fledged overloaded functions, though there it was embedded in the typing rule for the type-case. Here we need the rule \Rule{Efq}, which is more general, to ensure the property of subject reduction. Finally, there remains one last rule in our type system, the one that implements occurrence typing, that is, the rule for the type-case:\vspace{-1mm} \begin{mathpar} \Infer[Case] {\Gamma\vdash e:t_0\\ \Gamma \evdash e t \Gamma_1 \\ \Gamma_1 \vdash e_1:t'\\ \Gamma \evdash e {\neg t} \Gamma_2 \\ \Gamma_2 \vdash e_2:t'} {\Gamma\vdash \tcase {e} t {e_1}{e_2}: t'} { }\vspace{-3mm} \end{mathpar} The rule \Rule{Case} checks whether the expression $e$, whose type is being tested, is well typed and then performs the occurrence typing analysis that produces the environments $\Gamma_i$'s under whose hypotheses the expressions $e_i$'s are typed.
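To see the rules for the type-case and \Rule{Efq} at work, consider once more the overloaded function above: for its first arrow, eliding the premise on the tested expression, the two branches of its body are checked by the judgments
\[x:\Int\wedge\Int\vdash x+1:\Int \qquad\text{and}\qquad x:\Int\setminus\Int\vdash \textsf{true}:\Int\]
where the first holds by the rules for the functional core and the second holds by \Rule{Efq}, since $\Int\setminus\Int\simeq\Empty$.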
The production of these environments is represented by the judgments $\Gamma \evdash e {(\neg)t} \Gamma_i$. The intuition is that when $\Gamma \evdash e t \Gamma_1$ is provable then $\Gamma_1$ is a version of $\Gamma$ extended with type hypotheses for all expressions occurring in $e$, type hypotheses that can be deduced assuming that the test $e\in t$ succeeds. Likewise, $\Gamma \evdash e {\neg t} \Gamma_2$ (notice the negation on $t$) extends $\Gamma$ with the hypotheses deduced assuming that $e\in\neg t$, that is, for when the test $e\in t$ fails. All that remains to do is to show how to deduce judgments of the form $\Gamma \evdash e t \Gamma'$. For that we first define how to denote occurrences of an expression. These are identified by paths in the syntax tree of the expression, that is, by possibly empty strings of characters denoting directions starting from the root of the tree (we use $\epsilon$ for the empty string/path, which corresponds to the root of the tree). Let $e$ be an expression and $\varpi\in\{0,1,l,r,f,s\}^*$ a \emph{path}; we denote by $\occ e\varpi$ the occurrence of $e$ reached by the path $\varpi$, that is (for $i=0,1$, and undefined otherwise)\vspace{-.4mm} $\begin{array}{r@{\downarrow}l@{\quad=\quad}lr@{\downarrow}l@{\quad=\quad}lr@{\downarrow}l@{\quad=\quad}l} e&\epsilon & e & (e_0,e_1)& l.\varpi & \occ{e_0}\varpi &\pi_1 e& f.\varpi & \occ{e}\varpi\\ e_0e_1& i.\varpi & \occ{e_i}\varpi \quad\qquad& (e_0,e_1)& r.\varpi & \occ{e_1}\varpi \quad\qquad& \pi_2 e& s.\varpi & \occ{e}\varpi\\[-.4mm] \end{array}$ To ease our analysis we used different directions for each kind of term.
So we have $0$ and $1$ for the function and argument of an application, $l$ and $r$ for the $l$eft and $r$ight expressions forming a pair, and $f$ and $s$ for the argument of a $f$irst or of a $s$econd projection. Note also that we do not consider occurrences under $\lambda$'s (since their type is frozen in their annotations) and type-cases (since they reset the analysis). The judgments $\Gamma \evdash e t \Gamma'$ are then deduced by the following two rules:\vspace{-1mm} \begin{mathpar} \Infer[Base] { } { \Gamma \evdash e t \Gamma } { } \qquad \Infer[Path] { \pvdash {\Gamma'} e t \varpi:t' \\ \Gamma \evdash e t \Gamma' } { \Gamma \evdash e t \Gamma',(\occ e \varpi:t') } { }\vspace{-1.5mm} \end{mathpar} These rules describe how to produce by occurrence typing the type environments while checking that an expression $e$ has type $t$. They state that $(i)$ we can deduce from $\Gamma$ all the hypotheses already in $\Gamma$ (rule \Rule{Base}) and that $(ii)$ if we can deduce a given type $t'$ for a particular occurrence $\varpi$ of the expression $e$ being checked, then we can add this hypothesis to the produced type environment (rule \Rule{Path}). The rule \Rule{Path} uses a (last) auxiliary judgment $\pvdash {\Gamma} e t \varpi:t'$ to deduce the type $t'$ of the occurrence $\occ e \varpi$ when checking $e$ against $t$ under the hypotheses $\Gamma$. This rule \Rule{Path} is subtler than it may appear at first sight, insofar as the deduction of the type for $\varpi$ may already use some hypotheses on $\occ e \varpi$ (in $\Gamma'$) and, from an algorithmic viewpoint, this will imply the computation of a fix-point (see Section~\ref{sec:typenv}).
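The path-based definition of $\occ e\varpi$ above is straightforward to implement. The following sketch uses a hypothetical encoding of expressions as nested Python tuples (the encoding is ours, chosen only for illustration, and is not part of the formal development); it returns the occurrence reached by a path, or \texttt{None} when the path is undefined for the expression:

```python
# Hypothetical tuple encoding of expressions (illustration only):
#   ('app', e0, e1)          application  -- directions 0 and 1
#   ('pair', e0, e1)         pair         -- directions l and r
#   ('fst', e) / ('snd', e)  projections  -- directions f and s
#   ('var', name), ('const', c)           -- leaves

def occ(e, path):
    """Return the occurrence of e reached by path, or None when undefined."""
    if path == "":
        return e                           # e at the empty path is e itself
    d, rest = path[0], path[1:]
    tag = e[0]
    if tag == 'app' and d in ('0', '1'):
        return occ(e[1 + int(d)], rest)    # 0: function, 1: argument
    if tag == 'pair' and d in ('l', 'r'):
        return occ(e[1] if d == 'l' else e[2], rest)
    if tag == 'fst' and d == 'f':
        return occ(e[1], rest)
    if tag == 'snd' and d == 's':
        return occ(e[1], rest)
    return None                            # undefined otherwise

# the expression (x y, z): a pair whose left component is an application
e = ('pair', ('app', ('var', 'x'), ('var', 'y')), ('var', 'z'))
assert occ(e, '') == e
assert occ(e, 'l1') == ('var', 'y')        # argument of the application
assert occ(e, 'r') == ('var', 'z')
assert occ(e, 'f') is None                 # e is not a projection
```

Occurrences under $\lambda$'s and type-cases are simply absent from the encoding, mirroring the restriction stated above.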
The last ingredient for our type system is the deduction of the judgments of the form $\pvdash {\Gamma} e t \varpi:t'$ where $\varpi$ is a path to an expression occurring in $e$. This is given by the following set of rules. \begin{mathpar} \Infer[PSubs] { \pvdash \Gamma e t \varpi:t_1 \\ t_1\leq t_2 } { \pvdash \Gamma e t \varpi:t_2 } { } \quad \Infer[PInter] { \pvdash \Gamma e t \varpi:t_1 \\ \pvdash \Gamma e t \varpi:t_2 } { \pvdash \Gamma e t \varpi:t_1\land t_2 } { } \quad \Infer[PTypeof] { \Gamma \vdash \occ e \varpi:t' } { \pvdash \Gamma e t \varpi:t' } { } \vspace{-1.2mm}\\ \Infer[PEps] { } { \pvdash \Gamma e t \epsilon:t } { } \qquad \Infer[PAppR] { \pvdash \Gamma e t \varpi.0:\arrow{t_1}{t_2} \\ \pvdash \Gamma e t \varpi:t_2'} { \pvdash \Gamma e t \varpi.1:\neg t_1 } { t_2\land t_2' \simeq \Empty } \vspace{-1mm}\\ \Infer[PAppL] { \pvdash \Gamma e t \varpi.1:t_1 \\ \pvdash \Gamma e t \varpi:t_2 } { \pvdash \Gamma e t \varpi.0:\neg (\arrow {t_1} {\neg t_2}) } { } \qquad \Infer[PPairL] { \pvdash \Gamma e t \varpi:\pair{t_1}{t_2} } { \pvdash \Gamma e t \varpi.l:t_1 } { } \vspace{-1.2mm}\\ \Infer[PPairR] { \pvdash \Gamma e t \varpi:\pair{t_1}{t_2} } { \pvdash \Gamma e t \varpi.r:t_2 } { } \qquad \Infer[PFst] { \pvdash \Gamma e t \varpi:t' } { \pvdash \Gamma e t \varpi.f:\pair {t'} \Any } { } \qquad \Infer[PSnd] { \pvdash \Gamma e t \varpi:t' } { \pvdash \Gamma e t \varpi.s:\pair \Any {t'} } { }\vspace{-1.9mm} \end{mathpar} These rules implement the analysis described in Section~\ref{sec:ideas} for functions and extend it to products. Let us comment on each rule in detail. \Rule{PSubs} is just subsumption for the deduction $\vdashp$. The rule \Rule{PInter} combined with \Rule{PTypeof} allows the system to deduce for an occurrence $\varpi$ the intersection of the static type of $\occ e \varpi$ (deduced by \Rule{PTypeof}) with the type deduced for $\varpi$ by the other $\vdashp$ rules.
The rule \Rule{PEps} is the starting point of the analysis: if we are assuming that the test $e\in t$ succeeds, then we can assume that $e$ (i.e., $\occ e\epsilon$) has type $t$ (recall that assuming that the test $e\in t$ fails corresponds to having $\neg t$ at the index of the turnstile). The rule \Rule{PAppR} implements occurrence typing for the arguments of applications, since it states that if a function maps arguments of type $t_1$ into results of type $t_2$ and an application of this function yields results (in $t'_2$) that cannot be in $t_2$ (since $t_2\land t_2' \simeq \Empty$), then the argument of this application cannot be of type $t_1$: for instance, from $\pvdash \Gamma e t \varpi.0:\arrow\Int\Int$ and $\pvdash \Gamma e t \varpi:\Bool$ we deduce $\pvdash \Gamma e t \varpi.1:\neg\Int$, since $\Int\land\Bool\simeq\Empty$. \Rule{PAppL} performs the occurrence typing analysis for the function part of an application, since it states that if an application has type $t_2$ and the argument of this application has type $t_1$, then the function in this application cannot have type $t_1\to\neg t_2$. Rules \Rule{PPair\_} are straightforward since they state that the $i$-th projection of a pair that is of type $\pair{t_1}{t_2}$ must be of type $t_i$. So are the last two rules, which essentially state that if $\pi_1 e$ (respectively, $\pi_2 e$) is of type $t'$, then the type of $e$ must be of the form $\pair{t'}\Any$ (respectively, $\pair\Any{t'}$). This concludes the presentation of our type system, which satisfies the property of safety, deduced, as customary, from the properties of progress and subject reduction (\emph{cf.} Appendix~\ref{app:soundness}). \begin{theorem}[type safety] For every expression $e$ such that $\varnothing\vdash e:t$ either $e$ diverges or there exists a value $v$ of type $t$ such that $e\reduces^* v$. \end{theorem} \subsection{Algorithmic system} \label{ssec:algorithm} The system we defined in the previous section implements the ideas we illustrated in the introduction and it is safe.
Now the problem is to decide whether an expression is well typed or not, that is, to find an algorithm that, given a type environment $\Gamma$ and an expression $e$, decides whether there exists a type $t$ such that $\Gamma\vdash e:t$ is provable. For that we need to solve essentially two problems: $(i)$~how to handle the fact that it is possible to deduce several types for the same well-typed expression and $(ii)$~how to compute the auxiliary deduction system for paths. Multiple types have two distinct origins, each requiring a distinct technical solution. The first origin is the rule \Rule{Abs-}, by which it is possible to deduce for every well-typed lambda abstraction infinitely many types, that is, the annotation of the function intersected with as many negations of arrow types as possible without making the type empty. To handle this multiplicity we use and adapt the technique of \emph{type schemes} defined by~\citet{Frisch2008}. Type schemes---whose definition we recall in Section~\ref{sec:type-schemes}---are canonical representations of the infinite sets of types of $\lambda$-abstractions. The second origin is due to the presence of structural rules\footnote{\label{fo:rules}In logic, logical rules refer to a particular connective (here, a type constructor, that is, either $\to$, or $\times$, or $b$), while identity rules (e.g., axioms and cuts) and structural rules (e.g., weakening and contraction) do not.} such as \Rule{Subs} and \Rule{Inter}. We handle this presence in the classic way: we define an algorithmic system that tracks the minimum type---actually, the minimum \emph{type scheme}---of an expression; this system is obtained from the original system by removing the two structural rules and by distributing suitable checks of the subtyping relation in the remaining rules. To do that in the presence of set-theoretic types we need to define some operators on types, which are given in Section~\ref{sec:typeops}.
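For instance, by rule \Rule{Abs-} the function $\lambda^{\Int\to\Int}x.x$ can be given every type of the form
\[(\Int\to\Int)\wedge\neg(t_1\to t_2)\qquad\text{with }(\Int\to\Int)\wedge\neg(t_1\to t_2)\not\simeq\Empty\]
as well as all their finite intersections: this set of types has no minimum element, and the type scheme $\tsfun{\Int\to\Int}$ defined in Section~\ref{sec:type-schemes} is a canonical finite representation of it.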
For what concerns the use of the auxiliary derivation for the judgments of the form $\Gamma \evdash e t \Gamma'$, we present in Section~\ref{sec:typenv} an algorithm that is sound and satisfies a limited form of completeness. All these notions are then used in the algorithmic typing system given in Section~\ref{sec:algorules}. \subsubsection{Type schemes}\label{sec:type-schemes} We introduce the new syntactic category of \emph{type schemes}, which are the terms~$\ts$ inductively produced by the following grammar. $\begin{array}{lrcl} \textbf{Type schemes} & \ts & ::= & t \alt \tsfun {\arrow t t ; \cdots ; \arrow t t} \alt \ts \tstimes \ts \alt \ts \tsor \ts \alt \tsempty \end{array}$ Type schemes denote sets of types, as formally stated by the following definition: \begin{definition}[Interpretation of type schemes] We define the function $\tsint {\_}$ that maps type schemes into sets of types.\vspace{-2.5mm} \begin{align*} \begin{array}{lcl} \tsint t &=& \{s\alt t \leq s\}\\ \tsint {\tsfunone {t_i} {s_i}_{i=1..n}} &=& \{s\alt \exists s_0 = \bigwedge_{i=1..n} \arrow {t_i} {s_i} \land \bigwedge_{j=1..m} \neg (\arrow {t_j'} {s_j'}).\ \Empty \not\simeq s_0 \leq s \}\\ \tsint {\ts_1 \tstimes \ts_2} &=& \{s\alt \exists t_1 \in \tsint {\ts_1}\ \exists t_2 \in \tsint {\ts_2}.\ \pair {t_1} {t_2} \leq s\}\\ \tsint {\ts_1 \tsor \ts_2} &=& \{s\alt \exists t_1 \in \tsint {\ts_1}\ \exists t_2 \in \tsint {\ts_2}.\ {t_1} \vee {t_2} \leq s\}\\ \tsint \tsempty &=& \varnothing \end{array} \end{align*} \end{definition} Note that $\tsint \ts$ is closed under subsumption and intersection and that $\tsempty$, which denotes the empty set of types, is different from $\Empty$, whose interpretation is
the set of all types. \begin{lemma}[\cite{Frisch2008}] Let $\ts$ be a type scheme and $t$ a type. It is possible to decide the assertion $t \in \tsint \ts$, which we also write $\ts \leq t$. \end{lemma} We can now formally define the relation $v\in t$ used in Section~\ref{sec:opsem} to define the dynamic semantics of the language. First, we associate each (possibly, not well-typed) value with a type scheme representing the best type information about the value. By induction on the definition of values: $\tyof c {} = \basic c$, $\tyof {\lambda^{\wedge_{i\in I}s_i\to t_i} x.e}{}= \tsfun{s_i\to t_i}_{i\in I}$, $\tyof {(v_1,v_2)}{} = \tyof {v_1}{}\tstimes\tyof {v_2}{}$. Then we have $v\in t\iffdef \tyof v{}\leq t$. We also need to perform intersections of type schemes so as to intersect the static type of an expression (i.e., the one deduced by conventional rules) with the one deduced by occurrence typing (i.e., the one derived by $\vdashp$). For our algorithmic system (see \Rule{Env$_{\scriptscriptstyle\mathcal{A}}$} in Section~\ref{sec:algorules}) all we need to define is the intersection of a type scheme with a type: \begin{lemma}[\cite{Frisch2008}] Let $\ts$ be a type scheme and $t$ a type.
We can compute a type scheme, written $t \tsand \ts$, such that $$\tsint {t \tsand \ts} = \{s \alt \exists t' \in \tsint \ts.\ t \land t' \leq s \}$$ \end{lemma} Finally, given a type scheme $\ts$ it is straightforward to choose in its interpretation a type $\tsrep\ts$ which serves as the canonical representative of the set (i.e., $\tsrep \ts \in \tsint \ts$): \begin{definition}[Representative] We define a function $\tsrep {\_}$ that maps every non-empty type scheme into a type, \textit{representative} of the set of types denoted by the scheme.\vspace{-2mm} \begin{align*} \begin{array}{lcllcl} \tsrep t &=& t & \tsrep {\ts_1 \tstimes \ts_2} &=& \pair {\tsrep {\ts_1}} {\tsrep {\ts_2}}\\ \tsrep {\tsfunone {t_i} {s_i}_{i\in I}} &=& \bigwedge_{i\in I} \arrow {t_i} {s_i} \qquad& \tsrep {\ts_1 \tsor \ts_2} &=& \tsrep {\ts_1} \vee \tsrep {\ts_2}\\ \tsrep \tsempty && \textit{undefined} \end{array}\vspace{-4mm} \end{align*} \end{definition} \subsubsection{Operators for type constructors}\label{sec:typeops} In order to define the algorithmic typing of expressions like applications and projections we need to define the operators on types we used in Section~\ref{sec:ideas}. Consider the rule \Rule{App} for applications. It essentially does three things: $(1)$ it checks that the function has a functional type; $(2)$ it checks that the argument is in the domain of the function; and $(3)$ it returns the type of the application. In systems without set-theoretic types these operations are quite straightforward: $(1)$ corresponds to checking that the function has an arrow type, $(2)$ corresponds to checking that the argument is in the domain of the arrow deduced for the function, and $(3)$ corresponds to returning the codomain of that same arrow. With set-theoretic types things get more difficult, since a function can be typed by, say, a union of intersections of arrows and negations of arrow types.
Checking that the function has a functional type is easy since it corresponds to checking that it has a type subtype of $\Empty{\to}\Any$. Determining its domain and the type of the application is more complicated and needs the operators $\dom{}$ and $\circ$ we informally described in Section~\ref{sec:ideas}, where we also introduced the operator $\worra{}{}$. These three operators are used by our algorithm and formally defined as:\vspace{-2mm} \begin{eqnarray} \dom t & = & \max \{ u \alt t\leq u\to \Any\} \\[-1mm] \apply t s & = &\,\min \{ u \alt t\leq s\to u\} \\[-1mm] \worra t s & = &\,\min\{u \alt t\circ(\dom t\setminus u)\leq \neg s\}\label{worra} \end{eqnarray}\vspace{-5.75mm}\\ In short, $\dom t$ is the largest domain of any arrow type that subsumes $t$, $\apply t s$ is the smallest codomain of an arrow type that subsumes $t$ and has domain $s$, and $\worra t s$ was explained before. For instance, for $t=(\Int\to\Int)\wedge(\neg\Int\to\Bool)$ we have $\dom t=\Any$, $\apply t\Int=\Int$, and $\worra t\Int=\Int$, since only integer arguments may yield a result in $\Int$. We need similar operators for projections since the type $t$ of $e$ in $\pi_i e$ may not be a single product type but, say, a union of products: all we know is that $t$ must be a subtype of $\pair\Any\Any$. So let $t$ be a type such that $t\leq\pair\Any\Any$; then we define:\vspace{-1.2mm} $$\begin{array}{lcrlcr} \bpl t & = & \min \{ u \alt t\leq \pair u\Any\}\qquad&\qquad \bpr t & = & \min \{ u \alt t\leq \pair \Any u\} \end{array}\vspace{-1.2mm}$$ All the operators above but $\worra{}{}$ are already present in the theory of semantic subtyping: the reader can find how to compute them and how to extend their definition to type schemes in~\cite[Section 6.11]{Frisch2008} (see also~\citep[\S4.4]{Cas15} for a detailed description). Below we just show the formula that computes $\worra t s$ for a $t$ subtype of $\Empty\to\Any$.
For that, we use a result of semantic subtyping that states that every type $t$ is equivalent to a type in disjunctive normal form and that if, furthermore, $t\leq\Empty\to\Any$, then $t \simeq \bigvee_{i\in I}\left(\bigwedge_{p\in P_i}(s_p\to t_p)\bigwedge_{n\in N_i}\neg(s_n'\to t_n')\right)$ with $\bigwedge_{p\in P_i}(s_p\to t_p)\bigwedge_{n\in N_i}\neg(s_n'\to t_n') \not\simeq \Empty$ for all $i$ in $I$. For such a $t$ and any type $s$ we then have:\vspace{-1.7mm} $$\worra t s = \dom t \wedge\bigvee_{i\in I}\left(\bigwedge_{\{P\subseteq P_i\alt s\leq \bigvee_{p \in P} \neg t_p\}}\left(\bigvee_{p \in P} \neg s_p\right) \right)\vspace{-1.7mm}$$ The formula considers only the positive arrows of each summand that forms $t$ and states that, for each summand, whenever you take a subset $P$ of its positive arrows that cannot yield results in $s$ (since $s$ does not overlap the intersection of the codomains of these arrows), then the success of the test cannot depend on these arrows and therefore the intersection of the domains of these arrows---i.e., the values that would precisely select that set of arrows---can be removed from $\dom t$. For instance, for $t=(\Int\to\Int)\wedge(\neg\Int\to\Bool)$ and $s=\Int$ the subsets $P$ satisfying the condition are $\{\Int\to\Int,\neg\Int\to\Bool\}$ and $\{\neg\Int\to\Bool\}$, which contribute $\neg\Int\vee\neg(\neg\Int)\simeq\Any$ and $\neg(\neg\Int)\simeq\Int$ respectively, whence $\worra t s\simeq\Any\wedge\Any\wedge\Int\simeq\Int$. The proof that this type satisfies \eqref{worra} is given in Appendix~\ref{app:worra}. \subsubsection{Type environments for occurrence typing}\label{sec:typenv} The last step of our presentation is to define the algorithm for the deduction of $\Gamma \evdash e t \Gamma'$, that is, an algorithm that takes as input $\Gamma$, $e$, and $t$, and returns an environment that extends $\Gamma$ with hypotheses on the occurrences of $e$ that are the most general that can be deduced by assuming that the test $e\in t$ succeeds. For that we need the notation $\tyof{e}{\Gamma}$, which denotes the type scheme deduced for $e$ under the type environment $\Gamma$ in the algorithmic type system of Section~\ref{sec:algorules}. That is, $\tyof{e}{\Gamma}=\ts$ if and only if $\Gamma\vdashA e:\ts$ is provable.
We start by defining the algorithm for each single occurrence, that is, for the deduction of $\pvdash \Gamma e t \varpi:t'$. This is obtained by defining two mutually recursive functions $\constrf$ and $\env{}{}$:\vspace{-1.3mm} \begin{eqnarray} \constr\epsilon{\Gamma,e,t} & = & t\label{uno}\\[\sk] \constr{\varpi.0}{\Gamma,e,t} & = & \neg(\arrow{\env {\Gamma,e,t}{(\varpi.1)}}{\neg \env {\Gamma,e,t} (\varpi)})\label{due}\\[\sk] \constr{\varpi.1}{\Gamma,e,t} & = & \worra{\tsrep {\tyof{\occ e{\varpi.0}}\Gamma}}{\env {\Gamma,e,t} (\varpi)}\label{tre}\\[\sk] \constr{\varpi.l}{\Gamma,e,t} & = & \bpl{\env {\Gamma,e,t} (\varpi)}\label{quattro}\\[\sk] \constr{\varpi.r}{\Gamma,e,t} & = & \bpr{\env {\Gamma,e,t} (\varpi)}\label{cinque}\\[\sk] \constr{\varpi.f}{\Gamma,e,t} & = & \pair{\env {\Gamma,e,t} (\varpi)}\Any\label{sei}\\[\sk] \constr{\varpi.s}{\Gamma,e,t} & = & \pair\Any{\env {\Gamma,e,t} (\varpi)}\label{sette}\\[.8mm] \env {\Gamma,e,t} (\varpi) & = & \tsrep {\constr \varpi {\Gamma,e,t} \tsand \tyof {\occ e \varpi} \Gamma}\label{otto} \end{eqnarray}\vspace{-5mm}\\ All the functions above are defined if and only if the initial path $\varpi$ is valid for $e$ (i.e., $\occ e{\varpi}$ is defined) and $e$ is well-typed (which implies that all the $\tyof {\occ e{\varpi}} \Gamma$ in the definition are defined).\footnote{Note that the definition is well-founded.
This can be seen by analyzing the rule \Rule{Case\Aa}: the definitions of $\Refine {e,t} \Gamma$ and $\Refine {e,\neg t} \Gamma$ use $\tyof{\occ e{\varpi}}\Gamma$, and this is defined for all $\varpi$ since the first premise of \Rule{Case\Aa} states that $\Gamma\vdashA e:\ts_0$ (and this is possible only if we were able to deduce, under the hypotheses $\Gamma$, the type of every occurrence of $e$).} Each case of the definition of the $\constrf$ function corresponds to the application of a logical rule (\emph{cf.}~Footnote~\ref{fo:rules}) in the deduction system for $\vdashp$: case \eqref{uno} corresponds to the application of \Rule{PEps}; case \eqref{due} implements \Rule{PAppL} straightforwardly; the implementation of rule \Rule{PAppR} is subtler: instead of finding the best $t_1$ to subtract (by intersection) from the static type of the argument, \eqref{tre} directly finds the best type for the argument by applying the $\worra{}{}$ operator to the static type of the function and the refined type of the application. The remaining cases (\ref{quattro}--\ref{sette}) are the straightforward implementations of the rules \Rule{PPairL}, \Rule{PPairR}, \Rule{PFst}, and \Rule{PSnd}, respectively. The other recursive function, $\env{}{}$, implements the two structural rules \Rule{PInter} and \Rule{PTypeof} by intersecting the type obtained for $\varpi$ by the logical rules with the static type deduced by the type system for the expression occurring at $\varpi$. The remaining structural rule, \Rule{PSubs}, is accounted for by the use of the operators $\worra{}{}$ and $\boldsymbol{\pi}_i$ in the definition of $\constrf$. It remains to explain how to compute the environment $\Gamma'$ produced from $\Gamma$ by the deduction system for $\Gamma \evdash e t \Gamma'$. Alas, this is the most delicate part of our algorithm.
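\noindent Before that, let us illustrate $\constrf$ and $\env{}{}$ on a tiny case (the example is ours). Consider $e=\pi_1 x$ tested against $t=\Int$ under $\Gamma = \{x:(\Int\vee\Bool)\times\Bool\}$. The valid paths are $\epsilon$ (with $\occ e \epsilon = \pi_1 x$) and $\epsilon.f$ (with $\occ e {\epsilon.f} = x$). By \eqref{uno} and \eqref{otto} we get $\env{\Gamma,e,t}(\epsilon) = \tsrep{\Int \tsand \tyof{\pi_1 x}\Gamma} = \Int\wedge(\Int\vee\Bool)\simeq\Int$; by \eqref{sei} we get $\constr{\epsilon.f}{\Gamma,e,t} = \pair\Int\Any$, whence $\env{\Gamma,e,t}(\epsilon.f) = (\pair\Int\Any)\wedge(\pair{\Int\vee\Bool}\Bool)\simeq\pair\Int\Bool$: in the branch where the test succeeds, the occurrence $x$ can be assumed to have type $\pair\Int\Bool$.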
In a nutshell, what we want is to define a function $\Refine{\_,\_}{\_}$ that takes a type environment $\Gamma$, an expression $e$, and a type $t$ and returns the best type environment $\Gamma'$ such that $\Gamma \evdash e t \Gamma'$ holds. By the best environment we mean the one in which the occurrences of $e$ are associated with the largest possible types (type environments are hypotheses, so they are contravariant: the larger the type, the better the hypothesis). Recall that in Section~\ref{sec:challenges} we said that we want our analysis to capture all the information available from nested checks. If we gave up this kind of precision, the definition of $\Refinef$ would be pretty easy: it would map each subexpression of $e$ to the intersection of the types deduced by $\vdashp$ (i.e., by $\env{}{}$) for each of its occurrences. That is, for each expression $e'$ occurring in $e$, $\Refine {e,t}\Gamma$ would be the type environment that maps $e'$ to $\bigwedge_{\{\varpi \alt \occ e \varpi \equiv e'\}} \env {\Gamma,e,t} (\varpi)$. As we explained in Section~\ref{sec:challenges}, the intersection is needed to apply occurrence typing to expressions such as $\tcase{(x,x)}{\pair{t_1}{t_2}}{e_1}{e_2}$ in which some expressions---here $x$---occur multiple times. In order to capture most of the type information from nested queries, the rule \Rule{Path} allows the deduction of the type of some occurrence $\varpi$ to use a type environment $\Gamma'$ that may contain information about some suboccurrences of $\varpi$. In the algorithm this corresponds to applying the $\Refinef$ defined above to an environment that is already the result of $\Refinef$, and so on. Therefore, ideally, our algorithm should compute the type environment as a fixpoint of the function $X\mapsto\Refine{e,t}{X}$. Unfortunately, an iteration of $\Refinef$ may not converge. As an example, consider the (dumb) expression $\tcase {x_1x_2}{\Any}{e_1}{e_2}$.
If $x_1:\Any\to\Any$, then every iteration of $\Refinef$ yields for $x_1$ a type strictly more precise than the type deduced in the previous iteration. The solution we adopt here is to bound the number of iterations by some number $n_o$. From a formal point of view, this means giving up the completeness of the algorithm: $\Refinef$ will be complete (i.e., it will find all solutions) only for the deductions of $\Gamma \evdash e t \Gamma'$ of depth at most $n_o$. This is obtained by the following definition of $\Refinef$:\vspace{-1mm}
\[
\begin{array}{rcl}
\Refinef_{e,t} & \eqdef & (\RefineStep{e,t})^{n_o}\\[-2mm]
\text{where }\RefineStep {e,t}(\Gamma)(e') &=& \left\{\begin{array}{ll}
\bigwedge_{\{\varpi \alt \occ e \varpi \equiv e'\}} \env {\Gamma,e,t} (\varpi) & \text{if } \exists \varpi.\ \occ e \varpi \equiv e' \\
\Gamma(e') & \text{otherwise, if } e'\in\dom\Gamma\\
\text{undefined} & \text{otherwise}
\end{array}\right.
\end{array}\vspace{-1.5mm}\]
Note in particular that $\Refine{e,t}\Gamma$ extends $\Gamma$ with hypotheses on the expressions occurring in $e$, since $\dom{\Refine{e,t}\Gamma} = \dom{\RefineStep {e,t}(\Gamma)} = \dom{\Gamma} \cup \{e' \alt \exists \varpi.\ \occ e \varpi \equiv e'\}$. In other terms, we try to find a fixpoint of $\RefineStep{e,t}$, but we bound our search to $n_o$ iterations. Since $\RefineStep {e,t}$ is monotone (w.r.t.\ the subtyping pre-order extended to type environments pointwise), every iteration yields a better solution. While this is unsatisfactory from a formal point of view, in practice the problem is a very mild one.
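In implementation terms, the bounded search for a fixpoint can be pictured by the following toy sketch (ours, purely illustrative, and in no way the actual implementation): environments are plain dictionaries, finite sets stand in for types, and the step function is monotone in that it only shrinks entries, as $\RefineStep{e,t}$ is with respect to subtyping.

```python
# Toy model of Refine_{e,t} = (RefineStep_{e,t})^{n_o}.
# Environments are dicts mapping expression names to finite sets
# (our stand-in for types); `step` is assumed monotone, i.e.,
# it only shrinks entries.

def refine(env, step, n_o):
    """Iterate `step` at most n_o times, stopping early at a fixpoint."""
    for _ in range(n_o):
        new_env = step(env)
        if new_env == env:      # fixpoint reached before the bound
            return new_env
        env = new_env
    return env                  # bound reached: may not be a fixpoint

# A step that keeps removing the largest element bound to "x": on its
# own it would shrink forever toward the empty set, mimicking the
# diverging refinement discussed in the text.
def shrink(env):
    s = env["x"]
    return {**env, "x": s - {max(s)} if s else s}

final = refine({"x": frozenset(range(100))}, shrink, n_o=5)
# len(final["x"]) == 95: the iteration was cut off after n_o = 5 steps.
```

Being cut off at $n_o$ iterations is exactly the trade-off described above: we may stop before reaching a fixpoint, but monotonicity guarantees that each extra iteration only improves the result.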
Divergence may happen only when refining the type of a function in an application: not only is such a refinement meaningful only when the function is typed by a union type (which is quite rare in practice)\footnote{\label{foo:typecase}The only impact of adding a negated arrow type to the type of a function is when we test whether the function has a given arrow type: in practice this never happens, since programming languages test whether a value \emph{is} a function rather than whether it has a given function type.}, but we also had to build the expression that causes the divergence in quite an \emph{ad hoc} way, which makes divergence even more unlikely.

\subsubsection{Algorithmic typing rules}\label{sec:algorules}

We now have all the notions we need for our typing algorithm, which is defined by the following rules.
\begin{mathpar} \Infer[Efq\Aa] { } { \Gamma, (e:\Empty) \vdashA e': \Empty } { \begin{array}{c}\text{\tiny with priority over}\\[-1.8mm]\text{\tiny all the other rules}\end{array}} \qquad \Infer[Var\Aa] { } { \Gamma \vdashA x: \Gamma(x) } { x\in\dom\Gamma} \vspace{-2mm}\\ \Infer[Env\Aa] { \Gamma\setminus\{e\} \vdashA e : \ts } { \Gamma \vdashA e: \Gamma(e) \tsand \ts } { \begin{array}{c}e\in\dom\Gamma \text{ and }\\[-1mm] e \text{ not a variable}\end{array}} \qquad \Infer[Const\Aa] { } {\Gamma\vdashA c:\basic{c}} {c\not\in\dom\Gamma} \vspace{-2mm}\\ \Infer[Abs\Aa] {\Gamma,x:s_i\vdashA e:\ts_i'\\ \ts_i'\leq t_i} { \Gamma\vdashA\lambda^{\wedge_{i\in I}\arrow {s_i} {t_i}}x.e:\textstyle\tsfun {\arrow {s_i} {t_i}}_{i\in I} } {\lambda^{\wedge_{i\in I}\arrow {s_i} {t_i}}x.e\not\in\dom\Gamma} \vspace{-2mm}\\ \Infer[App\Aa] { \Gamma \vdashA e_1: \ts_1\\ \Gamma \vdashA e_2: \ts_2\\ \ts_1 \leq \arrow \Empty \Any\\ \ts_2 \leq \dom {\ts_1} } { \Gamma \vdashA {e_1}{e_2}: \ts_1 \circ \ts_2 } { {e_1}{e_2}\not\in\dom\Gamma} \vspace{-2mm}\\ \Infer[Case\Aa] {\Gamma\vdashA e:\ts_0\\
\Refine {e,t} \Gamma \vdashA e_1 : \ts_1\\ \Refine {e,\neg t} \Gamma \vdashA e_2 : \ts_2} {\Gamma\vdashA \tcase {e} t {e_1}{e_2}: \ts_1\tsor \ts_2} { \tcase {e} {t\!} {\!e_1\!}{\!e_2}\not\in\dom\Gamma} \vspace{-2mm} \\ \Infer[Proj\Aa] {\Gamma \vdashA e:\ts\and \!\!\ts\leq\pair{\Any\!}{\!\Any}} {\Gamma \vdashA \pi_i e:\bpi_{\mathbf{i}}(\ts)} {\pi_i e{\not\in}\dom\Gamma}\hfill \Infer[Pair\Aa] {\Gamma \vdashA e_1:\ts_1 \and \!\!\Gamma \vdashA e_2:\ts_2} {\Gamma \vdashA (e_1,e_2):{\ts_1}\tstimes{\ts_2}} {(e_1,e_2){\not\in}\dom\Gamma} \end{mathpar} The side conditions of the rules ensure that the system is syntax-directed, that is, that at most one rule applies when typing a term: priority is given to \Rule{Efq\Aa} over all the other rules and to \Rule{Env\Aa} over all remaining logical rules. Type schemes are used to account for the type multiplicity stemming from $\lambda$-abstractions, as shown in particular by rule \Rule{Abs\Aa} (in what follows we use the word ``type'' also for type schemes). The subsumption rule is no longer in the system; it is replaced by: $(i)$ using a union type in \Rule{Case\Aa}, $(ii)$ checking in \Rule{Abs\Aa} that the body of the function is typed by a subtype of the type declared in the annotation, and $(iii)$ using type operators and checking subtyping in the elimination rules \Rule{App\Aa} and \Rule{Proj\Aa}. In particular, notice that \Rule{App\Aa} checks that the type of the function is a functional type and that the type of the argument is a subtype of the domain of the function, and then returns the result type of the application of the two types. The intersection rule is replaced by the use of type schemes in \Rule{Abs\Aa} and by the rule \Rule{Env\Aa}.
The latter intersects the type deduced for an expression $e$ by occurrence typing, and stored in $\Gamma$, with the type deduced for $e$ by the logical rules: this is simply obtained by removing any hypothesis about $e$ from $\Gamma$, so that the deduction of the type $\ts$ for $e$ cannot but end with a logical rule. Of course, this does not apply when the expression $e$ is a variable, since a hypothesis in $\Gamma$ is the only way to deduce the type of a variable, which is why the algorithm reintroduces the classic rule for variables. The algorithmic system above is sound with respect to the deductive one of Section~\ref{sec:static}. \begin{theorem}[Soundness] For every $\Gamma$, $e$, $t$, $n_o$, if $\Gamma\vdashA e: \ts$, then $\Gamma \vdash e:t$ for every $t\in\ts$. \end{theorem} We were not able to prove full completeness, just a partial form of it. As anticipated, the problems are twofold: $(i)$ the recursive nature of rule \Rule{Path} and $(ii)$ the use of nested \Rule{PAppL}, which yields a precision that the algorithm loses by applying $\tsrep{}$ in the definition of $\constrf$ (case~\eqref{due} is the critical one). Completeness is recovered by $(i)$ limiting the depth of the derivations and $(ii)$ forbidding nested negated arrows on the left-hand side of negated arrows.\vspace{-.7mm} \begin{definition}[Rank-0 negation] A derivation of $\Gamma \vdash e:t$ is \emph{rank-0 negated} if \Rule{Abs--} never occurs in the derivation of a left premise of a \Rule{PAppL} rule.\vspace{-.7mm} \end{definition} \noindent This terminology is borrowed from the ranking of higher-order types since, intuitively, it corresponds to typing a language in which, in the types used in dynamic tests, a negated arrow never occurs on the left-hand side of another negated arrow. \begin{theorem}[Partial Completeness] For every $\Gamma$, $e$, $t$, if $\Gamma \vdash e:t$ is derivable by a rank-0 negated derivation, then there exists $n_o$ such that $\Gamma\vdashA e: t'$ and $t'\leq t$.
\end{theorem} \noindent The use of type schemes and of possibly diverging iterations yields a system that may seem overly complicated. But it is important to stress that this system is defined only to study the type inference system of Section~\ref{sec:static} and, in particular, to probe how close we can get to a complete algorithm for it. For practical applications type schemes are not needed, since they are necessary only when type cases may specify types with negated arrows, and this, in practice, never happens (see Footnote~\ref{foo:typecase} and Corollary~\ref{app:completeness}). This is why for our implementation we use the CDuce library, in which type schemes are absent and functions are typed only by intersections of positive arrows. We present the implementation in Section~\ref{sec:practical}, but before that let us study some extensions.