The type system we defined in the previous section implements the ideas we illustrated in the introduction and it is safe. Now the problem is to decide whether an expression is well typed or not, that is, to find an algorithm that, given a type environment $\Gamma$ and an expression $e$, decides whether there exists a type $t$ such that $\Gamma\vdash e:t$ is provable. For that we need to solve essentially two problems: $(i)$~how to handle the fact that it is possible to deduce several types for the same well-typed expression and $(ii)$~how to compute the auxiliary deduction system $\pvdash \Gamma e t$ for paths.

$(i)$. Multiple types have two distinct origins, each requiring a distinct technical solution. The first origin is the presence of structural rules%
\footnote{\label{fo:rules}In logic, logical rules refer to a particular connective (here, a type constructor, that is, either $\to$, or $\times$, or $b$), while identity rules (e.g., axioms and cuts) and structural rules (e.g., weakening and contraction) do not.\svvspace{-3.3mm}}
such as \Rule{Subs} and \Rule{Inter}. We handle these rules in the classic way: we define an algorithmic system that tracks the minimum type of an expression; this system is obtained from the original system by removing the two structural rules and by distributing suitable checks of the subtyping relation in the remaining rules. To do that in the presence of set-theoretic types we need to define some operators on types, which are given in Section~\ref{sec:typeops}. The second origin is the rule \Rule{Abs-}, by which it is possible to deduce for every well-typed lambda abstraction infinitely many types, that is, the annotation of the function intersected with as (finitely) many negations of arrow types as possible without making the type empty. We do not handle this multiplicity directly in the algorithmic system but only in the proof of its soundness, by using and adapting the technique of \emph{type schemes} defined by~\citet{Frisch2008}.
Type schemes are canonical representations of the infinite sets of types of $\lambda$-abstractions which can be used to define an algorithmic system that can easily be proved sound. The simpler algorithm that we propose in this section implies (i.e., it is less precise than) the one with type schemes (\emph{cf.}\ Lemma~\ref{soundness_simple_ts}) and it is thus sound, too. The algorithm of this section is not only simpler but, as we discuss in Section~\ref{sec:algoprop}, it is also the one that should be used in practice. This is why we preferred to present it here and relegate the presentation of the system with type schemes to Appendix~\ref{app:typeschemes}.

$(ii)$. Regarding the auxiliary derivations for the $\Gamma \evdash e t \Gamma'$ and $\pvdash \Gamma e t \varpi:t'$ judgments, we present in Section~\ref{sec:typenv} an algorithm that is sound and satisfies a limited form of completeness. All these notions are then used in the algorithmic typing system given in Section~\ref{sec:algorules}.

\subsubsection{Operators for type constructors}\label{sec:typeops}
In order to define the algorithmic typing of expressions such as applications and projections we need to define the operators on types we used in Section~\ref{sec:ideas}. Consider the classic rule \Rule{App} for applications. It essentially does three things: $(i)$~it checks that the expression in the function position has a functional type; $(ii)$~it checks that the argument is in the domain of the function; and $(iii)$~it returns the type of the application.
In systems without set-theoretic types these operations are quite straightforward: $(i)$ corresponds to checking that the expression has an arrow type, $(ii)$ corresponds to checking that the argument is in the domain of the arrow deduced for the function, and $(iii)$ corresponds to returning the codomain of that same arrow. With set-theoretic types things get more difficult, since a function can be typed by, say, a union of intersections of arrows and negations of arrow types. Checking that the function has a functional type is easy, since it corresponds to checking that it has a type that is a subtype of $\Empty{\to}\Any$. Determining its domain and the type of the application is more complicated and needs the operators $\dom{}$ and $\circ$ we informally described in Section~\ref{sec:ideas}, where we also introduced the operator $\worra{}{}$. These three operators are used by our algorithm and are formally defined as:\svvspace{-0.5mm}
\begin{eqnarray}
\dom t & = & \max \{ u \alt t\leq u\to \Any\} \\[-1mm]
\apply t s & = &\,\min \{ u \alt t\leq s\to u\} \\[-1mm]
\worra t s & = &\,\min\{u \alt t\circ(\dom t\setminus u)\leq \neg s\}\label{worra}
\end{eqnarray}
In short, $\dom t$ is the largest domain of a single arrow type that subsumes $t$, $\apply t s$ is the smallest codomain of an arrow type that subsumes $t$ and has domain $s$, and $\worra t s$ was explained before. We need similar operators for projections, since the type $t$ of $e$ in $\pi_i e$ may not be a single product type but, say, a union of products: all we know is that $t$ must be a subtype of $\pair\Any\Any$.
So let $t$ be a type such that $t\leq\pair\Any\Any$; then we define:\svvspace{-0.7mm}
$$\begin{array}{lcrlcr}
\bpl t & = & \min \{ u \alt t\leq \pair u\Any\}\qquad&\qquad \bpr t & = & \min \{ u \alt t\leq \pair \Any u\}
\end{array}\svvspace{-0.7mm}$$
All the operators above but $\worra{}{}$ are already present in the theory of semantic subtyping: the reader can find how to compute them in~\cite[Section 6.11]{Frisch2008} (see also~\citep[\S4.4]{Cas15} for a detailed description). Below we just show our new formula that computes $\worra t s$ for a $t$ subtype of $\Empty\to\Any$. For that, we use a result of semantic subtyping stating that every type $t$ is equivalent to a type in disjunctive normal form and that if, furthermore, $t\leq\Empty\to\Any$, then $t \simeq \bigvee_{i\in I}\left(\bigwedge_{p\in P_i}(s_p\to t_p)\bigwedge_{n\in N_i}\neg(s_n'\to t_n')\right)$ with $\bigwedge_{p\in P_i}(s_p\to t_p)\bigwedge_{n\in N_i}\neg(s_n'\to t_n') \not\simeq \Empty$ for all $i\in I$. For such a $t$ and any type $s$ we then have:\svvspace{-1.0mm}
%
$$\worra t s = \dom t \wedge\bigvee_{i\in I}\left(\bigwedge_{\{P\subseteq P_i\alt s\leq \bigvee_{p \in P} \neg t_p\}}\left(\bigvee_{p \in P} \neg s_p\right) \right)\svvspace{-1.0mm}$$
The formula considers only the positive arrows of each summand that forms $t$ and states that, for each summand, whenever you take a subset $P$ of its positive arrows that cannot yield results in $s$ (since $s$ does not overlap the intersection of the codomains of these arrows), then the success of the test cannot depend on these arrows, and therefore the intersection of the domains of these arrows---i.e., the values that would precisely select that set of arrows---can be removed from $\dom t$. The proof that this type satisfies \eqref{worra} is given in Appendix~\ref{app:worra}.
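To see the formula at work, here is a small worked instance of our own devising, assuming $\Int$ and $\Bool$ are disjoint basic types of the system:

```latex
% Take t = (Int -> Int) /\ (Bool -> Bool) and s = Int. Then I is a
% singleton, P = {Int -> Int, Bool -> Bool}, N is empty, and
% dom t = Int \/ Bool. The subsets P' of P with s <= \/_{p in P'} neg t_p
% are {Bool -> Bool} (since Int <= neg Bool) and P itself
% (since neg Int \/ neg Bool ~ Any), so the formula yields
\[
  \worra t \Int \;=\; (\Int\vee\Bool)\,\wedge\,\neg\Bool\,\wedge\,(\neg\Int\vee\neg\Bool)
  \;\simeq\; \Int
\]
```

That is, if an application of a function of type $(\Int\to\Int)\wedge(\Bool\to\Bool)$ is tested to be in $\Int$, then the operator refines the type of the argument to $\Int$, as expected.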
\subsubsection{Type environments for occurrence typing}\label{sec:typenv}
The second ingredient necessary for the definition of our algorithmic system is the algorithm for the deduction of $\Gamma \evdash e t \Gamma'$, that is, an algorithm that takes as input $\Gamma$, $e$, and $t$, and returns an environment that extends $\Gamma$ with the most general hypotheses on the occurrences of $e$ that can be deduced by assuming that the test $e\,{\in}\,t$ succeeds. For that we need the notation $\tyof{e}{\Gamma}$, which denotes the type deduced for $e$ under the type environment $\Gamma$ in the algorithmic type system of Section~\ref{sec:algorules}. That is, $\tyof{e}{\Gamma}=t$ if and only if $\Gamma\vdashA e:t$ is provable.

We start by defining the algorithm for each single occurrence, that is, for the deduction of $\pvdash \Gamma e t \varpi:t'$. This is obtained by defining two mutually recursive functions \constrf and \env{}{}:\svvspace{-1.3mm}
\begin{eqnarray}
\constr\epsilon{\Gamma,e,t} & = & t\label{uno}\\[\sk]
\constr{\varpi.0}{\Gamma,e,t} & = & \neg(\arrow{\env {\Gamma,e,t}{(\varpi.1)}}{\neg \env {\Gamma,e,t} (\varpi)})\label{due}\\[\sk]
\constr{\varpi.1}{\Gamma,e,t} & = & \worra{{\tyof{\occ e{\varpi.0}}\Gamma}}{\env {\Gamma,e,t} (\varpi)}\label{tre}\\[\sk]
\constr{\varpi.l}{\Gamma,e,t} & = & \bpl{\env {\Gamma,e,t} (\varpi)}\label{quattro}\\[\sk]
\constr{\varpi.r}{\Gamma,e,t} & = & \bpr{\env {\Gamma,e,t} (\varpi)}\label{cinque}\\[\sk]
\constr{\varpi.f}{\Gamma,e,t} & = & \pair{\env {\Gamma,e,t} (\varpi)}\Any\label{sei}\\[\sk]
\constr{\varpi.s}{\Gamma,e,t} & = & \pair\Any{\env {\Gamma,e,t} (\varpi)}\label{sette}\\[.8mm]
\env {\Gamma,e,t} (\varpi) & = & {\constr \varpi {\Gamma,e,t} \wedge \tyof {\occ e \varpi} \Gamma}\label{otto}
\end{eqnarray}\svvspace{-5mm}\\
All the functions above are defined if and only if the initial path $\varpi$ is valid for $e$ (i.e., $\occ e{\varpi}$ is defined) and $e$ is well-typed (which implies that all the $\tyof {\occ e{\varpi}} \Gamma$ in the definition are defined)%
\iflongversion%
.\footnote{Note that the definition is well-founded. This can be seen by analyzing the rule \Rule{Case\Aa} of Section~\ref{sec:algorules}: the definitions of $\Refine {e,t} \Gamma$ and $\Refine {e,\neg t} \Gamma$ use $\tyof{\occ e{\varpi}}\Gamma$, and this is defined for all $\varpi$ since the first premise of \Rule{Case\Aa} states that $\Gamma\vdash e:t_0$ (and this is possible only if we were able to deduce, under the hypotheses $\Gamma$, the type of every occurrence of $e$).\svvspace{-3mm}}
\else
; the well-foundedness of the definition can be deduced by analyzing the rule~\Rule{Case\Aa} of Section~\ref{sec:algorules}.
\fi
Each case of the definition of the \constrf function corresponds to the application of a logical rule (\emph{cf.}\ the definition in Footnote~\ref{fo:rules}) in the deduction system for $\vdashp$: case \eqref{uno} corresponds to the application of \Rule{PEps}; case \eqref{due} implements \Rule{PAppL} straightforwardly; the implementation of rule \Rule{PAppR} is subtler: instead of finding the best $t_1$ to subtract (by intersection) from the static type of the argument, \eqref{tre} directly finds the best type for the argument by applying the $\worra{}{}$ operator to the static type of the function and the refined type of the application. The remaining cases (\ref{quattro}--\ref{sette}) are the straightforward implementations of the rules \Rule{PPairL}, \Rule{PPairR}, \Rule{PFst}, and \Rule{PSnd}, respectively.

The other recursive function, \env{}{}, implements the two structural rules \Rule{PInter} and \Rule{PTypeof} by intersecting the type obtained for $\varpi$ by the logical rules with the static type deduced by the type system for the expression occurring at $\varpi$. The remaining structural rule, \Rule{PSubs}, is accounted for by the use of the operators $\worra{}{}$ and $\boldsymbol{\pi}_i$ in the definition of \constrf.
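The mutual recursion of \constrf and \env{}{} can be sketched in code. The sketch below is purely illustrative and restricted to the product cases, equations \eqref{uno}, \eqref{quattro}, \eqref{cinque}, and \eqref{otto}: types are modeled as finite sets of values over a toy universe (so that subtyping is set inclusion, intersection is set intersection, and the projection operators are exact), and `typeof` is a hypothetical lookup table standing in for the typing function of the algorithmic system.

```python
# Toy model: a type is a frozenset of values; a pair expression is
# encoded as ("pair", e_left, e_right).

def bpl(t):  # pi_1(t) = min{u | t <= u x Any}: exact first projection
    return frozenset(x for (x, _) in t)

def bpr(t):  # pi_2(t) = min{u | t <= Any x u}: exact second projection
    return frozenset(y for (_, y) in t)

def occ(e, path):
    """Subexpression of e reached by a path of 'l'/'r' steps."""
    for step in path:
        e = e[1] if step == "l" else e[2]
    return e

def constr(path, typeof, e, t):
    if path == "":                    # equation (1): the tested type itself
        return t
    prefix, step = path[:-1], path[-1]
    u = env(prefix, typeof, e, t)     # refined type of the enclosing pair
    return bpl(u) if step == "l" else bpr(u)   # equations (4)/(5)

def env(path, typeof, e, t):          # equation (8): intersect with the
    return constr(path, typeof, e, t) & typeof[occ(e, path)]  # static type
```

For instance, on the expression $(x,x)$ tested against a product type, the two occurrences of $x$ (paths $l$ and $r$) receive the two projections of the tested type, each intersected with the static type of $x$; intersecting the results for the two paths gives the hypothesis recorded for $x$, as discussed for the pair example of Section~\ref{sec:challenges}.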
It remains to explain how to compute the environment $\Gamma'$ produced from $\Gamma$ by the deduction system for $\Gamma \evdash e t \Gamma'$. Alas, this is the most delicate part of our algorithm.
%
In a nutshell, what we want to do is to define a function \Refine{\_,\_}{\_} that takes a type environment $\Gamma$, an expression $e$, and a type $t$, and returns the best type environment $\Gamma'$ such that $\Gamma \evdash e t \Gamma'$ holds. By the best environment we mean the one in which the occurrences of $e$ are associated with the largest possible types (type environments are hypotheses, so they are contravariant: the larger the type, the better the hypothesis). Recall that in Section~\ref{sec:challenges} we said that we want our analysis to be able to capture all the information available from nested checks. If we gave up this kind of precision, then the definition of \Refinef would be pretty easy: it would map each subexpression of $e$ to the intersection of the types deduced by $\vdashp$ (i.e., by \env{}{}) for each of its occurrences. That is, for each expression $e'$ occurring in $e$, $\Refine {e,t}\Gamma$ would be the type environment that maps $e'$ to $\bigwedge_{\{\varpi \alt \occ e \varpi \equiv e'\}} \env {\Gamma,e,t} (\varpi)$. As we explained in Section~\ref{sec:challenges}, the intersection is needed to apply occurrence typing to expressions such as $\tcase{(x,x)}{\pair{t_1}{t_2}}{e_1}{e_2}$ where some expressions---here $x$---occur multiple times.

In order to capture most of the type information from nested queries, the rule \Rule{Path} allows the deduction of the type of some occurrence $\varpi$ to use a type environment $\Gamma'$ that may contain information about some suboccurrences of $\varpi$. In the algorithm, this corresponds to applying the \Refinef defined above to an environment that is already the result of \Refinef, and so on. Therefore, ideally, our algorithm should compute the type environment as a fixpoint of the function $X\mapsto\Refine{e,t}{X}$.
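The fixpoint computation just described can be abstracted as follows. This is an illustrative sketch, not the actual \Refinef: type environments are modeled as dictionaries from expressions to types (finite sets of values, a smaller set being a stronger hypothesis), `refine_step` stands for any monotone refinement step, and since in general the iteration is not guaranteed to converge we stop either at a fixpoint or after a maximum number of steps.

```python
def iterate_refine(refine_step, gamma, max_steps):
    """Iterate a monotone step on the environment gamma, stopping at
    a fixpoint or after max_steps iterations."""
    for _ in range(max_steps):
        refined = refine_step(gamma)
        if refined == gamma:   # fixpoint reached: further steps are no-ops
            return gamma
        gamma = refined
    return gamma               # bounded: sound, but possibly less precise

# A hypothetical monotone step, used only for demonstration: each call
# narrows the hypothesis on "x" by one value, mimicking an analysis
# whose iterations keep strictly improving.
def narrowing_step(gamma):
    t = gamma["x"]
    if len(t) <= 1:
        return gamma
    return {**gamma, "x": frozenset(sorted(t, key=repr)[:-1])}
```

With `narrowing_step`, a small bound stops the search early with a coarser (but still sound) hypothesis, while a large bound lets the iteration reach its fixpoint.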
Unfortunately, an iteration of \Refinef may not converge. As an example, consider the (dumb) expression $\tcase {x x}{\Any}{e_1}{e_2}$. If $x:\Any\to\Any$, then every iteration of \Refinef yields for $x$ a type strictly more precise than the type deduced in the previous iteration. The solution we adopt in practice is to bound the number of iterations to some number $n_o$. This is obtained by the following definition of \Refinef:\svvspace{-1mm}
\[
\begin{array}{rcl}
\Refinef_{e,t} & \eqdef & (\RefineStep{e,t})^{n_o}\\[-2mm]
\text{where }\RefineStep {e,t}(\Gamma)(e') &=& \left\{\begin{array}{ll}
\bigwedge_{\{\varpi \alt \occ e \varpi \equiv e'\}} \env {\Gamma,e,t} (\varpi) & \text{if } \exists \varpi.\ \occ e \varpi \equiv e' \\
\Gamma(e') & \text{otherwise, if } e'\in\dom\Gamma\\
\text{undefined} & \text{otherwise}
\end{array}\right.
\end{array}\svvspace{-1.5mm}\]
Note in particular that $\Refine{e,t}\Gamma$ extends $\Gamma$ with hypotheses on the expressions occurring in $e$, since $\dom{\Refine{e,t}\Gamma} = \dom{\RefineStep {e,t}(\Gamma)} = \dom{\Gamma} \cup \{e' \alt \exists \varpi.\ \occ e \varpi \equiv e'\}$. In other terms, we try to find a fixpoint of $\RefineStep{e,t}$, but we bound our search to $n_o$ iterations. Since $\RefineStep {e,t}$ is monotone (w.r.t.\ the subtyping pre-order extended to type environments pointwise), every iteration yields a better solution.
\iflongversion
While this is unsatisfactory from a formal point of view, in practice the problem is a very mild one. Divergence may happen only when refining the type of a function in an application: not only is such a refinement meaningful only when the function is typed by a union type, but we also had to build the expression that causes the divergence in quite an \emph{ad hoc} way, which makes divergence even more unlikely: setting $n_o$ to twice the depth of the syntax tree of the outermost type case should be enough to capture all realistic cases.
\fi

\subsubsection{Algorithmic typing rules}\label{sec:algorules}
We now have all the definitions we need for our typing algorithm%
\iflongversion%
, which is defined by the following rules.
\else%
:
\fi
\begin{mathpar}
\Infer[Efq\Aa]
  { }
  { \Gamma, (e:\Empty) \vdashA e': \Empty }
  { \begin{array}{c}\text{\tiny with priority over}\\[-1.8mm]\text{\tiny all the other rules}\end{array}}
\qquad
\Infer[Var\Aa]
  { }
  { \Gamma \vdashA x: \Gamma(x) }
  { x\in\dom\Gamma}
\svvspace{-2mm}\\
\Infer[Env\Aa]
  { \Gamma\setminus\{e\} \vdashA e : t }
  { \Gamma \vdashA e: \Gamma(e) \wedge t }
  { \begin{array}{c}e\in\dom\Gamma \text{ and }\\[-1mm] e \text{ not a variable}\end{array}}
\qquad
\Infer[Const\Aa]
  { }
  {\Gamma\vdashA c:\basic{c}}
  {c\not\in\dom\Gamma}
\svvspace{-2mm}\\
\ifsubmission\else
\end{mathpar}
\begin{mathpar}
\fi%
\Infer[Abs\Aa]
  {\Gamma,x:s_i\vdashA e:t_i'\\ t_i'\leq t_i}
  { \Gamma\vdashA\lambda^{\wedge_{i\in I}\arrow {s_i} {t_i}}x.e:\textstyle\wedge_{i\in I} {\arrow {s_i} {t_i}} }
  {\lambda^{\wedge_{i\in I}\arrow {s_i} {t_i}}x.e\not\in\dom\Gamma}
\svvspace{-2mm}\\
\Infer[App\Aa]
  { \Gamma \vdashA e_1: t_1\\ \Gamma \vdashA e_2: t_2\\ t_1 \leq \arrow \Empty \Any\\ t_2 \leq \dom {t_1} }
  { \Gamma \vdashA {e_1}{e_2}: t_1 \circ t_2 }
  { {e_1}{e_2}\not\in\dom\Gamma}
\svvspace{-2mm}\\
\Infer[Case\Aa]
  {\Gamma\vdashA e:t_0\\
   \Refine {e,t} \Gamma \vdashA e_1 : t_1\\
   \Refine {e,\neg t} \Gamma \vdashA e_2 : t_2}
  {\Gamma\vdashA \tcase {e} t {e_1}{e_2}: t_1\vee t_2}
  { \tcase {e} {t\!} {\!e_1\!}{\!e_2}\not\in\dom\Gamma}
\svvspace{-2mm}
\\
\Infer[Proj\Aa]
  {\Gamma \vdashA e:t\and \!\!t\leq\pair{\Any\!}{\!\Any}}
  {\Gamma \vdashA \pi_i e:\bpi_{\mathbf{i}}(t)}
  {\pi_i e{\not\in}\dom\Gamma}\hfill
\Infer[Pair\Aa]
  {\Gamma \vdashA e_1:t_1 \and \!\!\Gamma \vdashA e_2:t_2}
  {\Gamma \vdashA (e_1,e_2):{t_1}\times{t_2}}
  {(e_1,e_2){\not\in}\dom\Gamma}
\end{mathpar}
The side conditions of the rules ensure that the system is syntax-directed, that is, that at most one rule applies when typing a term: priority is given to \Rule{Efq\Aa} over all the other rules and to \Rule{Env\Aa} over all remaining logical rules. The subsumption rule is no longer in the system; it is replaced by: $(i)$~using a union type in \Rule{Case\Aa}, $(ii)$~checking in \Rule{Abs\Aa} that the body of the function is typed by a subtype of the type declared in the annotation, and $(iii)$~using type operators and checking subtyping in the elimination rules \Rule{App\Aa,Proj\Aa}. In particular, notice that \Rule{App\Aa} checks that the type of the function is a functional type, that the type of the argument is a subtype of the domain of the function, and then returns the result type of the application of the two types. The intersection rule is (partially) replaced by the rule \Rule{Env\Aa}, which intersects the type deduced for an expression $e$ by occurrence typing and stored in $\Gamma$ with the type deduced for $e$ by the logical rules: this is simply obtained by removing any hypothesis about $e$ from $\Gamma$, so that the deduction of the type $t$ for $e$ cannot but end with a logical rule.
Of course, this does not apply when the expression $e$ is a variable, since a hypothesis in $\Gamma$ is the only way to deduce the type of a variable, which is why the algorithm reintroduces the classic rule for variables. Finally, notice that there is no counterpart for the rule \Rule{Abs-} and that, therefore, it is not possible to deduce negated arrow types for functions. This means that the algorithmic system is not complete, as we discuss in detail in the next section.

\subsubsection{Properties of the algorithmic system}\label{sec:algoprop}
The algorithmic system above is sound with respect to the deductive one of Section~\ref{sec:static}.
\begin{theorem}[Soundness]\label{th:algosound}
For every $\Gamma$, $e$, $t$, $n_o$, if $\Gamma\vdashA e: t$, then $\Gamma \vdash e:t$.
\end{theorem}
The proof of this theorem (see Appendix~\ref{sec:proofs_algorithmic_without_ts}) is obtained by defining an algorithmic system $\vdashAts$ that uses type schemes, that is, which associates each typable term $e$ with a possibly infinite set of types $\ts$ (in particular, a $\lambda$-expression $\lambda^{\wedge_{i\in I}\arrow {s_i} {t_i}}x.e$ will be associated with a set of types of the form $\{s\alt \exists s_0 = \bigwedge_{i=1..n} \arrow {t_i} {s_i} \land \bigwedge_{j=1..m} \neg (\arrow {t_j'} {s_j'}).\ \Empty \not\simeq s_0 \leq s \}$) and proving that, if $\Gamma\vdashA e: t$, then $\Gamma\vdashAts e: \ts$ with $t\in\ts$: the soundness of $\vdashA$ then follows from the soundness of $\vdashAts$.

Completeness needs a more detailed explanation. The algorithmic system $\vdashA$ is not complete w.r.t.\ the language presented in Section~\ref{sec:syntax} because it cannot deduce negated arrow types for functions.
However, no practical programming language would implement the full language of Section~\ref{sec:syntax}; rather, it would restrict all expressions of the form $\tcase{e}{t}{e_1}{e_2}$ so that the type $t$ tested in them is either non-functional (e.g., products, integers, record types, etc.) or it is $\Empty\to\Any$ (i.e., the expression can just test whether $e$ returns a function or not). There are multiple reasons to impose such a restriction; the most important ones can be summarized as follows:
\begin{enumerate}[left=0pt .. 12pt]
\item For explicitly-typed languages it may yield counterintuitive results, since for instance $\lambda^{\Int\to\Int}x.x\in\Bool\to\Bool$ should fail despite the fact that the identity function maps Booleans to Booleans.
\item For implicitly-typed languages it yields a semantics that depends on the inference algorithm, since $(\lambda y.(\lambda x.y))3\in 3{\to} 3$ may either fail or not according to whether the type deduced for the result of the expression is either $\Int{\to}\Int$ or $3{\to}3$ (which are both valid but incomparable).
\item For gradually-typed languages it would yield a problematic system, as we explain in Section~\ref{sec:gradual}.
\end{enumerate}
Now, if we apply this restriction to the language of Section~\ref{sec:syntax}, then the algorithmic system of Section~\ref{sec:algorules} is complete. Let us say that an expression $e$ is \emph{positive} if it never tests a functional type more precise than $\Empty\to\Any$ (see Appendix~\ref{sec:proofs_algorithmic_without_ts} for the formal definition). Then we have:
\begin{theorem}[Completeness for Positive Expressions]
For every type environment $\Gamma$ and \emph{positive} expression $e$, if $\Gamma\vdash e: t$, then there exist $n_o$ and $t'$ such that $\Gamma\vdashA e: t'$.
\end{theorem}\noindent
We can use the algorithmic system $\vdashAts$ defined for the proof of Theorem~\ref{th:algosound} to give a far more precise characterization of the terms for which our algorithm is complete than the one above: positivity is a practical but rough approximation. The system $\vdashAts$ copes with negated arrow types, but it is still not complete, essentially for two reasons: $(i)$ the recursive nature of rule \Rule{Path} and $(ii)$ the use of nested \Rule{PAppL}, which yields a precision that the algorithm loses by using type schemes in the definition of \constrf{} (case~\eqref{due} is the critical one). Completeness is recovered by $(i)$ limiting the depth of the derivations and $(ii)$ forbidding nested negated arrows on the left-hand side of negated arrows.\svvspace{-.7mm}
\begin{definition}[Rank-0 negation]
A derivation of $\Gamma \vdash e:t$ is \emph{rank-0 negated} if \Rule{Abs--} never occurs in the derivation of a left premise of a \Rule{PAppL} rule.\svvspace{-.7mm}
\end{definition}
\noindent The use of this terminology is borrowed from the ranking of higher-order types, since, intuitively, it corresponds to typing a language in which, in the types used in dynamic tests, a negated arrow never occurs on the left-hand side of another negated arrow.
\begin{theorem}[Rank-0 Completeness]
For every $\Gamma$, $e$, $t$, if $\Gamma \vdash e:t$ is derivable by a rank-0 negated derivation, then there exist $n_o$ and $t'$ such that $\Gamma\vdashAts e: t'$ and $t'\leq t$.
\end{theorem}
\noindent This last result is only of theoretical interest since, in practice, we expect to have only languages with positive expressions. This is why for our implementation we use the library of CDuce~\cite{BCF03}, in which type schemes are absent and functions are typed only by intersections of positive arrows. We present the implementation in Section~\ref{sec:practical}, but before that we study some extensions.