In this section we formalize the ideas we outlined in the introduction. We start with the definition of types, followed by the language and its reduction semantics. The static semantics is the core of our work: we first present a declarative type system that deduces (possibly many) types for well-typed expressions, and then the algorithms to decide whether an expression is well typed according to it.

\subsection{Types}

...

...

algorithm that, given a type environment $\Gamma$ and an expression $e$, decides whether there exists a type $t$ such that $\Gamma\vdash e:t$ is provable. For that we need to solve essentially two problems: $(i)$~how to handle the fact that it is possible to deduce several types for the same well-typed expression and $(ii)$~how to compute the auxiliary deduction system for paths.

Multiple types have two distinct origins, each requiring a distinct

...

...

it is possible to deduce for every well-typed lambda abstraction infinitely many types, that is, the annotation of the function intersected with as many negations of arrow types as is possible without making the type empty. To handle this multiplicity we use and adapt the technique of \emph{type schemes} defined

by~\citet{Frisch2008}. Type schemes---whose definition we recall in Section~\ref{sec:type-schemes}---are canonical representations of the infinite sets of types of $\lambda$-abstractions. The second origin is due

to the presence of structural rules\footnote{\label{fo:rules}In logic, logical rules

refer to a particular connective (here, a type constructor, that is,

either $\to$, or $\times$, or $b$), while identity rules (e.g.,

axioms and cuts) and structural rules (e.g., weakening and

contraction) do not.} such as \Rule{Subs} and \Rule{Inter}. We

handle it in the classic way: we define an algorithmic system that tracks the minimum type---actually, the minimum \emph{type scheme}---of an expression; this system is obtained from the original system by eliminating the two structural rules and by distributing suitable checks of the subtyping relation in the remaining rules. To do that in the presence of

...

...

system given in Section~\ref{sec:algorules}.

We introduce the new syntactic category of \emph{type schemes}, which are the terms~$\ts$ inductively produced by the following grammar.

\[

\begin{array}{lrcl}

\textbf{Type schemes}&\ts& ::=& t \alt\tsfun{\arrow t t ; \cdots ; \arrow t t}\alt\ts\tstimes\ts\alt\ts\tsor\ts\alt\tsempty

...

...

Type schemes denote sets of types, as formally stated by the following definition.

\end{align*}

\end{definition}

Note that $\tsint\ts$ is closed under subsumption and intersection and that $\tsempty$, which denotes the empty set of types, is different from $\Empty$, whose interpretation is the set of all types.

...

...

We also need to perform intersections of type schemes so as to intersect the static types of expressions with the type schemes deduced for them.
\begin{lemma}
Let $\ts$ be a type scheme and $t$ a type. We can compute a type scheme, written $t \tsand\ts$, such that
\(\tsint{t \tsand\ts}=\{s \alt\exists t' \in\tsint\ts.\ t \land t' \leq s \}\)
\end{lemma}

Finally, given a type scheme it is straightforward to choose in its interpretation a type which serves as the canonical representative of the set:
\begin{definition}[Representative]
We define a function $\tsrep{\_}$ that maps every non-empty type scheme into a type, \textit{representative} of the set of types denoted by the scheme.

...

...

\end{definition}
Note that $\tsrep\ts\in\tsint\ts$.

\subsubsection{Operators for type constructors}\label{sec:typeops}

In order to define the algorithmic typing of expressions like

applications and projections we need to define the operators on

...

...

In the example with \code{\(t=(\Int \to \Int)\) \(\wedge\) \((\Bool \to \Bool)\)}, we have:

\begin{eqnarray}
\dom t & = &\max\{ u \alt t\leq u\to\Any\}
\\
\apply t s & = &\,\min\{ u \alt t\leq s\to u\}
\\
\worra t s & = &\,\min\{u \alt t\circ(\dom t\setminus u)\leq\neg s\}\label{worra}
\end{eqnarray}

The first two operators belong to the theory of semantic subtyping, while the last one is new and we described it in Section~\ref{sec:ideas}.
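To make these definitions concrete, here is a small executable sketch of $\dom{}$ and $\apply{}{}$ for the special case where $t$ is a finite intersection of arrows between base types; base types are modeled as sets of tags drawn from a tiny finite universe. The representation and the names are our own, and the sketch covers only this restricted fragment, not full semantic subtyping.

```python
from itertools import combinations

# Toy model: a base type is a frozenset of tags; t is an intersection of
# arrows between base types, given as a list of (domain, codomain) pairs.
UNIV = frozenset({"Int", "Bool", "Str"})

def union(sets):
    out = frozenset()
    for s in sets:
        out |= s
    return out

def dom(arrows):
    # dom t = max{u | t <= u -> Any}: the union of the arrows' domains
    return union(sp for sp, _ in arrows)

def apply_(arrows, s):
    # apply t s = min{u | t <= s -> u}; for t = /\(s_p -> t_p) we use the
    # standard semantic-subtyping formula: union, over the sets Q of arrows
    # that do NOT cover s, of the intersection of the remaining codomains.
    out = frozenset()
    idx = range(len(arrows))
    for r in range(len(arrows) + 1):
        for q in combinations(idx, r):
            if not s <= union(arrows[i][0] for i in q):  # s not covered by Q
                inter = UNIV
                for i in idx:
                    if i not in q:
                        inter &= arrows[i][1]
                out |= inter
    return out

# t = (Int -> Int) /\ (Bool -> Bool)
t = [(frozenset({"Int"}), frozenset({"Int"})),
     (frozenset({"Bool"}), frozenset({"Bool"}))]
```

On this $t$, \code{dom(t)} yields $\Int\vee\Bool$ and \code{apply\_(t, Int)} yields $\Int$, matching the intended reading of the two operators.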

...

...

We need similar operators for projections, since the type $t$ of $e$ in $\pi_i e$ may not be a product type:
\begin{eqnarray}
\bpl t & = &\min\{ u \alt t\leq\pair u\Any\}\\
\bpr t & = &\min\{ u \alt t\leq\pair\Any u\}
\end{eqnarray}

All the operators above but $\worra{}{}$ are already present in the theory of semantic subtyping: the reader can find how to compute them and how to extend their definition to type schemes in~\cite[Section 6.11]{Frisch2008}. Below we just show the formula that computes $\worra t s$ for a $t$ subtype of $\Empty\to\Any$. For that, we use a result of semantic subtyping that states that every type $t$ is equivalent to a type in disjunctive normal form and that if furthermore $t\leq\Empty\to\Any$, then $t \simeq\bigvee_{i\in I}\left(\bigwedge_{p\in P_i}(s_p\to t_p)\bigwedge_{n\in N_i}\neg(s_n'\to t_n')\right)$ with $\bigwedge_{p\in P_i}(s_p\to t_p)\bigwedge_{n\in N_i}\neg(s_n'\to t_n')\not\simeq\Empty$ for all $i$ in $I$. For such a $t$ and any type $s$ we then have:

%

\begin{equation}

\worra t s = \dom t \wedge\bigvee_{i\in I}\left(\bigwedge_{\{P\subset P_i\alt s\leq\bigvee_{p \in P}\neg t_p\}}\left(\bigvee_{p \in P}\neg s_p\right) \right)

...

...

The proof that this type satisfies \eqref{worra} is given in the Appendix~\ref{}
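To illustrate the formula above, here is a small executable sketch of $\worra{}{}$ for a single DNF summand $t=\bigwedge_{p\in P}(s_p\to t_p)$, in a toy model where base types are sets of tags from a finite universe. The representation is our own and covers only this restricted fragment.

```python
from itertools import combinations

UNIV = frozenset({"Int", "Bool", "Str"})

def neg(x):
    # set-theoretic negation with respect to the finite universe
    return UNIV - x

def union(sets):
    out = frozenset()
    for s in sets:
        out |= s
    return out

def worra(arrows, s):
    # worra t s = dom t /\ (intersection over the subsets P' of P with
    # s <= \/_{p in P'} !t_p of the unions \/_{p in P'} !s_p)
    dom_t = union(sp for sp, _ in arrows)
    acc = UNIV
    idx = range(len(arrows))
    for r in range(len(arrows) + 1):
        for ps in combinations(idx, r):
            if s <= union(neg(arrows[i][1]) for i in ps):  # s <= \/ !t_p
                acc &= union(neg(arrows[i][0]) for i in ps)
    return dom_t & acc

# f : (Int -> Int) /\ (Bool -> Bool); if f x has type Int then x : Int
t = [(frozenset({"Int"}), frozenset({"Int"})),
     (frozenset({"Bool"}), frozenset({"Bool"}))]
```

On this $t$, testing that the result of the application is in \Int\ refines the argument to \Int, and symmetrically for \Bool, as expected from the discussion in Section~\ref{sec:ideas}.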

The last step for our presentation is to define the algorithm for the deduction of $\Gamma\evdash e t \Gamma'$, that is, an algorithm that takes as input $\Gamma$, $e$, and $t$, and returns an environment that extends $\Gamma$ with hypotheses on the occurrences of $e$ that are the most general that can be deduced by assuming that $e\in t$ succeeds.

...

...

to the application of \Rule{PEps}; case \eqref{due} implements \Rule{Pappl}

straightforwardly; the implementation of rule \Rule{PAppR} is subtler:

instead of finding the best $t_1$ to subtract (by intersection) from the

static type of the argument, \eqref{tre} finds directly the best type for the argument by

applying the $\worra{}{}$ operator to the static type of the function

and the refined type of the application. The remaining (\ref{quattro}--\ref{sette})

cases are the straightforward implementations of the rules

\Rule{PPairL}, \Rule{PPairR}, \Rule{PFst}, and \Rule{PSnd},

respectively.

The other recursive function, $\env{}{}$, implements the two structural rules \Rule{PInter} and \Rule{PTypeof} by intersecting the type obtained for $\varpi$ by the logical rules with the static type deduced by the type system for the expression occurring at $\varpi$. The remaining structural rule, \Rule{Psubs}, is accounted for by the use of the operators $\worra{}{}$ and $\boldsymbol{\pi}_i$ in

the definition of $\constr{}{}$.

...

...

This can be seen by analyzing the rule \Rule{Case\Aa}: the definition of $\Refine{e,t}\Gamma$ uses $\tyof{\occ e{\varpi}}\Gamma$, and this is defined for all $\varpi$ since the first premise of \Rule{Case\Aa} states that $\Gamma\vdash e:\ts_0$ (and this is possible only if we were able to deduce under the hypothesis $\Gamma$ the type of every occurrence of $e$.)}

It remains to explain how to compute the environment $\Gamma'$ produced from $\Gamma$ by the deduction system for $\Gamma\evdash e t \Gamma'$. Alas, this is the most delicate part of our algorithm.

In a nutshell what we want to do is to define a function

$\Refine{\_,\_}{\_}$ that takes a type environment $\Gamma$, an

expression $e$ and a type $t$ and returns the best type environment

$\Gamma'$ such that $\Gamma\evdash e t \Gamma'$ holds. By the best

environment we mean the one in which the occurrences of $e$ are

associated to the largest possible types (type environments are

hypotheses so they are contravariant: the larger the type the better

the hypothesis). Recall that in Section~\ref{sec:challenges} we said

that we want our analysis to be able to capture all the information

available from nested checks. If we gave up such a kind of precision

then the definition of $\Refine{}{}$ would be pretty easy: it must map

each subexpression of $e$ to the intersection of the types deduced by

$\vdashp$ (\emph{ie}, by $\env{}{}$) for each of its occurrences. That

is, for each expression $e'$ occurring in $e$, $\Refine{e,t}\Gamma$

is a type environment that maps $e'$ into $\bigwedge_{\{\varpi\alt

\occ e \varpi\equiv e'\}}\env{\Gamma,e,t}(\varpi)$. As we

explained in Section~\ref{sec:challenges} the intersection is needed

to apply occurrence typing to expressions such as

$\tcase{(x,x)}{\pair{t_1}{t_2}}{e_1}{e_2}$ where some

expressions---here $x$---occur multiple times.
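The intersection step just described can be sketched as follows; occurrences are encoded as paths and types as sets of tags, an encoding of our own chosen only for illustration (the constraint types $t_1$ and $t_2$ are hypothetical).

```python
UNIV = frozenset({"Int", "Bool", "Str"})

# For the test (x,x) in t1 x t2, the two occurrences of x are the paths
# "fst" and "snd", and env gives the type deduced at each path.
occ = {"fst": "x", "snd": "x"}                  # path -> subexpression at it
env = {"fst": frozenset({"Int", "Str"}),        # t1, hypothetical
       "snd": frozenset({"Int", "Bool"})}       # t2, hypothetical

def refine(occ, env):
    # map each subexpression e' to the intersection of env(path) over
    # all the paths at which e' occurs
    out = {}
    for path, expr in occ.items():
        out[expr] = out.get(expr, UNIV) & env[path]
    return out
```

Here \code{refine} maps \code{x} to $t_1\wedge t_2$, the intersection of the constraints collected at its two occurrences.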

In order to capture most of the type information from nested queries

the rule \Rule{Path} allows the deduction of the type of some

occurrence $\varpi$ to use a type environment $\Gamma'$ that may

contain information about some suboccurrences of $\varpi$. In the algorithm this corresponds to applying the $\Refine{}{}$ defined above to a result of $\Refine{}{}$, and so on. Therefore, ideally our

algorithm should compute the type environment as a fixpoint of the

function $X\mapsto\Refine{e,t}{X}$. Unfortunately, an iteration of $\Refine{}{}$ may

not converge. As an example consider the (dumb) expression $\tcase

{x_1x_2}{\Any}{e_1}$: if $x_1:\Any\to\Any$, then every iteration of

$\Refine{}{}$ yields for $x_1$ a type strictly more precise than the

previous iteration.

The solution we adopt here is to bound the number of iterations to some number $n_o$.

From a formal point of view, this means giving up the completeness of the algorithm: $\Refine{}{}$ will be complete (\emph{ie}, it will find all solutions) for the deductions of $\Gamma\evdash e t \Gamma'$ of depth at most $n_o$. This is obtained by the following definition:
\begin{align*}
&\dom{\Gamma'}=\dom{\Gamma}\cup\{e' \alt\exists\varpi.\ \occ e \varpi\equiv e'\}\\

...

...


\env{\Gamma,e,t} (\varpi) &\text{if }\exists\varpi.\ \occ e \varpi\equiv e' \\

\Gamma(e') &\text{otherwise}

\end{array}\right.\\&\\

&\Refine{e,t}\Gamma={\RefineStep{e,t}}^{n_o}(\Gamma)\qquad\text{with $n_o$ a global parameter}

\end{align*}
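The bounded iteration can be sketched as follows; the one-step refinement passed in is a stand-in for $\RefineStep{e,t}$, and the early exit on a fixpoint is an optimization of ours, not part of the definition above.

```python
def refine(refine_step, gamma, n_o):
    # Refine_{e,t}(G) = (RefineStep_{e,t})^{n_o}(G): iterate at most n_o
    # times, stopping early if a fixpoint is reached (iterating past a
    # fixpoint cannot change the result).
    for _ in range(n_o):
        new = refine_step(gamma)
        if new == gamma:
            return new
        gamma = new
    return gamma   # bounded: complete only for deductions of depth <= n_o

# A toy step that intersects the hypothesis on x with two constraints
UNIV = frozenset({"Int", "Bool", "Str"})
constraints = [frozenset({"Int", "Bool"}), frozenset({"Int", "Str"})]

def step(gamma):
    g = dict(gamma)
    for c in constraints:
        g["x"] = g["x"] & c
    return g

gamma = refine(step, {"x": UNIV}, n_o=5)   # converges to {"x": {Int}}
```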


While this is unsatisfactory from a formal point of view, in practice the problem is a very mild one. Divergence may happen only when refining the type of a function in an application: not only is such a refinement meaningful only when the function is typed by a union type (which is quite rare in practice)\footnote{The only impact of adding a negated arrow type to the type of a function is when we test whether the function has a given arrow type: in practice this never happens since programming languages test whether a value \emph{is} a function, rather than the type of a given function.}, but also we had to build the expression that causes the divergence in quite an ad hoc way, which makes divergence even more unlikely: setting $n_o$ to twice the depth of the syntax tree of the outermost type case should capture all realistic cases.

The reason for the definition of $\RefineStep{e,t}$ is that the same subterm of $e$ may occur several times in $e$ and therefore we can collect for every occurrence a different type constraint. Since all the constraints must hold, we take their intersection. For instance, if $f$ is a function of type $(s_1\times s_2\to t)\wedge( s_3\to\neg t)$ and we test whether $f(x,x)$ is of type $t$ or not, then the test succeeds only if $(x,x)$ is of type $s_1\times s_2$, that is, only if $x$ has both type $s_1$ and type $s_2$, and thus their intersection $s_1{\wedge}s_2$.