We have implemented a simplified version of the algorithm presented in
Section~\ref{ssec:algorithm} that does
not make use of type schemes and is, therefore, incomplete w.r.t. the
system of Section~\ref{sec:static}. In particular, as we
explained in Section~\ref{ssec:algorithm}, in the absence of type
schemes it is not always possible to prove that $\forall v, \forall t,
v \in t \text{~or~} v \not\in \lnot t$. Since this property ceases
to hold only for $\lambda$-expressions, not using type schemes
yields less precise typing only for tests $\ifty{e}t{e_1}{e_2}$ where $e$
has a functional type, that is, when the value tested is a
$\lambda$-abstraction. This seems like a reasonable compromise between the
complexity of an implementation involving type schemes and the programs
we want to type-check in practice. Indeed, if we restrict the language
so that the only functional type $t$ allowed in $\ifty{e}t{e_1}{e_2}$
is $\Empty\to\Any$---i.e., if it is possible to check whether a value is a
function but not whether it has a specific function type (\emph{cf.}~Footnote~\ref{foo:typecase})---then our
implementation becomes complete.
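Under this restriction, a function test can only be written against the type $\Empty\to\Any$. For instance, a hypothetical \texttt{is\_fun} predicate (the name, and the concrete syntax for arrow types in type cases, are ours) can be defined as
\begin{alltt}\color{darkblue}
let is_fun = fun (x : Any) -> if x is Empty -> Any then true else false
\end{alltt}
and is completely typed, whereas a test against a more specific arrow type such as $\Int\to\Int$ falls outside the restricted fragment.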
Our implementation is written in OCaml and uses CDuce as a library to
provide the semantic subtyping machinery. Besides a type-checking
algorithm defined on the base language, our implementation supports
record types (Section \ref{ssec:struct}) and the refinement of function types
(Section \ref{sec:refining}). The implementation is rather crude and
consists of 2000 lines of OCaml code, including parsing, type-checking
of programs, and pretty-printing of types. We demonstrate the output of
our type-checking implementation in Table~\ref{tab:implem}.
\input{code_table}
In this table, the second column gives a code fragment and the third
column the type deduced by our implementation. Code~1 is a
straightforward function similar to our introductory example \code{foo} in (\ref{foo},\ref{foo2}). Here the
programmer annotates the parameter of the function with a coarse type
$\Int\vee\Bool$. Our implementation first type-checks the body of the
function under this assumption, but in doing so it records that the type of
$\texttt{x}$ is specialized to \Int{} in the ``then'' branch and to \Bool{}
in the ``else'' branch. The function is thus type-checked twice more
under each hypothesis for \texttt{x}, yielding the precise type
$(\Int\to\Int)\land(\Bool\to\Bool)$. Note that, w.r.t.\ rule \Rule{AbsInf+} of Section~\ref{sec:refining}, we improved the output of the computed
type: using rule~\Rule{AbsInf+} alone we would obtain the
type
$(\Int\to\Int)\land(\Bool\to\Bool)\land(\Bool\vee\Int\to\Bool\vee\Int)$
with a redundant arrow. Here we can see that, since we deduced
the first two arrows $(\Int\to\Int)\land(\Bool\to\Bool)$ and since
the union of their domains exactly covers the domain of the third arrow,
the latter is not needed. Code~2 shows what happens when the argument
of the function is left unannotated (i.e., it is annotated by the top
type \Any, written \texttt{Any} in our implementation). Here
type-checking and refinement also work as expected, but the function
type-checks only if all cases for \texttt{x} are covered (which means
that the function must handle the case of inputs that are neither in \Int{}
nor in \Bool).
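Concretely, Codes~1 and~2 are, up to inessential details, of the following shape (assuming integer and Boolean primitives \texttt{incr} and \texttt{lnot}, and choosing, in the unannotated version, to return \texttt{false} for inputs that are neither integers nor Booleans; the exact fragments are those of Table~\ref{tab:implem}):
\begin{alltt}\color{darkblue}
let f1 = fun (x : Int | Bool) -> if x is Int then incr x else lnot x
let f2 = fun (x : Any) -> if x is Int then incr x
         else if x is Bool then lnot x else false
\end{alltt}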
The following examples paint a more interesting picture. First
(Code~3), it is
easy in our formalism to provide type predicates such as those
hard-coded in the $\lambda_{\textit{TR}}$ language of \citet{THF10}. Such type
predicates, which return \texttt{true} if and only if their input has
a particular type, are just plain functions with an intersection
type. We then define Boolean connectives as overloaded
functions. The \texttt{not\_} connective (Code~4) just tests whether its
argument is the Boolean \texttt{true}, by checking that it belongs to
the singleton type \True{} (the type whose only value is \texttt{true}),
and returns \texttt{true} for any other value (recall that $\neg\True$ is
equivalent to $\texttt{Any\textbackslash}\True$). It works on values of any
type, but we could restrict it to Boolean values simply by annotating the
parameter with \Bool{} (which in CDuce is syntactic sugar for
\True$\vee$\False), yielding the type $(\True\to\False)\wedge(\False\to\True)$.
The \texttt{or\_} connective (Code~5) is defined as a
curried function that first performs a type case on its first
argument. Depending on the outcome, it returns either a constant
function that returns \texttt{true} for every input or a function that
discriminates on its own argument (which is the second argument of
\texttt{or\_}) and returns \texttt{true} or \texttt{false}
accordingly. As for \texttt{not\_}, this is a generalized version of the
\texttt{or\_} connective that accepts any value and treats every value
other than \texttt{true} as \texttt{false}; again, we could restrict its
initial domain to \Bool{} if we wanted.
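A definition of \texttt{or\_} matching this description might look as follows (modulo cosmetic details with respect to Table~\ref{tab:implem}):
\begin{alltt}\color{darkblue}
let or_ = fun (x : Any) ->
  if x is True then fun (y : Any) -> true
  else fun (y : Any) -> if y is True then true else false
\end{alltt}
for which the type $(\True\to\Any\to\True)\land(\Any\to\True\to\True)\land(\lnot\True\to\lnot\True\to\False)$ is deduced.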
To showcase the power of our type system, and in particular of
the ``$\worra{}{}$''
type operator, we define \texttt{and\_} (Code~6) using De Morgan's
Laws instead of
using a direct definition. Here the application of the outermost \texttt{not\_} operator is checked against type \True. This
allows the system to deduce that the whole \texttt{or\_} application
has type \False, which in turn forces \texttt{not\_ x} and
\texttt{not\_ y} to have type $\lnot \True$ and, therefore, \texttt{x}
and \texttt{y} to have type \True. The whole function is typed with
the most precise type (we present the type as printed by our
implementation, but the first arrow of the resulting type is
equivalent to
$(\True\to\lnot\True\to\False)\land(\True\to\True\to\True)$).
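Such a De Morgan-style definition of \texttt{and\_}, with the type case on the outermost \texttt{not\_} application that drives the deduction just described, may be rendered as (again, modulo cosmetic details):
\begin{alltt}\color{darkblue}
let and_ = fun (x : Any) -> fun (y : Any) ->
  if not_ (or_ (not_ x) (not_ y)) is True then true else false
\end{alltt}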
All these type predicates and Boolean connectives can be used together
to write complex type tests, as in Code~7. Here we define a function
\texttt{f} that takes two arguments \texttt{x} and \texttt{y}. If
\texttt{x} is an integer and \texttt{y} a Boolean, then it returns the
integer \texttt{1}; if \texttt{x} is a character or
\texttt{y} is an integer, then it returns \texttt{2}; otherwise the
function returns \texttt{3}. Our system correctly deduces a (complex)
intersection type that covers all cases (plus several redundant arrow
types). That this type is as precise as possible is witnessed by the fact
that, when applying
\texttt{f} to arguments of the expected types, the \emph{type} deduced for the
whole expression is the singleton type \texttt{1}, \texttt{2},
or \texttt{3}, depending on the types of the arguments.
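In essence, Code~7 has the following form, assuming type predicates \texttt{is\_int}, \texttt{is\_bool}, and \texttt{is\_char} in the style of Code~3:
\begin{alltt}\color{darkblue}
let f = fun (x : Any) -> fun (y : Any) ->
  if and_ (is_int x) (is_bool y) is True then 1
  else if or_ (is_char x) (is_int y) is True then 2
  else 3
\end{alltt}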
Code~8 allows us to demonstrate the use and typing of record paths. We
model, using open records, the type of DOM objects that represent XML
or HTML documents. Such objects possess a common field
\texttt{nodeType} containing an integer constant denoting the kind of
the node (e.g., \p{9} for a document node, \p{1} for an element node, \p{3} for a text node, \ldots). Depending on the kind, the object will have
different fields and methods. It is common practice to perform a test
on the value of the \texttt{nodeType} field. In dynamic languages such
as JavaScript, the relevant field or method can directly be accessed
after having checked for the appropriate \texttt{nodeType}. In
mainstream statically typed languages, such as Java, a downward cast
from the generic \texttt{Node} type to the expected precise type of
the object is needed. We can see that using the extension presented in
Section~\ref{ssec:struct} we can deduce the correct type for
\texttt{x} in all cases. Of particular interest is the last case,
since we use a type case to check the emptiness of the list of child
nodes. This splits, at the type level, the case for the \Keyw{Element}
type depending on whether the content of the \texttt{childNodes} field
is the empty list or not.
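A simplified, hypothetical version of such a test can be sketched as follows, where \texttt{Document}, \texttt{Element}, and \texttt{Text} abbreviate the (open) record types of the three kinds of nodes, \texttt{nil} denotes the empty list, and \texttt{Nil} its singleton type (the actual Code~8 is richer):
\begin{alltt}\color{darkblue}
let children = fun (x : Document | Element | Text) ->
  if x.nodeType is 1 then
    if x.childNodes is Nil then nil else x.childNodes
  else nil
\end{alltt}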
Our implementation features one last extension that further improves
the precision of the inferred types.
Consider the definition of the \texttt{xor\_} operator (Code~9).
Here rule~\Rule{AbsInf+} alone is not sufficient to type the
function precisely: using only this rule would yield the type
$\Any\to\Any\to\Bool$. Let us follow the behaviour of the
``$\worra{}{}$'' operator. Here the whole \texttt{and\_} is requested
to have type \True, which implies that \texttt{or\_ x y} must have
type \True. This can always happen, whether \texttt{x} is \True{} or
not (but then depends on the type of \texttt{y}). The ``$\worra{}{}$''
operator correctly computes that the type for \texttt{x} in the
``\texttt{then}'' branch is $\True\vee\lnot\True\vee\True\simeq\Any$,
and a similar reasoning holds for \texttt{y}. To solve this problem,
we can first remark that even though type cases in the body of a
function are tipping points that may change the type of the result
of the function, they are not the only ones: applications of overloaded functions play exactly the same role. We
therefore extend the deduction system for $\Gamma \vdash e\triangleright\psi$ defined in Section~\ref{sec:refining} with
the following rule\\[1mm]
\centerline{\(
\Infer[OverApp]
{
\Gamma \vdash e : \textstyle\bigvee \bigwedge_{i \in I}t_i\to{}s_i\\
\Gamma \vdash x : t\\
\Gamma \vdash e\triangleright\psi_1\\
\Gamma \vdash x\triangleright\psi_2\\
}
{ \Gamma \vdash\textstyle
{e}{~x}\triangleright\psi_1\cup\psi_2\cup\bigcup_{i\in I}\{
x\mapsto t\wedge t_i\} }
{}
\)}\\
Whenever a function parameter is the argument of an
overloaded function, we record as possible types for this parameter
all the possible domains of the arrows that type the overloaded
function, restricted by the static type of the parameter. In Code~9,
since \texttt{or\_} has type\\
\centerline{$(\True\to\Any\to\True)\land(\Any\to\True\to\True)\land
(\lnot\True\to\lnot\True\to\False)$}
we consider \True, \Any, and $\lnot\True$ as candidate types for
\texttt{x}, which, in turn, allows us to deduce the precise type given in the table. Finally, thanks to this rule it is no longer necessary to force refinement by using a type case. As a consequence, we can define the functions \texttt{and\_} and \texttt{xor\_} more naturally as:
\begin{alltt}\color{darkblue}
let and_ = fun (x : Any) -> fun (y : Any) -> not_ (or_ (not_ x) (not_ y))
let xor_ = fun (x : Any) -> fun (y : Any) -> and_ (or_ x y) (not_ (and_ x y))
\end{alltt}
for which the very same types as in Table~\ref{tab:implem} are deduced.