We have implemented the algorithmic system $\vdashA$ we presented in Section~\ref{sec:algorules}. Our
implementation is written in OCaml and uses CDuce as a library to
provide the semantic subtyping machinery. Besides the type-checking
algorithm defined on the base language, our implementation supports
the record types and expressions of Section \ref{ssec:struct} and the refinement of
function types
\iflongversion%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
described in Section \ref{sec:refining}. Furthermore, our implementation uses for the inference of arrow types
the following improved rule:
\input{optimize}
\else
of Section \ref{sec:refining} with the rule of
Appendix~\ref{app:optimize}.
\fi%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
The implementation is rather crude and consists of 2000 lines of OCaml code,
including parsing, type-checking of programs, and pretty printing of types. We
demonstrate the output of our type-checking implementation in
Table~\ref{tab:implem} and Table~\ref{tab:implem2}. Table~\ref{tab:implem} lists
some examples, none of which can be typed by current systems. Even though some
systems such as Flow and TypeScript can type some of these examples by adding
explicit type annotations, the code 6, 7, 9, and 10 in Table~\ref{tab:implem}
and, even more, the \code{and\_} and \code{xor\_} functions given
in \eqref{and+} and \eqref{xor+} later in this
section are out of reach of current systems, even when using the right explicit
annotations. Table~\ref{tab:implem2} allows for a direct comparison of with
\cite{THF10} be giving the type inferred for the fourteen examples given in that
work.
It should be noted that for all the examples we present, the
time for the type inference process is less than 5ms, hence we do not report
precise timings in the table. These and other examples can be tested in the
\ifsubmission
anonymized
\else
\fi
online toplevel available at
\url{https://occtyping.github.io/}%
\input{code_table}
In both tables, the second column gives a code fragment and the third
column the type deduced by our implementation as is (we
pretty printed it but we did not alter the output). Code~1 is a
straightforward function similar to our introductory
example \code{foo} in (\ref{foo}) and (\ref{foo2}) where \code{incr}
is the successor function and \code{lneg} the logical negation for
Booleans. Here the
programmer annotates the parameter of the function with a coarse type
$\Int\vee\Bool$. Our implementation first type-checks the body of the
function under this assumption, but doing so it collects that the type of
$\texttt{x}$ is specialized to \Int{} in the ``then'' case and to \Bool{}
in the ``else'' case. The function is thus type-checked twice more
under each hypothesis for \texttt{x}, yielding the precise type
$(\Int\to\Int)\land(\Bool\to\Bool)$. Note that w.r.t.\
rule \Rule{AbsInf+} of Section~\ref{sec:refining}, the
rule \Rule{AbsInf++} we use int the implementation improved the output of the computed
type. Indeed, using rule~[{\sc AbsInf}+] we would have obtained the
type
$(\Int\to\Int)\land(\Bool\to\Bool)\land(\Bool\vee\Int\to\Bool\vee\Int)$
with a redundant arrow. Here we can see that, since we deduced
the first two arrows $(\Int\to\Int)\land(\Bool\to\Bool)$, and since
the union of their domain exactly covers the domain of the third arrow, then
the latter is not needed. Code~2 shows what happens when the argument
of the function is left unannotated (i.e., it is annotated by the top
type \Any, written ``\texttt{Any}'' in our implementation). Here
type-checking and refinement also work as expected, but the function
only type checks if all cases for \texttt{x} are covered (which means
that the function must handle the case of inputs that are neither in \Int{}
nor in \Bool).
The following examples paint a more interesting picture. First
(Code~3) it is
easy in our formalism to program type predicates such as those
hard-coded in the $\lambda_{\textit{TR}}$ language of \citet{THF10}. Such type
predicates, which return \texttt{true} if and only if their input has
a particular type, are just plain functions with an intersection
type inferred by the system of Section~\ref{sec:refining}. We next define Boolean connectives as overloaded
functions. The \texttt{not\_} connective (Code~4) just tests whether its
argument is the Boolean \texttt{true} by testing that it belongs to
the singleton type \True{} (the type whose only value is
\texttt{true}) returning \texttt{false} for it and \texttt{true} for
any other value (recall that $\neg\True$ is equivalent to
$\texttt{Any\textbackslash}\True$). It works on values of any type,
but we could restrict it to Boolean values by simply annotating the
parameter by \Bool{} (which, in the CDuce types that our system uses, is syntactic sugar for
\True$\vee$\False) yielding the type
$(\True{\to}\False)\wedge(\False{\to}\True)$.
The \texttt{or\_} connective (Code~5) is straightforward as far as the
code goes, but we see that the overloaded type precisely captures all
possible cases: the function returns \code{false} if and only if both
arguments are of type $\neg\True$, that is, they are any value
different from \code{true}. Again we use a generalized version of the
\texttt{or\_} connective that accepts and treats any value that is not
\texttt{true} as \texttt{false} and again, we could easily restrict the
domain to \Bool{} if desired.\\
\indent
To showcase the power of our type system, and in particular of
the ``$\worra{}{}$''
type operator, we define \texttt{and\_} (Code~6) using De Morgan's
Laws instead of
using a direct definition. Here the application of the outermost \texttt{not\_} operator is checked against type \True. This
allows the system to deduce that the whole \texttt{or\_} application
has type \False, which in turn leads to \texttt{not\_\;x} and
\texttt{not\_\;y} to have type $\lnot \True$ and therefore both \texttt{x}
and \texttt{y} to have type \True. The whole function is typed with
the most precise type (we present the type as printed by our
implementation, but the first arrow of the resulting type is
equivalent to
$(\True\to\lnot\True\to\False)\land(\True\to\True\to\True)$).
All these type predicates and Boolean connectives can be used together
to write complex type tests, as in Code~7. Here we define a function
\texttt{f} that takes two arguments \texttt{x} and \texttt{y}. If
\texttt{x} is an integer and \texttt{y} a Boolean, then it returns the
integer \texttt{1}; if \texttt{x} is a character or
\texttt{y} is an integer, then it returns \texttt{2}; otherwise the
function returns \texttt{3}. Our system correctly deduces a (complex)
intersection type that covers all cases (plus several redundant arrow
types). That this type is as precise as possible can be shown by the fact that
when applying
\texttt{f} to arguments of the expected type, the \emph{type}
statically deduced for the
whole expression is the singleton type \texttt{1}, or \texttt{2},
or \texttt{3}, depending on the type of the arguments.
Code~8 allows us to demonstrate the use and typing of record paths. We
model, using open records, the type of DOM objects that represent XML
or HTML documents. Such objects possess a common field
\texttt{nodeType} containing an integer constant denoting the kind of
the node (e.g., \p{1} for an element node, \p{3} for a text node, \ldots). Depending on the kind, the object will have
different fields and methods. It is common practice to perform a test
on the value of the \texttt{nodeType} field. In dynamic languages such
as JavaScript, the relevant field can directly be accessed
after having checked for the appropriate \texttt{nodeType}, whereas
in statically typed languages such as Java, a downward cast
from the generic \texttt{Node} type to the expected precise type of
the object is needed. We can see that using the extension presented in
Section~\ref{ssec:struct} we can deduce the correct type for
\texttt{x} in all cases. Of particular interest is the last case,
since we use a type case to check the emptiness of the list of child
nodes. This splits, at the type level, the case for the \Keyw{Element}
type depending on whether the content of the \texttt{childNodes} field
is the empty list or not.
Code~9 shows the usefulness of the rule \Rule{OverApp}.
Consider the definition of the \texttt{xor\_} operator.
Here the rule~[{\sc AbsInf}+] is not sufficient to precisely type the
function, and using only this rule would yield a type
$\Any\to\Any\to\Bool$.
\iflongversion
Let us follow the behavior of the
``$\worra{}{}$'' operator. Here the whole \texttt{and\_} is requested
to have type \True, which implies that \texttt{or\_ x y} must have
type \True. This can always happen, whether \texttt{x} is \True{} or
not (but then depends on the type of \texttt{y}). The ``$\worra{}{}$''
operator correctly computes that the type for \texttt{x} in the
``\texttt{then}'' branch is $\True\vee\lnot\True\lor\True\simeq\Any$,
and a similar reasoning holds for \texttt{y}.
\fi%%%%%%%%%%%%%%
However, since \texttt{or\_} has type
%\\[.7mm]\centerline{%
$(\True\to\Any\to\True)\land(\Any\to\True\to\True)\land
(\lnot\True\to\lnot\True\to\False)$
%}\\[.7mm]
then the rule \Rule{OverApp} applies and \True, \Any, and $\lnot\True$ become candidate types for
\texttt{x}, which allows us to deduce the precise type given in the table. Finally, thanks to rule \Rule{OverApp} it is not necessary to use a type case to force refinement. As a consequence, we can define the functions \texttt{and\_} and \texttt{xor\_} more naturally as:
\begin{alltt}\color{darkblue}
let and_ = fun (x : Any) -> fun (y : Any) -> not_ (or_ (not_ x) (not_ y)) \refstepcounter{equation} \mbox{\color{black}\rm(\theequation)}\label{and+}
let xor_ = fun (x : Any) -> fun (y : Any) -> and_ (or_ x y) (not_ (and_ x y)) \refstepcounter{equation} \mbox{\color{black}\rm(\theequation)}\label{xor+}
\end{alltt}
for which the very same types as in Table~\ref{tab:implem} are deduced.
As for Code~10 (corresponding to our introductory
example~\eqref{nest1}), it illustrates the need for iterative refinement of
type environments, as defined in Section~\ref{sec:typenv}. As
explained, a single pass analysis would deduce
for {\tt x}
a type \Int{} from the {\tt f\;x} application and \Any{} from the {\tt g\;x}
application. Here by iterating a second time, the algorithm deduces
that {\tt x} has type $\Empty$ (i.e., $\textsf{Empty}$), that is, that the first branch can never
be selected (and our implementation warns the user accordingly). In hindsight, the only way for a well-typed overloaded function to have
type $(\Int{\to}\Int)\land(\Any{\to}\Bool)$ is to diverge when the
argument is of type \Int: since this intersection type states that
whenever the input is \Int, {\em both\/} branches can be selected,
yielding a result that is at the same time an integer and a Boolean.
This is precisely reflected by the case $\Int\to\Empty$ in the result.
Indeed our {\tt example10} function can be applied to an integer, but
at runtime the application of {\tt f\,x} will diverge.
Code~11 simulates the behaviour of Javascript property resolution, by looking
for a property \texttt{l} either in the object \texttt{o} itself or in the
chained list of its \texttt{prototype} objects. In that example, we first model
prototype chaining by defining a type \texttt{Object} that can be either the
atom \texttt{Null} or any record with a \texttt{prototype} field which contains
(recursively) an \texttt{Object}. To ease the reading, we defined a recursive
type \texttt{ObjectWithPropertyL} which is either a record with a field
\texttt{l} or a record with a prototype of type \texttt{ObjectWithPropertyL}. We
can then define two predicate function \texttt{has\_property\_l} and
\texttt{has\_own\_property\_l} that tests whether an object has a property
through its prototype or directly. Lastly we can define a function
\texttt{get\_property\_l} which directly access the field if it is present, or
recursively search for it through the prototype chain (in our syntax, the
paremeter \texttt{self} allows one to refer to the function itself). Of
particular interest is the type deduced for the two predicate functions. Indeed,
we can see that \texttt{has\_own\_property\_l} is given an overloaded type whose
first argument is in each case a recursive record type that describe precisely
whether \texttt{l} is present at some point in the list or not (recall that
in a record type such as $\orecord{ l=?\Empty }$, indicate that field \texttt{l}
is absent for sure).
\input{code_table2}
In Table~\ref{tab:implem2}, we convert in our syntax the 14 examples of
\cite{THF10} (we tried to complete such examples with neutral code when they
were incomplete in the original paper). Of these 14 examples, Example~1 to 13
depict combinations of type predicates (such as \texttt{is\_int}) used either
directly or through Boolean predicates (such as the \texttt{or\_} function
previously defined). Note that for all examples for which there was no explicit
indication in the original version, we \emph{infer} the type of the function.
Notice also that for Example~6, the goal of the example is to show that indeed,
the function is ill-typed (which our typechecker detects accurately). The
original Example~14 could be written in our syntax as:
\begin{verbatim}
let example14_alt = fun (input : Int | String) ->
fun (extra : (Any, Any)) ->
if (input, is_int (fst extra)) is (Int, True) then
add input (fst extra)
else if is_int (fst extra) is True
then add (strlen input) (fst extra)
else 0
\end{verbatim}
Notice, in the version above the absence of an occurrence of \texttt{input} in
the second \texttt{if} statement. Indeed, if the first test fails, it is either
because \texttt{is\_int (fst extra)} is not \texttt{True} (i.e., if
\texttt{fst extra} is not an integer) or because \texttt{input} is not an
integer. Therefore, in our setting, the type information propagated to the
second test is : $\texttt{(input, is\_int (fst extra))} \in \lnot (\Int,
\True)$, that is $\texttt{(input, is\_int (fst extra))} \in (\lnot\Int,
\True)\lor(\Int, \False)$. Therefore, the type deduced for \texttt{input} in the second
branch is $(\String\lor\Int) \land (\lnot\Int \lor \Int) =
\String\lor\Int$ which is not precise enough. By adding an occurrence of
\texttt{input} in our \texttt{example14\_alt}, we can further restrict its type
typecheck the function. Lifting this limitation through a control-flow analysis
is part of our future work.
Although these experiments are still preliminary, they show how the
combination of occurrence typing and set-theoretic types, together
with the type inference for overloaded function types presented in
Section~\ref{sec:refining} goes beyond what languages like
TypeScript and Flow do, since they can only infer single arrow types.
Our refining of overloaded
functions is also future-proof and resilient to extensions: since it ``retypes'' functions
using information gathered by the typing of occurrences in the body,
its precision will improve with any improvement of
our occurrence typing framework.