As we recalled in the introduction, the main application of occurrence
typing is to type dynamic languages. In this section we explore how to
extend our work to encompass three features that are necessary to type
these languages.
First, we consider record types and record expressions which, in dynamic
languages, are used to implement objects. In particular, we extend our
system to cope with typical usage patterns of objects employed in these
languages such as adding, modifying, or deleting a field, or dynamically
testing its presence to specify different behaviors.
Second, in order
to precisely type applications in dynamic languages it is
crucial to refine the type of some functions to account for their
different behaviors with specific input types. But current approaches
are bad at it: they require the programmer to explicitly specify a
precise intersection type for these functions and, even with such specifications, some common cases
fail to type (in that case the only solution is to hard-code the
function and its typing discipline into the language). We show how we
can use the work developed in the previous sections to infer precise
intersection types for functions. In our system, these functions do not
require any type annotation or just an annotation for
the function parameters, whereas some of them fail to type in
current alternative approaches even when they are given the full intersection type
specification.
Finally, to type dynamic languages it is often necessary to make
statically-typed parts of a program coexist with dynamically-typed
ones. This is the aim of gradually typed systems that we explore in
the third extension of this section.
\subsection{Record types}
\label{ssec:struct}
\iflongversion%%%%%%%%%%%%%%
The previous analysis already covers a large gamut of realistic
cases. For instance, the analysis already handles list data
structures, since products and recursive types can encode them as
right-associative nested pairs, as it is done in the language
CDuce (e.g., $X = \textsf{Nil}\vee(\Int\times X)$ is the
type of the lists of integers): see Code 8 in Table~\ref{tab:implem} of Section~\ref{sec:practical} for a concrete example. Even more, thanks to the presence of
union types it is possible to type heterogeneous lists whose
content is described by regular expressions on types as proposed
by~\citet{hosoya00regular}. However, this is not enough to cover
records and, in particular, the specific usage patterns in dynamic
languages of records, whose field are dynamically tested, deleted,
added, and modified. This is why we extend here our work to records,
building on the record types as they are defined in CDuce.
The extension we present in this section is not trivial. Although we use the record \emph{types} as they are
defined in CDuce we cannot do the same for CDuce record
\emph{expressions}. The reasons why we cannot use the record
expressions of CDuce and we have to define and study new ones are
twofold. On the one hand we want to capture the typing of record field extension and field
deletion, two operation widely used in dynamic language; on the other
hand we need to have very simple expressions formed by elementary
sub-expressions, in order to limit the combinatorics of occurrence
typing. For this reason we build our records one field at a time,
starting from the empty record and adding, updating, or deleting
single fields.
\else%%%%%%%%%%%
Since the main application of occurrence
typing is to type dynamic languages, then it is important to show that our
work can handle records. We cannot handle them via a simple encoding into
pairs, since we need a precise type analysis for operations such
as field addition, update, and deletion. We therefore extend our approach to deal
with the records (types and expressions) as they are to be found in the aforementioned
CDuce language, where these operations can be precisely typed.
\fi%%%%%
Formally, CDuce record \emph{types} can be embedded in our types by adding the
following two type constructors:\\[1.4mm]
%
\centerline{\(\textbf{Types} \quad t ~ ::= ~ \record{\ell_1=t \ldots \ell_n=t}{t}\alt \Undef\)}\\[1.4mm]
%% \[
%% \begin{array}{lrcl}
%% \textbf{Types} & t & ::= & \record{\ell_1=t \ldots \ell_n=t}{t}\alt \Undef
%% \end{array}
%% \]
where $\ell$ ranges over an infinite set of labels $\Labels$ and $\Undef$
is a special singleton type whose only value is the constant
$\undefcst$, a constant not in $\Any$. The type
$\record{\ell_1=t_1 \ldots \ell_n=t_n}{t}$ is a \emph{quasi-constant
function} that maps every $\ell_i$ to the type $t_i$ and every other
$\ell \in \Labels$ to the type $t$ (all the $\ell_i$'s must be
distinct). Quasi constant functions are the internal representation of
record types in CDuce. These are not visible to the programmer who can use
only two specific forms of quasi constant functions, open record types and closed record types (as for OCaml object types), provided by the
following syntactic sugar:%
\iflongversion%
\footnote{Note that in the definitions ``$\ldots{}$'' is meta-syntax to denote the presence of other fields while in the open records ``{\large\textbf{..}}'' is the syntax that distinguishes them from closed ones.}
\fi
\begin{itemize}[nosep]
\item $\crecord{\ell_1=t_1, \ldots, \ell_n=t_n}$ for $\record{\ell_1=t_1 \ldots \ell_n=t_n}{\Undef}$ \hfill(closed records).
\item $\orecord{\ell_1=t_1, \ldots, \ell_n=t_n}$ for $\record{\ell_1=t_1 \ldots \ell_n=t_n}{\Any \vee \Undef}$\hfill (open records).
\end{itemize}
plus the notation $\mathtt{\ell \eqq} t$ to denote optional fields,
which corresponds to using in the quasi-constant function notation the
field $\ell = t \vee \Undef$%
\iflongversion
.
\else
\ (note that ``$\ldots{}$'' is meta-syntax while ``{\large\textbf{..}}'' is syntax).
\fi
For what concerns \emph{expressions}, we cannot use CDuce record expressions
as they are, but instead we must adapt them to our analysis. So as anticipated, we
consider records that are built starting from the empty record expression \erecord{} by adding, updating, or removing fields:\svvspace{-0.75mm}
\[
\begin{array}{lrcl}
\textbf{Expr} & e & ::= & \erecord {} ~\alt~ \recupd e \ell e ~\alt~ \recdel e \ell ~\alt~ e.\ell
\end{array}\svvspace{-.75mm}
\]
in particular $\recdel e \ell$ deletes the field $\ell$ from $e$, $\recupd e \ell e'$ adds the field $\ell=e'$ to the record $e$ (deleting any existing $\ell$ field), while $e.\ell$ is field selection with the reduction:
\(\erecord{...,\ell=e,...}.\ell\ \reduces\ e\).
To define record type subtyping and record expression type inference
we need three operators on record types: $\proj \ell t$ which returns
the type of the field $\ell$ in the record type $t$, $t_1+t_2$ which
returns the record type formed by all the fields in $t_2$ and those in
$t_1$ that are not in $t_2$, and $\recdel t\ell$ which returns the
type $t$ in which the field $\ell$ is
\iflongversion%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
undefined. They are formally defined as follows (see~\citet{alainthesis} for more details):
\begingroup
\allowdisplaybreaks
\begin{eqnarray}
\proj \ell t & = & \left\{\begin{array}{ll}\min \{ u \alt t\leq \orecord{\ell=u}\} &\text{if } t \leq \orecord{\ell = \Any}\\ \text{undefined}&\text{otherwise}\end{array}\right.\\[2mm]
t_1 + t_2 & = & \min\left\{
u \quad\bigg|\quad \forall \ell \in \Labels.\left\{\begin{array}{ll}
\proj \ell u \geq \proj \ell {t_2} &\text{ if } \proj \ell {t_2} \leq \neg \Undef\\
\proj \ell u \geq \proj \ell {t_1} \vee (\proj \ell {t_2} \setminus \Undef) &\text{ otherwise}
\end{array}\right\}
\right\}\\[2mm]
\recdel t \ell & = & \min \left\{ u \quad\bigg|\quad \forall \ell' \in \Labels. \left\{\begin{array}{ll}
\proj {\ell'} u \geq \Undef &\text{ if }\ell' = \ell\\
\proj {\ell'} u \geq \proj {\ell'} t &\text{ otherwise}
\end{array}\right\}\right\}
\end{eqnarray}
\endgroup
\else%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
undefined
(see \Appendix\ref{app:recop} for the formal definition and~\citet{alainthesis} for more details).
\fi%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
Then two record types $t_1$ and $t_2$ are in subtyping relation, $t_1 \leq t_2$, if and only if for all $\ell \in \Labels$ we have $\proj \ell {t_1} \leq \proj \ell {t_2}$. In particular $\orecord{\!\!}$ is the largest record type.
Expressions are then typed by the following rules (already in algorithmic form).\svvspace{-.1mm}
\begin{mathpar}
\Infer[Record]
{~}
{\Gamma \vdash \erecord {}:\crecord {}}
{}
~
\Infer[Update]
{\Gamma \vdash e_1:t_1\and t_1\leq\orecord {\!\!} \and \Gamma \vdash e_2:t_2}
{\Gamma \vdash \recupd{e_1}{\ell}{e_2}: t_1 + \crecord{\ell=t_2}}
{\recupd{e_1\!\!}{\!\!\ell}{e_2} \not\in\dom\Gamma}
%\end{mathpar}
%\begin{mathpar}
\svvspace{-1.9mm}
\\
\Infer[Delete]
{\Gamma \vdash e:t\and t\leq\orecord {\!\!}}
{\Gamma \vdash \recdel e \ell: \recdel t \ell}
{\recdel e \ell\not\in\dom\Gamma}
\Infer[Proj]
{\Gamma \vdash e:t\and t\leq\orecord{\ell = \Any}}
{\Gamma \vdash e.\ell:\proj \ell {t}}
{e.\ell\not\in\dom\Gamma}\svvspace{-2mm}
\end{mathpar}
To extend occurrence typing to records we add the following values to paths: $\varpi\in\{\ldots,a_\ell,u_\ell^1,u_\ell^2,r_\ell\}^*$, with
\(e.\ell\downarrow a_\ell.\varpi =\occ{e}\varpi\),
\(\recdel e \ell\downarrow r_\ell.\varpi = \occ{e}\varpi\), and
\(\recupd{e_1}\ell{e_2}\downarrow u_\ell^i.\varpi = \occ{e_i}\varpi\)
and add the following rules for the new paths:
\begin{mathpar}\svvspace{-8.7mm}
\Infer[PSel]
{ \pvdash \Gamma e t \varpi:t'}
{ \pvdash \Gamma e t {\varpi.a_\ell}:\orecord {\ell:t'} }
{ }
\Infer[PDel]
{ \pvdash \Gamma e t \varpi:t'}
{ \pvdash \Gamma e t \varpi.r_\ell: (\recdel {t'} \ell) + \crecord{\ell \eqq \Any}}
{ }
\svvspace{-3mm}\\
\Infer[PUpd1]
{ \pvdash \Gamma e t \varpi:t'}
{ \pvdash \Gamma e t \varpi.u_\ell^1: (\recdel {t'} \ell) + \crecord{\ell \eqq \Any}}
{ }
\Infer[PUpd2]
{ \pvdash \Gamma e t \varpi:t}
{ \pvdash \Gamma e t \varpi.u_\ell^2: \proj \ell t'}
{ }
\end{mathpar}
Deriving the algorithm from these rules is then straightforward:\\[1.8mm]
\hspace*{-2mm}\(
\begin{array}{llll}
\constr{\varpi.a_\ell}{\Gamma,e,t} = \orecord {\ell: \env {\Gamma,e,t} (\varpi)}\hspace*{-1mm}&
\constr{\varpi.r_\ell}{\Gamma,e,t} = \recdel {(\env {\Gamma,e,t} (\varpi))} \ell + \crecord{\ell \eqq \Any}\\
\constr{\varpi.u_\ell^2}{\Gamma,e,t} = \proj \ell {(\env {\Gamma,e,t} (\varpi))} &
\constr{\varpi.u_\ell^1}{\Gamma,e,t} = \recdel {(\env {\Gamma,e,t} (\varpi))} \ell + \crecord{\ell \eqq \Any}\\[1.8mm]
\end{array}
\)
%% \beppe{Je ne suis absolument pas convaincu par Ext1. Pour moi il devrait donner le field avec le type de $u_\ell^2$ autrement dit pour moi les regles correctes sont:
%% \begin{mathpar}
%% \Infer[PExt1]
%% { \pvdash \Gamma e t \varpi:t_1 \and \pvdash \Gamma e t \varpi.u_\ell^2:t_2}
%% { \pvdash \Gamma e t \varpi.u_\ell^1: (\recdel {t'} \ell) + \crecord{\ell = t_2}}
%% { }
%% \end{mathpar}
%% \[
%% \begin{array}{lcl}
%% \constr{\varpi.u_\ell^1}{\Gamma,e,t} & = & \recdel {(\env {\Gamma,e,t} (\varpi))} \ell + \crecord{\ell = \env {\Gamma,e,t} (\varpi.u_\ell^2)}
%% \end{array}
%% \]
%% }
Notice that the effect of doing $\recdel t \ell + \crecord{\ell \eqq
\Any}$ corresponds to setting the field $\ell$ of the (record) type
$t$ to the type $\Any\vee\Undef$, that is, to the type of all
undefined fields in an open record. So \Rule{PDel} and \Rule{PUpd1}
mean that if we remove, add, or redefine a field $\ell$ in an expression $e$
then all we can deduce for $e$ is that its field $\ell$ is undefined: since the original field was destroyed we do not have any information on it apart from the static one.
For instance, consider the test:\\[1mm]
\centerline{\(\ifty{\recupd{x}{a}{0}}{\orecord{a=\Int, b=\Bool}\vee \orecord{a=\Bool, b=\Int}}{x.b}{\False}\)}\\[1mm]
By $\constr{\varpi.u_\ell^1}{\Gamma,e,t}$---i.e., by \Rule{Ext1}, \Rule{PTypeof}, and \Rule{PInter}---the type for $x$ in the positive branch is $((\orecord{a=\Int, b=\Bool}\vee \orecord{a=\Bool, b=\Int}) \land \orecord{a=\Int}) + \crecord{a\eqq \Any}$.
It is equivalent to the type $\orecord{b=\Bool}$, and thus we can deduce that $x.b$ has the type $\Bool$.
\beppe{Compare with path expressions of ~\citet{THF10} }
\subsection{Refining function types}\label{sec:refining}
\input{refining}
\subsection{Integrating gradual typing}\label{sec:gradual}
\input{gradual}