In this work we presented the core of our analysis of occurrence
typing, extended it to record types, and proposed a couple of novel
applications of the theory, namely the inference of
intersection types for functions and a static analysis to reduce the number of
casts inserted when compiling gradually-typed programs.
There is still a lot of work to do to fill the gap with real-world
programming languages. Some of it should be quite routine, such as the
encoding of specific language constructs (e.g., \code{isInt},
\code{typeof}, ...), the handling of more complex
kinds of checks (e.g., generic Boolean expressions, multi-case
type-checks), and even encompassing sophisticated type matching as
performed by the language CDuce. Some other aspects will require more
work. For example, our analysis cannot handle flow of information. In
particular, the result of a type test can flow only to the branches
but not outside the test. As a consequence, the current system cannot
type a let binding such as
\begin{alltt}\color{darkblue}
let x = (y\(\in\)Int)?`yes:`no in (x\(\in\)`yes)?y+1:not(y)
\end{alltt}
which is clearly safe when $y:\Int\vee\Bool$. Nor can this example be solved by partial evaluation, since we do not handle nesting of tests in the condition: \code{( ((y\(\in\)Int)?`yes:`no)\(\in\)`yes )? y+1 : not(y)}.
Both are issues that the system by~\citet{THF10} can handle. We think it is possible
to reuse some of their ideas to perform an information flow analysis on top of
our system to remove these limitations.
%
Information flow analysis would also be useful to improve the
inference of intersection types presented in
Section~\ref{sec:refining}: there we said that type cases in the body of a
function are the tipping points that may change the type of the result
of the function; but they are not the only ones, the others being the
applications of overloaded functions. Therefore, we plan to
detect the overloaded functions to which the parameter of an outer function
flows, so as to use the partition of their domains to perform a
finer-grained analysis of the outer function's type.
But the real challenges that lie ahead are the handling of side
effects and the addition of polymorphic types. Our analysis works in a
pure system and extending it to cope with side effects is not
immediate. We plan to do so by defining effect systems or by
performing some information flow analysis, typically by enriching the
one we plan to develop for the limitations above. But our plan is not
more defined than that. For polymorphism, instead, we can easily adapt
the main idea of this work to the polymorphic setting. Indeed, the
main idea is to remove from the type of an expression all
the results of the expression that would make some test fail (or
succeed, if we are typing a negative branch). This is done by
applying an intersection to the type of the expression, so as to keep
only the values that may yield success (or failure) of the test. For
polymorphism the idea is the same, with the only difference that
besides applying an intersection we can also apply an
instantiation. The idea is to single out the two most general type
substitutions for which some test may succeed and fail, respectively, and apply these
substitutions to refine the types of the corresponding occurrences
in the ``then'' and ``else'' branches. Concretely, consider the test
$x_1x_2\in \tau$ where $\tau$ is a closed type and $x_1$, $x_2$ are
variables of type $x_1: s\to t$ and $x_2: u$ with $u\leq s$. For the
positive branch we first check whether there exists a type
substitution $\sigma$ such that $t\sigma\leq\neg\tau$. If no such
substitution exists, then for all possible assignments of the
polymorphic type variables of $s\to t$ the test may succeed; that is,
the success of the test does not depend on the particular instance of
$s\to t$ and, thus, it is not possible to pick a substitution to
refine the occurrence typing. If such a $\sigma$ exists, then
we find a type substitution $\sigma_\circ$ such that $\tau\leq
t\sigma_\circ$ and we refine for the
positive branch the types of $x_1$, of $x_2$, and of $x_1x_2$ by applying $\sigma_\circ$ to their types. While the
idea is clear (see Appendix~\ref{app:roadmap} for a more detailed explanation),
the technical details are quite involved, especially when considering
functions typed by intersection types and/or when integrating gradual
typing. This deserves a substantial body of non-trivial research that we plan to
develop in the near future.
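To give a concrete flavor of the envisaged refinement (a hypothetical instance of ours, not part of the formal development), take $x_1:\alpha\to\alpha$ and $x_2:\alpha$ and consider the test $x_1x_2\in\Int$. The substitution $\sigma=\{\alpha:=\Bool\}$ yields $\alpha\sigma=\Bool\leq\neg\Int$, so the outcome of the test depends on the instantiation of $\alpha$; we can thus pick $\sigma_\circ=\{\alpha:=\Int\}$, for which $\Int\leq\alpha\sigma_\circ$, and refine the hypotheses of the positive branch as follows:
\[
x_1: \Int\to\Int \qquad\qquad x_2: \Int \qquad\qquad x_1x_2: \Int
\]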
\subsection{Adding structured types}
The previous analysis already covers a wide range of realistic cases. For instance, the analysis already handles list data structures, since products and recursive types can encode them as right-associative nested pairs, as is done in the language CDuce~\cite{BCF03} (e.g., $X = \textsf{Nil}\vee(\Int\times X)$ is the type of the lists of integers). Even more, the presence of union types makes it possible to type heterogeneous lists whose content is described by regular expressions on types, as proposed by~\citet{hosoya00regular}. Since the main application of occurrence typing is to type dynamic languages, it is worth showing how to extend our work to records. We use record types as they are defined in CDuce, which are obtained by extending types with the following two type constructors:
\[
\begin{array}{lrcl}
\textbf{Types} & t & ::= & \record{\ell_1=t \ldots \ell_n=t}{t}\alt \Undef
\end{array}
\]
where $\ell$ ranges over an infinite set of labels $\Labels$ and $\Undef$
is a special singleton type whose only value is the constant
$\undefcst$, a constant that does not belong to $\Any$. The type
$\record{\ell_1=t_1 \ldots \ell_n=t_n}{t}$ is a \emph{quasi-constant
function} that maps every $\ell_i$ to the type $t_i$ and every other
$\ell \in \Labels$ to the type $t$ (all the $\ell_i$'s must be
distinct). Quasi-constant functions are the internal representation of
record types in CDuce. They are not visible to the programmer, who can use
only two specific forms of quasi-constant functions, open record types and closed record types, provided by the
following syntactic sugar, which forms the \emph{record types} of our language\footnote{Note that in the definitions ``$\ldots{}$'' is meta-syntax to denote the presence of other fields, while in the open records ``{\large\textbf{..}}'' is the syntax that distinguishes them from closed ones.}:
\begin{itemize}
\item $\crecord{\ell_1=t_1, \ldots, \ell_n=t_n}$ for $\record{\ell_1=t_1 \ldots \ell_n=t_n}{\Undef}$ (closed records).
\item $\orecord{\ell_1=t_1, \ldots, \ell_n=t_n}$ for $\record{\ell_1=t_1 \ldots \ell_n=t_n}{\Any \vee \Undef}$ (open records).
\end{itemize}
plus the notation $\mathtt{\ell \eqq t}$ to denote optional fields,
which corresponds, in the quasi-constant function notation, to the
field $\ell = t \vee \Undef$.
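For instance (a made-up example just to illustrate the desugaring), a record type for points with an optional field $c$ unfolds into the following quasi-constant function:
\[
\orecord{x=\Int, y=\Int, c\eqq\Bool}
\;=\;
\record{x=\Int, y=\Int, c=\Bool\vee\Undef}{\Any \vee \Undef}
\]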
As for expressions, we adapt CDuce records to our analysis. In particular, records are built starting from the empty record expression \erecord{} and by adding, updating, or removing fields:
\[
\begin{array}{lrcl}
\textbf{Expr} & e & ::= & \erecord {} \alt \recupd e \ell e \alt \recdel e \ell \alt e.\ell
\end{array}
\]
in particular, $\recdel e \ell$ deletes the field $\ell$ from $e$, $\recupd e \ell e'$ adds the field $\ell=e'$ to the record $e$ (deleting any existing field $\ell$), while $e.\ell$ is field selection, with the reduction:
\(\erecord{...,\ell=e,...}.\ell\ \reduces\ e\).
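For example, building a two-field record and then deleting one of its fields reduces as expected (we write $\reduces^*$ for several reduction steps):
\[
(\recdel{\recupd{\recupd{\erecord{}}{a}{1}}{b}{2}}{a}).b \;\reduces^*\; 2
\]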
To define record type subtyping and record expression type inference, we need the following operators on record types (refer to~\citet{alainthesis} for more details):
\begin{eqnarray}
\proj \ell t & = & \left\{\begin{array}{ll}\min \{ u \alt t\leq \orecord{\ell=u}\} &\text{if } t \leq \orecord{\ell = \Any}\\ \text{undefined}&\text{otherwise}\end{array}\right.\\
t_1 + t_2 & = & \min\left\{ u \quad\bigg|\quad \forall \ell \in \Labels.\left\{\begin{array}{ll}
\proj \ell u \geq \proj \ell {t_2} &\text{ if } \proj \ell {t_2} \leq \neg \Undef\\
\proj \ell u \geq \proj \ell {t_1} \vee (\proj \ell {t_2} \setminus \Undef) &\text{ otherwise}
\end{array}\right.\right\}\\
\recdel t \ell & = & \min \left\{ u \quad\bigg|\quad \forall \ell' \in \Labels. \left\{\begin{array}{ll}
\proj {\ell'} u \geq \Undef &\text{ if }\ell' = \ell\\
\proj {\ell'} u \geq \proj {\ell'} t &\text{ otherwise}
\end{array}\right.\right\}
\end{eqnarray}
Then two record types $t_1$ and $t_2$ are in subtyping relation, $t_1 \leq t_2$, if and only if $\forall \ell \in \Labels. \proj \ell {t_1} \leq \proj \ell {t_2}$. In particular $\orecord{\!\!}$ is the largest record type.
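For instance, computing directly from the definitions above:
\[
\begin{array}{lcl}
\proj b {\orecord{a=\Int, b=\Bool}} & = & \Bool\\[1mm]
\orecord{a=\Int, b=\Bool} + \crecord{a=\Bool} & = & \orecord{a=\Bool, b=\Bool}\\[1mm]
\recdel {\orecord{a=\Int, b=\Bool}} a & = & \orecord{a=\Undef, b=\Bool}
\end{array}
\]
In the second line the $b$ field of the left-hand record is preserved because $\proj b {\crecord{a=\Bool}} = \Undef$; in the third line the deleted field is mapped to $\Undef$.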
Expressions are then typed by the following rules (already in algorithmic form).
\begin{mathpar}
\Infer[Record]
{~}
{\Gamma \vdash \erecord {}:\crecord {}}
{}
~
\Infer[Update]
{\Gamma \vdash e_1:t_1\and t_1\leq\orecord {\!\!} \and \Gamma \vdash e_2:t_2}
{\Gamma \vdash \recupd{e_1}{\ell}{e_2}: t_1 + \crecord{\ell=t_2}}
{\recupd{e_1}{\ell}{e_2} \not\in\dom\Gamma}
\\
\Infer[Delete]
{\Gamma \vdash e:t\and t\leq\orecord {\!\!}}
{\Gamma \vdash \recdel e \ell: \recdel t \ell}
{\recdel e \ell\not\in\dom\Gamma}
~
\Infer[Proj]
{\Gamma \vdash e:t\and t\leq\orecord{\ell = \Any}}
{\Gamma \vdash e.\ell:\proj \ell {t}}
{e.\ell\not\in\dom\Gamma}
\end{mathpar}
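For example, combining \Rule{Record} and \Rule{Update} with the $+$ operator defined earlier, we can type the addition of a field to the empty record (assuming $1:\Int$):
\[
\Gamma \vdash \recupd{\erecord{}}{a}{1} \;:\; \crecord{} + \crecord{a=\Int} \;=\; \crecord{a=\Int}
\]
and \Rule{Proj} then gives $(\recupd{\erecord{}}{a}{1}).a : \Int$, since $\crecord{a=\Int}\leq\orecord{a=\Any}$.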
To extend occurrence typing to records, we add to paths the following values: $\varpi\in\{\ldots,a_\ell,u_\ell^1,u_\ell^2,r_\ell\}^*$, with
\(e.\ell\downarrow a_\ell.\varpi =\occ{e}\varpi\),
\(\recdel e \ell\downarrow r_\ell.\varpi = \occ{e}\varpi\), and
\(\recupd{e_1}\ell{e_2}\downarrow u_\ell^i.\varpi = \occ{e_i}\varpi\)
and add the following rules for the new paths:
\begin{mathpar}
\Infer[PSel]
{ \pvdash \Gamma e t \varpi:t'}
{ \pvdash \Gamma e t {\varpi.a_\ell}:\orecord {\ell:t'} }
{ }
\Infer[PDel]
{ \pvdash \Gamma e t \varpi:t'}
{ \pvdash \Gamma e t {\varpi.r_\ell}: (\recdel {t'} \ell) + \crecord{\ell \eqq \Any}}
{ }
\\
\Infer[PUpd1]
{ \pvdash \Gamma e t \varpi:t'}
{ \pvdash \Gamma e t {\varpi.u_\ell^1}: (\recdel {t'} \ell) + \crecord{\ell \eqq \Any}}
{ }
\Infer[PUpd2]
{ \pvdash \Gamma e t \varpi:t'}
{ \pvdash \Gamma e t {\varpi.u_\ell^2}: \proj \ell {t'}}
{ }
\end{mathpar}
Deriving the algorithm from these rules is then straightforward:
\[
\begin{array}{lcl}
\constr{\varpi.a_\ell}{\Gamma,e,t} & = & \orecord {\ell: \env {\Gamma,e,t} (\varpi)}\\
\constr{\varpi.r_\ell}{\Gamma,e,t} & = & \recdel {(\env {\Gamma,e,t} (\varpi))} \ell + \crecord{\ell \eqq \Any}\\
\constr{\varpi.u_\ell^1}{\Gamma,e,t} & = & \recdel {(\env {\Gamma,e,t} (\varpi))} \ell + \crecord{\ell \eqq \Any}\\
\constr{\varpi.u_\ell^2}{\Gamma,e,t} & = & \proj \ell {(\env {\Gamma,e,t} (\varpi))}
\end{array}
\]
%% \beppe{I am not at all convinced by Ext1. To me it should give the field the type of $u_\ell^2$; in other words, I think the correct rules are:
%% \begin{mathpar}
%% \Infer[PExt1]
%% { \pvdash \Gamma e t \varpi:t_1 \and \pvdash \Gamma e t \varpi.u_\ell^2:t_2}
%% { \pvdash \Gamma e t \varpi.u_\ell^1: (\recdel {t'} \ell) + \crecord{\ell = t_2}}
%% { }
%% \end{mathpar}
%% \[
%% \begin{array}{lcl}
%% \constr{\varpi.u_\ell^1}{\Gamma,e,t} & = & \recdel {(\env {\Gamma,e,t} (\varpi))} \ell + \crecord{\ell = \env {\Gamma,e,t} (\varpi.u_\ell^2)}
%% \end{array}
%% \]
%% }
Notice that the effect of doing $\recdel t \ell + \crecord{\ell \eqq
\Any}$ corresponds to setting the field $\ell$ of the (record) type
$t$ to the type $\Any\vee\Undef$, that is, to the type of all
undefined fields in an open record. So \Rule{PDel} and \Rule{PUpd1}
mean that if we remove, add, or redefine a field $\ell$ in an expression $e$,
then nothing can be deduced about the field $\ell$ of $e$ itself: since the original field was destroyed, we do not have any information on it apart from the static one.
For instance, consider the test:
\[\ifty{\recupd{x}{a}{0}}{\orecord{a=\Int, b=\Bool}\vee \orecord{a=\Bool, b=\Int}}{x.b}{\False}\]
By $\constr{\varpi.u_\ell^1}{\Gamma,e,t}$---i.e., by \Rule{PUpd1}, \Rule{PTypeof}, and \Rule{PInter}---the type for $x$ in the positive branch is $((\orecord{a=\Int, b=\Bool}\vee \orecord{a=\Bool, b=\Int}) \land \orecord{a=\Int}) + \crecord{a\eqq \Any}$.
It is equivalent to the type $\orecord{b=\Bool}$, and thus we can deduce that $x.b$ has the type $\Bool$.
\beppe{Compare with path expressions of ~\citet{THF10} }
\subsection{Refining function types}\label{sec:refining}
\input{refining}
\subsection{Integrating gradual typing}
\input{gradual}
that a value crossing the barrier is correctly typed.
Occurrence typing and gradual typing are two complementary disciplines
that have much to gain from being integrated, although we are not aware
of any study in this sense. We study it for the formalism of Section~\ref{sec:language}, for which the integration of gradual typing was defined by~\citet{castagna2019gradual}.
In a sense, occurrence typing is a
discipline designed to push forward the frontiers beyond which gradual
typing is needed, thus reducing the number of runtime checks. For
instance, the JavaScript code of~\eqref{foo} and~\eqref{foo2} in the introduction can be
typed by using gradual typing:
\begin{alltt}\color{darkblue}
function foo(x\textcolor{darkred}{ : \pmb{\dyn}}) \{
(typeof(x) === "number")? x++ : x.trim() \refstepcounter{equation} \mbox{\color{black}\rm(\theequation)}\label{foo3}
\}
\end{alltt}
``Standard'' gradual typing inserts two dynamic checks since it compiles the code above into:
\begin{alltt}\color{darkblue}
function foo(x) \{
(typeof(x) === "number")? (\Cast{number}{x})++ : (\Cast{string}{x}).length
(typeof(x) === "number")? (\textcolor{darkred}{\Cast{number}{{\color{darkblue}x}}})++ : (\textcolor{darkred}{\Cast{string}{{\color{darkblue}x}}}).trim()
\}
\end{alltt}
where {\Cast{$t$}{$e$}} is a type-cast that dynamically checks whether the value returned by $e$ has type $t$.\footnote{Intuitively, \code{\Cast{$t$}{$e$}} is
syntactic sugar for \code{(typeof($e$)==="$t$")\,?\,$e$\,:\,(throw "Type
error")}. Not exactly though, since to implement compilation \emph{à la} sound gradual typing is is necessary to use casts on function types that need special handling.}
%
We already saw that thanks to occurrence typing we can annotate the parameter \code{x} by \code{number|string} instead of \dyn{} and avoid the insertion of any cast.
But occurrence typing can also be used on the gradually-typed code in order to statically detect the insertion of useless casts. Using
occurrence typing to type the gradually-typed version of \code{foo} in~\eqref{foo3} allows the system to avoid inserting the first cast
\code{\Cast{number}{x}} since, thanks to occurrence typing, the
occurrence of \code{x} at issue is given type \code{number} (the
second cast is still necessary however). But removing only this cast is far
from being satisfactory, since when this function is applied to an integer
there are some casts that still need to be inserted outside of the function.
The reason is that the compiled version of the function
has type \code{\dyn$\to$number}, that is, it expects an argument of type
\dyn, and thus we have to apply a cast (either to the argument or
to the function) whenever this is not the case. In particular, the
application \code{foo(42)} will be compiled as
\code{foo(\Cast{\dyn}{42})}. Now, the main problem with such a cast is not
that it produces some unnecessary overhead by performing useless
checks (a cast to \dyn{} can easily be detected and safely ignored at runtime).
The main problem is that the combination of such a cast with type-cases
will lead to unintuitive results under the standard operational
semantics of type-cases and casts.
subtype of $t$. In standard gradual semantics, \code{\Cast{\dyn}{42}} is a value.
And this value is of type \code{\dyn}, which is not a subtype of \code{number}.
Therefore the check in \code{foo} would fail for \code{\Cast{\dyn}{42}}, and so
would the whole function call.
Although this behavior is type safe, this is the opposite of
what every programmer would expect: one would expect the test
\code{(typeof($e$)==="number")} to return true for \code{\Cast{\dyn}{42}}
and false for, say, \code{\Cast{\dyn}{true}}, whereas
the standard semantics of type-cases would return false in both cases.
A solution is to modify the semantics of type-cases, and in particular of
\code{typeof}, to strip off all the casts in a value, even nested ones. This
however adds a new overhead at runtime. Another solution is to simply accept
this counter-intuitive result, which has at least the benefit of promoting
the dynamic type to a first class type, instead of just considering it as a
directive to the front-end. Indeed, this would allow us to dynamically check
whether some argument has the dynamic type \code{\dyn} (i.e., whether it was
applied to a cast to such a type) simply by \code{(typeof($e$)==="\dyn")}.
Whatever solution we choose, it is clear that in both cases it would be much
better if the application \code{foo(42)} were compiled as is, thus getting
rid of a cast that at best is useless and at worst gives a counter-intuitive and
unexpected semantics.
This is where the previous section about refining function types comes in handy.
To get rid of all superfluous casts, we have to fully exploit the information
provided to us by occurrence typing and deduce for the function in~\eqref{foo3} the type
\code{(number$\to$number)$\wedge$((\dyn\textbackslash
number)$\to$string)}, so that no cast is inserted when the
function is applied to a number.
To achieve this, we simply modify the typing rule for functions that we defined
in the previous section to accommodate gradual typing. Let $\sigma$ and $\tau$ range over \emph{gradual types}, that is, the types produced by the grammar in Definition~\ref{def:types} extended with \dyn{} as a basic type (see~\citet{castagna2019gradual} for the definition of the subtyping relation on these types). For every gradual type
$\tau$, define $\tauUp$ as the (non-gradual) type obtained from $\tau$ by replacing all
covariant occurrences of \dyn{} by \Any{} and all contravariant ones by \Empty. The
type $\tauUp$ can be seen as the \emph{maximal} interpretation of $\tau$, that is,
every expression that can safely be cast to $\tau$ is of type $\tauUp$. In
other words, if a function expects an argument of type $\tau$ but can be
typed under the hypothesis that the argument has type $\tauUp$, then no casts
are needed, since every cast that succeeds will be to a subtype of
$\tauUp$. Taking advantage of this property, we modify the rule for
functions as:
%
%\begin{mathpar}
% \Infer[Abs]
% {\Gamma,x:\tau\vdash e\triangleright\psi\and \forall i\in I\quad \Gamma,x:\sigma_i\vdash e:\tau_i
......@@ -109,41 +104,39 @@ functions as follows:
% }
% {\psi(x)=\{\sigma_i\alt i\in I\}}
%\end{mathpar}
\[
\textsc{[AbsInf+]}
\frac
{
\begin{array}{c}
\Gamma,x:\sigma'\vdash e\triangleright\psi \qquad\qquad \Gamma,x:\sigma'\vdash e:\tau'\\
T = \{ (\sigma', \tau') \} \cup \{ (\sigma,\tau) ~|~ \sigma \in \psi(x) \land \Gamma, x: \sigma \vdash e: \tau \} \cup \{ (\sigmaUp,\tau) ~|~ \sigma \in \psi(x) \land \Gamma, x: \sigmaUp \vdash e: \tau \}
\end{array}
}
{
\Gamma\vdash\lambda x:\sigma'.e:\textstyle\bigwedge_{(\sigma,\tau) \in T}\sigma\to \tau
}
\]
The main idea behind this rule is the same as before: we first collect all the
information we can into $\psi$ by analyzing the body of the function. We then
retype the function using the new hypothesis $x : \sigma$ for every
$\sigma \in \psi(x)$. Furthermore, we also retype the function using the hypothesis
$x : \sigmaUp$: as explained before the rule, whenever this typing succeeds it eliminates unnecessary gradual types and, thus, unnecessary casts.
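For instance, writing $(\cdot)^\Uparrow$ for the maximal interpretation just defined (an ad hoc notation for this example), we have
\[
(\code{number}\wedge\dyn)^\Uparrow \;=\; \code{number}\wedge\Any \;\simeq\; \code{number}
\qquad\qquad
(\dyn\to\dyn)^\Uparrow \;=\; \Empty\to\Any
\]
since in the second case the domain occurrence of \dyn{} is contravariant. The first equation is precisely the one exploited below.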
Let us see how this works on the function \code{foo} in \eqref{foo3}. First, we deduce
the refined hypothesis
$\psi(\code x) = \{\,\code{number}{\land}\dyn\;,\;\dyn \textbackslash \code{number}\,\}$.
Typing the function using this new hypothesis but without considering the
maximal interpretation would yield
$(\dyn \to \code{number}\vee\code{string}) \land ((\code{number} \land \dyn) \to \code{number})
\land ((\dyn \textbackslash \code{number}) \to \code{string})$. However, as
we stated before, this would introduce an unnecessary cast if the function
were to be applied to an integer. Hence the need for the second part of
Rule~\textsc{[AbsInf+]}: the maximal interpretation of $\code{number} \land \dyn$
is $\code{number}$, and it is clear that, if $\code x$ is given type \code{number},
the function type-checks, thanks to occurrence typing. Thus, after some
routine simplifications, we can actually deduce the desired type
$(\code{number} \to \code{number}) \land ((\dyn \textbackslash \code{number}) \to \code{string})$.
\beppe{Problem: how to compile functions with intersection types, since their body is typed at several distinct types. I see two possible solutions: either merge the casts of the various typings (as we did in the compilation of polymorphic functions for semantic subtyping) or allow just one gradual type in the intersection when a function is explicitly typed (reasonable since, why would you use more gradual types in an intersection?)}
Typescript and Flow are extensions of JavaScript that allow the programmer to specify in the code type annotations used to statically type-check the program. For instance, the following function definition is valid in both languages
\begin{alltt}\color{darkblue}
function foo(x\textcolor{darkred}{ : number | string}) \{
(typeof(x) === "number")? x++ : x.trim()
(typeof(x) === "number")? x++ : x.trim() \refstepcounter{equation} \mbox{\color{black}\rm(\theequation)}\label{foo}
\}
\end{alltt}
Apart from the type annotation (in red) of the function parameter, the above is
since \code{++} returns an integer and \code{trim()} a string, then our
function \code{foo} has type
\code{(number|string)$\to$(number|string)}.\footnote{\label{null}Actually,
both Flow and TypeScript deduce as return type
\code{number|string|void}, since both languages track the possibility
that some operations may yield \code{void} results. Considering this
would not pose any problem but clutter our presentation, which is why we omit
such details.} But it is clear that a more precise type would be
one that states that \code{foo} returns a number when it is applied to
\subsection{Types}
\begin{definition}[Type syntax]\label{def:types} The set of types is formed by the terms $t$ coinductively produced by the following grammar
\[
\begin{array}{lrcl}
\textbf{Types} & t & ::= & b\alt t\to t\alt t\times t\alt t\vee t \alt \neg t \alt \Empty
\end{array}
\]
and that satisfy the following conditions
\begin{itemize}
\item (regularity) the term has a finite number of different sub-terms;
\item (contractivity) every infinite branch of a type contains an infinite number of occurrences of the
arrow or product type constructors.
\end{itemize}
\end{definition}
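For instance, the equation $X = (\Int\times X)\vee\textsf{Nil}$ defines a legitimate (regular and contractive) type, the lists of integers used earlier: every infinite branch crosses infinitely many product constructors. Contractivity instead rules out meaningless equations such as
\[
X = X\vee X \qquad\text{or}\qquad X = \neg X
\]
whose infinite branches cross only type connectives.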
We introduce the following abbreviations for types: $
t_1 \land t_2 \eqdef \neg (\neg t_1 \vee \neg t_2)$,
$t_1 \setminus t_2 \eqdef t_1 \wedge \neg t_2$, $\Any \eqdef \neg \Empty$.
%
We refer to $b$, $\to$, and $\times$ as \emph{type constructors}
and to $\lor$, $\land$, $\lnot$, and $\setminus$
as \emph{type connectives}.
\subsubsection{Subtyping}
\subsubsection{Operators for type constructors}
Let $t$ be a functional type (i.e., $t\leq\Empty\to\Any$); then
\begin{eqnarray}
\dom t & = & \max \{ u \alt t\leq u\to \Any\}
\\
\apply t s & = &\min \{ u \alt t\leq s\to u\}
\\
\worra t s & = &\min\{u\leq \dom t\alt t\circ(\dom t\setminus u)\leq \neg s\}
\end{eqnarray}
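As a sanity check (an example of ours), consider the overloaded type $t=(\Int\to\Int)\wedge(\Bool\to\Bool)$; unfolding the definitions above gives
\[
\dom t \;=\; \Int\vee\Bool
\qquad\quad
\apply t \Int \;=\; \Int
\qquad\quad
\worra t \Int \;=\; \Int
\]
where the last equation states that the only arguments of $t$ that may yield a result in $\Int$ are the integers.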
\subsubsection{Type schemes}
Given a value $v$, the set of types $t$ such that $v \in t$ has no smallest element in general.
Indeed, the standard derivation rule for $\lambda$-abstraction values is:
\begin{mathpar}
\Infer[Abs]
{t=(\wedge_{i\in I}\arrow {s_i} {t_i})\land (\wedge_{j\in J} \neg (\arrow {s'_j} {t'_j}))\\t\not\leq \Empty}
{\vdash \lambda^{\wedge_{i\in I}\arrow {s_i} {t_i}}x.e : t}
{}
\\
\end{mathpar}
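For example, for the identity function annotated with $\arrow {\Int} {\Int}$, this rule derives, among infinitely many others, the types
\[
\arrow {\Int} {\Int},
\qquad
(\arrow {\Int} {\Int})\wedge\neg(\arrow {\Bool} {\Bool}),
\qquad
(\arrow {\Int} {\Int})\wedge\neg(\arrow {\Bool} {\Int}),
\qquad\ldots
\]
and no single type is smaller than all of them.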
In the following, we will need an algorithmic type system.
In order for it to be complete with respect to $\lambda$-abstractions, we need a
new syntactic category, called \textbf{type schemes}, that denotes a set of types.
\[
\begin{array}{lrcl}
\textbf{Type schemes} & \ts & ::= & t \alt \tsfun {\arrow t t ; \cdots ; \arrow t t} \alt \ts \tstimes \ts \alt