Commit 88e6ce23 authored by Giuseppe Castagna's avatar Giuseppe Castagna

final reading

parent b935e55d
The type system we defined in the previous section implements the ideas we
illustrated in the introduction and it is safe. Now the problem is to
decide whether an expression is well typed or not, that is, to find an
algorithm that given a type environment $\Gamma$ and an expression $e$
......@@ -35,7 +35,7 @@ of its soundness by using and adapting the technique of \emph{type
representations of the infinite sets of types of
$\lambda$-abstractions which can be used to define an algorithmic
system that can be easily proved to be sound. The simpler algorithm
that we propose in this section implies (i.e., it is less precise than) the one with type schemes (\emph{cf}.\
Lemma~\ref{soundness_simple_ts}) and it is thus sound, too. The algorithm of this
section is not only simpler but, as we discuss in Section~\ref{sec:algoprop},
is also the one that should be used in practice. This is why we preferred to
......@@ -103,7 +103,7 @@ then we define:\vspace{-1.2mm}
All the operators above but $\worra{}{}$ are already present in the
theory of semantic subtyping: the reader can find how to compute them
in~\cite[Section
6.11]{Frisch2008} (see also~\citep[\S4.4]{Cas15} for a detailed description). Below we just show our new formula that computes
$\worra t s$ for a $t$ subtype of $\Empty\to\Any$. For that, we use a
result of semantic subtyping that states that every type $t$ is
equivalent to a type in disjunctive normal form and that if
......@@ -150,9 +150,9 @@ We start by defining the algorithm for each single occurrence, that is for the d
All the functions above are defined if and only if the initial path
$\varpi$ is valid for $e$ (i.e., $\occ e{\varpi}$ is defined) and $e$
is well-typed (which implies that all $\tyof {\occ e{\varpi}} \Gamma$
in the definition are defined)%
\iflongversion%
.\footnote{Note that the definition is
well-founded. This can be seen by analyzing the rule
\Rule{Case\Aa} of Section~\ref{sec:algorules}: the definition of $\Refine {e,t} \Gamma$ and
$\Refine {e,\neg t} \Gamma$ use $\tyof{\occ e{\varpi}}\Gamma$, and
......@@ -160,19 +160,19 @@ in the definition are defined).%
\Rule{Case\Aa} states that $\Gamma\vdash e:t_0$ (and this is
possible only if we were able to deduce under the hypothesis
$\Gamma$ the type of every occurrence of $e$.)\vspace{-3mm}}
\else
; the well-foundedness of the definition can be deduced by analysing the rule~\Rule{Case\Aa} of Section~\ref{sec:algorules}.
\fi
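Before going through the cases of $\constrf$, it may help to see the $\worra{}{}$ operator used by case \eqref{tre} at work on a deliberately naive model, where a type is a finite set of atoms and a function type is an explicit intersection of arrows. This is only an illustrative sketch of the operator's specification (the largest set of arguments that may produce a result in $s$), computed by brute force; it is not the symbolic, DNF-based algorithm given above:

```python
def worra(arrows, s):
    """Largest set of arguments that MAY yield a result in s.

    arrows: intersection of arrows, as a list of (domain, codomain)
            pairs of frozensets of atoms; s: a frozenset of atoms.
    """
    dom = frozenset().union(*(d for d, _ in arrows))
    ok = set()
    for a in dom:
        # The result type for argument `a` is the intersection of the
        # codomains of all arrows whose domain contains `a`.
        res = None
        for d, c in arrows:
            if a in d:
                res = c if res is None else res & c
        if res & s:          # result type compatible with s: keep `a`
            ok.add(a)
    return frozenset(ok)
```

For instance, modelling $(\Int\to\True)\wedge(\neg\Int\to\False)$ with integer atoms and one non-integer atom, the arguments that may yield \True{} are exactly the integer atoms.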
Each case of the definition of the $\constrf$ function corresponds to the
application of a logical rule
\iflongversion
(\emph{cf.} definition in Footnote~\ref{fo:rules})
\fi
in
the deduction system for $\vdashp$: case \eqref{uno} corresponds
to the application of \Rule{PEps}; case \eqref{due} implements \Rule{Pappl}
straightforwardly; the implementation of rule \Rule{PAppR} is subtler:
instead of finding the best $t_1$ to subtract (by intersection) from the
static type of the argument, \eqref{tre} finds directly the best type for the argument by
applying the $\worra{}{}$ operator to the static type of the function
and the refined type of the application. The remaining (\ref{quattro}--\ref{sette})
cases are the straightforward implementations of the rules
\Rule{PPairL}, \Rule{PPairR}, \Rule{PFst}, and \Rule{PSnd},
......@@ -188,7 +188,7 @@ the definition of $\constrf$.
It remains to explain how to compute the environment $\Gamma'$ produced from $\Gamma$ by the deduction system for $\Gamma \evdash e t \Gamma'$. Alas, this is the most delicate part of our algorithm.
%
In a nutshell, what we want to do is to define a function
$\Refine{\_,\_}{\_}$ that takes a type environment $\Gamma$, an
expression $e$ and a type $t$ and returns the best type environment
$\Gamma'$ such that $\Gamma \evdash e t \Gamma'$ holds. By the best
......@@ -259,7 +259,7 @@ cases.
\subsubsection{Algorithmic typing rules}\label{sec:algorules}
We now have all the definitions we need for our typing algorithm%
\iflongversion%
, which is defined by the following rules.
\else%
......@@ -350,10 +350,10 @@ for an expression $e$ by occurrence typing and stored in $\Gamma$ with
the type deduced for $e$ by the logical rules: this is simply obtained
by removing any hypothesis about $e$ from $\Gamma$, so that the
deduction of the type $t$ for $e$ cannot but end by a logical rule. Of
course, this does not apply when the expression $e$ is a variable,
since a hypothesis in $\Gamma$ is the only way to deduce the type of
a variable, which is why the algorithm reintroduces the classic rule
for variables. Finally, notice that there is no counterpart for the
rule \Rule{Abs-} and that therefore it is not possible to deduce
negated arrow types for functions. This means that the algorithmic
system is not complete as we discuss in detail in the next section.
......@@ -418,10 +418,10 @@ definition). Then we have:
\end{theorem}
We can use the algorithmic system $\vdashAts$ defined for the proof
of Theorem~\ref{th:algosound} to give a far more precise characterization
of the terms for which our algorithm is complete: positivity is a very useful but rough approximation. The system $\vdashAts$
copes with negated arrow types but is still not
complete, essentially for two reasons: $(i)$ the recursive nature of
rule \Rule{Path} and $(ii)$ the use of nested \Rule{PAppL} that yields
a precision that the algorithm loses by using type schemes in the
definition of \constrf{} (case~\eqref{due} is the critical
one). Completeness is recovered by $(i)$ limiting the depth of the
......
......@@ -3,21 +3,21 @@
The previous analysis already covers a large gamut of realistic
cases. For instance, the analysis already handles list data
structures, since products and recursive types can encode them as
right-associative nested pairs, as it is done in the language
CDuce (e.g., $X = \textsf{Nil}\vee(\Int\times X)$ is the
type of the lists of integers): see Code 8 in Table~\ref{tab:implem} of Section~\ref{sec:practical} for a concrete example. Even more, thanks to the presence of
union types it is possible to type heterogeneous lists whose
content is described by regular expressions on types as proposed
by~\citet{hosoya00regular}. Since the main application of occurrence
typing is to type dynamic languages, it is worth showing how to
extend our work to records. Although we use the record \emph{types} as they are
defined in CDuce we cannot do the same for CDuce record \emph{expressions}
which require non-trivial modifications for occurrence typing,
especially because we want to capture the typing of record field extension and field
deletion, two widely used record operations that current gradual typing systems fail to capture. CDuce record types are obtained by extending types with the
following two type constructors:\\[1.4mm]
%
\centerline{\(\textbf{Types} \quad t ~ ::= ~ \record{\ell_1=t \ldots \ell_n=t}{t}\alt \Undef\)}\\[1.4mm]
%% \[
%% \begin{array}{lrcl}
%% \textbf{Types} & t & ::= & \record{\ell_1=t \ldots \ell_n=t}{t}\alt \Undef
......@@ -25,27 +25,31 @@ following two type constructors:\\[1.4mm]
%% \]
where $\ell$ ranges over an infinite set of labels $\Labels$ and $\Undef$
is a special singleton type whose only value is the constant
$\undefcst$, a constant not in $\Any$. The type
$\record{\ell_1=t_1 \ldots \ell_n=t_n}{t}$ is a \emph{quasi-constant
function} that maps every $\ell_i$ to the type $t_i$ and every other
$\ell \in \Labels$ to the type $t$ (all the $\ell_i$'s must be
distinct). Quasi-constant functions are the internal representation of
record types in CDuce. These are not visible to the programmer who can use
only two specific forms of quasi-constant functions, open record types and closed record types (as for OCaml object types), provided by the
following syntactic sugar, and which form the \emph{record types} of our language%
\iflongversion%
\footnote{Note that in the definitions ``$\ldots{}$'' is meta-syntax to denote the presence of other fields while in the open records ``{\large\textbf{..}}'' is the syntax that distinguishes them from closed ones.}
\fi
\begin{itemize}[nosep]
\item $\crecord{\ell_1=t_1, \ldots, \ell_n=t_n}$ for $\record{\ell_1=t_1 \ldots \ell_n=t_n}{\Undef}$ \hfill(closed records).
\item $\orecord{\ell_1=t_1, \ldots, \ell_n=t_n}$ for $\record{\ell_1=t_1 \ldots \ell_n=t_n}{\Any \vee \Undef}$\hfill (open records).
\end{itemize}
plus the notation $\mathtt{\ell \eqq} t$ to denote optional fields,
which corresponds to using in the quasi-constant function notation the
field $\ell = t \vee \Undef$%
\iflongversion
.
\else
\ (note that ``$\ldots{}$'' is meta-syntax while ``{\large\textbf{..}}'' is syntax).
\fi
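The quasi-constant-function reading of record types can be made concrete on a toy model with a finite universe of atoms. The helper names and atoms below are ours, purely for illustration (CDuce's actual representation is symbolic):

```python
# A record type is a pair (fields, default): a finite map from labels to
# types plus a default type for every other label in the infinite set.
UNDEF = 'undef'                      # the singleton type whose value is ⊥
ANY = frozenset({'i', 'b', 's'})     # toy universe; note UNDEF is not in ANY

def closed_record(fields):
    # {{ l1=t1, ..., ln=tn }}: every other label maps to Undef
    return (dict(fields), frozenset({UNDEF}))

def open_record(fields):
    # {{ l1=t1, ..., ln=tn , .. }}: every other label maps to Any ∨ Undef
    return (dict(fields), ANY | {UNDEF})

def optional(t):
    # l =? t is sugar for the field l = t ∨ Undef
    return frozenset(t) | {UNDEF}

def label_type(rec, label):
    """The type a record type associates to a given label."""
    fields, default = rec
    return frozenset(fields.get(label, default))
```

On this model a closed record maps an absent label to the type containing only $\undefcst$, while an open record allows anything there.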
For what concerns expressions, we adapt CDuce records to our analysis. In particular records are built starting from the empty record expression \erecord{} by adding, updating, or removing fields:\vspace{-1mm}
\[
\begin{array}{lrcl}
\textbf{Expr} & e & ::= & \erecord {} \alt \recupd e \ell e \alt \recdel e \ell \alt e.\ell
......
......@@ -15,7 +15,7 @@ In a sense, occurrence typing is a
discipline designed to push forward the frontiers beyond which gradual
typing is needed, thus reducing the amount of runtime checks needed. For
instance, the JavaScript code of~\eqref{foo} and~\eqref{foo2} in the introduction can also be
typed by using gradual typing:%\vspace{-.2mm}
\begin{alltt}\color{darkblue}\morecompact
function foo(x\textcolor{darkred}{ : \pmb{\dyn}}) \{
return (typeof(x) === "number")? x++ : x.trim(); \refstepcounter{equation} \mbox{\color{black}\rm(\theequation)}\label{foo3}
......@@ -149,7 +149,7 @@ $\code{number} \land \dyn\simeq \code{number}$ is
not an option, since it would force us to choose between having
the gradual guarantee or having, say,
$\code{number} \land \code{string}$ be more precise than
$\code{number} \land \dyn$.\vspace{-2mm}}
Hence the need for the second part of
Rule~\textsc{[AbsInf+]}: the maximal interpretation of $\code{number} \land \dyn$
is $\code{number}$, and it is clear that, if $\code x$ is given type \code{number},
......
We have implemented the algorithmic system $\vdashA$. Our
implementation is written in OCaml and uses CDuce as a library to
provide the semantic subtyping machinery. Besides the type-checking
algorithm defined on the base language, our implementation supports
record types (Section \ref{ssec:struct}) and the refinement of
......@@ -32,12 +32,12 @@ $\texttt{x}$ is specialized to \Int{} in the ``then'' case and to \Bool{}
in the ``else'' case. The function is thus type-checked twice more
under each hypothesis for \texttt{x}, yielding the precise type
$(\Int\to\Int)\land(\Bool\to\Bool)$. Note that w.r.t.\ rule \Rule{AbsInf+} of Section~\ref{sec:refining}, our implementation improves the computed output
type. Indeed, using rule~[{\sc AbsInf}+] we would obtain the
type
$(\Int\to\Int)\land(\Bool\to\Bool)\land(\Bool\vee\Int\to\Bool\vee\Int)$
with a redundant arrow. Here we can see that, since we deduced
the first two arrows $(\Int\to\Int)\land(\Bool\to\Bool)$, and since
the union of their domains exactly covers the domain of the third arrow,
the latter is not needed. Code~2 shows what happens when the argument
of the function is left unannotated (i.e., it is annotated by the top
type \Any, written ``\texttt{Any}'' in our implementation). Here
......@@ -138,7 +138,7 @@ $(\True\to\Any\to\True)\land(\Any\to\True\to\True)\land
(\lnot\True\to\lnot\True\to\False)$
%}\\[.7mm]
then the rule \Rule{OverApp} applies and \True, \Any, and $\lnot\True$ become candidate types for
\texttt{x}, which allows us to deduce the precise type given in the table. Finally, thanks to rule \Rule{OverApp} it is not necessary to use a type case to force refinement. As a consequence, we can define the functions \texttt{and\_} and \texttt{xor\_} more naturally as:
\begin{alltt}\color{darkblue}\morecompact
let and_ = fun (x : Any) -> fun (y : Any) -> not_ (or_ (not_ x) (not_ y))
let xor_ = fun (x : Any) -> fun (y : Any) -> and_ (or_ x y) (not_ (and_ x y))
......
......@@ -17,14 +17,14 @@ In this section we show how to do it by using the theory of occurrence
typing we developed in the first part of the paper. In particular, we
collect the different types that are assigned to the parameter of a function
in its body, and use this information to partition the domain of the function
and to re-type its body. Consider a more involved example, in a pseudo-code that uses our notation:
\begin{alltt}\color{darkblue}\morecompact
function (x \textcolor{darkred}{: \(\tau\)}) \{
(x \(\in\) Real) ? \{ (x \(\in\) Int) ? succ x : sqrt x \} : \{ \(\neg\)x \} \refstepcounter{equation} \mbox{\color{black}\rm(\theequation)}\label{foorefine}
\}
\end{alltt}
where we assume that \code{Int} is a
subtype of \code{Real}. When $\tau$ is \code{Real$\vee$Bool} we want to deduce for this function the
type
\code{$(\Int\to\Int)\wedge(\Real\backslash\Int\to\Real)\wedge(\Bool\to\Bool)$}.
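The way the two type-cases of \eqref{foorefine} split the input type can be sketched on a toy model where types are finite sets of atoms (string atoms are ours and avoid Python's conflation of booleans with integers):

```python
# Toy atoms standing in for the base types of the example
INT  = frozenset({'i1', 'i2'})
REAL = INT | frozenset({'r1'})       # Int is a subtype of Real
BOOL = frozenset({'tt', 'ff'})

def partition(tau):
    """Split tau along the type-cases (x ∈ Real, then x ∈ Int)."""
    parts = [tau & INT, (tau & REAL) - INT, tau - REAL]
    return [p for p in parts if p]   # drop empty pieces
```

For $\tau = \Real\vee\Bool$ this yields the three pieces $\Int$, $\Real\setminus\Int$, and $\Bool$, i.e., exactly the domains of the three arrows above.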
When $\tau$ is \Any,
......@@ -39,9 +39,9 @@ by~\citet{THF10} since, for instance
in if is_int z then z+1 else 42
\end{alltt}
is well typed since the function \code{is\_int} is given type
$(\Int\to\True)\wedge(\neg\Int\to\False)$. We propose a more general
approach than the one by~\citet{THF10} since we allow the programmer to hint a particular type for the
argument and let the system deduce, if possible, an intersection type for the
function.
We start by considering the system where $\lambda$-abstractions are
......@@ -118,7 +118,7 @@ Where $\psi\setminus\{x\}$ is the function defined as $\psi$ but undefined on $x
\end{displaymath}
\noindent
\else.~\fi
All that remains to do is to replace the rule [{\sc Abs}+] with the
following rule\vspace{-.8mm}
\begin{mathpar}
\Infer[AbsInf+]
......@@ -137,13 +137,13 @@ following rule\vspace{-.8mm}
$\psi$ is always equal to the
domain of $\Gamma$ restricted to variables.
Simply put, this rule first collects all possible types that are deduced
for a variable $x$ during the typing of the body of the $\lambda$ and then uses them to re-type the body
under this new refined hypothesis for the type of
$x$. The re-typing ensures that the type safety property
carries over to this new rule.
This system is enough to type our case study \eqref{foorefine} for the case $\tau$
defined as \code{Real$\vee$Bool}. Indeed, the analysis of the body yields
$\psi(x)=\{\Int,\Real\setminus\Int\}$ for the branch \code{(x\,$\in$\,Int) ? succ\,x : sqrt\,x} and, since
\code{$(\Bool\vee\Real)\setminus\Real = \Bool$}, yields
$\psi(x)=\{\Bool\}$ for the branch \code{$\neg$x}. So the function
......@@ -153,7 +153,7 @@ will be checked for the input types $\Int$, $\Real\setminus\Int$, and
It is not too difficult to generalize this rule when the lambda is
typed by an intersection type:\vspace{-.8mm}
\begin{mathpar}
\Infer[AbsInf+] {\forall i\in I\hspace*{2mm}\Gamma,x:s_i\vdash
e\triangleright\psi_i
\and
\Gamma,x:s_i\vdash
......@@ -164,11 +164,11 @@ typed by an intersection type:\vspace{-.8mm}
I}s_i\to t_i} x.e:\bigwedge_{i\in I}(s_i\to
t_i)\land\bigwedge_{(u, w)\in T_i}(u\to w) } {}\vspace{-3mm}
\end{mathpar}
For each arrow declared in the interface of the function, we
first typecheck the body of the function as usual (to check that the
arrow is valid) and collect the refined types for the parameter $x$.
Then we deduce all possible output types for this refined set of input
types and add the resulting arrows to the type deduced for the whole
function (see Appendix~\ref{app:optimize} for an even more precise rule).
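A rough sketch of this collect-then-re-type loop, again on a toy model where types are finite sets of atoms: the body of \eqref{foorefine} is crudely modeled as a function on atoms, and the refined inputs collected in $\psi$ are taken as given rather than computed by the deduction system:

```python
INT  = frozenset({'i1', 'i2'})
REAL = INT | frozenset({'r1'})
BOOL = frozenset({'tt', 'ff'})

def body(a):
    """Crude model of the body: succ on ints, sqrt on other reals, not on booleans."""
    if a in INT:
        return 'i2'                          # succ x : an int
    if a in REAL:
        return 'r1'                          # sqrt x : a real
    return 'tt' if a == 'ff' else 'ff'       # ¬x : a boolean

def retype(psi):
    """Re-type the body on each refined input u in psi, yielding new arrows."""
    return [(u, frozenset(body(a) for a in u)) for u in psi]
```

Each pair produced is one extra arrow $(u \to w)$ added to the type of the whole function.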
......@@ -178,8 +178,8 @@ function (see Appendix~\ref{app:optimize} for an even more precise rule).
In summary, in order to type a
function we use the type-cases on its parameter to partition the
domain of the function and we type-check the function on each single partition rather
than on the union thereof. Of course, we could use a much finer
partition: the finest (but impossible) one is to check the function
against the singleton types of all its inputs. But any finer partition
would not return, in many cases, much better information, since most
......@@ -207,7 +207,7 @@ therefore add to our deduction system a last further rule:\\[1mm]
Whenever a function parameter is the argument of an
overloaded function, we record as possible types for this parameter
all the domains $t_i$ of the arrows that type the overloaded
function, restricted (via intersection) by the static type $t$ of the parameter and provided that the type is not empty ($t\wedge t_i\not\simeq\Empty$). We show the remarkable power of this rule on some practical examples in Section~\ref{sec:practical}.
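The candidate-type computation of \Rule{OverApp} can be sketched on a toy model with finite sets of atoms; the example below mirrors the domains of the first parameter in the overloaded type of \texttt{and\_} given above (atoms are ours, for illustration only):

```python
def candidate_types(arrow_domains, t):
    """Domains t_i of the overloaded function, intersected with the static
    type t of the parameter; empty intersections are discarded."""
    return [t & ti for ti in arrow_domains if t & ti]
```

With $t = \Any$ every domain survives the intersection, so \True, \Any, and $\lnot\True$ all become candidate types for the parameter.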
\ignore{\color{darkred} RANDOM THOUGHTS:
A delicate point is the definition of the union
......