Commit 088b9bab authored by Victor Lanvin

Fixes

parent 75d33df6
We revisit occurrence typing, a technique to refine the type of
variables occurring in type-cases and, thus, capture some programming
patterns used in untyped languages. Although occurrence typing was
tied from its inception to set-theoretic types---union types, in
particular---it never fully exploited the capabilities of these types. Here
we show how, by using set-theoretic types, it is possible to develop a
general typing framework that encompasses and generalizes several
aspects of current occurrence typing proposals and that can be applied to
tackle other problems such as the inference of intersection types for functions
and the optimization of the compilation of gradually typed languages.
In this work we presented the core of our analysis of occurrence
typing, extended it to record types and proposed a couple of novel
applications of the theory, namely the inference of
intersection types for functions and a static analysis to reduce the number of
casts inserted when compiling gradually-typed programs.
One of the by-products of our work is the ability to define type
predicates such as those used in \cite{THF10} as plain functions and
have the inference procedure deduce automatically the correct
overloaded function type.
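For instance, a programmer can define a type predicate as a plain function, in the syntax of our examples (a minimal sketch; the name \code{is\_int} is ours):
\begin{alltt}\color{darkblue}
let is_int = function (x) \{ (x \(\in\) Int) ? true : false \}
\end{alltt}
for which the inference procedure should deduce the overloaded type $(\Int\to\True)\wedge(\neg\Int\to\False)$, that is, exactly the information that later type-cases need in order to use \code{is\_int} as a predicate.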
......@@ -19,8 +19,8 @@ type a let binding such as \code{
%\end{alltt}
}
which is clearly safe when $y:\Int\vee\Bool$. Nor can this example be solved by partial evaluation since we do not handle nesting of tests in the condition \code{( ((y\(\in\)Int)?`yes:`no)\(\in\)`yes )? y+1 : not(y)},
and both are issues that the system by~\citet{THF10} can handle. We think that it is possible
to reuse some of their ideas to perform an information flow analysis on top of
our system to remove these limitations.
%
Some of the extensions we hinted at in Section~\ref{sec:practical}
......@@ -36,7 +36,7 @@ made to converge and, foremost, whether it is of use in practice is among our ob
But the real challenges that lie ahead are the handling of side
effects and the addition of polymorphic types. Our analysis works in a
pure system and extending it to cope with side-effects is not
immediate. We plan to do it by defining effect systems or by
performing some information flow analysis, typically by enriching the
one we plan to develop for the limitations above. But our plan is not
......
......@@ -72,7 +72,7 @@ Expressions are then typed by the following rules (already in algorithmic form).
{\Gamma \vdash e.\ell:\proj \ell {t}}
{e.\ell\not\in\dom\Gamma}\vspace{-2mm}
\end{mathpar}
To extend occurrence typing to records we add the following values to paths: $\varpi\in\{\ldots,a_\ell,u_\ell^1,u_\ell^2,r_\ell\}^*$, with
\(e.\ell\downarrow a_\ell.\varpi =\occ{e}\varpi\),
\(\recdel e \ell\downarrow r_\ell.\varpi = \occ{e}\varpi\), and
\(\recupd{e_1}\ell{e_2}\downarrow u_\ell^i.\varpi = \occ{e_i}\varpi\)
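For instance (a minimal worked instance of these definitions, assuming the usual convention $\occ{e}{\epsilon}=e$), for a test $\tcase{x.\ell}{t}{e_1}{e_2}$ we have $x.\ell\downarrow a_\ell = x$, that is, $\occ{x.\ell}{a_\ell}=x$: the path $a_\ell$ reaches the record whose field $\ell$ is selected, so the outcome of a test on $x.\ell$ can be used to refine the type of $x$ itself.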
......
......@@ -34,10 +34,10 @@ is \emph{not} of type \code{number} (and thus deduce from the type annotation th
Occurrence typing was first defined and formally studied by
\citet{THF08} to statically type-check untyped Scheme programs, and later extended by \citet{THF10}
yielding the development of Typed Racket. From its inception,
occurrence typing was intimately tied to type systems with
set-theoretic types: unions, intersections, and negation of
types. Union was the first type connective to appear, since it was already used by
\citet{THF08} where its presence was needed to characterize the
different control flows of a type test, as our \code{foo} example
shows: one flow for integer arguments and another for
......@@ -51,7 +51,7 @@ function \code{foo} has type
\code{number|string|void}, since they track when operations may yield \code{void} results. Considering this would be easy but would also clutter our presentation, which is why we omit
such details.} But it is clear that a more precise type would be
one that states that \code{foo} returns a number when it is applied to
a number and returns a string when it is applied to a string, so that the type deduced for, say, \code{foo(42)} would be \code{number} rather than \code{number|string}. This is
exactly what the intersection type \code{(number$\to$number) \&
(string$\to$string)} states (intuitively, an expression has an intersection of types, noted \code{\&}, if and only if it has all the types of the intersection) and corresponds in Flow to declaring \code{foo} as follows:
\begin{alltt}\color{darkblue}
......@@ -79,7 +79,7 @@ nested pairs or at the end of a path of selectors. \beppe{Not precise,
please check which are the cases that are handled in the cited
papers} In this work we aim at removing this limitation on the
contexts and develop a general theory to refine the type of variables
that occur in tested expressions under generic contexts, such as variables occurring in the
left or the right expressions of an application. In other words, we aim at establishing a formal framework to
extract as much static information as possible from a type test. We leverage our
analysis on the presence of full-fledged set-theoretic types
......@@ -108,7 +108,7 @@ In particular, in this introduction we concentrate on applications, since they c
\ifty{x_1x_2}{t}{e_1}{e_2}
\end{equation}
where $x_i$'s denote variables, $t$ is some type, and $e_i$'s are generic expressions.
Depending on the actual $t$ and on the static types of $x_1$ and $x_2$, we
can make type assumptions for $x_1$, for $x_2$, \emph{and} for the application $x_1x_2$
when typing $e_1$ that are different from those we can make when typing
$e_2$. For instance, suppose $x_1$ is bound to the function \code{foo} defined in \eqref{foo2}. Thus $x_1$ has type $(\Int\to\Int)\wedge(\String\to\String)$ (we used the syntax of the types of Section~\ref{sec:language} where unions and intersections are denoted by $\vee$ and $\wedge$ and have priority over $\to$ and $\times$, but not over $\neg$).
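For instance, take $t=\Int$ and suppose that $x_2$ has static type $\Int\vee\String$ (a worked instance of the positive branch; the negative one is discussed right below). If the test succeeds, then the application returned an integer; since the arrow $\String\to\String$ can only produce strings, the result must come from the arrow $\Int\to\Int$, and thus when typing $e_1$ we can assume both $x_1x_2:\Int$ and $x_2:(\Int\vee\String)\wedge\Int$ (i.e., $\Int$).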
......@@ -162,8 +162,8 @@ $x_2$ must be of type \String\ for the application to have type
$\neg\Int$ and therefore we assume that $x_2$ has type
$(\Int\vee\String)\wedge\String$ (i.e., again \String).
We have seen that we can specialize in both branches the type of the
whole expression $x_1x_2$, the type of the argument $x_2$,
but what about the type of $x_1$? Well, this depends on the type of
$x_1$ itself. In particular, if instead of an intersection type $x_1$
is typed by a union type, then the test may give us information about
......@@ -217,7 +217,7 @@ then
that is the union of all the possible
input types, while the precise return type of such a
function depends on the type of the argument the function is applied to:
either an integer, or a string, or both (i.e., the argument has union type
\(\Int\vee\String\)). So we have \( t_1 \circ \Int = \Int \),
\( t_1 \circ \String = \String \), and
\( t_1 \circ (\Int\vee\String) = \Int \vee \String \) (see Section~\ref{sec:typeops} for the formal definition of $\circ$).
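To see why, it may help to recall that $t_1\circ s$ can be characterized as the least type $u$ such that $t_1\leq s\to u$ holds (we anticipate here the definition of Section~\ref{sec:typeops}). For $s=\Int\vee\String$ we have
\begin{displaymath}
t_1\leq(\Int\vee\String)\to(\Int\vee\String)\qquad
t_1\not\leq(\Int\vee\String)\to\Int\qquad
t_1\not\leq(\Int\vee\String)\to\String
\end{displaymath}
since an argument in $\Int\vee\String$ may select either arrow of $t_1$; hence $t_1\circ(\Int\vee\String)=\Int\vee\String$.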
......@@ -249,7 +249,7 @@ of $e_2$, that is $t_2$, with the elements of the domain that
easy to see how to refine the type of $e_2$ for when the test fails:
simply use all the other possible results of $e_2$, that is
$t_2\setminus(\worra{t_1} t)$. To sum up, to refine the type of an
argument in the test of an application, all we need is to define
$\worra{t_1} t$, the set of arguments that when applied to a function
of type $t_1$ \emph{may} return a result in $t$; then we can refine the
type of $e_2$ as $t_2^+ \eqdeftiny t_2\wedge(\worra{t_1} t)$ in the ``then'' branch (we call it the \emph{positive} branch)
......@@ -271,13 +271,13 @@ that the function was applied to a value in $t_2^+$ and, second, that
the application did not diverge and, thus, returned a result in
$t_1\!\circ t_2^+$ (which is a subtype of $t_1\!\circ
t_2$). Therefore, we can exclude from $t_1$ all the functions that when applied to an argument
in $t_2^+$ either diverge or yield a result not in $t_1\!\circ t_2^+$.
Both of these things can be obtained simply by removing from $t_1$ the functions in
$(t_2^+\to \neg (t_1\!\circ t_2^+))$, that is, we refine the type of $e_1$ in the ``then'' branch as
$t_1^+=t_1\setminus (t_2^+\to
\neg(t_1\!\circ t_2^+)))$. The fact that this removes the functions that applied to $t_2^+$ arguments yield results not in $t_1\!\circ t_2^+$ should be pretty obvious. That this also removes functions diverging on $t_2^+$ arguments is subtler and deserves some
explanation. In particular, the interpretation of a type $t\to s$ is the set
of all functions that when applied to an argument of type $t$
either diverge or return a value in $s$. As such the interpretation of $t\to s$ contains
all the functions that diverge (at least) on $t$. Therefore removing $t\to s$ from a type $u$ removes from $u$ not only all the functions that when applied to a $t$ argument return a result in $s$, but also all the functions that diverge on $t$.
Ergo $t_1\setminus (t_2^+\to
......@@ -379,7 +379,7 @@ that $x$ is of type $\Int$. But if we use the hypothesis generated by
the test of $fx$, that is, that $x$ is of type \Int, to check $gx$ against \Bool,
then the type deduced for $x$ is $\Empty$---i.e., the branch is never selected. In other words, we want to produce type
environments for occurrence typing by taking into account all
the available hypotheses, even when these hypotheses are formulated later
in the flow of control. This will be done in the type system of
Section~\ref{sec:language} by the rule \Rule{Path} and will require, at the
algorithmic level, looking for a fix-point solution of a function, or
......@@ -395,16 +395,16 @@ we can assume that the expression $(x,y)$ is of type
$\pair{(\Int\vee\Bool)}\Int$ and put it in the type environment. But
if in $e$ there is a test like
$\ifty{x}{\Int}{{\color{darkred}(x,y)}}{(...)}$ then we do not want to use
the assumption in the type environment to type the expression $(x,y)$
occurring in the inner test (in red). Instead we want to give to that occurrence of the expression
$(x,y)$ the type $\pair{\Int}\Int$. This will be done by temporarily
removing the type assumption about $(x,y)$ from the type environment and
by retyping the expression without that assumption (see rule
\Rule{Env\Aa} in Section~\ref{sec:algorules}).
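For instance, if the hypothesis on $(x,y)$ comes from an enclosing test such as (a hypothetical enclosing context)
\begin{displaymath}
\ifty{(x,y)}{\pair{(\Int\vee\Bool)}\Int}{\;\ifty{x}{\Int}{(x,y)}{(...)}\;}{(...)}
\end{displaymath}
then rule \Rule{Env\Aa} temporarily discards the hypothesis $(x,y):\pair{(\Int\vee\Bool)}\Int$ for the occurrence of $(x,y)$ in the inner ``then'' branch and re-types it under the hypothesis $x:\Int$ generated by the inner test, thus deducing the more precise type $\pair\Int\Int$.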
\subsubsection*{Outline} In Section~\ref{sec:language} we formalize the
ideas presented in this introduction: we define the types and
expressions of our system, their dynamic semantics and a type system that
implements occurrence typing together with the algorithms that decide
whether an expression is well typed or not. Section~\ref{sec:extensions} extends our formalism to record
......
......@@ -43,7 +43,7 @@ The subtyping relation for these types, noted $\leq$, is the one defined
by~\citet{Frisch2008} to which the reader may refer. A detailed description of the algorithm to decide it can be found in~\cite{Cas15}.
For this presentation it suffices to consider that
types are interpreted as sets of \emph{values} ({i.e., either
constants, $\lambda$-abstractions, or pairs of values: see
Section~\ref{sec:syntax} right below) that have that type, and that subtyping is set
containment (i.e., a type $s$ is a subtype of a type $t$ if and only if $t$
contains all the values of type $s$). In particular, $s\to t$
......@@ -116,7 +116,7 @@ $\Cx[e]\reduces\Cx[e']$.
\subsection{Static semantics}\label{sec:static}
While the syntax and reduction semantics are, on the whole, pretty
standard, for the type system we will have to introduce several
unconventional features that we anticipated in
Section~\ref{sec:challenges} and that are at the core of our work. Let
us start with the standard part, that is the typing of the functional
......@@ -698,7 +698,7 @@ is, for each expression $e'$ occurring in $e$, $\Refine {e,t}\Gamma$
would be the type environment that maps $e'$ into $\bigwedge_{\{\varpi \alt
\occ e \varpi \equiv e'\}} \env {\Gamma,e,t} (\varpi)$. As we
explained in Section~\ref{sec:challenges} the intersection is needed
to apply occurrence typing to expressions such as
$\tcase{(x,x)}{\pair{t_1}{t_2}}{e_1}{e_2}$ where some
expressions---here $x$---occur multiple times.
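For instance, in the positive branch of this test the two occurrences of $x$---as left and as right component of the pair---yield the assumptions $x:t_1$ and $x:t_2$, which the intersection combines (a direct instance of the definition above):
\begin{displaymath}
\Refine{(x,x),\pair{t_1}{t_2}}\Gamma \;:\; x\mapsto t_1\wedge t_2
\end{displaymath}
which is exactly the type that $x$ must have for $(x,x)$ to have type $\pair{t_1}{t_2}$.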
......@@ -706,11 +706,11 @@ In order to capture most of the type information from nested queries
the rule \Rule{Path} allows the deduction of the type of some
occurrence $\varpi$ to use a type environment $\Gamma'$ that may
contain information about some suboccurrences of $\varpi$. In the
algorithm this would correspond to applying the $\Refinef$ defined
above to an environment that already is the result of $\Refinef$, and so on. Therefore, ideally our
algorithm should compute the type environment as a fixpoint of the
function $X\mapsto\Refine{e,t}{X}$. Unfortunately, an iteration of $\Refinef$ may
not converge. As an example, consider the (dumb) expression $\tcase
{x_1x_2}{\Any}{e_1}{e_2}$. If $x_1:\Any\to\Any$, then every iteration of
$\Refinef$ yields for $x_1$ a type strictly more precise than the type deduced in the
previous iteration.
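A natural way out is to iterate $\Refinef$ only a bounded number of times: this is the role played by the parameter $n_o$ in the completeness theorem below. The following is a minimal sketch of this strategy in JavaScript-like pseudo-code, where \code{refineStep} (one application of $X\mapsto\Refine{e,t}{X}$) and \code{envEqual} (equality of type environments) are hypothetical helpers:
\begin{alltt}\color{darkblue}
function refineFix (gamma, n) \{
  // iterate refineStep at most n times, stopping early at a fixpoint
  let g = gamma
  for (let i = 0; i < n; i++) \{
    const g2 = refineStep(g)
    if (envEqual(g2, g)) return g   // fixpoint reached
    g = g2
  \}
  return g   // no fixpoint within n steps: sound, but less precise
\}
\end{alltt}
Stopping after \code{n} iterations is sound, since each iteration only makes the environment more precise; we merely lose some precision with respect to the ideal fixpoint.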
......@@ -828,7 +828,7 @@ The side conditions of the rules ensure that the system is syntax
directed, that is, that at most one rule applies when typing a term:
priority is given to \Rule{Eqf\Aa} over all the other rules and to
\Rule{Env\Aa} over all remaining logical rules. Type schemes are
used to account for the type-multiplicity stemming from
$\lambda$-abstractions as shown in particular by rule \Rule{Abs\Aa}
(in what follows we use the word ``type'' also for type schemes). The
subsumption rule is no longer in the system; it is replaced by: $(i)$
......@@ -848,7 +848,7 @@ removing any hypothesis about $e$ from $\Gamma$, so that the deduction
of the type $\ts$ for $e$ cannot but end by a logical rule. Of course
this does not apply when the expression $e$ is a variable, since
an hypothesis in $\Gamma$ is the only way to deduce the type of a
variable, which is why the algorithm reintroduces the classic rules
for variables.
The algorithmic system above is sound with respect to the deductive one of Section~\ref{sec:static}
......@@ -874,13 +874,13 @@ left-hand side of another negated arrow.
For every $\Gamma$, $e$, $t$, if $\Gamma \vdash e:t$ is derivable by a rank-0 negated derivation, then there exists $n_o$ such that $\Gamma\vdashA e: t'$ and $t'\leq t$.
\end{theorem}
\noindent The use of type schemes and of possibly diverging iterations
yields a system that may seem overly complicated. But it is important
to stress that this system is defined only to study the
type inference system of Section~\ref{sec:static} and in particular to probe
how close we can get to a complete algorithm for it. But for
practical applications type schemes are not needed, since they are
necessary only when type cases may specify types with negative arrows
and this, in practice, never happens (see Footnote~\ref{foo:typecase} and Corollary~\ref{app:completeness}). This is why for our implementation we use the
CDuce library in which type schemes are absent and functions are typed
only by intersections of positive arrows. We present the implementation in Section~\ref{sec:practical}
but before that, let us study some extensions.
......@@ -83,7 +83,7 @@ Laws instead of
using a direct definition. Here the application of the outermost \texttt{not\_} operator is checked against type \True. This
allows the system to deduce that the whole \texttt{or\_} application
has type \False, which in turn leads to \texttt{not\_\;x} and
\texttt{not\_\;y} to have type $\lnot \True$ and therefore both \texttt{x}
and \texttt{y} to have type \True. The whole function is typed with
the most precise type (we present the type as printed by our
implementation, but the first arrow of the resulting type is
......
......@@ -14,10 +14,10 @@ should be enough for the system to deduce the type \eqref{tyinter}
even in the absence of the annotation given in \eqref{foo2}.
In this section we show how to do it by using the theory of occurrence
typing we developed in the first part of the paper. In particular, we
collect the different types that are assigned to the parameter of a function
in its body, and use this information to partition the domain of the function
and to re-type its body. Consider a more involved example
\begin{alltt}\color{darkblue}\morecompact
function (x \textcolor{darkred}{: \(\tau\)}) \{
(x \(\in\) Real) ? \{ (x \(\in\) Int) ? succ x : sqrt x \} : \{ \(\neg\)x \}
......@@ -104,7 +104,7 @@ binders):\vspace{-1mm}
{\Gamma\vdash \ifty{e}{t}{e_1}{e_2}\triangleright\psi_\circ\cup\psi_1\cup\psi_2}
{}\vspace{-1mm}
\end{mathpar}
Where $\psi\setminus\{x\}$ is the function defined as $\psi$ but undefined on $x$ and $\psi_1\cup \psi_2$ denotes component-wise union%
\iflongversion%
, that is:
\begin{displaymath}
......@@ -140,7 +140,7 @@ Simply put, this rule first collects all possible types that are deduced
for a variable $x$ during typing and then uses them to re-type the body
of the lambda under this new refined hypothesis for the type of
$x$. The re-typing ensures that the type soundness property
carries over to this new rule.
This system is enough to type our case study for the case $\tau$
defined as \code{Real|Bool}. Indeed the analysis of the body yields
......@@ -182,12 +182,12 @@ function (see Appendix~\ref{app:optimize} for an even more precise rule).
In summary, in order to type a
function we use the type-cases on its parameter to partition the
domain of the function and we type-check the function on each single partition rather
than on the union thereof. Of course, we could use a much finer
partition: the finest (but impossible) one is to check the function
against the singleton types of all its inputs. But any finer partition
would return, in many cases, not much better information, since most
partitions would collapse on the same return type: type-cases on the
parameter are the tipping points that are likely to make a difference, by returning different
types for different partitions thus yielding more precise typing. But they are not the only such tipping points: see rule \Rule{OverApp} in Section~\ref{sec:practical}.
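For instance, for our case study with $\tau$ defined as \code{Real|Bool}, the type-cases on the parameter induce the partition \code{Int}, \code{Real$\setminus$Int}, and \code{Bool}, and re-typing the body on each element of this partition yields the type \code{(Int$\to$Int) \& (Real$\setminus$Int$\to$Real) \& (Bool$\to$Bool)} (a sketch of the expected deduction, assuming \code{sqrt} has type \code{Real$\to$Real}).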
\ignore{\color{darkred} RANDOM THOUGHTS:
......
......@@ -27,11 +27,13 @@
\newcommand{\beppe}[1]{{\bf\color{blue}Beppe: #1}}
\newcommand{\mick}[1]{{\bf\color{gray}Mickael: #1}}
\newcommand{\kim}[1]{{\bf\color{pink}Kim: #1}}
\newcommand{\victor}[1]{{\bf\color{purple}Victor: #1}}
\else
\newcommand{\ignore}[1]{}
\newcommand{\beppe}[1]{}
\newcommand{\mick}[1]{}
\newcommand{\kim}[1]{}
\newcommand{\victor}[1]{}
\fi
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Support macros
......