In this work we presented the core of our analysis of occurrence

typing, extended it to record types and proposed a couple of novel

applications of the theory, namely the inference of

intersection types for functions and a static analysis to reduce the number of

casts inserted when compiling gradually-typed programs.

One of the by-products of our work is the ability to define type

predicates such as those used in \cite{THF10} as plain functions and

have the inference procedure automatically deduce the correct

overloaded function type.
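As a point of comparison, in a mainstream language such as TypeScript (used here purely as an illustration, not as the system of this paper) the programmer must annotate the predicate by hand with `x is number`, whereas the inference just described would deduce the corresponding overloaded type automatically; the names `isNumber` and `incr` are our own:

```typescript
// A type predicate written as a plain function. TypeScript requires the
// explicit `x is number` annotation; the system presented in this work
// would infer an overloaded (intersection) type for it instead.
function isNumber(x: number | string): x is number {
  return typeof x === "number";
}

function incr(x: number | string): number {
  if (isNumber(x)) {
    return x + 1;   // x narrowed to number by the predicate
  }
  return x.length;  // x narrowed to string in this branch
}
```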

...

...

type a let binding such as \code{

%\end{alltt}

}

which is clearly safe when $y:\Int\vee\Bool$. Nor can this example be solved by partial evaluation, since we do not handle nesting of tests in the condition \code{( ((y\(\in\)Int)?`yes:`no)\(\in\)`yes )? y+1 : not(y)},

and both are issues that the system by~\citet{THF10} can handle. We think that it is possible

to reuse some of their ideas to perform an information flow analysis on top of

our system to remove these limitations.

%

Some of the extensions we hinted at in Section~\ref{sec:practical}

...

...

made to converge and, foremost, whether it is of use in practice is among our ob

But the real challenges that lie ahead are the handling of side

effects and the addition of polymorphic types. Our analysis works in a

pure system and extending it to cope with side-effects is not

immediate. We plan to do it by defining effect systems or by

performing some information flow analysis, typically by enriching the

one we plan to develop for the limitations above. But our plan is not

is \emph{not} of type \code{number} (and thus deduce from the type annotation th

Occurrence typing was first defined and formally studied by

\citet{THF08} to statically type-check untyped Scheme programs, and later extended by \citet{THF10}

yielding the development of Typed Racket. From its inception,

occurrence typing was intimately tied to type systems with

set-theoretic types: unions, intersections, and negation of

types. Union was the first type connective to appear, since it was already used by

\citet{THF08} where its presence was needed to characterize the

different control flows of a type test, as our \code{foo} example

shows: one flow for integer arguments and another for

...

...

function \code{foo} has type

\code{number|string|void}, since they track when operations may yield \code{void} results. Considering this would be easy but would also clutter our presentation, which is why we omit

such details.} But it is clear that a more precise type would be

one that states that \code{foo} returns a number when it is applied to

a number and returns a string when it is applied to a string, so that the type deduced for, say, \code{foo(42)} would be \code{number} rather than \code{number|string}. This is

exactly what the intersection type \code{(number$\to$number) \&

(string$\to$string)} states (intuitively, an expression has an intersection of types, noted \code{\&}, if and only if it has all the types of the intersection) and corresponds in Flow to declaring \code{foo} as follows:

\begin{alltt}\color{darkblue}

...

...

nested pairs or at the end of a path of selectors. \beppe{Not precise,

please check which are the cases that are handled in the cited

papers} In this work we aim at removing this limitation on the

contexts and develop a general theory to refine the type of variables

that occur in tested expressions under generic contexts, such as variables occurring in the

left or the right expressions of an application. In other words, we aim at establishing a formal framework to

extract as much static information as possible from a type test. We leverage our

analysis on the presence of full-fledged set-theoretic types

...

...

In particular, in this introduction we concentrate on applications, since they c

\ifty{x_1x_2}{t}{e_1}{e_2}

\end{equation}

where $x_i$'s denote variables, $t$ is some type, and $e_i$'s are generic expressions.

Depending on the actual $t$ and on the static types of $x_1$ and $x_2$, we

can make type assumptions for $x_1$, for $x_2$, \emph{and} for the application $x_1x_2$

when typing $e_1$ that are different from those we can make when typing

$e_2$. For instance, suppose $x_1$ is bound to the function \code{foo} defined in \eqref{foo2}. Thus $x_1$ has type $(\Int\to\Int)\wedge(\String\to\String)$ (we used the syntax of the types of Section~\ref{sec:language} where unions and intersections are denoted by $\vee$ and $\wedge$ and have priority over $\to$ and $\times$, but not over $\neg$).

...

...

$x_2$ must be of type \String\ for the application to have type

$\neg\Int$ and therefore we assume that $x_2$ has type

$(\Int\vee\String)\wedge\String$ (i.e., again \String).
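To recap the example just discussed in one formula: with $x_1:(\Int\to\Int)\wedge(\String\to\String)$ and $x_2:\Int\vee\String$, the test $(x_1x_2)\in\Int$ refines the type of $x_2$ as follows:
\[
\text{``then'' branch: } (\Int\vee\String)\wedge\Int=\Int
\qquad\quad
\text{``else'' branch: } (\Int\vee\String)\wedge\String=\String
\]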

We have seen that we can specialize in both branches the type of the

whole expression $x_1x_2$, the type of the argument $x_2$,

but what about the type of $x_1$? Well, this depends on the type of

$x_1$ itself. In particular, if instead of an intersection type $x_1$

is typed by a union type, then the test may give us information about

...

...

then

that is the union of all the possible

input types, while the precise return type of such a

function depends on the type of the argument the function is applied to:

either an integer, or a string, or both (i.e., the argument has union type

\(\Int\vee\String\)). So we have \( t_1\circ\Int=\Int\),

\( t_1\circ\String=\String\), and

\( t_1\circ(\Int\vee\String)=\Int\vee\String\) (see Section~\ref{sec:typeops} for the formal definition of $\circ$).
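For illustration only, TypeScript's overload resolution behaves analogously to $\circ$ on this example (the function `double` is our own, not drawn from the paper):

```typescript
// Overload resolution mirrors t1 ∘ Int = Int and t1 ∘ String = String:
// the static result type depends on the static type of the argument.
function double(x: number): number;
function double(x: string): string;
function double(x: number | string): number | string {
  return typeof x === "number" ? x * 2 : x + x;
}

const n = double(21);   // statically typed number
const s = double("ab"); // statically typed string
```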

...

...

of $e_2$, that is $t_2$, with the elements of the domain that

easy to see how to refine the type of $e_2$ for when the test fails:

simply use all the other possible results of $e_2$, that is

$t_2\setminus(\worra{t_1} t)$. To sum up, to refine the type of an

argument in the test of an application, all we need is to define

$\worra{t_1} t$, the set of arguments that when applied to a function

of type $t_1$ \emph{may} return a result in $t$; then we can refine the

type of $e_2$ as $t_2^+\eqdeftiny t_2\wedge(\worra{t_1} t)$ in the ``then'' branch (we call it the \emph{positive} branch)

...
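For instance, on the running example, with $t_1=(\Int\to\Int)\wedge(\String\to\String)$, $t_2=\Int\vee\String$, and $t=\Int$, only integer arguments can produce an integer result, so:
\[
\worra{t_1}\Int \;=\; \Int
\qquad\text{and thus}\qquad
t_2^+ \;=\; (\Int\vee\String)\wedge\Int \;=\; \Int
\]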

...

that the function was applied to a value in $t_2^+$ and, second, that

the application did not diverge and, thus, returned a result in

$t_1\!\circ t_2^+$ (which is a subtype of $t_1\!\circ

t_2$). Therefore, we can exclude from $t_1$ all the functions that when applied to an argument

in $t_2^+$ either diverge or yield a result not in $t_1\!\circ t_2^+$.

Both of these things can be obtained simply by removing from $t_1$ the functions in

$(t_2^+\to\neg(t_1\!\circ t_2^+))$, that is, we refine the type of $e_1$ in the ``then'' branch as

$t_1^+=t_1\setminus(t_2^+\to

\neg(t_1\!\circ t_2^+))$. The fact that this removes the functions that applied to $t_2^+$ arguments yield results not in $t_1\!\circ t_2^+$ should be pretty obvious. That this also removes functions diverging on $t_2^+$ arguments is subtler and deserves some

explanation. In particular, the interpretation of a type $t\to s$ is the set

of all functions that when applied to an argument of type $t$

either diverge or return a value in $s$. As such the interpretation of $t\to s$ contains

all the functions that diverge (at least) on $t$. Therefore removing $t\to s$ from a type $u$ removes from $u$ not only all the functions that when applied to a $t$ argument return a result in $s$, but also all the functions that diverge on $t$.

Ergo $t_1\setminus(t_2^+\to

...

...

that $x$ is of type $\Int$. But if we use the hypothesis generated by

the test of $fx$, that is, that $x$ is of type \Int, to check $gx$ against \Bool,

then the type deduced for $x$ is $\Empty$---i.e., the branch is never selected. In other words, we want to produce type

environments for occurrence typing by taking into account all

the available hypotheses, even when these hypotheses are formulated later

in the flow of control. This will be done in the type systems of

Section~\ref{sec:language} by the rule \Rule{Path} and will require, at the

algorithmic level, to look for a fixpoint solution of a function, or

...

...

we can assume that the expression $(x,y)$ is of type

$\pair{(\Int\vee\Bool)}\Int$ and put it in the type environment. But

if in $e$ there is a test like

$\ifty{x}{\Int}{{\color{darkred}(x,y)}}{(...)}$ then we do not want to use

the assumption in the type environment to type the expression $(x,y)$

occurring in the inner test (in red). Instead we want to give to that occurrence of the expression

$(x,y)$ the type $\pair{\Int}\Int$. This will be done by temporarily

removing the type assumption about $(x,y)$ from the type environment and

by retyping the expression without that assumption (see rule

\Rule{Env\Aa} in Section~\ref{sec:algorules}).

\subsubsection*{Outline} In Section~\ref{sec:language} we formalize the

ideas presented in this introduction: we define the types and

expressions of our system, their dynamic semantics and a type system that

implements occurrence typing together with the algorithms that decide

whether an expression is well typed or not. Section~\ref{sec:extensions} extends our formalism to record

The subtyping relation for these types, noted $\leq$, is the one defined

by~\citet{Frisch2008} to which the reader may refer. A detailed description of the algorithm to decide it can be found in~\cite{Cas15}.

For this presentation it suffices to consider that

types are interpreted as sets of \emph{values} (i.e., either

constants, $\lambda$-abstractions, or pairs of values: see

Section~\ref{sec:syntax} right below) that have that type, and that subtyping is set

containment (i.e., a type $s$ is a subtype of a type $t$ if and only if $t$

contains all the values of type $s$). In particular, $s\to t$

...

...

$\Cx[e]\reduces\Cx[e']$.

\subsection{Static semantics}\label{sec:static}

While the syntax and reduction semantics are, on the whole, pretty

standard, for the type system we will have to introduce several

unconventional features that we anticipated in

Section~\ref{sec:challenges} and are at the core of our work. Let

us start with the standard part, that is, the typing of the functional

...

...

is, for each expression $e'$ occurring in $e$, $\Refine {e,t}\Gamma$

would be the type environment that maps $e'$ into $\bigwedge_{\{\varpi\alt

\occ e \varpi\equiv e'\}}\env{\Gamma,e,t}(\varpi)$. As we

explained in Section~\ref{sec:challenges} the intersection is needed

to apply occurrence typing to expressions such as

$\tcase{(x,x)}{\pair{t_1}{t_2}}{e_1}{e_2}$ where some

expressions---here $x$---occur multiple times.
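Instantiating the definition above on that example (writing $\varpi_1$ and $\varpi_2$ for the two occurrences of $x$ in the tested pair), the two occurrences contribute one constraint each, and the intersection combines them:
\[
\env{\Gamma,e,t}(\varpi_1)=t_1,
\qquad
\env{\Gamma,e,t}(\varpi_2)=t_2,
\qquad\text{so that }
\Refine{e,t}\Gamma \text{ maps } x \text{ to } t_1\wedge t_2.
\]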

...

...

In order to capture most of the type information from nested queries

the rule \Rule{Path} allows the deduction of the type of some

occurrence $\varpi$ to use a type environment $\Gamma'$ that may

contain information about some suboccurrences of $\varpi$. For the

algorithm, this would correspond to applying the $\Refinef$ defined

above to an environment that is already the result of $\Refinef$, and so on. Therefore, ideally our

algorithm should compute the type environment as a fixpoint of the

function $X\mapsto\Refine{e,t}{X}$. Unfortunately, an iteration of $\Refinef$ may

not converge. As an example, consider the (dumb) expression $\tcase

{x_1x_2}{\Any}{e_1}{e_2}$. If $x_1:\Any\to\Any$, then every iteration of

$\Refinef$ yields for $x_1$ a type strictly more precise than the type deduced in the

previous iteration.
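A generic sketch of the obvious remedy (iterating only a bounded number of times, in the spirit of the bound $n_o$ appearing in the completeness statement below); here `step` stands in for $X\mapsto\Refine{e,t}{X}$, and the `===` test is a placeholder for a real environment-equivalence check:

```typescript
// Bounded fixpoint iteration: apply `step` until either a fixpoint is
// reached or the iteration budget is exhausted, and keep the last
// (sound, possibly non-optimal) result in the latter case.
function boundedFixpoint<T>(step: (x: T) => T, start: T, maxIters: number): T {
  let current = start;
  for (let i = 0; i < maxIters; i++) {
    const next = step(current);
    if (next === current) return current; // genuine fixpoint reached
    current = next;
  }
  return current; // budget exhausted: stop refining
}
```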

...

...

The side conditions of the rules ensure that the system is syntax

directed, that is, that at most one rule applies when typing a term:

priority is given to \Rule{Eqf\Aa} over all the other rules and to

\Rule{Env\Aa} over all remaining logical rules. Type schemes are

used to account for the type-multiplicity stemming from

$\lambda$-abstractions as shown in particular by rule \Rule{Abs\Aa}

(in what follows we use the word ``type'' also for type schemes). The

subsumption rule is no longer in the system; it is replaced by: $(i)$

...

...

removing any hypothesis about $e$ from $\Gamma$, so that the deduction

of the type $\ts$ for $e$ cannot but end by a logical rule. Of course

this does not apply when the expression $e$ is a variable, since

a hypothesis in $\Gamma$ is the only way to deduce the type of a

variable, which is why the algorithm reintroduces the classic rules

for variables.

The algorithmic system above is sound with respect to the deductive one of Section~\ref{sec:static}

...

...

left-hand side of another negated arrow.

For every $\Gamma$, $e$, $t$, if $\Gamma\vdash e:t$ is derivable by a rank-0 negated derivation, then there exists $n_o$ such that $\Gamma\vdashA e: t'$ and $t'\leq t$.

\end{theorem}

\noindent The use of type schemes and of possibly diverging iterations

yields a system that may seem overly complicated. But it is important

to stress that this system is defined only to study the

type inference system of Section~\ref{sec:static} and in particular to probe

how close we can get to a complete algorithm for it. But for

practical applications type schemes are not needed, since they are

necessary only when type cases may specify types with negative arrows

and this, in practice, never happens (see Footnote~\ref{foo:typecase} and Corollary~\ref{app:completeness}). This is why for our implementation we use the

CDuce library in which type schemes are absent and functions are typed

only by intersections of positive arrows. We present the implementation in Section~\ref{sec:practical}.

Where $\psi\setminus\{x\}$ is the function defined as $\psi$ but undefined on $x$ and $\psi_1\cup\psi_2$ denotes component-wise union%

\iflongversion%

, that is:

\begin{displaymath}

...

...

Simply put, this rule first collects all possible types that are deduced

for a variable $x$ during typing and then uses them to re-type the body

of the lambda under this new refined hypothesis for the type of

$x$. The re-typing ensures that the type soundness property

carries over to this new rule.

This system is enough to type our case study for the case $\tau$

defined as \code{Real|Bool}. Indeed, the analysis of the body yields

...

...

function (see Appendix~\ref{app:optimize} for an even more precise rule).

In summary, in order to type a

function we use the type-cases on its parameter to partition the

domain of the function and we type-check the function on each single partition rather

than on the union thereof. Of course, we could use a much finer

partition: the finest (but impossible) one is to check the function

against the singleton types of all its inputs. But any finer partition

would return, in many cases, not much better information, since most

partitions would collapse on the same return type: type-cases on the

parameter are the tipping points that are likely to make a difference, by returning different

types for different partitions, thus yielding more precise typing. But they are not the only such tipping points: see rule \Rule{OverApp} in Section~\ref{sec:practical}.
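For illustration, in TypeScript-like terms (the function `stamp` is our own example), the type-case on the parameter is exactly what justifies checking each partition separately:

```typescript
// The typeof test partitions number|string into two input partitions.
// Checking the body once per partition justifies the intersection type
// (number -> number) & (string -> string) rather than the coarser
// (number|string) -> (number|string).
function stamp(x: number | string): number | string {
  if (typeof x === "number") return x + 1; // partition number => number
  return x.toUpperCase();                  // partition string => string
}
```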