performed by the CDuce language. Some others will require more

work. For example, our analysis cannot handle the flow of information. In

particular, the result of a type test can flow only to the branches

but not outside the test. As a consequence, the current system cannot

type a let binding such as

\begin{alltt}\color{darkblue}

let x = (y\(\in\)Int)?`yes:`no in (x\(\in\)`yes)?y+1:not(y)

\end{alltt}


which is clearly safe when $y:\Int\vee\Bool$. Nor can this example be solved by partial evaluation, since we do not handle the nesting of tests in the condition \code{( ((y\(\in\)Int)?`yes:`no)\(\in\)`yes )? y+1 : not(y)},

and both are issues that the system by~\citet{THF10} can handle. We think it is possible

to reuse some of their ideas to perform an information flow analysis on top of

...

...
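To make the discussion concrete, the untypeable let binding above can be transcribed directly in Python (an illustrative sketch of ours, not part of the system; we use \texttt{type(y) is int} rather than \texttt{isinstance} because in Python \texttt{bool} is a subclass of \texttt{int}):

```python
def f(y):
    # mirrors: let x = (y ∈ Int) ? `yes : `no in (x ∈ `yes) ? y+1 : not(y)
    # type(y) is int, not isinstance: in Python, bool is a subclass of int
    x = 'yes' if type(y) is int else 'no'
    return y + 1 if x == 'yes' else (not y)
```

The function is safe for any \texttt{y} that is an integer or a Boolean, which is exactly what a system tracking the flow of the test result into \texttt{x} could prove.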

Some of the extensions we hinted at in Section~\ref{sec:practical}

warrant a formal treatment. In particular, the rule [{\sc OverApp}]

only detects the application of an overloaded function once, when

type-checking the body of the function against the coarse input type


(i.e., $\psi$ is computed only once). But we could repeat this

process whilst type-checking the inferred arrows (i.e., we would

enrich $\psi$ while using it to find the various arrow types of the

lambda abstraction). Clearly, if untamed, such a process may never

reach a fixpoint. Studying whether this iterative refining can be


made to converge and, foremost, whether it is of use in practice is among our objectives.
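Schematically, such a tamed iteration could look as follows (a hedged sketch of the control structure only: \texttt{step} stands for one round of enriching $\psi$ and is a toy stand-in we made up for illustration):

```python
def refine_to_fixpoint(step, psi, max_rounds=10):
    """Apply refinement rounds until a fixpoint is reached,
    giving up after max_rounds since convergence is not guaranteed."""
    for _ in range(max_rounds):
        new_psi = step(psi)
        if new_psi == psi:       # fixpoint: no further refinement
            return psi, True
        psi = new_psi
    return psi, False            # iteration did not converge

# toy refinement round: split the 'Int' candidate into two finer ones
SPLIT = {'Int': {'Pos', 'Neg'}}
def step(psi):
    return frozenset(s for t in psi for s in SPLIT.get(t, {t}))

psi, converged = refine_to_fixpoint(step, frozenset({'Int', 'Bool'}))
```

Here the iteration converges in two rounds; the cap is what keeps an untamed refinement from diverging.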

But the real challenges that lie ahead are the handling of side

effects and the addition of polymorphic types. Our analysis works in a

...

...

the technical details are quite involved, especially when considering

functions typed by intersection types and/or when integrating gradual

typing. This deserves a whole span of non-trivial research that we plan to


The previous analysis already covers a large span of realistic cases. For instance, it already handles list data structures, since products and recursive types can encode them as right-associative nested pairs, as is done in the language CDuce~\cite{BCF03} (e.g., $X =\textsf{Nil}\vee(\Int\times X)$ is the type of the lists of integers). Even more can be done, since the presence of union types makes it possible to type heterogeneous lists whose content is described by regular expressions on types, as proposed by~\citet{hosoya00regular}. Since the main application of occurrence typing is to type dynamic languages, it is worth showing how to extend our work to records. We use the record types as they are defined in CDuce, which are obtained by extending types with the following two type constructors:\centerline{\(\textbf{Types} ~~ t ~ ::= ~ \record{\ell_1=t \ldots\ell_n=t}{t}\alt\Undef\)}
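For instance, under the convention (which we assume here for illustration) that the second argument of the record constructor describes all the fields not listed explicitly, the type \(\record{\textsf{name}=\String,\;\textsf{age}=\Int\vee\Undef}{\Undef}\) would denote the records with a mandatory field \textsf{name} of type \String, an optional integer field \textsf{age} (absence being represented by \Undef), and no other fields.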

Depending on the actual $t$ and the static types of $x_1$ and $x_2$, we

can make type assumptions for $x_1$, for $x_2$, \emph{and} for the application $x_1x_2$

when typing $e_1$ that are different from those we can make when typing

$e_2$. For instance, suppose $x_1$ is bound to the function \code{foo} defined in \eqref{foo2}. Thus $x_1$ has type $(\Int\to\Int)\wedge(\String\to\String)$ (we use the syntax of the types of Section~\ref{sec:language}, where unions and intersections are denoted by $\vee$ and $\wedge$).


Then it is not hard to see that the expression\footnote{This and most of the following expressions are given just for the sake of example. Determining the type of expressions other than variables is interesting for constructors but less so for destructors such as applications, projections, and selections: no reasonable programmer would repeat the same application twice; they would store its result in a variable. This becomes meaningful when we introduce constructors such as pairs, as we do for instance in the expression in~\eqref{pair}.}

%

\begin{equation}\label{mezzo}

\texttt{let }x_1 \texttt{\,=\,}\code{foo}\texttt{ in }\ifty{x_1x_2}{\Int}{((x_1x_2)+x_2)}{\texttt{42}}

\end{equation}

%

is well typed with type $\Int$: when typing the branch ``then'' we

know that the test $x_1x_2\in\Int$ succeeded and that, therefore, not


only is $x_1x_2$ of type \Int, but also $x_2$ is of type $\Int$: the other possibility,

$x_2:\String$, would have made the test fail.
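The deduction on $x_2$ can be replayed on a toy executable model (entirely ours, for illustration: a type is a set of base-type tags and the intersection type of \code{foo} is a list of arrows; this is not the system's actual representation):

```python
# foo : (Int -> Int) /\ (String -> String), as (domain, codomain) pairs
foo_type = [({'Int'}, {'Int'}), ({'String'}, {'String'})]

def args_that_may_yield(arrows, result):
    """Union of the domains whose codomain intersects `result`:
    the argument types that may have produced such a result."""
    out = set()
    for dom, cod in arrows:
        if cod & result:
            out |= dom
    return out

# the test (x1 x2) ∈ Int succeeded, hence in the "then" branch:
print(args_that_may_yield(foo_type, {'Int'}))   # {'Int'}: x2 must be Int
```

Only the arrow $\Int\to\Int$ can produce an integer result, so the argument is refined to \Int, as claimed above.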

For~\eqref{mezzo} we reasoned only on the type of the variables in the ``then'' branch but we can do the same

...

...

$\worra{t_1} t$, the set of arguments that, when applied to a function

of type $t_1$, \emph{may} return a result in $t$; then we can refine the

type of $e_2$ as $t_2^+\eqdeftiny t_2\wedge(\worra{t_1} t)$ in the ``then'' branch (we call it the \emph{positive} branch)

and as $t_2^-\eqdeftiny t_2\setminus(\worra{t_1} t)$ in the ``else'' branch (we call it the \emph{negative} branch).

\iflongversion

As a side remark, note

that the set $\worra{t_1} t$ is different from the set of elements that return a

result in $t$ (though it is a supertype of it). To see that, consider

...

...

Boolean and when applied to an integer returns either an integer or a

string; then we have that $\dom{t_1}=\Int\vee\Bool$ and $\worra{t_1}\String=\Int$,

but there is no (non-empty) type that ensures that an application of a

function in $t_1$ will surely yield a $\String$ result.

\fi
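As an executable illustration that $\worra{t_1}t$ is only an over-approximation (again a toy encoding of ours, not the actual definition): for $t_1=(\Bool\to\Bool)\wedge(\Int\to\Int\vee\String)$, the arguments that \emph{may} yield a \String{} result form the type \Int, while no argument \emph{surely} yields one.

```python
# t1 = (Bool -> Bool) /\ (Int -> Int \/ String), as (domain, codomain) pairs
t1 = [({'Bool'}, {'Bool'}), ({'Int'}, {'Int', 'String'})]

def worra(arrows, t):
    """Arguments that MAY yield a result in t (an over-approximation)."""
    return set().union(*(dom for dom, cod in arrows if cod & t))

def surely(arrows, t):
    """Arguments for which every applicable arrow guarantees a result in t."""
    whole_dom = set().union(*(dom for dom, _ in arrows))
    return {a for a in whole_dom
            if all(cod <= t for dom, cod in arrows if a in dom)}

print(worra(t1, {'String'}))    # {'Int'}
print(surely(t1, {'String'}))   # set(): strictly smaller than worra
```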

Once we have determined $t_2^+$, it is then not very difficult to refine the

type $t_1$ for the positive branch, too. If the test succeeded, then we know two facts: first,

In this section we formalize the ideas we outlined in the introduction. We start with the definition of types, followed by the language and its reduction semantics. The static semantics is the core of our work: we first present a declarative type system that deduces (possibly many) types for well-typed expressions, and then the algorithms to decide whether an expression is well typed or not.

\subsection{Types}

...

...

of rules.

\qquad

\end{mathpar}

\begin{mathpar}

%\\ \\

%\\ \\

\Infer[PAppR]

{\pvdash\Gamma e t \varpi.0:\arrow{t_1}{t_2}\\\pvdash\Gamma e t \varpi:t_2'}

{\pvdash\Gamma e t \varpi.1:\neg t_1 }

{ t_2\land t_2' \simeq\Empty}

\\

\\

\Infer[PAppL]

{\pvdash\Gamma e t \varpi.1:t_1 \\\pvdash\Gamma e t \varpi:t_2 }

{\pvdash\Gamma e t \varpi.0:\neg (\arrow{t_1}{\neg t_2}) }

...

...

the most general that can be deduced by assuming that $e\in t$ succeeds.

That is, $\tyof{e}{\Gamma}=\ts$ if and only if $\Gamma\vdashA e:\ts$ is provable.

We start by defining the algorithm for each single occurrence, that is, for the deduction of $\pvdash\Gamma e t \varpi:t'$. This is obtained by defining two mutually recursive functions $\constrf$ and $\env{}{}$:

since \texttt{or\_} has type\\

(\lnot\True\to\lnot\True\to\False)$}

We consider \True, \Any and $\lnot\True$ as candidate types for

\texttt{x}, which, in turn, allows us to deduce the precise type given in the table. Finally, thanks to this rule it is no longer necessary to force refinement by using a type case. As a consequence, we can define the functions \texttt{and\_} and \texttt{xor\_} more naturally as:


\begin{alltt}\color{darkblue}\morecompact

let and_ = fun (x : Any) -> fun (y : Any) -> not_ (or_ (not_ x) (not_ y))

let xor_ = fun (x : Any) -> fun (y : Any) -> and_ (or_ x y) (not_ (and_ x y))

against the singleton types of all its inputs. But any finer partition

would in many cases not return much better information, since most

partitions would collapse onto the same return type: type-cases on the

parameter are the tipping points likely to make a difference, by returning different


types for different partitions, thus yielding more precise typing. But they are not the only such tipping points: see rule \Rule{OverApp} in Section~\ref{sec:practical}.