In this work we presented the core of our analysis of occurrence typing, extended it to record types, and proposed two novel applications of the theory, namely the inference of intersection types for functions and a static analysis to reduce the number of casts inserted when compiling gradually-typed programs.

There is still a lot of work to do to fill the gap with real-world programming languages. Some of it should be quite routine, such as the encoding of specific language constructs (e.g., \code{isInt}, \code{typeof}, ...), the handling of more complex kinds of checks (e.g., generic Boolean expressions, multi-case type checks), and even encompassing sophisticated type matching such as the one performed by the language CDuce. Other aspects will require more work. For example, our analysis cannot handle the flow of information. In

particular, the result of a type test can flow only to the branches

but not outside the test. As a consequence the current system cannot

type a let binding such as

\begin{alltt}\color{darkblue}

let x = (y\(\in\)Int)?`yes:`no in (x\(\in\)`yes)?y+1:not(y)

\end{alltt}

which is clearly safe when $y:\Int\vee\Bool$. Nor can this example be solved by partial evaluation, since we do not handle nesting of tests in the condition: \code{( ((y\(\in\)Int)?`yes:`no)\(\in\)`yes )? y+1 : not(y)},

and both are issues that the system by~\citet{THF10} can handle. We think it is possible to reuse some of their ideas to perform an information flow analysis on top of our system to remove these limitations.
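To illustrate in a mainstream syntax, here is a sketch of the same program in TypeScript (the function name is ours; the \code{as} casts stand in for exactly the information that does not flow out of the test):

```typescript
// let x = (y ∈ Int) ? `yes : `no in (x ∈ `yes) ? y+1 : not(y)
// The correlation between x and the type of y is established by the test,
// but it is not propagated outside of it, hence the explicit casts.
function f(y: number | boolean): number | boolean {
  const x: "yes" | "no" = typeof y === "number" ? "yes" : "no";
  return x === "yes" ? (y as number) + 1 : !(y as boolean);
}
```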

%

Information flow analysis would also be useful to improve the inference of intersection types presented in Section~\ref{sec:refining}: there we said that the type cases in the body of a function are the tipping points that may change the type of the result of the function; but they are not the only ones, the others being the applications of overloaded functions. Therefore we plan to detect the overloaded functions that the parameter of an outer function flows to, so as to use the partition of their domains to perform a finer-grained analysis of the outer function's type.
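As an illustration of the kind of types at stake, the following hypothetical function has a type case in its body as its tipping point; the intersection $(\Int\to\Int)\wedge(\Bool\to\Bool)$ that such an inference would produce is written in TypeScript as overload signatures:

```typescript
// Hypothetical example: the typeof test is the tipping point that lets
// occurrence typing refine the result type per argument type, yielding
// the intersection (number → number) ∧ (boolean → boolean).
function neg(x: number): number;
function neg(x: boolean): boolean;
function neg(x: number | boolean): number | boolean {
  return typeof x === "number" ? -x : !x;
}
```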

But the real challenges that lie ahead are the handling of side effects and the addition of polymorphic types. Our analysis works in a pure system, and extending it to cope with side effects is not immediate. We plan to do it by defining effect systems or by performing some information flow analysis, typically by enriching the one we plan to develop to lift the limitations above. But our plan is not more defined than that. For polymorphism, instead, we can easily adapt

the main idea of this work to the polymorphic setting. Indeed, the

main idea is to remove from the type of an expression all

the results of the expression that would make some test fail (or

succeed, if we are typing a negative branch). This is done by

applying an intersection to the type of the expression, so as to keep

only the values that may yield success (or failure) of the test. For

polymorphism the idea is the same, with the only difference that

besides applying an intersection we can also apply an

instantiation. The idea is to single out the two most general type

substitutions for which some test may succeed and fail, respectively, and apply these

substitutions to refine the types of the corresponding occurrences

in the ``then'' and ``else'' branches. Concretely, consider the test

$x_1x_2\in\tau$ where $\tau$ is a closed type and $x_1$, $x_2$ are

variables of type $x_1: s\to t$ and $x_2: u$ with $u\leq s$. For the

positive branch we first check whether there exists a type

substitution $\sigma$ such that $t\sigma\leq\neg\tau$. If no such substitution exists, this means that for all possible assignments of the polymorphic type variables of $s\to t$ the test may succeed; that is, the success of the test does not depend on the particular instance of $s\to t$ and, thus, it is not possible to pick a substitution to refine the occurrence typing. If such a substitution exists, then

we find a type substitution $\sigma_\circ$ such that $\tau\leq

t\sigma_\circ$ and we refine for the

positive branch the types of $x_1$, of $x_2$, and of $x_1x_2$ by applying $\sigma_\circ$ to their types. While the

idea is clear (see Appendix~\ref{app:roadmap} for a more detailed explanation),

the technical details are quite involved, especially when considering

functions typed by intersection types and/or when integrating gradual

typing. This deserves an entire line of non-trivial research that we plan to pursue.
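To fix ideas, here is a minimal (hypothetical) instance of the substitution-based refinement just described, for $x_1:\alpha\to\alpha$, $x_2:\alpha$, and the test $x_1x_2\in\Int$:

```latex
% There is a σ with (α)σ ≤ ¬Int (e.g., σ = {α ↦ Bool}), so the success of
% the test depends on the instance; we then pick σ∘ = {α ↦ Int}, for which
% Int ≤ (α)σ∘, and apply σ∘ in the positive branch:
x_1:\alpha\to\alpha,\ x_2:\alpha
\qquad\leadsto\qquad
x_1:\Int\to\Int,\ x_2:\Int,\ x_1x_2:\Int
```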


The previous analysis already covers a large part of realistic cases. For instance, it already handles list data structures, since products and recursive types suffice to encode them as right-associative nested pairs, as is done in the language CDuce~\cite{BCF03} (e.g., $X =\textsf{Nil}\vee(\Int\times X)$ is the type of the lists of integers). Even more is possible, since the presence of union types makes it possible to type heterogeneous lists whose content is described by regular expressions on types, as proposed by~\citet{hosoya00regular}. Since the main application of occurrence typing is to type dynamic languages, it is worth showing how to extend our work to records. We use the record types as they are defined in CDuce, which are obtained by extending types with the following two type constructors:

\[

\begin{array}{lrcl}

\textbf{Types}& t & ::=&\record{\ell_1=t \ldots\ell_n=t}{t}\alt\Undef

\end{array}

\]

where $\ell$ ranges over an infinite set of labels $\Labels$ and $\Undef$ is a special singleton type whose only value is the constant $\undefcst$, a constant not in $\Any$. The type

$\record{\ell_1=t_1\ldots\ell_n=t_n}{t}$ is a \emph{quasi-constant

function} that maps every $\ell_i$ to the type $t_i$ and every other

$\ell\in\Labels$ to the type $t$ (all the $\ell_i$'s must be

distinct). Quasi-constant functions are the internal representation of record types in CDuce. They are not visible to the programmer, who can use only two specific forms of quasi-constant functions, open record types and closed record types, provided by the following syntactic sugar, which form the \emph{record types} of our language:\footnote{Note that in the definitions ``$\ldots{}$'' is meta-syntax denoting the presence of other fields, while in open records ``{\large\textbf{..}}'' is the syntax that distinguishes them from closed ones.}

\begin{itemize}


\item $\crecord{\ell_1=t_1, \ldots, \ell_n=t_n}$ for $\record{\ell_1=t_1\ldots\ell_n=t_n}{\Undef}$ (closed records).
\item $\orecord{\ell_1=t_1, \ldots, \ell_n=t_n}$ for $\record{\ell_1=t_1\ldots\ell_n=t_n}{\Any\vee\Undef}$ (open records).

\end{itemize}

plus the notation $\mathtt{\ell\eqq t}$ to denote optional fields, which corresponds, in the quasi-constant function notation, to the field $\ell= t \vee\Undef$.
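As an illustration of the sugar, an open record type with one mandatory and one optional field unfolds to the following quasi-constant function:

```latex
\orecord{a=\Int,\ b\eqq\Bool} \;=\; \record{a=\Int,\ b=\Bool\vee\Undef}{\Any\vee\Undef}
```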

For what concerns expressions, we adapt CDuce records to our analysis. In particular, records are built starting from the empty record expression \erecord{} and by adding, updating, or removing fields:
\[
\begin{array}{lrcl}
\textbf{Expr}& e & ::=& \erecord{}\alt\recupd e \ell e \alt\recdel e \ell\alt e.\ell

\end{array}

\]

with $\forall i.\ \ell_i \in\Labels$ (all distinct).



In particular, $\recdel e \ell$ deletes the field $\ell$ from $e$; $\recupd e \ell e'$ adds the field $\ell=e'$ to the record $e$ (deleting any existing field $\ell$); and $e.\ell$ is field selection, with the reduction:

\(\erecord{...,\ell=e,...}.\ell\ \reduces\ e\).
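Operationally, the three record expressions behave as in the following TypeScript sketch (illustrative helper names, not part of the calculus):

```typescript
// update ≈ (e with ℓ = e') : adds or overwrites field ℓ
// remove ≈ e \ ℓ           : deletes field ℓ
// select ≈ e.ℓ             : field selection
type Rec = Record<string, unknown>;

const update = (r: Rec, l: string, v: unknown): Rec => ({ ...r, [l]: v });
const remove = (r: Rec, l: string): Rec => {
  const copy = { ...r };
  delete copy[l];
  return copy;
};
const select = (r: Rec, l: string): unknown => r[l];
```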


To define record type subtyping and record expression type inference we need the following operators on record types (refer to~\citet{alainthesis} for more details):

\begin{eqnarray}
\proj\ell t & = & \left\{\begin{array}{ll}
\min\{ u \alt t\leq\orecord{\ell=u}\} & \text{if } t \leq\orecord{\ell = \Any}\\
\text{undefined} & \text{otherwise}
\end{array}\right.\\
t_1 + t_2 & = & \min\left\{ u \;\bigg|\; \forall\ell\in\Labels.\left\{\begin{array}{ll}
\proj\ell u \geq\proj\ell{t_2} & \text{ if }\proj\ell{t_2}\leq\neg\Undef\\
\proj\ell u \geq\proj\ell{t_1}\vee (\proj\ell{t_2}\setminus\Undef) & \text{ otherwise}
\end{array}\right.\right\}\\
\recdel t \ell & = & \min\left\{ u \;\bigg|\; \forall\ell' \in\Labels. \left\{\begin{array}{ll}
\proj{\ell'} u \geq\Undef & \text{ if }\ell' = \ell\\
\proj{\ell'} u \geq\proj{\ell'} t & \text{ otherwise}
\end{array}\right.\right\}
\end{eqnarray}

Then two record types $t_1$ and $t_2$ are in subtyping relation, $t_1\leq t_2$, if and only if $\forall\ell\in\Labels. \proj\ell{t_1}\leq\proj\ell{t_2}$. In particular $\orecord{\!\!}$ is the largest record type.
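For instance, the definition yields the following chain (a worked example using the sugar above), since $\proj a{\crecord{a=\Int}}=\Int$ and, for every label $\ell\neq a$, $\proj\ell{\crecord{a=\Int}}=\Undef\leq\Any\vee\Undef$:

```latex
\crecord{a=\Int}\ \leq\ \orecord{a=\Int}\ \leq\ \orecord{\!\!}
```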



We extend the paths with the following values: $\varpi\in\{\ldots,a_\ell,u_\ell^1,u_\ell^2,r_\ell\}^*$.


Notice that the effect of doing $\recdel t \ell+\crecord{\ell\eqq

\Any}$ corresponds to setting the field $\ell$ of the (record) type

$t$ to the type $\Any\vee\Undef$, that is, to the type of all

undefined fields in an open record. So \Rule{PDel} and \Rule{PUpd1}

mean that if we remove, add, or redefine a field $\ell$ in an expression $e$

then all we can deduce for $e$ is that its field $\ell$ is undefined: since the original field was destroyed, we have no information about it apart from the static one.

By $\constr{\varpi.u_\ell^1}{\Gamma,e,t}$---i.e., by \Rule{Ext1}, \Rule{PTypeof}, and \Rule{PInter}---the type for $x$ in the positive branch is $((\orecord{a=\Int, b=\Bool}\vee\orecord{a=\Bool, b=\Int})\land\orecord{a=\Int})+\crecord{a\eqq\Any}$.

It is equivalent to the type $\orecord{b=\Bool}$, and thus we can deduce that $x.b$ has the type $\Bool$.
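The same example can be transcribed into TypeScript (an informal analogue; the comments record the types that occurrence typing deduces, and the casts make them explicit rather than relying on TypeScript's own narrowing):

```typescript
// x : {a: number, b: boolean} ∨ {a: boolean, b: number}; testing the type
// of field a determines the type of field b in each branch.
type T = { a: number; b: boolean } | { a: boolean; b: number };

function getB(x: T): boolean | number {
  if (typeof x.a === "number") {
    return (x as { a: number; b: boolean }).b; // here b : boolean
  }
  return (x as { a: boolean; b: number }).b;   // here b : number
}
```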


\subsection{Refining function types}\label{sec:refining}

\input{refining}

\subsection{Integrating gradual typing}

\input{gradual}

\subsection{Limitations}

In general we cannot handle flow of information. In particular, the result of a type test can flow only to the branches but not outside the test. As a consequence:

\begin{itemize}

\item The system cannot type let bindings such as:

\begin{alltt}

let x = (y\(\in\)Int)?`yes:`no

in (x\(\in\)`yes)?y+1:not(y)

\end{alltt}

which is well typed for $y:\Int\vee\Bool$.

\item The previous example cannot be solved by partial evaluation, since we cannot handle nesting of type tests in the condition:
\begin{alltt}
( ((y\(\in\)Int)?`yes:`no)\(\in\)`yes )? y+1 : not(y)
\end{alltt}
\end{itemize}


Occurrence typing and gradual typing are two complementary disciplines

which have a lot to gain to be integrated, although we are not aware


of any study in this sense. We study it for the formalism of Section~\ref{sec:language} for which the integration of gradual typing was defined by~\citet{castagna2019gradual}.

In a sense, occurrence typing is a

discipline designed to push forward the frontiers beyond which gradual

typing is needed, thus reducing the amount of runtime checks needed. For


instance, the JavaScript code of~\eqref{foo} and~\eqref{foo2} in the introduction can be

typed by using gradual typing:

\begin{alltt}\color{darkblue}

function foo(x\textcolor{darkred}{ : \pmb{\dyn}}) \{
  (typeof(x) === "number")? x++ : x.length
\}
\end{alltt}


where {\Cast{$t$}{$e$}} is a type-cast that dynamically checks whether the value returned by $e$ has type $t$.\footnote{Intuitively, \code{\Cast{$t$}{$e$}} is

syntactic sugar for \code{(typeof($e$)==="$t$")\,?\,$e$\,:\,(throw "Type

error")}. Not exactly though, since to implement compilation \emph{à la} sound gradual typing it is necessary to use casts on function types, which need special handling.}
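The desugaring in the footnote can be sketched as a runtime helper (hypothetical name; as the footnote says, this does not cover casts on function types):

```typescript
// ⟨t⟩e  ≈  (typeof(e) === "t") ? e : (throw "Type error")
function cast<T>(t: string, e: unknown): T {
  if (typeof e === t) return e as T;
  throw new TypeError("Type error");
}
```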

%


We already saw that thanks to occurrence typing we can annotate the parameter \code{x} by \code{number|string} instead of \dyn{} and avoid the insertion of any cast.

But occurrence typing can also be used on the gradually typed code, in order to statically detect the insertion of useless casts. Using occurrence typing to type the gradually-typed version of \code{foo} in~\eqref{foo3} allows the system to avoid inserting the first cast

\code{\Cast{number}{x}} since, thanks to occurrence typing, the

occurrence of \code{x} at issue is given type \code{number} (the


second cast is still necessary however). But removing only this cast is far

from being satisfactory, since when this function is applied to an integer

there are some casts that still need to be inserted outside of the function.

The reason is that the compiled version of the function

...

...

has type \code{\dyn$\to$number}, that is, it expects an argument of type

\dyn, and thus we have to apply a cast (either to the argument or

to the function) whenever this is not the case. In particular, the

application \code{foo(42)} will be compiled as


\code{foo(\Cast{\dyn}{42})}. Now, the main problem with such a cast is not

that it produces some unnecessary overhead by performing useless


checks (a cast to \dyn{} can easily be detected and safely ignored at runtime).

The main problem is that the combination of such a cast with type-cases

will lead to unintuitive results under the standard operational

semantics of type-cases and casts.

...

...

subtype of $t$. In standard gradual semantics, \code{\Cast{\dyn}{42}} is a value.

And this value is of type \code{\dyn}, which is not a subtype of \code{number}.

Therefore the check in \code{foo} would fail for \code{\Cast{\dyn}{42}}, and so

would the whole function call.


Although this behavior is type safe, this is the opposite of

what every programmer would expect: one would expect the test

\code{(typeof($e$)==="number")} to return true for \code{\Cast{\dyn}{42}}

and false for, say, \code{\Cast{\dyn}{true}}, whereas

...

...

the standard semantics of type-cases would return false in both cases.

A solution is to modify the semantics of type-cases, and in particular of

\code{typeof}, to strip off all the casts in a value, even nested ones. This

however adds a new overhead at runtime. Another solution is to simply accept


this counter-intuitive result, which has at least the benefit of promoting

the dynamic type to a first class type, instead of just considering it as a


directive to the front-end. Indeed, this would make it possible to dynamically check

whether some argument has the dynamic type \code{\dyn} (i.e., whether it was


applied to a cast to such a type) simply by \code{(typeof($e$)==="\dyn")}.

Whatever solution we choose it is clear that in both cases it would be much

better if the application \code{foo(42)} were compiled as is, thus getting

rid of a cast that at best is useless and at worst gives a counter-intuitive and

...

...

unexpected semantics.

This is where the previous section about refining function types comes in handy.

To get rid of all superfluous casts, we have to fully exploit the information


provided to us by occurrence typing and deduce for the function in~\eqref{foo3} the type


number)$\to$string)}, so that no cast is inserted when the

function is applied to a number.

To achieve this, we simply modify the typing rule for functions that we defined


in the previous section to accommodate gradual typing. Let $\sigma$ and $\tau$ range over \emph{gradual types}, that is, the types produced by the grammar in Definition~\ref{def:types} extended with \dyn{} as a basic type (see~\citet{castagna2019gradual} for the definition of the subtyping relation on these types). For every gradual type

$\tau$, define $\tauUp$ as the (non-gradual) type obtained from $\tau$ by replacing all

covariant occurrences of \dyn{} by \Any{} and all contravariant ones by \Empty. The

type $\tauUp$ can be seen as the \emph{maximal} interpretation of $\tau$, that is,

every expression that can safely be cast to $\tau$ is of type $\tauUp$. In

other words, if a function expects an argument of type $\tau$ but can be


typed under the hypothesis that the argument has type $\tauUp$, then no casts
are needed, since every cast that succeeds will be to a subtype of
$\tauUp$. Taking advantage of this property, we modify the rule for
functions as follows:

\beppe{Problem: how to compile functions with intersection types, since their body is typed several distinct types. I see two possible solutions: either merge the casts of the various typings (as we did in the compilation of polymorphic functions for semantic subtyping) or allow just one gradual type in the intersection when a function is explicitly typed (reasonable since, why would you use more gradual types in an intersection?)}

TypeScript and Flow are extensions of JavaScript that allow the programmer to specify in the code type annotations used to statically type-check the program. For instance, the following function definition is valid in both languages:

\begin{alltt}\color{darkblue}

function foo(x\textcolor{darkred}{ : number | string}) \{
  (typeof(x) === "number")? x++ : x.length
\}
\end{alltt}