Commit 44b5089b authored by Giuseppe Castagna's avatar Giuseppe Castagna
Browse files

splitting section on refinement

parent fc9c6b84
......@@ -208,6 +208,9 @@ Finally, we add the following typing rules:
{\recupd{e_1}{\ell}{e_2} \not\in\dom\Gamma}
\end{mathpar}
\subsection{Refining function types}
\input{refining}
\subsection{Integrating gradual typing}
......@@ -297,154 +300,7 @@ order to avoid several passes on the parse tree).
thus, different results. This is not nice}.
The case of study is the following function
\begin{alltt}\color{darkblue}
function (x \textcolor{darkred}{: \(\tau\)}) \{
(x \(\in\) Real) ?
? \{ (x \(\in\) Int) ? succ x : sqrt x \}
: \{ \(\neg\)x \}
\}
\end{alltt}
first we consider the case without gradual typing. In particular when
$\tau$ is \code{Real|Bool} (we suppose that \code{Int} is a subtype of \code{Real}) we want to deduce for this function the
type
\code{$(\Int\to\Int)\wedge(\Real\backslash\Int\to\Complex)\wedge(\Bool\to\Bool)$}. No
cast is inserted, of course.
Then we consider gradual typing and in particular when $\tau$ is
\code{?}. Then we want to deduce for this function the type
\code{$(\Int\to\Int)\wedge(\Real\backslash\Int\to\Complex)\wedge(?\backslash\Real\to\Bool)$},
and a cast is inserted on the last occurrence of \code x\ which is
compiled as \code{\(\neg\)(\Cast{Bool}{x})}.
If we omit $\tau$ and imagine that this is syntactic sugar for the
parameter to be typed with \Any\ (we do not consider polymorphism),
then the function must be rejected (since it tries to type
\code{\(\neg\)x} under the assumption that \code x\ has type
\code{$\neg\Real$}. Notice that this last syntactic sugar (i.e., typing function parameter by $\Any$ when type annotation are omitted) allows us to
capture user-defined discrimination as defined by ~\citet{THF10}
since, for instance
\begin{alltt}\color{darkblue}
let is_int x = (x\(\in\)Int)? true : false
in if is_int z then z+1 else 42
\end{alltt}
is well typed since the function \code{is\_int} is given type $(\Int\to\True)\wedge(\neg\Int\to\False)$
Here is how to do it first for non-gradual types.
First we define the auxiliary deduction system that collects type information on the variable lambda abstracted (i.e., those in the domain of $\Gamma$):
\begin{mathpar}
\Infer[STL]
{
}
{ \Gamma \vdash e \triangleright\{\Gamma\} }
{ e\in STL}
\\
\Infer[Abs]
{\Gamma,x:s\vdash e\triangleright\psi}
{
\Gamma\vdash\lambda x:s.e\triangleright\psi\setminus\{x\}
}
{}
\Infer[App]
{
\Gamma \vdash e_1\triangleright\psi_1 \\
\Gamma\vdash e_2\triangleright\psi_2
}
{ \Gamma \vdash {e_1}{e_2}\triangleright\psi_1\cup\psi_2 }
{}
\\
\Infer[If]
{\Gamma\vdash e\triangleright\psi_\circ\\
\Gamma, \Gamma^+_{\Gamma,e,t}\vdash e_1\triangleright\psi_1\\
\Gamma, \Gamma^-_{\Gamma,e,t}\vdash e_2\triangleright\psi_2}
{\Gamma\vdash \ite e t {e_1}{e_2}\triangleright\psi_\circ\cup\psi_1\cup\psi_2}
{}
\\
\Infer[Pair]
{\Gamma \vdash e_1\triangleright\psi_1 \and \Gamma \vdash e_2\triangleright\psi_2}
{\Gamma \vdash (e_1,e_2)\triangleright\psi_1\cup\psi_2}
{}
\Infer[Proj]
{\Gamma \vdash e\triangleright\psi}
{\Gamma \vdash \pi_i e\triangleright\psi}
{}
\\\end{mathpar}
Where
\begin{enumerate}
\item STL is the set of expressions in simply-typed lambda calculus (i.e., expression without any type-case occurring in them),
\item $\{\Gamma\}$ is the function that maps $\Gamma(e)$ into $\{\Gamma(e)\}$ if $e$ is a variable and undefined otherwise
\item $\psi$ ranges on functions that map variables into set of types, $\psi\setminus\{x\}$ is the function defined as $\psi$ but undefined on $x$.
\item $\psi_1\cup\psi_2$ denote component-wise union.
\end{enumerate}
And now all we need to do is to modify the typing rule for lambdas as follows
\begin{mathpar}
\Infer[Abs]
{\Gamma,x:t\vdash e\triangleright\psi\and \forall i\in I\quad \Gamma,x:s_i\vdash e:t_i }
{
\Gamma\vdash\lambda x:t.e:\textstyle\bigwedge_{i\in I}s_i\to t_i
}
{\psi(x)=\{s_i\alt i\in I\}}
\end{mathpar}
Note the invariant that the domain of $\psi$ is always equal to the domain of $\Gamma$ restricted to variables. It is not too difficult to generalize this rule when the lambda is typed by an intersection type.
This system is enough to type our case study for the case $\tau$ defined as \code{Real|Bool}. Indeed the analysis of the body yields $\psi(x)=\{\Int,\Real\setminus\Int\}$ for the branch \code{(x $\in$ Int) ? succ x : sqrt x} and, since \code{$(\Bool\vee\Real)\setminus\Real = \Bool$}, yields $\psi(x)=\{\Bool\}$ for the branch \code{$\neg$x}. So the function will be checked for the input types $\Int$, $\Real\setminus\Int$, and \Bool, yielding the expected result.
{\color{darkred} RANDOM THOUGHTS:
A delicate point is the definition of the union
$\psi_1\cup\psi_2$. The simplest possible definition is the
component-wise union. This definition is enough to avoid the problem
of typecases on casted values and it is also enough to type our case
study, as we showed before. However if we want a more precise typing
discipline we may want to consider a more sophisticated way of
combining the information collected by the various $\psi$. For
instance, consider the following pair
\[\code{( \ite x s {e_1}{e_2} , \ite x t {e_3}{e_4} )}\]
Then we have $x:\Any\vdash \code{( \ite x s {e_1}{e_2} , \ite x t
{e_3}{e_4} )}\triangleright x\mapsto\{s,\neg s, t, \neg t\}$. However if this
code is the body of a function with parameter $x$, then it may be
tempting to try to produce a finer-grained analysis: for example,
instead of checking as input type just $s$ and $t$ one could check
instead $s\setminus t$, $t\setminus s$, and $s\wedge t$, whenever these
three types are not empty. This can be obtained by defining the union
operation as follows:
\[ (\psi_0\cup\psi_1)(x)=\{ t \alt \exists t_1\in\psi_1(x), t_2\in\psi_2(x), t=t_1\setminus t_2 \text{ or } t=t_2\setminus t_1 \text{ or } t=t_1\wedge t_2\text{ and } t\not\simeq\Empty\}\]
Do we really gain in precision? I think the gain is minimum. All we may obtain just come from a polymorphic use of the variable, but we can hardly gain more. Probably it is not worth the effort. As a concrete case consider
\[\code{function x \{ ( \ite x {\texttt{String|Bool}} {x}{x} , \ite x {\texttt{Bool|Int}} {x}{x} )} \}\]
So what?
A simple solution would be to define the union as follows
\begin{equation}
(\psi_0\cup\psi_1)(x)=
\left\{\begin{array}{ll}
\psi_i(x) &\text{ if }\psi_i(x)\subseteq\psi_{((i+1)\,\text{mod}\,2)}(x)\\
\psi_0(x)\cup\psi_1(x) &\text{ otherwise}
\end{array}\right.
\end{equation}
and $\psi_1(x)\subseteq\psi_2(x)\iff \forall\tau\in\psi_1(x),\exists \tau'\in\psi_2(x),\tau\leq\tau'$
This would be interesting to avoid to use as domains those in
$\psi_\circ$ that would be split in two in the $\psi_1$ and $\psi_2$
of the ``if'' ... but it is probably better to check it by modifying
the rule of ``if'' (that is, add $\psi_\circ$ only for when it brings new
information).
}
------------------
Finally to take into account gradual typing we need a little bit of
extra work, but just in this last typing rule, all the rest remains
......
Both TypeScript and Flow deduce for the function \code{foo} in the introduction the type $\code{number | string}\to\code{number}$.\footnote{Actually they deduce as return type \code{number | string | null}, but it does not matter}
Consider the following modification of our example.
\begin{alltt}\color{darkblue}
function foo(x\textcolor{darkred}{ : number | string}) \{
(typeof(x) === "number")? x++ : x.tile()
\}
\end{alltt}
where \code{tile()} is the JavaScript library function that removes all blank spaces from a string. Now the type deduce is \code{number | string}$\to$\code{number | string} but it is clear that a better type for this function would be the intersection type ...
Currently it is possible to deduce this type both in Typescript and Flow by explicit annotations. But it is clear that that the current use of typeof should be enough to suggest the system to check separatly the types for \code{number} and for \code{string} (i.e., \code{number | string}$\setminus$\code{number}). )
More generally we can refine the type of a function with intersection in may different ways, but we use the analysis of our occurrence typing to do it
\vdots
\bigskip
\vdots
The case of study is the following function
\begin{alltt}\color{darkblue}
function (x \textcolor{darkred}{: \(\tau\)}) \{
(x \(\in\) Real) ?
? \{ (x \(\in\) Int) ? succ x : sqrt x \}
: \{ \(\neg\)x \}
\}
\end{alltt}
first we consider the case without gradual typing. In particular when
$\tau$ is \code{Real|Bool} (we suppose that \code{Int} is a subtype of \code{Real}) we want to deduce for this function the
type
\code{$(\Int\to\Int)\wedge(\Real\backslash\Int\to\Complex)\wedge(\Bool\to\Bool)$}. No
cast is inserted, of course.
Then we consider gradual typing and in particular when $\tau$ is
\code{?}. Then we want to deduce for this function the type
\code{$(\Int\to\Int)\wedge(\Real\backslash\Int\to\Complex)\wedge(?\backslash\Real\to\Bool)$},
and a cast is inserted on the last occurrence of \code x\ which is
compiled as \code{\(\neg\)(\Cast{Bool}{x})}.
If we omit $\tau$ and imagine that this is syntactic sugar for the
parameter to be typed with \Any\ (we do not consider polymorphism),
then the function must be rejected (since it tries to type
\code{\(\neg\)x} under the assumption that \code x\ has type
\code{$\neg\Real$}. Notice that this last syntactic sugar (i.e., typing function parameter by $\Any$ when type annotation are omitted) allows us to
capture user-defined discrimination as defined by ~\citet{THF10}
since, for instance
\begin{alltt}\color{darkblue}
let is_int x = (x\(\in\)Int)? true : false
in if is_int z then z+1 else 42
\end{alltt}
is well typed since the function \code{is\_int} is given type $(\Int\to\True)\wedge(\neg\Int\to\False)$
Here is how to do it first for non-gradual types.
First we define the auxiliary deduction system that collects type information on the variable lambda abstracted (i.e., those in the domain of $\Gamma$):
\begin{mathpar}
\Infer[STL]
{
}
{ \Gamma \vdash e \triangleright\{\Gamma\} }
{ e\in STL}
\\
\Infer[Abs]
{\Gamma,x:s\vdash e\triangleright\psi}
{
\Gamma\vdash\lambda x:s.e\triangleright\psi\setminus\{x\}
}
{}
\Infer[App]
{
\Gamma \vdash e_1\triangleright\psi_1 \\
\Gamma\vdash e_2\triangleright\psi_2
}
{ \Gamma \vdash {e_1}{e_2}\triangleright\psi_1\cup\psi_2 }
{}
\\
\Infer[If]
{\Gamma\vdash e\triangleright\psi_\circ\\
\Gamma, \Gamma^+_{\Gamma,e,t}\vdash e_1\triangleright\psi_1\\
\Gamma, \Gamma^-_{\Gamma,e,t}\vdash e_2\triangleright\psi_2}
{\Gamma\vdash \ite e t {e_1}{e_2}\triangleright\psi_\circ\cup\psi_1\cup\psi_2}
{}
\\
\Infer[Pair]
{\Gamma \vdash e_1\triangleright\psi_1 \and \Gamma \vdash e_2\triangleright\psi_2}
{\Gamma \vdash (e_1,e_2)\triangleright\psi_1\cup\psi_2}
{}
\Infer[Proj]
{\Gamma \vdash e\triangleright\psi}
{\Gamma \vdash \pi_i e\triangleright\psi}
{}
\\\end{mathpar}
Where
\begin{enumerate}
\item STL is the set of expressions in simply-typed lambda calculus (i.e., expression without any type-case occurring in them),
\item $\{\Gamma\}$ is the function that maps $\Gamma(e)$ into $\{\Gamma(e)\}$ if $e$ is a variable and undefined otherwise
\item $\psi$ ranges on functions that map variables into set of types, $\psi\setminus\{x\}$ is the function defined as $\psi$ but undefined on $x$.
\item $\psi_1\cup\psi_2$ denote component-wise union.
\end{enumerate}
And now all we need to do is to modify the typing rule for lambdas as follows
\begin{mathpar}
\Infer[Abs]
{\Gamma,x:t\vdash e\triangleright\psi\and \forall i\in I\quad \Gamma,x:s_i\vdash e:t_i }
{
\Gamma\vdash\lambda x:t.e:\textstyle\bigwedge_{i\in I}s_i\to t_i
}
{\psi(x)=\{s_i\alt i\in I\}}
\end{mathpar}
Note the invariant that the domain of $\psi$ is always equal to the domain of $\Gamma$ restricted to variables. It is not too difficult to generalize this rule when the lambda is typed by an intersection type.
This system is enough to type our case study for the case $\tau$ defined as \code{Real|Bool}. Indeed the analysis of the body yields $\psi(x)=\{\Int,\Real\setminus\Int\}$ for the branch \code{(x $\in$ Int) ? succ x : sqrt x} and, since \code{$(\Bool\vee\Real)\setminus\Real = \Bool$}, yields $\psi(x)=\{\Bool\}$ for the branch \code{$\neg$x}. So the function will be checked for the input types $\Int$, $\Real\setminus\Int$, and \Bool, yielding the expected result.
{\color{darkred} RANDOM THOUGHTS:
A delicate point is the definition of the union
$\psi_1\cup\psi_2$. The simplest possible definition is the
component-wise union. This definition is enough to avoid the problem
of typecases on casted values and it is also enough to type our case
study, as we showed before. However if we want a more precise typing
discipline we may want to consider a more sophisticated way of
combining the information collected by the various $\psi$. For
instance, consider the following pair
\[\code{( \ite x s {e_1}{e_2} , \ite x t {e_3}{e_4} )}\]
Then we have $x:\Any\vdash \code{( \ite x s {e_1}{e_2} , \ite x t
{e_3}{e_4} )}\triangleright x\mapsto\{s,\neg s, t, \neg t\}$. However if this
code is the body of a function with parameter $x$, then it may be
tempting to try to produce a finer-grained analysis: for example,
instead of checking as input type just $s$ and $t$ one could check
instead $s\setminus t$, $t\setminus s$, and $s\wedge t$, whenever these
three types are not empty. This can be obtained by defining the union
operation as follows:
\[ (\psi_0\cup\psi_1)(x)=\{ t \alt \exists t_1\in\psi_1(x), t_2\in\psi_2(x), t=t_1\setminus t_2 \text{ or } t=t_2\setminus t_1 \text{ or } t=t_1\wedge t_2\text{ and } t\not\simeq\Empty\}\]
Do we really gain in precision? I think the gain is minimum. All we may obtain just come from a polymorphic use of the variable, but we can hardly gain more. Probably it is not worth the effort. As a concrete case consider
\[\code{function x \{ ( \ite x {\texttt{String|Bool}} {x}{x} , \ite x {\texttt{Bool|Int}} {x}{x} )} \}\]
So what?
A simple solution would be to define the union as follows
\begin{equation}
(\psi_0\cup\psi_1)(x)=
\left\{\begin{array}{ll}
\psi_i(x) &\text{ if }\psi_i(x)\subseteq\psi_{((i+1)\,\text{mod}\,2)}(x)\\
\psi_0(x)\cup\psi_1(x) &\text{ otherwise}
\end{array}\right.
\end{equation}
and $\psi_1(x)\subseteq\psi_2(x)\iff \forall\tau\in\psi_1(x),\exists \tau'\in\psi_2(x),\tau\leq\tau'$
This would be interesting to avoid to use as domains those in
$\psi_\circ$ that would be split in two in the $\psi_1$ and $\psi_2$
of the ``if'' ... but it is probably better to check it by modifying
the rule of ``if'' (that is, add $\psi_\circ$ only for when it brings new
information).
}
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment