practical.tex 14.8 KB
Newer Older
Giuseppe Castagna's avatar
Giuseppe Castagna committed
1
We have implemented the algorithmic system $\vdashA$ we presented in Section~\ref{sec:algorules}. Our
Giuseppe Castagna's avatar
Giuseppe Castagna committed
2
implementation is written in OCaml and uses CDuce as a library to
Giuseppe Castagna's avatar
Giuseppe Castagna committed
3
provide the semantic subtyping machinery. Besides the type-checking
Kim Nguyễn's avatar
Kim Nguyễn committed
4
algorithm defined on the base language, our implementation supports
Giuseppe Castagna's avatar
Giuseppe Castagna committed
5
the record types and expressions of Section \ref{ssec:struct} and the refinement of
6
7
8
9
10
11
12
13
14
15
function types 
\iflongversion%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
described in Section \ref{sec:refining}. Furthermore, our implementation uses for the inference of arrow types
the following improved rule:
\input{optimize}
\else
of Section \ref{sec:refining} with the rule of
Appendix~\ref{app:optimize}.
\fi%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

16
17
18
19
The implementation is rather crude and consists of 2000 lines of OCaml code,
including parsing, type-checking of programs, and pretty printing of types. We
demonstrate the output of our type-checking implementation in
Table~\ref{tab:implem} and Table~\ref{tab:implem2}. Table~\ref{tab:implem} lists
Giuseppe Castagna's avatar
Giuseppe Castagna committed
20
some examples, none of which can be typed by current systems. Even though some
21
22
systems such as Flow and TypeScript can type some of these examples by adding
explicit type annotations, the code 6, 7, 9, and 10 in Table~\ref{tab:implem}
Giuseppe Castagna's avatar
Giuseppe Castagna committed
23
24
and, even more,  the \code{and\_} and \code{xor\_} functions given
in \eqref{and+} and \eqref{xor+} later in this
25
section are out of reach of current systems, even when using the right explicit
Giuseppe Castagna's avatar
Giuseppe Castagna committed
26
annotations. Table~\ref{tab:implem2} allows for a direct comparison of with
27
28
29
\cite{THF10} be giving the type inferred for the fourteen examples given in that
work.

Giuseppe Castagna's avatar
Giuseppe Castagna committed
30
It should be noted that for all the examples we present, the
31
32
time for the type inference process is less than 5ms, hence we do not report
precise timings in the table. These and other examples can be tested in the
Giuseppe Castagna's avatar
Giuseppe Castagna committed
33
\ifsubmission
34
35
anonymized
\else
Giuseppe Castagna's avatar
Giuseppe Castagna committed
36
\fi
37
38
online toplevel available at
\url{https://occtyping.github.io/}%
Kim Nguyễn's avatar
Kim Nguyễn committed
39
\input{code_table}
Kim Nguyễn's avatar
Kim Nguyễn committed
40

41
In both tables, the second column gives a code fragment and the third
Giuseppe Castagna's avatar
Giuseppe Castagna committed
42
43
44
45
46
47
column the type deduced by our implementation as is (we
pretty printed it but we did not alter the output). Code~1 is a
straightforward function similar to our introductory
example \code{foo} in (\ref{foo}) and (\ref{foo2}) where \code{incr}
is the successor function and \code{lneg} the logical negation for
Booleans. Here the
Giuseppe Castagna's avatar
Giuseppe Castagna committed
48
programmer annotates the parameter of the function with a coarse type
49
$\Int\vee\Bool$. Our implementation first type-checks the body of the
Giuseppe Castagna's avatar
Giuseppe Castagna committed
50
function under this assumption, but doing so it collects that the type of
Giuseppe Castagna's avatar
Giuseppe Castagna committed
51
52
53
$\texttt{x}$ is specialized to \Int{} in the ``then'' case and to \Bool{}
in the ``else'' case. The function is thus type-checked twice more
under each hypothesis for \texttt{x}, yielding the precise type
Giuseppe Castagna's avatar
Giuseppe Castagna committed
54
55
56
57
$(\Int\to\Int)\land(\Bool\to\Bool)$. Note that w.r.t.\
rule \Rule{AbsInf+} of Section~\ref{sec:refining}, the
rule  \Rule{AbsInf++} we use int the  implementation improved the output of the computed
type. Indeed, using rule~[{\sc AbsInf}+] we would have obtained the
Kim Nguyễn's avatar
Kim Nguyễn committed
58
59
type
$(\Int\to\Int)\land(\Bool\to\Bool)\land(\Bool\vee\Int\to\Bool\vee\Int)$
Giuseppe Castagna's avatar
Giuseppe Castagna committed
60
with a redundant arrow. Here we can see that, since we deduced
61
the first two arrows $(\Int\to\Int)\land(\Bool\to\Bool)$, and since
Giuseppe Castagna's avatar
Giuseppe Castagna committed
62
the union of their domain exactly covers the domain of the third arrow, then
63
the latter is not needed. Code~2 shows what happens when the argument
Giuseppe Castagna's avatar
Giuseppe Castagna committed
64
of the function is left unannotated (i.e., it is annotated by the top
Giuseppe Castagna's avatar
Giuseppe Castagna committed
65
type \Any, written ``\texttt{Any}'' in our implementation). Here
66
67
type-checking and refinement also work as expected, but the function
only type checks if all cases for \texttt{x} are covered (which means
Giuseppe Castagna's avatar
Giuseppe Castagna committed
68
69
that the function must handle the case of inputs that are neither in \Int{}
nor in \Bool).
70
71
72

The following examples paint a more interesting picture. First
(Code~3) it is
Giuseppe Castagna's avatar
typos    
Giuseppe Castagna committed
73
easy in our formalism to program type predicates such as those
Giuseppe Castagna's avatar
Giuseppe Castagna committed
74
hard-coded in the $\lambda_{\textit{TR}}$ language of \citet{THF10}. Such type
75
76
predicates, which return \texttt{true} if and only if their input has
a particular type, are just plain functions with an intersection
Giuseppe Castagna's avatar
typos    
Giuseppe Castagna committed
77
type inferred by the system of Section~\ref{sec:refining}. We next define Boolean connectives as overloaded
Giuseppe Castagna's avatar
Giuseppe Castagna committed
78
functions. The \texttt{not\_} connective (Code~4) just tests whether its
79
argument is the Boolean \texttt{true} by testing that it belongs to
80
81
82
83
84
the singleton type \True{} (the type whose only value is
\texttt{true}) returning \texttt{false} for it and \texttt{true} for
any other value (recall that $\neg\True$ is equivalent to
$\texttt{Any\textbackslash}\True$). It works on values of any type,
but we could restrict it to Boolean values by simply annotating the
Giuseppe Castagna's avatar
Giuseppe Castagna committed
85
parameter by \Bool{} (which, in the CDuce types that our system uses,  is syntactic sugar for
86
87
88
\True$\vee$\False) yielding the type
$(\True{\to}\False)\wedge(\False{\to}\True)$.
The \texttt{or\_} connective (Code~5) is straightforward as far as the
Giuseppe Castagna's avatar
Giuseppe Castagna committed
89
code goes, but we see that the overloaded type precisely captures all
Giuseppe Castagna's avatar
Giuseppe Castagna committed
90
91
92
possible cases: the function returns \code{false} if and only if both
arguments are of type $\neg\True$, that is, they are any value
different from \code{true}. Again we use a generalized version of the
Giuseppe Castagna's avatar
Giuseppe Castagna committed
93
\texttt{or\_} connective that accepts and treats any value that is not
94
\texttt{true} as \texttt{false} and again, we could easily restrict the
95
96
domain to \Bool{} if desired.\\
\indent
Giuseppe Castagna's avatar
Giuseppe Castagna committed
97
To showcase the power of our type system, and in particular of
98
the ``$\worra{}{}$''
Giuseppe Castagna's avatar
Giuseppe Castagna committed
99
type operator, we define \texttt{and\_} (Code~6) using De Morgan's
100
Laws instead of
Giuseppe Castagna's avatar
Giuseppe Castagna committed
101
using a direct definition. Here the application of the outermost \texttt{not\_} operator is checked against type \True. This
102
allows the system to deduce that the whole \texttt{or\_} application
Giuseppe Castagna's avatar
typos    
Giuseppe Castagna committed
103
has type \False, which in turn leads to \texttt{not\_\;x} and
Victor Lanvin's avatar
Fixes    
Victor Lanvin committed
104
\texttt{not\_\;y} to have type $\lnot \True$ and therefore both \texttt{x}
105
106
107
108
109
110
and \texttt{y} to have type \True. The whole function is typed with
the most precise type (we present the type as printed by our
implementation, but the first arrow of the resulting type is
equivalent to
$(\True\to\lnot\True\to\False)\land(\True\to\True\to\True)$).

Giuseppe Castagna's avatar
Giuseppe Castagna committed
111
All these type predicates and Boolean connectives can be used together
112
113
114
to write complex type tests, as in Code~7. Here we define a function
\texttt{f} that takes two arguments \texttt{x} and \texttt{y}. If
\texttt{x} is an integer and \texttt{y} a Boolean, then it returns the
Giuseppe Castagna's avatar
Giuseppe Castagna committed
115
116
integer \texttt{1}; if \texttt{x} is a character or
\texttt{y} is an integer, then it returns \texttt{2}; otherwise the
117
function returns \texttt{3}. Our system correctly deduces a (complex)
Giuseppe Castagna's avatar
Giuseppe Castagna committed
118
119
intersection type that covers all cases (plus several redundant arrow
types). That this type is as precise as possible can be shown by the fact that
120
when applying
Giuseppe Castagna's avatar
Giuseppe Castagna committed
121
122
\texttt{f} to arguments of the expected type, the \emph{type}
statically deduced for the
Giuseppe Castagna's avatar
Giuseppe Castagna committed
123
whole expression is the singleton type \texttt{1}, or \texttt{2},
124
125
126
or \texttt{3}, depending on the type of the arguments.

Code~8 allows us to demonstrate the use and typing of record paths. We
Giuseppe Castagna's avatar
Giuseppe Castagna committed
127
model, using open records, the type of DOM objects that represent XML
128
or HTML documents. Such objects possess a common field
Giuseppe Castagna's avatar
Giuseppe Castagna committed
129
\texttt{nodeType} containing an integer constant denoting the kind of
130
the node (e.g., \p{1} for an element node, \p{3} for a text node, \ldots). Depending on the kind, the object will have
131
132
different fields and methods. It is common practice to perform a test
on the value of the \texttt{nodeType} field. In dynamic languages such
133
134
135
as JavaScript, the relevant field can directly be accessed
after having checked for the appropriate \texttt{nodeType}, whereas
in statically typed languages such as Java, a downward cast
136
from the generic \texttt{Node} type to the expected precise type of
Giuseppe Castagna's avatar
Giuseppe Castagna committed
137
the object is needed. We can see that using the extension presented in
138
139
140
141
142
143
144
Section~\ref{ssec:struct} we can deduce the correct type for
\texttt{x} in all cases. Of particular interest is the last case,
since we use a type case to check the emptiness of the list of child
nodes. This splits, at the type level, the case for the \Keyw{Element}
type depending on whether the content of the \texttt{childNodes} field
is the empty list or not.

Giuseppe Castagna's avatar
Giuseppe Castagna committed
145
Code~9 shows the usefulness of the rule \Rule{OverApp}.
146
Consider the definition of the \texttt{xor\_} operator.
147
148
Here the rule~[{\sc AbsInf}+] is not sufficient to precisely type the
function, and using only this rule would yield a type
Giuseppe Castagna's avatar
Giuseppe Castagna committed
149
150
151
 $\Any\to\Any\to\Bool$.
\iflongversion
Let us follow the behavior of the
152
153
 ``$\worra{}{}$'' operator. Here the whole \texttt{and\_} is requested
 to have type \True, which implies that \texttt{or\_ x y} must have
Giuseppe Castagna's avatar
Giuseppe Castagna committed
154
 type \True. This can always happen, whether \texttt{x} is \True{} or
155
156
 not (but then depends on the type of \texttt{y}). The ``$\worra{}{}$''
 operator correctly computes that the type for \texttt{x} in the
Giuseppe Castagna's avatar
Giuseppe Castagna committed
157
 ``\texttt{then}'' branch is $\True\vee\lnot\True\lor\True\simeq\Any$,
158
and a similar reasoning holds for \texttt{y}.
Giuseppe Castagna's avatar
Giuseppe Castagna committed
159
\fi%%%%%%%%%%%%%%
160
However, since \texttt{or\_} has type
Giuseppe Castagna's avatar
Giuseppe Castagna committed
161
162
163
164
165
%\\[.7mm]\centerline{%
$(\True\to\Any\to\True)\land(\Any\to\True\to\True)\land
   (\lnot\True\to\lnot\True\to\False)$
%}\\[.7mm]
then the rule \Rule{OverApp} applies and \True, \Any, and $\lnot\True$ become candidate types for
Giuseppe Castagna's avatar
Giuseppe Castagna committed
166
\texttt{x}, which allows us to deduce the precise type given in the table. Finally, thanks to rule \Rule{OverApp} it is not  necessary to use a type case to force refinement. As a consequence, we can define the functions \texttt{and\_} and \texttt{xor\_} more naturally as:
Giuseppe Castagna's avatar
Giuseppe Castagna committed
167
168
169
\begin{alltt}\color{darkblue}
  let and_ = fun (x : Any) -> fun (y : Any) -> not_ (or_ (not_ x) (not_ y)) \refstepcounter{equation}          \mbox{\color{black}\rm(\theequation)}\label{and+}
  let xor_ = fun (x : Any) -> fun (y : Any) -> and_ (or_ x y) (not_ (and_ x y)) \refstepcounter{equation}      \mbox{\color{black}\rm(\theequation)}\label{xor+}
Giuseppe Castagna's avatar
Giuseppe Castagna committed
170
171
\end{alltt}
for which the very same types as in Table~\ref{tab:implem} are deduced.
Kim Nguyễn's avatar
.    
Kim Nguyễn committed
172

173
174
As for Code~10 (corresponding to our introductory
example~\eqref{nest1}), it illustrates the need for iterative refinement of
Giuseppe Castagna's avatar
Giuseppe Castagna committed
175
type environments, as defined in Section~\ref{sec:typenv}. As
176
explained, a single pass analysis would deduce
Kim Nguyễn's avatar
.    
Kim Nguyễn committed
177
for {\tt x}
Giuseppe Castagna's avatar
Giuseppe Castagna committed
178
a type \Int{} from the {\tt f\;x} application and \Any{} from the {\tt g\;x}
Kim Nguyễn's avatar
.    
Kim Nguyễn committed
179
application. Here by iterating a second time, the algorithm deduces
Giuseppe Castagna's avatar
Giuseppe Castagna committed
180
that {\tt x} has type $\Empty$ (i.e., $\textsf{Empty}$), that is, that the first branch can never
Giuseppe Castagna's avatar
Giuseppe Castagna committed
181
be selected (and our implementation warns the user accordingly). In hindsight, the only way for a well-typed overloaded function to have
Giuseppe Castagna's avatar
empty    
Giuseppe Castagna committed
182
type $(\Int{\to}\Int)\land(\Any{\to}\Bool)$ is to diverge when the
Giuseppe Castagna's avatar
typo    
Giuseppe Castagna committed
183
argument is of type \Int: since this intersection type states that
Kim Nguyễn's avatar
.    
Kim Nguyễn committed
184
185
186
187
whenever the input is \Int, {\em both\/} branches can be selected,
yielding a result that is at the same time an integer and a Boolean.
This is precisely reflected by the case $\Int\to\Empty$ in the result.
Indeed our {\tt example10} function can be applied to an integer, but
Giuseppe Castagna's avatar
Giuseppe Castagna committed
188
at runtime the application of {\tt f\,x} will diverge.
189

190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
Code~11 simulates the behaviour of Javascript property resolution, by looking
for a property \texttt{l} either in the object \texttt{o} itself or in the
chained list of its \texttt{prototype} objects. In that example, we first model
prototype chaining by defining a type \texttt{Object} that can be either the
atom \texttt{Null} or any record with a \texttt{prototype} field which contains
(recursively) an \texttt{Object}. To ease the reading, we defined a recursive
type \texttt{ObjectWithPropertyL} which is either a record with a field
\texttt{l} or a record with a prototype of type \texttt{ObjectWithPropertyL}. We
can then define two predicate function \texttt{has\_property\_l} and
\texttt{has\_own\_property\_l} that tests whether an object has a property
through its prototype or directly. Lastly we can define a function
\texttt{get\_property\_l} which directly access the field if it is present, or
recursively search for it through the prototype chain (in our syntax, the
paremeter \texttt{self} allows one to refer to the function itself). Of
particular interest is the type deduced for the two predicate functions. Indeed,
we can see that \texttt{has\_own\_property\_l} is given an overloaded type whose
first argument is in each case a recursive record type that describe precisely
whether \texttt{l} is present at some point in the list or not (recall that
in a record type such as $\orecord{ l=?\Empty }$, indicate that field \texttt{l}
is absent for sure).

\input{code_table2}
In Table~\ref{tab:implem2}, we convert in our syntax the 14 examples of
\cite{THF10} (we tried to complete such examples with neutral code when they
were incomplete in the original paper). Of these 14 examples, Example~1 to 13
depict combinations of type predicates (such as \texttt{is\_int}) used either
directly or through Boolean predicates (such as the \texttt{or\_} function
previously defined). Note that for all examples for which there was no explicit
indication in the original version, we \emph{infer} the type of the function.
Notice also that for Example~6, the goal of the example is to show that indeed,
the function is ill-typed (which our typechecker detects accurately). The
Giuseppe Castagna's avatar
Giuseppe Castagna committed
221
original Example~14 could be written in our syntax as:
222
223
224
225
226
227
228
229
230
231
232
233

\begin{verbatim}
  let example14_alt = fun (input : Int | String) ->
  fun (extra : (Any, Any)) ->
   if (input, is_int (fst extra)) is (Int, True) then
      add input (fst extra)
   else if is_int (fst extra) is True 
   then add (strlen input) (fst extra)
   else 0
\end{verbatim}
Notice, in the version above the absence of an occurrence of \texttt{input} in
 the second \texttt{if} statement. Indeed, if the first test fails, it is either
Giuseppe Castagna's avatar
Giuseppe Castagna committed
234
 because \texttt{is\_int (fst extra)} is not \texttt{True} (i.e., if
235
236
237
238
 \texttt{fst extra} is not an integer) or because \texttt{input} is not an
 integer. Therefore, in our setting, the type information propagated to the
 second test is : $\texttt{(input, is\_int (fst extra))} \in \lnot (\Int,
 \True)$, that is $\texttt{(input, is\_int (fst extra))} \in (\lnot\Int,
Giuseppe Castagna's avatar
Giuseppe Castagna committed
239
240
 \True)\lor(\Int, \False)$. Therefore, the type deduced for \texttt{input} in the second
 branch is $(\String\lor\Int) \land (\lnot\Int \lor \Int) =
241
242
243
244
245
246
 \String\lor\Int$ which is not precise enough. By adding an occurrence of
 \texttt{input} in our \texttt{example14\_alt}, we can further restrict its type
 typecheck the function. Lifting this limitation through a control-flow analysis
 is part of our future work.


247
Although these experiments are still preliminary, they show how the
Giuseppe Castagna's avatar
Giuseppe Castagna committed
248
combination of occurrence typing and set-theoretic types, together
249
with the type inference for overloaded function types presented in
Giuseppe Castagna's avatar
Giuseppe Castagna committed
250
251
Section~\ref{sec:refining} goes beyond what languages like
TypeScript and Flow do, since they can only infer single arrow types.
252
Our refining of overloaded
Giuseppe Castagna's avatar
Giuseppe Castagna committed
253
functions is also future-proof and resilient to extensions: since it ``retypes'' functions
254
 using information gathered by the typing of occurrences in the body,
Giuseppe Castagna's avatar
Giuseppe Castagna committed
255
256
 its precision will improve with any improvement of
 our occurrence typing framework.