practical.tex 14.2 KB
Newer Older
Giuseppe Castagna's avatar
Giuseppe Castagna committed
1
We have implemented the algorithmic system $\vdashA$. Our
Giuseppe Castagna's avatar
Giuseppe Castagna committed
2
implementation is written in OCaml and uses CDuce as a library to
Giuseppe Castagna's avatar
Giuseppe Castagna committed
3
provide the semantic subtyping machinery. Besides the type-checking
Kim Nguyễn's avatar
Kim Nguyễn committed
4
algorithm defined on the base language, our implementation supports
5
record types of Section \ref{ssec:struct} and the refinement of
6
7
8
9
10
11
12
13
14
15
function types 
\iflongversion%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
described in Section \ref{sec:refining}. Furthermore, our implementation uses for the inference of arrow types
the following improved rule:
\input{optimize}
\else
of Section \ref{sec:refining} with the rule of
Appendix~\ref{app:optimize}.
\fi%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
The implementation is rather crude and consists of 2000 lines of OCaml code,
including parsing, type-checking of programs, and pretty printing of types. We
demonstrate the output of our type-checking implementation in
Table~\ref{tab:implem} and Table~\ref{tab:implem2}. Table~\ref{tab:implem} lists
some examples, none of which can be typed by current systems (even though some
systems such as Flow and TypeScript can type some of these examples by adding
explicit type annotations, the code 6, 7, 9, and 10 in Table~\ref{tab:implem}
and, even more,  the \code{and\_} and \code{xor\_} functions at the end of this
section are out of reach of current systems, even when using the right explicit
annotations). Table~\ref{tab:implem2} allows for a direct comparison of with
\cite{THF10} be giving the type inferred for the fourteen examples given in that
work.

It should be noted that for all the examples we discuss, the the
time for the type inference process is less than 5ms, hence we do not report
precise timings in the table. These and other examples can be tested in the
Giuseppe Castagna's avatar
Giuseppe Castagna committed
32
\ifsubmission
33
34
anonymized
\else
Giuseppe Castagna's avatar
Giuseppe Castagna committed
35
\fi
36
37
online toplevel available at
\url{https://occtyping.github.io/}%
Kim Nguyễn's avatar
Kim Nguyễn committed
38
\input{code_table}
Kim Nguyễn's avatar
Kim Nguyễn committed
39

40
In both tables, the second column gives a code fragment and the third
Kim Nguyễn's avatar
Kim Nguyễn committed
41
column the type deduced by our implementation. Code~1 is a
Giuseppe Castagna's avatar
Giuseppe Castagna committed
42
43
straightforward function similar to our introductory example \code{foo} in (\ref{foo},\ref{foo2}). Here the
programmer annotates the parameter of the function with a coarse type
44
$\Int\vee\Bool$. Our implementation first type-checks the body of the
Giuseppe Castagna's avatar
Giuseppe Castagna committed
45
function under this assumption, but doing so it collects that the type of
Giuseppe Castagna's avatar
Giuseppe Castagna committed
46
47
48
$\texttt{x}$ is specialized to \Int{} in the ``then'' case and to \Bool{}
in the ``else'' case. The function is thus type-checked twice more
under each hypothesis for \texttt{x}, yielding the precise type
Giuseppe Castagna's avatar
Giuseppe Castagna committed
49
$(\Int\to\Int)\land(\Bool\to\Bool)$. Note that w.r.t.\  rule \Rule{AbsInf+} of Section~\ref{sec:refining}, our implementation improved the output of the computed
Giuseppe Castagna's avatar
Giuseppe Castagna committed
50
type. Indeed, using rule~[{\sc AbsInf}+] we would obtain the
Kim Nguyễn's avatar
Kim Nguyễn committed
51
52
type
$(\Int\to\Int)\land(\Bool\to\Bool)\land(\Bool\vee\Int\to\Bool\vee\Int)$
Giuseppe Castagna's avatar
Giuseppe Castagna committed
53
with a redundant arrow. Here we can see that, since we deduced
54
the first two arrows $(\Int\to\Int)\land(\Bool\to\Bool)$, and since
Giuseppe Castagna's avatar
Giuseppe Castagna committed
55
the union of their domain exactly covers the domain of the third arrow, then
56
the latter is not needed. Code~2 shows what happens when the argument
Giuseppe Castagna's avatar
Giuseppe Castagna committed
57
of the function is left unannotated (i.e., it is annotated by the top
Giuseppe Castagna's avatar
Giuseppe Castagna committed
58
type \Any, written ``\texttt{Any}'' in our implementation). Here
59
60
type-checking and refinement also work as expected, but the function
only type checks if all cases for \texttt{x} are covered (which means
Giuseppe Castagna's avatar
Giuseppe Castagna committed
61
62
that the function must handle the case of inputs that are neither in \Int{}
nor in \Bool).
63
64
65

The following examples paint a more interesting picture. First
(Code~3) it is
Giuseppe Castagna's avatar
typos    
Giuseppe Castagna committed
66
easy in our formalism to program type predicates such as those
Giuseppe Castagna's avatar
Giuseppe Castagna committed
67
hard-coded in the $\lambda_{\textit{TR}}$ language of \citet{THF10}. Such type
68
69
predicates, which return \texttt{true} if and only if their input has
a particular type, are just plain functions with an intersection
Giuseppe Castagna's avatar
typos    
Giuseppe Castagna committed
70
type inferred by the system of Section~\ref{sec:refining}. We next define Boolean connectives as overloaded
Giuseppe Castagna's avatar
Giuseppe Castagna committed
71
functions. The \texttt{not\_} connective (Code~4) just tests whether its
72
argument is the Boolean \texttt{true} by testing that it belongs to
73
74
75
76
77
78
79
80
81
the singleton type \True{} (the type whose only value is
\texttt{true}) returning \texttt{false} for it and \texttt{true} for
any other value (recall that $\neg\True$ is equivalent to
$\texttt{Any\textbackslash}\True$). It works on values of any type,
but we could restrict it to Boolean values by simply annotating the
parameter by \Bool{} (which in CDuce is syntactic sugar for
\True$\vee$\False) yielding the type
$(\True{\to}\False)\wedge(\False{\to}\True)$.
The \texttt{or\_} connective (Code~5) is straightforward as far as the
Giuseppe Castagna's avatar
Giuseppe Castagna committed
82
83
code goes, but we see that the overloaded type precisely captures all
possible cases. Again we use a generalized version of the
Giuseppe Castagna's avatar
Giuseppe Castagna committed
84
\texttt{or\_} connective that accepts and treats any value that is not
85
\texttt{true} as \texttt{false} and again, we could easily restrict the
86
87
domain to \Bool{} if desired.\\
\indent
Giuseppe Castagna's avatar
Giuseppe Castagna committed
88
To showcase the power of our type system, and in particular of
89
the ``$\worra{}{}$''
Giuseppe Castagna's avatar
Giuseppe Castagna committed
90
type operator, we define \texttt{and\_} (Code~6) using De Morgan's
91
Laws instead of
Giuseppe Castagna's avatar
Giuseppe Castagna committed
92
using a direct definition. Here the application of the outermost \texttt{not\_} operator is checked against type \True. This
93
allows the system to deduce that the whole \texttt{or\_} application
Giuseppe Castagna's avatar
typos    
Giuseppe Castagna committed
94
has type \False, which in turn leads to \texttt{not\_\;x} and
Victor Lanvin's avatar
Fixes    
Victor Lanvin committed
95
\texttt{not\_\;y} to have type $\lnot \True$ and therefore both \texttt{x}
96
97
98
99
100
101
and \texttt{y} to have type \True. The whole function is typed with
the most precise type (we present the type as printed by our
implementation, but the first arrow of the resulting type is
equivalent to
$(\True\to\lnot\True\to\False)\land(\True\to\True\to\True)$).

Giuseppe Castagna's avatar
Giuseppe Castagna committed
102
All these type predicates and Boolean connectives can be used together
103
104
105
to write complex type tests, as in Code~7. Here we define a function
\texttt{f} that takes two arguments \texttt{x} and \texttt{y}. If
\texttt{x} is an integer and \texttt{y} a Boolean, then it returns the
Giuseppe Castagna's avatar
Giuseppe Castagna committed
106
107
integer \texttt{1}; if \texttt{x} is a character or
\texttt{y} is an integer, then it returns \texttt{2}; otherwise the
108
function returns \texttt{3}. Our system correctly deduces a (complex)
Giuseppe Castagna's avatar
Giuseppe Castagna committed
109
110
intersection type that covers all cases (plus several redundant arrow
types). That this type is as precise as possible can be shown by the fact that
111
when applying
Giuseppe Castagna's avatar
Giuseppe Castagna committed
112
113
\texttt{f} to arguments of the expected type, the \emph{type} deduced for the
whole expression is the singleton type \texttt{1}, or \texttt{2},
114
115
116
or \texttt{3}, depending on the type of the arguments.

Code~8 allows us to demonstrate the use and typing of record paths. We
Giuseppe Castagna's avatar
Giuseppe Castagna committed
117
model, using open records, the type of DOM objects that represent XML
118
or HTML documents. Such objects possess a common field
Giuseppe Castagna's avatar
Giuseppe Castagna committed
119
\texttt{nodeType} containing an integer constant denoting the kind of
120
the node (e.g., \p{1} for an element node, \p{3} for a text node, \ldots). Depending on the kind, the object will have
121
122
different fields and methods. It is common practice to perform a test
on the value of the \texttt{nodeType} field. In dynamic languages such
123
124
125
as JavaScript, the relevant field can directly be accessed
after having checked for the appropriate \texttt{nodeType}, whereas
in statically typed languages such as Java, a downward cast
126
from the generic \texttt{Node} type to the expected precise type of
Giuseppe Castagna's avatar
Giuseppe Castagna committed
127
the object is needed. We can see that using the extension presented in
128
129
130
131
132
133
134
Section~\ref{ssec:struct} we can deduce the correct type for
\texttt{x} in all cases. Of particular interest is the last case,
since we use a type case to check the emptiness of the list of child
nodes. This splits, at the type level, the case for the \Keyw{Element}
type depending on whether the content of the \texttt{childNodes} field
is the empty list or not.

Giuseppe Castagna's avatar
Giuseppe Castagna committed
135
Code~9 shows the usefulness of the rule \Rule{OverApp}.
136
Consider the definition of the \texttt{xor\_} operator.
137
138
Here the rule~[{\sc AbsInf}+] is not sufficient to precisely type the
function, and using only this rule would yield a type
Giuseppe Castagna's avatar
Giuseppe Castagna committed
139
140
141
 $\Any\to\Any\to\Bool$.
\iflongversion
Let us follow the behavior of the
142
143
 ``$\worra{}{}$'' operator. Here the whole \texttt{and\_} is requested
 to have type \True, which implies that \texttt{or\_ x y} must have
Giuseppe Castagna's avatar
Giuseppe Castagna committed
144
 type \True. This can always happen, whether \texttt{x} is \True{} or
145
146
 not (but then depends on the type of \texttt{y}). The ``$\worra{}{}$''
 operator correctly computes that the type for \texttt{x} in the
Giuseppe Castagna's avatar
Giuseppe Castagna committed
147
 ``\texttt{then}'' branch is $\True\vee\lnot\True\lor\True\simeq\Any$,
148
and a similar reasoning holds for \texttt{y}.
Giuseppe Castagna's avatar
Giuseppe Castagna committed
149
\fi%%%%%%%%%%%%%%
150
However, since \texttt{or\_} has type
Giuseppe Castagna's avatar
Giuseppe Castagna committed
151
152
153
154
155
%\\[.7mm]\centerline{%
$(\True\to\Any\to\True)\land(\Any\to\True\to\True)\land
   (\lnot\True\to\lnot\True\to\False)$
%}\\[.7mm]
then the rule \Rule{OverApp} applies and \True, \Any, and $\lnot\True$ become candidate types for
Giuseppe Castagna's avatar
Giuseppe Castagna committed
156
\texttt{x}, which allows us to deduce the precise type given in the table. Finally, thanks to rule \Rule{OverApp} it is not  necessary to use a type case to force refinement. As a consequence, we can define the functions \texttt{and\_} and \texttt{xor\_} more naturally as:
157
\begin{alltt}\color{darkblue}\morecompact
Giuseppe Castagna's avatar
Giuseppe Castagna committed
158
159
  let and_ = fun (x : Any) -> fun (y : Any) -> not_ (or_ (not_ x) (not_ y))
  let xor_ = fun (x : Any) -> fun (y : Any) -> and_ (or_ x y) (not_ (and_ x y))
Giuseppe Castagna's avatar
Giuseppe Castagna committed
160
161
\end{alltt}
for which the very same types as in Table~\ref{tab:implem} are deduced.
Kim Nguyễn's avatar
.    
Kim Nguyễn committed
162

163
164
As for Code~10 (corresponding to our introductory
example~\eqref{nest1}), it illustrates the need for iterative refinement of
Giuseppe Castagna's avatar
Giuseppe Castagna committed
165
type environments, as defined in Section~\ref{sec:typenv}. As
166
explained, a single pass analysis would deduce
Kim Nguyễn's avatar
.    
Kim Nguyễn committed
167
for {\tt x}
Giuseppe Castagna's avatar
Giuseppe Castagna committed
168
a type \Int{} from the {\tt f\;x} application and \Any{} from the {\tt g\;x}
Kim Nguyễn's avatar
.    
Kim Nguyễn committed
169
application. Here by iterating a second time, the algorithm deduces
Giuseppe Castagna's avatar
empty    
Giuseppe Castagna committed
170
that {\tt x} has type $\Empty$ (i.e., $\textsf{Empty}$), that is that the first branch can never
Giuseppe Castagna's avatar
Giuseppe Castagna committed
171
be selected (and our implementation warns the user accordingly). In hindsight, the only way for a well-typed overloaded function to have
Giuseppe Castagna's avatar
empty    
Giuseppe Castagna committed
172
type $(\Int{\to}\Int)\land(\Any{\to}\Bool)$ is to diverge when the
Giuseppe Castagna's avatar
typo    
Giuseppe Castagna committed
173
argument is of type \Int: since this intersection type states that
Kim Nguyễn's avatar
.    
Kim Nguyễn committed
174
175
176
177
178
whenever the input is \Int, {\em both\/} branches can be selected,
yielding a result that is at the same time an integer and a Boolean.
This is precisely reflected by the case $\Int\to\Empty$ in the result.
Indeed our {\tt example10} function can be applied to an integer, but
at runtime the application of {\tt f ~x} will diverge.
179

180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
Code~11 simulates the behaviour of Javascript property resolution, by looking
for a property \texttt{l} either in the object \texttt{o} itself or in the
chained list of its \texttt{prototype} objects. In that example, we first model
prototype chaining by defining a type \texttt{Object} that can be either the
atom \texttt{Null} or any record with a \texttt{prototype} field which contains
(recursively) an \texttt{Object}. To ease the reading, we defined a recursive
type \texttt{ObjectWithPropertyL} which is either a record with a field
\texttt{l} or a record with a prototype of type \texttt{ObjectWithPropertyL}. We
can then define two predicate function \texttt{has\_property\_l} and
\texttt{has\_own\_property\_l} that tests whether an object has a property
through its prototype or directly. Lastly we can define a function
\texttt{get\_property\_l} which directly access the field if it is present, or
recursively search for it through the prototype chain (in our syntax, the
paremeter \texttt{self} allows one to refer to the function itself). Of
particular interest is the type deduced for the two predicate functions. Indeed,
we can see that \texttt{has\_own\_property\_l} is given an overloaded type whose
first argument is in each case a recursive record type that describe precisely
whether \texttt{l} is present at some point in the list or not (recall that
in a record type such as $\orecord{ l=?\Empty }$, indicate that field \texttt{l}
is absent for sure).

\input{code_table2}
In Table~\ref{tab:implem2}, we convert in our syntax the 14 examples of
\cite{THF10} (we tried to complete such examples with neutral code when they
were incomplete in the original paper). Of these 14 examples, Example~1 to 13
depict combinations of type predicates (such as \texttt{is\_int}) used either
directly or through Boolean predicates (such as the \texttt{or\_} function
previously defined). Note that for all examples for which there was no explicit
indication in the original version, we \emph{infer} the type of the function.
Notice also that for Example~6, the goal of the example is to show that indeed,
the function is ill-typed (which our typechecker detects accurately). The
original Example~14 could be written in our syntax with :

\begin{verbatim}
  let example14_alt = fun (input : Int | String) ->
  fun (extra : (Any, Any)) ->
   if (input, is_int (fst extra)) is (Int, True) then
      add input (fst extra)
   else if is_int (fst extra) is True 
   then add (strlen input) (fst extra)
   else 0
\end{verbatim}
Notice, in the version above the absence of an occurrence of \texttt{input} in
 the second \texttt{if} statement. Indeed, if the first test fails, it is either
 because \texttt{is\_int (fst extra)} is not \texttt{True} (that is, if
 \texttt{fst extra} is not an integer) or because \texttt{input} is not an
 integer. Therefore, in our setting, the type information propagated to the
 second test is : $\texttt{(input, is\_int (fst extra))} \in \lnot (\Int,
 \True)$, that is $\texttt{(input, is\_int (fst extra))} \in (\lnot\Int,
 \True)\lor(\Int, \False)$ Therefore, the type of \texttt{input} in the second
 branch is of type $(\String\lor\Int) \land (\lnot\Int \lor \int) =
 \String\lor\Int$ which is not precise enough. By adding an occurrence of
 \texttt{input} in our \texttt{example14\_alt}, we can further restrict its type
 typecheck the function. Lifting this limitation through a control-flow analysis
 is part of our future work.


237
Although these experiments are still preliminary, they show how the
Giuseppe Castagna's avatar
Giuseppe Castagna committed
238
combination occurrence typing and set-theoretic types, together
239
with the type inference for overloaded function types presented in
Giuseppe Castagna's avatar
Giuseppe Castagna committed
240
241
Section~\ref{sec:refining} goes beyond what languages like
TypeScript and Flow do, since they can only infer single arrow types.
242
Our refining of overloaded
Giuseppe Castagna's avatar
Giuseppe Castagna committed
243
functions is also future-proof and resilient to extensions: since it ``retypes'' functions
244
 using information gathered by the typing of occurrences in the body,
Giuseppe Castagna's avatar
Giuseppe Castagna committed
245
246
 its precision will improve with any improvement of
 our occurrence typing framework.