practical.tex 14.8 KB
 Giuseppe Castagna committed May 06, 2021 1 We have implemented the algorithmic system $\vdashA$ we presented in Section~\ref{sec:algorules}. Our  Giuseppe Castagna committed Mar 03, 2020 2 implementation is written in OCaml and uses CDuce as a library to  Giuseppe Castagna committed Mar 02, 2020 3 provide the semantic subtyping machinery. Besides the type-checking  Kim Nguyễn committed Jul 08, 2019 4 algorithm defined on the base language, our implementation supports  Giuseppe Castagna committed May 06, 2021 5 the record types and expressions of Section \ref{ssec:struct} and the refinement of  Giuseppe Castagna committed May 05, 2021 6 7 8 9 10 11 12 13 14 15 function types \iflongversion%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% described in Section \ref{sec:refining}. Furthermore, our implementation uses for the inference of arrow types the following improved rule: \input{optimize} \else of Section \ref{sec:refining} with the rule of Appendix~\ref{app:optimize}. \fi%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%  Kim Nguyễn committed May 06, 2021 16 17 18 19 The implementation is rather crude and consists of 2000 lines of OCaml code, including parsing, type-checking of programs, and pretty printing of types. We demonstrate the output of our type-checking implementation in Table~\ref{tab:implem} and Table~\ref{tab:implem2}. Table~\ref{tab:implem} lists  Giuseppe Castagna committed May 06, 2021 20 some examples, none of which can be typed by current systems. Even though some  Kim Nguyễn committed May 06, 2021 21 22 systems such as Flow and TypeScript can type some of these examples by adding explicit type annotations, the code 6, 7, 9, and 10 in Table~\ref{tab:implem}  Giuseppe Castagna committed May 06, 2021 23 24 and, even more, the \code{and\_} and \code{xor\_} functions given in \eqref{and+} and \eqref{xor+} later in this  Kim Nguyễn committed May 06, 2021 25 section are out of reach of current systems, even when using the right explicit  Giuseppe Castagna committed May 06, 2021 26 annotations. Table~\ref{tab:implem2} allows for a direct comparison of with  Kim Nguyễn committed May 06, 2021 27 28 29 \cite{THF10} be giving the type inferred for the fourteen examples given in that work.  Giuseppe Castagna committed May 06, 2021 30 It should be noted that for all the examples we present, the  Kim Nguyễn committed May 06, 2021 31 32 time for the type inference process is less than 5ms, hence we do not report precise timings in the table. These and other examples can be tested in the  Giuseppe Castagna committed Jul 10, 2019 33 \ifsubmission  Kim Nguyễn committed Jul 09, 2020 34 35 anonymized \else  Giuseppe Castagna committed Jul 10, 2019 36 \fi  Kim Nguyễn committed Jul 09, 2020 37 38 online toplevel available at \url{https://occtyping.github.io/}%  Kim Nguyễn committed Jul 08, 2019 39 \input{code_table}  Kim Nguyễn committed May 06, 2021 40   Kim Nguyễn committed May 06, 2021 41 In both tables, the second column gives a code fragment and the third  Giuseppe Castagna committed May 06, 2021 42 43 44 45 46 47 column the type deduced by our implementation as is (we pretty printed it but we did not alter the output). Code~1 is a straightforward function similar to our introductory example \code{foo} in (\ref{foo}) and (\ref{foo2}) where \code{incr} is the successor function and \code{lneg} the logical negation for Booleans. Here the  Giuseppe Castagna committed Jul 09, 2019 48 programmer annotates the parameter of the function with a coarse type  Kim Nguyễn committed Jul 08, 2019 49 $\Int\vee\Bool$. Our implementation first type-checks the body of the  Giuseppe Castagna committed Mar 02, 2020 50 function under this assumption, but doing so it collects that the type of  Giuseppe Castagna committed Jul 09, 2019 51 52 53 $\texttt{x}$ is specialized to \Int{} in the then'' case and to \Bool{} in the else'' case. The function is thus type-checked twice more under each hypothesis for \texttt{x}, yielding the precise type  Giuseppe Castagna committed May 06, 2021 54 55 56 57 $(\Int\to\Int)\land(\Bool\to\Bool)$. Note that w.r.t.\ rule \Rule{AbsInf+} of Section~\ref{sec:refining}, the rule \Rule{AbsInf++} we use int the implementation improved the output of the computed type. Indeed, using rule~[{\sc AbsInf}+] we would have obtained the  Kim Nguyễn committed Jul 08, 2019 58 59 type $(\Int\to\Int)\land(\Bool\to\Bool)\land(\Bool\vee\Int\to\Bool\vee\Int)$  Giuseppe Castagna committed Mar 03, 2020 60 with a redundant arrow. Here we can see that, since we deduced  Kim Nguyễn committed Jul 08, 2019 61 the first two arrows $(\Int\to\Int)\land(\Bool\to\Bool)$, and since  Giuseppe Castagna committed Mar 03, 2020 62 the union of their domain exactly covers the domain of the third arrow, then  Kim Nguyễn committed Jul 08, 2019 63 the latter is not needed. Code~2 shows what happens when the argument  Giuseppe Castagna committed Jul 09, 2019 64 of the function is left unannotated (i.e., it is annotated by the top  Giuseppe Castagna committed Jul 24, 2019 65 type \Any, written \texttt{Any}'' in our implementation). Here  Kim Nguyễn committed Jul 08, 2019 66 67 type-checking and refinement also work as expected, but the function only type checks if all cases for \texttt{x} are covered (which means  Giuseppe Castagna committed Jul 09, 2019 68 69 that the function must handle the case of inputs that are neither in \Int{} nor in \Bool).  Kim Nguyễn committed Jul 08, 2019 70 71 72  The following examples paint a more interesting picture. First (Code~3) it is  Giuseppe Castagna committed Jul 09, 2019 73 easy in our formalism to program type predicates such as those  Giuseppe Castagna committed Jul 09, 2019 74 hard-coded in the $\lambda_{\textit{TR}}$ language of \citet{THF10}. Such type  Kim Nguyễn committed Jul 08, 2019 75 76 predicates, which return \texttt{true} if and only if their input has a particular type, are just plain functions with an intersection  Giuseppe Castagna committed Jul 09, 2019 77 type inferred by the system of Section~\ref{sec:refining}. We next define Boolean connectives as overloaded  Giuseppe Castagna committed Jul 09, 2019 78 functions. The \texttt{not\_} connective (Code~4) just tests whether its  Kim Nguyễn committed Jul 08, 2019 79 argument is the Boolean \texttt{true} by testing that it belongs to  Kim Nguyễn committed Jul 09, 2019 80 81 82 83 84 the singleton type \True{} (the type whose only value is \texttt{true}) returning \texttt{false} for it and \texttt{true} for any other value (recall that $\neg\True$ is equivalent to $\texttt{Any\textbackslash}\True$). It works on values of any type, but we could restrict it to Boolean values by simply annotating the  Giuseppe Castagna committed May 06, 2021 85 parameter by \Bool{} (which, in the CDuce types that our system uses, is syntactic sugar for  Kim Nguyễn committed Jul 09, 2019 86 87 88 \True$\vee$\False) yielding the type $(\True{\to}\False)\wedge(\False{\to}\True)$. The \texttt{or\_} connective (Code~5) is straightforward as far as the  Giuseppe Castagna committed Jul 10, 2019 89 code goes, but we see that the overloaded type precisely captures all  Giuseppe Castagna committed May 06, 2021 90 91 92 possible cases: the function returns \code{false} if and only if both arguments are of type $\neg\True$, that is, they are any value different from \code{true}. Again we use a generalized version of the  Giuseppe Castagna committed Jul 09, 2019 93 \texttt{or\_} connective that accepts and treats any value that is not  Kim Nguyễn committed Jul 09, 2019 94 \texttt{true} as \texttt{false} and again, we could easily restrict the  Giuseppe Castagna committed Mar 02, 2020 95 96 domain to \Bool{} if desired.\\ \indent  Giuseppe Castagna committed Jul 09, 2019 97 To showcase the power of our type system, and in particular of  Kim Nguyễn committed Jul 08, 2019 98 the $\worra{}{}$''  Giuseppe Castagna committed Jul 09, 2019 99 type operator, we define \texttt{and\_} (Code~6) using De Morgan's  Kim Nguyễn committed Jul 08, 2019 100 Laws instead of  Giuseppe Castagna committed Jul 09, 2019 101 using a direct definition. Here the application of the outermost \texttt{not\_} operator is checked against type \True. This  Kim Nguyễn committed Jul 08, 2019 102 allows the system to deduce that the whole \texttt{or\_} application  Giuseppe Castagna committed Jul 09, 2019 103 has type \False, which in turn leads to \texttt{not\_\;x} and  Victor Lanvin committed Jul 10, 2019 104 \texttt{not\_\;y} to have type $\lnot \True$ and therefore both \texttt{x}  Kim Nguyễn committed Jul 08, 2019 105 106 107 108 109 110 and \texttt{y} to have type \True. The whole function is typed with the most precise type (we present the type as printed by our implementation, but the first arrow of the resulting type is equivalent to $(\True\to\lnot\True\to\False)\land(\True\to\True\to\True)$).  Giuseppe Castagna committed Jul 09, 2019 111 All these type predicates and Boolean connectives can be used together  Kim Nguyễn committed Jul 08, 2019 112 113 114 to write complex type tests, as in Code~7. Here we define a function \texttt{f} that takes two arguments \texttt{x} and \texttt{y}. If \texttt{x} is an integer and \texttt{y} a Boolean, then it returns the  Giuseppe Castagna committed Jul 09, 2019 115 116 integer \texttt{1}; if \texttt{x} is a character or \texttt{y} is an integer, then it returns \texttt{2}; otherwise the  Kim Nguyễn committed Jul 08, 2019 117 function returns \texttt{3}. Our system correctly deduces a (complex)  Giuseppe Castagna committed Jul 09, 2019 118 119 intersection type that covers all cases (plus several redundant arrow types). That this type is as precise as possible can be shown by the fact that  Kim Nguyễn committed Jul 08, 2019 120 when applying  Giuseppe Castagna committed May 06, 2021 121 122 \texttt{f} to arguments of the expected type, the \emph{type} statically deduced for the  Giuseppe Castagna committed Jul 09, 2019 123 whole expression is the singleton type \texttt{1}, or \texttt{2},  Kim Nguyễn committed Jul 08, 2019 124 125 126 or \texttt{3}, depending on the type of the arguments. Code~8 allows us to demonstrate the use and typing of record paths. We  Giuseppe Castagna committed Jul 09, 2019 127 model, using open records, the type of DOM objects that represent XML  Kim Nguyễn committed Jul 08, 2019 128 or HTML documents. Such objects possess a common field  Giuseppe Castagna committed Jul 09, 2019 129 \texttt{nodeType} containing an integer constant denoting the kind of  Kim Nguyễn committed Jul 09, 2020 130 the node (e.g., \p{1} for an element node, \p{3} for a text node, \ldots). Depending on the kind, the object will have  Kim Nguyễn committed Jul 08, 2019 131 132 different fields and methods. It is common practice to perform a test on the value of the \texttt{nodeType} field. In dynamic languages such  Kim Nguyễn committed Jul 09, 2020 133 134 135 as JavaScript, the relevant field can directly be accessed after having checked for the appropriate \texttt{nodeType}, whereas in statically typed languages such as Java, a downward cast  Kim Nguyễn committed Jul 08, 2019 136 from the generic \texttt{Node} type to the expected precise type of  Giuseppe Castagna committed Jul 09, 2019 137 the object is needed. We can see that using the extension presented in  Kim Nguyễn committed Jul 08, 2019 138 139 140 141 142 143 144 Section~\ref{ssec:struct} we can deduce the correct type for \texttt{x} in all cases. Of particular interest is the last case, since we use a type case to check the emptiness of the list of child nodes. This splits, at the type level, the case for the \Keyw{Element} type depending on whether the content of the \texttt{childNodes} field is the empty list or not.  Giuseppe Castagna committed Feb 26, 2020 145 Code~9 shows the usefulness of the rule \Rule{OverApp}.  Kim Nguyễn committed Feb 28, 2020 146 Consider the definition of the \texttt{xor\_} operator.  Kim Nguyễn committed Jul 08, 2019 147 148 Here the rule~[{\sc AbsInf}+] is not sufficient to precisely type the function, and using only this rule would yield a type  Giuseppe Castagna committed Feb 26, 2020 149 150 151  $\Any\to\Any\to\Bool$. \iflongversion Let us follow the behavior of the  Kim Nguyễn committed Jul 08, 2019 152 153  $\worra{}{}$'' operator. Here the whole \texttt{and\_} is requested to have type \True, which implies that \texttt{or\_ x y} must have  Giuseppe Castagna committed Jul 09, 2019 154  type \True. This can always happen, whether \texttt{x} is \True{} or  Kim Nguyễn committed Jul 08, 2019 155 156  not (but then depends on the type of \texttt{y}). The $\worra{}{}$'' operator correctly computes that the type for \texttt{x} in the  Giuseppe Castagna committed Jul 09, 2019 157  \texttt{then}'' branch is $\True\vee\lnot\True\lor\True\simeq\Any$,  Kim Nguyễn committed Jul 09, 2020 158 and a similar reasoning holds for \texttt{y}.  Giuseppe Castagna committed Feb 26, 2020 159 \fi%%%%%%%%%%%%%%  Kim Nguyễn committed Feb 28, 2020 160 However, since \texttt{or\_} has type  Giuseppe Castagna committed Feb 26, 2020 161 162 163 164 165 %\\[.7mm]\centerline{% $(\True\to\Any\to\True)\land(\Any\to\True\to\True)\land (\lnot\True\to\lnot\True\to\False)$ %}\\[.7mm] then the rule \Rule{OverApp} applies and \True, \Any, and $\lnot\True$ become candidate types for  Giuseppe Castagna committed Mar 03, 2020 166 \texttt{x}, which allows us to deduce the precise type given in the table. Finally, thanks to rule \Rule{OverApp} it is not necessary to use a type case to force refinement. As a consequence, we can define the functions \texttt{and\_} and \texttt{xor\_} more naturally as:  Giuseppe Castagna committed May 06, 2021 167 168 169 \begin{alltt}\color{darkblue} let and_ = fun (x : Any) -> fun (y : Any) -> not_ (or_ (not_ x) (not_ y)) \refstepcounter{equation} \mbox{\color{black}\rm(\theequation)}\label{and+} let xor_ = fun (x : Any) -> fun (y : Any) -> and_ (or_ x y) (not_ (and_ x y)) \refstepcounter{equation} \mbox{\color{black}\rm(\theequation)}\label{xor+}  Giuseppe Castagna committed Jul 09, 2019 170 171 \end{alltt} for which the very same types as in Table~\ref{tab:implem} are deduced.  Kim Nguyễn committed Jul 11, 2019 172   Kim Nguyễn committed May 06, 2021 173 174 As for Code~10 (corresponding to our introductory example~\eqref{nest1}), it illustrates the need for iterative refinement of  Giuseppe Castagna committed Jul 11, 2019 175 type environments, as defined in Section~\ref{sec:typenv}. As  Kim Nguyễn committed Jul 09, 2020 176 explained, a single pass analysis would deduce  Kim Nguyễn committed Jul 11, 2019 177 for {\tt x}  Giuseppe Castagna committed Jul 11, 2019 178 a type \Int{} from the {\tt f\;x} application and \Any{} from the {\tt g\;x}  Kim Nguyễn committed Jul 11, 2019 179 application. Here by iterating a second time, the algorithm deduces  Giuseppe Castagna committed May 06, 2021 180 that {\tt x} has type $\Empty$ (i.e., $\textsf{Empty}$), that is, that the first branch can never  Giuseppe Castagna committed Jul 11, 2019 181 be selected (and our implementation warns the user accordingly). In hindsight, the only way for a well-typed overloaded function to have  Giuseppe Castagna committed Jul 11, 2019 182 type $(\Int{\to}\Int)\land(\Any{\to}\Bool)$ is to diverge when the  Giuseppe Castagna committed Jul 11, 2019 183 argument is of type \Int: since this intersection type states that  Kim Nguyễn committed Jul 11, 2019 184 185 186 187 whenever the input is \Int, {\em both\/} branches can be selected, yielding a result that is at the same time an integer and a Boolean. This is precisely reflected by the case $\Int\to\Empty$ in the result. Indeed our {\tt example10} function can be applied to an integer, but  Giuseppe Castagna committed May 06, 2021 188 at runtime the application of {\tt f\,x} will diverge.  Kim Nguyễn committed Jul 09, 2020 189   Kim Nguyễn committed May 06, 2021 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 Code~11 simulates the behaviour of Javascript property resolution, by looking for a property \texttt{l} either in the object \texttt{o} itself or in the chained list of its \texttt{prototype} objects. In that example, we first model prototype chaining by defining a type \texttt{Object} that can be either the atom \texttt{Null} or any record with a \texttt{prototype} field which contains (recursively) an \texttt{Object}. To ease the reading, we defined a recursive type \texttt{ObjectWithPropertyL} which is either a record with a field \texttt{l} or a record with a prototype of type \texttt{ObjectWithPropertyL}. We can then define two predicate function \texttt{has\_property\_l} and \texttt{has\_own\_property\_l} that tests whether an object has a property through its prototype or directly. Lastly we can define a function \texttt{get\_property\_l} which directly access the field if it is present, or recursively search for it through the prototype chain (in our syntax, the paremeter \texttt{self} allows one to refer to the function itself). Of particular interest is the type deduced for the two predicate functions. Indeed, we can see that \texttt{has\_own\_property\_l} is given an overloaded type whose first argument is in each case a recursive record type that describe precisely whether \texttt{l} is present at some point in the list or not (recall that in a record type such as $\orecord{ l=?\Empty }$, indicate that field \texttt{l} is absent for sure). \input{code_table2} In Table~\ref{tab:implem2}, we convert in our syntax the 14 examples of \cite{THF10} (we tried to complete such examples with neutral code when they were incomplete in the original paper). Of these 14 examples, Example~1 to 13 depict combinations of type predicates (such as \texttt{is\_int}) used either directly or through Boolean predicates (such as the \texttt{or\_} function previously defined). Note that for all examples for which there was no explicit indication in the original version, we \emph{infer} the type of the function. Notice also that for Example~6, the goal of the example is to show that indeed, the function is ill-typed (which our typechecker detects accurately). The  Giuseppe Castagna committed May 06, 2021 221 original Example~14 could be written in our syntax as:  Kim Nguyễn committed May 06, 2021 222 223 224 225 226 227 228 229 230 231 232 233  \begin{verbatim} let example14_alt = fun (input : Int | String) -> fun (extra : (Any, Any)) -> if (input, is_int (fst extra)) is (Int, True) then add input (fst extra) else if is_int (fst extra) is True then add (strlen input) (fst extra) else 0 \end{verbatim} Notice, in the version above the absence of an occurrence of \texttt{input} in the second \texttt{if} statement. Indeed, if the first test fails, it is either  Giuseppe Castagna committed May 06, 2021 234  because \texttt{is\_int (fst extra)} is not \texttt{True} (i.e., if  Kim Nguyễn committed May 06, 2021 235 236 237 238  \texttt{fst extra} is not an integer) or because \texttt{input} is not an integer. Therefore, in our setting, the type information propagated to the second test is : $\texttt{(input, is\_int (fst extra))} \in \lnot (\Int, \True)$, that is $\texttt{(input, is\_int (fst extra))} \in (\lnot\Int,  Giuseppe Castagna committed May 06, 2021 239 240  \True)\lor(\Int, \False)$. Therefore, the type deduced for \texttt{input} in the second branch is $(\String\lor\Int) \land (\lnot\Int \lor \Int) =  Kim Nguyễn committed May 06, 2021 241 242 243 244 245 246  \String\lor\Int$ which is not precise enough. By adding an occurrence of \texttt{input} in our \texttt{example14\_alt}, we can further restrict its type typecheck the function. Lifting this limitation through a control-flow analysis is part of our future work.  Kim Nguyễn committed Jul 09, 2020 247 Although these experiments are still preliminary, they show how the  Giuseppe Castagna committed May 06, 2021 248 combination of occurrence typing and set-theoretic types, together  Kim Nguyễn committed Jul 09, 2020 249 with the type inference for overloaded function types presented in  Giuseppe Castagna committed Jul 09, 2020 250 251 Section~\ref{sec:refining} goes beyond what languages like TypeScript and Flow do, since they can only infer single arrow types.  Kim Nguyễn committed Jul 09, 2020 252 Our refining of overloaded  Giuseppe Castagna committed Jul 09, 2020 253 functions is also future-proof and resilient to extensions: since it retypes'' functions  Kim Nguyễn committed Jul 09, 2020 254  using information gathered by the typing of occurrences in the body,  Giuseppe Castagna committed Jul 09, 2020 255 256  its precision will improve with any improvement of our occurrence typing framework.