practical.tex 11.4 KB
Newer Older
 Kim Nguyễn committed Jul 08, 2019 1 We have implemented a simplified version of the algorithm presented in  Giuseppe Castagna committed Jul 09, 2019 2 3 4 Section~\ref{ssec:algorithm} that does not make use of type schemes and is, therefore, incomplete w.r.t. the system of Section~\ref{sec:static}. In particular, as we  Giuseppe Castagna committed Jul 10, 2019 5 explained in Section~\ref{ssec:algorithm}, \beppe{did we?} in the absence of type  Giuseppe Castagna committed Jul 09, 2019 6 schemes it is not always possible to prove that $\forall v, \forall t,  Giuseppe Castagna committed Jul 10, 2019 7 v \in t \text{~or~} v \not\in \lnot t$. Since this property ceases  Kim Nguyễn committed Jul 09, 2019 8 to hold only for $\lambda$-abstractions, then not using type schemes  Giuseppe Castagna committed Jul 09, 2019 9 yields less precise typing only for tests $\ifty{e}t{e_1}{e_2}$ where $e$  Giuseppe Castagna committed Jul 09, 2019 10 11 has a functional type, that is, the value tested will be a $\lambda$-abstraction. This seems like a reasonable compromise between the  Giuseppe Castagna committed Jul 10, 2019 12 complexity of an implementation involving type schemes and the programs  Giuseppe Castagna committed Jul 09, 2019 13 14 15 we want to type-check in practice. Indeed, if we restrict the language so that the only functional type $t$ allowed in a test $\ifty{e}t{e_1}{e_2}$ is $\Empty{\to}\Any$---i.e., if we allow to check whether a value is a  Giuseppe Castagna committed Jul 09, 2019 16 function but not whether it has a specific function type (\emph{cf.}, Footnote~\ref{foo:typecase})---, then our  Giuseppe Castagna committed Jul 10, 2019 17 implementation becomes complete (see Corollary~\ref{app:completeness} in the appendix for a formal proof).  Giuseppe Castagna committed Jan 03, 2019 18   Kim Nguyễn committed Jul 08, 2019 19 Our implementation is written in OCaml and uses CDuce as a library to  Giuseppe Castagna committed Jul 10, 2019 20 provide the semantic subtyping machinery. Besides a type-checking  Kim Nguyễn committed Jul 08, 2019 21 22 algorithm defined on the base language, our implementation supports record types (Section \ref{ssec:struct}) and the refinement of function types  Giuseppe Castagna committed Jul 09, 2019 23 (Section \ref{sec:refining} with the rule of Appendix~\ref{app:optimize}). The implementation is rather crude and  Giuseppe Castagna committed Jul 09, 2019 24 consist of 2000 lines of OCaml code, including parsing, type-checking  Giuseppe Castagna committed Jul 10, 2019 25 of programs, and pretty printing of types. We demonstrate the output of  Kim Nguyễn committed Jul 09, 2019 26 27 our type-checking implementation in Table~\ref{tab:implem}. These examples and others can be tested in the online toplevel available at  Giuseppe Castagna committed Jul 10, 2019 28 29 30 \url{https://occtyping.github.io/}% \ifsubmission ~(the corresponding repository is  Kim Nguyễn committed Jul 09, 2019 31 anonymized).  Giuseppe Castagna committed Jul 10, 2019 32 33 \else. \fi  Kim Nguyễn committed Jul 08, 2019 34 \input{code_table}  Giuseppe Castagna committed Jul 09, 2019 35 In this table, the second column gives a code fragment and the third  Kim Nguyễn committed Jul 08, 2019 36 column the type deduced by our implementation. Code~1 is a  Giuseppe Castagna committed Jul 09, 2019 37 38 straightforward function similar to our introductory example \code{foo} in (\ref{foo},\ref{foo2}). Here the programmer annotates the parameter of the function with a coarse type  Kim Nguyễn committed Jul 08, 2019 39 $\Int\vee\Bool$. Our implementation first type-checks the body of the  Kim Nguyễn committed Jul 08, 2019 40 function under this assumption, but doing so collects that the type of  Giuseppe Castagna committed Jul 09, 2019 41 42 43 44 45 $\texttt{x}$ is specialized to \Int{} in the then'' case and to \Bool{} in the else'' case. The function is thus type-checked twice more under each hypothesis for \texttt{x}, yielding the precise type $(\Int\to\Int)\land(\Bool\to\Bool)$. Note that w.r.t.\ rule \Rule{AbsInf+} of Section~\ref{sec:refining}, we improved the output of the computed type. Indeed using rule~[{\sc AbsInf}+] we would obtain the  Kim Nguyễn committed Jul 08, 2019 46 47 type $(\Int\to\Int)\land(\Bool\to\Bool)\land(\Bool\vee\Int\to\Bool\vee\Int)$  Kim Nguyễn committed Jul 08, 2019 48 49 with a redundant arrow. Here we can see that since we deduced the first two arrows $(\Int\to\Int)\land(\Bool\to\Bool)$, and since  Kim Nguyễn committed Jul 09, 2019 50 the union of their domain exactly covers the domain of the third arrow,  Kim Nguyễn committed Jul 08, 2019 51 the latter is not needed. Code~2 shows what happens when the argument  Giuseppe Castagna committed Jul 09, 2019 52 of the function is left unannotated (i.e., it is annotated by the top  Kim Nguyễn committed Jul 09, 2019 53 type \Any, written \texttt{Any} in our implementation). Here  Kim Nguyễn committed Jul 08, 2019 54 55 type-checking and refinement also work as expected, but the function only type checks if all cases for \texttt{x} are covered (which means  Giuseppe Castagna committed Jul 09, 2019 56 57 that the function must handle the case of inputs that are neither in \Int{} nor in \Bool).  Kim Nguyễn committed Jul 08, 2019 58 59 60  The following examples paint a more interesting picture. First (Code~3) it is  Giuseppe Castagna committed Jul 09, 2019 61 easy in our formalism to program type predicates such as those  Giuseppe Castagna committed Jul 09, 2019 62 hard-coded in the $\lambda_{\textit{TR}}$ language of \citet{THF10}. Such type  Kim Nguyễn committed Jul 08, 2019 63 64 predicates, which return \texttt{true} if and only if their input has a particular type, are just plain functions with an intersection  Giuseppe Castagna committed Jul 09, 2019 65 type inferred by the system of Section~\ref{sec:refining}. We next define Boolean connectives as overloaded  Giuseppe Castagna committed Jul 09, 2019 66 functions. The \texttt{not\_} connective (Code~4) just tests whether its  Kim Nguyễn committed Jul 08, 2019 67 argument is the Boolean \texttt{true} by testing that it belongs to  Kim Nguyễn committed Jul 09, 2019 68 69 70 71 72 73 74 75 76 the singleton type \True{} (the type whose only value is \texttt{true}) returning \texttt{false} for it and \texttt{true} for any other value (recall that $\neg\True$ is equivalent to $\texttt{Any\textbackslash}\True$). It works on values of any type, but we could restrict it to Boolean values by simply annotating the parameter by \Bool{} (which in CDuce is syntactic sugar for \True$\vee$\False) yielding the type $(\True{\to}\False)\wedge(\False{\to}\True)$. The \texttt{or\_} connective (Code~5) is straightforward as far as the  Giuseppe Castagna committed Jul 10, 2019 77 78 code goes, but we see that the overloaded type precisely captures all possible cases. Again we use a generalized version of the  Giuseppe Castagna committed Jul 09, 2019 79 \texttt{or\_} connective that accepts and treats any value that is not  Kim Nguyễn committed Jul 09, 2019 80 81 \texttt{true} as \texttt{false} and again, we could easily restrict the domain to \Bool{} if desired.  Kim Nguyễn committed Jul 08, 2019 82   Giuseppe Castagna committed Jul 09, 2019 83 To showcase the power of our type system, and in particular of  Kim Nguyễn committed Jul 08, 2019 84 the $\worra{}{}$''  Giuseppe Castagna committed Jul 09, 2019 85 type operator, we define \texttt{and\_} (Code~6) using De Morgan's  Kim Nguyễn committed Jul 08, 2019 86 Laws instead of  Giuseppe Castagna committed Jul 09, 2019 87 using a direct definition. Here the application of the outermost \texttt{not\_} operator is checked against type \True. This  Kim Nguyễn committed Jul 08, 2019 88 allows the system to deduce that the whole \texttt{or\_} application  Giuseppe Castagna committed Jul 09, 2019 89 has type \False, which in turn leads to \texttt{not\_\;x} and  Victor Lanvin committed Jul 10, 2019 90 \texttt{not\_\;y} to have type $\lnot \True$ and therefore both \texttt{x}  Kim Nguyễn committed Jul 08, 2019 91 92 93 94 95 96 and \texttt{y} to have type \True. The whole function is typed with the most precise type (we present the type as printed by our implementation, but the first arrow of the resulting type is equivalent to $(\True\to\lnot\True\to\False)\land(\True\to\True\to\True)$).  Giuseppe Castagna committed Jul 09, 2019 97 All these type predicates and Boolean connectives can be used together  Kim Nguyễn committed Jul 08, 2019 98 99 100 to write complex type tests, as in Code~7. Here we define a function \texttt{f} that takes two arguments \texttt{x} and \texttt{y}. If \texttt{x} is an integer and \texttt{y} a Boolean, then it returns the  Giuseppe Castagna committed Jul 09, 2019 101 102 integer \texttt{1}; if \texttt{x} is a character or \texttt{y} is an integer, then it returns \texttt{2}; otherwise the  Kim Nguyễn committed Jul 08, 2019 103 function returns \texttt{3}. Our system correctly deduces a (complex)  Giuseppe Castagna committed Jul 09, 2019 104 105 intersection type that covers all cases (plus several redundant arrow types). That this type is as precise as possible can be shown by the fact that  Kim Nguyễn committed Jul 08, 2019 106 when applying  Giuseppe Castagna committed Jul 09, 2019 107 108 \texttt{f} to arguments of the expected type, the \emph{type} deduced for the whole expression is the singleton type \texttt{1}, or \texttt{2},  Kim Nguyễn committed Jul 08, 2019 109 110 111 or \texttt{3}, depending on the type of the arguments. Code~8 allows us to demonstrate the use and typing of record paths. We  Giuseppe Castagna committed Jul 09, 2019 112 model, using open records, the type of DOM objects that represent XML  Kim Nguyễn committed Jul 08, 2019 113 or HTML documents. Such objects possess a common field  Giuseppe Castagna committed Jul 09, 2019 114 115 \texttt{nodeType} containing an integer constant denoting the kind of the node (e.g., \p{9} for the root element, \p{1} for an element node, \p{3} for a text node, \ldots). Depending on the kind, the object will have  Kim Nguyễn committed Jul 08, 2019 116 117 118 119 different fields and methods. It is common practice to perform a test on the value of the \texttt{nodeType} field. In dynamic languages such as JavaScript, the relevant field or method can directly be accessed after having checked for the appropriate \texttt{nodeType}. In  Giuseppe Castagna committed Jul 09, 2019 120 mainstream statically typed languages, such as Java, a downward cast  Kim Nguyễn committed Jul 08, 2019 121 from the generic \texttt{Node} type to the expected precise type of  Giuseppe Castagna committed Jul 09, 2019 122 the object is needed. We can see that using the extension presented in  Kim Nguyễn committed Jul 08, 2019 123 124 125 126 127 128 129 Section~\ref{ssec:struct} we can deduce the correct type for \texttt{x} in all cases. Of particular interest is the last case, since we use a type case to check the emptiness of the list of child nodes. This splits, at the type level, the case for the \Keyw{Element} type depending on whether the content of the \texttt{childNodes} field is the empty list or not.  Kim Nguyễn committed Jul 11, 2019 130 Our implementation features another enhancement that allows us  Kim Nguyễn committed Jul 09, 2019 131 to further improve the precision of the inferred type.  Kim Nguyễn committed Jul 08, 2019 132 133 134 Consider the definition of the \texttt{xor\_} operator (Code~9). Here the rule~[{\sc AbsInf}+] is not sufficient to precisely type the function, and using only this rule would yield a type  Giuseppe Castagna committed Jul 09, 2019 135  $\Any\to\Any\to\Bool$. Let us follow the behavior of the  Kim Nguyễn committed Jul 08, 2019 136 137  $\worra{}{}$'' operator. Here the whole \texttt{and\_} is requested to have type \True, which implies that \texttt{or\_ x y} must have  Giuseppe Castagna committed Jul 09, 2019 138  type \True. This can always happen, whether \texttt{x} is \True{} or  Kim Nguyễn committed Jul 08, 2019 139 140  not (but then depends on the type of \texttt{y}). The $\worra{}{}$'' operator correctly computes that the type for \texttt{x} in the  Giuseppe Castagna committed Jul 09, 2019 141  \texttt{then}'' branch is $\True\vee\lnot\True\lor\True\simeq\Any$,  Kim Nguyễn committed Jul 08, 2019 142 143 and a similar reasoning holds for \texttt{y}. To solve this problem, we can first remark that even though type cases in the body of a  Giuseppe Castagna committed Jul 09, 2019 144 145 146 147 function are tipping points that may change the type of the result of the function, they are not the only ones: applications of overloaded functions play exactly the same role. We therefore extend deduction system for $\Gamma \vdash e\triangleright\psi$ defined in Section~\ref{sec:refining} with the following rule\\[1mm]  Kim Nguyễn committed Jul 08, 2019 148 149 150 \centerline{$$\Infer[OverApp] {  Giuseppe Castagna committed Jul 09, 2019 151  \Gamma \vdash e : \textstyle\bigvee \bigwedge_{i \in I}t_i\to{}s_i\\  Kim Nguyễn committed Jul 08, 2019 152 153 154 155  \Gamma \vdash x : t\\ \Gamma \vdash e\triangleright\psi_1\\ \Gamma \vdash x\triangleright\psi_2\\ }  Giuseppe Castagna committed Jul 09, 2019 156  { \Gamma \vdash\textstyle  Kim Nguyễn committed Jul 08, 2019 157 158 159  {e}{~x}\triangleright\psi_1\cup\psi_2\cup\bigcup_{i\in I}\{ x\mapsto t\wedge t_i\} } {}  Giuseppe Castagna committed Jul 09, 2019 160 161 $$}\\ Whenever a function parameter is the argument of an  Kim Nguyễn committed Jul 08, 2019 162 overloaded function, we record as possible types for this parameter  Giuseppe Castagna committed Jul 09, 2019 163 164 all the possible domains of the arrows that type the overloaded function, restricted by the static type of the parameter. In Code~9,  Giuseppe Castagna committed Jul 10, 2019 165 since, \texttt{or\_} has type\\[.7mm]  Kim Nguyễn committed Jul 08, 2019 166 \centerline{$(\True\to\Any\to\True)\land(\Any\to\True\to\True)\land  Giuseppe Castagna committed Jul 10, 2019 167 168  (\lnot\True\to\lnot\True\to\False)$}\\[.7mm] We consider \True, \Any, and $\lnot\True$ as candidate types for  Giuseppe Castagna committed Jul 10, 2019 169 \texttt{x} which, in turn allows us to deduce a precise type given in the table. Finally, thanks to this rule it is no longer necessary to use a type case to force refinement. As a consequence we can define the functions \texttt{and\_} and \texttt{xor\_} more naturally as:  Giuseppe Castagna committed Jul 10, 2019 170 \begin{alltt}\color{darkblue}\morecompact  Giuseppe Castagna committed Jul 10, 2019 171 172  let and_ = fun (x : Any) -> fun (y : Any) -> not_ (or_ (not_ x) (not_ y)) let xor_ = fun (x : Any) -> fun (y : Any) -> and_ (or_ x y) (not_ (and_ x y))  Giuseppe Castagna committed Jul 09, 2019 173 174 \end{alltt} for which the very same types as in Table~\ref{tab:implem} are deduced.  Kim Nguyễn committed Jul 11, 2019 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192  Last but not least Code~10 (which corresponds to our introductory Example~\ref{nest1}) illustrate the need for iterative refinement of type environments, as defined in Section~\ref{sec:typenv}. Indeed, has we have explained, a single pass analysis would deduce independently for {\tt x} a type \Int from the {\tt f~x} application and \Any from the {\tt g~x} application. Here by iterating a second time, the algorithm deduces that {\tt x} has type \Empty, that is that the first branch can never be selected (and our implementation warns the user accordingly). In hindsight, the only way for a well typed overloaded function to have type $(\Int\to\Int)\land(\Any\to\Bool)$ is to diverge when the argument is of type \Int (since this intersection types states that whenever the input is \Int, {\em both\/} branches can be selected, yielding a result that is at the same time an integer and a Boolean. This is precisely reflected by the case $\Int\to\Empty$ in the result. Indeed our {\tt example10} function can be applied to an integer, but at runtime the application of {\tt f ~x} will diverge.