Commit b182ca94 authored by Pietro Abate's avatar Pietro Abate
Browse files

[r2005-07-30 15:14:34 by afrisch] Empty log message

Original author: afrisch
Date: 2005-07-30 15:14:34+00:00
parent 899aa0de
...@@ -72,9 +72,9 @@ Some simple examples can be found <a --> ...@@ -72,9 +72,9 @@ Some simple examples can be found <a -->
<box title="Overview" link="overview"> <box title="Overview" link="overview">
<p> <p>
In a nutshell, OCamlDuce extends OCaml with new kind of values In a nutshell, OCamlDuce extends OCaml with a new kind of values
(<em>x-values</em>) which represent XML documents, fragments, tags, Unicode (<em>x-values</em>) to represent XML documents, fragments, tags, Unicode
strings. To describe these values, it also extends the type algebra strings. In order to describe these values, it also extends the type algebra
with so-called <em>x-types</em>. The philosophy behind these types is that they with so-called <em>x-types</em>. The philosophy behind these types is that they
represent <em>set of x-values</em>. They can be very precise: indeed, represent <em>set of x-values</em>. They can be very precise: indeed,
each value can be seen as a singleton type (a set with a single each value can be seen as a singleton type (a set with a single
...@@ -639,8 +639,8 @@ collect in <code>x</code> all the elements of type <code>Int</code> from ...@@ -639,8 +639,8 @@ collect in <code>x</code> all the elements of type <code>Int</code> from
a sequence.</p> a sequence.</p>
<p> <p>
The regexp operators +,*,? are greedy by default (they match as long The regexp operators <code>+,*,?</code> are greedy by default (they match as long
as possible). They admit non-greedy variants +?,*?,??. as possible). They admit non-greedy variants <code>+?,*?,??</code>.
</p> </p>
</box> </box>
...@@ -683,10 +683,229 @@ local form <code> let namespace "..." in ... </code>. ...@@ -683,10 +683,229 @@ local form <code> let namespace "..." in ... </code>.
</box> </box>
<box title="A few words about performances" link="perf"> <box title="More on type-checking" link="typecheck">
<section title="Type inference">
<p>
As we said above, the programmer is sometimes required to provide type
annotations. To know where to put these annotation, it is necessary to
get a basic understanding of how type-checking works.
</p>
<p>
The OCaml type-checker is run first to detect which sub-expressions
are of the x-kind. A second ML type-checking pass is then done to
introduce subsumption (implicit subtyping) steps where allowed. After
these two passes, the OCamlDuce type checker obtains a data-flow summary of
x-values in the whole compilation unit. This is a directed graph,
whose edges represent either simple data-flow or complex operation
on x-values. The nodes of the graph can be thought as x-type
variables. A data-flow edge corresponds to a subtyping constraints,
and an operation edge corresponds to a symbolic constraints which
mimics the corresponding operation on values.
</p>
<p>
Some of the nodes are given an explicit type by the programmer,
through type annotations (on expressions or function arguments)
or the other usual mechanism in ML (data type declarations,
signatures, ...).
</p>
<p>
Also, if there is a loop with only subtyping edges in the graph,
all the nodes on the loop are merged together.
</p>
<p>
After this operation, the graph is required to be acyclic (assuming
that the nodes with an explicit type are removed from the graph). It
is the responsibility of the programmer to provide enough type
annotation to achieve this property. Otherwise, a type error
is issued.
</p>
<sample><![CDATA[{{ON}}
# let rec f x = match x with 0 -> {{ [] }} | n -> {{ f {{n-1}} @ ['.'] }};;
Cycle detected: cannot type-check
# let rec f x : {{ String }} = match x with 0 -> {{ [] }} | n -> {{ f {{n-1}} @ ['.'] }};;
val f : int -> {{String}} = <fun>]]>
</sample>
<p>
In the example above, there is a cycle between the result type for
<code>f</code> and the type for the sub-expression <code>{{ON}}f
{{n-1}}</code>. It is here broken with a type annotation on the result; it could
have been broken by a type annotation on the expression <code>{{ON}}f
{{n-1}}</code>, or on the function <code>f</code> itself, or by a
module signature.
</p>
<p>
Let us study another simple example:
</p>
<sample>{{ON}}
# let f x = {{ x + 1 }} in f {{ 2 }}, f {{ 3 }};;
- : {{3--4}} * {{3--4}} = ({{3}}, {{4}})
</sample>
<p>
The type-checkers detects that the two x-values <code>2</code> and
<code>3</code> can flow to the argument of <code>f</code>. Its body
is thus type-checked with the assumption that <code>x</code> has type
<code>2--3</code>. The computed result type is then <code>3--4</code>.
</p>
<p>
The type-inference process described above is global by nature. The
acyclicity condition is only imposed after a whole compilation unit
has been type-checked by OCaml (and the information from the module
interface as been integrated). When a type variable is inferred to
be of the x-kind, it is never generalized. As a consequence, there
is no parametric polymorphism on x-types.
</p>
<p>
In the toplevel, type-checking is done after each phrase. Consider
the following session:
</p>
<sample><![CDATA[{{ON}}
# let f x = {{ x + 1 }};;
val f : {{Empty}} -> {{Empty}} = <fun>
# let a = f {{ 2 }};;
Subtyping failed 2 <= Empty
Sample:
2
]]></sample>
<p>
The function <code>f</code> is inferred to have type
<code>{{ON}}{{Empty}} -> {{Empty}}</code> because when the first
phrase is type-checked, the data-flow graph says that no value
can flow to <code>x</code>, and thus the input type is empty
(and similarly for the result type). If the two phrases
were type-checked together (which would be the case it they had
been compiled by the compiler, not in the toplevel), the type checker
would have correctly inferred that the input type for <code>f</code>
must contain <code>2</code>.
</p>
</section>
<section title="Implicit subtyping">
<p>
Coercion from an x-type to a super type is automatic in OCamlDuce.
However, this automatic subsumption does not carry over to OCaml
type constructor, even if there are covariant. Consider:
</p>
<sample><![CDATA[{{ON}}
# let f (x : {{ Int }} * {{ Int }}) = 1;;
val f : {{Int}} * {{Int}} -> int = <fun>
# let g (x : {{ 0 }} * {{ 0 }}) = f x;;
This expression has type {{0}} * {{0}} but is here used with type
{{Int}} * {{Int}}
# let g (x : {{ 0 }} * {{ 0 }}) = let a,b = x in f (a,b);;
val g : {{0}} * {{0}} -> int = <fun>
# let g (x : {{ 0 }} * {{ 0 }}) = f (x :> {{ Int }} * {{ Int }});;
val g : {{0}} * {{0}} -> int = <fun>
]]></sample>
<p>
The first attempt to define <code>g</code> fails because the type for
<code>x</code> is not an x-type and thus subsumption does not
apply. In the second attempt, we extract the two components of the
pair; since they are inferred to be x-values, subtyping applies to
both of them. Thus, when the pair <code>(a,b)</code> is reconstructed,
it is legal to unify its type with the input type of <code>f</code>.
The third definition for <code>g</code> gives an alternative solution:
using explicit OCaml type coercions.
</p>
</section>
</box>
<box title="Exchanging values" link="transl">
<p>
OCamlDuce strongly seperates regular OCaml values from the new
x-values. They have different syntax, expressions, types, patterns,
and even type-checking algorithms. This strong segregation is key point
which allowed a simple integration between very different type
systems.
</p>
<p>
At some point, it is still necessary to cross the frontier and
translate OCaml values to x-values or the opposite.
</p>
<p>
Fortunately, OCamlDuce provides automatic translations in both
directions. Instead of double curly braces, you can
enclose x-expressions in curly brace+colon <code>{: ... :}</code>
(here, the <code>...</code> is an x-expression).
The effect is to translate the result of the x-expression
(which must be an x-value) to an OCaml value. Similarly,
in an x-expression, you can obtain the x-translation of
an OCaml value with the same syntax <code>{: ... :}</code>
(here, the <code>...</code> is an OCaml expression).
</p>
<p>
Here is how the translation works. To each OCaml type <code>t</code>,
we associate an x-type <code>T(t)</code> and a pair of translation
function between <code>t</code> and <code>T(t)</code>.
Actually, not all the features are supported. For instance,
free type variables, abstract types, object types, non-regular
recursive types cannot be translated. In particular, since
type variables are not allowed, the OCaml type must be fully known.
</p>
<p>
The translation for an OCaml type <code>t</code> is by structural
induction on the definition of <code>t</code>. Sum types are
translated to union types: a constant constructor <code>A</code> is
translated to the qualified name <code>`A</code>; a non-constant
constructor <code>A of t1 * ... * tn</code> is translated to
<code>&lt;A>[ T(t1) ... T(tn) ]</code>. Closed polymorphic variants
have the same translation. Record types are translated to closed
record x-types. Some other translations:
</p>
<table border="1">
<tr><th>Caml type t</th> <th>X-type T(t)</th></tr>
<tr><td><code>int</code></td> <td><code>Int</code></td></tr>
<tr><td><code>int32</code></td> <td><code>Int32</code></td></tr>
<tr><td><code>int64</code></td> <td><code>Int64</code></td></tr>
<tr><td><code>string</code></td> <td><code>Latin1</code></td></tr>
<tr><td><code>t list</code></td> <td><code>[T(t)*]</code></td></tr>
<tr><td><code>t array</code></td> <td><code>[T(t)*]</code></td></tr>
<tr><td><code>unit</code></td> <td><code>[]</code></td></tr>
<tr><td><code>char</code></td> <td><code>Latin1Char</code></td></tr>
<tr><td><code>{{t}}</code></td> <td><code>t</code></td></tr>
</table>
<p>
Here is an example:
</p>
<sample>{{ON}}
# let f (x : {{ Int }}) = {{ x + 1 }} in List.map f {: [ 1 2 3 ] :};;
- : {{Int}} list = [{{2}}; {{3}}; {{4}}]
</sample>
<p> <p>
TODO: efficient repr of strings, concatenation. Pattern matching. In this example, the result type of the translation is inferred
to be <code>{{ON}}{{ Int }} list</code> (because the type for
<code>f</code> is given). The corresponding x-type
is <code>{{ON}}{{ [Int*] }}</code>.
</p> </p>
</box> </box>
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment