Commit 97c7d858 authored by Pietro Abate

[r2005-07-31 07:59:30 by afrisch] Empty log message

Original author: afrisch
Date: 2005-07-31 07:59:30+00:00
parent 276c83a0
......@@ -497,6 +497,11 @@ OCaml expression; x-patterns must be surrounded by {{..}}, e.g.:
function {{p1}} -> e1 | ... | {{pn}} -> en
Pattern matching follows a first-match policy. The first pattern
that succeeds triggers the corresponding branch.
Currently, it is not possible to mix normal OCaml patterns and x-patterns
in a single pattern matching.
......@@ -624,10 +629,35 @@ X-patterns can have:
In record x-patterns, it is possible to omit the <code>=p</code> part of a field.
The content is then replaced with the label name considered as
a capture variable. E.g. <code>{ x y=p }</code> is equivalent to
<code>{ x=x y=p }</code>.</p>
Here is a brief description of the semantics of patterns. Given
an input value, a pattern can either succeed or fail. If it succeeds,
it also produces a binding from the capture variables in the pattern
to x-values.
<li>A pattern which is just a type (no capture variable) succeeds if
and only if the value has the type.</li>
<li>A pattern <code>p1 | p2</code> succeeds if either <code>p1</code>
or <code>p2</code> succeeds, and returns the corresponding binding; if
both patterns succeed, <code>p1</code> wins. It is required that
<code>p1</code> and <code>p2</code> have the same sets of capture
variables. </li>
<li>A pattern <code>p1 &amp; p2</code> succeeds if both <code>p1</code>
and <code>p2</code> succeed, and returns the concatenation of the two
bindings. It is required that <code>p1</code> and <code>p2</code> have
<em>disjoint</em> sets of capture variables. </li>
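The rules above can be modelled in plain OCaml. The following is a toy sketch, not OCamlDuce's implementation: values are restricted to integers and pairs, and every name in it (<code>value</code>, <code>pattern</code>, <code>matches</code>) is hypothetical. It only illustrates that a type test yields an empty binding, that <code>p1 | p2</code> prefers <code>p1</code>, and that <code>p1 &amp; p2</code> concatenates the two bindings.

```ocaml
(* Toy model of the pattern semantics; all names are hypothetical. *)
type value = Int of int | Pair of value * value

type pattern =
  | Any                         (* the type of all values *)
  | IsInt                       (* a type test: succeeds on integers only *)
  | Capture of string           (* capture variable: binds the whole value *)
  | Or of pattern * pattern     (* p1 | p2 *)
  | And of pattern * pattern    (* p1 & p2 *)
  | PPair of pattern * pattern  (* (p1,p2) *)

(* None means failure; Some b means success with binding b. *)
let rec matches (p : pattern) (v : value) : (string * value) list option =
  match p, v with
  | Any, _ -> Some []
  | IsInt, Int _ -> Some []
  | IsInt, _ -> None
  | Capture x, _ -> Some [ (x, v) ]
  | Or (p1, p2), _ ->
      (* first-match: p2 is tried only if p1 fails *)
      (match matches p1 v with
       | Some b -> Some b
       | None -> matches p2 v)
  | And (p1, p2), _ ->
      (* both must succeed; the bindings are concatenated *)
      (match matches p1 v, matches p2 v with
       | Some b1, Some b2 -> Some (b1 @ b2)
       | _ -> None)
  | PPair (p1, p2), Pair (v1, v2) ->
      (match matches p1 v1, matches p2 v2 with
       | Some b1, Some b2 -> Some (b1 @ b2)
       | _ -> None)
  | PPair _, Int _ -> None
```

The sketch does not check the side conditions on capture variables (same sets for <code>|</code>, disjoint sets for <code>&amp;</code>); a real checker would reject ill-formed patterns before matching.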
In record x-patterns, it is possible to omit the <code>=p</code> part
of a field. The content is then replaced with the label name
considered as a capture variable. E.g. <code>{ x y=p }</code> is
equivalent to <code>{ x=x y=p }</code>.</p>
<p>It is also possible to add an "else" clause:
<code>{ x = (a,_)|(a:=3) }</code>
......@@ -645,7 +675,8 @@ If the same sequence capture variable appears several times (or below a
repetition) in a regexp, it is bound to the concatenation of all
matched subsequences. E.g.: <code>[ (x::Int | _)* ]</code> will
collect in <code>x</code> all the elements of type <code>Int</code> from
a sequence.</p>
a sequence. It is not legal to have repeated simple capture variables.
The regexp operators <code>+,*,?</code> are greedy by default (they match as long
......@@ -940,8 +971,142 @@ documentation for a description of its interface.
<box title="Code samples" link="code">
<box title="Marshaling" link="marshal">
OCamlDuce uses some tricks in its internal representation of x-values
to reduce memory usage and improve performance. You need to pay
special attention if you want to use OCaml serialization functions
(module <code>Marshal</code>, functions
<code>input_value/output_value</code>) on x-values. In addition to
your values, you also need to save and restore some internal data
using the functions <code>Cduce_types.Value.extract_all</code> and
<code>Cduce_types.Value.intract_all</code>. Of course, this also
applies if the value to be serialized contains deeply nested x-values.
Here are generic
serialization/deserialization functions that illustrate how to do it:
let my_output_value oc v =
  let p = Cduce_types.Value.extract_all () in
  output_value oc (p,v)
let my_input_value ic =
  let (p,v) = input_value ic in
  Cduce_types.Value.intract_all p;
  v
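The same save-and-restore pattern can be tried without OCamlDuce. The sketch below is a plain-OCaml analogue under stated assumptions: a global hash table stands in for the library's internal data, and <code>snapshot_state</code> and <code>restore_state</code> are hypothetical stand-ins for <code>extract_all</code> and <code>intract_all</code>, whose real types are not shown here.

```ocaml
(* Hypothetical stand-in for the library's internal data. *)
let table : (int, string) Hashtbl.t = Hashtbl.create 16

(* Hypothetical analogues of extract_all / intract_all. *)
let snapshot_state () = Hashtbl.fold (fun k s acc -> (k, s) :: acc) table []
let restore_state p = List.iter (fun (k, s) -> Hashtbl.replace table k s) p

(* Serialize the internal data together with the user value. *)
let save v = Marshal.to_string (snapshot_state (), v) []

(* Restore the internal data first, then hand back the user value.
   As with any use of Marshal, the caller must annotate the expected type. *)
let load (s : string) =
  let (p, v) = Marshal.from_string s 0 in
  restore_state p;
  v
```

Pairing the two pieces in a single marshaled value keeps them from getting out of sync, which is presumably why the functions above write <code>(p,v)</code> as one unit.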
<box title="Performance" link="perf">
<section title="Strings">
OCaml users might be surprised by the fact that x-strings are simply
represented as sequences in OCamlDuce. Does this mean that they are
actually stored in memory as linked lists? Certainly not! The internal
representation of sequence values uses several tricks to improve
performance and memory usage. In particular, a special form in the
representation can store strings as byte buffers, as in OCaml.
If an XML document is loaded, or if a Caml string is converted
to an x-value, this compact representation will be used.
<section title="Concatenation">
Similarly, OCaml users might be reluctant to use the sequence
concatenation <code>@</code> on sequences. In OCaml, the complexity
of this operator is linear in the size of its first argument (which
needs to be copied). OCamlDuce uses a special form in its internal
representation to store concatenation in a lazy way. The concatenation
is actually computed only when the value is accessed. This means
that it's perfectly ok to build a long sequence by adding
new elements at the end one by one, as long as you don't
simultaneously inspect the sequence.
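To make the idea concrete, here is a minimal sketch of delayed concatenation in plain OCaml. It is not OCamlDuce's actual representation, which is not described here; the type <code>seq</code> and the functions <code>append</code>, <code>to_list</code> and <code>build</code> are hypothetical names chosen for the illustration.

```ocaml
(* A sequence with a constructor that records a concatenation
   without performing it. *)
type 'a seq =
  | Nil
  | Cons of 'a * 'a seq
  | Concat of 'a seq * 'a seq  (* delayed concatenation *)

(* Constant time: the first argument is not copied. *)
let append s1 s2 = Concat (s1, s2)

(* Flatten only when the elements are inspected. An explicit stack of
   pending right-hand sides keeps the traversal tail-recursive. *)
let to_list s =
  let rec go acc pending = function
    | Nil ->
        (match pending with
         | [] -> List.rev acc
         | next :: rest -> go acc rest next)
    | Cons (x, tl) -> go (x :: acc) pending tl
    | Concat (a, b) -> go acc (b :: pending) a
  in
  go [] [] s

(* Build 1..n by adding elements at the end, one by one. *)
let build n =
  let rec loop acc i =
    if i > n then acc else loop (append acc (Cons (i, Nil))) (i + 1)
  in
  loop Nil 1
```

In this sketch, building takes time linear in <code>n</code>, and the cost of flattening is paid once, when <code>to_list</code> is called. Inspecting the sequence after every append would flatten it each time, which is the situation the text warns against.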
<section title="Pattern matching">
Another point worth knowing when programming in OCamlDuce
is that patterns can be written in a declarative style without
affecting performance. The compiler uses static type information
about matched values to produce efficient code for pattern matching.
To illustrate this, consider the following sample:
type a = {{ <a>[ a* ] }}
type b = {{ <b>[ b* ] }}
let f : {{ a|b }} -> int = function {{ a }} -> 0 | {{ _ }} -> 1
type a = {{ <a>[ a* ] }}
type b = {{ <b>[ b* ] }}
let f : {{ a|b }} -> int = function {{ <a>_ }} -> 0 | {{ _ }} -> 1
The two functions have exactly the same semantics, but the first
implementation is more declarative: it uses type checks to distinguish
between <code>a</code> and <code>b</code> instead of saying
<em>how</em> to distinguish between these two types. Imagine
that the definition of these types changes to:
type a = {{ <x kind="a">[ a* ] }}
type b = {{ <x kind="b">[ b* ] }}
Then the first implementation still works as expected, but the
second one needs to be rewritten.</p>
<p>Now one might believe that the second implementation is more
efficient because it tells the compiler to check only the root tag,
whereas the first implementation would force
the compiler to produce code to check that all tags in the tree
are <code>a</code>s. But this is not what happens! Actually,
you can check that the compiler will produce exactly the same code
for both implementations. It considers the static type information
about the argument of the pattern matching (here, the input type
of the function), and computes an efficient way to evaluate
patterns for the values of this type.
<section title="The map iterator">
The <code>map ... with ...</code> iterator is implemented in a
tail-recursive way. You can safely use it on very long sequences.
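For comparison, this is one common way to write a tail-recursive map over ordinary OCaml lists. It is only an illustration of the property; the text does not say how OCamlDuce implements <code>map ... with ...</code>, and <code>map_tr</code> is a hypothetical name.

```ocaml
(* List.rev_map is tail-recursive; reversing its result restores the
   original order, so the stack does not grow with the list length. *)
let map_tr f l = List.rev (List.rev_map f l)
```

The price is a second pass over the list and one intermediate list, in exchange for constant stack usage on very long inputs.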
<box title="Code samples" link="code">
<section title="Parsing XML files">
......@@ -998,7 +1163,7 @@ a complex parsing of XML.
It is interesting to introduce errors in the parser
<code></code> or the printer
<code></code> and see how the type system catch them.
<code></code> and see how the type system catches them.