Commit 3af01457 authored by Pietro Abate

add tutorial and manual to the distribution

parent afcb4bdd
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<page name="manual_interpreter">
<title>Compiler/interpreter/toplevel</title>
<box title="Command-line" link="cmdline">
<p>
According to the command line arguments,
the <code>cduce</code> command behaves
either as an interactive toplevel, an interpreter, a compiler, or
a loader.
</p>
<ul>
<li>
<sample>
cduce [OPTIONS ...] [--arg ARGUMENT ...]
</sample>
<p>
The command operates as an interactive
toplevel. See the <a href="#toplevl">Toplevel</a> section below.
</p>
</li>
<li>
<sample>
cduce [OPTIONS ...] [ {{script}}.cd | --stdin ] [--arg ARGUMENT ...]
</sample>
<sample>
cduce [OPTIONS ...] --script {{script}}.cd [ ARGUMENT ...]
</sample>
<p>
The command runs the script <code>{{script}}.cd</code>.
</p>
</li>
<li>
<sample>
cduce [OPTIONS ...] --compile {{script}}.cd
</sample>
<p>
The command compiles the script <code>{{script}}.cd</code> and
produces <code>{{script}}.cdo</code>. If the OCaml/CDuce interface
is available and enabled, the compiler looks for a corresponding OCaml interface
<code>{{script}}.cmi</code>. See the <local
href="manual_interfacewithocaml"/> page for more information.
</p>
</li>
<li>
<sample>
cduce [OPTIONS ...] --run [ {{script}}.cdo ... ] [--arg ARGUMENT ...]
</sample>
<p>
The command runs one or several pre-compiled scripts.
</p>
</li>
</ul>
<p>
The arguments that follow the <code>--arg</code> option
are the scripts' command line. They can be accessed within
CDuce using the <code>argv</code> operator
(of type <code>[] -> [ String* ]</code>).
</p>
<p>
The options and arguments are:
</p>
<ul>
<li> <code>--verbose</code> (for <code>--compile</code> mode only).
Display the type of values in the compiled unit.</li>
<li> <code>--obj-dir %%directory%%</code> (for <code>--compile</code>
mode only).
Specify where to put the <code>.cdo</code> file (default: same directory as the
source file).</li>
<li> <code>--I %%directory%%</code>
Add a directory to the search path for <code>.cdo</code>,
<code>.cmi</code> and include files. </li>
<li> <code>--stdin</code>. Read CDuce script from standard input. </li>
<li> <code>--no %%feature%%</code>.
Disable one of the built-in optional features. The list of features and
their symbolic names can be obtained with the <code>-v</code>
option. Can be used for instance to turn the Expat parser off, in
order to use PXP, if both have been included at compile time.
</li>
<li> <code>-v</code>, <code>--version</code>. Show version information
and built-in optional features, and exit.
</li>
<li> <code>--mlstub</code>. See <local href="manual_interfacewithocaml"/>.
</li>
<li> <code>--help</code>. Show usage information about the command line.
</li>
</ul>
</box>
<box title="Scripting" link="scripting">
<p>
CDuce can be used for writing scripts. As usual, it suffices to start
the script file with <code> #!%%install_dir%%/cduce</code> to run it
through the CDuce interpreter in batch mode. The <code>--script</code> option can
be used to avoid <code>--arg</code> when calling the script. Here is
an example of a script file that prints all the titles of the filters
of an Evolution mail client.
</p>
<sample><![CDATA[
#!/bin/env cduce --script
type Filter = <filteroptions>[<ruleset> [(<rule>[<title>String _*])+]];;
let src : Latin1 =
match argv [] with
| [ f ] -> f
| _ -> raise "Invalid command line"
in
let filter : Filter =
match load_xml src with
| x&Filter -> x
| _ -> raise "Not a filter document"
in
print_xml(<filters>([filter]/<ruleset>_/<rule>_/<title>_)) ;;
]]></sample>
</box>
<box title="Phrases" link="phrases">
<p>
CDuce programs are sequences of phrases, which can
be juxtaposed or separated by <code>;;</code>. There are several kinds of
phrases:
</p>
<ul>
<li>Type declarations <code>type %%T%% = %%t%%</code>. Adjacent type declarations are mutually
recursive, e.g.:
<sample><![CDATA[
type T = <a>[ S* ]
type S = <b>[ T ]
]]></sample>
</li>
<li>Function declarations <code>let %%f%% %%...%%</code>.
Adjacent function declarations are mutually recursive, e.g.:
<sample><![CDATA[
let f (x : Int) : Int = g x
let g (x : Int) : Int = x + 1
]]></sample>
</li>
<li>Global bindings <code>let %%p%% = %%e%%</code>
(bind the result of the expression <code>%%e%%</code> using the
pattern <code>%%p%%</code>),
<code>let %%p%% : %%t%% = %%e%%</code>
(gives a less precise type to the expression),
<code>let %%p%% :? %%t%% = %%e%%</code>
(dynamically checks that the expression has some type).</li>
<li>Evaluation statements: an expression to evaluate.</li>
<li>Textual inclusion <code>include "%%other_cduce_script.cd%%"</code>;
note that cycles of inclusion are detected and automatically broken.
Filenames are relative to the directory of the current file
(or the current directory in the toplevel).
</li>
<li>Global namespace binding: see <local href="namespaces"/>.</li>
<li>Schema declaration: see <local href="manual_schema"/>.</li>
<li>Alias for an external unit <code>using %%alias%% =
"%%unit%%"</code> or <code>using %%alias%% = %%unit%%</code>: gives an
alternative name for a pre-compiled unit. Values, types, namespace
prefixes, schema from <code>%%unit%%.cdo</code> can be referred to
either as <code>%%alias%%.%%ident%%</code> or as
<code>%%unit%%.%%ident%%</code>. </li>
<li>Open an external unit <code>open %%u%%</code>: the effect of this
statement is to import all the identifiers exported by the compilation
unit <code>%%u%%</code> into the current scope. These identifiers
are also re-exported by the current unit.</li>
</ul>
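<p>
As a minimal sketch of the three forms of global binding listed above
(the names <code>x</code>, <code>y</code>, <code>n</code> and <code>m</code>
are chosen here only for illustration):
</p>
<sample><![CDATA[
(* Bind with a pattern: x is bound to 1 and y to 2 *)
let (x, y) = (1, 2)

(* Give a less precise type: n gets type Int instead of the singleton type 42 *)
let n : Int = 42

(* Check dynamically, when the phrase is evaluated, that the value has type Int *)
let m :? Int = 42
]]></sample>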
</box>
<box title="Toplevel" link="toplevl">
<p>
If no CDuce file is given on the command line, the interpreter
behaves as an interactive toplevel.
</p>
<p>
Toplevel phrases are processed after each <code>;;</code>.
Mutually recursive declarations of types or functions
must be contained in a single adjacent sequence of phrases
(without <code>;;</code> in between).
</p>
<p>
You can quit the toplevel with the toplevel directive
<code>#quit</code> but also with either <code>Ctrl-C</code> or
<code>Ctrl-D</code>. Another option is to use the built-in
<code>exit</code>.
</p>
<p>
The toplevel directive <code>#help</code> prints a help message about
the available toplevel directives.
</p>
<p>
The toplevel directive <code>#env</code> prints the current
environment: the set of defined global types and values, and also
the current sets of prefix-to-<local href="namespaces">namespace</local> bindings used
for parsing (as defined by the user) and
for pretty-printing (as computed by CDuce itself).
</p>
<p>
The two toplevel directives <code>#silent</code> and
<code>#verbose</code> can be used to turn down and up toplevel
outputs (results of typing and evaluation).
</p>
<p>
The toplevel directive <code>#reinit_ns</code> reinitializes the
table of prefix-to-namespace bindings used for pretty-printing
values and types with namespaces (see <local href="namespaces"/>).
</p>
<p>
The toplevel directive <code>#print_type</code> shows a representation of a
CDuce type (including types imported from <local href="manual_schema">XML
Schema</local> documents).
</p>
<p>
The toplevel directive <code>#builtins</code> prints the names
of the embedded OCaml values (see <local href="manual_interfacewithocaml"/>).
</p>
<p>
The toplevel has no line editing facilities.
You can use an external wrapper such as
<a href="http://pauillac.inria.fr/~ddr/">ledit</a>.
</p>
</box>
<box title="Lexical entities" link="lex">
<p>
The <b>identifiers</b> (for variables, types, recursive patterns, ...)
are qualified names, in the sense of
<a
href="http://www.w3.org/TR/REC-xml-names/">XML Namespaces</a>.
The chapter <local href="namespaces"/> explains how to declare
namespace prefixes in CDuce. Identifiers are resolved as XML
attributes (which means that the default namespace does not apply).
All the identifiers are in the same scope. For instance, there cannot be
simultaneously a type and variable (or a schema, a namespace prefix, an alias
for an external unit) with the same name.
</p>
<p>
The dot must be protected by a backslash in identifiers, to avoid
ambiguity with the dot notation.
</p>
<p>
The dot notation serves several purposes:
</p>
<ul>
<li>
to refer to values and types declared in a separate CDuce compilation unit;
</li>
<li>
to refer to values from OCaml compilation unit
(see <local href="manual_interfacewithocaml"/>);
</li>
<li>
to refer to schema components
(see <local href="manual_schema"/>);
</li>
<li>
to select a field from a record expression.
</li>
</ul>
<p>
CDuce supports two styles of <b>comments</b>: <code>(* ... *)</code>
and <code>/* ... */</code>. The first style allows the programmer
to comment out a piece of code. Nesting is allowed, and strings
within single or double quotes are not searched for the end-marker
<code>*)</code>. In particular, single quotes (and apostrophes)
have to be balanced inside a <code>(* ... *)</code> comment.
The other style, <code>/* ... */</code>, is better suited to textual
comments. Such comments cannot be nested, and quotes are not treated
specially inside them.
</p>
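<p>
The difference between the two comment styles can be sketched as follows
(the binding of <code>x</code> is only there to give the comments some code to sit next to):
</p>
<sample><![CDATA[
(* This style can hide code: let old = (* a nested comment *) 1 *)
/* This style suits plain text: the apostrophe in "isn't" needs no balancing */
let x = 1
]]></sample>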
</box>
</page>
<page name="manual_schema_samples">
<title>XML Schema sample documents</title>
<box title="Sample XML documents" link="sample">
<p>
All the examples you will see in the manual section regarding CDuce's XML
Schema support are related to the XML Schema Document <code>mails.xsd</code>
and to the XML Schema Instance <code>mails.xml</code> reported below.
</p>
</box>
<box title="mails.xsd" link="mails_xsd">
<sample><![CDATA[
<!-- mails.xsd -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="mails" type="mailsType" />
<xsd:complexType name="mailsType">
<xsd:sequence minOccurs="0" maxOccurs="unbounded">
<xsd:element name="mail" type="mailType" />
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="mailType">
<xsd:sequence>
<xsd:element name="envelope" type="envelopeType" />
<xsd:element name="body" type="bodyType" />
<xsd:element name="attachment" type="attachmentType"
minOccurs="0" maxOccurs="unbounded" />
</xsd:sequence>
<xsd:attribute use="required" name="id" type="xsd:integer" />
</xsd:complexType>
<xsd:element name="header">
<xsd:complexType>
<xsd:simpleContent>
<xsd:extension base="xsd:string">
<xsd:attribute ref="name" use="required" />
</xsd:extension>
</xsd:simpleContent>
</xsd:complexType>
</xsd:element>
<xsd:element name="Date" type="xsd:dateTime" />
<xsd:complexType name="envelopeType">
<xsd:sequence>
<xsd:element name="From" type="xsd:string" />
<xsd:element name="To" type="xsd:string" />
<xsd:element ref="Date" />
<xsd:element name="Subject" type="xsd:string" />
<xsd:element ref="header" minOccurs="0" maxOccurs="unbounded" />
</xsd:sequence>
<xsd:attribute name="From" type="xsd:string" use="required" />
</xsd:complexType>
<xsd:simpleType name="bodyType">
<xsd:restriction base="xsd:string" />
</xsd:simpleType>
<xsd:complexType name="attachmentType">
<xsd:group ref="attachmentContent" />
<xsd:attribute ref="name" use="required" />
</xsd:complexType>
<xsd:group name="attachmentContent">
<xsd:sequence>
<xsd:element name="mimetype">
<xsd:complexType>
<xsd:attributeGroup ref="mimeTypeAttributes" />
</xsd:complexType>
</xsd:element>
<xsd:element name="content" type="xsd:string" minOccurs="0" />
</xsd:sequence>
</xsd:group>
<xsd:attribute name="name" type="xsd:string" />
<xsd:attributeGroup name="mimeTypeAttributes">
<xsd:attribute name="type" type="mimeTopLevelType" use="required" />
<xsd:attribute name="subtype" type="xsd:string" use="required" />
</xsd:attributeGroup>
<xsd:simpleType name="mimeTopLevelType">
<xsd:restriction base="xsd:string">
<xsd:enumeration value="text" />
<xsd:enumeration value="multipart" />
<xsd:enumeration value="application" />
<xsd:enumeration value="message" />
<xsd:enumeration value="image" />
<xsd:enumeration value="audio" />
<xsd:enumeration value="video" />
</xsd:restriction>
</xsd:simpleType>
</xsd:schema>
]]></sample>
</box>
<box title="mails.xml" link="mails_xml">
<sample><![CDATA[
<!-- mails.xml -->
<mails>
<mail id="0">
<envelope From="bill@microsoft.com">
<From>user@unknown.domain.org</From>
<To>user@cduce.org</To>
<Date>2003-10-15T15:44:01Z</Date>
<Subject>I desperately need XML Schema support in CDuce</Subject>
<header name="Reply-To">bill@microsoft.com</header>
</envelope>
<body>
As subject says, is it possible to implement it?
</body>
<attachment name="signature.doc">
<mimetype type="application" subtype="msword"/>
<content>
### removed by spamoracle ###
</content>
</attachment>
</mail>
<mail id="1">
<envelope From="zack@cs.unibo.it">
<From>zack@di.ens.fr</From>
<To>bill@microsoft.com</To>
<Date>2003-10-15T16:17:39Z</Date>
<Subject>Re: I desperately need XML Schema support in CDuce</Subject>
</envelope>
<body>
user@unknown.domain.org wrote:
> As subject says, is possible to implement it?
Sure, I'm working on it, in a few years^Wdays it will be finished
</body>
</mail>
</mails>
]]></sample>
</box>
</page>
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<page name="tutorial_errors">
<title>Error messages and Warnings</title>
<banner>Error messages and Warnings</banner>
<left>
<boxes-toc/>
<p>
You can cut and paste the code on this page and
test it on the <a href="http://reglisse.ens.fr/cgi-bin/cduce">online interpreter</a>.
</p>
</left>
<box title="Key concepts" link="p1">
<p>
CDuce statically detects a large class of errors and tries to help debug
them by providing precise error messages and, in the case of type errors, by
showing a description (we call it a "sample") of specific values that would make
the computation fail.
</p>
<p>
CDuce signals the classic syntax errors as well as errors such as unbound variables. It also checks that pattern matching is exhaustive
<footnote>
It checks it in functions, <code>match</code>, and <code>map</code> expressions, but
not for <code>transform</code> and <code>xtransform</code> for which a default branch
returning the empty sequence is always defined
</footnote>.
For instance if we declare the type <code>Person</code> defined in Section "<local href="tutorial_overloading"/>" and try the following definition:
</p>
<sample><![CDATA[
fun name (Person -> String)
| <person gender = {{"F"}}>[ n ;_] -> n
]]></sample>
<p> then we obtain the following error message (frames of the same form as the one below show text taken verbatim from the online demo, with no color or formatting added):</p>
<sessionsample><![CDATA[
Error at chars 228-298:
{{%%fun name (Person -> String)
| <person gender = "F">[ n ;_] -> n%%}}
This pattern matching is not exhaustive
Residual type:
<person gender = [ 'M' ]>[ Name Children ]
Sample:
<person {| gender = [ 'M' ] |}>[ <name {| |}>[ ] <children {| |}>[ ] ]
]]></sessionsample>
<p>
This error message tells us three things: (1) pattern matching is not
defined for all the possible input values (we forgot the case in which the
attribute is <code>"M"</code>); (2) it gives us the exact type of the values
we forgot to handle in our matching (in this case this is exactly
<code>MPerson</code>); (3) it shows us a "sample" of the residual type, that is,
a simplified representation of a value that would make the expression fail (in
this case it shows us the value <code>&lt;person gender="M">[ &lt;name>[ ]
&lt;children>[ ] ]</code>).
</p>
<note>Samples are simplified representations of values in the sense that they show
only that part of the value that is relevant for the error and may omit other parts
that are needed to obtain an effective value.
</note>
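<p>
To illustrate what an exhaustive definition looks like, here is a minimal sketch.
It assumes a simplified stand-in for <code>Person</code> (only a name, no children),
not the definition given in the overloading tutorial; adding a branch for the
<code>"M"</code> case covers the residual type, so the non-exhaustiveness error
should no longer be reported.
</p>
<sample><![CDATA[
(* Simplified stand-in types, assumed for this sketch only *)
type FPerson = <person gender = "F">[ <name>String ]
type MPerson = <person gender = "M">[ <name>String ]
type Person = FPerson | MPerson

(* One branch per gender: every value of type Person is matched *)
fun name (Person -> String)
  | <person gender = "F">[ <name>n ] -> n
  | <person gender = "M">[ <name>n ] -> n
]]></sample>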
<section title="Warnings">
<p>
CDuce uses warnings to signal possible subtler errors. For instance, it issues a warning whenever a capture variable of a pattern is not used in the subsequent expression. This is very useful, for example, to detect misspelled types in patterns such as in:
</p>
<sample><![CDATA[
transform [ 1 "c" 4 "duce" 2 6 ] with
x & {{Sting}} -> [ x ]
]]></sample>
<p> The intended semantics of this expression was to extract the sequence of all
the strings occurring in the matched sequence. But because of the typo in
<code>St(r)ing</code> the transformation is instead the identity function:
<code>Sting</code> is considered as a fresh capture variable. CDuce however
detects that <code>Sting</code> is never used in the subsequent expression
and it pinpoints the possible presence of an error by issuing the
following warning:
</p>
<sessionsample><![CDATA[
Warning at chars 42-60:
{{%% x & Sting -> [ x ]%%}}
The capture variable Sting is declared in the pattern but not used in
the body of this branch. It might be a misspelled or undeclared type
or name (if it isn't, use _ instead).
%%transform [ 1 "c" 4 "duce" 2 6 ] with
x & Sting -> [ x ]%%
- : [ 1 [ 'c' ] 4 [ 'duce' ] 2 6 ] =
[ 1 "c" 4 "duce" 2 6 ]
Ok.
]]></sessionsample>
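<p>
For comparison, here is the same expression with the type name spelled as intended.
Since <code>String</code> is then read as a type rather than as a fresh capture variable,
the branch should select only the elements that are strings, and no warning is expected.
</p>
<sample><![CDATA[
(* x is captured only when the element also has type String *)
transform [ 1 "c" 4 "duce" 2 6 ] with
  x & String -> [ x ]
]]></sample>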
</section>
</box>
<box title="Empty types" link="emptyty">
<p>
CDuce's type system can find very nasty errors. For instance, look at this DTD declaration:
</p>
<xmlsample><![CDATA[
<!ELEMENT person (name,children)>
<!ELEMENT children (person+)>
<!ELEMENT name (#PCDATA)>
]]></xmlsample>
<p>
At first sight this declaration does not pose any problem. But if you consider it more carefully you will see that no document can be valid for such a DTD,
as a person contains a children element that contains a non-empty
sequence of persons, and so on, which would require an infinite tree.
</p>
<p>
Let us write the same type in CDuce and look at the result returned by the type-checker:
</p>
<sessionsample><![CDATA[
type Person = <person>[ Name Children ]
type Children = <children>[Person+]
type Name = <name>[PCDATA]
Warning at chars 57-76:
%%type Children = %%{{%%<children>[Person+]%%}}
This definition yields an empty type for Children
Warning at chars 14-39:
%%type Person = %%{{%%<person>[ Name Children ]%%}}
This definition yields an empty type for Person
]]></sessionsample>
<p>
The type checker correctly issues a "Warning" to signal that the first
two types are empty. Note that instead the declarations</p>
<sample><![CDATA[
type Person = <person>[ Name Children ]
type Children = <children>[({{ref Person}})+]
type Name = <name>[PCDATA]
]]></sample>
<p>
correctly do not yield any warning: in this case it is possible to build a value of type person (and thus of type children), for instance by using a recursive definition where a person is a child of itself.
</p>
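<p>
Another way to avoid the empty types, under the assumption that the intended
documents allow persons without children, is to replace <code>+</code> by
<code>*</code>. The sketch below also builds a finite value of type
<code>Person</code> (the name <code>leaf</code> and its content are illustrative only).
</p>
<sample><![CDATA[
type Person = <person>[ Name Children ]
type Children = <children>[ Person* ]
type Name = <name>[ PCDATA ]

(* A person with an empty list of children: a finite inhabitant of Person *)
let leaf : Person = <person>[ <name>"Ann" <children>[] ]
]]></sample>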
<p>
We took special care in localizing errors and suggesting solutions.
You can try it
yourself by picking the examples available on the <a
href="http://reglisse.ens.fr/cgi-bin/cduce">online interpreter</a> and introducing
random errors into them.
</p>
</box>
<box title="Unused branches" link="pr">
<p>
The emptiness test is also used to check for possible errors in the definition
of patterns. If the type checker statically determines that a pattern in a match
operation can never be matched, then it is very likely that the programmer has
made an error, even if the match expression is well-typed. This is determined by
checking whether the intersection of the set of all values that can be fed to the
branch and the set of all values that the pattern of the branch accepts is empty. Consider for example
the following code:
</p>
<sample><![CDATA[
type Person = <person>[<name>String <tel>String (<email>String)?]
fun main_contacts(x : [Person*]):[String*] =
transform x with
| <_>[_ _ <{{emal}}>s] -> [s]
| <_>[_ <tel>s ] -> [s]
]]></sample>
<p>
This function was supposed to extract the list of contacts from a list of person
elements, giving priority to email addresses over telephone numbers. Even though
there is a typo in the pattern of the first branch, the function is well
typed. However, because of the typo the first branch will never be selected and
email addresses will never be returned. The CDuce type-checker recognizes that this branch
has no chance to be selected since <code> Person &amp; &lt;_>[_ _
&lt;emal>s]</code>=<code>Empty</code> and it warns the programmer by issuing the following warning message:
</p>
<sessionsample><![CDATA[
Warning at chars 144-167:
%% | %%{{%%<_>[_ _ <emal>s] -> [s]%%}}
This branch is not used
%%fun main_contacts(x : [Person*]):[String*] =
transform x with
| <_>[_ _ <emal>s] -> [s]
| <_>[_ <tel>s ] -> [s]%%
- : [ Person* ] -> [ String* ] = <fun>
Ok.
]]></sessionsample>
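<p>
With the tag spelled <code>email</code>, the intersection of <code>Person</code> with the
first pattern is no longer empty, so the branch can be selected and the warning is
expected to disappear. A sketch of the corrected function, mirroring the code above:
</p>
<sample><![CDATA[
type Person = <person>[<name>String <tel>String (<email>String)?]

fun main_contacts(x : [Person*]):[String*] =
  transform x with
    (* persons with an email: three children, the last one tagged email *)
    | <_>[_ _ <email>s] -> [s]
    (* persons without email: fall back to the telephone number *)
    | <_>[_ <tel>s ] -> [s]
]]></sample>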
</box>
</page>
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<page name="tutorial_exercises">
<title>Exercises</title>
<left>
<boxes-toc/>
<p>
You can cut and paste the code on this page and
test it on the <a href="http://reglisse.ens.fr/cgi-bin/cduce">online interpreter</a>.
</p>
</left>
<box title="Tree navigation" link="treenav">
<section title="XPath expressions">
<p>Write a function that implements <code>//t</code> without
using reference types or <code>xtransform</code>.</p>
<ol>
<li>Give a non tail-recursive version</li>
<li>Give a tail-recursive version</li>
</ol>
</section>
</box>
<box title="Patterns" link="pat">
<section title="Sort (by Artur Miguel Diaz: http://ctp.di.fct.unl.pt/~amd)">
<p>
Write a non-recursive function of type <code> Int -> Latin1</code> which, given a non-negative number, produces all its digits in increasing order.
</p>
<p>
The function is given below, almost completely programmed. Define the pattern that produces the result.</p>
<sample> <![CDATA[
let sortAlg (n :Int):Latin1 =
match string_of n with
%%PATTERN%% -> %%RESULT%%
;;
]]> </sample>
<p>Example:</p>
<sample> <![CDATA[
fact 200 =
788657867364790503552363213932185062295135977687173263294742533
244359449963403342920304284011984623904177212138919638830257642
790242637105061926624952829931113462857270763317237396988943922
445621451664240254033291864131227428294853277524242407573903240
321257405579568660226031904170324062351700858796178922222789623
703897374720000000000000000000000000000000000000000000000000
sortAlg (fact 200) =
"00000000000000000000000000000000000000000000000000000000000000
000000000000001111111111111111111111111122222222222222222222222
222222222222222222222222222222233333333333333333333333333333333
333333333444444444444444444444444444444444445555555555555555555
555555666666666666666666666666666667777777777777777777777777777
7777777888888888888888888888889999999999999999999999999999999"
]]> </sample>
</section>
</box>
<box title="Solutions" link="solution">
<section title="Tree navigation">
<sample><![CDATA[
type t = %%specify here a type to test%%
fun ( x :[Any*]):[t*] =
let f( x :[Any*]):[t*] = ...
]]></sample>
<p>Note here that the recursive function <code>f</code> is wrapped by a second anonymous function so that it does not expose the recursion variable.</p>
<sample><![CDATA[
fun (e : [Any*]):[ T*] =
let f( accu :[T*] , x :[Any*]):[T*] =
match x with
[ h&T&<_ ..>(k&[Any*]) ;t] -> f( accu@[h], k@t)
| [ <_ ..>(k&[Any*]) ;t] -> f( accu, k@t)
| [ h&T ;t] -> f( accu@[h], t)
| [ _ ;t] -> f( accu, t)
| [] -> accu
in f ([], e);;
]]></sample>
<p>Note that this implementation may generate empty branch warnings, in particular:</p>
<ul>
<li>for the first branch if <code>T&amp;&lt;_ ..>(k&amp;[Any*])</code> is <code>Empty</code></li>
<li>for the second branch if <code>&lt;_ ..>(k&amp;[Any*])</code> is smaller than <code>T&amp;&lt;_ ..>(k&amp;[Any*])</code></li>
<li>for the third branch if <code>T</code> is smaller than <code>&lt;_ ..>(k&amp;[Any*])</code></li>
</ul>
</section>
<section title="Patterns">
<