Commit f53a4cfc authored by Pietro Abate

[r2003-07-04 17:52:59 by cvscast] Empty log message

Original author: cvscast
Date: 2003-07-04 17:53:00+00:00
parent bcd37c85
@@ -5,10 +5,8 @@
<include file="tutorial/getting_started.xml"/>
<!--
<include file="tutorial1.xml"/>
-->
<include file="tutorial2.xml"/>
<include file="tutorial/first_functions.xml"/>
<include file="tutorial/overloading.xml"/>
<left>
<p>
......
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<page name="tutorial_first_functions">
<title>First functions</title>
<banner>First functions</banner>
<left>
<boxes-toc/>
</left>
<box title="First functions" link="t2">
<p>
A first example of transformation is <code>names</code>, which extracts the
sequence of the names of all persons in a <code>ParentBook</code> element:
</p>
<sample><![CDATA[
let names (ParentBook -> [Name*])
<parentbook>x -> (map x with <person>[ n _*] -> n)
]]></sample>
<p>
The name of the transformation is followed by an <i>interface</i> stating
that <code>names</code> is a function from
<code>ParentBook</code> elements to (possibly empty) sequences of
<code>Name</code> elements. The result is computed by matching the argument of the
function against the pattern
</p>
<sample><![CDATA[<parentbook>x ]]></sample>
<p>which binds <code>x</code> to
the sequence of person elements forming the parentbook. The operator
<code>map</code> applies the transformation defined by the subsequent pattern matching to each element of a sequence (in this case <code>x</code>). Here <code>map</code>
returns the sequence obtained by replacing each person in <code>x</code> by its
<code>Name</code> element. Note that we use the pattern
</p>
<sample><![CDATA[<person>[ n _*]
]]></sample>
<p>to match the person elements: <code>n</code> matches (and captures) the <code>Name</code>
element, that is, the first element of the sequence; <code>_*</code> matches (and discards) the sequence of elements that
follow; and <code>person</code> matches the tag of the person (even though the
element also carries an attribute). The interface and the type definitions
ensure that the tags will be the expected ones, so we could optimize the
code by defining a body that skips the check of the tags:
</p>
<sample><![CDATA[
<_> x -> (map x with <_>[ n _*] -> n)
]]></sample>
</box>
</page>
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<page name="tutorial_overloading">
<title>Overloading</title>
<box title="Overloaded functions" link="val">
<p>
The simplest form for a toplevel function declaration is
</p>
<sample><![CDATA[
let %%f%% (%%t%%->%%s%%) %%x%% -> %%e%%
]]></sample>
<p>
in which the body of a function is formed by a single branch
<code>%%x%% -> %%e%%</code> of pattern matching. As we have seen in the previous sections, the body of a function may be formed by several branches with complex patterns.
<br/>
The interface <code>%%t%%->%%s%%</code> specifies
a constraint on the behavior of the function to be checked by the type
system: when applied to an argument
of type <code>%%t%%</code>, the function returns a result of type <code>%%s%%</code>.
</p>
<section title="Simple Overloading">
<p>
In general, the interface of a function may specify several such constraints,
as the <a href="tutorial_first_functions.html">names3</a> example shows.
The general form of a toplevel function declaration is indeed:
</p>
<sample><![CDATA[
let %%f%% (%%t1%%->%%s1%%;...;%%tn%%->%%sn%%) %%p1%% -> %%e1%% | ... | %%pm%% -> %%em%%
]]></sample>
<p>
(a vertical bar before the first branch and the <code>fun</code> keyword are both optional). Such a function accepts arguments of type
(<code>%%t1%%|...|%%tn%%</code>); it has all the types <code>%%ti%%->%%si%%</code> and,
thus, also their intersection <code>(%%t1%%->%%s1%%)&amp;...&amp;(%%tn%%->%%sn%%)</code>.
</p>
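<p>
The declaration form above can be approximated dynamically, as in the following Python sketch. It is only an illustration under stated assumptions: types and patterns are modeled as hypothetical Python predicates, the interface is checked at run time on each call (whereas CDuce checks it statically, once and for all), and branches are tried in order.
</p>
<sample><![CDATA[
```python
# Dynamic Python approximation (not CDuce's static check) of
#   let f (t1->s1; ...; tn->sn) p1 -> e1 | ... | pm -> em
# Hypothetical encoding: types and patterns are predicates on values.

class Overloaded:
    def __init__(self, interface, branches):
        self.interface = interface  # list of (t_i, s_i) predicate pairs
        self.branches = branches    # list of (p_j, e_j): pattern, body

    def __call__(self, v):
        # the function accepts arguments of type t1|...|tn
        if not any(t(v) for t, _ in self.interface):
            raise TypeError(f"{v!r} is outside t1|...|tn")
        for p, e in self.branches:  # the first matching branch is used
            if p(v):
                result = e(v)
                # the function has every type ti->si, i.e. their intersection
                for t, s in self.interface:
                    if t(v) and not s(result):
                        raise TypeError(f"result {result!r} breaks an arrow")
                return result
        raise TypeError("no branch matches")  # CDuce rejects this statically


def is_int(v):
    return isinstance(v, int) and not isinstance(v, bool)


def is_str(v):
    return isinstance(v, str)


# One branch, two arrow types: (Int->Int; String->String) with body x -> x
ident = Overloaded([(is_int, is_int), (is_str, is_str)],
                   [(lambda v: True, lambda v: v)])
print(ident(3), ident("abc"))  # 3 abc
```
]]></sample>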
<p>
The use of several arrow types in an interface serves to give the function a
more precise type. We can roughly distinguish two different uses of multiple
arrow types in an interface:
</p>
<ul>
<li>when each arrow type specifies the behavior of a different piece
of code forming the body of the function, the compound interface
serves to specify the <i>overloaded</i> behavior of the
function. This is the case for the
function below:
<sample><![CDATA[
let add ( (Int,Int)->Int ; (String,String)->String )
| (x & Int, y & Int) -> x+y
| (x & String, y & String) -> x@y
]]></sample>
<p>
where each arrow type in the interface refers to a different branch of the body.
</p></li>
<li>when the arrow types specify different behavior for the same code,
then the compound interface serves to give a more precise
description of the behavior of the function. An example is
the function <code>names4</code> from Section <local href="tutorial_first_functions"/>.
</li>
</ul>
<p>
There is no clear separation between these two situations since, in general, an
overloaded function has body branches that specify behaviors of
different arrow types of the interface but share some common portions of the
code.
</p>
</section>
<section title="Complex overloading">
<p>
Let us examine a more complex example.
We want to transform the representation of persons introduced
in Section <local href="tutorial_first_functions"/>, using different tags
<code>&lt;man></code> and <code>&lt;woman></code>
instead of the gender attribute and, conversely, using an attribute
instead of an element for the name.
We also want to split the children of a person into two different
sequences: one of sons, composed of men (i.e. elements tagged by <code>&lt;man></code>), and the other of daughters, composed of
women. Of course, we also want to apply this transformation recursively to the
children of a person. In practice, we want to define a function <code>split</code> of
type <code>Person ->(Man | Woman)</code> where <code>Man</code> and <code>Woman</code> are the types:
</p>
<sample><![CDATA[
type Man = <man name=String>[ Sons Daughters ]
type Woman = <woman name=String>[ Sons Daughters ]
type Sons = <sons>[ Man* ]
type Daughters = <daughters>[ Woman* ]
]]></sample>
<p>
Here is a possible way to implement such a transformation:
</p>
<sample>
<include-verbatim file="overloading.cd"/>
</sample>
</section>
</box>
</page>
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<page name="tutorial1">
<title>Getting started</title>
<banner> CDuce tutorial: getting started </banner>
<left>
<boxes-toc/>
</left>
<box title="XML documents" link="t0">
<p>
CDuce uses its own notation to denote XML documents. In the next table we
present an XML document on the left and the same document in CDuce notation on
the right:
</p>
<two-columns>
<left>
<sample><![CDATA[
<?xml version="1.0"?>
<parentbook>
<person gender="F">
<name>Clara</name>
<children>
<person gender="M">
<name>Pål André</name>
<children/>
</person>
</children>
<email>clara@lri.fr</email>
<tel>314-1592654</tel>
</person>
<person gender="M">
<name>Bob</name>
<children>
<person gender="F">
<name>Alice</name>
<children/>
</person>
<person gender="M">
<name>Anne</name>
<children>
<person gender="M">
<name>Charlie</name>
<children/>
</person>
</children>
</person>
</children>
<tel kind="work">271828</tel>
<tel kind="home">66260</tel>
</person>
</parentbook>
]]></sample>
</left>
<right>
<sample><![CDATA[
let parents : ParentBook =
<parentbook>[
<person gender="F">[
<name>"Clara"
<children>[
<person gender="M">[
<name>['Pål ' 'André']
<children>[]
]
]
<email>['clara@lri.fr']
<tel>"314-1592654"
]
<person gender="M">[
<name>"Bob"
<children>[
<person gender="F">[
<name>"Alice"
<children>[]
]
<person gender="M">[
<name>"Anne"
<children>[
<person gender="M">[
<name>"Charlie"
<children>[]
]
]
]
]
<tel kind="work">"271828"
<tel kind="home">"66260"
]
]
]]></sample>
</right>
</two-columns>
<p>
Note the straightforward correspondence between the two notations: instead of
using a closing tag, we enclose the content of each element in square
brackets. In CDuce, square brackets denote sequences, that is, heterogeneous (ordered) lists
of blank-separated elements. In CDuce, strings are not a primitive data type but sequences of characters.
For the purpose of the example we used different notations to denote
strings: in CDuce, <code>"xyz"</code>, <code>['xyz']</code>, <code>['x' 'y' 'z']</code>,
<code>[ 'xy' 'z' ]</code>, and <code>[ 'x' 'yz' ]</code> all define the same string
literal. Note also
that the string <code>"Pål André"</code> is accepted, since CDuce supports Unicode
characters.
</p>
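<p>
The claim that these notations denote the same string can be modeled in Python as follows (an illustration, not CDuce): each notation is read as a list of quoted chunks whose characters are concatenated into one sequence. The helper <code>seq</code> is hypothetical.
</p>
<sample><![CDATA[
```python
def seq(*chunks):
    """Concatenate quoted chunks, e.g. seq('x', 'yz'), into a list of characters."""
    return [ch for chunk in chunks for ch in chunk]


notations = {
    '"xyz"': list("xyz"),
    "['xyz']": seq("xyz"),
    "['x' 'y' 'z']": seq("x", "y", "z"),
    "[ 'xy' 'z' ]": seq("xy", "z"),
    "[ 'x' 'yz' ]": seq("x", "yz"),
}
# All five notations denote the same sequence of three characters.
print(all(v == ["x", "y", "z"] for v in notations.values()))  # True
# Unicode characters are ordinary characters of the sequence.
print(seq("Pål ", "André") == list("Pål André"))  # True
```
]]></sample>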
</box>
<box title="Loading XML files" link="t0.5">
<p>
The program on the right-hand side in the previous section starts by binding the
variable <code>parents</code> to the XML document. It also specifies that
<code>parents</code> has the type <a href="#t1"><code>ParentBook</code></a>: this is optional,
but it usually allows earlier detection of type errors. If the XML document on the
left-hand side is stored in a file, say, <tt>parents.xml</tt>, then the same
binding can be obtained by loading the file as follows:
</p>
<sample><![CDATA[
let parents : ParentBook = {{load_xml}} "parents.xml"
]]></sample>
<p>
since <code>load_xml</code> converts an XML document stored in a file into the CDuce expression representing it.</p>
TO BE WRITTEN
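<p>
As an analogy only, a Python counterpart of this conversion can be written with the standard <code>xml.etree.ElementTree</code> module. The function names below are hypothetical, and the sketch handles only documents without mixed content: text between child elements (such as indentation) is ignored, and the text of a leaf element is turned into a list of characters, echoing the fact that CDuce strings are sequences.
</p>
<sample><![CDATA[
```python
import xml.etree.ElementTree as ET


def to_value(elem):
    """Convert an Element into a (tag, attrs, content) tuple."""
    children = list(elem)
    if children:
        # element content: whitespace text between children is ignored
        return (elem.tag, dict(elem.attrib), [to_value(c) for c in children])
    # leaf: its text becomes a sequence of characters ([] if empty)
    return (elem.tag, dict(elem.attrib), list(elem.text or ""))


def load_xml_like(path):
    """Hypothetical analogue of load_xml: parse a file into a value."""
    return to_value(ET.parse(path).getroot())


doc = ET.fromstring('<person gender="F"><name>Clara</name><children/></person>')
print(to_value(doc))
```
]]></sample>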
</box>
<box title="Type declarations" link="t1">
<p>
First, we declare some types:
</p>
<sample><![CDATA[
type ParentBook = <parentbook>[Person*]
type Person = FPerson | MPerson
type FPerson = <person gender="F">[ Name Children (Tel | Email)*]
type MPerson = <person gender="M">[ Name Children (Tel | Email)*]
type Name = <name>[ PCDATA ]
type Children = <children>[Person*]
type Tel = <tel kind=?"home"|"work">['0'--'9'+ '-'? '0'--'9'+]
type Echar = 'a'--'z' | 'A'--'Z' | '_' | '0'--'9'
type Email= <email>[ Echar+ ('.' Echar+)* '@' Echar+ ('.' Echar+)+ ]
]]></sample>
<p>
The type <code>ParentBook</code> describes XML documents that store information
about persons. A tag <code>&lt;tag attr1=...; attr2=...; ...&gt;</code> followed by a
sequence type denotes an XML document type.
Sequence types classify ordered lists of heterogeneous elements; they are denoted by square brackets
that enclose regular expressions over types. Note that a regular
expression over types <i>is not</i> a type: it only describes the content
of a sequence type, so it is meaningless unless it is enclosed in square brackets. The definitions above state that a <code>ParentBook</code>
element is formed by a possibly empty sequence of persons. A person is
either of type <code>FPerson</code> or <code>MPerson</code> according to the value of the
gender attribute. An equivalent definition for <code>Person</code> would
thus be:
</p>
<sample><![CDATA[
<person gender={{"F"|"M"}}>[ Name Children (Tel | Email)*]
]]></sample>
<p>
A person element is composed of a sequence formed by a name element, a
children element, and zero or more telephone and e-mail elements, in this order.
</p>
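<p>
A rough way to see that a regular expression over types describes the <i>content</i> of a sequence is to encode each child element by one letter and match the resulting word with Python's <code>re</code> module. The letter encoding and function names below are hypothetical, and the sketch checks only tags and the gender attribute, not the contents of the children.
</p>
<sample><![CDATA[
```python
import re

# Hypothetical encoding: one letter per child element tag.
SYMBOL = {"name": "N", "children": "C", "tel": "T", "email": "E"}
# [ Name Children (Tel | Email)* ] becomes the regex N C (T|E)*
PERSON_CONTENT = re.compile(r"NC(?:T|E)*")


def person_content_ok(child_tags):
    """Check a list of child tags against the Person content model."""
    if any(t not in SYMBOL for t in child_tags):
        return False  # an element the content model does not mention
    word = "".join(SYMBOL[t] for t in child_tags)
    return PERSON_CONTENT.fullmatch(word) is not None


def is_person(tag, attrs, child_tags):
    # FPerson | MPerson, i.e. <person gender="F"|"M">[ ... ]
    return (tag == "person" and attrs.get("gender") in ("F", "M")
            and person_content_ok(child_tags))


print(is_person("person", {"gender": "M"}, ["name", "children", "tel", "tel"]))  # True
print(is_person("person", {"gender": "F"}, ["children", "name"]))                # False
```
]]></sample>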
<p>
Name elements contain strings, which are encoded as sequences of
characters. The <code>PCDATA</code> keyword is equivalent to the regexp
<code>Char*</code>, so <code>String</code>, <code>[Char*]</code>, <code>[PCDATA]</code>, <code>[PCDATA* PCDATA]</code>,
..., are all
equivalent notations. Children are composed of zero or more <code>Person</code>
elements. Telephone elements have an optional (as indicated by <code>=?</code>)
string attribute whose value is either "home" or "work", and their content is
a single string formed by two non-empty sequences of numeric characters
separated by an optional dash character. Had we wanted to state that a
phone number is an integer with at least, say, 5 digits (of course this is
meaningful only if no phone number starts with 0), we would have used
an interval type such as <code>&lt;tel kind=?"home"|"work"&gt;[10000--*]</code>, where
<code>*</code> here denotes plus infinity.
</p>
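<p>
The <code>Tel</code> constraints can be mirrored, as an illustration, with a Python regular expression: <code>'0'--'9'+ '-'? '0'--'9'+</code> corresponds to <code>[0-9]+-?[0-9]+</code>. The function names are hypothetical; the second function models the interval alternative <code>10000--*</code> as a simple lower bound on an integer.
</p>
<sample><![CDATA[
```python
import re

# ['0'--'9'+ '-'? '0'--'9'+] as a Python regular expression
TEL_TEXT = re.compile(r"[0-9]+-?[0-9]+")


def tel_ok(attrs, text):
    """Check a <tel> element: optional kind attribute, constrained text."""
    kind = attrs.get("kind")  # kind=? : the attribute may be absent
    if kind is not None and kind not in ("home", "work"):
        return False
    return TEL_TEXT.fullmatch(text) is not None


def tel_interval_ok(n):
    """Model of the interval type 10000--*: an integer of at least 10000."""
    return isinstance(n, int) and n >= 10000


print(tel_ok({"kind": "work"}, "271828"), tel_ok({}, "314-1592654"))  # True True
print(tel_ok({"kind": "mobile"}, "271828"), tel_ok({}, "1"))        # False False
```
]]></sample>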
<p>
<code>Echar</code> is the type of characters in e-mail
addresses. It is used in the regular expression defining <code>Email</code> to
precisely constrain the form of the addresses. An XML document satisfying
these constraints is the one shown in the <a href="#t0">first section</a>.
</p>
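<p>
Similarly, the <code>Email</code> type can be approximated with a Python regular expression built from a character class for <code>Echar</code>. The names <code>ECHAR</code>, <code>EMAIL</code> and <code>email_ok</code> are hypothetical, and the sketch only checks the text of the element.
</p>
<sample><![CDATA[
```python
import re

# Echar = 'a'--'z' | 'A'--'Z' | '_' | '0'--'9'
ECHAR = r"[a-zA-Z_0-9]"
# Echar+ ('.' Echar+)* '@' Echar+ ('.' Echar+)+
EMAIL = re.compile(rf"{ECHAR}+(?:\.{ECHAR}+)*@{ECHAR}+(?:\.{ECHAR}+)+")


def email_ok(text):
    return EMAIL.fullmatch(text) is not None


print(email_ok("clara@lri.fr"))  # True
print(email_ok("clara@lri"))     # False: at least one '.' is needed after '@'
print(email_ok("c-x@lri.fr"))    # False: '-' is not an Echar
```
]]></sample>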
</box>
</page>