<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<page name="tutorial_getting_started">

<title>Getting started</title>

<box title="Key concepts" link="concepts">

<p>
CDuce is a strongly-typed functional programming language adapted
to the manipulation of XML documents. Its syntax is reminiscent
of the ML family, but CDuce has a completely different type system.
</p>

<p>
Let us directly introduce some key concepts:
</p>

<ul>
<li><b>Values</b> are the objects manipulated by
CDuce programs; we can distinguish several kinds of values:
 <ul>
 <li>Basic values: integers, characters.</li>
 <li>XML documents and fragments: elements, tag names, strings.</li>
 <li>Constructed values: pairs, records, sequences.</li>
 <li>Functional values.</li>
 </ul>
</li>

<li><b>Types</b> denote sets of values that share common
structural and/or behavioral properties. For instance,
<code>Int</code> denotes the set of all integers,
and <code>&lt;a href=String>[]</code> denotes XML elements
with tag <code>a</code> that have an attribute <code>href</code>
(whose content is a string), and with no sub-element.
</li>

<li><b>Expressions</b> are fragments of CDuce programs
that <em>produce</em> values. For instance, the expression <code>1 + 3</code>
evaluates to the value <code>4</code>. Note that values can 
be seen either as special cases of expressions, or as
the result of evaluating expressions.</li>

<li><b>Patterns</b> are ``types + capture variables''. They make it
possible to extract sub-values from an input value, which can then be
used in the rest of the program. For instance, the pattern
<code>&lt;a href=x>[]</code> extracts the value of the
<code>href</code> attribute and binds it to the <em>value
identifier</em> <code>x</code>.
</li>
</ul>

<section title="A first example">
<sample><![CDATA[
let x = "Hello, " in
let y = "world !" in
x @ y;;
]]></sample>

<p>
The expression binds two strings to value identifiers <code>x</code>
and <code>y</code>, and then concatenates them. The general form
of the local binding is:
</p>

<sample><![CDATA[
let %%p%% = %%e%% in %%e'%%
]]></sample>
</section>

<p>
where <code>%%p%%</code> is a pattern and <code>%%e%%</code>, 
<code>%%e'%%</code> are expressions.
</p>
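
<p>
As a minimal sketch of a local binding whose left-hand side is a
pattern rather than a plain identifier, the following reuses the
pattern <code>&lt;a href=x>[]</code> from the key concepts above. The
URL is only an illustrative value and does not come from the original
example.
</p>

<sample><![CDATA[
(* the pattern on the left captures the href attribute in x;
   the URL is an illustrative value *)
let <a href=x>[] = <a href="http://www.cduce.org/">[] in
x;;
]]></sample>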
</box>

<box title="XML documents" link="xml">
<p>
CDuce uses its own notation to denote XML documents. In the next table we
present an XML document on the left and the same document in CDuce notation on
the right:
</p>

<two-columns>

<left>

<sample><![CDATA[
<?xml version="1.0"?>
<parentbook>
  <person gender="F">
    <name>Clara</name>
    <children>
      <person gender="M">
        <name>Pål André</name>
        <children/>
      </person>
    </children>
    <email>clara@lri.fr</email>
    <tel>314-1592654</tel>
  </person>
  <person gender="M">
    <name> Bob </name>
    <children>
      <person gender="F">
        <name>Alice</name>
        <children/>
      </person>
      <person gender="M">
        <name>Anne</name>
        <children>
          <person gender="M">
            <name>Charlie</name>
            <children/>
          </person>
        </children>
      </person>
    </children>
    <tel kind="work">271828</tel>
    <tel kind="home">66260</tel>
  </person>
</parentbook>
]]></sample>

</left>

<right>

<sample><![CDATA[
let parents : ParentBook =
<parentbook>[
  <person gender="F">[
    <name>"Clara"
    <children>[
      <person gender="M">[
        <name>['Pål ' 'André'] 
        <children>[]
      ]
    ]
    <email>['clara@lri.fr']
    <tel>"314-1592654"
  ] 
  <person gender="M">[
    <name>"Bob"
    <children>[
      <person gender="F">[
        <name>"Alice" 
        <children>[]
      ]
      <person gender="M">[
        <name>"Anne"
        <children>[
          <person gender="M">[
            <name>"Charlie"
            <children>[]
          ] 
        ] 
      ] 
    ] 
    <tel kind="work">"271828"
    <tel kind="home">"66260"
  ] 
] 
]]></sample>

</right>
</two-columns>

<p> Note the straightforward correspondence between the two notations:
instead of using a closing tag, we enclose the content of each
element in square brackets. In CDuce, square brackets denote sequences,
that is, heterogeneous (ordered) lists of blank-separated elements. In
CDuce, strings are not a primitive data type but are sequences of
characters.</p>

<p>For the purposes of the example we used different notations to
denote strings, since in CDuce <code>"xyz"</code>, <code>['xyz']</code>,
<code>['x' 'y' 'z']</code>, <code>[ 'xy' 'z' ]</code>, and <code>[
'x' 'yz' ]</code> all denote the same string. Note also that the
<code>"Pål André"</code> string is accepted, as CDuce supports Unicode
characters.</p>
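
<p>The equivalence of these notations can be sketched as follows. The
identifiers <code>s1</code>, <code>s2</code> and <code>s3</code> are
hypothetical names chosen for illustration, and the last line assumes
that the structural comparison operator <code>=</code> is available on
sequences.</p>

<sample><![CDATA[
(* s1, s2, s3 are hypothetical names; each literal is annotated as String *)
let s1 : String = "xyz";;
let s2 : String = ['x' 'y' 'z'];;
let s3 : String = [ 'xy' 'z' ];;
(* assumption: = compares values structurally *)
s1 = s2;;
]]></sample>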
</box>


<box title="Loading XML files" link="loading">

<p> The program on the right-hand side in the previous section starts
by binding the variable <code>parents</code> to the XML document. It
also specifies that <code>parents</code> has the type <a
href="#type_decl"><code>ParentBook</code></a>: this is optional, but it
usually allows earlier detection of type errors. If the XML document on
the left-hand side is stored in a file, say <tt>parents.xml</tt>, then
the same binding can be obtained by loading the file as follows: </p>

<sample><![CDATA[
let parents : ParentBook = {{load_xml}} "parents.xml"
]]></sample>

<p> Here <code>load_xml</code> converts an
XML document stored in a file into the CDuce expression representing
it.</p>

</box>

<box title="Type declarations" link="type_decl">
<p>
First, we declare some types:
</p>

<sample><![CDATA[
type ParentBook = <parentbook>[Person*];;
type Person = FPerson | MPerson;; 
type FPerson = <person gender="F">[ Name Children (Tel | Email)*];; 
type MPerson = <person gender="M">[ Name Children (Tel | Email)*];; 
type Name = <name>[ PCDATA ];;
type Children = <children>[Person*];; 
type Tel = <tel kind=?"home"|"work">['0'--'9'+ '-'? '0'--'9'+];;
type Echar = 'a'--'z' | 'A'--'Z' | '_' | '0'--'9';;
type Email= <email>[ Echar+ ('.' Echar+)* '@' Echar+ ('.' Echar+)+ ];;
]]></sample>

<p> The type <code>ParentBook</code> describes XML documents that store information
about persons. A tag <code>&lt;tag attr1=...; attr2=...; ...&gt;</code>
followed by a sequence type denotes an XML document type. Sequence
types classify ordered lists of heterogeneous elements, and they are
denoted by square brackets that enclose regular expressions over types.
Note that a regular expression over types <i>is not</i> a type: it
only describes the content of a sequence type, so it is meaningless
unless it is enclosed in square brackets. The definitions above
state that a <code>ParentBook</code> element is formed by a possibly empty sequence
of persons. A person is either of type <code>FPerson</code> or
<code>MPerson</code>, according to the value of the gender attribute.
An equivalent definition for <code>Person</code> would thus be:
</p>

<sample><![CDATA[
type Person = <person gender={{"F"|"M"}}>[ Name Children (Tel | Email)*];;
]]></sample>

<p> A person element is composed of a sequence made of a name
element, a children element, and then zero or more telephone or e-mail
elements, in this order.  </p>
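
<p> As a small sketch, and assuming the type declarations above have
been loaded, a single person can be checked against
<code>Person</code> as follows. The identifier <code>clara</code> is a
hypothetical name; the element content is taken from the sample
document shown earlier. </p>

<sample><![CDATA[
(* clara is a hypothetical identifier; a name, then children,
   then any mix of tel and email elements *)
let clara : Person =
<person gender="F">[
  <name>"Clara"
  <children>[]
  <email>"clara@lri.fr"
  <tel>"314-1592654"
];;
]]></sample>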

<p> Name elements contain strings. These are encoded as sequences of
characters. The <code>PCDATA</code> keyword is equivalent to the
regexp <code>Char*</code>, so <code>String</code>,
<code>[Char*]</code>, <code>[PCDATA]</code>, <code>[PCDATA*
PCDATA]</code>, ..., are all equivalent notations. Children elements are
composed of zero or more Person elements. Telephone elements have an
optional (as indicated by <code>=?</code>) string attribute whose
value is either ``home'' or ``work'', and their content is a single
string made of two non-empty sequences of numeric characters separated by
an optional dash character. Had we wanted to state that a phone number
is an integer with at least, say, 5 digits (of course this is
meaningful only if no phone number starts with 0), we would have used an
interval type such as <code>&lt;tel kind=?"home"|"work"&gt;[10000--*]</code>,
where <code>*</code> denotes plus infinity.  </p>
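
<p> The integer variant of the telephone type can be sketched as
follows. The names <code>TelInt</code>, <code>work</code> and
<code>other</code> are hypothetical and chosen only to avoid clashing
with the <code>Tel</code> type declared above; the numbers are taken
from the sample document. </p>

<sample><![CDATA[
(* TelInt, work and other are hypothetical names;
   the content is one integer of at least 5 digits *)
type TelInt = <tel kind=?"home"|"work">[ 10000--* ];;
let work : TelInt = <tel kind="work">[ 271828 ];;
(* the kind attribute is optional, so it may be omitted *)
let other : TelInt = <tel>[ 66260 ];;
]]></sample>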

<p>
<code>Echar</code> is the type of characters in e-mail
addresses. It is used in the regular expression defining <code>Email</code> to
precisely constrain the form of the addresses. An XML document satisfying
these constraints is shown in the section on XML documents above.
</p>

</box>
</page>