CDuce is a strongly-typed functional programming language adapted to the manipulation of XML documents. Its syntax is reminiscent of the ML family, but CDuce has a completely different type system.
Let us introduce directly some key concepts:
Types denote sets of values. For instance, Int denotes the set of all integers, and <a href=String>[] denotes XML elements with tag a that have an attribute href (whose content is a string) and no sub-element.
Expressions produce values. For instance, 1 + 3 evaluates to the value 4. Note that values can be seen either as special cases of expressions, or as the result of evaluating expressions.
Patterns extract sub-values from a value. For instance, the pattern <a href=x>[] extracts the value of the href attribute and binds it to the value identifier x.
The expression binds two strings to value identifiers x and y, and then concatenates them. The general form of the local binding is let %%p%% = %%e%% in %%e'%%, where %%p%% is a pattern and %%e%%, %%e'%% are expressions.
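The local binding described above can be sketched as follows. This is a minimal illustration, not the tutorial's own listing: the string literals, the names greeting and link, and the "index.html" attribute value are assumptions chosen for the example.

```cduce
(* Bind two illustrative strings, then concatenate them with @ *)
let greeting =
  let x = "Hello, " in
  let y = "world!" in
  x @ y;;

(* The left-hand side of a binding is a pattern: here it extracts
   the href attribute of a hypothetical element into x *)
let link =
  let <a href=x>[] = <a href="index.html">[] in
  x;;
```

In the second binding the pattern <a href=x>[] plays the role of %%p%%, which shows that a local binding is not limited to a single identifier on its left-hand side.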
CDuce uses its own notation to denote XML documents. In the next table we present an XML document on the left and the same document in CDuce notation on the right:
Note the straightforward correspondence between the two notations: instead of using a closing tag, we enclose the content of each element in square brackets. In CDuce, square brackets denote sequences, that is, heterogeneous (ordered) lists of blank-separated elements. Strings are not a primitive data type in CDuce: they are sequences of characters.
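As a small illustration of this correspondence, the sketch below writes one element in both notations. The element, its tag names, and the name clara are assumptions made for the example, not the document from the table.

```cduce
(* XML notation, shown in a comment for comparison:
     <person gender="F">
       <name>Clara</name>
       <children/>
     </person>
*)

(* The same element in CDuce notation: square brackets
   replace the closing tags *)
let clara =
  <person gender="F">[
    <name>"Clara"
    <children>[]
  ];;
```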
For the purpose of the example we used different notations to denote strings: in CDuce, "xyz", ['xyz'], ['x' 'y' 'z'], [ 'xy' 'z' ], and [ 'x' 'yz' ] all define the same string literal. Note also that the string "Pål André" is accepted, as CDuce supports Unicode characters.
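The equivalence of these notations can be checked with a short sketch such as the one below. The names s1 to s4 and same are assumptions. Since all four bindings denote the same sequence of characters, both comparisons should evaluate to true.

```cduce
(* Each binding denotes the same sequence of three characters *)
let s1 = "xyz";;
let s2 = ['xyz'];;
let s3 = ['x' 'y' 'z'];;
let s4 = ['xy' 'z'];;

(* Compare the notations pairwise with structural equality *)
let same = (s1 = s2, s3 = s4);;
```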
The program on the right-hand side in the previous section starts by binding the variable parents to the XML document. It also specifies that parents has the type ParentBook: this is optional, but it usually allows earlier detection of type errors. If the XML document on the left-hand side is stored in a file, say parents.xml, then the same binding can be obtained by loading the file with load_xml, which converts an XML document stored in a file into the CDuce expression representing it.
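A possible way to write such a binding is sketched below. It is not necessarily the tutorial's own listing. It assumes that the type ParentBook is declared earlier in the program and that parents.xml is in the current directory. The name raw and the use of the dynamic type check :? are choices made for this sketch.

```cduce
(* Load the document; the result has a generic XML type *)
let raw = load_xml "parents.xml";;

(* Assuming ParentBook is declared earlier in the program,
   the dynamic check :? recovers the precise type and raises
   an exception when the document does not conform *)
let parents : ParentBook = (raw :? ParentBook);;
```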
First, we declare some types:
The type ParentBook describes XML documents that store information about persons. A tag <tag attr1=...; attr2=...; ...> followed by a sequence type denotes an XML document type. Sequence types classify ordered lists of heterogeneous elements, and they are denoted by square brackets that enclose regular expressions over types (note that a regular expression over types is not a type: it just describes the content of a sequence type, so it is meaningless unless enclosed in square brackets). The definitions above state that a ParentBook element is formed by a possibly empty sequence of persons. A person is of type either FPerson or MPerson, according to the value of the gender attribute.
An equivalent definition for Person would thus be:
A person element consists of a sequence formed of a name element, a children element, and zero or more telephone and e-mail elements, in this order.
Name elements contain strings, which are encoded as sequences of characters. The PCDATA keyword is equivalent to the regexp Char*, so String, [Char*], [PCDATA], [PCDATA* PCDATA], ... are all equivalent notations. Children are composed of zero or more Person elements. Telephone elements have an optional (as indicated by =?) string attribute whose value is either "home" or "work", and they contain a single string made of two non-empty sequences of numeric characters separated by an optional dash character. Had we wanted to state that a phone number is an integer with at least, say, 5 digits (of course this is meaningful only if no phone number starts with 0), we would have used an interval type such as <tel kind=?"home"|"work">[10000--*], where * denotes plus infinity.
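The description above can be turned into type declarations along the following lines. This is a reconstruction from the prose, not necessarily the original listing. The tag names other than tel, the loose Email definition, and the names PersonAlt and TelInt are assumptions made for this sketch.

```cduce
(* Tag names other than tel are assumptions made for this sketch *)
type ParentBook = <parentbook>[ Person* ]
type Person     = FPerson | MPerson
type FPerson    = <person gender="F">[ Name Children (Tel | Email)* ]
type MPerson    = <person gender="M">[ Name Children (Tel | Email)* ]
type Name       = <name>[ PCDATA ]
type Children   = <children>[ Person* ]
(* Two non-empty digit runs separated by an optional dash *)
type Tel        = <tel kind=?"home"|"work">[ '0'--'9'+ '-'? '0'--'9'+ ]
(* Placeholder: the content of e-mail elements is left unconstrained *)
type Email      = <email>[ PCDATA ]

(* Hypothetical single declaration covering both genders *)
type PersonAlt  = <person gender="F"|"M">[ Name Children (Tel | Email)* ]

(* Hypothetical variant: a phone number as an integer of at least 5 digits *)
type TelInt     = <tel kind=?"home"|"work">[ 10000--* ]
```

Adjacent type declarations may refer to one another, which is why ParentBook can mention Person before Person is declared and why Children can refer back to Person.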
Echar is the type of characters in e-mail addresses. It is used in the regular expression defining Email to precisely constrain the form of the addresses. An XML document satisfying these constraints is shown