CDuce is a modern programming language adapted to the manipulation of XML documents. It is developed by the Languages group of ENS in Paris and the Databases group of LRI in Orsay, two CNRS labs. See also the CDuce team page and our technical papers.
The only available implementation of CDuce is an online prototype. To get a feel for CDuce, you can try the examples and play with them, or have a look at this memento, which briefly explains the syntax of the language.
We are planning to distribute a first public release in the next few weeks.
Our guiding principle for the design of CDuce is that a programming language for XML should take XML types (DTD, XML Schema, Relax-NG, ...) seriously. We expect the following benefits:
- static verifications (e.g.: ensure that a transformation produces a valid document);
- in particular, we aim at smooth and safe compositions of XML transformations, and incremental programming;
- static optimizations and efficient execution model (knowing the type of a document is crucial to extract information efficiently).
Some of CDuce's distinctive features:
- XML objects can be manipulated as first-class values: elements, sequences, tags, characters and strings, attribute sets; sequences of XML elements can be specified by regular expressions, which also apply to character strings;
- functions themselves are first-class values: they can be manipulated, stored in data structures, returned by functions, ...
- a powerful pattern matching operation can perform complex extractions from sequences of XML elements;
- a rich type algebra, with recursive types and arbitrary boolean combinations (union, intersection, complement), allows precise definitions of data structures and XML types; general-purpose types and type constructors are taken seriously (products, extensible records, arbitrary-precision integers with interval constraints, Unicode characters);
- polymorphism through a natural notion of subtyping, and overloaded functions with dynamic dispatch;
- a highly effective type-driven compilation scheme.
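To give a rough idea of what "sequences of XML elements specified by regular expressions" means, here is a minimal sketch in Python rather than CDuce. It is only an analogy: the check happens at run time, whereas CDuce performs it statically, and the content model (one title, one or more authors, an optional year) and all function names are assumptions made up for this illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content model, written as a regular expression over tag names:
# one <title>, then one or more <author>, then an optional <year>.
BOOK_CONTENT = re.compile(r"(?:title )(?:author )+(?:year )?")


def matches_content_model(element, model):
    """Return True if the child tags of `element`, in order, match `model`."""
    # Encode the sequence of children as a string of space-terminated tag names.
    tags = "".join(child.tag + " " for child in element)
    return model.fullmatch(tags) is not None


def authors(element):
    """Extract the text of every <author> child, in document order."""
    return [child.text for child in element if child.tag == "author"]


book = ET.fromstring(
    "<book><title>T</title><author>A</author><author>B</author></book>"
)
bad = ET.fromstring("<book><author>A</author><title>T</title></book>")

print(matches_content_model(book, BOOK_CONTENT))
print(matches_content_model(bad, BOOK_CONTENT))
print(authors(book))
```

In this sketch the same regular expression that describes the type also tells us which children can be extracted safely; CDuce patterns combine both roles and are checked by the compiler.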
Preliminary benchmarks suggest that a CDuce program can run 30% to 60% faster than an equivalent XSLT style sheet (we performed the benchmarks with the xsltproc tool from the GNOME libxslt library).
Our plans concerning the design of the core language include:
- a module system to support incremental programming;
- parametric polymorphism;
- XML-friendly primitives, to mimic XSLT transformations.
Apart from the core language design and implementation, our research projects include:
- integration of a query sub-language into CDuce, using types as a primary optimization strategy for query evaluation;
- study of security (confidentiality, ...) properties in the setting of XML transformations.
We wrote several technical papers about the language design and its theoretical foundations.
The starting point of our work on CDuce was the XDuce language developed at the UPenn DB group. Many of CDuce's features originate from XDuce. Some of our achievements:
- integration of first-class and overloaded functions, arbitrary boolean connectives, and extensible (or not) records into the semantic definition of subtyping;
- a subtyping algorithm without backtracking;
- extending pattern matching to capture non-consecutive subsequences; removing the tail condition for exact matching (the XDuce team independently arrived at another solution);
- an efficient evaluation model that takes advantage of static type information.
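The idea behind a semantic definition of subtyping with boolean connectives can be sketched as follows, in Python rather than CDuce. A type is interpreted as a set of values, and one type is a subtype of another when its set is included in the other's. This is a toy model under strong assumptions: the universe is a small finite set chosen for the example, and all names are hypothetical. Real CDuce types denote infinite sets, and the actual subtyping algorithm works on the structure of types, not by enumerating values.

```python
# Toy universe of values: a few integers and two characters.
UNIVERSE = frozenset(range(-5, 6)) | frozenset({"a", "b"})


def interval(lo, hi):
    """The type of integers between lo and hi (inclusive), within the universe."""
    return frozenset(
        v for v in UNIVERSE if isinstance(v, int) and lo <= v <= hi
    )


def complement(t):
    """Boolean complement of a type, relative to the universe."""
    return UNIVERSE - t


def is_subtype(s, t):
    """s is a subtype of t iff the intersection of s and (not t) is empty."""
    return not (s & complement(t))


INT = interval(-5, 5)
POS = interval(1, 5)
NEG = interval(-5, -1)
ZERO = interval(0, 0)
CHAR = frozenset({"a", "b"})

# Union is `|`, intersection is `&` on frozensets.
print(is_subtype(POS, INT))
print(is_subtype(INT & complement(POS), NEG | ZERO))
print(is_subtype(POS & NEG, CHAR))
```

The last check shows that an empty intersection is a subtype of every type, which is what makes reducing subtyping to an emptiness test natural in this setting.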
Of course, the work on XDuce continued during ours, and the XDuce team developed nice ideas: mixed attribute-element types (same expressive power as our records, but they can sometimes avoid an exponential explosion where we cannot) and a powerful filter operation.
- By Philip Wadler.