Commit 603a6478 authored by Pietro Abate

[r2003-11-28 09:27:21 by szach] - reviewed schema manual about import

- added schema document samples

- written schema manual about validation

Original author: szach
Date: 2003-11-28 09:27:21+00:00
parent 87c73b8d
......@@ -9,6 +9,7 @@
<include file="manual/expressions.xml"/>
<include file="manual/namespaces.xml"/>
<include file="manual/schema.xml"/>
<include file="manual/schema_samples.xml"/>
<left>
<p>
This Guide describes all CDuce's constructions.
......
......@@ -33,6 +33,11 @@
<a href="#print_xml">XML Schema instances output</a>
</li>
</ul>
<p>
This manual page describes how to use these features in CDuce. All the
documents used in the examples are available in the manual section <local
href="manual_schema_samples">XML Schema sample documents</local>.
</p>
</box>
<box title="XML Schema components (micro) introduction" link="primer">
......@@ -89,11 +94,18 @@
</p>
<sample>
# {{schema Mails = "tests/schema/mails.xsd"}};;
Registering schema type: Mails # attachmentType
Registering schema type: Mails # mimeTopLevelType
Registering schema type: Mails # mailType
Registering schema type: Mails # envelopeType
Registering schema type: Mails # mailsType
Registering schema type: Mails # bodyType
Registering schema attribute: Mails # name
Registering schema element: Mails # Date
Registering schema element: Mails # mails
Registering schema element: Mails # header
Registering schema attribute group: Mails # mimeTypeAttributes
Registering schema model group: Mails # attachmentContent
</sample>
<p>
The above declaration will (try to) import all schema components included in
......@@ -105,12 +117,14 @@ Registering schema element: Mails # mails
have both an element declaration and an attribute declaration having the
same name in a single schema document. In case of no ambiguity you can
reference CDuce types corresponding to schema components just using the name
with the following syntax:<br /> <code>&lt;schema_name&gt; #
&lt;component_name&gt;</code><br /> Otherwise you can specify the kind of
schema component as follows:<br /> <code>&lt;schema_name&gt; #
&lt;component_name&gt; as &lt;component_kind&gt;</code><br /> where
component kind is one of:<br /> <code>element | type | attribute |
attribute_group | model_group</code><br />
with the following syntax:<br /> <tt>schema_ref ::= </tt>
<code>&lt;schema_name&gt; # &lt;component_name&gt;</code><br />
Otherwise you can specify the kind of schema component as follows:<br />
<tt>|</tt> <code>&lt;schema_name&gt; # &lt;component_name&gt; as
&lt;component_kind&gt;</code><br /> where component kind is one of:<br />
<tt>component_kind ::= </tt>
<code>element | type | attribute | attribute_group | model_group</code>
<br />
</p>
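<p>
To make the two forms of the reference syntax concrete, here is a minimal
sketch in Python (not CDuce, and not the CDuce parser) that splits a reference
into its parts. The function name <code>parse_schema_ref</code> is
hypothetical, and the sketch assumes that the tokens are separated by
whitespace.
</p>
<sample><![CDATA[
```python
# Component kinds listed in the grammar above.
COMPONENT_KINDS = ("element", "type", "attribute",
                   "attribute_group", "model_group")


def parse_schema_ref(text):
    """Split a schema_ref into (schema_name, component_name, kind or None)."""
    # Assumption of this sketch: tokens are separated by whitespace.
    tokens = text.split()
    # Short form: <schema_name> # <component_name>
    if len(tokens) == 3 and tokens[1] == "#":
        return tokens[0], tokens[2], None
    # Long form: <schema_name> # <component_name> as <component_kind>
    if len(tokens) == 5 and tokens[1] == "#" and tokens[3] == "as":
        if tokens[4] not in COMPONENT_KINDS:
            raise ValueError("unknown component kind: " + tokens[4])
        return tokens[0], tokens[2], tokens[4]
    raise ValueError("not a schema_ref: " + text)
```
]]></sample>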
<p>
The result of a schema component reference is an ordinary CDuce type which
......@@ -175,12 +189,15 @@ val argv : [ String* ] = ""
</p>
<sample><![CDATA[
# #print_schema Mails;;
Types: C:10:mailType C:7:envelopeType C:12:mailsType S:bodyType'
Elements: E:13:<mails>
Types: C:10:attachmentType S:mimeTopLevelType' C:12:mailType C:4:envelopeType C:14:mailsType S:bodyType'
Attributes: @name:xsd:string
Elements: E:18:<Date> E:15:<mails> E:17:<header>
Attribute groups: {agroup:mimeTypeAttributes}
Model groups: {mgroup:attachmentContent}
]]></sample>
<p>
For more information about toplevel directives
<local href="manual_interpreter">click here</local>.
For more information have a look at the manual section about <local
href="manual_interpreter">toplevel directives</local>.
</p>
</box>
......@@ -243,6 +260,16 @@ Elements: E:13:<mails>
type)
</td>
</tr>
<tr>
<td>
(<b>Not properly supported</b>)<br /> <code>decimal</code>,
<code>float</code>, <code>double</code>, <code>NOTATION</code>,
<code>QName</code>
</td>
<td>
<code>String</code>
</td>
</tr>
</table>
<p>
<b>Simple type definitions</b> are built from the above types following
......@@ -257,50 +284,50 @@ Elements: E:13:<mails>
complex type.
</p>
<p>
As an example, the following XML Schema complex type:
As an example, the following XML Schema complex type (a simplified
version of the homonymous <code>envelopeType</code> defined in <local
href="manual_schema_samples">mails.xsd</local>):
</p>
<sample><![CDATA[
<xsd:complexType name="mailType">
<xsd:complexType name="envelopeType">
<xsd:sequence>
<xsd:element name="envelope" type="{{envelopeType}}"/>
<xsd:element name="body" type="{{bodyType}}"/>
<xsd:element name="From" type="xsd:string"/>
<xsd:element name="To" type="xsd:string"/>
<xsd:element name="Date" type="xsd:dateTime"/>
<xsd:element name="Subject" type="xsd:string"/>
</xsd:sequence>
<xsd:attribute use="{{required}}" name="{{id}}" type="{{xsd:integer}}"/>
</xsd:complexType>
</xsd:complexType>
]]></sample>
<p>
will be mapped to a CDuce type which must have an <tt>id</tt> attribute
of type Int and two children elements respectively of the types
corresponding to the XML Schema types <tt>envelopeType</tt> and
<tt>bodyType</tt>.
will be mapped to a CDuce XML type which must have a <tt>From</tt>
attribute of type String and four children. Among them, the <tt>Date</tt>
child must be an XML element containing a record which represents a
<tt>dateTime</tt> Schema type.
</p>
<sample><![CDATA[
# #print_type Mails # mailType;;
<({{Any}}) {| {{id = Int}} |}>
[ <{{envelope}} {| |}>
[ <From {| |}> String
<To {| |}> String
<Date {| |}> {
positive = Bool;
year = Int; month = Int; day = Int;
hour = Int; minute = Int; second = Int;
timezone =? { positive = Bool; hour = Int; minute = Int }
}
<Subject {| |}> String
(<header {| name = String |}> [ String ])*
]
<{{body}} {| |}>[ Char ]
]
# #print_type Mails # envelopeType;;
<(Any) {| From = String |}>[
<From {| |}>String
<To {| |}>String
<Date {| |}>{
positive = Bool;
year = Int; month = Int; day = Int;
hour = Int; minute = Int; second = Int;
timezone =? { positive = Bool; hour = Int; minute = Int }
}
<Subject {| |}>String
]
]]></sample>
</li>
<li>
<p>
XML Schema <b>attribute declarations</b> are converted to record types
with just one field corresponding to the declared attribute.
XML Schema <b>attribute declarations</b> are converted to closed record
types with exactly one required field corresponding to the declared
attribute.
</p>
<sample>
# #print_type Person # age;;
{| {{age = 1--*}} |}
# #print_type Mails # name;;
{| {{name = String}} |}
</sample>
</li>
<li>
......@@ -320,26 +347,38 @@ Elements: E:13:<mails>
declaration.
</p>
<p>
For example, the following XML Schema element:
For example, the following XML Schema element (corresponding to the
homonymous element defined in <local
href="manual_schema_samples">mails.xsd</local>):
</p>
<sample><![CDATA[
<xsd:element name="day" type="xsd:date"/>
<xsd:element name="header">
<xsd:complexType>
<xsd:simpleContent>
<xsd:extension base="xsd:string">
<xsd:attribute ref="name" use="required" />
</xsd:extension>
</xsd:simpleContent>
</xsd:complexType>
</xsd:element>
]]></sample>
<p>
will be translated to the following CDuce type:
</p>
<sample><![CDATA[
# #print_type Calendar # day;;
<day {| |}> {
# #print_type Mails # Date;;
<Date {| |}>{
positive = Bool;
year = Int;
month = Int;
day = Int
year = Int; month = Int; day = Int;
hour = Int; minute = Int; second = Int;
timezone =? { positive = Bool; hour = Int; minute = Int }
}
]]></sample>
<p>
Note that the type of the element content <em>is not a sequence</em>
unless the translation of the XML Schema types is a sequence itself.
Note that the type of the element content <em>is not a sequence</em> (as
you can see in the example above) unless the translation of the XML
Schema type is itself a sequence.
</p>
</li>
<li>
......@@ -353,18 +392,20 @@ Elements: E:13:<mails>
The following XML Schema attribute group declaration:
</p>
<sample><![CDATA[
<xsd:attributeGroup name="nameAttributes">
<xsd:attribute use="required" name="name" type="xsd:string" />
<xsd:attribute use="required" name="surname" type="xsd:string" />
<xsd:attribute use="optional" name="nickname" type="xsd:string" />
<xsd:attributeGroup name="mimeTypeAttributes">
<xsd:attribute name="type" type="mimeTopLevelType" use="required" />
<xsd:attribute name="subtype" type="xsd:string" use="required" />
</xsd:attributeGroup>
]]></sample>
<p>
will thus be mapped to the following CDuce type:
</p>
<sample>
# #print_type Person # nameAttributes;;
{| name = String; surname = String; nickname =? String |}
# #print_type Mails # mimeTypeAttributes;;
{| type = [
'image' | 'text' | 'application' | 'audio' | 'message' | 'multipart' | 'video'
];
subtype = String |}
</sample>
</li>
<li>
......@@ -380,20 +421,22 @@ Elements: E:13:<mails>
type sizes explosion. Thus, content models of this kind are normalized
and considered, in the type system, as sequence types.
</p>
<p>
For a similar reason, <tt>mixed</tt> content models are not supported by
CDuce either.
</p>
<p>
As an example, the following XML Schema model group definition:
</p>
<sample><![CDATA[
<xsd:group name="family">
<xsd:group name="attachmentContent">
<xsd:sequence>
<xsd:element name="mother" type="xsd:string" />
<xsd:element name="father" type="xsd:string" />
<xsd:sequence minOccurs="0" maxOccurs="unbounded">
<xsd:choice>
<xsd:element name="son" type="xsd:string" />
<xsd:element name="daughter" type="xsd:string" />
</xsd:choice>
</xsd:sequence>
<xsd:element name="mimetype">
<xsd:complexType>
<xsd:attributeGroup ref="mimeTypeAttributes" />
</xsd:complexType>
</xsd:element>
<xsd:element name="content" type="xsd:string" minOccurs="0" />
</xsd:sequence>
</xsd:group>
]]></sample>
......@@ -401,11 +444,9 @@ Elements: E:13:<mails>
will be mapped to the following CDuce type:
</p>
<sample><![CDATA[
# #print_type Person # family;;
[ <mother {| |}> String
<father {| |}> String
(<daughter {| |}> String | <son {| |}>String)*
]
# #print_type Mails # attachmentContent;;
[ X1 <content {| |}>String | X1 ] where
X1 = <mimetype {| type = [ ... ]; subtype = String |}>[ ]
]]></sample>
</li>
</ul>
......@@ -413,8 +454,214 @@ Elements: E:13:<mails>
<box title="XML Schema validation" link="validation">
<p>
<b>TODO</b>
The processes of XML Schema validation and assessment check that an XML
Schema instance document is valid with respect to an XML Schema document and
add missing information such as default values. CDuce's notion of Schema
validation is a bit different.
</p>
<p>
CDuce allows XML values to be built from arbitrary types; for example, you
can have XML elements with integer attributes. Still, this feature is
rarely used because the function used to load XML documents
(<code>load_xml</code>) returns XML values whose leaves are values of
type PCDATA.
</p>
<p>
Once you have imported an XML Schema in CDuce, you can use it to validate an
XML value returned by <code>load_xml</code> against an XML Schema component
defined in it. Validation basically builds a CDuce value whose type is the
CDuce type obtained by converting the XML Schema type of the component used
for validation. The conversion is the same as the one described in the
previous section. Note that it is not strictly necessary that the input XML
value comes from <code>load_xml</code>; it is enough that it has PCDATA
values as leaves.
</p>
<p>
During validation, PCDATA strings are parsed to build CDuce values
corresponding to XML Schema simple types, and whitespace is handled as
specified by the XML Schema <code>whiteSpace</code> facet. For example,
validating the <code>1234567890</code> <em>PCDATA string</em> against an
<code>xsd:integer</code> simple type will return the CDuce value
<code>1234567890</code> of type <code>Int</code>.<br />
Default values for missing attributes or elements are also added where
specified.
</p>
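<p>
The parsing step can be pictured with a small sketch. The following Python
code (not CDuce, and not the actual validator) turns the lexical form of an
<code>xsd:dateTime</code> into a dictionary shaped like the CDuce record used
in this manual. The function name <code>parse_datetime</code> is
hypothetical. The sketch assumes whole seconds only and does not check ranges
such as the month lying between 1 and 12.
</p>
<sample><![CDATA[
```python
import re

# Lexical form of xsd:dateTime, restricted to whole seconds
# (assumption of this sketch: fractional seconds are not handled).
_DATETIME = re.compile(
    r"^(?P<sign>-?)(?P<year>\d{4,})-(?P<month>\d{2})-(?P<day>\d{2})"
    r"T(?P<hour>\d{2}):(?P<minute>\d{2}):(?P<second>\d{2})"
    r"(?P<tz>Z|[+-]\d{2}:\d{2})?$"
)


def parse_datetime(pcdata):
    """Turn a dateTime PCDATA string into a dict shaped like the record."""
    # Drop surrounding whitespace first: the whiteSpace facet of
    # dateTime is "collapse".
    match = _DATETIME.match(pcdata.strip())
    if match is None:
        raise ValueError("not a valid xsd:dateTime: %r" % pcdata)
    record = {"positive": match.group("sign") != "-"}
    for field in ("year", "month", "day", "hour", "minute", "second"):
        record[field] = int(match.group(field))
    tz = match.group("tz")
    if tz == "Z":
        # UTC is represented as a positive zero offset.
        record["timezone"] = {"positive": True, "hour": 0, "minute": 0}
    elif tz is not None:
        record["timezone"] = {
            "positive": tz[0] == "+",
            "hour": int(tz[1:3]),
            "minute": int(tz[4:6]),
        }
    # No timezone in the input: the optional field is simply left out.
    return record
```
]]></sample>
<p>
Leaving out the <code>timezone</code> key when the input has none mirrors the
optional <code>timezone =?</code> field of the record type.
</p>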
<p>
You can use the <code>validate</code> keyword to perform validation in a
CDuce program. The syntax is as follows:<br /> <code>validate &lt;expr&gt; with
&lt;schema_ref&gt;</code><br /> where schema_ref is defined as described
in <a href="#import">XML Schema components import</a>. The same ambiguity
rules apply here.
</p>
<p>
In more detail, validation can be applied to different kinds of CDuce values,
depending on the kind of Schema component used for validation.
</p>
<ul>
<li>
<p>
The typical use of validation is to validate against an <b>element
declaration</b>. In such a case <code>validate</code> should be invoked
on an XML CDuce value, as in the following example.
</p>
<sample><![CDATA[
# let xml = <Date>"2003-10-15T15:44:01Z" in
validate xml with Mails # Date;;
- : <Date {| |}>{
positive = Bool;
year = Int; month = Int; day = Int;
hour = Int; minute = Int; second = Int;
timezone =? { positive = Bool; hour = Int; minute = Int }
}
=
<Date> {
positive=`true;
year=2003; month=10; day=15;
hour=15; minute=44; second=1;
timezone={ positive=`true; hour=0; minute=0 }
}
]]></sample>
<p>
The tag of the given element is checked for consistency with the
element declaration; attributes and content are checked against the
Schema type declared for the element.
</p>
</li>
<li>
<p>
Sometimes you may want to validate an element against an XML Schema
<b>complex type</b> without having to use element declarations. This
case is very similar to the previous one, except that the Schema
component you should use is a complex type definition. You can apply
such a validation to any XML value. The other important difference
is that the tag name of the given value is completely ignored.
</p>
<p>
As an example:
</p>
<sample><![CDATA[
# let xml = load_xml "envelope.xml" ;;
val xml : Any = <ignored_tag From="fake@microsoft.com">[
<From>[ 'user@unknown.domain.org' ]
<To>[ 'user@cduce.org' ]
<Date>[ '2003-10-15T15:44:01Z' ]
<Subject>[ 'I desperately need XML Schema support in CDuce' ]
<header name="Reply-To">[ 'bill@microsoft.com' ]
]
# validate xml with Mails # envelopeType;;
- : <(Any) {| From = String |}>[
<From {| |}>String <To {| |}>String
<Date {| |}>{
positive = Bool;
year = Int; month = Int; day = Int;
hour = Int; minute = Int; second = Int;
timezone =? { positive = Bool; hour = Int; minute = Int }
}
<Subject {| |}>String
<header {| name = String |}>[ String ]* ]
=
<ignored_tag From="fake@microsoft.com">[
<From>[ 'user@unknown.domain.org' ]
<To>[ 'user@cduce.org' ]
<Date> {
positive=`true;
year=2003; month=10; day=15;
hour=15; minute=44; second=1;
timezone={ positive=`true; hour=0; minute=0 }
}
<Subject>[ 'I desperately need XML Schema support in CDuce' ]
<header name="Reply-To">[ "bill@microsoft.com" ]
]
]]></sample>
</li>
<li>
<p>
Similarly, you may want to validate against a <b>model group</b>. In this
case you validate CDuce sequences against model groups; the given
sequences are treated as the content of XML elements.
</p>
<p>
As an example:
</p>
<sample><![CDATA[
# let xml = load_xml "attachment.xml";;
val xml : Any =
<ignored_tag ignored_attribute="foo">[
<mimetype type="application"; subtype="msword">[ ]
<content>[ '\n ### removed by spamoracle ###\n ' ]
]
# let content = match xml with <_>cont -> cont | _ -> raise "failure";;
val content : Any = [
<mimetype type="application"; subtype="msword">[ ]
<content>[ '\n ### removed by spamoracle ###\n ' ]
]
# validate content with Mails # attachmentContent;;
- : [ X1 <content {| |}>String | X1 ] where
X1 = <mimetype {|
type = [
'image' | 'text' | 'application' | 'audio' | 'message' | 'multipart' | 'video'
];
subtype = String |}>[ ]
=
[ <mimetype type="application"; subtype="msword">[ ]
<content>[ '\n ### removed by spamoracle ###\n ' ]
]
]]></sample>
</li>
<!-- TODO see schema/schema_validator.mli
<li>
<p>
Is also possible to validate CDuce records against <b>attribute
declarations</b>. If the defined attribute is required, the record is
scanned for a field having the same name as the attribute. Its content
is then validated against the simple type associated to the attribute in
the schema declaration and a new record value is returned. This value is
identical to the given one except for the content of the validated
field. Validation fails if no field in the record matches the attribute
name.
</p>
<p>
If the defined attribute is not required no error is raised if the field
is missing. If a default value is specified in the attribute declaration
the returned record will have a corresponding additional field,
otherwise a record identical to the given one is returned.
</p>
<p>
As an example:
</p>
<sample><![CDATA[
# let record = { name = "User-Agent"; added_by = "mutt" };;
val record : {| name = [ 'User-Agent' ]; added_by = [ 'mutt' ] |}
=
{ name="User-Agent"; added_by="mutt" }
# validate record with Mails # name ;;
- : { name = String } = { name="User-Agent"; added_by="mutt" }
]]></sample>
</li>
-->
<li>
<p>
Finally, it is possible to validate records against <b>attribute groups</b>.
All required attributes declared in the attribute group should have
corresponding fields in the given record. The content of each of them is
validated against the simple type defined for the corresponding attribute
in the attribute group. Non-required fields that are missing are added
using the corresponding default value (if any).
</p>
<p>
As an example:
</p>
<sample><![CDATA[
# let record = { type = "image"; subtype = "png" };;
val record :
{| type = [ 'image' ]; subtype = [ 'png' ] |} =
{ type="image"; subtype="png" }
# validate record with Mails # mimeTypeAttributes ;;
- : {| type = [ 'image' | 'text' | ... ]; subtype = String |} =
{ type="image"; subtype="png" }
]]></sample>
</li>
</ul>
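<p>
The rules for attribute groups described above can be sketched as follows.
This is Python, not CDuce, and only an illustration of the rules, not the
validator's code. The names <code>MIME_TYPE_ATTRIBUTES</code> and
<code>validate_attribute_group</code> are hypothetical, and the group
description is written by hand from <code>mails.xsd</code>. The text does not
say what happens to fields that the group does not declare; the sketch
assumes they are rejected, since the resulting record type is closed.
</p>
<sample><![CDATA[
```python
# Hand-written description of the mimeTypeAttributes group: each
# attribute maps to (required, allowed values or None, default or None).
MIME_TOP_LEVEL = {"text", "multipart", "application", "message",
                  "image", "audio", "video"}
MIME_TYPE_ATTRIBUTES = {
    "type": (True, MIME_TOP_LEVEL, None),
    "subtype": (True, None, None),
}


def validate_attribute_group(record, group):
    """Return a validated copy of record, or raise ValueError."""
    # Assumption of this sketch: undeclared fields are rejected.
    undeclared = sorted(set(record) - set(group))
    if undeclared:
        raise ValueError("undeclared attributes: " + ", ".join(undeclared))
    result = {}
    for name, (required, allowed, default) in group.items():
        if name in record:
            value = record[name]
            # Check the value against the attribute's simple type
            # (here only enumerations are modelled).
            if allowed is not None and value not in allowed:
                raise ValueError("invalid value for %s: %r" % (name, value))
            result[name] = value
        elif required:
            raise ValueError("missing required attribute: " + name)
        elif default is not None:
            # Optional attribute with a default: add it to the result.
            result[name] = default
    return result
```
]]></sample>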
</box>
<box title="XML Schema instances output" link="print_xml">
......
<page name="manual_schema_samples">
<title>XML Schema sample documents</title>
<box title="Sample XML documents" link="sample">
<p>
All the examples you will see in the manual section regarding CDuce's XML
Schema support are related to the XML Schema Document <code>mails.xsd</code>
and to the XML Schema Instance <code>mails.xml</code> reported below.
</p>
</box>
<box title="mails.xsd" link="mails_xsd">
<sample><![CDATA[
<!-- mails.xsd -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="mails" type="mailsType" />
<xsd:complexType name="mailsType">
<xsd:sequence minOccurs="0" maxOccurs="unbounded">
<xsd:element name="mail" type="mailType" />
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="mailType">
<xsd:sequence>
<xsd:element name="envelope" type="envelopeType" />
<xsd:element name="body" type="bodyType" />
<xsd:element name="attachment" type="attachmentType"
minOccurs="0" maxOccurs="unbounded" />
</xsd:sequence>
<xsd:attribute use="required" name="id" type="xsd:integer" />
</xsd:complexType>
<xsd:element name="header">
<xsd:complexType>
<xsd:simpleContent>
<xsd:extension base="xsd:string">
<xsd:attribute ref="name" use="required" />
</xsd:extension>
</xsd:simpleContent>
</xsd:complexType>
</xsd:element>
<xsd:element name="Date" type="xsd:dateTime" />
<xsd:complexType name="envelopeType">
<xsd:sequence>
<xsd:element name="From" type="xsd:string" />
<xsd:element name="To" type="xsd:string" />
<xsd:element ref="Date" />
<xsd:element name="Subject" type="xsd:string" />
<xsd:element ref="header" minOccurs="0" maxOccurs="unbounded" />
</xsd:sequence>
<xsd:attribute name="From" type="xsd:string" use="required" />
</xsd:complexType>
<xsd:simpleType name="bodyType">
<xsd:restriction base="xsd:string" />
</xsd:simpleType>
<xsd:complexType name="attachmentType">
<xsd:group ref="attachmentContent" />
<xsd:attribute ref="name" use="required" />
</xsd:complexType>
<xsd:group name="attachmentContent">
<xsd:sequence>
<xsd:element name="mimetype">
<xsd:complexType>
<xsd:attributeGroup ref="mimeTypeAttributes" />
</xsd:complexType>
</xsd:element>
<xsd:element name="content" type="xsd:string" minOccurs="0" />
</xsd:sequence>
</xsd:group>
<xsd:attribute name="name" type="xsd:string" />
<xsd:attributeGroup name="mimeTypeAttributes">
<xsd:attribute name="type" type="mimeTopLevelType" use="required" />
<xsd:attribute name="subtype" type="xsd:string" use="required" />
</xsd:attributeGroup>
<xsd:simpleType name="mimeTopLevelType">
<xsd:restriction base="xsd:string">
<xsd:enumeration value="text" />
<xsd:enumeration value="multipart" />
<xsd:enumeration value="application" />
<xsd:enumeration value="message" />
<xsd:enumeration value="image" />
<xsd:enumeration value="audio" />
<xsd:enumeration value="video" />
</xsd:restriction>
</xsd:simpleType>
</xsd:schema>
]]></sample>
</box>
<box title="mails.xml" link="mails_xml">
<sample><![CDATA[
<!-- mails.xml -->
<mails>
<mail id="0">
<envelope From="bill@microsoft.com">
<From>user@unknown.domain.org</From>
<To>user@cduce.org</To>
<Date>2003-10-15T15:44:01Z</Date>
<Subject>I desperately need XML Schema support in CDuce</Subject>
<header name="Reply-To">bill@microsoft.com</header>
</envelope>
<body>
As subject says, is it possible to implement it?
</body>
<attachment name="signature.doc">
<mimetype type="application" subtype="msword"/>