Commit f3797de3 authored by Pietro Abate

[r2005-05-17 11:35:33 by beppe] updated for Unicode chars and char_to_int

Original author: beppe
Date: 2005-05-17 11:35:33+00:00
parent d03f251f
......@@ -216,68 +216,6 @@ fun (Any -> Int)
</box>
<box title="Sequences" link="seq_exp">
<p>
The concatenation operator is written <code>@</code>. There
is also a <code>flatten</code> operator which takes a sequence of
sequences and returns their concatenation.
</p>
<p>
There are two built-in constructions to iterate over a sequence.
Both have a very precise typing which takes into account
the position of elements in the input sequence as given by
its static type. The <code>map</code> construction is:
</p>
<sample><![CDATA[
map %%e%% with
| %%p1%% -> %%e1%%
%%...%%
| %%pn%% -> %%en%%
]]></sample>
<p>
Note the syntactic similarity with pattern matching. In fact,
<code>map</code> is a form of pattern matching
in which the branches are applied to each element of the
input sequence in turn (the result of evaluating <code>%%e%%</code>).
The result is a sequence of the same length, in which
each element of the input sequence is replaced by the result of
the matching branch.
</p>
<p>
Unlike <code>map</code>, the <code>transform</code> construction
can return a sequence of a different length. This is achieved
by letting each branch return a sequence instead of a single
element. The syntax is:
</p>
<sample><![CDATA[
transform %%e%% with
| %%p1%% -> %%e1%%
%%...%%
| %%pn%% -> %%en%%
]]></sample>
<p>
There is always an implicit default branch <code>_ -> []</code>
at the end of <code>transform</code>, which means that
unmatched elements of the input sequence are simply discarded.
</p>
<p>
Note that <code>map</code> can be simulated by <code>transform</code>
by replacing each expression <code>%%ei%%</code> with
<code>[ %%ei%% ]</code>.
</p>
<p>
Conversely, <code>transform</code> can be simulated by
<code>map</code> by using the <code>flatten</code> operator.
Indeed, we can rewrite <code>transform %%e%% with %%...%%</code>
as <code>flatten (map %%e%% with %%...%% | _ -> [])</code>.
</p>
</box>
<box title="Exceptions" link="exn">
<p>
......@@ -466,6 +404,70 @@ of the mismatch is raised.
</p>
</box>
<box title="Sequences" link="seq_exp">
<p>
The concatenation operator is written <code>@</code>. There
is also a <code>flatten</code> operator which takes a sequence of
sequences and returns their concatenation.
</p>
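<p>
As an illustration only, here is a small Python model of these two
operators. It assumes that a CDuce sequence is represented by a Python
list; the functions <code>concat</code> and <code>flatten</code> are
hypothetical stand-ins for <code>@</code> and <code>flatten</code>, not CDuce code.
</p>

```python
from itertools import chain


def concat(s1: list, s2: list) -> list:
    """Stand-in for `s1 @ s2`: the elements of s1 followed by those of s2."""
    return s1 + s2


def flatten(seqs: list) -> list:
    """Stand-in for CDuce's `flatten`: concatenate the inner sequences in order."""
    return list(chain.from_iterable(seqs))


print(concat([1, 2], [3]))            # [1, 2, 3]
print(flatten([[1, 2], [], [3, 4]]))  # [1, 2, 3, 4]
```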
<p>
There are two built-in constructions to iterate over a sequence.
Both have a very precise typing which takes into account
the position of elements in the input sequence as given by
its static type. The <code>map</code> construction is:
</p>
<sample><![CDATA[
map %%e%% with
| %%p1%% -> %%e1%%
%%...%%
| %%pn%% -> %%en%%
]]></sample>
<p>
Note the syntactic similarity with pattern matching. In fact,
<code>map</code> is a form of pattern matching
in which the branches are applied to each element of the
input sequence in turn (the result of evaluating <code>%%e%%</code>).
The result is a sequence of the same length, in which
each element of the input sequence is replaced by the result of
the matching branch.
</p>
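<p>
As an illustration only, the runtime behaviour of <code>map</code>
described above can be modelled in Python, ignoring the typing entirely.
The sketch assumes that a pattern can be approximated by a predicate on
the element; <code>cd_map</code> is a hypothetical name, not part of CDuce.
The text does not say what happens when no branch matches, so the model
simply raises <code>ValueError</code> in that case.
</p>

```python
def cd_map(seq, branches):
    """Model of `map seq with | p1 -> e1 ... | pn -> en`.

    `branches` is a list of (predicate, body) pairs standing in for
    (pattern, expression) pairs.
    """
    result = []
    for x in seq:
        for matches, body in branches:  # branches are tried in order
            if matches(x):
                result.append(body(x))  # exactly one output per input element
                break
        else:
            # Modelling choice: this case is not described in the text.
            raise ValueError(f"no branch matches {x!r}")
    return result


# Multiply integers by 10 and keep every other element unchanged.
branches = [
    (lambda x: isinstance(x, int), lambda x: x * 10),
    (lambda x: True, lambda x: x),
]
print(cd_map([1, "a", 2], branches))  # [10, 'a', 20]
```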
<p>
Unlike <code>map</code>, the <code>transform</code> construction
can return a sequence of a different length. This is achieved
by letting each branch return a sequence instead of a single
element. The syntax is:
</p>
<sample><![CDATA[
transform %%e%% with
| %%p1%% -> %%e1%%
%%...%%
| %%pn%% -> %%en%%
]]></sample>
<p>
There is always an implicit default branch <code>_ -> []</code>
at the end of <code>transform</code>, which means that
unmatched elements of the input sequence are simply discarded.
</p>
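<p>
As an illustration only, here is a Python model of <code>transform</code>:
each branch body returns a list that is spliced into the result, and an
element matched by no branch falls through to the implicit
<code>_ -> []</code> branch and disappears. Patterns are approximated by
predicates, and <code>cd_transform</code> is a hypothetical name.
</p>

```python
def cd_transform(seq, branches):
    """Model of `transform seq with | p1 -> e1 ... | pn -> en`.

    `branches` is a list of (predicate, body) pairs; each body returns a
    list (a sequence), and the returned lists are concatenated.
    """
    result = []
    for x in seq:
        for matches, body in branches:  # branches are tried in order
            if matches(x):
                result.extend(body(x))  # splice the returned sequence
                break
        # No match: the implicit `_ -> []` branch contributes nothing,
        # so the element is discarded.
    return result


# Duplicate every integer and discard everything else.
doubled = cd_transform([1, "a", 2], [(lambda x: isinstance(x, int), lambda x: [x, x])])
print(doubled)  # [1, 1, 2, 2]
```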
<p>
Note that <code>map</code> can be simulated by <code>transform</code>
by replacing each expression <code>%%ei%%</code> with
<code>[ %%ei%% ]</code>.
</p>
<p>
Conversely, <code>transform</code> can be simulated by
<code>map</code> by using the <code>flatten</code> operator.
Indeed, we can rewrite <code>transform %%e%% with %%...%%</code>
as <code>flatten (map %%e%% with %%...%% | _ -> [])</code>.
</p>
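<p>
Both simulations can be checked on a small Python model, under the
assumptions that sequences are Python lists and patterns are predicates.
All names below (<code>cd_map</code>, <code>cd_transform</code>,
<code>map_via_transform</code>, <code>transform_via_map</code>) are
hypothetical and only illustrate the two rewritings described above.
</p>

```python
from itertools import chain


def cd_map(seq, branches):
    """Model of `map`: one result per element, first matching branch wins."""
    out = []
    for x in seq:
        for matches, body in branches:
            if matches(x):
                out.append(body(x))
                break
        else:
            raise ValueError(f"no branch matches {x!r}")
    return out


def cd_transform(seq, branches):
    """Model of `transform`: splice branch results, drop unmatched elements."""
    out = []
    for x in seq:
        for matches, body in branches:
            if matches(x):
                out.extend(body(x))
                break
    return out


def flatten(seqs):
    """Model of `flatten`: concatenate a sequence of sequences."""
    return list(chain.from_iterable(seqs))


def map_via_transform(seq, branches):
    # Replace each branch expression ei by the singleton sequence [ ei ].
    # The default argument b=b freezes the current body inside each lambda.
    return cd_transform(seq, [(m, lambda x, b=b: [b(x)]) for m, b in branches])


def transform_via_map(seq, branches):
    # flatten (map seq with ... | _ -> []): the default branch becomes explicit.
    default = (lambda x: True, lambda x: [])
    return flatten(cd_map(seq, branches + [default]))


data = [1, "a", 2]
int_branch = [(lambda x: isinstance(x, int), lambda x: [x, x])]
map_branches = [
    (lambda x: isinstance(x, int), lambda x: x * 10),
    (lambda x: True, lambda x: x),
]
print(transform_via_map(data, int_branch) == cd_transform(data, int_branch))  # True
print(map_via_transform(data, map_branches) == cd_map(data, map_branches))    # True
```

In this model <code>map_via_transform</code> agrees with <code>cd_map</code>
only when every element is matched by some branch: otherwise
<code>cd_map</code> raises, while the transform-based version silently
drops the element.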
</box>
<box title="XML-specific constructions" link="xml_exp">
<section title="Loading XML documents">
......@@ -573,6 +575,30 @@ which are not matched and are not XML elements are copied verbatim.
</box>
<box title="Unicode Strings" link="ustr">
<p>
Strings are nothing but sequences of characters, but in view of their
importance when dealing with XML we introduced the standard double
quote notation. So <code>[ 'a' '⇔' 'c' ]</code> can be
written as <code>"a⇔c"</code>. Inside double quotes, any
<i>value</i> of type <code>Char</code> can be used: besides Unicode characters we
can also write codepoint-defined characters (<code>\x%%h%%;</code> and
<code>\%%d%%;</code>, where <code>%%h%%</code> and <code>%%d%%</code> are
hexadecimal and decimal integers respectively) and backslash-escaped
characters (<code>\t</code> tab, <code>\n</code> newline,
<code>\r</code> return, <code>\\</code> backslash). However,
character expressions that are not values cannot be used inside double
quotes. For instance, the built-in function <code>char_of_int : Int
-> Char</code> returns the character corresponding to the given
Unicode codepoint (or raises an exception for a non-existent
codepoint), and it can only be used with the regular sequence
notation. Thus <code>"a⇔c"</code>, <code>"a"@[(char_of_int
8660)]@"c"</code>, and <code>"a\8660;c"</code> are equivalent expressions
(8660 is the decimal codepoint of '⇔', U+21D4).
</p>
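<p>
Python strings are also sequences of Unicode codepoints, so the
equivalence stated above can be checked in a small model where a CDuce
string is a Python list of one-character strings. The
<code>char_of_int</code> below is a hypothetical Python stand-in built on
<code>chr</code>; treating only values outside 0..0x10FFFF as
non-existent codepoints is an assumption, since the text does not say
exactly which values raise.
</p>

```python
def char_of_int(n: int) -> str:
    """Hypothetical stand-in for CDuce's `char_of_int : Int -> Char`.

    Assumption: only values outside 0..0x10FFFF are rejected here.
    """
    if n not in range(0x110000):
        raise ValueError(f"not a Unicode codepoint: {n}")
    return chr(n)


# Each CDuce string is modelled as a list of one-character Python strings.
explicit = ["a", "⇔", "c"]                           # [ 'a' '⇔' 'c' ]
quoted = list("a⇔c")                                 # "a⇔c"
built = list("a") + [char_of_int(8660)] + list("c")  # "a"@[(char_of_int 8660)]@"c"
escaped = list("a\u21d4c")                           # "a\8660;c", i.e. "a\x21D4;c"

print(explicit == quoted == built == escaped)  # True
```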
</box>
<box title="Converting to and from string" link="str">
<section title="Pretty-printing a value">
......
......@@ -173,8 +173,8 @@ integers, characters, and atoms. To each kind corresponds a family of types.
quote can also be escaped, but this is not mandatory.
The usual <code>'\n', '\t', '\r'</code> are recognized.
Arbitrary Unicode codepoints can be written in decimal
<code>'\%%i%%;</code> (<code>%%i%%</code> is a decimal integer; note that the code is terminated by a semicolon) or
in hexadecimal <code>'\x%%i%%;</code>. Any other occurrence of
<code>'\%%i%%;'</code> (<code>%%i%%</code> is a decimal integer; note that the code is terminated by a semicolon) or
in hexadecimal <code>'\x%%i%%;'</code>. Any other occurrence of
a backslash character is prohibited.
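<p>
To make these escape rules concrete, here is a hypothetical Python
decoder for the text between the quotes of a literal, written from the
description above rather than from CDuce's lexer. It handles the decimal
and hexadecimal codepoint forms and the <code>\n</code>, <code>\t</code>,
<code>\r</code> and <code>\\</code> escapes, rejects any other backslash,
and leaves out the optional quote escape.
</p>

```python
import re

# Accepted escapes: \xH; (hexadecimal), \D; (decimal), \n \t \r \\
_ESCAPE = re.compile(r"\\(?:x([0-9A-Fa-f]+);|([0-9]+);|([ntr\\]))")
_SIMPLE = {"n": "\n", "t": "\t", "r": "\r", "\\": "\\"}


def decode_escapes(body: str) -> str:
    """Decode escapes in the text between the quotes of a literal."""
    out = []
    i = 0
    while i != len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        m = _ESCAPE.match(body, i)
        if m is None:
            # Any other occurrence of a backslash character is prohibited.
            raise ValueError(f"invalid escape at position {i}")
        hex_code, dec_code, simple = m.groups()
        if simple is not None:
            out.append(_SIMPLE[simple])
        else:
            code = int(hex_code, 16) if hex_code is not None else int(dec_code)
            if code not in range(0x110000):
                raise ValueError(f"not a Unicode codepoint: {code}")
            out.append(chr(code))
        i = m.end()
    return "".join(out)


print(decode_escapes(r"\x21D4;") == decode_escapes(r"\8660;") == "⇔")  # True
```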
<ul>
......
......@@ -39,9 +39,18 @@ _" character, starting by a capitalized letter or underscore.</li>
</li>
<li>Unicode characters:
<ul>
<li>Values: <code>'a','b','c'...</code> </li>
<li>Values: quoted characters (<code>'a'</code>, <code>'b'</code>,
<code>'c'</code>, ..., <code>'あ'</code>, <code>'い'</code>, ...,
<code>'私'</code>, ..., <code>'⊆'</code>, ...),
codepoint-defined characters (<code>'\x%%h%%;'</code> and <code>'\%%d%%;'</code>,
where <code>%%h%%</code> and
<code>%%d%%</code> are hexadecimal and decimal integers
respectively), and backslash-escaped characters
(<code>'\t'</code> tab, <code>'\n'</code> newline,
<code>'\r'</code> return, <code>'\\'</code> backslash).</li>
<li>Types: intervals <code>'a'--'z', '0'--'9'</code>,
singletons <code>'a','b','c',...</code> </li>
<li>Operators: <code>char_of_int</code> : Int -> Char</li>
</ul>
</li>
<li>Symbolic atoms:
......@@ -79,6 +88,7 @@ _" character, starting by a capitalized letter or underscore.</li>
<br/><code>int_of</code> : String -> Int,
<br/><code>float_of</code> : String -> Float,
<br/><code>string_of</code> : Any -> Latin1,
<br/><code>char_of_int</code> : Int -> Char,
<br/><code>atom_of</code> : String -> Atom,
<br/><code>system</code> : Latin1 -> { stdout = Latin1; stderr = Latin1;
status = (`exited,Int) | (`stopped,Int) | (`signaled,Int)
......
......@@ -467,7 +467,7 @@ text-align:center; color: #aa0000; font: bold 200% helvetica" >
<html>[
<head>[
<title>[ !site ': ' !title ]
<meta content="text/html; charset=iso-8859-1" http-equiv="Content-Type">[]
<meta content="text/html; charset=utf8" http-equiv="Content-Type">[]
<style type="text/css">style
]
<body style="margin: 0; padding : 0; background: #fcb333"
......