Commit 5c29e471 authored by Pietro Abate

[r2004-06-21 20:29:33 by beppe] Empty log message

Original author: beppe
Date: 2004-06-21 20:29:33+00:00
parent 95d3d324
@@ -27,7 +27,77 @@ href="http://www.telecom.gouv.fr/rntl/">RNTL</a>) and aims at ...
</box>
<box title="Tralala" link="Tralala">
<p>Tralala is ...</p>
<p>
This ACI is motivated by the increasing number of applications that
produce, consume or handle large sets of data, or
``\emph{datamasses}''. In many cases, these are either raw data or a
collection of data from various sources, both of which lack uniform
descriptive criteria. Such cases require more flexibility than the
classical relational model can provide, and have given rise to the
so-called semi-structured data model~\cite{serge99}, of
which XML is one of the most prominent examples.
Our project intends to study the processing, querying and handling of large
datamasses whenever the data is available in XML format. We pay particular
attention to problems related to programming languages and query languages.
We aim to cover, in a uniform way, a wide spectrum of areas, namely:
{\bf programming languages} (expressiveness, typing, new programming
primitives, logics underlying queries, logical optimization), {\bf data
access} (streamed data, compression, access to secondary storage, persistence
engines), and {\bf implementation} (compilation of pattern matching, physical
optimization, subtype checking, execution models for streamed data).
We will tackle these challenges along three research directions:
\begin{description}
\item[query languages:] a characteristic of the relational model is that its
query languages are based on the relational algebra or the relational
calculus. These paradigms are characterized by {\it high declarativity\/}
(they describe the result rather than the way to obtain it) and by limited
expressiveness (notably, they are not Turing complete). The ``simplicity''
of these languages underlies their good performance, which can be further
improved by exploiting the algebraic properties of the operators (logical
optimization) or by secondary-memory management techniques (physical
optimization). Our goal is to develop a similar, or at least close,
framework for the XML model, which we will pursue through: a theoretical
study of the expressiveness and complexity of query languages; the
definition of query languages for XML and their implementation; and the
definition and validation of optimization techniques.
\item[streaming:] the ability to process streams of data without having to
store whole documents (or storing them only partially) is crucial in the
context of datamasses. We will also consider streaming when the data is
compressed. However, streaming processing is not always
possible~\cite{segoufin1}, so one of the main difficulties to overcome here
is to identify a suitable class of ``streamable'' queries, with or without
compression, and in the former case to determine the optimal compression
granularity.
\item[document typing:] type systems are used primarily for document
validation and for checking integrity constraints but, as with standard
programming languages, types also enable many useful optimizations. This
makes the study of type systems one of our primary objectives.
Another motivation for this line of work is our interest in integrity
constraints whose satisfaction does not depend on the ordering of the
fields in a document, unlike the constraints expressible in ``classical''
type systems for XML such as DTDs. This is a natural choice when
processing data originating from the merging of several relational
databases (a frequent source of large documents), since the order of the
fields is then irrelevant.
\end{description}
The groups involved in our project have each already been working
separately on XML document handling, although this is only one of the
incentives for us to work together. Indeed, we share the same
fundamental theoretical approach, namely automata theory and the
associated logics, and the same interest in query languages and
document validation (typing and integrity constraints).
Beyond our shared foundational tools and goals, cooperation inside the
project is further strengthened by the choice of a single software
target, the CDuce language~\cite{BCF02,CDuce}, a joint development of
LIENS and LRI, two of the sites involved in this project.
</p>
<p>More information about the project can be found in the following page on
......