We will tackle these challenges following three research directions:
\begin{description}
\item[query languages:] one of the characteristics of the relational model is to
base query languages on the relational algebra or the relational calculus.
These are paradigms characterized by {\it high declarativity\/} (in the
sense that they describe the result rather than the way to obtain the result)
and limited expressiveness (notably, they are not Turing complete). The
``simplicity'' of these languages is at the origin of their good
performance, which can be further improved by exploiting the algebraic
properties of the operators (logical optimization) or by secondary memory
management techniques (physical optimization). Our goal is to develop a
similar, or at least close, framework for the XML model, and we will
pursue it as follows: theoretical study of the expressiveness and
complexity of the query languages; definition of query languages for XML
and their implementation; definition and validation of optimization
techniques.
\item[streaming:] the ability to process streams of data without
storing whole documents (or storing them only partially) is crucial
when dealing with massive data sets. We will also consider the
aspects related to streaming when the data is compressed.
Streaming evaluation is not
always possible~\cite{segoufin1}, so one of the main difficulties to
overcome here is to identify a suitable class of ``streamable''
queries, with or without compression, and in the former case to
determine the optimal compression granularity.
\item[document typing:] type systems are used in the first place for
document validation and for checking integrity constraints, but as
with standard programming languages, types are at the basis of many
helpful optimizations. This makes the study of type systems one of
our primary objectives.
Another motivation for this line of work is our interest in integrity
constraints whose satisfaction does not depend on the ordering of
the fields in a document, unlike the constraints expressible in
``classical'' type systems for XML such as DTD. This is a natural
choice when processing data originating from the fusion of several
relational databases (a frequent instance of large documents), since
the order of the fields is then irrelevant.
\end{description}
The groups involved in our project have each already been working
separately on XML document handling, although this is only one of the
incentives for us to work together. Indeed, we share the same
fundamental theoretical approach, namely automata theory and the
associated logics, and the same interest in query languages and