
Querying XML in the New Millennium
The site is still under construction.
Kweelt is a framework to query XML data. Among other things, it
offers an evaluation engine for the Quilt XML
query language proposed by Chamberlin, Florescu and Robie, with a
lot of useful extensions. Using Kweelt, one can run all the use-cases
published by W3C for the XML query requirements.
But Kweelt is not just an implementation of a query language. It has
been designed as a reference platform for all sorts of research
experiments related to XML: storage, query optimization, document
output, etc.
Some important features of Kweelt are:
- Kweelt implements a query language for XML that satisfies all
the requirements of the W3C XML Query Requirements document.
- Kweelt offers multiple XML back-ends. The query evaluator does not
impose any specific storage for XML but relies on a set of interfaces
(Node and NodeList) implemented by a
NodeFactory. Kweelt comes with a couple of built-in node
factories (DOM, DOM+, SAX, Wizdom), but the user can easily provide
his/her own.
- Kweelt is open-source (GPL) to allow people to make more
radical changes to the framework itself.
- Kweelt is written entirely in Java 1.2, can run on any
Java platform and has a small footprint.
- Kweelt is extensible. The user can create his/her own user-defined
functions (UDF) and make them available inside the query. Kweelt
provides various template classes to make the creation of such
functions very easy.
- Kweelt comes with the Kweelt Server Pages (KSP) extension,
a built-in Cocoon
processor. KSP makes it possible to embed Kweelt queries inside any XML page
served by Cocoon.
- Kweelt comes with numerous working examples.
Who can use Kweelt? The short answer is everybody.
More seriously, here are some domains that could take advantage of Kweelt:
- EDUCATION
Kweelt is a great way to teach XML in a practical way. Because of its
open architecture it is suitable for various class projects from simple to
highly complex ones.
- DATABASE RESEARCH
Kweelt is a nice framework to validate optimization techniques,
storage, caching, mapping, etc. This is also a useful and highly tunable
component for information integration architectures.
- APPLICATION SERVICE PROVIDING (ASP) and WEB INTEGRATION
Kweelt is the first framework to comply with all the query
requirements from W3C. Its query language, based on Quilt, builds on
the XPath standard. Because it is storage-independent, Kweelt can be
put on top of any XML storage backend. As of this writing, a
relational backend (where XML documents are stored as relational data
and navigation primitives are translated on the fly into SQL queries) is
under construction.
Kweelt is a "personal" adaptation of the Quilt proposal where some
features have been removed, some have been adapted and others have been
added. Kweelt does not claim to be faithful to the Quilt
proposal. It does try to offer an intuitive, powerful and extensible
syntax to query (navigate, extract, compose, reconstruct) XML
documents.
| In-Lined XML |
| Description |
In-lined XML is a way to embed small pieces of XML inside the
query itself. The syntax is very similar to XML CDATA sections: XML data is
embedded inside a Kweelt query by enclosing it in [[ and ]].
The enclosed string is simply sent to a DOM parser to produce regular XML nodes.
|
| Rationale | Since XML offers a convenient syntax for describing any kind of
data, it is particularly well suited to describing constant values
that can be used as part of a query. Check the portfolio
query for an example. |
| Java external functions |
| Description |
Users can write their own Java code (functions) and call it from
the query. Such functions have to implement the right interface or
extend some template (abstract) classes and must be registered inside
the query using the import ... as ...; construct.
Check Q9 for an example.
|
| Rationale | To be able to extend the language without touching the
implementation itself. |
| Typed references (IDREFs) |
| Description |
Quilt introduced the "arrow operator", which dereferences
an IDREF attribute and returns the corresponding element(s) (those pointed
to by the value of the IDREF). For instance, in Quilt the construct
@spouse->/@name will return the list of attribute nodes
name for elements with an id attribute equal to the value of
attribute spouse.
In Kweelt, the user can give some hints to the query
processor. A hint consists of a pair (element-tag-name,
attribute-name) and tells Kweelt that the reference points to an
element with the given element-tag-name, whose ID attribute
is given by attribute-name. The Kweelt syntax is:
@spouse->{Person@pid, Student@sid}/@name. This is to be
understood as follows: grab the Person elements for which
attribute pid is equal to the value of attribute
spouse, or the Student elements for which attribute
sid is equal to the value of attribute spouse.
Check the queries from the REF use-cases
for some examples.
|
| Rationale |
First, it is worth keeping in mind that the arrow operator is just
syntactic sugar and can be described as a regular join. The
problem is that the arrow operator (just like the id function
in XPath) requires some structural information (DTD, Schema, etc.)
about the XML document. Without such information, the operator has no
meaning because there is no way to guess which attribute has to be
considered as an ID. Moreover, even in the presence of
structural information, following a reference means looking at ALL
elements and finding out whether their ID attribute is equal to the
value.
In Kweelt, we do not assume that every document comes with some
structural information. Therefore the user is strongly encouraged to
provide the piece of information necessary to give a meaning to the
query. A hint is a way to tell Kweelt which elements to look
for, by somehow typing the reference. This also makes the
query easier to read.
We think that this addition is useful for various reasons.
- it is optional and can be added by hand by the user
- it gives more readability to queries by identifying the type of
the element that gets pointed to
- it does not force Kweelt to look for the DTD/Schema
- it is compatible with any kind of structural description (DTD,
Schema, etc.)
- one can easily imagine a preprocessing step that would retrieve
the DTD/Schema of the documents used in the query and adorn the Kweelt
query with such hints
When there is no hint, our implementation will assume that the
name of the attribute ID is either "ID" or "id". In practice,
most users use either "ID" or "id" as the name of the
attribute of type ID. Check the Torquemada project for more info.
|
| LET .. EVAL |
| Description |
??
|
| Rationale |
Why do we need it? LET .. RETURN seems to be enough.
|
| XML comments and PI creation |
| Description |
A Kweelt query cannot create comments or processing instructions.
|
| Rationale |
None. Laziness. But it is trivial to do.
|
| Namespaces |
| Description |
Kweelt ignores namespaces. xhtml:html is considered as a
regular tag name and the inner tags do not inherit the xhtml
namespace.
|
| Rationale |
None.
|
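The typed-reference hints described in the table above can be illustrated with a small DOM-based sketch. It is not Kweelt's implementation: the class `ReferenceResolver`, its nested `Hint` type and the method `deref` are hypothetical names, and only standard JAXP/`org.w3c.dom` calls are used. With hints, only elements carrying a hinted tag name are examined; without hints, the sketch falls back to attributes named "ID" or "id", mirroring the default described above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical sketch: dereferencing an IDREF value, optionally guided by (tag, attribute) hints. */
class ReferenceResolver {

    /** A hint such as Person@pid: only Person elements are examined, using their pid attribute. */
    static final class Hint {
        final String tag;
        final String idAttribute;
        Hint(String tag, String idAttribute) { this.tag = tag; this.idAttribute = idAttribute; }
    }

    /** Returns the elements referenced by refValue (whitespace-separated tokens may name several). */
    static List<Element> deref(Document doc, String refValue, List<Hint> hints) {
        List<Element> result = new ArrayList<>();
        if (refValue == null || refValue.trim().isEmpty()) return result;
        Set<String> targets = new HashSet<>(Arrays.asList(refValue.trim().split("\\s+")));

        if (hints.isEmpty()) {
            // No hint: scan every element and look for an attribute named "ID" or "id".
            NodeList all = doc.getElementsByTagName("*");
            for (int i = 0; i < all.getLength(); i++) {
                Element e = (Element) all.item(i);
                if (hasTarget(e, "ID", targets) || hasTarget(e, "id", targets)) result.add(e);
            }
            return result;
        }

        // With hints: only elements whose tag name appears in a hint are examined.
        Set<Element> found = new LinkedHashSet<>();
        for (Hint h : hints) {
            NodeList candidates = doc.getElementsByTagName(h.tag);
            for (int i = 0; i < candidates.getLength(); i++) {
                Element e = (Element) candidates.item(i);
                if (hasTarget(e, h.idAttribute, targets)) found.add(e);
            }
        }
        result.addAll(found);
        return result;
    }

    private static boolean hasTarget(Element e, String attr, Set<String> targets) {
        return e.hasAttribute(attr) && targets.contains(e.getAttribute(attr));
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }
}
```

For @spouse->{Person@pid, Student@sid}, one would call `deref(doc, spouseValue, hints)` with the hints Person/pid and Student/sid. Splitting the value on whitespace is our own assumption, so that IDREFS-style values can yield several elements.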
The Kweelt distribution can be downloaded from this page.
You can choose between various formats (zip or tar.gz).
By downloading the distribution, you accept the conditions of the license agreement.
For some other packages needed by the distribution, we provide below
some pointers and - when it is legally possible - a copy of the jar
file. The copy is not always the latest version. You are encouraged to
download from the official site in order to get the latest version,
the release notes and the documentation.
Depending on what one wants to do, the set-up might be slightly
different. You can use your favorite Java 1.2 implementation. We
recommend the one from IBM. The default NodeFactory is based on
Apache Xerces.
In the rest of this section, the CLASSPATH definitions use only the
name of each jar file. In an actual configuration you will have to
provide the full path instead of just the name of the jar file.
| kweelt.jar | the Kweelt package |
| rt.jar | the Java 1.2 runtime classes |
| xerces.jar | Apache DOM/SAX package |
| regex.jar | Apache regular expression package |
export CLASSPATH=rt.jar:kweelt.jar (Unix/bash)
set CLASSPATH=rt.jar;kweelt.jar (Win)
export CLASSPATH=rt.jar:kweelt.jar:xerces.jar (Unix/bash)
set CLASSPATH=rt.jar;kweelt.jar;xerces.jar (Win)
If you decide to use user-defined functions, make sure that you also
include them (as a jar file or as a directory) in your classpath.
export CLASSPATH=rt.jar:kweelt.jar:xerces.jar:regex.jar (Unix/bash)
set CLASSPATH=rt.jar;kweelt.jar;xerces.jar;regex.jar (Win)
If you want to use Kweelt as part of Cocoon, you first need to have Cocoon up and running. Check the Cocoon Web site for some installation information.
Once Cocoon is up-and-running, you need to do two things: (1) tell the servlet engine that Cocoon uses where the Java classes needed by Kweelt are located; (2) register Kweelt as a Cocoon processor.
If you use JServ as your servlet engine, you need to add the following lines to the jserv.properties file:
wrapper.classpath=rt.jar
wrapper.classpath=kweelt.jar
wrapper.classpath=regex.jar
Also make sure that JServ will use a Java 1.2 VM, by having the following line in the jserv.properties file:
wrapper.bin=path_to_java_1.2/bin/java
To register Kweelt in Cocoon, simply add the following line in the cocoon.properties file:
processor.type.kweelt = xacute.ksp.KweeltProcessor
To test the installation, simply run Kweelt with the
-v switch to get the version number.
java -classpath $CLASSPATH xacute.quilt.Main -v (Unix/bash)
java -classpath %CLASSPATH% xacute.quilt.Main -v (Win)
You should get the following answer:
[sahuguet@isis]$ java xacute.quilt.Main -v
+---------------------------------------------+
| Kweelt version 1.04 / Bergerac / 2000-09-01 |
| Copyright 2000, Arnaud Sahuguet |
| MAIL: Arnaud.Sahuguet@polytechnique.org |
| URL: http://db.cis.upenn.edu/Kweelt |
+---------------------------------------------+
[sahuguet@isis]$
Running Kweelt with the -v switch
This is not tremendously informative, but it proves that the Kweelt
classes are properly installed. The Kweelt command line also provides a way
to check which of the third-party packages Kweelt may need are available.
To check for that, simply run Kweelt with the
--check-classpath switch.
[sahuguet@isis]$ java xacute.quilt.Main --check-classpath
Looking for available packages:
(*) package Java 2 (mandatory)
FOUND
(*) package W3C DOM (mandatory)
FOUND
(*) package SAX (mandatory)
FOUND
(*) package Xerces DOM/SAX parsers (mandatory)
FOUND
(*) package Sun Project X DOM/SAX parsers (optional)
NOT FOUND
(*) package Oracle DOM parser (optional)
NOT FOUND
(*) package Apache Regular Expression (optional)
FOUND
[sahuguet@isis]$
Running Kweelt to test for installed packages
This way you can check which packages are available in the CLASSPATH
you are using to run Kweelt.
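For readers curious how such a probe can work, here is a sketch that tries to load one representative class per package without initializing it. The class name `ClasspathCheck` is hypothetical, and the choice of probe classes (e.g. `org.apache.xerces.parsers.DOMParser`, `org.apache.regexp.RE`) is our own assumption; we do not know which classes Kweelt itself checks.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a classpath probe in the spirit of --check-classpath. */
class ClasspathCheck {

    /** True if the named class can be loaded (without being initialized). */
    static boolean isAvailable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Package label -> one representative class (these probe classes are our own choice).
        Map<String, String> probes = new LinkedHashMap<>();
        probes.put("Java 2", "java.util.Collection");
        probes.put("W3C DOM", "org.w3c.dom.Node");
        probes.put("SAX", "org.xml.sax.DocumentHandler");
        probes.put("Xerces DOM/SAX parsers", "org.apache.xerces.parsers.DOMParser");
        probes.put("Apache Regular Expression", "org.apache.regexp.RE");
        for (Map.Entry<String, String> p : probes.entrySet()) {
            System.out.println("(*) package " + p.getKey());
            System.out.println(isAvailable(p.getValue()) ? "FOUND" : "NOT FOUND");
        }
    }
}
```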
The Kweelt distribution comes with numerous examples located in the useCases folder. We provide examples from the
W3C use-cases as well as others that illustrate further features of the
framework.
| XMP | Use Case "XMP": Experiences and Exemplars |
| TREE | Use Case "TREE": Queries that preserve hierarchy |
| SEQ | Use Case "SEQ" - Queries based on Sequence |
| R | Use Case "R" - Access to Relational Data |
| SGML | Use Case "SGML": Standard Generalized Markup Language |
| TEXT | Use Case "TEXT": Full-text Search |
| NS | Use Case "NS" - Queries Using Namespaces |
| PARTS | Use Case "PARTS" - Recursive Parts Explosion |
| REF | Use Case "REF" - Queries based on References |
| AdvancedExamples | Advanced Examples |
Most query examples require some local files. Make
sure that you run the query from the correct folder, or modify the
query to point to the right document.
To recompile the source code, you need (1) to adapt the rules.mk file by providing the right location for the Java libraries and runtime, and (2) to set the environment variable TOP to the src folder. The easiest way is to go to the src directory (where the gnu and xacute folders are located) and type export TOP=`pwd` if you are using bash or setenv TOP `pwd` if you are using tcsh.
Then run the makefile (from any location inside the source tree) by typing make -e (the -e asks make to use environment variables).
If you are using Windows, the best way is to install Cygwin, a Unix-like environment that runs under Windows.
The design of Kweelt has been guided by modularity. The core of the
system is the xacute.quilt package that contains - among
other things - the query parser and the query evaluator.
The xacute.quilt package relies on interfaces and constants
defined in the xacute.common package. The
xacute.quilt package is not bound to any XML related
implementation. Everything is defined in terms of
xacute.common.Node and xacute.common.NodeList. Access
to the implementation of Node and NodeList is
performed via the xacute.common.NodeFactory.
Some implementations for interfaces xacute.common.Node,
xacute.common.NodeList are located in the
xacute.impl package. xacute.impl.dom offers an
implementation based on the DOM interface. xacute.impl.xdom
offers an implementation based on the Xerces DOM implementation that
also uses some Xerces methods that are not part of the DOM interface,
for better performance. Implementations rely on default
implementations (abstract classes) from the xacute.util package. Should you want
to write your own implementation of these interfaces, you are
strongly encouraged to subclass the abstract classes from
xacute.util.
Package xacute.util also offers various utility functions
used by the Kweelt framework.
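The storage independence described above can be sketched as follows. The interfaces `XNode`, `XNodeList` and `XNodeFactory` are hypothetical, simplified stand-ins for xacute.common.Node, NodeList and NodeFactory (the real signatures are not reproduced here); `DomNodeFactory` adapts the JDK's DOM parser to them, and `Evaluator` only ever sees the interfaces, so another back-end could be swapped in without touching it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Simplified, hypothetical stand-ins for xacute.common.Node / NodeList / NodeFactory.
interface XNode {
    String getName();            // element name, or null for non-element nodes
    XNodeList getChildren();
}

interface XNodeList {
    int getLength();
    XNode item(int i);
}

interface XNodeFactory {
    XNode parse(String xml);     // returns the document element
}

/** One possible back-end: adapts org.w3c.dom nodes to the interfaces above. */
class DomNodeFactory implements XNodeFactory {
    public XNode parse(String xml) {
        try {
            org.w3c.dom.Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return wrap(doc.getDocumentElement());
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    static XNode wrap(org.w3c.dom.Node n) {
        return new XNode() {
            public String getName() {
                return n.getNodeType() == org.w3c.dom.Node.ELEMENT_NODE ? n.getNodeName() : null;
            }
            public XNodeList getChildren() {
                org.w3c.dom.NodeList kids = n.getChildNodes();
                return new XNodeList() {
                    public int getLength() { return kids.getLength(); }
                    public XNode item(int i) { return wrap(kids.item(i)); }
                };
            }
        };
    }
}

/** Evaluator-side code: it sees only the interfaces, never the storage behind them. */
class Evaluator {
    static int countElements(XNode node, String name) {
        int count = name.equals(node.getName()) ? 1 : 0;
        XNodeList kids = node.getChildren();
        for (int i = 0; i < kids.getLength(); i++) {
            count += countElements(kids.item(i), name);
        }
        return count;
    }
}
```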
The Kweelt evaluation engine is organised in terms of
expressions, the top-level class being
QuiltExpression. Every QuiltExpression must
implement the method eval(EvalContext con) that returns a
xacute.quilt.Value.
xacute.quilt.Value is an interface for the Kweelt base types:
ValueBool, ValueString, ValueNum (every
number is represented as a float), ValueNode and
ValueNodeList. ValueNode and ValueNodeList
are just wrapping classes for xacute.common.Node and
xacute.common.NodeList.
EvalContext is a class used to carry the evaluation context
along the various steps of the evaluation of a query. Among other
things, it contains: (1) XPath-related information: the current node,
the position of the current node in the current nodelist and the size
of the nodelist; (2) variable bindings for LET and FOR;
(3) bindings for user-defined functions; (4) information about the
NodeFactory to be used for node creation.
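A minimal sketch of this expression/value/context design, assuming drastically simplified versions of Value, ValueNum, EvalContext and QuiltExpression (the real classes carry much more, including the XPath position and size, UDF bindings and the NodeFactory). `NumLiteral`, `VarRef`, `Plus` and `Let` are hypothetical expression classes; `Let` shows how a LET binding extends the context seen by its body.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, heavily simplified stand-ins for Kweelt's Value, ValueNum, EvalContext and QuiltExpression.
interface Value {}

final class ValueNum implements Value {
    final float value;                                   // every number is a float, as in Kweelt
    ValueNum(float value) { this.value = value; }
}

/** Holds variable bindings only; a real context also tracks the current node, position, size, UDFs, NodeFactory. */
final class EvalContext {
    private final Map<String, Value> vars = new HashMap<>();
    private final EvalContext parent;                    // enclosing scope, or null at top level

    EvalContext(EvalContext parent) { this.parent = parent; }

    Value lookup(String name) {
        if (vars.containsKey(name)) return vars.get(name);
        if (parent != null) return parent.lookup(name);
        throw new IllegalStateException("unbound variable $" + name);
    }

    /** Returns a child context with one extra binding; the receiver is left unchanged. */
    EvalContext bind(String name, Value v) {
        EvalContext child = new EvalContext(this);
        child.vars.put(name, v);
        return child;
    }
}

abstract class QuiltExpression {
    abstract Value eval(EvalContext con);
}

final class NumLiteral extends QuiltExpression {
    private final float v;
    NumLiteral(float v) { this.v = v; }
    Value eval(EvalContext con) { return new ValueNum(v); }
}

final class VarRef extends QuiltExpression {
    private final String name;
    VarRef(String name) { this.name = name; }
    Value eval(EvalContext con) { return con.lookup(name); }
}

final class Plus extends QuiltExpression {
    private final QuiltExpression left, right;
    Plus(QuiltExpression left, QuiltExpression right) { this.left = left; this.right = right; }
    Value eval(EvalContext con) {
        float a = ((ValueNum) left.eval(con)).value;
        float b = ((ValueNum) right.eval(con)).value;
        return new ValueNum(a + b);
    }
}

/** LET $name := bound RETURN body: evaluates body in a context extended with the binding. */
final class Let extends QuiltExpression {
    private final String name;
    private final QuiltExpression bound, body;
    Let(String name, QuiltExpression bound, QuiltExpression body) {
        this.name = name; this.bound = bound; this.body = body;
    }
    Value eval(EvalContext con) {
        return body.eval(con.bind(name, bound.eval(con)));
    }
}
```

Making `bind` return a new child context (rather than mutating the current one) keeps bindings scoped to the body, which is what nested LET and FOR clauses need.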
In most cases, Kweelt does not have to worry about nodes. Node
management is delegated to the NodeFactory. Nodes are created
by a specific implementation and Kweelt does not have to know anything
about it. Nodes are usually created by parsing an XML file directly
and building an in-memory representation of it, and are manipulated
via the interfaces mentioned above.
Problems arise when the query needs to create new
nodes, i.e. nodes that do not belong to any physical XML
document. This happens, for instance, when the query constructs XML
elements or uses operators like SHALLOW or
FILTER. In this case, we need a way to create new
nodes. We could require the NodeFactory to provide ways to
create new nodes. But what if the NodeFactory relies on a
SAX-based back-end? How do we create nodes then? What if the
NodeFactory relies on a SQL back-end? Would we need to issue a
SQL insert for every new node we want to create?
The solution we adopt is to have Kweelt take care of the new
nodes. Hopefully, most of the nodes will be physical nodes and the
number of new nodes will be limited. Kweelt defines classes for text
nodes (xacute.util.QuiltTextNode), element nodes
(xacute.util.QuiltElementNode), attribute nodes
(xacute.util.QuiltAttributeNode) and shallow nodes
(xacute.util.QuiltShallowNode).
This solves a lot of headaches and makes the NodeFactory
interface very simple (basically, it is read-only). On the other hand,
we need to make sure that these new nodes behave like real nodes
and, in particular, offer the same navigation capabilities along
the various XPath axes. This is crucial when queries are nested and
the result of the first one is used by the second.
Another issue is whether or not to copy nodes. When Kweelt evaluates a
query and a given node is going to be part of the result, should
the node be copied and added to the result, or should only a pointer be
added to the result? The result of a query is in most cases
a nodelist of pointers to physical nodes.
We have decided to use pointers whenever possible to save on
resources. This is in particular the case when the SHALLOW
operator is used: instead of duplicating the node, we use a
QuiltShallowNode which is simply a wrapper on top of a
regular Node, where the node behavior has been modified to
reflect the semantics of SHALLOW.
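The pointer-based SHALLOW described above amounts to a decorator. In this sketch, `TreeNode`, `PhysicalNode` and `ShallowNode` are hypothetical types (not xacute.util.QuiltShallowNode); the wrapper forwards the name and attributes to the wrapped node but reports no children, which is how we read SHALLOW's semantics (the element with its attributes, without its content). Nothing is copied.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical minimal node interface, used only for this illustration.
interface TreeNode {
    String getName();
    Map<String, String> getAttributes();
    List<TreeNode> getChildren();
}

/** A plain in-memory node standing in for a "physical" node from some back-end. */
final class PhysicalNode implements TreeNode {
    private final String name;
    private final Map<String, String> attrs;
    private final List<TreeNode> children;

    PhysicalNode(String name, Map<String, String> attrs, List<TreeNode> children) {
        this.name = name; this.attrs = attrs; this.children = children;
    }
    public String getName() { return name; }
    public Map<String, String> getAttributes() { return attrs; }
    public List<TreeNode> getChildren() { return children; }
}

/** SHALLOW without copying: delegate to the wrapped node, but hide its children. */
final class ShallowNode implements TreeNode {
    private final TreeNode target;                       // pointer to the original node

    ShallowNode(TreeNode target) { this.target = target; }
    public String getName() { return target.getName(); }
    public Map<String, String> getAttributes() {
        return Collections.unmodifiableMap(target.getAttributes());   // read-only view
    }
    public List<TreeNode> getChildren() { return Collections.emptyList(); }
}
```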
When we need to produce (output) the result, we simply navigate the
structure by following pointers. This way, we can take advantage of
the underlying XML back-end. In the current implementation, the output
of a result uses SAX events, which means that the result is NEVER
really materialized. An extreme case we plan to investigate is when
the documents we are handling are huge and stored in a database
back-end. Assuming the result is small (in terms of number of
pointers), Kweelt should be able to evaluate and output the query with
minimal resources, all the hard work being performed by the database
back-end.
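The following sketch shows the idea of streaming a result by walking the node tree and firing SAX events, so that no output copy is built. It uses the SAX2 `ContentHandler` interface from the JDK rather than the SAX1 `DocumentHandler` that Kweelt's API takes, and `SaxEmitter`/`StringHandler` are hypothetical names; `StringHandler` does no escaping and only stands in for something like ToStringHandler.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: report a node tree as SAX events instead of materializing an output copy. */
class SaxEmitter {

    static void emit(Node n, ContentHandler h) throws SAXException {
        switch (n.getNodeType()) {
            case Node.ELEMENT_NODE: {
                String name = n.getNodeName();
                AttributesImpl atts = new AttributesImpl();
                NamedNodeMap map = n.getAttributes();
                for (int i = 0; i < map.getLength(); i++) {
                    Node a = map.item(i);
                    atts.addAttribute("", a.getNodeName(), a.getNodeName(), "CDATA", a.getNodeValue());
                }
                h.startElement("", name, name, atts);
                for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) emit(c, h);
                h.endElement("", name, name);
                break;
            }
            case Node.TEXT_NODE:
            case Node.CDATA_SECTION_NODE: {
                char[] text = n.getNodeValue().toCharArray();
                h.characters(text, 0, text.length);
                break;
            }
            default:
                break;                                   // comments, PIs, etc. are skipped here
        }
    }

    /** Convenience: parse a string, stream its document element into a StringHandler, return the text. */
    static String render(String xml) {
        try {
            Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            StringHandler handler = new StringHandler();
            emit(root, handler);
            return handler.out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }
}

/** Toy stand-in for an output handler: renders events as markup (no escaping, illustration only). */
class StringHandler extends DefaultHandler {
    final StringBuilder out = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        out.append('<').append(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            out.append(' ').append(atts.getQName(i)).append("=\"").append(atts.getValue(i)).append('"');
        }
        out.append('>');
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        out.append("</").append(qName).append('>');
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        out.append(ch, start, length);
    }
}
```

Because the emitter only follows pointers, the same walk would work over any node representation that offers child and attribute navigation; the handler decides whether anything is ever materialized.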
[Figure: Kweelt Core Architecture]
| Name | Type | Classname (xacute.impl) | Status |
| Xerces parser | DOM | .dom.NodeFactoryXerces | operational |
| Xerces parser | DOM + node index | .xdom.NodeFactoryXerces | operational |
| Oracle parser | DOM | .dom.NodeFactoryOracle | operational |
| Sun parser | DOM | .dom.NodeFactorySun | operational |
| SAX-LD | SAX + naive linear array-based storage | .ldtree.NodeFactory | almost ready for prime-time |
| Wizdom | binary file storage | .wizdom.NodeFactory | almost ready for prime-time |
| mySQL | relational back-end | .sql.NodeFactory | under development (David White) |
[Figure: Kweelt: Cocoon-based Deployment Architecture]
Executing a query is a very simple process: (1) create an instance
of the parser; (2) parse the query into a QuiltQuery object;
(3) evaluate it.
The evaluation requires an EvalContext. The role of the
EvalContext is to provide information about the
NodeFactory. Evaluation always produces SAX events, so the eval method takes a SAX DocumentHandler as an argument.
Kweelt provides various DocumentHandler implementations in the
xacute.util package, such as: ToStringHandler to
produce a String; OutputHandler to send the result
to some output stream; HashHandler to produce an MD5 hash of
the XML output; DevNullHandler to do nothing.
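// Assumption: QuiltParser, QuiltQuery and EvalContext live in the xacute.quilt package.
import xacute.quilt.*;
import org.xml.sax.DocumentHandler;

String s = "...";                                // placeholder: the query text, e.g. read from a .qlt file
QuiltParser parser = new QuiltParser();          // (1) create the parser
QuiltQuery query = parser.parseQuery(s);         // (2) parse the query into a QuiltQuery
EvalContext con = new EvalContext();
con.setNodeFactory( new MyNodeFactory() );       // MyNodeFactory: your NodeFactory implementation
DocumentHandler handler = new MyOutputHandler(); // MyOutputHandler: any SAX DocumentHandler
query.eval(handler, con);                        // (3) evaluate; the result arrives as SAX events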
Using the Kweelt API
You can enrich Kweelt with arbitrary functions defined as Java
classes. To do so, you must: (1) write your function by subclassing
one of the function template classes; (2) compile it and make sure
that it is in a location reachable by the Java classpath; (3) import
the function in the Kweelt query using the import keyword.
Some examples:
- The DATE function, from the core library
- The Split
function, used to split a string into a list of strings (nodelist of
text nodes)
- The Like function,
used to test whether a string satisfies a regular expression.
An example of a query that uses imported functions can be
found in the UDF directory.
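A sketch of the template-class approach described above. `StringListFunction`, `Split` and `FunctionRegistry` are hypothetical: Kweelt's actual template classes, and the way import ... as ... binds a class, are not reproduced here. The template method `invoke` checks the argument count before delegating to `apply`, and `FunctionRegistry.importAs` loads a class by name via reflection, roughly the role the import declaration plays.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical template class; Kweelt's real template classes and signatures may differ.
abstract class StringListFunction {
    /** Number of string arguments the function expects. */
    abstract int arity();

    /** Computes the result from already-evaluated string arguments. */
    abstract List<String> apply(List<String> args);

    /** Template method: checks the argument count before delegating to apply(). */
    final List<String> invoke(List<String> args) {
        if (args.size() != arity()) {
            throw new IllegalArgumentException(getClass().getSimpleName() + " expects " + arity() + " argument(s)");
        }
        return apply(args);
    }
}

/** Split(text, delimiters): splits a string into a list of strings (our own toy version). */
class Split extends StringListFunction {
    int arity() { return 2; }

    List<String> apply(List<String> args) {
        List<String> parts = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(args.get(0), args.get(1));
        while (st.hasMoreTokens()) parts.add(st.nextToken());
        return parts;
    }
}

/** Stand-in for the query's import ... as ... declarations: alias -> function instance. */
class FunctionRegistry {
    private final Map<String, StringListFunction> functions = new HashMap<>();

    void importAs(String className, String alias) {
        try {
            Class<?> c = Class.forName(className);
            functions.put(alias, (StringListFunction) c.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("cannot import " + className, e);
        }
    }

    List<String> call(String alias, List<String> args) {
        StringListFunction f = functions.get(alias);
        if (f == null) throw new IllegalStateException("unknown function " + alias);
        return f.invoke(args);
    }
}
```

This toy `Split` relies on `java.util.StringTokenizer`, so empty fields are dropped; the Split function shipped with the distribution may behave differently.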
Writing your own NodeFactory is a different kind of
undertaking and you should think twice before doing it. The
NodeFactory is in charge of: (1) parsing XML, (2) instantiating
nodes that support the Kweelt Node interface and (3)
offering Node and NodeList primitives.
Kweelt comes with default implementations (in the form of abstract
classes) for NodeFactory, NodeImpl and
NodeListImpl. By subclassing them - altogether or separately
- the extra work you need to provide is small.
If you feel the need to write your own implementation, you will have
to provide the implementation for a class that implements the
NodeFactory interface. Since NodeFactory uses both
NodeImpl and NodeListImpl, you will have to provide
some implementation for them too.
The JavaDoc documentation is available from here.
KSP stands for Kweelt Server Pages. KSPs are a way to embed Kweelt
queries into XML pages (in the spirit of server-side-includes,
active-server-pages, etc.).
Instead of re-inventing the wheel, we provide KSP as a special Cocoon processor. This way, you can take advantage
of all the nice features of Cocoon and use Kweelt as another way to
produce content. Moreover, the architecture of Cocoon makes it possible to
postprocess the output of KSP using other Cocoon processors
(e.g. XSLT).
Making Kweelt a new Cocoon processor is only a few lines of Java code
(thanks to the modular design of Cocoon, I guess). Feel free to modify
the code to meet
your specific needs.
You can embed as many queries as you want
per page. A query is introduced using an outer kweelt tag and
an inner kweelt-query tag. The easiest way is to write the
Kweelt query as a CDATA node, to avoid the burden of escaping
characters. An example is provided below.
<?xml version="1.0"?>
<?cocoon-process type="kweelt"?>
[...]
<kweelt>
<kweelt-query>
<![CDATA[ put your query here ]]>
</kweelt-query>
</kweelt>
[...]
KSP: the input
The first processing-instruction tells Cocoon how to process the XML
page (the XML document is directly sent to the Kweelt processor).
The Kweelt processor will extract the information inside the
kweelt tag and will replace the subtree rooted at
kweelt-query by a new subtree rooted at
kweelt-result and containing the result of the query inside
tag data and some other meta-information available for
further processing, such as parsing and execution time, etc.
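A sketch of this rewrite step using the JDK DOM API. `KspRewriter` and its `evaluate` callback are hypothetical (this is not xacute.ksp.KweeltProcessor): the answer is inserted as a plain text node, and parsingLog/ExecLog are omitted, whereas the real processor produces the fuller structure given in the KSP DTD.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical sketch of the KSP rewrite step; not the actual xacute.ksp.KweeltProcessor. */
class KspRewriter {

    /** Replaces every kweelt-query element with a kweelt-result element holding the answer in data. */
    static void rewrite(Document page, Function<String, String> evaluate) {
        NodeList live = page.getElementsByTagName("kweelt-query");
        List<Element> queries = new ArrayList<>();
        for (int i = 0; i < live.getLength(); i++) queries.add((Element) live.item(i)); // snapshot: list is live

        for (Element query : queries) {
            String queryText = query.getTextContent();          // includes CDATA content
            long start = System.currentTimeMillis();
            String answer = evaluate.apply(queryText);
            long elapsed = System.currentTimeMillis() - start;

            Element result = page.createElement("kweelt-result");
            result.setAttribute("execution", "OK");
            result.setAttribute("executionTime", Long.toString(elapsed));
            Element data = page.createElement("data");
            data.appendChild(page.createTextNode(answer));      // a real processor would insert nodes
            result.appendChild(data);
            query.getParentNode().replaceChild(result, query);
        }
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }
}
```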
<?xml version="1.0"?>
[...]
<kweelt>
<kweelt-result>
<data>
the result of the evaluation of the Kweelt query
</data>
</kweelt-result>
</kweelt>
[...]
KSP: the output
It is often useful to be able to pass parameters to a KSP. For
instance, one can write a KSP to produce the list of publications for a
given author. The template is the same for every author: the only
difference is the name of the author. KSP supports parameterized
queries. In the query itself, parameters are introduced using
{@var}. Parameters are transmitted to the KSP via the HTTP
QUERY_STRING.
For instance, the following page (say publicationBy.xml)
should be called as
publicationBy.xml?author=Arnaud&after=1995. The Kweelt
processor will extract the values of the parameters and substitute
them into the query itself: {@author} will be replaced with
Arnaud and {@after} will be replaced with 1995.
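A sketch of this {@var} pre-processing step, assuming the parameters arrive as a raw QUERY_STRING. `KspParameters` is a hypothetical name, and leaving unknown placeholders untouched is our own choice.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of KSP-style {@var} substitution from an HTTP QUERY_STRING. */
class KspParameters {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{@(\\w+)\\}");

    /** Parses "author=Arnaud&after=1995" into a name -> value map (URL-decoded). */
    static Map<String, String> parseQueryString(String qs) {
        Map<String, String> params = new HashMap<>();
        if (qs == null || qs.isEmpty()) return params;
        for (String pair : qs.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(name), decode(value));
        }
        return params;
    }

    /** Replaces each {@name} with its parameter value; unknown names are left as they are. */
    static String substitute(String query, Map<String, String> params) {
        Matcher m = PLACEHOLDER.matcher(query);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = params.get(m.group(1));
            String replacement = value != null ? value : m.group(0);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);      // UTF-8 is always supported
        }
    }
}
```

Since values are pasted verbatim into the query text, a value containing a double quote would change the query itself; a production version would need to escape or validate parameter values.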
<?xml version="1.0"?>
<?xml-stylesheet href="my-stylesheet.xsl" type="text/xsl"?>
<?cocoon-process type="kweelt"?>
<?cocoon-process type="xslt"?>
<doc>
<H1>First Query</H1>
<kweelt>
<kweelt-query>
<![CDATA[
<P>
<UL>
LET $items := document("bib.xml")/*[@year .>=. {@after}][CONTAINS(author, "{@author}")][count(author)=1]
FOR $item IN $items
RETURN <LI>$item/title/text(), " ", <U>$item/@year</U></LI> SORTBY (./U DESCENDING)
</UL>, CONCAT( " ", NUMFORMAT("##", count($items)), " items found.")
</P>
]]>
</kweelt-query>
</kweelt>
</doc>
KSP using parameters
The {@var} syntax is only recognized by KSP and should be
viewed as a pre-processing instruction. This is NOT part of the
Kweelt query language itself.
To wrap up, the DTD used for the Kweelt Server Pages is defined below:
<!ELEMENT kweelt (kweelt-query|kweelt-result)>
<!ELEMENT kweelt-query (#PCDATA)>
<!ELEMENT kweelt-result (data, parsingLog, ExecLog) >
<!ATTLIST kweelt-result parsing (OK|FAIL) #IMPLIED >
<!ATTLIST kweelt-result parsingTime CDATA #IMPLIED >
<!ATTLIST kweelt-result execution (OK|FAIL) #IMPLIED >
<!ATTLIST kweelt-result executionTime CDATA #IMPLIED >
<!ELEMENT parsingLog (#PCDATA)>
<!ELEMENT ExecLog (#PCDATA)>
<!ELEMENT data ANY>
Kweelt DTD
The page generated by the Kweelt processor can be further processed
using XSLT for instance. The distribution provides various examples.
Kweelt comes with a test suite borrowed from the W3C use-cases. The examples from the test
suite can be used to validate both the parser and the evaluation
engine.
To make the testing easier, we provide two bash scripts
test-parser.sh and test-engine.sh. They will need to
be configured by the user to specify the location of the Java runtime
and the Java libraries.
To use them, you simply need to go into
one of the use-case directories and run the program. The script will
look for queries and either parse them or evaluate them and compare
the output with the published result. Feel free to change the scripts.
test-parser simply calls the Kweelt parser for a given query
(Quilt queries use the .qlt extension) and expects the
program to exit with status code 0.
test-engine is more complex. It needs to evaluate the query
and compare its output with the result published in the use-cases. In order to
compare two XML documents, we compute the hash (MD5) of a
document. Basically, we navigate the XML tree and hash all the
nodes. Attribute nodes are sorted in lexicographic order. Text nodes
are trimmed. The code of the handler can be found here.
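A sketch of the hashing idea described above, using `java.security.MessageDigest`. `XmlFingerprint` is a hypothetical name, and the exact byte encoding (markers, separators, dropping whitespace-only text) is our own assumption, so its hashes will not match those produced by the distribution's HashHandler. As described, attributes are sorted lexicographically and text nodes are trimmed.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical sketch: an MD5 fingerprint of an XML tree, insensitive to attribute order and text padding. */
class XmlFingerprint {

    static String md5(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);       // every Java platform must provide MD5
        }
        feed(doc.getDocumentElement(), md);
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) hex.append(String.format("%02x", b));
        return hex.toString();
    }

    /** Depth-first walk that feeds a canonical rendering of each node into the digest. */
    private static void feed(Node n, MessageDigest md) {
        switch (n.getNodeType()) {
            case Node.ELEMENT_NODE: {
                update(md, "<" + n.getNodeName());
                NamedNodeMap map = n.getAttributes();
                List<String> atts = new ArrayList<>();
                for (int i = 0; i < map.getLength(); i++) {
                    atts.add(map.item(i).getNodeName() + "=" + map.item(i).getNodeValue());
                }
                Collections.sort(atts);                  // lexicographic order: source order is irrelevant
                for (String a : atts) update(md, "@" + a);
                for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) feed(c, md);
                update(md, ">");
                break;
            }
            case Node.TEXT_NODE:
            case Node.CDATA_SECTION_NODE: {
                String t = n.getNodeValue().trim();      // trimmed; whitespace-only text is dropped
                if (!t.isEmpty()) update(md, "\"" + t);
                break;
            }
            default:
                break;                                   // comments and PIs are ignored here
        }
    }

    private static void update(MessageDigest md, String s) {
        md.update(s.getBytes(StandardCharsets.UTF_8));
        md.update((byte) 0);                             // separator between tokens
    }
}
```

Two documents that differ only in attribute order or indentation then produce the same 32-character hex digest, which is what a test harness comparing against published results needs.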
In the spirit of the open-source movement, everyone is encouraged
to contribute to the project. As mentioned above Kweelt has been
designed in a modular way to make extensions easy. Here are some ideas
for contributions:
- libraries of Java functions that can be called from the language,
such as: number formatting and comparison, date formatting and
comparison, domain specific text manipulation functions.
- KSP extensions to take advantage of the servlet architecture
(cookies, session ids). For instance, parameters could refer to cookie
values or other HTTP-specific information.
- new XML backends. Relational (MySQL, MiniSQL, Postgres,
etc.). Backends with full-text index capabilities. Backends for BIG
XML files.
- smart cache for XML documents and XML nodes (the default is to
cache all documents)
- query optimizer (even simple optimizations can lead to tremendous
gains)
- GUI for the query language
| Q: | Why does Kweelt require Java 1.2? |
| A: |
Because I am lazy. The only features of Java 1.2 that Kweelt makes use
of are the Collections API and the sort method it provides. So, by
simply rewriting the code for sort, one could make it work
under Java 1.1. Java 1.2 also offers collections that do not
incur the overhead of synchronization. Since Kweelt needs to create many
lists, there is a potential benefit in using 1.2.
|
| Q: | What does Kweelt stand for? |
| A: |
Nothing. It just sounds like "Quilt".
After further inquiry (thanks to Frank N.), it turns out that
'kweelen' is a Dutch verb with two meanings (kweelt then is 3rd person
singular, like in 'he kweelt'):
1 - to sing in a beloving way; this is a word mostly used by poets,
normal people :-) use it in an ironic way, if I would say "he *kweelt*
a song" then it would be a hint that he sings very badly
2- another meaning is to suffer, but also this is not really used by
normal people.
|
| Q: | When I run a query I do not always get the same result in
terms of order. Why? |
| A: |
Order is a major issue for XML query languages.
First, the notion of order is not clearly defined. For instance, when
you union nodes from two different documents, what is the order? When
you union attribute nodes, what is the order?
Second, DOM does not provide a good way to compare two nodes. The only
way is to grab the ancestors of both nodes, reach their first common
ancestor and from there determine the order by comparing child node
indices. The big problem is that this operation is very expensive (its
cost grows with the depth of the document tree). The good
news is that xacute.impl.xdom.NodeFactoryXerces takes
advantage of some internals of the Xerces implementation that provide
a comparison function between nodes for free (comparing two ints). But
this is not DOM anymore.
The Kweelt strategy is ALWAYS to try to perform an ordered
union. When it cannot (because two nodes cannot be compared, which
triggers an exception), Kweelt performs an un-ordered union (aka
append). Hopefully, you get the best result available for every
query. Using an ordered vs. un-ordered union also has some
consequences in terms of performance, especially if you have to use
DOM (pure DOM) to compare nodes.
|
| Q: | When I use SHALLOW, FILTER or when I nest RETURN
statements, I get some funny answers |
| A: |
The problem comes from the difference between physical and
new nodes. Physical nodes are nodes that exist in a physical
XML document and come from the parsing of this document. Their
behavior is managed by the NodeFactory and its underlying XML
back-end. New nodes are nodes that are freshly created out of the
query via element constructors (XML tags you use in the RETURN
clause) and the SHALLOW and FILTER operators. If you
think about it, these nodes do not exist anywhere and we need to
create them. The behavior of these new nodes is defined in
the xacute.util.Quilt_xxxx_Node classes and not all the
navigation primitives have been implemented consistently. THIS IS A
SHORTCOMING OF KWEELT THAT WILL BE FIXED IN THE FUTURE.
|
| Q: | How can I stay informed? |
| A: | A mailing list should be set up really soon. Meanwhile, simply
visit the Kweelt web page on a regular basis.
|
| Q: | How can I contribute? |
| A: | There are many ways to contribute to the project. Using Kweelt
is already a good way to contribute. Building applications that rely
on Kweelt is a good way to identify shortcomings and think about contributions.
|
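The pure-DOM comparison described in the answer on result order, together with the ordered-union-with-fallback strategy, can be sketched as follows. `DocOrder` and `IncomparableException` are hypothetical names; attribute nodes and nodes from different documents are treated as incomparable, which triggers the append fallback.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical sketch: pure-DOM document-order comparison and an ordered union with an append fallback. */
class DocOrder {

    /** Signals that two nodes have no defined relative order. */
    static class IncomparableException extends RuntimeException {
        IncomparableException(String why) { super(why); }
    }

    /** Compares two nodes in document order via their ancestor chains (cost grows with tree depth). */
    static int compare(Node a, Node b) {
        if (a == b) return 0;
        if (a.getNodeType() == Node.ATTRIBUTE_NODE || b.getNodeType() == Node.ATTRIBUTE_NODE) {
            throw new IncomparableException("attribute nodes have no document order here");
        }
        List<Node> pa = pathFromRoot(a), pb = pathFromRoot(b);
        if (pa.get(0) != pb.get(0)) throw new IncomparableException("nodes belong to different trees");
        int i = 0;
        while (i < pa.size() && i < pb.size() && pa.get(i) == pb.get(i)) i++;
        if (i == pa.size()) return -1;                  // a is an ancestor of b: a comes first
        if (i == pb.size()) return 1;                   // b is an ancestor of a
        // pa.get(i) and pb.get(i) are siblings under the deepest common ancestor.
        return Integer.compare(indexInParent(pa.get(i)), indexInParent(pb.get(i)));
    }

    private static List<Node> pathFromRoot(Node n) {
        List<Node> path = new ArrayList<>();
        for (Node cur = n; cur != null; cur = cur.getParentNode()) path.add(0, cur);
        return path;
    }

    private static int indexInParent(Node n) {
        int idx = 0;
        for (Node s = n.getPreviousSibling(); s != null; s = s.getPreviousSibling()) idx++;
        return idx;
    }

    /** Ordered union if every pair is comparable, otherwise an un-ordered union (append). */
    static List<Node> union(List<Node> x, List<Node> y) {
        List<Node> appended = new ArrayList<>(x);
        for (Node n : y) if (!containsSame(appended, n)) appended.add(n);
        List<Node> ordered = new ArrayList<>(appended);
        try {
            ordered.sort(DocOrder::compare);
            return ordered;
        } catch (IncomparableException e) {
            return appended;                            // fallback: keep the appended order
        }
    }

    private static boolean containsSame(List<Node> list, Node n) {
        for (Node m : list) if (m == n) return true;    // node identity, not structural equality
        return false;
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }
}
```

Each comparison walks both ancestor chains and counts preceding siblings, which is exactly the cost the answer warns about. Later DOM implementations (DOM Level 3) expose `Node.compareDocumentPosition`, which can replace the hand-written walk.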
Things to do:
- fix bugs
- improve error messages
- clean parser
- clean code (especially the Quilt_xxxx_Node classes, which do not
have a very consistent behavior)
- make the code Java 1.1 compliant
Kweelt is the result of the joint work of 2 people (Laurent and
Arnaud) for a few weeks during the summer of 2000, in the Database
Research Group at the University of Pennsylvania. In parallel, some
work was being done (Thien-Loc) on wizdom, a file-based XML back-end
to be used inside Kweelt. The Kweelt work resulted in a prototype running
most of the use-cases.
Since then, the prototype has been
completely rewritten (Arnaud) to be modular, run all the use-cases,
extend the language, provide KSP, etc.
Basically, these guys are the French Connection, also known as
the X-Men (to be pronounced "icks-men").
To make fixing bugs easier and faster, bug submitters
are encouraged to provide the following information:
- Java version being used
- Kweelt version being used
- operating system
- example that illustrates the bug (query + XML document(s))
If possible, do not submit a complex example: try to identify which
element of the query triggers the bug.
Bug reports can be submitted here.
[Figure: The Official Kweelt Logo]
You can browse the distribution online if you wish. The file structure
is available below.
If you just want to download some stuff, go to the download directory.
