


Open Source XML Databases?
tarun asks: "I am creating the next version of my open source UDDI registry and decided to use an XML database backend, if I can find any good ones. I made this choice because I was impressed by Oracle's and DB2's XML capabilities in my past lives. However, when I tried looking for an open-source alternative, it seemed there was nothing around except perhaps Xindice, which is clearly less than perfect. I am looking for something that can work with more than one existing database (I will ship my software with MySQL, but if a large organization wants to deploy it, it should be able to do so using Oracle, DB2 or whatever they want to use), and Xindice currently only works with Berkeley DB. Also, I am looking for something that can create database tables for me given an XML schema (I can tweak it later to create indexes, stored procedures etc.) and, given an XML document, write it to these tables. If it supports something standard like XUpdate or XQuery, that is even better."
"There are some other XML database projects, but either they have too few features or they are not open source. What is the XML-aware portion of the Slashdot community using? Have you ever run across such problems? Do you guys create your database schemas by painstakingly copying every element in every XML schema you have to handle to database tables, and write huge amounts of parsing/deparsing code both ways?"
Right here, baby (Score:1, Informative)
Re:Right here, baby (Score:3, Informative)
XPath support: XPath support is still preliminary. Some functions and numerical operators are missing, and only abbreviated XPath syntax is supported. The parser also has some problems recognizing the full range of Unicode characters. I have started to write a new XPath parser (using JavaCC instead of ANTLR) to overcome these limitations.
XUpdate: The basic model has been designed to provide efficient, index-based retrieval. As a drawback, eXist currently does not support direct manipulation of the DOM tree, such as node insertions or removals. A document always has to be deleted or updated as a whole.
This is clearly a major restriction for applications which need to directly manipulate the DOM tree. Such applications have to create a new document (as XSLT does) and insert it into the DB after all transformations are done. Documents should be kept small so they can easily be reinserted whenever they change.
DOM manipulation methods and XUpdate are planned for one of the next releases.
Use an XMLJavaDB toolkit (Score:4, Informative)
Bad Link (Score:1)
Why XML? (Score:4, Interesting)
Re:Why XML? (Score:1, Insightful)
Re:Why XML? (Score:5, Informative)
The second way of handling XML in an RDBMS is to store the document as a CLOB. Storing it as a CLOB has the advantage of solving the two issues above, but introduces one of its own: you can't query the data that is represented by the CLOB because it is all stored in a single column. This means you have to extract the document from the CLOB and parse it before being able to use any of the data. Some databases now have built-in XML parsers so you can do this from stored procedures and combine the XML document with tabular data, but the performance sucks.
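To make the CLOB limitation concrete, here is a minimal sketch in Python. It assumes SQLite as a stand-in RDBMS (a TEXT column plays the role of the CLOB), and the table name `docs`, the `<book>` document shape and the helper `find_titles_by_author` are all made up for illustration. The point is only that the filter cannot be pushed into SQL, so every row gets fetched and parsed.

```python
import sqlite3
import xml.etree.ElementTree as ET


def find_titles_by_author(conn, author):
    """Return titles of stored documents whose <author> equals `author`."""
    titles = []
    # SQL cannot see inside the stored text, so every row is fetched
    # and parsed in application code before it can be filtered.
    for (body,) in conn.execute("SELECT body FROM docs ORDER BY id"):
        root = ET.fromstring(body)
        if root.findtext("author") == author:
            titles.append(root.findtext("title"))
    return titles


# Hypothetical table: one whole XML document per row, stored as text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO docs (body) VALUES (?)",
    [
        ("<book><title>XML Basics</title><author>Ann</author></book>",),
        ("<book><title>SQL Basics</title><author>Bob</author></book>",),
    ],
)

print(find_titles_by_author(conn, "Ann"))
```

With two rows this is harmless; with many large documents the full scan and reparse is where the cost comes from.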
I do cover why you would want to use an XML database and how to use Xindice in an article I wrote for DevX that can be found here [devx.com].
Re:Why XML? (Score:1)
One other question, and forgive me if this is naive because I'm still thinking in relations. Can an XML database generate XML dynamically from other documents?
Say I have an RDBMS with three tables: STUDENTS, CLASSES, ENROLLMENTS. I can join STUDENTS and ENROLLMENTS to give me all of the students taking a particular class. How does an XML database handle that? How do I create relationships between the documents?
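For reference, the relational side of that question can be sketched like this. It assumes SQLite through Python's sqlite3 module; the column names and sample rows are invented, and a CLASSES join is included so the class can be looked up by name.

```python
import sqlite3


def students_in_class(conn, class_name):
    """Return the names of students enrolled in the named class."""
    rows = conn.execute(
        """
        SELECT s.name
        FROM students s
        JOIN enrollments e ON e.student_id = s.id
        JOIN classes c ON c.id = e.class_id
        WHERE c.name = ?
        ORDER BY s.name
        """,
        (class_name,),
    )
    return [name for (name,) in rows]


# Hypothetical schema and data for the three tables in the question.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE classes (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE enrollments (student_id INTEGER, class_id INTEGER);
    INSERT INTO students VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO classes VALUES (10, 'Algebra'), (20, 'Biology');
    INSERT INTO enrollments VALUES (1, 10), (2, 10), (3, 20);
    """
)

print(students_in_class(conn, "Algebra"))
```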
Re:Why XML? (Score:1)
(don't get too much into my implementation, this is just an example)
If you have students belonging to many classes then you might have a STUDENT tag with many CLASS tags with CLASSID attributes. An XML document for these students would contain many STUDENTs. And if later you decide those students should be broken up into dormitories then you might arrange STUDENTs under DORMITORY tags.
RDBMSs aren't good at the latter idea: taking data and moving it around in a tree. RDBMSs can't deal with trees very well at all. Often once you get back to a tree you'll find that what took many tables is really just a way of representing many flat parts of a tree.
It doesn't suit everything, and I'd imagine that for databases they'd be RDBMS too - the XML document would be a table-type.
You'd use XPath to select a node on the XML tree. But this is a lightweight method, and it only suits simple situations. XQuery is for anything more complex.
For an XPath query, if you wanted a student who had an attribute of studentID="4" then the query might look like "student[@studentID='4']". (Written as "student/@studentID=4" the expression just evaluates to true or false rather than selecting the student node.)
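A quick way to try this kind of lookup is Python's ElementTree, which supports a limited subset of XPath. This is only a sketch: the document below is made up to match the STUDENT/CLASS example, the `name` attribute is an extra added for readability, and tag names are upper case here because XML is case-sensitive.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: each STUDENT holds CLASS children with CLASSID attributes.
DOC = """
<STUDENTS>
  <STUDENT studentID="3" name="Ann">
    <CLASS CLASSID="101"/>
  </STUDENT>
  <STUDENT studentID="4" name="Bob">
    <CLASS CLASSID="101"/>
    <CLASS CLASSID="205"/>
  </STUDENT>
</STUDENTS>
"""

root = ET.fromstring(DOC)

# The predicate in square brackets selects the element itself,
# not a true/false result.
student = root.find("./STUDENT[@studentID='4']")
class_ids = [c.get("CLASSID") for c in student.findall("CLASS")]

# Going the other way: find matching CLASS nodes, then step up to the
# parent STUDENT with "..".
taking_101 = [
    s.get("name") for s in root.findall(".//CLASS[@CLASSID='101']/..")
]

print(student.get("name"), class_ids, taking_101)
```

The second query is roughly the tree-shaped counterpart of joining students to enrollments: the relationship is expressed by nesting rather than by a link table.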
Re:Why XML? (Score:1)
One forms data relationships in the same way as before. Just mark part of one tree as relating to part of another. That hasn't really changed (from what I've read).
Re:Why XML? (Score:3, Interesting)
Re:Why XML? (Score:2)
Basically, yes. "XML database" doesn't mean much about the database itself, unless you mean that the file format used to store the data is XML (which is pretty much uninteresting, except for being fairly braindead for many sorts of data sets). It tells you nothing about how, logically, the data is organized and what operations it supports (which is what "object database", "relational database", "hierarchical database", etc. attempt to convey), which is generally what a programmer using the database is _most_ interested in.
It may mean that data is presented in XML at query time and XML queries are accepted; if so, that's a moderately more interesting claim but really speaks to a database interface (a la JDBC or pydb) rather than anything interesting about the database itself. Which is not to be dismissed, but formatting results as XML is trivial compared to having to implement, for instance, relations in code, and that sort of interface can be added to any kind of underlying database.
It doesn't really speak at all about the on-disk storage structures (even if data is "stored as XML"), which is often the most interesting thing from a performance standpoint and often interesting from a usability standpoint (e.g. "can efficiently store data in the existing native filesystem" is often mandatory for non-dedicated applications).
Sumner
Umm..no (Score:3, Informative)
No way (Score:1)
What's wrong with Xindice? (Score:3, Insightful)
In Xindice, XML documents are stored in collections that have filers associated with them that do the actual storage. Xindice provides a Filer interface as well as several different implementations, both in-memory and persistent. Additionally, it would be quite trivial to implement a DBFiler that stored the data in an RDBMS.
So again I ask, what is wrong with Xindice?
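On the DBFiler idea above, here is a rough sketch of what a key-to-document store backed by an RDBMS could look like. Big caveat: this is Python with SQLite, not Java, and it does not reproduce Xindice's actual Filer interface. The class and method names (`DBFiler`, `write_record`, `read_record`, `delete_record`) and the `records` table are all hypothetical.

```python
import sqlite3


class DBFiler:
    """Hypothetical filer: keeps serialized documents in an RDBMS table, by key."""

    def __init__(self, conn, collection):
        self.conn = conn
        self.collection = collection
        conn.execute(
            "CREATE TABLE IF NOT EXISTS records ("
            "collection TEXT, rec_key TEXT, body TEXT, "
            "PRIMARY KEY (collection, rec_key))"
        )

    def write_record(self, key, body):
        # INSERT OR REPLACE is SQLite syntax; other databases would need
        # their own upsert statement.
        self.conn.execute(
            "INSERT OR REPLACE INTO records VALUES (?, ?, ?)",
            (self.collection, key, body),
        )

    def read_record(self, key):
        row = self.conn.execute(
            "SELECT body FROM records WHERE collection = ? AND rec_key = ?",
            (self.collection, key),
        ).fetchone()
        return row[0] if row else None

    def delete_record(self, key):
        cur = self.conn.execute(
            "DELETE FROM records WHERE collection = ? AND rec_key = ?",
            (self.collection, key),
        )
        return cur.rowcount > 0


filer = DBFiler(sqlite3.connect(":memory:"), "addressbook")
filer.write_record("doc1", "<person><name>Ann</name></person>")
print(filer.read_record("doc1"))
```

Note that this stores each document as opaque text, so any querying inside the documents would still be up to the XML layer sitting on top.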
What do you want? (Score:1, Interesting)
From the sound of it... (Score:2, Informative)
It sounds like what you're looking for is really an object-relational wrapper rather than an XML-based RDBMS. Take a look at OJB [apache.org] and see if it is not really what you are seeking.