Graphical XML Query Language
Abstract: This paper formulates an intuitive and simple, yet reasonably
powerful, graphical query language for selectively extracting information from
well-formed XML documents. We discuss the interface necessary for such a graphical
query language and provide interpretations for certain constructs (like the
joins of two XML documents) in the context of XML queries.
XML is set to revolutionize the way data is exchanged over the Internet. The immense activity in the research community has already resulted in at least six different query languages that have varying capabilities. Even though much work has been done on XML querying languages almost all of these have concentrated on providing a textual user interface.[1] We want to formulate in detail a language that allows us to express queries graphically. The result of these queries will be well-formed XML documents that conform to a new xml schema. Our graphical query language will allow us to specify condition(s) on every terminal element. It will allow you to do string matching queries for String-type terminals and range queries for Number-type terminals. (Hence it is imperative that we specify types in our schema or DTD)
Following is a sample DTD and XML document that is used as an example throughout this paper[2]. The XML document describes book orders. It consists of items, a ship-to address, optional contact information and the order date. We will generate an interface to query an XML document following this XML DTD.
<!ELEMENT order (shipto, contact?, item+, date)>
<!ATTLIST order number PCDATA #REQUIRED>
<!ELEMENT shipto (fulladdress|reference)>
<!ELEMENT contact (reference|PCDATA)>
<!ELEMENT fulladdress (company?, city, addressline+)>
<!ELEMENT reference EMPTY>
<!ATTLIST reference customer IDREF>
<!ELEMENT person (firstname?,lastname,fulladdress)>
<!ATTLIST person id ID>
<!ELEMENT company PCDATA>
<!ELEMENT addressline PCDATA>
<!ELEMENT city PCDATA>
<!ELEMENT date (day, month, year)>
<!ELEMENT day PCDATA>
<!ELEMENT month PCDATA>
<!ELEMENT year PCDATA>
<!ELEMENT item (book, quantity, discount?)>
<!ELEMENT book (isbn,title?,price,author*)>
<!ELEMENT author (firstname?,lastname)>
<!ELEMENT firstname PCDATA>
<!ELEMENT lastname PCDATA>
<!ELEMENT isbn PCDATA>
<!ELEMENT title PCDATA>
<!ELEMENT price PCDATA>
<!ELEMENT quantity PCDATA>
<!ELEMENT discount PCDATA>
PS: PCDATA should actually represent type information. We should ideally be using XML Schemas instead of XML DTDs.
Figure 1
<?xml version="1.0" standalone="no" encoding="UTF-8"?>
<DOCTYPE ORDER SYSTEM "order.dtd">
<ORDER number=1>
<SHIPTO><REFERENCE customer="C00001"></REFERENCE></SHIPTO>
<CONTACT>Tim Bell</CONTACT>ITEM>
<DATE><DAY>14</DAY><MONTH>11</MONTH><YEAR>1998</YEAR></DATE>
<ITEM>
<BOOK><ISBN>15536455</ISBN>
<TITLE>Introduction to XML</TITLE>
<PRICE>24.95</PRICE>
<AUTHOR><FIRSTNAME>Charles</FIRSTNAME>
<LASTNAME>Porter</LASTNAME></AUTHOR>
</BOOK>
<QUANTITY>6</QUANTITY>
<DISCOUNT>.40</DISCOUNT>
</ITEM>
<ITEM>
<BOOK><ISBN>15532155</ISBN>
<TITLE>Introduction to Internet</TITLE>
<PRICE>22.50</PRICE>
<AUTHOR><FIRSTNAME>Steve</FIRSTNAME>
<LASTNAME>Andrews</LASTNAME></AUTHOR>
</BOOK>
<QUANTITY>10</QUANTITY>
<DISCOUNT>.42</DISCOUNT>
</ITEM>
</ORDER>
<ORDER number=2>
<SHIPTO>
<FULLADDRESS><COMPANY>ASA</COMPANY><CITY>Los Angeles</CITY>
<ADDRESSLINE>18 Harvard str.</ADDRESSLINE>
</FULLADDRESS>
</SHIPTO>
<CONTACT><REFERENCE customer="C00002"></REFERENCE></CONTACT>
<DATE><DAY>20</DAY><MONTH>11</MONTH><YEAR>1998</YEAR></DATE>
<ITEM>
<BOOK><ISBN>15536455</ISBN>
<TITLE>Introduction to XML</TITLE>
<PRICE>24.95</PRICE>
<AUTHOR><FIRSTNAME>Charles</FIRSTNAME>
<LASTNAME>Porter</LASTNAME></AUTHOR>
</BOOK>
<QUANTITY>6</QUANTITY>
<DISCOUNT>.40</DISCOUNT>
</ITEM>
<ITEM>
<BOOK><ISBN>15532155</ISBN>
<TITLE>Introduction to Internet</TITLE>
<PRICE>22.50</PRICE>
<AUTHOR><FIRSTNAME>Steve</FIRSTNAME>
<LASTNAME>Andrews</LASTNAME></AUTHOR>
</BOOK>
<QUANTITY>10</QUANTITY>
<DISCOUNT>.42</DISCOUNT>
</ITEM>
</ORDER>
<PERSON id="C00001">
<FIRSTNAME>Robert</FIRSTNAME>
<LASTNAME>Moore</LASTNAME>
<FULLADDRESS><COMPANY>ABC</COMPANY><CITY>Los Angeles</CITY>
<ADDRESSLINE>10 Michigan str.</ADDRESSLINE>
</FULLADDRESS>
</PERSON>
<PERSON id="C00002">
<FIRSTNAME>Tom</FIRSTNAME>
<LASTNAME>Smith</LASTNAME>
<FULLADDRESS><COMPANY>ASA</COMPANY><CITY>Los Angeles</CITY>
<ADDRESSLINE>18 Harvard str.</ADDRESSLINE>
</FULLADDRESS>
</PERSON>
<PERSON id="C00003">
<FIRSTNAME>Steve</FIRSTNAME>
<LASTNAME>Andrews</LASTNAME>
<FULLADDRESS><CITY>San Francisco</CITY>
<ADDRESSLINE>15 Washington str.</ADDRESSLINE>
</FULLADDRESS>
</PERSON>
Figure 2
The XML schema(s) is used to create a graphical query interface. Here are the general rules that are followed when creating the interface:
1. Each complex type element is contained within a colored box.
2. For every string, there appears a drop-down menu with options {IS,LIKE} (the LIKE is for wildcard matches)
3. For every number, there appears a set of operator,operand pairs.
4. Along with every condition appears a MORE button that allows you to specify more conditions for that attribute or terminal type.
5. To specify join attributes, you can color the two attributes the same color.
Sample GUI derived from the schema in Figure 1 above.
How are Range Queries handled in this graphical interface?
As mentioned earlier, we need type information in the XML Schema/DTD. So when the actual mapping from the XML document to the graphical interface will be done, the appropriate conditional fields will be generated for each terminal element.
Now when the user wants to specify a query on a field, which is of type “number”, he would be prompted by a default box of the format:
number:
Say we just want to specify one condition for the query, e.g number = 30, we will do:
number:
Clicking on MORE lets the USER add another set of query. This new set can be either OR’ed or And’ed with the previous query. This feature allows the user to generate complex range queries.
For instance:
number:
Note: Say, we are building a very complex (long) query and as programmers we introduce braces for sake of clarity. On first look at this way
of query specifying, we might think that it would result in improper results. But that is not true, we certainly know from theory that every
complex set of query can be broken down to a plain query without any braces with meaning preserved.
A similar mechanism is used for querying strings.
contact:
Clicking the MORE button in this case presents you with the option to create complex queries on string types.
So what do queries executed on
this interface actually mean? To explain this we include examples that illustrate
the three main relational algebra operators: selection, projection and
join
- Link to example 1
- Link to example 1 results
QUERY: Find Orders containing the book titled, "Introduction to XML", to be shipped to an address in Los Angeles, and return the shipping and item information of matching orders.
DETAILS: The selection conditions are: order.shipto.fulladdress.city = ‘Los Angeles’ AND order.item.book.title = ‘Introduction to XML’. The projection elements are: order.shipto and order.item. We will assume that whenever an element is chosen for projection, we will include in the result the immediately next level element. So in our example, the result would include the tags <shipto> and <item> enclosed in the tag immediately one-level above it, namely, <order>
In keeping with the graphical nature of this query language, even the results are returned as graphical boxes (see example) along with the resulting XML document.
- Link to example 2
- Link to example 1 results
QUERY: Join the order entities and person entities on the join fields: order.item.book.author.lastname = person.lastname. Then return all the books where person.firstname starts with ‘S’.
DETAILS: The join condition is: order.item.book.author.lastname = person.lastname. The order in which the join is performed is very important. If the person entity was subordinated under the order entity, then the resulting XML document would look like:
<ORDER number=1> <SHIPTO><REFERENCE customer="C00001"></REFERENCE></SHIPTO> <CONTACT>Tim Bell</CONTACT>ITEM> <DATE><DAY>14</DAY><MONTH>11</MONTH><YEAR>1998</YEAR></DATE> <ITEM> <BOOK><ISBN>15536455</ISBN> <TITLE>Introduction to XML</TITLE> <PRICE>24.95</PRICE> <AUTHOR><FIRSTNAME>Charles</FIRSTNAME> <LASTNAME>Porter</LASTNAME></AUTHOR> </BOOK> <QUANTITY>6</QUANTITY> <DISCOUNT>.40</DISCOUNT> </AUTHOR> </BOOK> <QUANTITY>6</QUANTITY> <DISCOUNT>.40</DISCOUNT> </ITEM> <ITEM> <BOOK><ISBN>15532155</ISBN> <TITLE>Introduction to Internet</TITLE> <PRICE>22.50</PRICE> <AUTHOR><FIRSTNAME>Steve</FIRSTNAME> <Person.FIRSTNAME>Steve</Person.FIRSTNAME> <LASTNAME>Andrews</LASTNAME>
<CITY>San Francisco</CITY> <ADDRESSLINE>15 Washington str.</ADDRESSLINE> </FULLADDRESS> </AUTHOR> </BOOK> <QUANTITY>10</QUANTITY> <DISCOUNT>.42</DISCOUNT> </ITEM> </ORDER> <ORDER number=2> <SHIPTO> <FULLADDRESS><COMPANY>ASA</COMPANY><CITY>Los Angeles</CITY> <ADDRESSLINE>18 Harvard str.</ADDRESSLINE> </FULLADDRESS> </SHIPTO> <CONTACT><REFERENCE customer="C00002"></REFERENCE></CONTACT> <DATE><DAY>20</DAY><MONTH>11</MONTH><YEAR>1998</YEAR></DATE> <ITEM> <BOOK><ISBN>15536455</ISBN> <TITLE>Introduction to XML</TITLE> <PRICE>24.95</PRICE> <AUTHOR><FIRSTNAME>Charles</FIRSTNAME> <LASTNAME>Porter</LASTNAME></AUTHOR> </BOOK> <QUANTITY>6</QUANTITY> <DISCOUNT>.40</DISCOUNT> </ITEM> <ITEM> <BOOK><ISBN>15532155</ISBN> <TITLE>Introduction to Internet</TITLE> <PRICE>22.50</PRICE> <AUTHOR><FIRSTNAME>Steve</FIRSTNAME> <Person.FIRSTNAME>Steve</Person.FIRSTNAME> <LASTNAME>Andrews</LASTNAME></AUTHOR> <FULLADDRESS> <CITY>San Francisco</CITY> <ADDRESSLINE>15 Washington str.</ADDRESSLINE> </FULLADDRESS> </BOOK> <QUANTITY>10</QUANTITY> <DISCOUNT>.42</DISCOUNT> </ITEM> </ORDER>
|
Conversely, if we subordinated the order entity under the person entity, we would get the following undesirable result:
<PERSON id="C00003"> <FIRSTNAME>Steve</FIRSTNAME> <order.FIRSTNAME>Steve</order.FIRSTNAME> <LASTNAME>Andrews</LASTNAME> <FULLADDRESS><CITY>San Francisco</CITY> <ADDRESSLINE>15 Washington str.</ADDRESSLINE> </FULLADDRESS> </PERSON> <PERSON id="C00003"> <FIRSTNAME>Steve</FIRSTNAME> <order.FIRSTNAME>Steve</order.FIRSTNAME> <LASTNAME>Andrews</LASTNAME> <FULLADDRESS><CITY>San Francisco</CITY> <ADDRESSLINE>15 Washington str.</ADDRESSLINE> </FULLADDRESS> </PERSON>
|
In order to specify the order of the join, when we color the join elements with the same color, we can specify (using a checkbox) which entity is the Master and all the other elements are subordinated under the Master.
The selection condition is: person.firstname LIKE ‘S%’.
The projection element is: order.item.book. Once again, we will assume that whenever an element is chosen for projection, we will include in the result the immediately next level element. So in our example, the result would include the tag <book> enclosed in the tag immediately one-level above it, namely, <item>
This is a work-in-progress. We have only attempted to describe the semantics of the graphical query language leaving the implementation details as a separate task on its own.
Authors:
Ankur Gupta ankur@gw.total-web.net
Zahid Khan zahid@cc.gatech.edu
[1] An exception to this is XML-GL (http://xerox.elet.polimi.it/Xml-gl/documents/paperwww8.html)
[2] This example is taken from XML-GL (http://xerox.elet.polimi.it/Xml-gl/documents/paperwww8.html)