Graphical XML Query Language

 

Abstract: This paper formulates an intuitive and simple, yet reasonably powerful, graphical query language for selectively extracting information from well-formed XML documents. We discuss the interface necessary for such a graphical query language and provide interpretations for certain constructs (like the joins of two XML documents) in the context of XML queries.

 

1          Introduction:

XML is set to revolutionize the way data is exchanged over the Internet. The immense activity in the research community has already resulted in at least six different query languages that have varying capabilities. Even though much work has been done on XML querying languages almost all of these have concentrated on providing a textual user interface.[1] We want to formulate in detail a language that allows us to express queries graphically. The result of these queries will be well-formed XML documents that conform to a new xml schema. Our graphical query language will allow us to specify condition(s) on every terminal element. It will allow you to do string matching queries for String-type terminals and range queries for Number-type terminals. (Hence it is imperative that we specify types in our schema or DTD)

 

2          Defining an example:

Following is a sample DTD and XML document that is used as an example throughout this paper[2]. The XML document describes book orders. It consists of items, a ship-to address, optional contact information and the order date.  We will generate an interface to query an XML document following this XML DTD.

 

<!ELEMENT order (shipto, contact?, item+, date)>
<!ATTLIST order number PCDATA #REQUIRED>
<!ELEMENT shipto (fulladdress|reference)>
<!ELEMENT contact (reference|PCDATA)>
<!ELEMENT fulladdress (company?, city, addressline+)>
<!ELEMENT reference EMPTY>
<!ATTLIST reference customer IDREF>
<!ELEMENT person (firstname?,lastname,fulladdress)>
<!ATTLIST person id ID>
<!ELEMENT company PCDATA>
<!ELEMENT addressline PCDATA>
<!ELEMENT city PCDATA>
<!ELEMENT date (day, month, year)>
<!ELEMENT day PCDATA>
<!ELEMENT month PCDATA>
<!ELEMENT year PCDATA>
<!ELEMENT item (book, quantity, discount?)>
<!ELEMENT book (isbn,title?,price,author*)>
<!ELEMENT author (firstname?,lastname)>
<!ELEMENT firstname PCDATA>
<!ELEMENT lastname PCDATA>
<!ELEMENT isbn PCDATA>
<!ELEMENT title PCDATA>
<!ELEMENT price PCDATA>
<!ELEMENT quantity PCDATA>
               <!ELEMENT discount PCDATA>
 
        PS: PCDATA should actually represent type information. We should ideally be using XML Schemas instead of XML DTDs.
 
Figure 1
 
        <?xml version="1.0" standalone="no" encoding="UTF-8"?>
<DOCTYPE ORDER SYSTEM "order.dtd">
 
<ORDER number=1>
 <SHIPTO><REFERENCE customer="C00001"></REFERENCE></SHIPTO>
 <CONTACT>Tim Bell</CONTACT>ITEM>
 <DATE><DAY>14</DAY><MONTH>11</MONTH><YEAR>1998</YEAR></DATE>
 <ITEM>
  <BOOK><ISBN>15536455</ISBN>
    <TITLE>Introduction to XML</TITLE>
    <PRICE>24.95</PRICE>
    <AUTHOR><FIRSTNAME>Charles</FIRSTNAME>
            <LASTNAME>Porter</LASTNAME></AUTHOR>
  </BOOK>
  <QUANTITY>6</QUANTITY>
  <DISCOUNT>.40</DISCOUNT>
 </ITEM>
 <ITEM>
  <BOOK><ISBN>15532155</ISBN>
    <TITLE>Introduction to Internet</TITLE>
    <PRICE>22.50</PRICE>
    <AUTHOR><FIRSTNAME>Steve</FIRSTNAME>
            <LASTNAME>Andrews</LASTNAME></AUTHOR>
  </BOOK>
  <QUANTITY>10</QUANTITY>
  <DISCOUNT>.42</DISCOUNT>
 </ITEM>
</ORDER>
 
<ORDER number=2>
 <SHIPTO>
  <FULLADDRESS><COMPANY>ASA</COMPANY><CITY>Los Angeles</CITY>
    <ADDRESSLINE>18 Harvard str.</ADDRESSLINE>
  </FULLADDRESS>
 </SHIPTO>
 <CONTACT><REFERENCE customer="C00002"></REFERENCE></CONTACT>
 <DATE><DAY>20</DAY><MONTH>11</MONTH><YEAR>1998</YEAR></DATE>
 
  <ITEM>
  <BOOK><ISBN>15536455</ISBN>
    <TITLE>Introduction to XML</TITLE>
    <PRICE>24.95</PRICE>
    <AUTHOR><FIRSTNAME>Charles</FIRSTNAME>
            <LASTNAME>Porter</LASTNAME></AUTHOR>
  </BOOK>
  <QUANTITY>6</QUANTITY>
  <DISCOUNT>.40</DISCOUNT>
 </ITEM>
 <ITEM>
  <BOOK><ISBN>15532155</ISBN>
    <TITLE>Introduction to Internet</TITLE>
    <PRICE>22.50</PRICE>
    <AUTHOR><FIRSTNAME>Steve</FIRSTNAME>
            <LASTNAME>Andrews</LASTNAME></AUTHOR>
  </BOOK>
  <QUANTITY>10</QUANTITY>
  <DISCOUNT>.42</DISCOUNT>
 </ITEM>
</ORDER>
 
<PERSON id="C00001">
 <FIRSTNAME>Robert</FIRSTNAME>
 <LASTNAME>Moore</LASTNAME>
 <FULLADDRESS><COMPANY>ABC</COMPANY><CITY>Los Angeles</CITY>
   <ADDRESSLINE>10 Michigan str.</ADDRESSLINE>
 </FULLADDRESS>
</PERSON>
<PERSON id="C00002">
 <FIRSTNAME>Tom</FIRSTNAME>
 <LASTNAME>Smith</LASTNAME>
 <FULLADDRESS><COMPANY>ASA</COMPANY><CITY>Los Angeles</CITY>
   <ADDRESSLINE>18 Harvard str.</ADDRESSLINE>
 </FULLADDRESS>
</PERSON>
<PERSON id="C00003">
 <FIRSTNAME>Steve</FIRSTNAME>
 <LASTNAME>Andrews</LASTNAME>
 <FULLADDRESS><CITY>San Francisco</CITY>
   <ADDRESSLINE>15 Washington str.</ADDRESSLINE>
 </FULLADDRESS>
</PERSON>
Figure 2
 

3          The Query Language:

 

The XML schema(s) is used to create a graphical query interface. Here are the general rules that are followed when creating the interface:

1.      Each complex type element is contained within a colored box.

2.      For every string, there appears a drop-down menu with options {IS,LIKE} (the LIKE is for wildcard matches)

3.      For every number, there appears a set of operator,operand pairs.

4.      Along with every condition appears a MORE button that allows you to specify more conditions for that attribute or terminal type.

5.      To specify join attributes,  you can color the two attributes the same color.

 

Sample GUI derived from the schema in Figure 1 above.

 

3.1          QUERYING NUMBERS:

 

How are Range Queries handled in this graphical interface?

 

As mentioned earlier, we need type information in the XML Schema/DTD. So when the actual mapping from the XML document to the graphical interface will be done, the appropriate conditional fields will be generated for each terminal element.

 

Now when the user wants to specify a query on a field, which is of type “number”, he would be prompted by a default box of the format:

 

number:

 

Say we just want to specify one condition for the query, e.g number = 30, we will do:

 

number:

 

Clicking on MORE lets the USER add another set of  query. This new set can be either OR’ed or And’ed with the previous query. This feature allows the user to generate complex range queries. 

 

For instance:

                                number:    

 

    Note: Say, we are building a very complex (long) query and as programmers we introduce braces for sake of clarity. On first look at this way

              of query specifying, we might think that it would result in improper results. But that is not true, we certainly know from theory that every

               complex set of query can be broken down to a plain query without any braces with meaning preserved.

3.2          QUERYING STRINGS

 

A similar mechanism is used for querying strings.

 

contact:

Clicking the MORE button in this case presents you with the option to create complex queries on string types.

 

3.3          Graphical Query Examples

 

So what do queries executed on this interface actually mean? To explain this we include examples that illustrate the three main relational algebra operators: selection, projection and join

 

  1. Example 1:

-         Link to example 1

-         Link to example 1 results

 

QUERY: Find Orders containing the book titled, "Introduction to XML", to be shipped to an address in Los Angeles, and return the shipping and item information of matching orders.

 

DETAILS: The selection conditions are: order.shipto.fulladdress.city = ‘Los Angeles’ AND order.item.book.title = ‘Introduction to XML’. The projection elements are: order.shipto and order.item. We will assume that whenever an element is chosen for projection, we will include in the result the immediately next level element. So in our example, the result would include the tags <shipto> and <item> enclosed in the tag immediately one-level above it, namely, <order>

 

In keeping with the graphical nature of this query language, even the results are returned as graphical boxes (see example) along with the resulting XML document.

 

  1. Example 2:

-         Link to example 2

-         Link to example 1 results

 

QUERY: Join the order entities and person entities on the join fields: order.item.book.author.lastname = person.lastname. Then return all the books where person.firstname starts with ‘S’.

DETAILS: The join condition is: order.item.book.author.lastname = person.lastname. The order in which the join is performed is very important. If the person entity was subordinated under the order entity, then the resulting XML document would look like:

 

<ORDER number=1>
 <SHIPTO><REFERENCE customer="C00001"></REFERENCE></SHIPTO>
 <CONTACT>Tim Bell</CONTACT>ITEM>
 <DATE><DAY>14</DAY><MONTH>11</MONTH><YEAR>1998</YEAR></DATE>
 <ITEM>
  <BOOK><ISBN>15536455</ISBN>
    <TITLE>Introduction to XML</TITLE>
    <PRICE>24.95</PRICE>
    <AUTHOR><FIRSTNAME>Charles</FIRSTNAME>
            <LASTNAME>Porter</LASTNAME></AUTHOR>
  </BOOK>
  <QUANTITY>6</QUANTITY>
  <DISCOUNT>.40</DISCOUNT>
    </AUTHOR>
  </BOOK>
  <QUANTITY>6</QUANTITY>
  <DISCOUNT>.40</DISCOUNT>
 </ITEM>
 <ITEM>
  <BOOK><ISBN>15532155</ISBN>
    <TITLE>Introduction to Internet</TITLE>
    <PRICE>22.50</PRICE>
    <AUTHOR><FIRSTNAME>Steve</FIRSTNAME>
            <Person.FIRSTNAME>Steve</Person.FIRSTNAME>
            <LASTNAME>Andrews</LASTNAME>
                <CITY>San Francisco</CITY>
                <ADDRESSLINE>15 Washington str.</ADDRESSLINE>
            </FULLADDRESS>
 
    </AUTHOR>
  </BOOK>
  <QUANTITY>10</QUANTITY>
  <DISCOUNT>.42</DISCOUNT>
 </ITEM>
</ORDER>
 
<ORDER number=2>
 <SHIPTO>
  <FULLADDRESS><COMPANY>ASA</COMPANY><CITY>Los Angeles</CITY>
    <ADDRESSLINE>18 Harvard str.</ADDRESSLINE>
  </FULLADDRESS>
 </SHIPTO>
 <CONTACT><REFERENCE customer="C00002"></REFERENCE></CONTACT>
 <DATE><DAY>20</DAY><MONTH>11</MONTH><YEAR>1998</YEAR></DATE>
 
  <ITEM>
  <BOOK><ISBN>15536455</ISBN>
    <TITLE>Introduction to XML</TITLE>
    <PRICE>24.95</PRICE>
    <AUTHOR><FIRSTNAME>Charles</FIRSTNAME>
            <LASTNAME>Porter</LASTNAME></AUTHOR>
  </BOOK>
  <QUANTITY>6</QUANTITY>
  <DISCOUNT>.40</DISCOUNT>
 </ITEM>
 <ITEM>
  <BOOK><ISBN>15532155</ISBN>
    <TITLE>Introduction to Internet</TITLE>
    <PRICE>22.50</PRICE>
    <AUTHOR><FIRSTNAME>Steve</FIRSTNAME>
            <Person.FIRSTNAME>Steve</Person.FIRSTNAME>
            <LASTNAME>Andrews</LASTNAME></AUTHOR>
            <FULLADDRESS>
                <CITY>San Francisco</CITY>
                <ADDRESSLINE>15 Washington str.</ADDRESSLINE>
            </FULLADDRESS>
  </BOOK>
  <QUANTITY>10</QUANTITY>
  <DISCOUNT>.42</DISCOUNT>
 </ITEM>
</ORDER>
 

 

 

Conversely, if we subordinated the order entity under the person entity, we would get the following undesirable result:

 

 
 
<PERSON id="C00003">
 <FIRSTNAME>Steve</FIRSTNAME>
 <order.FIRSTNAME>Steve</order.FIRSTNAME>
 <LASTNAME>Andrews</LASTNAME>
 <FULLADDRESS><CITY>San Francisco</CITY>
   <ADDRESSLINE>15 Washington str.</ADDRESSLINE>
 </FULLADDRESS>
</PERSON>
 
<PERSON id="C00003">
 <FIRSTNAME>Steve</FIRSTNAME>
 <order.FIRSTNAME>Steve</order.FIRSTNAME>
 <LASTNAME>Andrews</LASTNAME>
 <FULLADDRESS><CITY>San Francisco</CITY>
   <ADDRESSLINE>15 Washington str.</ADDRESSLINE>
 </FULLADDRESS>
</PERSON>
 

 

 

 

In order to specify the order of the join, when we color the join elements with the same color, we can specify (using a checkbox) which entity is the Master and all the other elements are subordinated under the Master.

 

The selection condition is: person.firstname LIKE ‘S%’.

 

The projection element is: order.item.book. Once again, we will assume that whenever an element is chosen for projection, we will include in the result the immediately next level element. So in our example, the result would include the tag <book> enclosed in the tag immediately one-level above it, namely, <item>

4          Conclusion:

This is a work-in-progress. We have only attempted to describe the semantics of the graphical query language leaving the implementation details as a separate task on its own.

 

 

Authors:

Ankur Gupta ankur@gw.total-web.net

Zahid Khan zahid@cc.gatech.edu



[1] An exception to this is XML-GL (http://xerox.elet.polimi.it/Xml-gl/documents/paperwww8.html)

[2] This example is taken from XML-GL (http://xerox.elet.polimi.it/Xml-gl/documents/paperwww8.html)