kNN Search on the Air

Sponsor Ling Liu / Bugra Gedik
{lingliu, bgedik}
216 / 263 CCB
Area Databases 

With the emergence of positioning technologies like GPS, popularity of mobile communications and the coming era of ubiquitous computing, location management has become an active area of research. An interesting problem in this domai n is the investigation of indexing and searching mechanisms that will enable energy efficient querying of location dependent information in broadcast environments.  One special case of this problem is what we call the "kNN Search on the Air" ;, which can be described as the problem of broadcasting a spatial index (an R-tree like structure in our case) on the wireless medium and querying this broadcast on a client device in order to answer k nearest neighbor (kNN) queries in an energy efficien t way. To illustrate the problem, consider a battlefield scenario where several mobile units ranging from soldiers to tanks, report their positions to a central server through a wireless channel. The server builds a spatial index on this position informat ion and broadcasts it to the battlefield. Then the mobile devices can tune in and process the broadcast to answer spatial queries on the broadcasted position information. One such query can be posted by a tank as: "Give me the positions and names of the 10 nearest friendly units".   

The student is expected to first read references [1] and [2] in order to get familiar with the ideas involved in and mechanisms used for indexing in broadcast environments. References [3], [4] and [5] are essential for understanding R-trees and kNN s earch on R-trees. It is important to note that the traditional kNN algorithms do not directly apply in broadcast environments as the access is sequential (we do not consider caching a complete broadcast cycle on the client device). As a result, the way an R-tree is broadcasted and searched is crucial and has implications on two important performance measures: access latency and tuning time (see [1] and [2]). For instance a breadth-first like (BFS-like) broadcast or a dept-first like (DFS-like) broadcast c an be used, as shown in Figure 1 below. In a BFS-like broadcast (see Figure 1a), explicit links can be used for each node in order to point the location of its child nodes and special markers can be used to seperate each layer. In a DFS-like broadcast (se e Figure 1b), explicit links can be used for each node in order to point the next node in DFS order after skipping the current branch and special markers can be used to indicate each backtack. These two diffrent organizations require two different search algorithms and may have different advantages in terms of the two performance measures, access latency and tuning time.

There are two main goals in this project:, each forming a seperate mini-project designed to be taken in order:

  1. Design Phase
    (a) Design two kNN search algorithms for BFS-like and DFS-like organizations. You can use the above descriptions or come up with your own DFS-like and BFS-like organizations.
    (b) Design a third organization which may or may not use an R-tree and de scribe its associated kNN search algorithm (This design is expected to be superior).
    (c) Present an argument why the third organization you designed may provide better access latency and tuning time.
  2. Development Phase
    (a) Implement (with your programming language of choice) the three organizations designed in phase 1 and their associated search algorithms.
    (b) Design and implement experiments using varying workloads and packet sizes and c ompare your three different broadcast organizations with their associated search algorithms, in terms of access latency and tuning time.

    [1] T. Imielinski, S. Viswanathan, and B.R. Badrinath, "Energy Efficient Indexing on Air". In Proceedings of the IEEE ICDE, 1994.
    [2] S. Hambrusch, C.-M. Liu, W. Aref, and S. Prabhakar, "Query Processing in Broadcasted Spatial Index Trees", SSTD 2001.
    [3] A. Guttman, "R-trees: A Dynamic Index Structure for Spatial Searching". In Proceedings of the ACM SIGMOD, 1984.
    [4] N. Roussopoulos, S. Kelley, F. Vincent, "Nearest Neighbor Queries". In Proceedings of the ACM SIGMOD, 1995.
    [5] T. Seidl, H. P. Kriegel, "Optimal Multi-Step k-Nearest Neighbor Search". In Proceedings of the ACM SIGMOD, 1998.

Some knowledge of spatial index structures, especially R-trees, is helpfull.

A report, describing your designs and findings (with references to related work) and source code for any implementation you have done to validate your ideas.

You will be graded on the novelty and quality of report and implementation.