Efficient Search Techniques in Peer-to-Peer Data Sharing Systems


Sponsor

Ling Liu
lingliu@cc.gatech.edu
CCB 221

Area

Systems and Databases


Problem
Peer-to-Peer  (P2P) systems have emerged as a popular way to share huge volume of data across the Internet. An important issue in such systems is to support trusted computing.  One of the key issues in data-sharing P2P systems is the techniques for efficient search and retrieval of data distributed in the P2P overlay networks. Examples include Gnutella, Freenet, Morpheus, and Napster. These systems focus on data sharing and often have a loosely controlled data placement scheme and thus they provide lesser guarantee for availability or persistence of the data.

In contrast, Chord, Pastry, Tapestry, CAN are commonly seen as having a somewhat tighter control over the data placement and topology within the P2P overlay network. One unique characteristics of these tightly controlled P2P systems is that they guarantee location of content, if it exists, within a bounded number of hops.

In terms of search techniques, the loosely controlled P2P systems tend to provide a wider range of search capabilities such as search key (search object) identifier or keyword based search; whereas the tightly controlled P2P systems so far only support search by the identifier of the search key. It is well-known that search in current loosely controlled P2P systems, such as Breadth-first search (BFS) used in Gnutella or depth-first search (DFS) used in Freenet is not efficient.

This project is designed to investigate potential techniques to improve search efficiency in such P2P systems can be classified into a number of categories, such as index-based, selective BFS, a hybrid of BFS and DFS. For index-based solutions, a design decision needs to be made to go for local indexes or global indexes. For selective BFS, a design decision needs to be made on selection criteria, such as neighbor nodes that have the shortest message queue, or that return the best results (e.g., lowest number of hops needed to answer the previous queries, highest number of results, or highest quality of results in terms of trust, performance, or reliability). Finally one example for a hybrid of BFS and DFS is the well-known technique, called iterative deepening, widely used for search over a state space in AI. A summary of these techniques can be found in [5].

We define seven different mini-projects on efficient search in P2P networks. The first, third, and fifth are dedicated to the design issues and the second, fourth, and sixth are focused on experimental evaluation of the solutions proposed in the first, third or fifth mini-project respectively. The seventh project is an independent study project. You may choose any one of the seven projects as a cs7001 mini-project.

1.      Design 2-3 potential techniques based on selective BFS for efficient search and retrieval of data in P2P systems in general. Design an evaluation model that can be used to validate the proposed solutions.

2.      This is a continuation of the first mini-project. Please use Gnutella as the testbed to evaluate the proposed solutions and compare with the BFS used in Gnutella. The experiments should focus on studying the sensitiveness and boundaries of the proposed techniques.

3.      Design 2-3 potential techniques based on a hybrid of BFS and DFS in general. Design an evaluation model that can be used to validate the proposed solutions.

4.      This is a continuation of the third mini-project. Please use Gnutella as the testbed to evaluate the proposed solutions and compare with the BFS used in Gnutella. The experiments should focus on studying the sensitiveness and boundaries of the proposed techniques.

5.      Design 2-3 potential techniques based on index-based solutions for efficient search and retrieval of data in P2P systems in general. Design an evaluation model that can be used to validate the proposed solutions

6.      This is a continuation of the fifth mini-project. Please use Gnutella as the testbed to evaluate the proposed solutions and compare with the BFS used in Gnutella. The experiments should focus on studying the sensitiveness and boundaries of the proposed techniques.

7.      Can tightly controlled P2P systems support searches beyond the identifier-based approach? Propose a solution if your answer is yes. Provide a proof if your answer is no.

References

  1. Gnutella. http://gnutella.wego.com
  2. Freenet. http://freenet.sourceforge.net
  3. Morpheus. http://www.musiccity.com
  4. Napster. http://www.napster.com
  5. B. Yang and H. Carcia-Molina. Efficient Search in Peer-to-Peer Networks, ICDCS 2002

Deliverables

Evaluation
Based on the quality of the report and the appropriateness of the list of references turned in by the due date