CS 8802 - Algorithms for Complex Networks

Spring 2005


Description

We will focus on algorithmic questions arizing in networks driving today's technology: the Internet, the WWW, P2P networks, ad-hoc and sensor networks, and biological networks at the genome level (gene and protein interactions). These networks are expressed "locally", by their nodes (data points) and links (relationships between two data points). How do these networks organize to execute the fundamental tasks that they are formed to perform? How can formal models and algorithms infer their semantics? How can algorithm design methods improve their performance?

This special topics course will meet once a week for 2 hours, during which time we will focus and discuss a topic. In collaboration with the instructor, each student will be responsible for leading one of the discussions. Students will also do one project/term paper, in groups of 2-3, meet with the instructor to discuss the progress of their projects, and do a presentation of their work towards the end of the semester. Project areas include:
-Study and comparison of algorithmic features of different search engines, and their sponsored links such as Google, MSN, Yahoo and A9.
-Study and comparison of different mechanisms by which P2P networks maintain connectivity, and study and comparison of different mechanisms for large scale content distribution.
-Information propagation and efficiency in ad-hoc (e.g. mobile, wireless) and sensor networks.
-Study of the semantics of gene and protein interaction networks, inference of building blocks of such genetic networks, inference of evolutionary models explaining their statistics.

We wish to get a mixture of graduate students between ACO/theory, networking, databases, and bioinformatics. Undergraduates with strong background and/or experize in one of the application areas are also wellcome (however, you will need permission of the instructor).

Lectures

Tue 12:00-2:00, College of Computing, Room 52.
The material covered in each lecture together with references and/or notes will be posted in this web page.

Instructor

Milena Mihail, Associate Professor
Office phone: 404-385-0617
Office hours: Thu 12:00-:200, College of Computing 238, or by appointment.
mihail@cc.gatech.edu

Text and Notes

(1) Handbook on Graphs and Networks: From the Genome to the Internet, by Bornholdt and Schuster.
(2) Notes from similar classes at Cornell, Stanford, Harvard, Berkeley.

Grading

Class participation and presentation: 50%.
Project or Term Paper: 50%.

Resources

Class Web-site http://www.cc.gatech.edu/~mihail/index8802.html. This site is maintained by Milena Mihail, please report problems to mihail@cc.gatech.edu.

 


Tue Jan 11: Homework
(1) Browse this web site. The spirit and several topics of our class will be very similar.
(2) Send email to mihail@cc.gatech.edu, with title 8802, and with the following information: Your name, email, school, year, major, advisor, your technical background (very important) and your special interests.
(3) Download the 5 papers in the reading below, concerning the Case of Google. This is your reading material for this week, and it will be covered in next week's meeting. If you see this material for the first time, or if your backbround in linear algebra and probability is not very strong, then read only papers (1) and (5), and the introductions/discussions of the rest. For ACO/theory students and for those with solid background in linear algebra and probability, read all the papers. (Note: The further paper on Statistical Mechanics of Complex Networks is NOT required for next week, but may be interesting for math majors).
(4) If you are a math or theory major, start reading the articles for the math oriented below. You may choose to write a report on these two articles as your term paper.
(5) If you are interested in bioinformatics, start reading the articles for the "bioX" oriented below. You may choose to write a report on these five articles as your term paper.

Tue Jan 11, Jan 18: The Case of Google, Readings.
(1) S. Brin and L. Page, The Anatomy of a Large-Scale Hypertextual Web Search Engine, Proc. 7th International World Wide Web Conference, 1998.
(2) L. Page, S. Brin, R. Motwani, T. Winograd, The PageRank Citation Ranking: Bringing Order to the Web.
(3) R. Lempel, S. Moran, The Stochastic Approach for Link-Structure Analysis (SALSA) and the TKC Effect, 9th International World Wide Web Conference, May 2000.
(4) H. Zhang, A. Goel, R. Govindan, K. Mason, B. Van Roy, Improving Eigenvector-Based Reputation Systems against Collusions, Workshop on Algorithms and Models for the Web Graph (WAW) 2004.
(5) J. Kleinberg, Authoritative sources in a hyperlinked environment, Proc. 9th ACM-SIAM Symposium on Discrete Algorithms, 1998. Extended version in Journal of the ACM 46(1999).

Tue Jan 25: A preamble for the math oriented, reading.
(1) R. Albert and L.A. Barabasi, Statistical Mechanics of Complex Networks, Rev. Mod. Phys. 74, 47-97 (2002).
(2) B. Bollobas, Mathematical Results in Scale-Free random Graphs.
(3) M. Newman, The Structure and Function of Complex networks, SIAM Review 45, 167-256 (2003).

Tue Feb 1: A preamble for the bioX oriented, reading.
(1) Schwikowski et al, A Network of Protein-Protein Interactions in Yeast, Nature, December 2000, Vol 18, No 12, 1257-1261.
(2) Giot et al, A Protein Interaction Map of Drosophila Melanogaster, Science, Vol 302, Issue 5651, 1727-1737, 5 December 2003.
(3) Tong et al, Global Mapping of the Yeast Genetic Interaction Network, Science, Vol 303, Issue 5659, 808-813 , 6 February 2004.
(4) Middendorf et al, Inferring Network Mechanisms: The Drosophila Melanogaster Protein Interaction Network.
(5) Middendorf et al, Discriminative Topological Features Reveal Biological Network Mechanisms.

Tue Feb 8: On the Small World Phenomenon.
(1) D. Watts, Networks, Dynamics and the Small World Phenomenon, American Journal of Sociology, 1999.
(2) J. Kleinberg, The Small World Phenomenon: An Algorithmic Perspective, STOC 2000.

Tue Feb 15: Performance, Spectrum and Conductance.
(1) Gkantsidis, Mihail, Saberi, On the Random Walk Method for Peer-to-Peer Networks, in particular Theorem 4.1, Infocom 2004. Theorem 4.1 contains the folklore argument of expansion of random (regular) graphs.
(2) Gkantsidis, Mihail, Saberi, Conductance and Congestion in Power Law Graphs, Sigmetrics 2003, in particular Lemma 3.2. This Lemma contains the basic argument of expansion of random graphs with arbitrary degree sequences in the configurational model.
(3) Mihail, Papadimitriou, Saberi, On Certain Connectivity Properties of the Internet Topology, Focs 2003, in particular Theorem 1. This Theorem contains the basic argument of expansion of random graphs in the evolutionary model of growth with preferential connectivity.

Tue Feb 22:
A. Mehta, A. Saberi, U. Vazirani, V. Vazirani, Adwords and generalized Online matching.

Tue March 1: PROJECTS

A. ON RANDOM GRAPH MODELS FOR COMPLEX NETWORKS

(1) To what extend are most graphs with heavy tailed statistics "similar" ? Read the Kleinberg and Kleinberg paper which establishes differences between Erdos-Renyi (near regular) random graphs and graphs in the evolutionary model of growth with preferential connectivity. Can you extend their result to the configurational random graph model? Can you argue about the concentration (or lack of it) of the number of small induced subgraphs?

(2) Characterize "clustering" in various random graph models. More formally, consider models of growth with preferential connectivity as well as copying, and either derive analytical bounds on their conductance, or derive experimental results on their high end spectrum (the largest eigenvalues of a suitable normalization of the graph). Here are two papers with very general evolutionary models: Cooper and Frieze and Chung et al. If you work on analytical bounds, you will also need to read the papers from Feb 15: Theorem 4.1, Lemma 3.2 and Theorem 1.

(3) Explain, either theoretically or experimentally, the effectiveness of the spectral filtering method for information retrieval. For an empirical understanding, read, for example, this paper, and also read papers on spectral methods from Chris Ding's page at Lawrence Berkelekey Labs. For an analytical understanding, you need to read Frank McSherry's papers Spectral Partitioning of Random Graphs, in STOC 2001, and Spectral Analysis of Random Graphs with Skewed Degree Distributions, in FOCS 2004. We are particularly interested in copying models that express biological networks, as in Cooper and Frieze and Chung et al.

B. INFORMATION RETRIEVAL FOR GENE AND PROTEIN INTERACTION NETWORKS

(1) Apply the spectral filtering method for information retrieval in existing gene and protein interaction data. The data could be from references (1), (2) and (3) of Feb 1. Infer the semantics of the found clusters from known databases.

(2) Quantify the high end spectrum of copying random graph models that express gene and protein interaction networks. We are particularly interested in spectrum shifts as a function of various parameters of the models. This project is the experimental facet of A. (2).

C. SEARCH ENGINES
(1) How can you "spam" Google's pagerank? Look at this spamming business! How can you detect such spamming? This might be a starting point.
(2) Implement and quantify the effectivess of the online adwords selection algorithm of Mehta, Saberi, Vazirani and Vazirani.

D. PEER-TO-PEER NETWORKS
How can you maintain an expander graph in a distributed setting? We are working on techniques from the Markov chain Monte Carlo. Two very relevant papers are Sampling Regular Graphs and Sampling Knapsack Solutions .

E. AD-HOC NETWORKS
Every network that has an underlying geometry is bounded by the physical constrains of this geometry. For example, a network with an underlying planar geometry (e.g. sensor, mobile, wireless) is bounded by a O(1/sqrt(n)) expansion. However, this performance can be improved by mobility (at the cost of delays). Can you quantify the effectiveness of mobility in terms of improved expansion? Relevant papers are The Capacity of Wireless Networks by Gupta and Kumar, Mobility Increases Capacity by Brossglauser and Tse (best paper Infocom 01), and Throughput-Delay Trade-Off in Wireless Networks by El Gamal, Mammen, Prabhakar and Shah (best paper Infocom 04).

Tue March 8: Some Algorithmic Issues in P2P Networks
Peer-to-Peer Research at Stanford, Bawa et al.
Estimating Aggregates on a Peer-to-Peer network, Bawa et al.
Hybrid Search Schemes in Unstructured P2P Networks, Gkantsidis, Mihail, Saberi.
Percolation Search in Power Law Networks: Making Unstructured P2P networks Scalable, Sarshar, Boykin, Roychowdhury.
Fastest Mixing Markov Chain on a Graph, Boyd, Diaconis, Xiao.
Bounding Fastest Mixing, Sebastian Roch.
Distributed Computation of Eigenvectors, Franc McSherry.

Tue March 15: No Class, Infocom 2005.

Tue March 22: No Class, Spring Break.

Tue March 30: Performance in Ad-Hoc Networks.
Here is a set of very well received papers on the performance of wireless, sensor, ad-hoc networks.
The Capacity of Wireless Networks by Gupta and Kumar, Mobility Increases Capacity by Brossglauser and Tse (best paper Infocom 01), and Throughput-Delay Trade-Off in Wireless Networks by El Gamal, Mammen, Prabhakar and Shah (best paper Infocom 04).
In class we will attemp to obtain a combinatorial explanation of these results, in terms of the spectral gap of the underlying topologies (and hence routing congestion, throughput, and mixing times as they relate to velocity of agents), and a fundamental probabilistic fact known as the birthday paradox.

Tue April 5: Algorithmic Game Theory.
Keynote address on Algorithms, Games and the Internet, from STOC 2001.
A very introductory presentation on games, by Avrim Blum.
Arrow-Debreu and Nash Equilibria, by Papadimitriou.

Tue April 12:
Steven Young's term project presentation. Do powerlaw graphs have 0-1 laws?

Tue April 19:
David Cash and Christos Gkantsidis term project presentation. Maintaining P2P topology by random local link exchanges.