Spring 2005
We will focus on algorithmic questions arizing in networks driving
today's technology: the Internet, the WWW, P2P networks, ad-hoc and
sensor networks,
and biological networks at the genome level (gene and protein interactions).
These networks are expressed "locally", by their nodes (data points)
and links (relationships between two data points).
How do these networks organize to execute the fundamental tasks
that they are formed to perform?
How can formal models and algorithms infer their semantics?
How can algorithm design methods improve their performance?
This special topics course will meet once a week for 2 hours,
during which time we will focus and discuss a topic.
In collaboration with the instructor, each student will be responsible
for leading one of the discussions. Students will also do one
project/term paper, in groups of 2-3, meet with the instructor
to discuss the progress of their projects, and do a presentation
of their work towards the end of the semester. Project areas include:
-Study and comparison of algorithmic features of different search engines,
and their sponsored links such as Google, MSN, Yahoo and A9.
-Study and comparison of different mechanisms by which P2P networks maintain
connectivity, and study and comparison of different mechanisms for large
scale content distribution.
-Information propagation and efficiency in ad-hoc (e.g. mobile, wireless)
and sensor networks.
-Study of the semantics of gene and protein interaction networks,
inference of building blocks of such genetic networks,
inference of evolutionary models explaining their statistics.
We wish to get a mixture of graduate students between ACO/theory, networking,
databases, and bioinformatics. Undergraduates with strong background
and/or experize in one of the application areas are also wellcome
(however, you will need permission of the instructor).
Tue 12:00-2:00, College of Computing, Room 52.
The material covered in each lecture together with references
and/or notes will be posted in this web page.
Milena Mihail,
Associate Professor
Office phone: 404-385-0617
Office hours: Thu 12:00-:200, College of Computing 238,
or by appointment.
mihail@cc.gatech.edu
(1) Handbook on Graphs and Networks: From the Genome to the Internet,
by Bornholdt and Schuster.
(2) Notes from similar classes at
Cornell,
Stanford,
Harvard,
Berkeley.
Class participation and presentation: 50%.
Project or Term Paper: 50%.
Class Web-site http://www.cc.gatech.edu/~mihail/index8802.html. This site is maintained by Milena Mihail, please report problems to mihail@cc.gatech.edu.
Tue Jan 11: Homework
(1) Browse this
web site. The spirit and several topics of our class will be very similar.
(2) Send email to mihail@cc.gatech.edu, with title 8802,
and with the following information: Your name, email, school, year, major,
advisor, your technical background (very important)
and your special interests.
(3) Download the 5 papers in the reading below, concerning the Case of Google.
This is your reading material for this week, and it will be covered
in next week's meeting.
If you see this material for the first time, or if your backbround
in linear algebra and probability
is not very strong, then read only papers (1) and (5),
and the introductions/discussions of the rest.
For ACO/theory students and for those with solid background
in linear algebra and probability, read all the papers.
(Note: The further paper on Statistical Mechanics of Complex Networks
is NOT required for next week, but may be interesting for math majors).
(4) If you are a math or theory major, start reading the articles for
the math oriented below. You may choose to write a report on these
two articles as your term paper.
(5) If you are interested in bioinformatics, start reading the articles for
the "bioX" oriented below.
You may choose to write a report on these
five articles as your term paper.
Tue Jan 11, Jan 18: The Case of Google, Readings.
(1) S. Brin and L. Page,
The Anatomy of a Large-Scale Hypertextual Web Search Engine,
Proc. 7th International World Wide Web Conference, 1998.
(2) L. Page, S. Brin, R. Motwani, T. Winograd,
The PageRank Citation Ranking: Bringing Order to the Web.
(3) R. Lempel, S. Moran,
The Stochastic Approach for Link-Structure Analysis (SALSA)
and the TKC Effect,
9th International World Wide Web Conference, May 2000.
(4) H. Zhang, A. Goel, R. Govindan, K. Mason, B. Van Roy,
Improving Eigenvector-Based Reputation Systems against Collusions,
Workshop on Algorithms and Models for the Web Graph (WAW)
2004.
(5) J. Kleinberg,
Authoritative sources in a hyperlinked environment,
Proc. 9th ACM-SIAM Symposium on Discrete Algorithms, 1998.
Extended version in Journal of the ACM 46(1999).
Tue Jan 25: A preamble for the math oriented, reading.
(1) R. Albert and L.A. Barabasi,
Statistical Mechanics of Complex Networks,
Rev. Mod. Phys. 74, 47-97 (2002).
(2) B. Bollobas,
Mathematical Results in Scale-Free random Graphs.
(3) M. Newman,
The Structure and Function of Complex networks,
SIAM Review 45, 167-256 (2003).
Tue Feb 1: A preamble for the bioX oriented, reading.
(1) Schwikowski et al,
A Network of Protein-Protein Interactions in Yeast,
Nature, December 2000, Vol 18, No 12, 1257-1261.
(2) Giot et al,
A Protein Interaction Map of Drosophila Melanogaster,
Science, Vol 302, Issue 5651, 1727-1737, 5 December 2003.
(3) Tong et al,
Global Mapping of the Yeast Genetic Interaction Network,
Science, Vol 303, Issue 5659, 808-813 , 6 February 2004.
(4) Middendorf et al,
Inferring Network Mechanisms:
The Drosophila Melanogaster Protein Interaction Network.
(5) Middendorf et al,
Discriminative Topological Features Reveal Biological
Network Mechanisms.
Tue Feb 8: On the Small World Phenomenon.
(1) D. Watts,
Networks, Dynamics and the Small World Phenomenon,
American Journal of Sociology, 1999.
(2) J. Kleinberg,
The Small World Phenomenon: An Algorithmic Perspective,
STOC 2000.
Tue Feb 15: Performance, Spectrum and Conductance.
(1) Gkantsidis, Mihail, Saberi,
On the Random Walk Method for Peer-to-Peer Networks,
in particular Theorem 4.1,
Infocom 2004.
Theorem 4.1 contains the folklore argument of expansion of random (regular) graphs.
(2) Gkantsidis, Mihail, Saberi,
Conductance and Congestion in Power Law Graphs,
Sigmetrics 2003,
in particular Lemma 3.2.
This Lemma contains the basic argument of expansion of random graphs with arbitrary
degree sequences in the configurational model.
(3) Mihail, Papadimitriou, Saberi,
On Certain Connectivity Properties of the Internet Topology,
Focs 2003,
in particular Theorem 1.
This Theorem contains the basic argument of expansion of random graphs
in the evolutionary model of growth with preferential connectivity.
Tue Feb 22:
A. Mehta, A. Saberi, U. Vazirani, V. Vazirani,
Adwords and generalized Online matching.
Tue March 1: PROJECTS
A. ON RANDOM GRAPH MODELS FOR COMPLEX NETWORKS
(1) To what extend are most graphs with heavy tailed statistics "similar" ?
Read the
Kleinberg and Kleinberg paper which establishes differences between
Erdos-Renyi (near regular) random graphs and graphs in the
evolutionary model of growth with preferential connectivity.
Can you extend their result to the configurational random graph model?
Can you argue about the concentration (or lack of it)
of the number of small induced subgraphs?
(2) Characterize "clustering" in various random graph models.
More formally, consider models of growth with preferential connectivity
as well as copying, and either derive analytical bounds on their
conductance, or derive experimental results on their high end spectrum
(the largest eigenvalues of a suitable normalization of the graph).
Here are two papers with very general evolutionary models:
Cooper and Frieze
and Chung et al.
If you work on analytical bounds, you will also need to read the papers from Feb 15:
Theorem 4.1,
Lemma 3.2
and Theorem 1.
(3) Explain, either theoretically or experimentally, the effectiveness
of the spectral filtering method for information retrieval.
For an empirical understanding, read, for example,
this paper,
and also read papers on spectral methods from
Chris Ding's page
at Lawrence Berkelekey Labs.
For an analytical understanding, you need to read Frank McSherry's papers
Spectral Partitioning of Random Graphs, in STOC 2001,
and
Spectral Analysis of Random Graphs with Skewed Degree Distributions,
in FOCS 2004.
We are particularly interested in copying models that express
biological networks,
as in
Cooper and Frieze
and Chung et al.
B. INFORMATION RETRIEVAL FOR GENE AND PROTEIN INTERACTION NETWORKS
(1) Apply the spectral filtering method for information retrieval
in existing gene and protein interaction data.
The data could be from references (1), (2) and (3) of Feb 1.
Infer the semantics of the found clusters from known databases.
(2) Quantify the high end spectrum of copying random graph models
that express gene and protein interaction networks.
We are particularly interested in spectrum shifts as a function
of various parameters of the models.
This project is the experimental facet of A. (2).
C. SEARCH ENGINES
(1) How can you "spam" Google's pagerank?
Look at this spamming business!
How can you detect such spamming?
This
might be a starting point.
(2) Implement and quantify the effectivess of the online adwords selection
algorithm of
Mehta, Saberi, Vazirani and Vazirani.
D. PEER-TO-PEER NETWORKS
How can you maintain an expander graph in a distributed setting?
We are working on techniques from the Markov chain Monte Carlo.
Two very relevant papers are
Sampling Regular Graphs and
Sampling Knapsack Solutions .
E. AD-HOC NETWORKS
Every network that has an underlying geometry is bounded by the
physical constrains of this geometry. For example, a network with
an underlying planar geometry (e.g. sensor, mobile, wireless)
is bounded by a O(1/sqrt(n)) expansion. However, this performance can
be improved by mobility (at the cost of delays).
Can you quantify the effectiveness of mobility in terms
of improved expansion?
Relevant papers are
The Capacity of Wireless Networks by Gupta and Kumar,
Mobility Increases Capacity by Brossglauser and Tse
(best paper Infocom 01), and
Throughput-Delay Trade-Off in Wireless Networks by
El Gamal, Mammen, Prabhakar and Shah
(best paper Infocom 04).
Tue March 8: Some Algorithmic Issues in P2P Networks
Peer-to-Peer Research at Stanford, Bawa et al.
Estimating Aggregates on a Peer-to-Peer network, Bawa et al.
Hybrid Search Schemes in Unstructured P2P Networks,
Gkantsidis, Mihail, Saberi.
Percolation Search in Power Law Networks:
Making Unstructured P2P networks Scalable,
Sarshar, Boykin, Roychowdhury.
Fastest Mixing Markov Chain on a Graph,
Boyd, Diaconis, Xiao.
Bounding Fastest Mixing,
Sebastian Roch.
Distributed Computation of Eigenvectors,
Franc McSherry.
Tue March 15: No Class, Infocom 2005.
Tue March 22: No Class, Spring Break.
Tue March 30: Performance in Ad-Hoc Networks.
Here is a set of very well received papers on the performance
of wireless, sensor, ad-hoc networks.
The Capacity of Wireless Networks by Gupta and Kumar,
Mobility Increases Capacity by Brossglauser and Tse
(best paper Infocom 01), and
Throughput-Delay Trade-Off in Wireless Networks by
El Gamal, Mammen, Prabhakar and Shah
(best paper Infocom 04).
In class we will attemp to obtain a combinatorial explanation
of these results, in terms of the spectral gap of the underlying topologies
(and hence routing congestion, throughput, and mixing times
as they relate to velocity of agents),
and a fundamental probabilistic fact known as the
birthday paradox.
Tue April 5: Algorithmic Game Theory.
Keynote address on
Algorithms, Games and the Internet,
from STOC 2001.
A very introductory presentation on
games,
by Avrim Blum.
Arrow-Debreu and Nash Equilibria,
by Papadimitriou.
Tue April 12:
Steven Young's term project presentation.
Do powerlaw graphs have 0-1 laws?
Tue April 19:
David Cash and Christos Gkantsidis term project presentation.
Maintaining P2P topology by random local link exchanges.