CSE6240: Web Search and Text Mining
Spring, 2012
Lecture: TR 9:35am - 10:55am
Office hours: T 1:00pm - 2:00pm
Instructor:
Hongyuan Zha, Office: 1314 KACB, Phone: 404-385-1491
Course Description
This course covers the data-analytic aspects of three closely related topics: Web search,
recommendation systems, and social network analysis. The emphasis is on probabilistic and
statistical methods, user behavior modeling, and the co-evolution of dynamic behavior and
structure in social networks.
Ideally, you should have formal exposure to data mining and machine learning at the level of CSE6740 and be comfortable using a scripting and/or high-level programming language.
List of Topics
- Introduction to information retrieval (IR): inverted indices, query processing, tf-idf weighting, scoring, anchor text, precision and recall, discounted cumulative gain (DCG)
- Link analysis: PageRank and HITS algorithms
- Learning to rank methods
- Implicit relevance feedback using user click and behavior data
- Search result diversity
- Latent Dirichlet allocation (LDA) and extensions
- Introduction to recommender systems (RS): content-based recommendation, collaborative recommendation, user- and item-based methods, matrix factorization, evaluation
- Matrix factorization and item/user features
- Cold start problems in collaborative filtering
- Local, mobile and collaborative filtering
- Incorporating multiple data sources in collaborative filtering
- Strong and weak ties in social networks
- Inferring signed relations from user behavior
- Diffusion in networks and the small-world phenomenon
- Survival and event history analysis
- Structure and behavior co-evolution
- Epidemics and structure discovery from user behavior
Class Policies
- Please let me know as soon as possible if you have any special needs during the semester.
- Each student must read and abide by the Georgia Tech Academic Honor Code.
Grading
- Homework: 50%
- Projects: 50%