This course introduces the fundamental knowledge of Web Mining. The topics covered in this course broadly lie in network science, text analysis, recommender systems, and social media analysis. The emphasis is on both the theoretical and empirical aspects. Students will be introduced to machine learning techniques and data mining tools apt to reveal insights from large-scale web-based datasets.

**Instructor**: Prof. Srijan Kumar

**TAs**: Sandeep Soni, Sejoon Oh, Kezhen Zhang

**Office Hours**:
- Srijan Kumar: Thursdays 10am-11am
- Sandeep Soni: Wednesdays 10am-11am
- Sejoon Oh: Tuesdays 11am-noon
- Kezhen Zhang: Mondays 1pm-2pm

**Lectures**: Mondays and Wednesdays, 2:00 pm - 3:15 pm

**Piazza**: Enroll here. Students should use Piazza for **all** course-related queries.

**Class Policies**:

- Let the instructor know as soon as possible if you have any special needs during the semester.
- Each student must read and abide by the Georgia Tech Academic Honor Code.

**Public Resources**: The course material will be posted online as the course progresses. Use these resources. Please note that this course is heavily inspired by and based on Prof. Jure Leskovec's course CS 224W at Stanford, which you can find here. Several slides in this class are based on Prof. Leskovec's slides.

**Students are expected to have the following background:**

- Knowledge of basic computer science principles, sufficient to write a reasonably non-trivial computer program
- Familiarity with basic probability theory (CS109 or Stat116 are sufficient but not necessary)
- Familiarity with basic linear algebra

**Announcements:**

**01/12/2021:** The first class is on January 20, 2021, as per the Georgia Tech calendar.

The schedule is subject to change. Reading materials will be posted periodically below.

The time for all deadlines used in this course is 23:59 Eastern Time (11:59 PM ET).

| Date | Description | Readings and Notes | Events | Deadlines |
|---|---|---|---|---|
| Jan 18 | No class - MLK Holiday | | | |
| Jan 20 | Introduction | | | |
| Jan 25 | Web Networks and Properties | Graph structure in the Web | | Project Teams due |
| Jan 27 | Random Graph Models | 1. Small world phenomenon 2. Collective dynamics of ‘small-world’ networks | | |
| Feb 1 | Link Analysis (PageRank and HITS) | 1. Book chapter 'Link Analysis' from 'Introduction to Information Retrieval' 2. The PageRank Citation Ranking: Bringing Order to the Web 3. Authoritative Sources in a Hyperlinked Environment | | |
| Feb 3 | Personalized PageRank and Recommendations | 1. Random walk with restart: fast solutions and applications 2. Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time | | |
| Feb 8 | Cascades, Contagion, and Epidemics | 1. Epidemiological Modeling of News and Rumors on Twitter | HW1 out | Project Proposal due |
| Feb 10 | Node Representation Learning | 1. DeepWalk: Online Learning of Social Representations 2. node2vec: Scalable Feature Learning for Networks | | |
| Feb 15 | Graph Neural Networks | 1. Semi-Supervised Classification with Graph Convolutional Networks 2. Blog post 3. Inductive Representation Learning on Large Graphs | | |
| Feb 17 | Message Passing and Node Classification | 1. REV2: Fraudulent User Prediction in Rating Platforms | | HW1 due |
| Feb 22 | Belief Propagation and Applications | 1. NetProbe: A Fast and Scalable System for Fraud Detection in Online Auction Networks | HW2 out | |
| Feb 24 | Graph and Knowledge Graph Representation Learning | 1. Anonymous Walk Embeddings 2. Translating Embeddings for Modeling Multi-relational Data 3. Learning entity and relation embeddings for knowledge graph completion | | |
| Mar 1 | Discrete-time Temporal Graph Representation Learning | 1. EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs 2. DynGEM: Deep Embedding Method for Dynamic Graphs | | |
| Mar 3 | Continuous-time Temporal Graph Representation Learning | 1. DyRep: Learning Representations over Dynamic Graphs | | HW2 due |
| Mar 8 | Project Discussions and Feedback | | | |
| Mar 10 | Recommender Systems | 1. MMDS book chapter | | |
| Mar 15 | Recommender Systems II | 1. MMDS book chapter | | |
| Mar 17 | Deep Learning-Based Recommender Systems | 1. Deep Learning Based Recommender System: A Survey and New Perspectives 2. Recurrent Recommender Networks 3. Latent Cross: Making Use of Context in Recurrent Recommender Systems | | |
| Mar 22 | Deep Learning-Based Recommender Systems II | 1. Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks | HW3 out (moved) | Project Milestone due (moved) |
| Mar 24 | Spring Break - No class | | | |
| Mar 29 | Self-reading - No class | | | |
| Mar 31 | Take-home exam | | | |
| Apr 5 | Web Search and Information Retrieval | | | |
| Apr 7 | Information Retrieval (continued) | | | |
| Apr 12 | Project team 1:1 meetings with instructor | | | HW3 due (moved) |
| Apr 14 | Project team 1:1 meetings with instructor | | | |
| Apr 19 | Advanced topics | | | |
| Apr 21 | Guest Lecture: Dr. Khalifeh Al Jadda, Home Depot | | | |
| Apr 26 | Project Final Reports are due | | | |
| Apr 28 | Project Presentations are due | | | |
| Apr 30 | Peer grades are due | | | |

**Sample datasets and projects are available here.** You are welcome to use these datasets, but feel free to be creative and use other public datasets. The selected dataset and project topics should align with one or more of the topics taught in the class. Sample projects are also given below to spark ideas and to illustrate the expected style of work.