NLP: CS-4650

MW 3:30-4:45pm, Ford ES&T L1255

Course Information

This course gives an overview of modern data-driven techniques for natural language processing. The course moves from shallow bag-of-words models to richer structural representations of how words interact to create meaning. At each level, we will discuss the salient linguistic phenomena and the most successful computational models. Along the way, we will cover machine learning techniques that are especially relevant to natural language processing.

Slides, materials, and project information for this iteration of the NLP course are borrowed from Jacob Eisenstein, Yulia Tsvetkov and Robert Frederking at CMU, Dan Jurafsky at Stanford, David Bamman at UC Berkeley, Noah Smith at UW, and Kai-Wei Chang at UCLA.

Instructor

Diyi Yang

Teaching Assistants

Andrew Silva (Head TA)

Amal Alabdulkarim

Haard Shah

Omar Shaikh

Class Meets: Mondays and Wednesdays, 3:30-4:45pm
Piazza: piazza.com/class/kruumtvm8ly4bk
Staff Mailing List: fall2021-cs4650-nlp-staff@googlegroups.com
Office Hours (Eastern Time):
- Haard Shah: Mondays 1-2PM ET. Remote: https://bluejeans.com/7932917220
- Andrew Silva: Tuesdays 1-2PM ET. In person: outdoor tables in front of CoC.
- Amal Alabdulkarim: Thursdays 1-2PM ET. Remote: https://bluejeans.com/506825067/8716
- Omar Shaikh: Fridays 12:30-1:30PM ET. CCB, First Floor Commons (I'll have a whiteboard with me that says CS4650). Optionally remote (email staff).

Schedule

Note: This schedule is tentative and subject to change.

Date	Topic	"Buzz"words	Optional Reading
Aug 23	Introduction Slides	GPT3, Semantics, Syntax, Jeopardy!	SLP3 ch 2
Aug 25	Text Processing Slides	Tokenization, Regular Expression	SLP3 ch 2
Aug 30	Text Classification (1) Slides HW1 Out, HW1 Template	Naive Bayes, Logistic Regression	SLP3 ch 4 E ch 2
Sep 1	Text Classification (2) Slides	Neural Networks, Activation Functions, RNNs	E ch 3 E ch 4
Sep 6	Holiday - Labor Day
Sep 8	PyTorch and Neural Networks for NLP Slides Intro to PyTorch Video HW1 Due HW2 PDF, HW2 Template HW2 Code, HW2 Colab
Sep 13	Language Modeling (1) Slides	N-gram language models	SLP3 ch 3
Sep 15	Language Modeling (2) Slides Project Slides	Perplexity, Smoothing, Neural LMs (BERT, GPT)	SLP3 ch 3 Chen & Goodman
Sep 20	Word Embedding (1) Slides	TF-IDF, PPMI	SLP3 ch 6
Sep 22	Word Embedding (2) Slides HW2 Due HW3 PDF, HW3 Template HW3 Code	Word2Vec, FastText	SLP3 ch 6 E ch 14
Sep 27	Word Embedding (3) Slides	BERT, GPT, ELMo
Sep 29	Sequence Labeling (1) Slides Project Proposal	POS Tagging	SLP3 ch 8 E ch 7
Oct 4	Sequence Labeling (2) Slides	HMM, Forward Algorithm, Viterbi	SLP3 ch 8 E ch 7
Oct 6	Midterm
Oct 8	HW3 Due
Oct 11	Fall Break
Oct 13	Constituency Parsing (1) Slides HW4 PDF, HW4 Template HW4 Code	Syntax, CFGs	SLP3 ch 12 E ch 9
Oct 18	Constituency Parsing (2) Slides	CKY Parsing	SLP3 ch 13 E ch 10
Oct 20	Guest Speaker: Andrew Silva! Semantic Parsing Meets Robots Slides	Semantic Parsing, Robotics	SLP3 ch 15
Oct 25	Ethics + NLP Slides	Ethics, Bias, and Fairness
Oct 27	Midway Presentation
Nov 1	Machine Translation (1) Slides HW4 Due	Alignment, Noisy Channel Models	E ch 18
Nov 3	Machine Translation (2) Slides HW5 PDF, HW5 Template HW5 Code	Seq2Seq and Attention	E ch 18
Nov 8	Group Project Consultations Slides	SQuAD
Nov 10	Question Answering Slides	SQuAD, Adversarial Attacks, Open Domain QA	SLP3 ch 25
Nov 15	Dialogue Systems (1) Slides	Chatbots, IR-based systems	E ch 19
Nov 17	Guest Speaker: Caleb Ziems! HW5 Due HW6 PDF, HW6 Template HW6 Code	Dialects and Framing	E ch 19
Nov 22	Dialogue Systems (2) Slides	Task Oriented Dialogue Systems, Personas	SLP3 ch 26
Nov 24	Thanksgiving Break
Nov 29	Computational Social Science Slides	Bias and Persuasion
Nov 30	HW6 Due
Dec 1	Summarization Slides	Document and Dialogue Summarization
Dec 6	Final Project Presentations Slides
Dec 8	No Class

Grading

60% Homework Assignments
- Homework 1: 10%
- Homework 2: 10%
- Homework 3: 10%
- Homework 4: 10%
- Homework 5: 10%
- Homework 6: 10%
10% Midterm Exam
30% Presentation/Project/Proposal
- Project Proposal: 5%, Due Sep 29th, 11:59pm ET
- Midterm Report: 10%, Due Oct 27th, 11:59pm ET
- Final Report: 10%, Due Dec 10th, 11:59pm ET
- Presentations: 5% (2.5% for each presentation), Oct 27th & Dec 6th (delivery mode TBD).
5% Misc. Bonus
1% CIOS
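
As a rough illustration of how the weights above combine, here is a minimal Python sketch. It is not an official grade calculator. The weights come from the list above; the function name `final_percentage`, the dictionary names, and the treatment of the Misc. Bonus and CIOS items as extra credit added on top of the 100% are assumptions made for this example.

```python
# Weights copied from the grading breakdown above (percent of the final grade).
WEIGHTS = {
    "homework": 60.0,  # six assignments at 10% each
    "midterm": 10.0,
    "project": 30.0,   # proposal 5 + midterm report 10 + final report 10 + presentations 5
}

# Assumption: bonus items are extra credit added on top of the 100% above.
BONUS_CAPS = {"misc_bonus": 5.0, "cios": 1.0}


def final_percentage(scores, bonus=None):
    """Combine component scores into a final percentage.

    scores: component name -> fraction earned, in [0, 1]
    bonus:  bonus item name -> fraction earned, in [0, 1] (optional)
    """
    bonus = bonus or {}
    base = sum(WEIGHTS[name] * scores.get(name, 0.0) for name in WEIGHTS)
    extra = sum(BONUS_CAPS[name] * bonus.get(name, 0.0) for name in BONUS_CAPS)
    return base + extra


if __name__ == "__main__":
    # Hypothetical student: 90% on homework, 80% on the midterm, 85% on the project.
    print(final_percentage({"homework": 0.9, "midterm": 0.8, "project": 0.85}))
```

Missing components default to zero in this sketch, which may not match how the course staff handle incomplete work.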

Policies

Late Policies:

Students will have a total of six late days to use when turning in homework assignments; each late day extends the deadline by 24 hours. There are no restrictions on how the late days can be used (e.g., all 6 could be used on one homework). Using late days will not affect your grade. However, homework submitted late after all late days have been used will receive no credit.
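
The late-day rule can be sketched in a few lines of Python. This is an illustration only, not the staff's actual bookkeeping. The function names are hypothetical, and two details are assumptions the policy does not spell out: any fraction of a 24-hour period counts as a full late day, and a submission that needs more late days than remain gets no credit without using up the remaining days.

```python
import math
from datetime import datetime, timedelta

TOTAL_LATE_DAYS = 6  # total late days per student, from the policy above


def late_days_needed(deadline, submitted):
    """Number of 24-hour extensions needed to cover a submission."""
    if submitted <= deadline:
        return 0
    # Assumption: a partial 24-hour period is rounded up to a full late day.
    return math.ceil((submitted - deadline) / timedelta(days=1))


def accept_submission(deadline, submitted, late_days_left):
    """Return (gets_credit, late_days_left_afterwards)."""
    needed = late_days_needed(deadline, submitted)
    if needed > late_days_left:
        # Assumption: a no-credit submission does not consume late days.
        return False, late_days_left
    return True, late_days_left - needed


if __name__ == "__main__":
    # Hypothetical deadline and submission time, for illustration only.
    deadline = datetime(2021, 9, 8, 23, 59)
    submitted = deadline + timedelta(hours=30)
    print(accept_submission(deadline, submitted, TOTAL_LATE_DAYS))
```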

Class Policies:

Attendance will not be taken, but you are responsible for knowing what happens in every class. The instructor will try to post slides and notes online and to share announcements, but there are no guarantees. If you cannot attend class, make sure you check in with someone who was there.

Prerequisites

The official prerequisite for CS 4650 is CS 3510/3511, “Design and Analysis of Algorithms.” This prerequisite is essential because understanding natural language processing algorithms requires familiarity with dynamic programming, as well as automata and formal language theory: finite-state and context-free languages, NP-completeness, etc. While course prerequisites are not enforced for graduate students, prior exposure to analysis of algorithms is very strongly recommended.

Furthermore, this course assumes:

- Good coding ability, corresponding to at least that of a third- or fourth-year undergraduate CS major. Assignments will be in Python.
- Background in basic probability, linear algebra, and calculus.
- Familiarity with machine learning is helpful but not assumed. Of particular relevance are linear classifiers: perceptron, naive Bayes, and logistic regression.

People sometimes want to take the course without having all of these prerequisites. Frequent cases are:

- Junior CS students with strong programming skills but limited theoretical and mathematical background.
- Non-CS students with strong mathematical background but limited programming experience.

Students in the first group tend to struggle on the exam and have trouble following the lectures, and students in the second group tend to struggle with the problem sets. My advice is to get the background material first, and then take this course.

FAQs

The class is full. Can I still get in?

Sorry. The course admins in CoC control this process. Please talk to them.

I am graduating this Fall and I need this class to complete my degree requirements. What should I do?

Talk to the advisor or graduate coordinator for your academic program. They are keeping track of your degree requirements and will work with you if you need a specific course.

I have a question. What is the best way to reach the course staff?

Registered students – your first point of contact is Piazza (so that other students may benefit from your questions and our answers). If you have a personal matter, email us at the class mailing list: fall2021-cs4650-nlp-staff@googlegroups.com