Course Information

This course gives an overview of modern data-driven techniques for natural language processing. The course moves from shallow bag-of-words models to richer structural representations of how words interact to create meaning. At each level, we will discuss the salient linguistic phenomena and the most successful computational models. Along the way, we will cover machine learning techniques that are especially relevant to natural language processing.

Slides, materials, and project information for this iteration of the course are borrowed from Jacob Eisenstein; Yulia Tsvetkov and Robert Frederking at CMU; Dan Jurafsky at Stanford; David Bamman at UC Berkeley; Noah Smith at UW; and Kai-Wei Chang at UCLA.



Class Meets
Mondays and Wednesdays, 3:00-4:15pm; Kendeda Building 152
Piazza
piazza.com/gatech/spring2020/cs7650cs4650
Staff Mailing List
cs4650-7650-s20-staff@googlegroups.com
Office Hours
Ian Stewart: 8-10pm ET on Tuesdays
Jiaao Chen: 2-4pm ET on Thursdays
Nihal Singh: 9-11am ET on Fridays
Jingfeng Yang: 10pm-11:59pm ET on Mondays
Please check Piazza/Canvas for the BlueJeans link.
Online Instruction FAQs

Schedule

Note: this schedule is tentative and subject to change.

Date Topic Optional Reading
W1: Jan 6 Introduction to NLP
Slides
W1: Jan 8 Text Classification
Slides, HW1 Out
W2: Jan 13 Neural Networks for Text Classification
Slides
W2: Jan 15 Language Modeling I
Slides, HW2 Out
W3: Jan 20 MLK Day: No class
W3: Jan 22 Course Project, PyTorch Tutorial
Slides, PyTorch Slides, DL Slides (optional)
W4: Jan 27 Language Modeling II
Slides
W4: Jan 29 Vector Semantics
Slides
W5: Feb 3 Word Embedding
Slides, HW3 Out
W5: Feb 5 Sequence Labeling: POS & HMM
Slides
W6: Feb 10 No Class: AAAI
W6: Feb 12 Sequence Labeling: Viterbi & Forward Algorithm
Slides
W7: Feb 17 Context-Free Grammar
Slides
W7: Feb 19 Constituency Parsing
Slides
W8: Feb 24 Midterm Review
Slides
W8: Feb 26 Midterm
W9: Mar 2 Dependency Parsing Syntax
Slides
W9: Mar 4 Dependency Parsing (by Yuval Pinter)
Slides, Upcoming Project Deadline Info
W10: Mar 9 Project Feedback
Sign-up
W10: Mar 11 Computational Ethics
Slides
W11: Mar 16 Spring Break
W11: Mar 18 Spring Break
W12: Mar 23 Online Instruction Testing / No Class
W12: Mar 25 Online Instruction Testing / No Class
W13: Mar 30 Question Answering
Slides
W13: Apr 1 Information Extraction
Slides
W14: Apr 6 Conversational Agents
Slides
W14: Apr 8 Machine Translation I
Slides
W15: Apr 13 Machine Translation II
Slides
W15: Apr 15 Generation
Slides
W16: Apr 20 Computational Social Science
Slides

Grading

  • 45% Homework Assignments
    • Homework 1: 6%
    • Homework 2: 13%
    • Homework 3: 13%
    • Homework 4: 13%
  • 15% Midterm Exam
    • No make-up exam except in an emergency
  • 30% (+2% bonus) Course Project
    • Project proposal (2 pages): 5%
    • Midway report (3 pages): 10%
    • Final report (7 pages): 15%
    • Video presentation (5 min): 2% bonus
  • 10% Online Instruction
    • Awarded to every enrolled student
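
The weights above can be checked with a short calculation. Below is a minimal sketch in Python (the language used for the assignments), assuming each component is scored out of 100 and contributes in proportion to its listed weight. The names `WEIGHTS`, `BONUS_WEIGHTS`, `course_grade`, and the component keys are hypothetical and not part of any official grading tool; letter-grade cutoffs are not specified here.

```python
# Hypothetical sketch of the weighted-grade arithmetic listed above.
# Assumes every component score is a percentage in [0, 100].

WEIGHTS = {
    "hw1": 6, "hw2": 13, "hw3": 13, "hw4": 13,   # 45% homework assignments
    "midterm": 15,                               # 15% midterm exam
    "proposal": 5, "midway": 10, "final": 15,    # 30% course project
    "online": 10,                                # 10% online instruction
}
BONUS_WEIGHTS = {"video": 2}                     # optional 2% bonus


def course_grade(scores: dict[str, float]) -> float:
    """Return the weighted course grade in percentage points (at most 102)."""
    total = 0.0
    for name, weight in {**WEIGHTS, **BONUS_WEIGHTS}.items():
        # A missing component (e.g., a skipped optional video) counts as 0.
        total += weight * scores.get(name, 0.0) / 100
    return total


# The regular weights sum to 100; the video bonus can add up to 2 more points.
assert sum(WEIGHTS.values()) == 100

full_marks = {name: 100.0 for name in WEIGHTS}   # no video submitted
print(course_grade(full_marks))                  # 100.0
```

Because the video bonus sits outside the 100 regular points, a student who skips it can still reach 100; submitting it can only raise the total.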

Policies

Late Policy:

Students will have a total of four late days to use when turning in homework assignments; each late day extends the deadline by 24 hours. There are no restrictions on how the late days can be used (e.g., all four could be used on one homework). Using late days will not affect your grade. However, homework submitted late after all late days have been used will receive no credit.

Class Policies:

Attendance will not be taken, but you are responsible for knowing what happens in every class. The instructor will try to post slides and notes online and to share announcements, but there are no guarantees. So if you cannot attend class in person, make sure you check in with someone who was there.

Respect your classmates and your instructor by avoiding distractions. This means arriving on time, turning off your cell phone, and saving side conversations for after class.

Multiple studies have shown that using a laptop in class, even for taking notes, reduces students’ educational attainment. We suggest trying pen and paper for a few weeks to see whether it helps you concentrate. Whatever technology you decide to use, it is your responsibility to ensure that it does not distract your classmates or the instructor.

Prerequisites

The official prerequisite for CS 4650 is CS 3510/3511, “Design and Analysis of Algorithms.” This prerequisite is essential because understanding natural language processing algorithms requires familiarity with dynamic programming, as well as automata and formal language theory: finite-state and context-free languages, NP-completeness, etc. While course prerequisites are not enforced for graduate students, prior exposure to analysis of algorithms is very strongly recommended.

Furthermore, this course assumes:

  • Good coding ability, corresponding to at least a third- or fourth-year undergraduate CS major. Assignments will be in Python.
  • Background in basic probability, linear algebra, and calculus.
  • Familiarity with machine learning is helpful but not assumed. Of particular relevance are linear classifiers: perceptron, naive Bayes, and logistic regression.

People sometimes want to take the course without having all of these prerequisites. Frequent cases are:

  • Junior CS students with strong programming skills but limited theoretical and mathematical background,
  • Non-CS students with strong mathematical background but limited programming experience.

Students in the first group tend to struggle with the exam and the lectures, and students in the second group tend to struggle with the problem sets. My advice is to learn the background material first and then take this course.

Project

This semester-long project will be carried out by teams of one to three students and should focus on natural language processing, either on core NLP methods or on using NLP to support an empirical research question. The project consists of four components:

  • Project proposal (March 13th): Students will propose the research question to be examined, explain why it is an interesting question worth asking, and assess its potential to contribute new knowledge by situating it within the related scientific literature. (2 pages, excluding references)
  • Midway report (April 3rd): By the middle of the course, students should present initial experimental results and establish a validation strategy to be carried out at the end of experimentation. (3 pages, excluding references)
  • Video presentation (April 23rd, optional): At the end of the semester, teams can present their work in a video or demo. This should be submitted together with the final report.
  • Final report (April 23rd): The final report will include a complete description of the work undertaken for the project, including data collection, development of methods, experimental details (complete enough for replication), comparison with past work, and a thorough analysis. Projects will be evaluated according to standards including clarity, originality, soundness, substance, evaluation, meaningful comparison, and impact (of ideas, software, and/or datasets). (7 pages, excluding references)

All reports should use the ACL 2020 style files for either LaTeX or Microsoft Word.

FAQs

  • The class is full. Can I still get in?

    Sorry. Enrollment is controlled by the course administrators in the College of Computing (CoC). Please talk to them.

  • I am graduating this Fall and I need this class to complete my degree requirements. What should I do?

    Talk to your advisor or the graduate coordinator for your academic program. They keep track of your degree requirements and will work with you if you need a specific course.

  • Can I audit this class or take it pass/fail?

    No. Due to high demand for this class, we will not allow audits or pass/fail. Letter grades only. This ensures that students who want to take the class for credit can do so.

  • Can I simply sit in the class (no credits)?

    In general, we welcome members of the Georgia Tech community (students, staff, and/or faculty) to sit in. Out of courtesy, we would appreciate it if you let us know beforehand (via email or in person). If the classroom is full, we ask that you allow registered students to attend.

  • I have a question. What is the best way to reach the course staff?

    Registered students: your first point of contact is Piazza, so that other students may benefit from your questions and our answers. If you have a personal matter, email the staff mailing list at cs4650-7650-s20-staff@googlegroups.com.