Course Information

This course gives an overview of modern data-driven techniques for natural language processing. The course moves from shallow bag-of-words models to richer structural representations of how words interact to create meaning. At each level, we will discuss the salient linguistic phenomena and most successful computational models. Along the way we will cover machine learning techniques which are especially relevant to natural language processing.

Slides, materials, and project information for this iteration of the course are borrowed from Jacob Eisenstein, Yulia Tsvetkov and Robert Frederking at CMU, Dan Jurafsky at Stanford, David Bamman at UC Berkeley, Noah Smith at UW, and Kai-Wei Chang at UCLA.



Class Meets
Mondays and Wednesdays, 3:00-4:15pm; Kendeda Building 152
Piazza
piazza.com/gatech/spring2020/cs7650cs4650
Gradescope
CS4650: To be announced!
CS7650: To be announced!
Staff Mailing List
cs4650-7650-s20-staff@googlegroups.com
Office Hours
Ian Stewart: Tuesdays, 2-4pm, CODA C1106
Jiaao Chen: Thursdays, 2-4pm, CODA C1008
Nihal Singh: Fridays, 9-11am, CODA C1008
Jingfeng Yang: Mondays, 10am-12pm, CODA 14th floor common area

Schedule

Note: this schedule is tentative and subject to change.

Date Topic Optional Reading
W1: Jan 6 Introduction to NLP
Slides
W1: Jan 8 Text Classification
Slides, HW1 Out
W2: Jan 13 Neural Networks for Text Classification
Slides
W2: Jan 15 Language Modeling
Slides, HW2 Out
W3: Jan 20 MLK Day: No class
W3: Jan 22 Introduction to Deep Learning, PyTorch Tutorial
Slides

Grading

  • 45% Homework Assignments
    • Homework 1: 6%
    • Homework 2: 13%
    • Homework 3: 13%
    • Homework 4: 13%
  • 15% Midterm Exam
    • No make-up exam except in an emergency
  • 40% Course Project
    • Project proposal (2 pages): 5%
    • Midway report (4 pages): 10%
    • Final report (8 pages): 20%
    • Presentation (in class): 5%
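
As a quick check on how the weights above combine, here is a minimal Python sketch. The component names and the `final_percentage` helper are hypothetical, chosen only for illustration; the weights are the ones listed above. How percentages map to letter grades is not specified here, so the sketch stops at a 0-100 score.

```python
# Weights (percent of the final grade), copied from the grading breakdown above.
# The dictionary keys are illustrative names, not official identifiers.
WEIGHTS = {
    "hw1": 6, "hw2": 13, "hw3": 13, "hw4": 13,   # 45% homework
    "midterm": 15,                               # 15% midterm exam
    "proposal": 5, "midway_report": 10,          # 40% course project
    "final_report": 20, "presentation": 5,
}


def final_percentage(scores):
    """Combine per-component scores (fractions in [0, 1]) into a 0-100 total.

    Components missing from `scores` count as 0.
    """
    return sum(weight * scores.get(name, 0.0) for name, weight in WEIGHTS.items())
```

For example, `final_percentage({"hw1": 0.5})` counts half of Homework 1's 6% and nothing else.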

Policies

Late Policies:

Students will have a total of four late days to use when turning in homework assignments; each late day extends the deadline by 24 hours. There are no restrictions on how the late days can be used (e.g., all four could be used on one homework). Using late days will not affect your grade. However, homework submitted late after all late days have been used will receive no credit.
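
The late-day rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not an official grading script: the `apply_late_days` helper is hypothetical, any fraction of a 24-hour period is assumed to use a whole late day, and a submission that needs more late days than remain is assumed to receive no credit without consuming the remaining days.

```python
import math

TOTAL_LATE_DAYS = 4  # from the policy above


def apply_late_days(hours_late_per_homework, budget=TOTAL_LATE_DAYS):
    """Return (credited, remaining) for homeworks given in submission order.

    `credited` holds one boolean per homework; `remaining` is the number of
    late days left. Assumption: lateness is rounded up to whole days.
    """
    credited = []
    remaining = budget
    for hours in hours_late_per_homework:
        days_needed = math.ceil(hours / 24) if hours > 0 else 0
        if days_needed <= remaining:
            remaining -= days_needed
            credited.append(True)
        else:
            # Late with too few late days left: no credit (assumed not to
            # consume the days that remain).
            credited.append(False)
    return credited, remaining
```

For instance, a student who submits homeworks 0, 30, 50, and 1 hours late would, under these assumptions, use two late days on the second homework, receive no credit on the third, and use one more late day on the fourth.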

Class Policies:

Attendance will not be taken, but you are responsible for knowing what happens in every class. The instructor will try to post slides and notes online, and to share announcements, but there are no guarantees. If you cannot attend class in person, make sure you check in with someone who was there.

Respect your classmates and your instructor by avoiding distractions. This means arriving on time, turning off your cell phone, and saving side conversations for after class.

Multiple studies have shown that using a laptop in class, even for taking notes, reduces students’ educational attainment. We suggest trying pen and paper for a few weeks to see if it helps you concentrate. Whatever technology you decide to use, it is your responsibility to ensure that it does not distract your classmates or the instructor.

Prerequisites

The official prerequisite for CS 4650 is CS 3510/3511, “Design and Analysis of Algorithms.” This prerequisite is essential because understanding natural language processing algorithms requires familiarity with dynamic programming, as well as automata and formal language theory: finite-state and context-free languages, NP-completeness, etc. While course prerequisites are not enforced for graduate students, prior exposure to analysis of algorithms is very strongly recommended.

Furthermore, this course assumes:

  • Good coding ability, corresponding to at least that of a third- or fourth-year undergraduate CS major. Assignments will be in Python.
  • Background in basic probability, linear algebra, and calculus.
  • Familiarity with machine learning is helpful but not assumed. Of particular relevance are linear classifiers: perceptron, naive Bayes, and logistic regression.

People sometimes want to take the course without having all of these prerequisites. Frequent cases are:

  • Junior CS students with strong programming skills but limited theoretical and mathematical background,
  • Non-CS students with strong mathematical background but limited programming experience.

Students in the first group tend to struggle with the lectures and the exam, and students in the second group tend to struggle with the problem sets. My advice is to get the background material first, and then take this course.

Project

This semester-long project will involve one to three students and should focus on natural language processing, either addressing core NLP methods or using NLP in support of an empirical research question. The project will consist of four components:

  • Project proposal. Students will propose the research question to be examined, motivate its rationale as an interesting question worth asking, and assess its potential to contribute new knowledge by situating it within related literature in the scientific community. (2 pages, excluding references)
  • Midway report. By the middle of the course, students should present initial experimental results and establish a validation strategy to be performed at the end of experimentation. (4 pages, excluding references)
  • Final report. The final report will include a complete description of work undertaken for the project, including data collection, development of methods, experimental details (complete enough for replication), comparison with past work, and a thorough analysis. Projects will be evaluated according to standards including clarity, originality, soundness, substance, evaluation, meaningful comparison, and impact (of ideas, software, and/or datasets). (8 pages, excluding references)
  • Presentation. At the end of the semester, teams will present their work in a presentation session.

All reports should use the ACL 2020 style files for either LaTeX or Microsoft Word.

FAQs

  • The class is full. Can I still get in?

    Sorry. The course admins in CoC control this process. Please talk to them.

  • I am graduating this Fall and I need this class to complete my degree requirements. What should I do?

    Talk to the advisor or graduate coordinator for your academic program. They are keeping track of your degree requirements and will work with you if you need a specific course.

  • Can I audit this class or take it pass/fail?

    No. Due to the large demand for this class, we will not be allowing audits or pass/fail. Letter grades only. This is to make sure students who want to take the class for credit can.

  • Can I simply sit in the class (no credits)?

    In general, we welcome members of the Georgia Tech community (students, staff, and/or faculty) to sit in. Out of courtesy, we would appreciate it if you let us know beforehand (via email or in person). If the classroom is full, we ask that you please give priority to registered students.

  • I have a question. What is the best way to reach the course staff?

    Registered students – your first point of contact is Piazza (so that other students may benefit from your questions and our answers). If you have a personal matter, email us at the class mailing list cs4650-7650-s20-staff@googlegroups.com