MW 3:00-4:15pm, Kendeda 152
This course gives an overview of modern data-driven techniques for natural language processing. The course moves from shallow bag-of-words models to richer structural representations of how words interact to create meaning. At each level, we will discuss the salient linguistic phenomena and most successful computational models. Along the way we will cover machine learning techniques which are especially relevant to natural language processing.
Slides, materials, and project information for this iteration of the NLP course are borrowed from Jacob Eisenstein, Yulia Tsvetkov and Robert Frederking at CMU, Dan Jurafsky at Stanford, David Bamman at UC Berkeley, Noah Smith at UW, and Kai-Wei Chang at UCLA.
- Class Meets
  - Mondays and Wednesdays, 3:00-4:15pm; Kendeda Building 152
- CS4803: To be announced!
- CS7643: To be announced!
- Staff Mailing List
- Office Hours
  - Ian Stewart: Tuesdays, 2-4pm, CODA C1106
  - Jiaao Chen: Thursdays, 2-4pm, CODA C1008
  - Nihal Singh: Fridays, 9-11am, CODA C1008
  - Jingfeng Yang: Mondays, 10am-12pm, CODA 14th floor common area
Note: this schedule is tentative and subject to change.
|W1: Jan 6||Introduction to NLP|
|W1: Jan 8||Slides, HW1 Out|
|W2: Jan 13||Neural Networks for Text Classification|
|W2: Jan 15||Slides, HW2 Out|
|W3: Jan 20||MLK Day: No class|
|W3: Jan 22||Introduction to Deep Learning, PyTorch Tutorial|
- 45% Homework Assignments
  - Homework 1: 6%
  - Homework 2: 13%
  - Homework 3: 13%
  - Homework 4: 13%
- 15% Midterm Exam
  - No make-up exam except in an emergency
- 40% Course Project
  - Project proposal (2 pages): 5%
  - Midway report (4 pages): 10%
  - Final report (8 pages): 20%
  - Presentation (in class): 5%
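As a quick sanity check on the breakdown above, the weights can be written down and combined into a weighted total. This is only an illustrative sketch: the component names and the `final_score` helper are hypothetical, and it assumes every component is scored on a 0-100 scale, which the syllabus does not specify. It says nothing about how totals map to letter grades.

```python
# Weights (percent of the final grade) copied from the grading breakdown.
# The dictionary keys are hypothetical names chosen for this sketch.
WEIGHTS = {
    "hw1": 6,
    "hw2": 13,
    "hw3": 13,
    "hw4": 13,
    "midterm": 15,
    "proposal": 5,
    "midway_report": 10,
    "final_report": 20,
    "presentation": 5,
}

# The components should account for the whole grade.
assert sum(WEIGHTS.values()) == 100


def final_score(scores):
    """Return the weighted total for a dict of component -> score.

    Assumption: each score is on a 0-100 scale.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


if __name__ == "__main__":
    example = {name: 80 for name in WEIGHTS}
    example["final_report"] = 100  # a strong final report
    print(final_score(example))
```

Because the final report carries 20% of the grade, raising it from 80 to 100 in the example moves the total by 4 points.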
Students will have a total of four late days to use when turning in homework assignments; each late day extends the deadline by 24 hours. There are no restrictions on how the late days can be used (e.g., all 4 could be used on one homework). Using late days will not affect your grade. However, homework submitted late after all late days have been used will receive no credit.
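The late-day arithmetic can be sketched as follows. The function names are hypothetical, and two details are assumptions rather than stated policy: that any fraction of a 24-hour period counts as a full late day, and that a submission needing more late days than remain receives no credit without consuming the remaining days. Check with the course staff for the actual rules.

```python
import math

TOTAL_LATE_DAYS = 4  # total late days per student, from the policy above


def late_days_needed(hours_late):
    """Late days required for a submission that is `hours_late` hours past due.

    Assumption: a partial 24-hour period counts as a full late day.
    """
    if hours_late <= 0:
        return 0
    return math.ceil(hours_late / 24)


def apply_late_policy(hours_late_per_hw):
    """Process homeworks in order; return (credit flags, late days remaining)."""
    remaining = TOTAL_LATE_DAYS
    credit = []
    for hours in hours_late_per_hw:
        needed = late_days_needed(hours)
        if needed <= remaining:
            remaining -= needed
            credit.append(True)
        else:
            # Assumption: no credit, and the remaining late days are not consumed.
            credit.append(False)
    return credit, remaining


if __name__ == "__main__":
    # On time, 30 hours late, 48 hours late, then 1 hour late.
    print(apply_late_policy([0, 30, 48, 1]))
```

In the example, the second and third homeworks use two late days each, so the fourth, although only an hour late, has no late days left to draw on.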
Attendance will not be taken, but you are responsible for knowing what happens in every class. The instructor will try to post slides and notes online, and to share announcements, but there are no guarantees. So if you cannot attend class in person, make sure you check in with someone who was there.
Respect your classmates and your instructor by avoiding distractions. This means arriving on time, turning off your cell phone, and saving side conversations for after class.
Multiple studies have shown that using a laptop in class – even for taking notes – reduces students’ educational attainment. Try pen and paper for a few weeks, and see if it helps you concentrate. Whatever technology you decide to use, it is your responsibility to ensure that it does not distract your classmates or the instructor.
The official prerequisite for CS 4650 is CS 3510/3511, “Design and Analysis of Algorithms.” This prerequisite is essential because understanding natural language processing algorithms requires familiarity with dynamic programming, as well as automata and formal language theory: finite-state and context-free languages, NP-completeness, etc. While course prerequisites are not enforced for graduate students, prior exposure to analysis of algorithms is very strongly recommended.
Furthermore, this course assumes:
- Good coding ability, at least at the level of a third- or fourth-year undergraduate CS major. Assignments will be in Python.
- Background in basic probability, linear algebra, and calculus.
- Familiarity with machine learning is helpful but not assumed. Of particular relevance are linear classifiers: perceptron, naive Bayes, and logistic regression.
People sometimes want to take the course without having all of these prerequisites. Frequent cases are:
- Junior CS students with strong programming skills but limited theoretical and mathematical background,
- Non-CS students with strong mathematical background but limited programming experience.
Students in the first group struggle on the exam and have trouble following the lectures, and students in the second group struggle with the problem sets. My advice is to get the background material first, and then take this course.
This semester-long project will involve one to three students and should focus on natural language processing – either on core NLP methods or on using NLP in support of an empirical research question. The project will consist of four components:
- Project proposal. Students will propose the research question to be examined, motivate its rationale as an interesting question worth asking, and assess its potential to contribute new knowledge by situating it within related literature in the scientific community. (2 pages, excluding references)
- Midway report. By the middle of the course, students should present initial experimental results and establish a validation strategy to be performed at the end of experimentation. (4 pages, excluding references)
- Final report. The final report will include a complete description of work undertaken for the project, including data collection, development of methods, experimental details (complete enough for replication), comparison with past work, and a thorough analysis. Projects will be evaluated according to standards including clarity, originality, soundness, substance, evaluation, meaningful comparison, and impact (of ideas, software, and/or datasets). (8 pages, excluding references)
- Presentation. At the end of the semester, teams will present their work in a presentation session.
All reports should use the ACL 2020 style files for either LaTeX or Microsoft Word.
The class is full. Can I still get in?
Sorry. The course admins in CoC control this process. Please talk to them.
I am graduating this Fall and I need this class to complete my degree requirements. What should I do?
Talk to the advisor or graduate coordinator for your academic program. They are keeping track of your degree requirements and will work with you if you need a specific course.
Can I audit this class or take it pass/fail?
No. Due to the large demand for this class, we will not be allowing audits or pass/fail. Letter grades only. This is to make sure students who want to take the class for credit can.
Can I simply sit in the class (no credits)?
In general, we welcome members of the Georgia Tech community (students, staff, and/or faculty) to sit in. Out of courtesy, we would appreciate it if you let us know beforehand (via email or in person). If the classroom is full, we ask that you please allow registered students to attend.
I have a question. What is the best way to reach the course staff?
Registered students – your first point of contact is Piazza (so that other students may benefit from your questions and our answers). If you have a personal matter, email us at the class mailing list firstname.lastname@example.org
Related Classes (not exhaustive!)
- Introduction to NLP, CMU
- Algorithms for NLP, CMU
- Natural Language Processing, UT Austin
- Natural Language Processing, UC Berkeley
- Natural Language Processing with Deep Learning, Stanford