CS 539-001, Natural Language Processing, Fall 2019

Coordinates: T/Th, 4-5:20pm, BEXL 417 [Registrar] [Canvas] [Slack]
Instructor: Liang Huang (huanlian@)
TA: Junkun Chen (chenjun2@)
Office hours: Liang: T 5:30-6pm and Th 3:30-3:50pm, KEC 2069.
Junkun: M/F 3-4pm, KEC Atrium.
Prerequisites
  • required: algorithms: CS 325/519/515.
    A solid understanding of dynamic programming is extremely important.
  • required: proficiency in at least one mainstream programming language (Python, C/C++, Java).
    HWs will be done in Python only. You can learn Python from these slides in 2 hours.
  • recommended: automata and formal language theory: CS 321/516.
  • recommended: machine learning: CS 534.
Textbooks
(optional)
  • Jurafsky and Martin, 2009 (2nd ed.), Speech and Language Processing. (default)
  • Manning and Schütze, 1999, Foundations of Statistical Natural Language Processing.
Grading
(tentative)
  • HWs 1-5: programming homework (in groups of 3): 12% x 4 + 15% = 63%.
  • EXs: simple exercises (individually): 5% x 2 = 10%.
  • HW6 (individually): 12%.
  • midterm: 15%.
  • no final, no project.
Other Policies
  • this course can be used to fulfill the AI area requirement.
  • no late submission will be accepted (since you work in teams).
  • use Slack for discussions, and Canvas for HW submission and checking grades.
  • all course information (slides, HWs, readings, etc.) will be available on this page.
Previous Offerings
MOOCs
(Coursera)
  • Jurafsky and Manning (Stanford)
  • Collins (Columbia) -- more mathematical
Objectives

This course provides an introduction to natural language processing, the study of human language from a computational perspective. We will cover finite-state machines (weighted FSAs and FSTs), syntactic structures (weighted context-free grammars and parsing algorithms), and machine learning methods (maximum likelihood and expectation-maximization). The focus will be on (a) modern quantitative techniques in NLP that use large corpora and statistical learning, and (b) various dynamic programming algorithms (Viterbi, CKY, Forward-Backward, and Inside-Outside). By the end of this course, students should have a good understanding of the research questions and methods used in different areas of natural language processing, be able to use this knowledge to implement simple NLP algorithms and applications, and be able to understand and evaluate original research papers in natural language processing that build on and go beyond the textbook material covered in class.
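To give a concrete taste of the maximum-likelihood estimation mentioned above, below is a minimal sketch of relative-frequency estimation for a bigram language model. The toy corpus and the function name mle_bigram are illustrative only, not part of the course materials:

    from collections import defaultdict

    def mle_bigram(corpus):
        """Maximum-likelihood (relative-frequency) bigram probabilities:
        P(w2 | w1) = count(w1, w2) / count(w1)."""
        bigram = defaultdict(int)
        unigram = defaultdict(int)
        for sentence in corpus:
            words = ["<s>"] + sentence + ["</s>"]
            for w1, w2 in zip(words, words[1:]):
                bigram[(w1, w2)] += 1
                unigram[w1] += 1
        return {(w1, w2): c / unigram[w1] for (w1, w2), c in bigram.items()}

    # toy corpus, made up for illustration
    corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
    probs = mle_bigram(corpus)
    print(probs[("the", "cat")])  # 0.5: "cat" follows "the" in 1 of 2 sentences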

Topics/Slides

Exercises

EXs are to be done individually. They usually prepare you for HWs and the midterm.

Programming Assignments

HWs (to be done in groups of 3) are generally due every other Monday at midnight.
They involve Python implementations of various dynamic programming algorithms such as Viterbi, Forward-Backward, and CKY, as well as machine learning algorithms such as MLE and EM.
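To give a feel for the style of these assignments, here is a minimal sketch of the Viterbi algorithm on a toy HMM; the two-state model, its hand-set probabilities, and all variable names are made up for illustration and do not come from an actual assignment:

    def viterbi(obs, states, start_p, trans_p, emit_p):
        """Most probable state sequence for obs under an HMM: the classic
        O(len(obs) * |states|^2) dynamic program, where best[t][s] is the
        probability of the best path ending in state s after t+1 observations."""
        best = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
        back = [{}]
        for t in range(1, len(obs)):
            best.append({})
            back.append({})
            for s in states:
                prob, prev = max((best[t - 1][p] * trans_p[p][s] * emit_p[s][obs[t]], p)
                                 for p in states)
                best[t][s] = prob
                back[t][s] = prev
        # follow backpointers from the best final state
        last = max(states, key=lambda s: best[-1][s])
        path = [last]
        for t in range(len(obs) - 1, 0, -1):
            path.append(back[t][path[-1]])
        return list(reversed(path))

    # toy two-state HMM with made-up probabilities
    states = ["N", "V"]
    start_p = {"N": 0.6, "V": 0.4}
    trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
    emit_p = {"N": {"fish": 0.7, "sleep": 0.3}, "V": {"fish": 0.4, "sleep": 0.6}}
    print(viterbi(["fish", "sleep"], states, start_p, trans_p, emit_p))  # ['N', 'V']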

Background on Japanese

As you can see from the course materials, unlike most NLP courses, this class (following Kevin Knight's tradition) makes heavy use of the Japanese language as a running example: to demonstrate linguistic diversity, to illustrate transliteration and translation, and to teach the Viterbi and EM algorithms. We do not require any prior knowledge of Japanese, but it helps to be familiar with some linguistic aspects of the language, especially its phonology. Here is a great video on the linguistic background of Japanese.