CS 519, Applied Machine Learning (e-campus), Spring 2019

“Equations are just the boring part of mathematics. I attempt to see things in terms of geometry.”
-- Stephen Hawking (1942--2019)

Coordinates [Syllabus] [Canvas] [Registrar]
Instructor Liang Huang (liang.huang@...)
TA & Office Hours Liang Zhang (zhanglia@...). Office Hours: M/W 7-8pm and F 5-6pm webex link
Prerequisites
  • CS: algorithms and data structures. fluent in at least one mainstream language (Python, C/C++, Java).
    HWs will be done in Python+numpy only.
  • Math: linear algebra, calculus, and basic probability theory. geometric intuitions.

Textbooks
  • Hal Daume III. A Course in Machine Learning (CIML). default reference. easy to understand.
  • Bishop (2007). Pattern Recognition and Machine Learning (PRML). Actually I do not recommend it for beginners. But the figures are pretty and I use them in my slides.
Grading
  • Background survey (on Canvas): each student gets 2% by submitting on time.
  • Quizzes (on Canvas): 10% + 5% = 15%. everybody has two attempts.
  • EX (theory, concepts): 8%. graded by completeness, not correctness.
  • HWs (programming): 15% x 5 = 75%. graded by correctness.
    In Python+numpy only, on a Unix-like environment (Linux or Mac OS X). Windows is not supported. IDEs are neither necessary nor recommended.
  • Late Penalty: Each student can submit up to 24 hours late once without penalty. Any other late submission will not be accepted.
  • Curve: A/A-: ~45%; B+/B/B-: ~50%; C+ and below: ~5%.
Organization This course is organized into 5 Units, each spanning 2 weeks with 1 programming HW,
which is usually out on a Monday and due on the Saturday of the following week. See syllabus for more details.
Communication
  • We post textbook, slides, videos, homework, data, readings here on this homepage.
  • Canvas is only used for announcements, discussions, homework submission, quizzes, and grades.
  • Please post all course-related questions on Canvas so that the whole class may benefit from our conversation.
  • Please post questions on Canvas under the corresponding Units (e.g. "Unit 1 Q/A").
  • Please contact us privately only for matters of a personal nature.
  • As a strictly enforced course policy, we will not reply to any technical questions via email.

Machine Learning revolves around a central question: How can we make computers learn from experience without being explicitly programmed? In the past decade, machine learning has given us self-driving cars, practical speech recognition, effective web search, accurate spam filters, and a vastly improved understanding of the human genome. Machine learning is so pervasive today that everybody uses it dozens of times a day without knowing it.

This course will survey the most important algorithms and techniques in the field of machine learning. The treatment of Math will be rigorous, but unlike most other machine learning courses which feature tons of equations, my version will focus on the geometric intuitions and the algorithmic perspective. I will try my best to visualize every concept.

Even though machine learning appears to be "mathy" on the surface, it is not abstract in any sense, unlike mainstream CS (algorithms, theory, programming languages, etc.). In fact, machine learning is so applied and empirical that it is more like alchemy. So we will also discuss practical issues and implementation details.
Some preparatory materials:
Weekly Materials (see syllabus for an overview)
Week Topics (CIML References) Slides/Handouts Videos homework/exercises/quizzes
Unit 1: ML intro, k-NN, and math/numpy review
1 1-2. Introduction (0, 2.7)
3. Training, Test, and Generalization Errors (2.5, 2.6)
    Underfitting and overfitting (2.4)
    Leave-one-out cross-validation (5.6).
4. k-nearest neighbor classifier (k-NN) (3)

5. viewing HW1 data on terminal
6-7. data pre-processing: binarization
slides (1-5)

notebook (6-7)
background survey (required)
Quiz 1 (ML basics)
HW1 out (k-NN) [data]
2 Geometric Review of Linear Algebra
Numpy Tutorial (also matplotlib):
1. ipython notebook; ndarray; %pylab; +/-/*, dot, concat
2. linear regression; np.polyfit; np.random.rand(); broadcasting
3. visualizing vector operations; dot product, projection
handout slides

notebook
Quiz 2 (numpy/linear algebra)
HW1 due
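A minimal sketch of how the Unit 1 topics fit together: a k-NN classifier whose pairwise distances are computed with numpy broadcasting. This is an illustration only, not the HW1 solution. The function name `knn_predict`, the toy data, and the choice of Euclidean distance with majority voting over non-negative integer labels are all assumptions made for this example.

```python
import numpy as np

def knn_predict(train_X, train_y, query_X, k=3):
    """Label each query row by majority vote among its k nearest training rows."""
    # Broadcasting: (m, 1, d) - (1, n, d) -> (m, n, d) pairwise differences.
    diffs = query_X[:, None, :] - train_X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))      # Euclidean distances, shape (m, n)
    nearest = np.argsort(dists, axis=1)[:, :k]     # indices of the k closest training rows
    votes = train_y[nearest]                       # neighbor labels, shape (m, k)
    # Majority vote per query; assumes labels are non-negative integers.
    return np.array([np.bincount(row).argmax() for row in votes])

# Hypothetical toy data: two well-separated clusters in 2D.
train_X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
train_y = np.array([0, 0, 1, 1])
query_X = np.array([[0.05, 0.1], [1.0, 0.9]])

predictions = knn_predict(train_X, train_y, query_X, k=3)
```

With an odd k and two classes, the vote can never tie, which is one reason odd values of k are a common choice for binary problems.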
Unit 2: Linear Classification and Perceptron Algorithm
3 Linear Classification and Perceptron
1. Historical Overview; Bio-inspired Learning (4.1)
2. Linear Classifier (4.3, 4.4); Augmented space (not in CIML)
3. Perceptron Algorithm (4.2)
4. Convergence Proof (4.5)
5. Limitations and Non-Linear Feature Map (4.7, 5.4)

6. ipynb demo
slides

notebook (from [181])
EX out and due [tex]
HW2 out [tex]
(same data as HW1)
4 Perceptron Extensions; Perceptron in Practice
1. Python demo
2. Perceptron Extensions: voted and averaged (4.6)
3. MIRA and aggressive MIRA (not in CIML)
4. Practical Issues (5.1, 5.2, 5.3, 5.4)
5. Perceptron vs. Logistic Regression (9.6)
slides

demo
HW2 due
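A minimal sketch of the perceptron update in augmented space, together with a naive averaged variant, under the assumption that labels are in {-1, +1}. This is an illustration only, not the HW2 solution. The names `train_perceptron` and `predict`, the toy data, and the early stop after a mistake-free epoch are assumptions made for this example.

```python
import numpy as np

def augment(X):
    """Prepend a constant 1 to every row so the bias becomes part of w."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def train_perceptron(X, y, epochs=10):
    """Return (w, w_sum): the final weights and the running sum of weights."""
    X_aug = augment(X)
    w = np.zeros(X_aug.shape[1])
    w_sum = np.zeros_like(w)          # sum of w after every example (naive averaging)
    for _ in range(epochs):
        mistakes = 0
        for x_i, y_i in zip(X_aug, y):
            if y_i * np.dot(w, x_i) <= 0:   # misclassified or on the boundary
                w = w + y_i * x_i           # perceptron update
                mistakes += 1
            w_sum += w
        if mistakes == 0:                   # converged on the training set
            break
    return w, w_sum

def predict(w, X):
    return np.where(augment(X) @ w > 0, 1, -1)

# Hypothetical linearly separable toy data.
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

w, w_sum = train_perceptron(X, y)
```

The sketch returns the sum rather than the average because dividing by a positive count does not change the sign of the dot product, so `w_sum` gives the same predictions as the averaged weight vector.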
Unit 3: Linear and Polynomial Regression
5-6 linear and polynomial regression (not in CIML) HW3 (housing price prediction)
kaggle data tex
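A minimal sketch of polynomial regression as linear regression on polynomial features, checked against `np.polyfit`. The synthetic data, the noise level, and the degree are assumptions made for this example; it does not use the HW3 housing data.

```python
import numpy as np

# Hypothetical synthetic data: y = 1 + 2x - 3x^2 plus small Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 30)
y = 1.0 + 2.0 * x - 3.0 * x**2 + 0.01 * rng.standard_normal(x.shape)

# Polynomial regression is linear regression on the features [1, x, x^2].
Phi = np.vander(x, N=3, increasing=True)
w_lstsq, *_ = np.linalg.lstsq(Phi, y, rcond=None)

# np.polyfit returns coefficients with the highest power first, so reverse them.
w_polyfit = np.polyfit(x, y, deg=2)[::-1]
```

Both routes solve the same least-squares problem, so `w_lstsq` and `w_polyfit` should agree up to numerical precision and land close to the generating coefficients (1, 2, -3).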
Unit 4: Applications: Text Classification
7-8 Application: Text Classification
Sentiment Analysis (thumbs up?)
HW4 [tex]
data and code
Unit 5: Exposure to cutting-edge ML research
9-10 Paper Review: cutting-edge ML topics and papers HW5

Classical Papers:
Tech Giants Are Paying Huge Salaries for Scarce A.I. Talent (New York Times)