CS 519, Applied Machine Learning (e-campus), Spring 2018

“Equations are just the boring part of mathematics. I attempt to see things in terms of geometry.”
-- Stephen Hawking (1942--2018)

Coordinates

[Registrar] [Canvas] [syllabus]

Instructor

Liang Huang (liang.huang@...)

TAs & Office Hours

Kaibo Liu (lead)	liukaib@...	T/Th 5-6pm, link
Yilin Yang	yangyil@...	W 5-6pm, link
He Zhang	zhangh7@...	M 5-6pm, link
Dezhong Deng	dengde@...	F 5-6pm, link

Prerequisites

CS: algorithms and data structures. fluent in at least one mainstream language (Python, C/C++, Java).
HWs will be done in Python+numpy only.
Math: linear algebra, calculus, and basic probability theory. geometric intuitions.

Textbooks

Hal Daumé III. A Course in Machine Learning (CIML). default reference. easy to understand.
Bishop (2007). Pattern Recognition and Machine Learning (PRML). Actually I do not recommend it for beginners. But the figures are pretty and I use them in my slides.

Grading

EXs (theory, concepts): 5% x 3 = 15%. Due on Saturdays. graded by completeness, not correctness.
Quizzes (on Canvas): 5% x 3 = 15%. Due on Saturdays. everybody has two attempts.
HWs (programming): 15% x 3 = 45%. Due on Tuesdays. graded by correctness, including accuracy of predictions.
In Python+numpy only, on a Unix-like environment (Linux or Mac OS X). Windows is not supported. IDEs are neither necessary nor recommended.
NO FINAL EXAM.
Class Participation: 5%.
Late Penalty: Each student may submit late by up to 24 hours once without penalty. No other late submissions will be accepted.
Curve: A/A-: ~45%; B+/B/B-: ~50%; C+ and below: ~5%.

Email Policy

Please post all course-related questions on Canvas so that the whole class may benefit from our conversation. Please contact us privately only for matters of a personal nature (by default, please cc all TAs unless you want to complain about a TA). As a course policy we will not reply to any technical questions via email.

Machine Learning revolves around the following central question: How can we make computers act without being explicitly programmed and improve with experience? In the past decade, machine learning has given us self-driving cars, practical speech recognition, effective web search, accurate spam filters, and a vastly improved understanding of the human genome. Machine learning is so pervasive today that you probably use it dozens of times a day without knowing it.
This course will survey the most important algorithms and techniques in the field of machine learning. The treatment of math will be rigorous, but unlike most other machine learning courses, which feature tons of equations, my version will focus on the geometric intuitions and the algorithmic perspective. I will try my best to visualize every concept.
Even though machine learning appears to be "mathy" on the surface, it is not abstract at all, unlike mainstream CS (algorithms, theory, programming languages, etc.). In fact, machine learning is so applied and empirical that it is more like "alchemy". So we will also discuss practical issues and implementation details.

Some preparatory materials:

Linear Algebra:
- a geometric review (recommended, but too basic -- to be supplemented by the following). also slides.
- geometric views of eigenvectors and covariance matrix
- a more advanced, non-geometric review
- a nice textbook with videos
Probability Theory: probs/stats
Python+numpy Tutorial: python/numpy tutorial
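
As a quick self-check on the geometric side of the prerequisites, here is a minimal numpy sketch of the dot product, projection, and broadcasting. It is only an illustration and is not taken from the linked tutorials; the helper name `project` and the toy vectors are assumptions made for this example.

```python
import numpy as np

# Toy vectors chosen for illustration only.
a = np.array([3.0, 4.0])
b = np.array([1.0, 0.0])

def project(a, b):
    """Orthogonal projection of vector a onto the direction of b."""
    return (a @ b) / (b @ b) * b

p = project(a, b)   # component of a along b
r = a - p           # residual, orthogonal to b

# Broadcasting: the (2,) vector of column means is subtracted from every row.
X = np.array([[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]])
X_centered = X - X.mean(axis=0)

print(p, r, a @ b)
print(X_centered)
```

Geometrically, `p` is the shadow of `a` on the line spanned by `b`, and `r` is what is left over, so `r @ b` is zero up to rounding.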

Weekly Materials

Week	Topics and CIML References	Slides	Videos	Homework	Extra
1	Introduction (0, 2.7) Training, Test, and Generalization Errors (2.5, 2.6) Underfitting and overfitting (2.4) Leave-one-out cross-validation (5.6) k-nearest neighbor classifier (k-NN) (3)	pdf		Quiz 1 (on Canvas)
2	Linear Classification and Perceptron 1. Historical Overview; Bio-inspired Learning (4.1) 2. Linear Classifier (4.3, 4.4); Augmented space (not in CIML) 3. Perceptron Algorithm (4.2) 4. Convergence Proof (4.5) 5. Limitations and Non-Linear Feature Map (4.7, 5.4)	pdf		EX1
3	Perceptron Extensions; Perceptron in Practice 1. Python demo 2. Perceptron Extensions: voted and averaged (4.6) 3. MIRA and aggressive MIRA (not in CIML) 4. Practical Issues (5.1, 5.2, 5.3, 5.4) 5. Perceptron vs. Logistic Regression (9.6)	pdf		HW1 data	demo code
4	Review/Tutorial: linear algebra, numpy, matplotlib, geometry 1. ipython notebook; ndarray; %pylab; +/-/*, dot, concatenate 2. linear regression using np.polyfit; np.random.rand(); broadcasting 3. visualizing vectors and vector operations; dot product, projection 4. conditional slicing; np.random.randn(); perc testing and training demo. try: `jupyter notebook numpy_demo.ipynb` 5. binarization: pandas.get_dummies() 6. binarization: python try: `jupyter notebook binarize.ipynb` from `hw1-data` dir	numpy ipynb binarize ipynb		n/a	perc demo2
5-6	linear and polynomial regression (not in CIML)			HW2 due Wed May 16: kaggle data
7-8	SVM (7.7); Kernels (11)			HW3 simple hw1.py
9-10	Application: Text Classification Sentiment Analysis (thumbs up?)			HW4 data and code
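
The perceptron topics of Weeks 2 and 3 (augmented space, the basic update rule, and the averaged variant) can be sketched in a few lines of numpy. This is a minimal sketch under stated assumptions, not the course's demo code: the function names `augment`, `train_perceptron`, and `predict`, the toy data, and the naive running-sum form of averaging are all choices made for this example.

```python
import numpy as np

def augment(X):
    """Append a constant 1 feature so the bias is folded into the weights."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

def train_perceptron(X, y, epochs=10):
    """Train a perceptron on features X (n, d) and labels y in {-1, +1}.

    Returns the final weight vector and the average of the weight
    vectors seen after each example (naive averaged perceptron).
    """
    Xa = augment(X)
    w = np.zeros(Xa.shape[1])
    w_sum = np.zeros_like(w)
    steps = 0
    for _ in range(epochs):
        mistakes = 0
        for x_i, y_i in zip(Xa, y):
            if y_i * (w @ x_i) <= 0:   # wrong side of (or on) the boundary
                w = w + y_i * x_i      # perceptron update
                mistakes += 1
            w_sum += w
            steps += 1
        if mistakes == 0:              # converged on the training set
            break
    return w, w_sum / steps

def predict(w, X):
    """Predict +1 or -1 from the sign of the augmented dot product."""
    return np.where(augment(X) @ w > 0, 1, -1)

# Hypothetical, linearly separable toy data.
X = np.array([[2.0, 2.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

w, w_avg = train_perceptron(X, y)
print(predict(w, X), predict(w_avg, X))
```

Keeping a running sum of the weights is the simplest way to average; it costs an extra vector addition per example, which is fine for small data but is usually optimized in practice.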

Classical Papers:

Rosenblatt 1958 (perceptron)
Novikoff 1962 and a longer 1963 version (perceptron convergence proof)
Vapnik and Chervonenkis 1964. On a class of perceptrons. (translated from Russian) (origin of max-margin and SVM)
Aizerman et al 1964. Theoretical foundations of the potential function method in pattern recognition learning. (translated from Russian, in the same journal) (origin of kernels and kernelized perceptron)
... A LONG GAP ... THEN THE FALL OF USSR ...
Boser, Guyon and Vapnik 1992 (training optimal margin classifier)
Cortes and Vapnik 1995 (SVM)
Freund and Schapire 1999 (large-margin voting/averaged perceptron)
Joachims 2006 (linear SVM in linear time)
Pegasos 2011 (Primal SVM, 2007)
The structured prediction revolution 2001--2003:
- Lafferty et al 2001. CRF. best paper award, ICML.
- Collins 2002. structured perceptron. best paper award, EMNLP.
- Taskar et al 2003. max-margin markov network (structured SVM). best paper award, NIPS.

Tech Giants Are Paying Huge Salaries for Scarce A.I. Talent (New York Times)
