CS 534, Machine Learning, Fall 2017

“Equations are just the boring part of mathematics. I attempt to see things in terms of geometry.”
-- Stephen Hawking (b. 1942)

Coordinates	T/Th 12-1:20pm, WNGR 149 [Registrar] [Canvas]
Instructor	Liang Huang
TAs	Dezhong Deng, Yilin Yang
Office Hours (tentative)	LH: M 9:30-10am and 5-5:30pm, KEC 2069. TAs: Dezhong T/Th 10-11am, Yilin W/F 4-5pm, both at KEC Atrium. Additional office hours available before exams.
Recitations	Tue 10/3 @ 7:40pm (linear algebra and geometry); Thu 10/5 @ 7pm (probability/statistics, Python/numpy); both at KEC 1001.
Prerequisites	CS: algorithms and data structures; fluency in at least one mainstream language (Python, C/C++, Java). HWs will be done in Python+numpy only. Math: linear algebra, calculus, and basic probability theory; a good sense of geometric intuition.
Textbooks
- Hal Daume III. A Course in Machine Learning (CIML). The default reference; easy to understand.
- Tom Mitchell (1997). Machine Learning. A classical textbook from a CS perspective; an easy read. Outdated but still more helpful than most recent ones.
- Mohri et al (2012). Foundations of Machine Learning. Theory perspective; covers more recent advances such as SVMs that weren't covered in Mitchell.
- Bishop (2007). Pattern Recognition and Machine Learning (PRML). Actually I do not recommend it, definitely not for beginners. But the figures are pretty and I use them in my slides.
Grading
- Midterm: 25%. NO FINAL EXAM.
- Project (groups of up to 3): 25% (5% proposal, 5% presentation, 15% report). No late submission is allowed.
- HWs (programming, groups of up to 3): 10% x 3 = 30%.
- EXs (theoretical, individual): 4% x 2 = 8%.
- Class Participation: 4%.
- Quiz (tentatively before Thanksgiving): 8%.
- Late Penalty: Each student can be late by up to 24 hours only once without penalty. No further late submissions will be accepted. If a group submission is late, it is considered late for all teammates. E.g., if a team of A and B submits late and it's the first late submission from A and the second from B, then A will receive credit for this submission but B will not.
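The grading weights and the group late-penalty rule can be checked with a few lines of Python. This is only an illustrative sketch, not an official grading script: the names `WEIGHTS` and `credited_members` are hypothetical, and it assumes every late submission arrives within the 24-hour window.

```python
from collections import Counter

# Component weights copied from the Grading row, in percent.
WEIGHTS = {
    "midterm": 25,
    "project": 25,        # 5 proposal + 5 presentation + 15 report
    "hws": 30,            # 10 x 3
    "exs": 8,             # 4 x 2
    "participation": 4,
    "quiz": 8,
}


def credited_members(team, late_counts, is_late):
    """Return the teammates who receive credit for one submission.

    late_counts maps a student to the number of late submissions already
    used and is updated in place. Assumption: is_late means "late, but by
    no more than 24 hours".
    """
    if not is_late:
        return list(team)
    credited = []
    for student in team:
        late_counts[student] += 1      # a late group submission counts for everyone
        if late_counts[student] == 1:  # the single penalty-free late submission
            credited.append(student)
    return credited


# The example from the policy: A has never been late, B has been late once.
late = Counter({"B": 1})
print(sum(WEIGHTS.values()))                            # 100
print(credited_members(["A", "B"], late, is_late=True)) # ['A']
```

Keeping a per-student counter, rather than a per-team one, is what makes the A/B example work: the same submission is A's first late one and B's second.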

Machine Learning revolves around the following central question: How can we make computers act without being explicitly programmed and improve with experience? In the past decade, machine learning has given us self-driving cars, practical speech recognition, effective web search, accurate spam filters, and a vastly improved understanding of the human genome. Machine learning is so pervasive today that you probably use it dozens of times a day without knowing it.
This course will survey the most important algorithms and techniques in the field of machine learning. The treatment of math will be rigorous, but unlike most other machine learning courses, which feature tons of equations, my version will focus on the geometric intuitions and the algorithmic perspective. I will try my best to visualize every concept.

The overall structure is quite similar to previous offerings by Prof. Xiaoli Fern (Spring 2012, Spring 2013, Spring 2014, Fall 2014, Fall 2015, Fall 2016). Some new aspects in this offering include:

It covers fewer topics, and goes deeper on some (e.g., structured prediction).
HWs and the project must be done in Python.
It will have in-class demos (ipynb).

You can study the exams from previous offerings, but do not copy HW solutions (since we have different HWs).

See also my previous offering of this course at CUNY.

Some preparatory materials:

Linear Algebra:
- a geometric review (recommended, but too basic -- to be supplemented by the following)
- geometric views of eigenvectors and covariance matrix
- a more advanced, non-geometric review
- a nice textbook with videos
Probability Theory:
Python+numpy Tutorial:

Tentative Schedule:

week	topic	HW/EX due
0	intro	
1	perceptron	
2	perceptron, MIRA	ex1 (perceptron theory)
3	SVM, KKT	hw1 (perceptron, logistic)
4	SVM dual	ex2 (SVM/KKT theory)
5	kernels, k-NN	hw2 (SVM, Pegasos, kernel)
6	midterm	
7	structured prediction	project_proposal
8	k-means, EM	hw3 (structured perceptron)
9	PCA, quiz	
10	project presentations	
11		project_report

Slides, Assignments, and Extra Readings:

intro
perceptron (Chaps. 4-5). Ex1 HW1 data. HW1.
Extra reading: Aggressive MIRA (AMIRA).
SVM (Chap. 7.7). Ex2 demo. HW2
kernels and k-NN (Chaps. 11, 3)
AlphaGo Zero and reinforcement/imitation learning (Chap. 18)
Project Guidelines. Topics from previous years.
structured prediction (Chap. 17)
HW3 (struct prediction) data
unsupervised learning (Chaps. 15-16)

Classical Papers:

Rosenblatt 1958 (perceptron)
Novikoff 1962 and a longer 1963 version (perceptron convergence proof)
Vapnik and Chervonenkis 1964. On a class of perceptrons. (translated from Russian) (origin of max-margin and SVM)
Aizerman et al 1964. Theoretical foundations of the potential function method in pattern recognition learning. (translated from Russian, in the same journal) (origin of kernels and kernelized perceptron)
... A LONG GAP ... THEN THE FALL OF THE USSR ...
Boser, Guyon and Vapnik 1992 (training optimal margin classifier)
Cortes and Vapnik 1995 (SVM)
Freund and Schapire 1999 (large-margin voting/averaged perceptron)
Joachims 2006 (linear SVM in linear time)
Pegasos 2011 (primal SVM, 2007)
The structured prediction trio 2001--2003:
Lafferty et al 2001. CRF. best paper award, ICML.
Collins 2002. structured perceptron. best paper award, EMNLP.
Taskar et al 2003. max-margin Markov network (structured max-margin). best paper award, NIPS.

Tech Giants Are Paying Huge Salaries for Scarce A.I. Talent (New York Times)
