CS 434: Machine Learning and Data Mining

Fall 2007

MWF 12:00 - 12:50 OWEN 103




Instructor: Xiaoli Fern

Email: xfern@eecs.oregonstate.edu
Office: Kelley 3073
Office hours: WF 1-2pm, or by appointment
Class email list: cs434-f07@engr.oregonstate.edu


Machine learning and data mining form a subfield of artificial intelligence that develops computer programs that learn from past experience and find useful patterns in data. The field has produced many tools that are widely used and have had significant impact in both industrial and research settings. Application domains include personalized spam filters, HIV vaccine design, handwritten digit recognition, face recognition, credit card fraud detection, unmanned vehicle control, medical diagnosis, and intelligent web search.

This course provides a basic introduction to this dynamic and fast-advancing field. Topics cover its three basic branches: (1) supervised learning for prediction problems (learn to predict); (2) unsupervised learning for clustering data and discovering interesting patterns in data (learn to understand); and (3) reinforcement learning for learning to select actions based on positive and negative feedback (learn to act). The course has a special focus on the practical side: students will learn not only various machine learning and data mining techniques, but also how to apply them to real problems.

Syllabus

Course Policy


Midterm contest

Below are the links to the training set and the test set for the contest problem. Here the task is to predict customers' reactions to a certain coupon offer based on their history with 20 other coupons. There are three possible outcomes: the customer prints no coupon (N), prints coupon A (A), or prints coupon B (B).
Note that the test file does not include the class labels, so it has one fewer attribute than the training set. Weka will not accept this test set for evaluation directly because its number of attributes differs from that of the training set. One way to deal with this is to add a fake class label to each test example and have Weka record only the predicted labels.
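The fake-label workaround described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes each test example is one comma-separated row with no header and that the class is the last attribute. The function name add_placeholder_label and the choice of "N" as the fake label are hypothetical, not part of the contest files. If the data is in ARFF format, the header would also need a matching class attribute declaration, which this sketch does not handle.

```python
def add_placeholder_label(rows, placeholder="N"):
    """Append a fake class value to every non-empty data row.

    `rows` is an iterable of comma-separated strings (assumed format);
    `placeholder` is an arbitrary label, ignored when only predictions
    are recorded.
    """
    labeled = []
    for row in rows:
        row = row.rstrip("\r\n")
        if not row.strip():
            continue  # skip blank lines
        labeled.append(f"{row},{placeholder}")
    return labeled


if __name__ == "__main__":
    # Toy rows standing in for test examples (made-up values).
    sample = ["1,0,1\n", "0,0,1\n"]
    for line in add_placeholder_label(sample):
        print(line)
```

The accuracy Weka reports against these fake labels is meaningless; only the recorded predictions matter.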
The deadline for submitting your results is Saturday, Nov 3rd, 11:59 PM. Please email me the following items: (1) the predicted labels for all test examples in a text file, one line per example; (2) a short description of your method (no more than 1 page).
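As a rough sketch of the submission format, the following Python writes one predicted label per line and rejects anything other than the three outcomes named above (N, A, B). The function names and the output file name are hypothetical; how predictions are obtained from your learner is up to you.

```python
VALID_LABELS = {"N", "A", "B"}  # the three outcomes in the contest description


def format_predictions(labels):
    """Return the submission text: one label per line, in test-set order."""
    for i, label in enumerate(labels, start=1):
        if label not in VALID_LABELS:
            raise ValueError(f"example {i}: unexpected label {label!r}")
    return "".join(f"{label}\n" for label in labels)


def write_predictions(path, labels):
    """Write the formatted predictions to a text file."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(format_predictions(labels))


if __name__ == "__main__":
    # Toy predictions; "predictions.txt" is a hypothetical file name.
    write_predictions("predictions.txt", ["N", "A", "B", "N"])
```

Checking labels before writing catches stray header lines or extra columns that may end up in a prediction dump.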
Note that both the training and test sets contain a large number of examples, so learning and testing will take some time. Tuning algorithms and doing model selection may take longer than you expect. I recommend that you start early and get a reasonable baseline performance first.
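One simple baseline to start from is the majority-class predictor, which always outputs the most frequent training label. The sketch below is illustrative only: the label lists are made-up toy data, not the contest data, and the helper names are hypothetical. Any learned model should beat this accuracy on held-out training examples.

```python
from collections import Counter


def majority_class(labels):
    """Return the most frequent label in the training data."""
    return Counter(labels).most_common(1)[0][0]


def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the actual one."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Toy labels (made up) split into a training part and a held-out part.
train_labels = ["N", "N", "A", "N", "B", "N", "A"]
holdout_labels = ["N", "A", "N", "B"]

baseline = majority_class(train_labels)
predictions = [baseline] * len(holdout_labels)
print(baseline, accuracy(predictions, holdout_labels))
```

Holding out part of the training set is needed here because the contest test set has no labels to score against.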
Finally, anyone who achieves reasonable accuracy will get half of the total points. The remaining points will be awarded based on the ranking of the results.

training set; test set


Midterm solution


Course materials

Lecture notes and reading materials will be posted on the webpage; please check regularly.

Learning objectives

Upon completing the course, students are expected to be able to:
1) Apply supervised learning algorithms to prediction problems and evaluate the results.
2) Apply unsupervised learning algorithms to data analysis problems and evaluate the results.
3) Apply reinforcement learning algorithms to control problems and evaluate the results.
4) Take a description of a new problem and decide what kind of problem (supervised, unsupervised, or reinforcement) it is.


Lecture Schedule

Date | Topics | Reading | Assignments
9/24 M | Introduction; reinforcement learning demo | | Assignment 1; Solution
9/26 W | LTU, perceptron | |
9/28 F | KNN | |
10/1 M | Off-the-shelf classifier, DT | | Assignment 2
10/3 W | Decision tree cont. | |
10/5 F | Review of probability | |
10/8 M | Naive Bayes classifier | | Assignment 3; Solution
10/10 W | Naive Bayes cont.; logistic regression | |
10/12 F | LR cont.; SVM | |
10/15 M | SVM | |
10/17 W | SVM cont.; ensemble learning | | Assignment 4; Assignment 4 written part solution; Final project
10/19 F | Ensemble learning cont. | |
10/22 M | Feature selection | |
10/24 W | Supervised learning summary | |
10/26 F | Unsupervised learning overview, clustering | |
10/29 M | Midterm | |
10/31 W | Unsupervised learning overview, HAC clustering | |
11/2 F | K-means | |
11/5 M | Mixture of Gaussian models | |
11/7 W | Unsupervised dimension reduction | |
11/9 F | PCA, association rule mining | |
11/12 M | Association rules cont. (see slides above) | | Assignment 5; cluster.csv; random.csv; Solution
11/14 W | Example application: mining strategies from HCI data; reinforcement learning, MDP | |
11/16 F | MDP cont. | |
11/19 M | MDP III | |
11/21 W | Reinforcement learning, passive learning | | HW6; Solutions
11/23 F | Thanksgiving holiday | |
11/26 M | Active reinforcement learning, Q-learning | |
11/28 W | Function approximation | |
11/30 F | Final review | |