
CS 434: Machine Learning and Data Mining
Fall 2007
MWF 12:00 - 12:50, OWEN 103
Email: xfern@eecs.oregonstate.edu
Office: Kelly 3073
Office hours: WF 1-2pm, or by appointment
Class email list: cs434-f07@engr.oregonstate.edu
Machine learning and data mining form a subfield of artificial intelligence that develops computer programs that can learn from past experience and find useful patterns in data. This field has provided many tools that are widely used and are making a significant impact in both industrial and research settings. Application domains include personalized spam filters, HIV vaccine design, handwritten digit recognition, face recognition, credit card fraud detection, unmanned vehicle control, medical diagnosis, and intelligent web search.
This course provides a basic introduction to this dynamic and fast-advancing field. Topics include the three basic branches of the field: (1) supervised learning for prediction problems (learn to predict); (2) unsupervised learning for clustering data and discovering interesting patterns in data (learn to understand); and (3) reinforcement learning for learning to select actions based on positive and negative feedback (learn to act). The course has a special focus on the practical side: students will not only learn various machine learning and data mining techniques, but also learn how to apply them to real problems in practice.
Course Policy
Midterm contest
Below are the links to the training set and the test set for the contest problem. Here the task is to predict customers' reactions to a certain coupon offer based on their history with 20 other coupons. There are three possible outcomes: the customer prints no coupon (N), prints coupon A (A), or prints coupon B (B).
Note that the test file does not have the class labels, so it has one attribute fewer than the training set. Weka will not directly take this test set for evaluation because it contains a different number of attributes. One way to deal with this is to add a fake class label to each test example and just have Weka record the predicted labels.
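The fake-label workaround can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes each test example is one comma-separated line with the class as the last attribute, and the function name `add_placeholder_label` and the default placeholder `"N"` are hypothetical choices, not part of the contest files. If the data comes with a header section, that part would need to be handled separately.

```python
def add_placeholder_label(rows, placeholder="N"):
    """Append a placeholder class label to each comma-separated example.

    Assumption: one example per line, class attribute goes last.
    The placeholder is ignored when only the predicted labels are recorded.
    """
    labeled = []
    for row in rows:
        row = row.rstrip("\r\n")
        if not row.strip():
            continue  # skip blank lines
        labeled.append(f"{row},{placeholder}")
    return labeled


if __name__ == "__main__":
    # Two made-up test examples, for illustration only.
    sample = ["1,0,0,1", "0,1,1,0"]
    for line in add_placeholder_label(sample):
        print(line)
```

Any of the three valid labels (N, A, B) works as the placeholder, since the reported accuracy on the fake labels is meaningless and only the predictions matter.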
The deadline for submitting your results is Saturday, Nov 3rd, 11:59 PM. Please email me the following items: (1) the predicted labels for all test examples in a text file, one line per example; (2) a short description of your method (no more than 1 page).
Note that both the training and test sets contain a large number of examples, so learning and testing will take some time. Tuning algorithms and doing model selection may take longer than you expect. I recommend that you start early and get a reasonable baseline performance first.
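As one possible starting point for a baseline, the sketch below predicts the most frequent training label for every example and measures accuracy on held-out labeled data. The majority-class baseline is an assumption chosen here for illustration, not a method prescribed for the contest, and the helper names `majority_label` and `accuracy` are hypothetical.

```python
from collections import Counter


def majority_label(train_labels):
    """Return the most frequent class label in the training data."""
    if not train_labels:
        raise ValueError("train_labels must not be empty")
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the actual label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


if __name__ == "__main__":
    # Made-up labels, for illustration only.
    train = ["N", "A", "N", "B", "N"]
    held_out = ["N", "A", "N", "N"]
    guess = majority_label(train)
    print(guess, accuracy([guess] * len(held_out), held_out))
```

Any learned model should beat this number on held-out data before time is spent on tuning.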
Finally, anyone who achieves a reasonable accuracy will get half of the total points. The rest of the points will be granted based on the ranking of the results.
Course materials
- Text: Introduction to Machine Learning, by Ethem Alpaydin, MIT Press
- Optional reference: Data Mining: Practical Machine Learning Tools and Techniques (2nd edition), by Ian H. Witten and Eibe Frank, Morgan Kaufmann; placed on reserve in the Valley Library.
Lecture notes and reading materials will be posted on the webpage; please check regularly.
Learning objectives
Upon completing the course, students are expected to be able to:
1) Apply supervised learning algorithms to prediction problems and evaluate the results.
2) Apply unsupervised learning algorithms to data analysis problems and evaluate the results.
3) Apply reinforcement learning algorithms to control problems and evaluate the results.
4) Take a description of a new problem and decide what kind of problem (supervised, unsupervised, or reinforcement) it is.
Lecture Schedule
| Date | Topics | Reading | Assignments |
|---|---|---|---|
| 9/24 M | Introduction; reinforcement learning demo | | Assignment 1; Solution |
| 9/26 W | LTU, perceptron | | |
| 9/28 F | KNN | | |
| 10/1 M | Off-the-shelf classifier, DT | | Assignment 2 |
| 10/3 W | Decision tree cont. | | |
| 10/5 F | Review of probability | | |
| 10/8 M | Naive Bayes classifier | | Assignment 3; Solution |
| 10/10 W | Naive Bayes cont.; logistic regression | | |
| 10/12 F | LR cont., SVM | | |
| 10/15 M | SVM | | |
| 10/17 W | SVM cont.; ensemble learning | | Assignment 4; Assignment 4 written part solution; Final project |
| 10/19 F | Ensemble learning cont. | | |
| 10/22 M | Feature selection | | |
| 10/24 W | Supervised learning summary | | |
| 10/26 F | Unsupervised learning overview, clustering | | |
| 10/29 M | Midterm | | |
| 10/31 W | Unsupervised learning overview, HAC clustering | | |
| 11/2 F | K-means | | |
| 11/5 M | Mixture of Gaussian models | | |
| 11/7 W | Unsupervised dimension reduction | | |
| 11/9 F | PCA, association rule mining | | |
| 11/12 M | Association rule cont. (slides: see above) | | Assignment 5; cluster.csv; random.csv; Solution |
| 11/14 W | Example application: mining strategies from HCI data; reinforcement learning, MDP | | |
| 11/16 F | MDP cont. | | |
| 11/19 M | MDP III | | |
| 11/21 W | Reinforcement learning, passive learning | | HW6; Solutions |
| 11/23 F | Thanksgiving holiday | | |
| 11/26 M | Active reinforcement learning, Q-learning | | |
| 11/28 W | Function approximation | | |
| 11/30 F | Final review | | |