Project Proposal due Friday Nov 10.
Project presentations Tue/Thu Nov 28/30. Each group has ~5 minutes.
Final Report due Sun Dec 3 (no extensions).

Create your project groups under People "Project Groups".
Once the proposal is submitted, no group changes are allowed.

------------------

Proposal (2 pages)

You must have the following sections:

1. Problem
   what problem are you trying to solve? why is it interesting?

2. Background
   what techniques have been tried on this problem? how did they perform?

3. Techniques
   what techniques (e.g. SVM) are you going to use?

4. Data
   you must have the training, dev, and test sets ready by proposal submission.

5. Evaluation
   how do you know if you are successful?
   evaluation metrics include accuracy (you need to define it), speed, etc.

Notes:

1. It is fine to use existing tools.
2. It is fine to reimplement an existing algorithm, provided there is no existing implementation.
3. It is fine to improve an existing algorithm based on an existing codebase.
4. It is fine to propose a new problem. In that case, also review related problems in Section 3.
5. It is fine to use techniques not covered in this course, such as deep learning.
6. It is fine to have negative results. If you have some new ideas, explore them.
   Negative results will not be penalized.
7. Kaggle competitions and the UCI Machine Learning Repository have lots of datasets.
8. The amount of work in this project should be ~2.5x HW1.

Some Possible Topics (see also previous years' topics from the course website)

* Computer Vision Tasks
  -- Object recognition
     http://pascallin.ecs.soton.ac.uk/challenges/VOC/
  -- Face recognition
     http://www.ryanmwhite.com/research/tr_hot.html
     http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mitchell/ftp/faces.html
  -- Scene recognition (e.g. house vs. no house)
  -- Object recognition in aerial/satellite images
  See http://elm.eeng.dcu.ie/~oconaire/cv_datasets.html for a list of
  computer vision benchmark data sets

* Audio Analysis Tasks
  -- Speaker identification - recognize a particular speaker
  -- Speaker sentiment - happy vs. angry vs. sad ...
  -- Music genre
  -- Language recognition - Chinese vs. English
  -- Bird species recognition and discovery by songs (data available from instructor)

* Text Classification and Clustering
  -- Spam filtering
  -- Newsgroup document classifier
  -- Sentiment classification
  -- There are a few text classification datasets on Andrew McCallum's webpage

* Natural Language Processing
  -- Part-of-speech tagging or named-entity recognition
  -- Chunking
  -- Syntactic parsing
  -- Co-reference resolution, as a classification or a clustering problem
  -- Event and argument extraction from text
  -- Entity linking from text to knowledge base

* Bio-informatics
  -- Gene clustering/expression profile analysis
  -- Gene sequence analysis
  -- Classifying an RNA sequence as protein coding or non-coding
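
Sketch for the Data and Evaluation sections of the proposal: one possible way to fix the training, dev, and test sets before submission and to state exactly what "accuracy" means. This is only an illustration, not a required method. The function names (split_dataset, accuracy), the 80/10/10 default split, and the fixed seed are all assumptions made for the example; pick whatever split and metric definition fit your problem.

```python
import random


def split_dataset(examples, dev_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle once with a fixed seed, then cut into train/dev/test.

    The fractions and the seed are illustrative defaults (assumptions),
    not values prescribed by the course.
    """
    items = list(examples)
    # A private Random instance keeps the split reproducible across runs.
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_frac)
    n_dev = int(len(items) * dev_frac)
    test = items[:n_test]
    dev = items[n_test:n_test + n_dev]
    train = items[n_test + n_dev:]
    return train, dev, test


def accuracy(gold, predicted):
    """One possible definition: fraction of examples whose predicted
    label exactly matches the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)
```

Calling split_dataset once and saving the three lists to disk means every later experiment sees the same dev and test examples. Tune on dev only, and report accuracy on test once at the end.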