CS 419/519

Information Filtering and Retrieval


Class Meeting Times/Location:
     TR 9:30-10:50 1/9/06-3/17/06

Exam Dates:
     Tuesday, Feb 14th (9:30-10:50AM),
     Thursday, Mar 9th (9:30-10:50AM)

Final Presentation Times:
     Thursday, March 16th (9:30-10:50AM),
     (may change) Friday, March 24th (7:30-9:20AM)


Instructor:
     Jon Herlocker
     Electrical Engineering and Computer Science
     1048 Kelley Engineering Center (Office: Kelley 2053)
     (541) 737-8894
     Fax: (541) 737-3014
     herlock@eecs.oregonstate.edu

Teaching Assistant: Mr. Dana Benson - bensond@eecs.oregonstate.edu

Office Hours:
     Dr. Herlocker: TR 11-Noon in Kelley 2053 (this time may change; check the web site). To make an appointment outside of office hours, send an email request to the address listed above.
     Teaching Assistant: MW 11-Noon in the KEC Computer Lab

Text: Modern Information Retrieval, Ricardo Baeza-Yates, Berthier Ribeiro-Neto. ACM Press: New York. 1999.

Syllabus: IR Syllabus winter 2006.doc


Slides

  • Lecture Slides
  • Guest Speaker Presentation Slides

 


Course Objectives: At the end of this course, you should be able to...

  • Explain what information filtering and retrieval are and recognize core terminology specific to information retrieval and filtering.
  • Explain the basic capabilities of current text information retrieval technology.
  • Describe in detail the Boolean and vector-space retrieval algorithms for text.
  • Design an empirical evaluation for an information retrieval or filtering system.
  • Describe what collaborative filtering is and explain how it works, its strengths and its weaknesses.
  • Describe the purpose of popular text encoding standards and information retrieval protocols.
  • Find relevant articles using library journal indexes, library catalogs, bibliographic citation databases, and article indexes.
  • Describe one topic area of information retrieval in depth that you covered in your class project.

Overall Grading Distribution:

  • One course-long project – 50%
  • Two exams – 40%
  • Other, including contributions to the class in general – 10%

Tools

  • Collaborative Filtering Engines
  • Precision/Recall Graphs
  • Morphology/Stemming Tools
  • Search Engines
    • Lucene
      • http://lucene.apache.org/java/docs/
      • Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

Links
  • Books
    • HCI
    • Managing Gigabytes
      • http://www.cs.mu.oz.au/mg/
      • "Managing Gigabytes" is a great book that will teach you all the gory details of how you actually build a high-performance search engine. This link gives you information about the book and links to the authors' free open-source software: the mg indexer and the seft search engine.
  • Alternative Search Engines