STA414S/2104S: Statistical Methods for Data Mining and Machine Learning

January - April, 2010


Meets Tuesday 12-2, Thursday 12-1.

Course Information
This course will consider topics in statistics that have played a role in the development of techniques for data mining and machine learning. We will cover linear methods for regression and classification, nonparametric regression and classification methods, generalized additive models, aspects of model inference and model selection, model averaging, and tree-based methods.

Prerequisite: Either STA 302H (regression) or CSC 411H (machine learning). CSC 108H was recently added as a prerequisite; it is not essential, but you must be willing to use a statistical computing environment such as R or Matlab.

Office Hours: Tuesdays, 3-4; Thursdays, 2-3; or by appointment.

Textbook: Hastie, Tibshirani and Friedman. The Elements of Statistical Learning. Springer-Verlag.

Book web page

Course evaluation: Two homework sets: 40%. Midterm exam: 20%. Final project: 40%.

Computing: I will refer to, and provide explanations for, the R computing environment. You are welcome to use some other package if you prefer. There are many online resources for R, including:


Material from lectures