Predictive Modelling

Course information

This course introduces predictive modelling using multiple linear regression. It presents the theory underpinning the normal linear model, accommodating continuous and categorical predictors as well as non-linear effects. In addition, it contrasts common methods for model assessment and selection.

Prerequisite Knowledge

Learners should have a basic understanding of mathematics, including matrix algebra and calculus (for example, differentiation). Learners should also have basic experience with the R programming language (e.g., data management and plotting).

This course is typically taken in year 1 of the MSc in Data Analytics/Data Analytics for Government programme.

This course assumes that you have knowledge and skills comparable to those covered in the following courses. Alternatively, you may wish to consider taking some of the courses listed before attempting this course.

Intended Learning Outcomes

By the end of this course learners will be able to:

  • formulate normal linear models in vector-matrix notation and apply general results to derive ordinary least squares estimators;
  • construct a design matrix incorporating categorical covariates or covariates with a non-linear effect;
  • derive, evaluate and interpret point and interval estimates of model parameters;
  • derive, evaluate and interpret hypothesis tests and confidence and prediction intervals for the response at particular values of the explanatory variables;
  • assess the assumptions of a normal linear model;
  • implement these statistical methods using R;
  • fit multiple linear regression models;
  • make use of and critique different methods for assessing the performance of a predictive model and use these for model or variable selection;
  • implement the fitting and selection of multiple linear regression models using R;
  • identify scenarios where data may be considered to be smooth functions and apply suitable non-linear techniques.

Syllabus

Week 1 (sample material)

  • Understand the scope of predictive modelling
  • Definition of a statistical model
  • Definition of a linear regression model

Week 2

  • Fitting a simple normal linear model from first principles
  • Least squares estimation in vector-matrix notation

Week 3

  • Defining models with continuous and categorical variables in vector-matrix notation
  • Interpreting regression coefficient estimates
  • Fit a normal linear regression model in R

Week 4

  • Define residuals of the normal linear model
  • State the assumptions of the normal linear model
  • Assess model assumptions using residual plots

Week 5

  • Calculate confidence and prediction intervals for a specified confidence level
  • Perform hypothesis testing to test the significance of coefficients in a normal linear model
  • Describe the analysis of variance table for a normal linear model

Mid-term week break

Week 6

  • Assessing the significance of an interaction
  • Perform variable selection with several explanatory variables

Week 7

  • Hierarchical Models
  • Forward Selection and Stepwise Regression
  • Criterion-based procedures for model selection

Week 8

  • Demonstration of model selection in R with several examples

Week 9

  • Describe non-parametric regression modelling
  • State, explain and compare several methods of smoothing
  • Construct simple splines

Week 10

  • Write down simple spline models in vector-matrix notation
  • Assess fitting criteria for regression splines and penalised regression splines
  • Specifying an additive model and describing methods used to fit an additive model
  • Using existing R libraries to fit additive models

“Challenging and topics covered are extensive but relevant for developing skills in data analytics.”

Software

To take our courses, please use an up-to-date version of a standard browser (such as Google Chrome, Firefox, Safari, Internet Explorer or Microsoft Edge) and a PDF reader (such as Acrobat Reader). Learning material will be distributed through Moodle. We encourage all learners to install R and RStudio, and we provide detailed installation instructions, but learners can also use free cloud-based services (RStudio Cloud). Learners need to install Zoom to participate in video conferencing sessions. We recommend using a headset for these sessions.