# Data Science MSc

# Regression Models (Level M) STATS5025

**Academic Session:** 2019-20

**School:** School of Mathematics and Statistics

**Credits:** 10

**Level:** Level 5 (SCQF level 11)

**Typically Offered:** Semester 1

**Available to Visiting Students:** Yes

**Available to Erasmus Students:** Yes

#### Short Description

This course introduces the Normal Linear Model in vector-matrix notation. It shows how to estimate and test parameters of the model and to make predictions, then it describes tools that are commonly used for the construction, evaluation and verification of Normal Linear Models.
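For orientation, the Normal Linear Model in vector-matrix notation is conventionally written as in the sketch below. The symbols used ($\mathbf{y}$, $X$, $\boldsymbol{\beta}$, $\boldsymbol{\varepsilon}$, $\sigma^2$, $n$, $p$) are standard textbook notation assumed here for illustration and are not taken from the course materials.

$$
\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad \boldsymbol{\varepsilon} \sim N_n\!\left(\mathbf{0},\, \sigma^2 I_n\right)
$$

Here $\mathbf{y}$ is the $n \times 1$ response vector, $X$ is the $n \times p$ design matrix and $\boldsymbol{\beta}$ is the $p \times 1$ parameter vector. Assuming $X$ has full column rank, the ordinary least squares estimator and its sampling distribution under the model are

$$
\hat{\boldsymbol{\beta}} = (X^\top X)^{-1} X^\top \mathbf{y}, \qquad \hat{\boldsymbol{\beta}} \sim N_p\!\left(\boldsymbol{\beta},\, \sigma^2 (X^\top X)^{-1}\right).
$$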

#### Timetable

20 lectures (4 each week in Weeks 7-11 of Semester 1)

5 tutorials (1 each week in Weeks 7-11)

2 practical sessions (2 hours each)

#### Requirements of Entry

Some optional courses may be constrained by space and entry to these is not guaranteed unless you are in a programme for which this is a compulsory course.

#### Excluded Courses

STATS4015 Linear Models

STATS3016 Statistics 3L: Linear Models

#### Co-requisites

STATS5024 Probability (Level M)

STATS5028 Statistical Inference (Level M)

#### Assessment

90-minute, end-of-course examination (85%), coursework (15%)

**Main Assessment In:** December

#### Course Aims

The aims of this course are:

■ to introduce students to estimation, prediction and testing in statistical models, in particular the Normal Linear Model in vector-matrix notation;

■ to discuss some important special cases of the Normal Linear Model, such as linear, multiple and polynomial regression, one- and two-way analysis of variance, analysis of covariance;

■ to introduce residuals as a mechanism for detecting breakdowns in the standard assumptions for the Normal Linear Model;

■ to describe and contrast several common methods for variable selection in the Normal Linear Model;

■ to show students how to implement these statistical methods using the R computer package.

#### Intended Learning Outcomes of Course

By the end of this course students will be able to:

■ formulate Normal Linear Models in vector-matrix notation and apply general results to derive ordinary least squares estimators in particular contexts;

■ derive, evaluate and interpret point and interval estimates of model parameters and differences between parameters, including multiple comparisons;

■ conduct and interpret hypothesis tests in the context of the Normal Linear Model;

■ derive, evaluate and interpret confidence and prediction intervals for the response at particular values of the explanatory variables;

■ assess the assumptions of a Normal Linear Model using residual plots;

■ estimate and make inferences about the population correlation coefficient;

■ calculate and comment on R²;

■ define and briefly explain the relative advantages and disadvantages of common variable selection procedures - stepwise selection, best subsets and lattices - and implement these rules for model building in particular cases;

■ define, calculate and use Akaike's Information Criterion (AIC) for model selection;

■ describe the problems created by multicollinearity and heteroscedasticity, formulate strategies for the detection and elimination of these problems and implement these strategies in particular cases;

■ describe the circumstances under which variable transformation might be required, describe the Box-Cox and Box-Tidwell procedures for transforming variables, and formulate a strategy for implementing the Box-Cox scheme in specific examples;

■ state the Gauss-Markov Theorem and the implications and restrictions of this result;

■ implement these statistical methods using the R computer package;

■ frame statistical conclusions clearly.

#### Minimum Requirement for Award of Credits

Students must submit at least 75% by weight of the components (including examinations) of the course's summative assessment.