
Data Mining and Machine Learning II: Big Data and Unstructured Data (ODL) STATS5081

  • Academic Session: 2021-22
  • School: School of Mathematics and Statistics
  • Credits: 10
  • Level: Level 5 (SCQF level 11)
  • Typically Offered: Summer
  • Available to Visiting Students: No
  • Available to Erasmus Students: No
  • Taught Wholly by Distance Learning: Yes

Short Description

This course introduces data mining and machine learning methods for big data settings, together with methods for analysing networks and unstructured data.


The course mostly consists of asynchronous teaching material.

Requirements of Entry

The course is available only to online distance learning students on the PGCert/PGDip/MSc in Data Analytics and Data Analytics for Government.

Excluded Courses

Big Data Analytics

Big Data Analytics (Level M)

Assessment

100% Continuous Assessment

This will typically be made up of a project (40%), two oral assessments (40%) and one homework exercise / online quiz (20%). Full details are provided in the programme handbook.

Course Aims

The aims of this course are:

■ to introduce students to Gaussian processes;

■ to introduce students to big data methods commonly applied in machine learning, notably regularised regression;

■ to illustrate the role of sparsity when analysing high-dimensional data;

■ to introduce students to graphical models and how they can be used for structural inference in high-dimensional data;

■ to introduce students to informal and formal methods for social network analysis and quantitative text analysis.

Intended Learning Outcomes of Course

By the end of this course students will be able to:

■ fit Gaussian process models;

■ describe the challenges of the analysis of high-dimensional data and discuss, in a particular context, strategies for tackling big data problems;

■ formulate and fit a regularised linear model, such as ridge regression, the LASSO or partial least squares;

■ infer statements about (conditional) independence from graphical models and factorisations of the joint distribution;

■ describe methods for structural inference in graphical models and apply them in a given context;

■ make appropriate use of informal and formal methods for social network analysis and quantitative text analysis.

Minimum Requirement for Award of Credits

Students must submit at least 75% by weight of the components (including examinations) of the course's summative assessment.