Introduction to Data Science and Systems (M) COMPSCI5089

  • Academic Session: 2020-21
  • School: School of Computing Science
  • Credits: 15
  • Level: Level 5 (SCQF level 11)
  • Typically Offered: Semester 2
  • Available to Visiting Students: No
  • Available to Erasmus Students: No

Short Description

To give students a grounding in foundational elements of data science theory and systems, including data transformations, database systems, and practical data processing pipelines.

Timetable

Weeks 1-11 of Semester 2 

Two hours of lectures and one one-hour tutorial per week.

Requirements of Entry

Acceptance into one of the MSc programmes listed in section 10 below.

Excluded Courses

None

Co-requisites

Programming and Systems Development (H)

Assessment

Exam, worth 50%

Practical assessed exercises, worth 50%

Main Assessment In: April/May

Course Aims

This course will give students a grounding in foundational elements of data science theory and systems, including:

1. Data transformation fundamentals: working with array data, implementation of linear algebra, visualisation, probabilistic concepts.

2. Database systems fundamentals: physical DB design (storage, indexing), fundamental query processing algorithms based on file organisation, basic indexing methods, practical query optimisation, and transactional semantics.

3. Practical data processing pipelines: experience of data cleaning and integration with modern tools (e.g. pandas, numpy, scikit-learn). This includes handling incomplete and noisy data from diverse sources, representing diverse types of data (including text), creating vector representations of data, measuring item similarity, clustering and linking data objects, and visualising the results using modern algorithms.

Intended Learning Outcomes of Course

By the end of this course students will be able to: 

1. formulate problems in tensor form and fluently manipulate tensors;

2. efficiently run vectorised code;

3. apply matrix decomposition to practical problems;

4. vectorise data and measure distances between items (including text data);

5. formulate and understand problems with stochastic elements;

6. discuss the impact of storage and indexing decisions on database performance;

7. explain the impact of optimisation choices on query processing performance;

8. create effective, clear, and precise visualisations of scientific data;

9. demonstrate proficiency in the application of common tools in a pipeline to process, filter, integrate, analyse, summarise and visualise data.
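
As an informal illustration of outcome 4, the sketch below turns short texts into word-count vectors and compares them with cosine similarity. It is not taken from the course materials: the function names, the toy documents, and the choice of a plain bag-of-words representation are all assumptions made for this example, and it uses only the Python standard library rather than the tools named in the course aims.

```python
import math
from collections import Counter


def bag_of_words(text, vocabulary):
    """Map a text to a vector of word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy documents, invented for this illustration.
docs = [
    "the database stores the index",
    "the index speeds up the database",
    "clustering groups similar items",
]

# Shared vocabulary: every distinct word across the documents, sorted.
vocabulary = sorted({word for doc in docs for word in doc.lower().split()})
vectors = [bag_of_words(doc, vocabulary) for doc in docs]

# Documents 0 and 1 share several words; documents 0 and 2 share none.
sim_related = cosine_similarity(vectors[0], vectors[1])
sim_unrelated = cosine_similarity(vectors[0], vectors[2])
print(f"related: {sim_related:.3f}, unrelated: {sim_unrelated:.3f}")
```

In this sketch the two database-related texts score higher than the unrelated pair, which shares no vocabulary. A practical pipeline would more likely use a library vectoriser and weighting scheme; raw counts are used here only to keep the example self-contained.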

Minimum Requirement for Award of Credits

Students must submit at least 75% by weight of the components (including examinations) of the course's summative assessment.