Introduction to Data Science and Systems (M) COMPSCI5089

  • Academic Session: 2024-25
  • School: School of Computing Science
  • Credits: 15
  • Level: Level 5 (SCQF level 11)
  • Typically Offered: Semester 1
  • Available to Visiting Students: No
  • Collaborative Online International Learning: No

Short Description

To give students a grounding in the foundational elements of data science theory and systems, including data transformations, database systems, and practical data processing pipelines.

Timetable

TBC

Requirements of Entry

Acceptance into one of the MSc programmes listed in section 10 below.

Excluded Courses

None

Co-requisites

Programming and Systems Development (H)

Assessment

Exam worth 70%

Coursework worth 30%

Main Assessment In: December

Course Aims

This course will give students a grounding in foundational elements of data science theory and systems, including:

1. Data transformation fundamentals: working with array data, implementation of linear algebra, visualisation, probabilistic concepts.

2. Database systems fundamentals: physical database design (storage, indexing), fundamental query processing algorithms based on file organisation, basic indexing methods, practical query optimisation, and transactional semantics.

3. Practical data processing pipelines: experience of data cleaning and integration with modern tools (e.g. pandas, numpy, scikit-learn). This includes handling incomplete and noisy data from diverse sources, representing diverse types of data (including text), creating vector representations of data, measuring item similarity, clustering and linking data objects, and visualising the results using modern algorithms.
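
As a rough illustration of what "creating vector representations of data" and "measuring item similarity" can mean for text, the following is a minimal sketch using only the Python standard library. It is not taken from the course materials: the function names `bag_of_words` and `cosine_similarity`, the whitespace tokenisation, and the three example strings are all assumptions made for illustration. In practice, tools such as numpy or scikit-learn would normally be used for this kind of task.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Map a text to a sparse term-count vector (hypothetical helper).

    Tokenisation is deliberately naive: lower-case, split on whitespace.
    """
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors.

    Returns 0.0 if either vector is empty, to avoid division by zero.
    """
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up example documents, for illustration only.
docs = [
    "query optimisation in database systems",
    "database systems and query processing",
    "visualising noisy sensor data",
]
vectors = [bag_of_words(d) for d in docs]

# Documents 0 and 1 share three of their five terms; 0 and 2 share none.
sim_01 = cosine_similarity(vectors[0], vectors[1])
sim_02 = cosine_similarity(vectors[0], vectors[2])
print(f"doc0 vs doc1: {sim_01:.2f}")
print(f"doc0 vs doc2: {sim_02:.2f}")
```

Cosine similarity is used here because it depends on the direction of the count vectors rather than their length, so documents of different sizes remain comparable. Other distance measures would be equally valid choices.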

Intended Learning Outcomes of Course

By the end of this course students will be able to: 

1. formulate problems in tensor form and fluently manipulate tensors;

2. efficiently run vectorised code;

3. apply matrix decomposition to practical problems;

4. vectorise data and measure distances between items (including text data);

5. formulate and understand problems with stochastic elements;

6. discuss the impact of storage and indexing decisions on database performance;

7. explain the impact of optimisation choices on query processing performance;

8. create effective, clear, and precise visualisations of scientific data;

9. demonstrate proficiency in the application of common tools in a pipeline to process, filter, integrate, analyse, summarise and visualise data.

Minimum Requirement for Award of Credits

Students must submit at least 75% by weight of the components (including examinations) of the course's summative assessment.