Introduction to Data Science and Systems (M) COMPSCI5089
- Academic Session: 2020-21
- School: School of Computing Science
- Credits: 15
- Level: Level 5 (SCQF level 11)
- Typically Offered: Semester 2
- Available to Visiting Students: No
- Available to Erasmus Students: No
To give students a grounding in foundational elements of data science theory and systems, including data transformations, database systems, and practical data processing pipelines.
Weeks 1-11 of Semester 2
Two hours of lectures and one one-hour tutorial per week.
Requirements of Entry
Acceptance into one of the MSc programmes listed in section 10 below.
Programming and Systems Development (H)
Exam worth 50%
Practical assessed exercises worth 50%
Main Assessment In: April/May
This course will give students a grounding in foundational elements of data science theory and systems, including:
1. Data transformation fundamentals: working with array data, implementation of linear algebra, visualisation, probabilistic concepts.
2. Database systems fundamentals: physical database design (storage, indexing), fundamental query processing algorithms based on file organisation, basic indexing methods, practical query optimisation, and transactional semantics.
3. Practical data processing pipelines: experience of data cleaning and integration with modern tools (e.g. pandas, numpy, scikit-learn), including handling incomplete and noisy data from diverse sources, representing diverse types of data (including text), creating vector representations of data, measuring item similarity, clustering and linking data objects, and visualising the results using modern algorithms.
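As an illustration of the third topic only, the following is a minimal sketch of creating vector representations of text and measuring item similarity. It is not taken from the course materials: the toy documents and all variable names are invented for this example, and it assumes a simple bag-of-words representation with cosine similarity, using numpy alone.

```python
import numpy as np

# Hypothetical toy corpus, invented for illustration only.
docs = [
    "storage and indexing in database systems",
    "indexing methods for database storage",
    "visualising noisy scientific data",
]

# Build a vocabulary: one column per distinct word.
vocab = sorted({word for doc in docs for word in doc.split()})
index = {word: j for j, word in enumerate(vocab)}

# Bag-of-words count matrix: one row per document.
counts = np.zeros((len(docs), len(vocab)))
for i, doc in enumerate(docs):
    for word in doc.split():
        counts[i, index[word]] += 1

# Cosine similarity for all document pairs at once:
# normalise each row to unit length, then take the matrix product.
unit = counts / np.linalg.norm(counts, axis=1, keepdims=True)
similarity = unit @ unit.T

print(np.round(similarity, 3))
```

In this sketch the first two documents share three words and so score higher against each other than either does against the third, which shares no words with them. Real pipelines would typically add tokenisation, weighting, and handling of missing or noisy text, which are omitted here.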
Intended Learning Outcomes of Course
By the end of this course students will be able to:
1. formulate problems in tensor form and fluently manipulate tensors;
2. efficiently run vectorised code;
3. apply matrix decomposition to practical problems;
4. vectorise data and measure distances between items (including text data);
5. formulate and understand problems with stochastic elements;
6. discuss the impact of storage and indexing decisions on database performance;
7. explain the impact of optimisation choices on query processing performance;
8. create effective, clear, and precise visualisations of scientific data;
9. demonstrate proficiency in the application of common tools in a pipeline to process, filter, integrate, analyse, summarise and visualise data.
Minimum Requirement for Award of Credits
Students must submit at least 75% by weight of the components (including examinations) of the course's summative assessment.