Introduction to Data Science and Systems (M) COMPSCI5089
- Academic Session: 2019-20
- School: School of Computing Science
- Credits: 15
- Level: Level 5 (SCQF level 11)
- Typically Offered: Semester 1
- Available to Visiting Students: No
- Available to Erasmus Students: No
To give students a grounding in foundational elements of data science theory and systems, including data transformations, database systems, and practical data processing pipelines.
Weeks 7-11 of Semester 1 (five weeks)
Four hours of lectures and one two-hour tutorial per week.
Requirements of Entry
Data Fundamentals (H)
Programming and Systems Development (H)
Exam in December, worth 50%
Practical assessed exercises, worth 50%
Main Assessment In: December
Are reassessment opportunities available for all summative assessments? No
The coursework cannot be redone because the feedback provided on the original submission would give any student redoing it an unfair advantage.
This course will give students a grounding in foundational elements of data science theory and systems, including:
1. Data transformation fundamentals: working with array data, implementation of linear algebra, visualisation, and probabilistic concepts.
2. Database systems fundamentals: physical database design (storage, indexing), fundamental query processing algorithms based on file organisation, basic indexing methods, practical query optimisation, and transactional semantics.
3. Practical data processing pipelines: experience of data cleaning and integration with modern tools (e.g. pandas, numpy, scikit-learn). This includes handling incomplete and noisy data from diverse sources, representing diverse types of data (including text), creating vector representations of data, measuring item similarity, clustering and linking data objects, and visualising the results using modern algorithms.
Intended Learning Outcomes of Course
By the end of this course students will be able to:
1. formulate problems in tensor form and fluently manipulate tensors;
2. efficiently run vectorised code;
3. apply matrix decomposition to practical problems;
4. vectorise data and measure distances between items (including text data);
5. formulate and understand problems with stochastic elements;
6. discuss the impact of storage and indexing decisions on database performance;
7. explain the impact of optimisation choices on query processing performance;
8. create effective, clear, and precise visualisations of scientific data;
9. demonstrate proficiency in the application of common tools in a pipeline to process, filter, integrate, analyse, summarise and visualise data.
Minimum Requirement for Award of Credits
Students must submit at least 75% by weight of the components (including examinations) of the course's summative assessment.