Introduction to Data Science with Python for Engineers and Researchers GEOG5129E

  • Academic Session: 2023-24
  • School: School of Geographical and Earth Sciences
  • Credits: 10
  • Level: Level 5 (SCQF level 11)
  • Typically Offered: Semester 2
  • Available to Visiting Students: No
  • Taught Wholly by Distance Learning: Yes

Short Description

The rapid proliferation of computerised automation has brought 'Big Data' into many aspects of the workplace, from consumer applications such as predictive text and self-driving cars to tools for improved sales forecasting. Machine learning and artificial intelligence are transforming how data is used in the work environment and how products behave. There is a growing demand for employees who can build exploratory data tools that visualise and model problems in order to arrive at data-driven solutions.

 

This comprehensive online short course will present the fundamental principles of machine learning to a technically minded audience. We will introduce Python as a tool for exploring the fundamental concepts of data science. The course will cover data types, data wrangling and the ethics of handling data. We will build up applications of unsupervised and supervised machine learning, starting from a brief overview of multivariate statistics. Each student will develop a final project addressing a problem of their choice. By the end of the course, students will have a broad overview of data science and a solid foundation for undertaking data science projects that solve industrially relevant problems. The skills gained will also prepare students for more in-depth coursework on advanced applications.

 

1. An introduction to Python fundamentals will explore why Python offers a sustainable tool for data analysis beyond Excel. We will introduce common data structures and variables, as well as some of the libraries commonly used in data science applications, including scikit-learn (sklearn), pandas, SciPy, matplotlib and NumPy.

2. We will work with various types of data encountered in industrial contexts, including tabular (CSV) data and image data (photos, remote sensing imagery, micrographs).

3. Students will develop tools for data wrangling and will be introduced to the ethics of handling data.

4. We will introduce the statistical foundations of machine learning tools through multivariate analysis (e.g. hypothesis testing, ANOVA and t-tests).

5. We will use unsupervised machine learning tools for dimensionality reduction and for clustering/classification tasks on unlabelled data.

6. We will leverage supervised machine learning algorithms to make predictions and classifications using labelled data.
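As an illustration of how topics 1 and 4 to 6 fit together, the following is a minimal sketch, not course material. It assumes pandas, SciPy and scikit-learn (version 0.23 or later, for `load_iris(as_frame=True)`) are installed, and uses scikit-learn's bundled Iris dataset as a stand-in for course data. The specific choices of Welch's t-test, PCA, k-means and a random forest classifier, and the hypothetical function name `run_demo`, are ours rather than prescribed by the course.

```python
# Illustrative sketch only: assumes pandas, SciPy and scikit-learn >= 0.23.
import pandas as pd
from scipy import stats
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def run_demo():
    """Hypothetical end-to-end example on the bundled Iris data set."""
    # Tabular data loaded as a pandas DataFrame (features plus a 'target' column).
    df = load_iris(as_frame=True).frame
    features = df.drop(columns="target")
    target = df["target"]

    # Exploratory summary statistics (count, mean, std, quartiles, ...).
    summary = features.describe()

    # Statistical tests on one variable across the three species.
    petal = df["petal length (cm)"]
    groups = [petal[target == k] for k in sorted(target.unique())]
    t_res = stats.ttest_ind(groups[0], groups[1], equal_var=False)  # Welch's t-test
    anova = stats.f_oneway(*groups)                                 # one-way ANOVA

    # Unsupervised: standardise, reduce to 2 principal components, then cluster
    # without using the labels.
    X_scaled = StandardScaler().fit_transform(features)
    pca = PCA(n_components=2)
    X_2d = pca.fit_transform(X_scaled)
    clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_2d)
    # Compare the unlabelled clusters against the true species afterwards.
    contingency = pd.crosstab(target, clusters, colnames=["cluster"])

    # Supervised: train a classifier on labelled data and score it on held-out data.
    X_train, X_test, y_train, y_test = train_test_split(
        features, target, test_size=0.3, random_state=0, stratify=target
    )
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_train, y_train)
    accuracy = clf.score(X_test, y_test)

    return {
        "summary": summary,
        "t_pvalue": t_res.pvalue,
        "anova_pvalue": anova.pvalue,
        "explained_variance": pca.explained_variance_ratio_.sum(),
        "clusters": clusters,
        "contingency": contingency,
        "accuracy": accuracy,
    }


if __name__ == "__main__":
    results = run_demo()
    print(results["summary"])
    print("ANOVA p-value:", results["anova_pvalue"])
    print(results["contingency"])
    print("Test accuracy:", results["accuracy"])
```

Standardising before PCA keeps variables measured on different scales from dominating the components; in a student project the same steps would be applied to their own data set.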

Timetable

10 weeks, 4 hours per week: 1 hour of lectures, 2 hours of practicals, and a 1-hour tutorial / live question-and-answer session.

Requirements of Entry

This is a master's-level course; as such, we expect learners normally to have a bachelor's degree in an engineering or science subject. Alternatively, students may have at least 3 years of work experience in a technical field working with data.

Excluded Courses

None

Co-requisites

None

Assessment

■ The main assessment will be a 'bring your own data' project (report, 1000-1200 words, 70%), designed by the student to demonstrate their ability to use the tools presented to solve a relevant problem.

■ The remaining assessment will be a short video presentation (30%, equivalent to 500 words).

 

If students do not have their own data, open data sets (such as the SMID, EarthChem or other open data sources) can be made available.

Course Aims

This course will introduce students to the fundamentals of data science. Students will be exposed to core concepts such as supervised and unsupervised machine learning, data wrangling and data exploration. These will be built up from initial topics in multivariate statistics. We will use the open-source programming language Python and Jupyter notebooks as a flexible platform for learning these concepts while developing basic coding skills.

Intended Learning Outcomes of Course

By the end of this course students will be able to:

■ Demonstrate how to use the data structures, functions and visualisation tools in Python (and several dependent libraries) to explore and analyse multivariate data.

■ Produce summary statistics for exploratory data analysis and multivariate statistical tests.

■ Employ supervised machine learning algorithms to perform classification and prediction tasks on data sets.

■ Apply unsupervised machine learning to perform dimensionality reduction, data clustering and categorisation on unlabelled high-dimensional data.

Minimum Requirement for Award of Credits

Students must submit at least 75% by weight of the components (including examinations) of the course's summative assessment.