Postgraduate taught 

Data Science MSc

Text as Data - An Introduction to Document Analytics (M) COMPSCI5096

  • Academic Session: 2021-22
  • School: School of Computing Science
  • Credits: 10
  • Level: Level 5 (SCQF level 11)
  • Typically Offered: Semester 2
  • Available to Visiting Students: No
  • Available to Erasmus Students: No

Short Description

This course introduces the stochastic concepts needed to analyse text and documents, using both unsupervised and supervised approaches. It is designed to be applied in nature: it works through a range of text analytics examples and introduces students to toolkits for document processing (e.g. scikit-learn, spaCy).

Timetable

TBC

Excluded Courses

None

Co-requisites

None

Assessment

Class Test 70%

Assessed practical lab exercises 30%


Course Aims

This course aims to introduce students to language modelling, document representations, natural language processing, information theory and network models, and more generally to the use of stochastic thinking (building on the probability theory learned in the IDSS/Data Fundamentals course). These topics are illustrated through a number of supervised and unsupervised text processing and analytics approaches, together with applications such as information extraction, question answering, summarisation and dialogue systems. The course is designed to be applied in nature, using text analytics as an example, and introduces students to various toolkits for document processing.

Intended Learning Outcomes of Course

By the end of this course students will be able to:

1. Describe classical models for textual representation, such as one-hot encoding, bag-of-words models, and sequence representations with language modelling.

2. Identify potential applications of text analytics in practice.

3. Describe various common techniques for classification, clustering and topic modelling, and select the appropriate machine learning task for a potential document processing application.

4. Represent data as features to serve as input to machine learning models.

5. Assess machine learning model quality in terms of relevant error metrics for document processing tasks, in an appropriate experimental design.

6. Deploy unsupervised and machine-learned approaches for document and text analytics tasks.

7. Critically analyse recent developments in the academic literature on natural language and text processing.

8. Evaluate and explain the appropriate application of recent research developments to real-world problems.

Minimum Requirement for Award of Credits

Students must submit at least 75% by weight of the components (including class tests) of the course's summative assessment.