Closed-Loop Data Science
The project comprises a number of workpackages, each of which applies closed-loop data science techniques to a particular real-world problem:
WP1: Closed-loop interaction with probabilistic models
Evdoxia Taka, Sebastian Stein, John H. Williamson
Data science as practiced currently is an open-loop, statistician-controlled process: from a static dataset, models are learned and a subset of evaluation data is presented via fixed numerical and visual representations. There is limited scope for users to probe:
- whether a representation accurately captures domain knowledge;
- sensitivity to specific data points;
- uncertainty in predictions;
- or alternative explanations of the same observations.
Closing this loop becomes increasingly important as sophisticated models form the basis for human decision-making. We explore methods that support closed-loop interaction between user groups (statisticians, experimenters, domain experts and end users) and that explicitly model and promote interaction with uncertainty. We envisage closed-loop data science predicated on interacting with probabilistic models at multiple timescales:
- interactions at the visceral (subsecond) level for sensitivity analysis through interactive visualisations;
- interactions at the tactical level (seconds to minutes) for closed-loop interaction with priors and model structures;
- and interactions at the strategic level (hours to days) for postulating or acquiring new data or new hypotheses.
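To make the idea of sensitivity analysis concrete, here is a minimal sketch using a conjugate Beta-Binomial model, chosen only because its posterior has a closed form. It shows how the posterior mean of a success probability shifts as the prior changes, which is the kind of quantity an interactive tool might let a user probe. The function name, the priors and the data are illustrative assumptions, not part of the project's software.

```python
def beta_binomial_posterior(prior_alpha, prior_beta, successes, trials):
    """Posterior mean and variance of a success rate under a Beta prior."""
    a = prior_alpha + successes
    b = prior_beta + (trials - successes)
    mean = a / (a + b)
    variance = (a * b) / ((a + b) ** 2 * (a + b + 1))
    return mean, variance


# Hypothetical data: 7 successes in 10 trials, viewed under three priors.
priors = {"flat": (1, 1), "sceptical": (10, 10), "optimistic": (8, 2)}
for name, (alpha, beta) in priors.items():
    mean, variance = beta_binomial_posterior(alpha, beta, 7, 10)
    print(f"{name:>10}: posterior mean={mean:.3f}, sd={variance ** 0.5:.3f}")
```

Re-running the loop with other priors or other data is a crude, non-interactive stand-in for dragging a prior in a visualisation and watching the posterior respond.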
Evaluation of closed-loop Bayesian data science
We have developed an evaluation protocol to quantitatively evaluate closed-loop interaction tools for Bayesian models. This establishes a well-defined query space for probabilistic models and objective functions for evaluating user performance according to comprehension and rationality. A web-based framework implementing this protocol is currently in preparation.
Closing the user-model-data loop with probabilistic programs
We propose the Interactive Probabilistic Model Explorer (IPME) to automatically transform probabilistic programs into interactive graphical representations. These give simultaneous insight into a model’s structure and its predictions, including closed-loop exploration of the implications of uncertainty in predictions and inferred latent parameters.
IPME seamlessly integrates uncertainty visualisation of prior and posterior beliefs, predictions and predictive checks. It focuses on interactively exploring beliefs encoded in MCMC sample traces across a range of granularities. Figure 1 illustrates a Bayesian hierarchical model of driver response time as a function of sleep deprivation. Even this deceptively simple structure hides significant complexities which can be revealed through closed-loop exploration and postulation.
Intervention analysis for COVID-19
As part of a collaboration with another EPSRC-funded project, we developed interactive tools for exploring the effect of non-medical interventions on the COVID-19 outbreak. These include lockdown, fast periodic switching, and population compartmentalisation, under varying beliefs about epidemiological parameters and forecasts of population responses to policy interventions. Interactive exploration of the parameter space enables closed-loop sensitivity analysis and provides an intuitive and robust understanding of the parameters’ effect on the pandemic’s spread.
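As a rough illustration of this kind of parameter exploration, the sketch below simulates a normalised SIR model in which the contact rate alternates periodically between an "open" and a "lockdown" value. The SIR model, the function name and all parameter values are illustrative assumptions; this is not the project's tool, nor the model used in the fast periodic switching paper.

```python
def simulate_sir(beta_open, beta_lock, gamma, days_open, days_lock,
                 horizon=365, i0=1e-3, dt=0.1):
    """Euler-integrate an SIR model with a periodically switched contact rate.

    Returns (peak infected fraction, final susceptible fraction).
    """
    s, i = 1.0 - i0, i0
    peak = i
    period = days_open + days_lock
    for k in range(int(horizon / dt)):
        t = k * dt
        # Each period starts with the open phase, followed by lockdown.
        beta = beta_open if (t % period) < days_open else beta_lock
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        peak = max(peak, i)
    return peak, s


# Sweep the duty cycle of a 14-day switching period (hypothetical values).
for days_open, days_lock in [(14, 0), (4, 10), (2, 12)]:
    peak, s_end = simulate_sir(0.3, 0.05, 0.1, days_open, days_lock)
    print(f"open {days_open:>2}d / lockdown {days_lock:>2}d: "
          f"peak infected={peak:.4f}, never infected={s_end:.3f}")
```

Varying `beta_open`, `beta_lock` or the duty cycle and comparing the outputs gives a small, offline version of the sensitivity analysis described above.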
 (under review) Sebastian Stein, John H. Williamson. Evaluating Bayesian Model Visualisations. Submitted to CHI 2021
 (in press) Evdoxia Taka, Sebastian Stein and John H. Williamson. Increasing Interpretability of Bayesian Probabilistic Programming Models through Interactive Visualizations, Frontiers in Computer Science, Special Issue on Uncertainty Visualization and Decision Making, 2020.
 (under revision) M. Bin, P. Cheung, E. Crisostomi, P. Ferraro, H. Lhachemi, R. Murray-Smith, C. Myant, T. Parisini, R. Shorten, S. Stein, L. Stone. Post-Lockdown Abatement of COVID-19 by Fast Periodic Switching, PLOS Computational Biology, 2020.
WP2: Closed-loop Data science and Pulmonary hypertension
Iadh Ounis, Craig Macdonald, Amir Jadidinejad, Sean Macaveny
Recommender systems are pervasive in daily life, for example in video-on-demand services such as Netflix, shopping sites such as Amazon, or next-app-to-use suggestions on mobile phones. There is a plethora of machine-learned models that can infer the latent preferences of users and the properties of items, and use these to predict the next items (shows/products/apps) that users are likely to interact with.
These learned models are typically trained and evaluated on the items that users have interacted with in the past. However, users’ previous interaction behaviour can be affected by which items any already deployed recommender promotes to them (also known as item-selection bias). For instance, users may only interact with those items that the system has exposed them to. In this way, using historical interactions obtained from the deployed recommender system to train and evaluate a new recommender model forms a closed feedback loop, i.e. the deployed recommender system has a direct effect on the collected feedback.
In our experiments using a number of common recommendation datasets and models, we have found that:
- Evaluation data collected from historical user interactions can be affected by a system’s popularity bias.
- The exploration of interactions produced by an open-loop (random) system within the evaluation data can expose an inconsistency between the closed-loop and open-loop (random) evaluation.
- The typical offline evaluation of recommender systems suffers from the so-called Simpson’s paradox due to the occurrence of closed-loop feedback.
- Our proposed propensity-based stratified evaluation method, which aims to alleviate the feedback loops, is shown to more accurately estimate the performance of a given new recommendation model.
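The general idea behind stratifying an evaluation by propensity can be sketched as follows. This is a deliberately simplified illustration, not the estimator from the papers listed here: it assumes that an item's propensity can be approximated by its share of logged interactions, uses only two strata split at a hypothetical threshold, and weights the strata equally. All names and data are made up.

```python
from collections import Counter, defaultdict


def estimate_propensity(logged_items):
    """Approximate item propensity by its share of logged interactions."""
    counts = Counter(logged_items)
    total = sum(counts.values())
    return {item: count / total for item, count in counts.items()}


def stratified_hit_rate(test_items, hits, propensity, threshold):
    """Hit rate per propensity stratum, then averaged with equal weights.

    hits[j] is 1 if the model ranked test_items[j] in its top-k, else 0.
    """
    per_stratum = defaultdict(list)
    for item, hit in zip(test_items, hits):
        stratum = "high" if propensity.get(item, 0.0) >= threshold else "low"
        per_stratum[stratum].append(hit)
    rates = {s: sum(v) / len(v) for s, v in per_stratum.items()}
    return rates, sum(rates.values()) / len(rates)


# Toy log: item "a" was heavily exposed, "b" and "c" rarely.
propensity = estimate_propensity(["a"] * 8 + ["b", "c"])
test_items = ["a", "a", "a", "a", "b", "c"]
hits_popular_model = [1, 1, 1, 1, 0, 0]  # only recovers the popular item
hits_tail_model = [1, 1, 0, 0, 1, 1]     # also recovers long-tail items

for name, hits in [("popular", hits_popular_model), ("tail", hits_tail_model)]:
    naive = sum(hits) / len(hits)
    rates, stratified = stratified_hit_rate(test_items, hits, propensity, 0.5)
    print(f"{name}: naive={naive:.3f}, stratified={stratified:.3f}, {rates}")
```

In this toy example the naive hit rates of the two models are identical, while the stratified score separates them, because the naive average is dominated by the heavily exposed item.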
 How Sensitive is Recommendation Systems' Offline Evaluation to Popularity? Amir Jadidinejad, Craig Macdonald and Iadh Ounis. In REVEAL 2019 Workshop at RecSys.
 Using Exploration to Alleviate Closed-Loop Effects in Recommender Systems. Amir Jadidinejad, Craig Macdonald, Iadh Ounis. In Proceedings of SIGIR 2020.
 The Simpson’s Paradox in the Offline Evaluation of Recommendation Systems. Amir Jadidinejad, Craig Macdonald, Iadh Ounis. Submitted to ACM TOIS 2020 (under review)
WP2: Closed loop mass spectrometry measurement and analysis of metabolomics
Metabolomics is the study of metabolites, small molecules that occur in all living organisms. Collecting and understanding metabolomics data is challenging. Data can be collected by putting a sample through a liquid chromatography mass spectrometry (LC-MS) system, in which the molecules present are first separated by the chromatography before being analysed by the mass spectrometer. The result is a series of (typically thousands of) mass spectrometry scans, each of which records the mass-to-charge ratios and intensities of the metabolites present at a particular chromatographic retention time. Unfortunately, it is almost always impossible to identify molecules from their mass alone, and it is therefore common to fragment metabolites by interleaving the normal MS scans with targeted MS/MS scans. Each MS/MS scan can provide information that can aid in the structural elucidation of one or more molecules. A key challenge when using mass spectrometry to collect and understand metabolomics data is choosing where and when to prioritise the MS/MS scans, while still maintaining enough normal MS scans to be able to carry out basic data analysis. Acquiring improved data will benefit experiments, but it could come at the cost of missing something that would otherwise have been collected, potentially damaging experiments and trials.
Closed-loop issues become a problem when we attempt to make intelligent decisions in real time, rather than using pre-defined scan scheduling rules. When trying to make real-time closed-loop decisions, two loops affect how we prioritise scans, as can be seen in the figure. Firstly, there is a short-term within-sample loop (blue arrow), where we feed back the information gained from each scan in order to determine the best next scan type and location. Secondly, there is a loop where the results from previous samples are fed back into the decision process (red arrow), providing information which can help guide scan choices in future samples. Within this general measurement framework there are several inherent closed-loop issues. The first is the finite scan budget: we have a limited number of scans and they cannot be done simultaneously. Scans must therefore be allocated with only partially observed time series and with no knowledge of the success of our choices until after the completion of the sample (lagged reward). Within the second, outer loop we see issues with exploration versus exploitation, where we have the choice between confirming metabolites we think we know (exploitation) or attempting to fragment new, interesting metabolites (exploration). Balancing this trade-off well could lead to improved studies, while doing it badly could produce biased results which could invalidate entire studies.
Work has involved designing a framework which allows us to design controllers that can control scans in real time, both on the mass spectrometer and in simulation. Using this framework, new and improved controllers have been designed which have been shown to increase the number of metabolites fragmented (inner loop). Current work is focused on how we can apply these methods across multiple samples and correct timing differences across samples in real time to improve performance (outer loop).
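For readers unfamiliar with scan controllers, the sketch below shows a generic Top-N data-dependent acquisition rule with dynamic exclusion, a common baseline for the inner loop. It is not one of the improved controllers described above and does not use the project's framework; the function name, data layout and thresholds are assumptions made for illustration.

```python
def top_n_controller(ms1_scans, n=2, min_intensity=1000.0,
                     exclusion_window=15.0, mz_tol=0.01):
    """Choose MS/MS targets after each MS1 scan (Top-N, dynamic exclusion).

    ms1_scans: list of (retention_time, [(mz, intensity), ...]).
    Returns a list of (retention_time, mz) fragmentation targets.
    """
    excluded = []  # (mz, retention time until which this mz is excluded)
    targets = []
    for rt, peaks in ms1_scans:
        # Forget exclusions that have expired by this retention time.
        excluded = [(mz, until) for mz, until in excluded if until > rt]
        chosen = 0
        for mz, intensity in sorted(peaks, key=lambda p: p[1], reverse=True):
            if chosen == n or intensity < min_intensity:
                break  # scan budget used up, or remaining peaks too weak
            if any(abs(mz - ex_mz) <= mz_tol for ex_mz, _ in excluded):
                continue  # recently fragmented: spend the scan elsewhere
            targets.append((rt, mz))
            excluded.append((mz, rt + exclusion_window))
            chosen += 1
    return targets


# Hypothetical MS1 scans at retention times 0, 5 and 20 seconds.
scans = [
    (0.0, [(100.0, 5000.0), (200.0, 3000.0), (300.0, 2000.0)]),
    (5.0, [(100.0, 6000.0), (200.0, 3500.0), (300.0, 2500.0), (400.0, 500.0)]),
    (20.0, [(100.0, 4000.0), (300.0, 900.0)]),
]
print(top_n_controller(scans))
```

The exclusion list is the simplest form of within-sample feedback: what was fragmented earlier changes which peaks are selected later.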
 Wandy, J., Davies, V., JJ van der Hooft, J., Weidt, S., Daly, R., & Rogers, S. (2019). In silico optimization of mass spectrometry fragmentation strategies in metabolomics. Metabolites, 9(10), 219.
 Davies, V., Wandy, J., Weidt, S., van der Hooft, J. J., Miller, A., Daly, R., & Rogers, S. (2020). Rapid Development of Improved Data-dependent Acquisition Strategies. bioRxiv. (under revision for Analytical Chemistry)
Closed-loop optimisation of hearing aid parameters, based on subjective user feedback
WP2: Closed-loop hearing aid optimisation
466 million people worldwide have disabling hearing loss due to genetic causes, complications at birth, certain infectious diseases, chronic ear infections, the use of particular drugs, exposure to excessive noise, and ageing (source: WHO). Hearing loss impairs an individual’s ability to communicate effectively with others, leading, for example, to academic and adjustment problems for children. Exclusion from communication can have a significant impact on everyday life, causing feelings of loneliness, isolation, frustration and dependence. Unaddressed hearing loss poses an annual estimated global cost of $750 billion (12th most common contributor), including health sector costs and lost productivity.
Modern hearing aids can partially compensate for hearing loss. However, the effectiveness of a hearing aid (HA) depends heavily on a suitable configuration of this advanced medical device. A state-of-the-art hearing aid requires tens of different algorithms and thousands of parameters to compensate for even the simplest hearing loss; each parameter configuration must be tuned to the specific user and condition. The complexity of hearing loss and of the HA itself has traditionally called for experts to tune the device in clinical conditions, based on audiograms and informal verbal communication with the patient, with little control left to the user after leaving the clinic.
We will enable real-time and context-dependent optimisation of hearing aid configurations to empower the HA user and ultimately improve the user and listening experience. By exploiting the availability of population-wide, real-time data streams, we will investigate real-time collaborative modelling, intervention and optimisation strategies to provide a significant quality improvement and speed-up in the closed-loop optimisation for both the individual and the population of HA users as a whole.
RQ1: User Modelling and Analysis:
Models for hearing and hearing loss, and for the choice of configuration, are typically formulated and derived from a single patient or a few patients. By continuously collecting millions of observations from users and clinicians about HA use and configurations, we will develop scalable non-parametric hierarchical Bayesian models of users, contexts and configurations which enable group-based analysis of preference structures. These will provide predictive models for HA configurations (e.g. based on user demographics, audiogram and behaviour), and the probabilistic nature of the models will support robust and informed clinical decision-making. Based on the models, we will extract and analyse user and preference patterns related to current HA use, configurations and contexts.
RQ2: Closed-loop collaborative preference elicitation and optimisation:
It is possible to elicit, model and optimise subjective preference in the HA domain using Bayesian modelling and reinforcement learning (or Bayesian optimisation) in a closed-loop fashion. However, this has so far been done for individuals independently of other people’s preferences, configurations and context. We will develop new techniques to support multiple feedback loops, multiple timescales and asynchronous feedback in the collaborative setting, with input from multiple users and clinicians at different timescales and with mixed fidelity. We will develop multi-fidelity, confidence-based optimisation and intervention strategies which account for user consistency and reliability in order to avoid suboptimal learning and biases in the closed-loop and collaborative setup. We will develop Bayesian optimisation methods supporting multi-objective and engaging optimisation, taking into account the intention of the user and the costs involved in asking users questions or consulting a clinician. We expect to evaluate the models and strategies in several empirical studies to investigate the effects and consequences of collaborative and closed-loop optimisation.
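A heavily simplified picture of closed-loop preference optimisation for a single user is sketched below. Instead of Bayesian optimisation over a continuous parameter space, it swaps in Beta-Bernoulli Thompson sampling over a handful of discrete candidate configurations, with simulated like/dislike feedback. The function name, the candidate set and the "true" like probabilities are all hypothetical.

```python
import random


def thompson_preference_loop(like_prob, rounds=500, seed=0):
    """Pick a configuration, observe like/dislike, update beliefs, repeat.

    like_prob[j] is the (unknown to the learner) probability that the
    simulated user likes configuration j. Returns how often each was tried.
    """
    rng = random.Random(seed)
    k = len(like_prob)
    likes = [1] * k     # Beta(1, 1) prior on each configuration
    dislikes = [1] * k
    counts = [0] * k
    for _ in range(rounds):
        # Sample a plausible like-rate per configuration, try the best one.
        samples = [rng.betavariate(likes[j], dislikes[j]) for j in range(k)]
        choice = max(range(k), key=lambda j: samples[j])
        if rng.random() < like_prob[choice]:  # simulated user feedback
            likes[choice] += 1
        else:
            dislikes[choice] += 1
        counts[choice] += 1
    return counts


print(thompson_preference_loop([0.2, 0.5, 0.8]))
```

Each round closes the loop once: the feedback changes the posterior, and the posterior changes what the user is asked to try next. The collaborative, multi-fidelity setting described above would additionally share information across users and weight feedback by its reliability, which this sketch does not attempt.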
 Salman Mohammadi, Anders Kirk Uhrenholt and Bjørn Sand Jensen, Odd-One-Out Representation Learning, NeurIPS 2020: Workshop on Object Representations for Learning and Reasoning.
 Lisa Laux, Marie F.A. Cutiongco, Nikolaj Gadegaard and Bjørn Sand Jensen. Interactive machine learning for fast and robust cell profiling, PLoS ONE, 2020.
Anders Kirk Uhrenholt and Bjørn Sand Jensen, Efficient Bayesian Optimization for Target Vector Estimation, AISTATS 2019.
 Anders Kirk Uhrenholt, Valentin Charvet and Bjørn Sand Jensen, Probabilistic Selection of induction point in sparse Gaussian process models, https://arxiv.org/abs/2010.09370 (under review for AISTATS 2021).
 Jasper Kirton-Wingate, Modelling Contextual Preference for Hearing Aid Settings - A Machine Learning Approach, Master of Science by Research, University of Glasgow, 2020.
WP3: Intermittent Control in Data Science
Alberto Álvarez, Henrik Gollee, Roderick Murray-Smith
The variability that event-driven control systems naturally introduce leads to periods of open exploration of the state-space. In particular, Intermittent Control (IC), which switches between open-loop and closed-loop regimes, allows the identification of causal relationships between the data sources, and provides clearly defined trigger points for process interactions to take place. The observed variability in these systems has traditionally been represented with stochastic models; however, IC provides a clear implementation which is capable of accurately modelling this in a generative fashion. Extending the current IC framework with machine-learned models of triggering, causal identification techniques, and additional reinforcement learning capabilities is at the core of our research. Specifically, our goals and current activities are:
- Extend the Intermittent Control framework to incorporate causality models for dynamical systems with the intention of improving the knowledge of the system in real-time.
- Incorporate probabilistic models into the inter-sample behaviour of IC.
- Implementation of direct adaptive intermittent controllers in the context of reinforcement learning, complementing the existing indirect implementations.
- Introduce IC as a closed-loop modelling approach to classical Human-Computer Interaction (HCI) tasks, such as pointing and tracking.
- Evaluate the effects of intermittency introduced by IC in the context of recommender systems.
- Implementing causal identification techniques in the context of multi-segmental structures controlled by IC.
- Remodelling of the system-matched hold element in IC as a Gaussian process in the context of single inverted pendulums.
- Fitting of an IC to describe experimental data from a two dimensional HCI pointing task.
- Adding a reinforcement learning layer to the standard IC framework, based on policy iteration methods, to obtain optimal control gains to stabilize the cart-pole system.
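The switching between open-loop and closed-loop regimes can be illustrated with a toy event-triggered controller for a scalar unstable system. Between triggers the control signal is generated open loop by a hold that simulates a model of the closed-loop system; feedback is used only when the prediction error exceeds a threshold. The plant, gains, disturbance and threshold below are illustrative assumptions and do not correspond to any of the systems studied in the project.

```python
def simulate_intermittent_control(threshold=0.1, a=1.0, b=1.0, gain=3.0,
                                  disturbance=0.5, x0=1.0, dt=0.01,
                                  steps=1000):
    """Event-triggered control of x' = a*x + b*u + disturbance.

    Returns (final plant state, number of trigger events).
    """
    x = x0       # true plant state
    x_hold = x0  # state of the hold (internal model), reset at each trigger
    triggers = 0
    for _ in range(steps):
        # Close the loop only when the model prediction has drifted too far.
        if abs(x - x_hold) > threshold:
            x_hold = x
            triggers += 1
        u = -gain * x_hold  # control generated open loop from the hold state
        # The plant sees a constant disturbance that the hold does not model.
        x += dt * (a * x + b * u + disturbance)
        x_hold += dt * (a * x_hold + b * u)
    return x, triggers


for threshold in (0.0, 0.1, 0.3):
    x_final, n_triggers = simulate_intermittent_control(threshold=threshold)
    print(f"threshold={threshold}: final state={x_final:.3f}, "
          f"triggers={n_triggers}")
```

With a zero threshold the controller resamples at almost every step, which approximates continuous feedback; larger thresholds trade tracking accuracy for fewer, clearly defined feedback events.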
 Alberto Álvarez, Henrik Gollee, Jörg Müller, Roderick Murray-Smith, Intermittent control as a model of mouse movements, Submitted to the ACM Transactions on Computer-Human Interaction (TOCHI).
WP3: Closed loop issues in Urban traffic control
Traffic congestion is a key urban issue, imposing significant economic costs on many cities worldwide. The traffic signal control system, as an essential tool of smart road principles, can be used to reduce the level of congestion, and thereby transport emissions and fuel consumption, in an urban area. Adaptive traffic signal control approaches such as SCOOT (Hunt et al., 1982) and SCATS (Sims and Dobinson, 1980) have been widely used in real-world traffic networks. They are based on an open-loop control system that does not consider feedback control in the traffic network. In contrast, a learning-based method such as a reinforcement learning (RL) algorithm can learn from the traffic environment by taking actions (e.g. switching signal phasing) and observing the feedback (e.g. queue lengths), enabling researchers and planners to predict the traffic flow more accurately and thereby optimise the traffic signal plan.
Thanks to recent technological improvements, high-tech vehicles such as Connected and Automated Vehicles (CAVs) can communicate with traffic infrastructure, generating important new data. For example, planners can optimise traffic signal control plans based on more accurate current traffic flows and future traffic flow predictions by using data (e.g. speed, location, headway) sent from CAVs to traffic signal controllers. In addition, CAVs can drive more efficiently by receiving updated traffic signal plans, potentially improving overall network system performance (e.g. number of stops, fuel consumption). Unfortunately, research on adaptive traffic signal control with an RL approach and on the applications of CAVs in this environment is scarce. In this WP, we treat the road intersection as an RL agent while allowing the CAVs to adjust their speed adaptively based on the real-time signal plan received from the controller, in order to examine the closed-loop effects of learning traffic signal controllers and CAVs on network system performance. Preliminary results show that the proposed method outperforms other dynamic traffic signal control strategies in terms of average vehicle delay and queue length. A sensitivity analysis with different market penetration rates of CAVs and traffic saturation degrees is in progress.
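The basic mechanism of an intersection acting as an RL agent can be sketched with tabular Q-learning on a toy two-approach junction. The state is the pair of queue lengths, the action is which approach gets the green signal, and the reward is the negative total queue. The environment, arrival rates and hyperparameters are assumptions for illustration; CAV speed guidance and the method evaluated in this WP are not modelled.

```python
import random
from collections import defaultdict

ACTIONS = (0, 1)  # 0: green for north-south, 1: green for east-west
MAX_QUEUE = 5     # queues are capped to keep the state space small


def step(queues, action, arrivals):
    """Serve one vehicle on the green approach, then add new arrivals."""
    q = list(queues)
    if q[action] > 0:
        q[action] -= 1
    q = [min(MAX_QUEUE, q[j] + arrivals[j]) for j in range(2)]
    return tuple(q), -sum(q)  # reward penalises total queue length


def q_update(qtable, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning update towards the bootstrapped target."""
    best_next = max(qtable[(next_state, a)] for a in ACTIONS)
    target = reward + gamma * best_next
    qtable[(state, action)] += alpha * (target - qtable[(state, action)])


def train(episodes=200, horizon=50, epsilon=0.1, seed=0):
    """Epsilon-greedy training with random arrivals (40% and 20% per step)."""
    rng = random.Random(seed)
    qtable = defaultdict(float)
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(horizon):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: qtable[(state, a)])
            arrivals = (int(rng.random() < 0.4), int(rng.random() < 0.2))
            next_state, reward = step(state, action, arrivals)
            q_update(qtable, state, action, reward, next_state)
            state = next_state
    return qtable


qtable = train()
print(f"learned values for {len(qtable)} state-action pairs")
```

The loop is closed because the signal phase chosen now determines the queues observed next, which in turn drive the next update and the next decision.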
 CAV-based Adaptive Traffic Signal Control with Reinforcement Learning, Transport and computing science conference, October 1, 2020.
 A Real-time Adaptive Traffic Signal Control in a Connected Vehicle Environment: Optimization of Signal Planning and Vehicle Speed Guidance with Reinforcement Learning, (Plan to submit to the Special Issue: Managing Future Motorway and Urban Traffic Systems in Transportation Research Part C, December 2020)
 Two-level real time adaptive traffic control with reinforcement learning, (To submit to Sustainability, January 2021)
Yilmaz, S., Dudkina, E., Bin, M., Crisostomi, E., Ferraro, P., Murray-Smith, R., Parisini, T., Stone, L. and Shorten, R. (2020) Kemeny-based testing for COVID-19. PLoS ONE, 15(11), e0242401. (doi: 10.1371/journal.pone.0242401)
Borowska, A., Giurghita, D. and Husmeier, D. (2020) Gaussian process enhanced semi-automatic approximate Bayesian computation: parameter inference in a stochastic differential equation system for chemotaxis. Journal of Computational Physics, (doi: 10.1016/j.jcp.2020.109999) (In Press)
Taka, E., Stein, S. and Williamson, J. H. (2020) Increasing interpretability of Bayesian probabilistic programming models through interactive visualizations. Frontiers in Computer Science, (doi: 10.3389/fcomp.2020.567344) (Accepted for Publication)
Anagnostopoulos, C. and Kolomvatsos, K. (2020) Predictive intelligence of reliable analytics in distributed computing environments. Applied Intelligence, 50, pp. 3219-3238. (doi: 10.1007/s10489-020-01712-5)
Wu, Y., Macdonald, C. and Ounis, I. (2020) A Hybrid Conditional Variational Autoencoder Model for Personalised Top-n Recommendation. In: ICTIR 2020: The 6th ACM International Conference on the Theory of Information Retrieval, Stavanger, Norway, 14-18 Sep 2020, pp. 89-96. ISBN 9781450380676 (doi:10.1145/3409256.3409835)
Quiros, A. C., Murray-Smith, R. and Yuan, K. (2020) PathologyGAN: Learning Deep Representations of Cancer Tissue. Proceedings of Machine Learning Research, 124, pp. 669-695.
Laux, L., Cutiongco, M. F.A., Gadegaard, N. and Jensen, B. S. (2020) Interactive machine learning for fast and robust cell profiling. PLoS ONE, 15(9), e0237972. (doi: 10.1371/journal.pone.0237972) (PMID:32915784)
Kolomvatsos, K., Anagnostopoulos, C., Koziri, M. and Loukopoulos, T. (2020) Proactive & time-optimized data synopsis management at the edge. IEEE Transactions on Knowledge and Data Engineering, (doi: 10.1109/TKDE.2020.3021377) (Early Online Publication)
Savva, F., Anagnostopoulos, C., Triantafillou, P. and Kolomvatsos, K. (2020) Large-scale data exploration using explanatory regression functions. ACM Transactions on Knowledge Discovery from Data, 14(6), 76. (doi: 10.1145/3410448)
Tonolini, F., Radford, J., Turpin, A., Faccio, D. and Murray-Smith, R. (2020) Variational inference for computational imaging inverse problems. Journal of Machine Learning Research, 21(179), pp. 1-46.
Husmeier, D. and Paun, L. M. (2020) Closed-loop effects in cardiovascular clinical decision support. In: Ladde, G. and Samia, N. (eds.) Proceedings of the 2nd International Conference on Statistics: Theory and Applications (ICSTA'20). Avestia Publishing: Ottawa, Canada, p. 128. ISBN 9781927877685 (doi:10.11159/icsta20.128)
Savva, F., Anagnostopoulos, C. and Triantafillou, P. (2020) Adaptive learning of aggregate analytics under dynamic workloads. Future Generation Computer Systems, 109, pp. 317-330. (doi: 10.1016/j.future.2020.03.063)
Husmeier, D. and Paun, L. M. (2020) Closed-loop effects in coupling cardiac physiological models to clinical interventions. In: Irigoien, I., Lee, D.-J., Martínez-Minaya, J. and Rodríguez-Álvarez, M. X. (eds.) Proceedings of the 35th International Workshop on Statistical Modelling. Servicio Editorial de la Universidad del País Vasco: Bilbao, Spain, pp. 120-125. ISBN 9788413192673
Jadidinejad, A. H., Macdonald, C. and Ounis, I. (2020) Using Exploration to Alleviate Closed-Loop Effects in Recommender Systems. In: 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2020), Xi'an, China, 25-30 Jul 2020, pp. 2025-2028. ISBN 9781450380164 (doi:10.1145/3397271.3401230)
Williamson, J. H., Quek, M., Popescu, I., Ramsay, A. and Murray-Smith, R. (2020) Efficient human-machine control with asymmetric marginal reliability input devices. PLoS ONE, 15(6), e0233603. (doi: 10.1371/journal.pone.0233603)
Savva, F., Anagnostopoulos, C. and Triantafillou, P. (2020) SuRF: Identification of Interesting Data Regions with Surrogate Models. In: 36th IEEE International Conference on Data Engineering (IEEE ICDE), Dallas, TX, USA, 20-24 April 2020, pp. 1321-1332. ISBN 9781728129037 (doi:10.1109/ICDE48307.2020.00118)
Savva, F., Anagnostopoulos, C. and Triantafillou, P. (2020) Aggregate Query Prediction under Dynamic Workloads. In: 2019 IEEE International Conference on Big Data (IEEE BigData 2019), Los Angeles, CA, USA, 09-12 Dec 2019, pp. 671-676. ISBN 9781728108582 (doi:10.1109/BigData47090.2019.9006267)
Anagnostopoulos, C. and Triantafillou, P. (2020) Large-scale predictive modeling and analytics through regression queries in data management systems. International Journal of Data Science and Analytics, 9(1), pp. 17-55. (doi: 10.1007/s41060-018-0163-5)
Ireland, D.G., Doring, M., Glazier, D.I., Haidenbauer, J., Mai, M., Murray-Smith, R. and Ronchen, D. (2019) Kaon photoproduction and the Lambda decay parameter alpha. Physical Review Letters, 123, 182301. (doi: 10.1103/PhysRevLett.123.182301)
Wandy, J., Davies, V., van der Hooft, J. J.J., Weidt, S., Daly, R. and Rogers, S. (2019) In silico optimization of mass spectrometry fragmentation strategies in metabolomics. Metabolites, 9(10), 219. (doi: 10.3390/metabo9100219) (PMID:31600991)
Jadidinejad, A., Macdonald, C. and Ounis, I. (2019) How Sensitive is Recommendation Systems' Offline Evaluation to Popularity? In: REVEAL 2019 Workshop at RecSys, Copenhagen, Denmark, 20 Sep 2019.
Davies, V., Harvey, W. T., Reeve, R. and Husmeier, D. (2019) Improving the identification of antigenic sites in the H1N1 Influenza virus through accounting for the experimental structure in a sparse hierarchical Bayesian model. Journal of the Royal Statistical Society: Series C (Applied Statistics), 68(4), pp. 859-885. (doi: 10.1111/rssc.12338) (PMID:31598013) (PMCID:PMC6774336)
Savva, F., Anagnostopoulos, C. and Triantafillou, P. (2019) Explaining Aggregates for Exploratory Analytics. In: IEEE Big Data 2018, Seattle, WA, USA, 10-13 Dec 2018, pp. 478-487. ISBN 9781538650356 (doi:10.1109/BigData.2018.8621953)
Jadidinejad, A. H., Macdonald, C. and Ounis, I. (2019) Unifying Explicit and Implicit Feedback for Rating Prediction and Ranking Recommendation Tasks. In: 5th ACM SIGIR International Conference on the Theory of Information Retrieval, Santa Clara, CA, USA, 02-05 Oct 2019, pp. 149-151. ISBN 9781450368810 (doi:10.1145/3341981.3344225)
Moran, O., Caramazza, P., Faccio, D. and Murray-Smith, R. (2018) Deep, Complex, Invertible Networks for Inversion of Transmission Effects in Multimode Optical Fibres. In: 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada, 02-08 Dec 2018.
Currently no vacancies.