Biostatistics and Statistical Genetics

  • Lung Cancer in Greater Glasgow

    Statisticians and health professionals are interested in producing maps showing cancer risk, so that areas with elevated risks can be detected.

  • Network Meta-Analysis

    Mean plot with 95% confidence intervals for a network meta-analysis in diabetes.

  • Studying Obesity

    Statistical models of the connections between obesity indicators and socio-economic factors help in developing anti-obesity strategies.

  • Estimation of Historic Population Sizes

    Genetic variability between modern individuals allows estimation of historic population sizes.

  • Network Meta-Analysis

    Shade plot of P-values for all pairwise contrasts in a network meta-analysis.

  • Where did modern humans come from?

    Maximum-likelihood phylogenetic analysis of mitochondrial DNA variation reveals that modern humans exited Africa via South Asia.

  • Regression to the Mean

    Regression to the mean: diastolic blood pressure measured on two occasions.

  • Ancestry of African Slaves

    Statistical genetics provides a means of partitioning the ancestry of descendants of African slaves in the USA to different regions of Africa.

  • Illustration of the t-Test

    Geometric representation of Student's t-test, showing the critical region and a point in the sample space for a sample of two observations.

  • Statistical Testing

    Bivariate contours under null and alternative hypotheses with Bonferroni boundaries illustrated.
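
The regression-to-the-mean effect in the diastolic blood pressure example can be reproduced with a short simulation. This is a minimal sketch under assumed conditions, not the data behind the figure: each person is given a stable underlying DBP, each reading adds independent normal measurement error, and all means, standard deviations and the selection threshold are hypothetical values. People selected because their first reading was high show a lower average second reading, even though nothing about them has changed.

```python
import random
import statistics


def simulate_rtm(n=10_000, true_mean=80.0, true_sd=8.0, error_sd=6.0,
                 threshold=95.0, seed=1):
    """Simulate two DBP readings per person (hypothetical parameters, mmHg) and
    return the mean first and second readings among people whose first reading
    exceeded `threshold`, plus the reliability of a single reading."""
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n):
        true_dbp = rng.gauss(true_mean, true_sd)             # stable underlying level
        first.append(true_dbp + rng.gauss(0.0, error_sd))    # reading on occasion 1
        second.append(true_dbp + rng.gauss(0.0, error_sd))   # reading on occasion 2
    # Select people whose first reading looked "high"
    selected = [i for i in range(n) if first[i] > threshold]
    mean_first = statistics.fmean(first[i] for i in selected)
    mean_second = statistics.fmean(second[i] for i in selected)
    # Var(true) / Var(reading): the slope of E[second | first] in this normal model
    reliability = true_sd**2 / (true_sd**2 + error_sd**2)
    return mean_first, mean_second, reliability


mean_first, mean_second, reliability = simulate_rtm()
print(f"selected group: first reading {mean_first:.1f}, second reading {mean_second:.1f}")
print(f"reliability: {reliability:.2f}")
```

The second-reading mean is pulled back towards the population mean by roughly the reliability factor, which is why selecting on a single high measurement exaggerates apparent change.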

This group researches the design and analysis of quantitative investigations in human health and genetics, with a particular emphasis on applying advanced methods of statistical inference.

Dr Christina A Cobbold Senior Lecturer

Bayesian methods; inference and statistical methods for dynamical systems with applications to genetic data

Member of other research groups: Mathematical Biology

Dr Mayetri Gupta Reader

Bayesian methodology for gene regulation; statistical analysis of microarray, tiling array and deep sequencing data; phylogenetic analysis; analysis of GWAS

Member of other research groups: Statistical Modelling, Statistical Methodology
Postgraduate opportunities: Clustering methods to detect genetic associations, Bayesian variable selection for genetic and genomic studies, Detection of genomic signals in sequence data

Dr Vincent Macaulay Reader

Statistical genetics; population genetics; Bayesian methods; phylogenetics

Member of other research groups: Statistical Methodology
Research student: Colette Mair
Postgraduate opportunities: The evolution of shape, Modelling Genetic Variation

Dr Tereza Neocleous Lecturer

Forensic statistics; quantile regression; semiparametric models; biostatistics applications

Member of other research groups: Statistical Modelling
Research students: Charalampos Chanialidis, Gary Napier, Elizabeth Irwin
Postgraduate opportunities: Topics in compositional data analysis

Dr Surajit Ray Senior Lecturer

Analysis of mixture models; high-dimensional data; medical image analysis; analysis of earth systems data; immunoinformatics

Member of other research groups: Statistical Methodology, Environmental Statistics

Craig Anderson PhD Student

Research Topic: Modelling disease risk in space and time
Member of other research groups: Statistical Modelling
Supervisors: Duncan Lee, Nema Dean

Emanuel Baah PhD Student

Research Topic: Analysis of spontaneous reports of side-effects
Supervisor: Stephen Senn

Rob Donald PhD Student

Research Topic: Online Assessment of Event Prediction Models
Supervisor: Ludger Evers

Rachael Fulton PhD Student

Research Topic: Covariate adjustment in stroke trials
Supervisors: Kennedy Lees (MVLS), Stephen Senn

Colette Mair PhD Student

Research Topic: Dimension reduction in population genetic inference
Supervisor: Vincent Macaulay

Fraser Tough PhD Student

Research Topic: The role of weight gain in the identification of under- and over-nutrition: compiling a longitudinal growth dataset
Supervisor: John McColl

Modelling the evolution of disease risk in space and time (MSc / PhD)

Supervisor: Duncan Lee
Relevant research groups: Biostatistics and Statistical Genetics, Environmental Statistics

Mapping the spatial pattern in disease risk over a city or country is a common problem in epidemiology, and the primary aim is to determine which areas exhibit the greatest risks of disease. A recent extension in this field is to model how, and to what extent, the spatial risk surface changes over time. The motivation for this is to address questions such as: (1) on average across the study region, is the risk of disease becoming more or less pronounced? and (2) in which areas of the study region are disease risks getting worse? This project will develop statistical models to address these questions and will apply them to map the evolution of important diseases, such as cancer and coronary heart disease, across regions of the UK.
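
A common starting point in disease mapping is the standardised incidence ratio (SIR): observed cases divided by expected cases for each area, where expected counts come from applying reference rates to the area's population. The sketch below is a hypothetical illustration with invented area labels and counts, not the model this project will develop; spatio-temporal models of the kind described above would smooth these noisy ratios by borrowing strength across neighbouring areas and years. It computes a region-wide SIR per year, relevant to question (1), and the change in each area's SIR, relevant to question (2).

```python
def sir(observed, expected):
    """Standardised incidence ratio per area: observed / expected cases."""
    return [o / e for o, e in zip(observed, expected)]


def overall_sir(observed, expected):
    """SIR for the whole study region: total observed / total expected."""
    return sum(observed) / sum(expected)


# Invented counts for three hypothetical areas in two years. Expected counts
# would normally come from indirect standardisation (reference rates applied
# to each area's age/sex population).
areas = ["A", "B", "C"]
observed_counts = {2010: [12, 30, 8], 2015: [15, 27, 16]}
expected_counts = {2010: [10.0, 30.0, 10.0], 2015: [10.0, 30.0, 10.0]}

sir_by_year = {year: sir(observed_counts[year], expected_counts[year])
               for year in observed_counts}

# Question (1): is the region-wide risk rising or falling?
for year in sorted(sir_by_year):
    region_sir = overall_sir(observed_counts[year], expected_counts[year])
    print(year, "region-wide SIR:", round(region_sir, 3))

# Question (2): in which areas has the SIR increased most?
change = {area: round(sir_by_year[2015][i] - sir_by_year[2010][i], 3)
          for i, area in enumerate(areas)}
print("change in SIR by area:", change)
```

Raw SIRs are unstable in areas with small expected counts, which is the main motivation for the smoothed Bayesian models used in practice.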

Modelling Genetic Variation (PhD)

Supervisor: Vincent Macaulay
Relevant research groups: Biostatistics and Statistical Genetics

Variation in the distribution of different DNA sequences across individuals has been shaped by many processes that can be modelled probabilistically, including demographic factors such as prehistoric population movements, and natural selection. This project involves developing new techniques for teasing out information on those processes from the wealth of raw data now being generated by high-throughput genetic assays, and is likely to involve computationally intensive sampling techniques to approximate the posterior distribution of the parameters of interest. Characterizing the amount of population structure on different geographical scales will influence the design of experiments to identify the genetic variants that increase the risk of complex diseases, such as diabetes or heart disease.
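
One family of computationally intensive sampling techniques used in population genetics is approximate Bayesian computation (ABC). The sketch below is a toy illustration under stated assumptions, not this project's method: it assumes the standard neutral coalescent with infinite-sites mutation, summarises the data by the number of segregating sites S in a sample of n sequences, places a uniform prior on the scaled mutation rate θ = 4Nμ, and keeps prior draws whose simulated S lies close to the observed value. The observed value (S = 20, n = 10) is invented.

```python
import random


def poisson(lam, rng):
    """Poisson(lam) draw: count arrivals of a unit-rate Poisson process in [0, lam]."""
    count, t = 0, rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_segregating_sites(n, theta, rng):
    """S for n sequences under the assumed neutral coalescent, infinite sites.
    Time is in units of 2N generations: with k lineages the waiting time to the
    next coalescence is Exponential(rate k(k-1)/2), and mutations fall on the
    genealogy at rate theta/2 per unit of branch length."""
    total_length = 0.0
    for k in range(n, 1, -1):
        total_length += k * rng.expovariate(k * (k - 1) / 2)
    return poisson(theta / 2 * total_length, rng)


def abc_rejection(s_obs, n, theta_max, n_draws, tolerance, seed=0):
    """ABC rejection: theta ~ Uniform(0, theta_max); accept theta when the
    simulated S lies within `tolerance` of the observed S."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(0.0, theta_max)
        if abs(simulate_segregating_sites(n, theta, rng) - s_obs) <= tolerance:
            accepted.append(theta)
    return accepted


# Hypothetical data: S = 20 segregating sites among n = 10 sequences
posterior = abc_rejection(s_obs=20, n=10, theta_max=30.0, n_draws=20_000, tolerance=2)
a_n = sum(1 / i for i in range(1, 10))  # Watterson's a_n = sum_{i=1}^{n-1} 1/i
print(f"accepted {len(posterior)} of 20000 draws")
print(f"ABC posterior mean of theta: {sum(posterior) / len(posterior):.2f}")
print(f"Watterson's estimate S / a_n: {20 / a_n:.2f}")
```

Realistic analyses replace the single summary statistic with richer summaries of high-throughput data and the rejection step with more efficient samplers, but the accept/reject logic is the same.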

Estimating the effects of air pollution on human health (MSc / PhD)

Supervisor: Duncan Lee
Relevant research groups: Biostatistics and Statistical Genetics, Environmental Statistics

Exposure to air pollution is thought to reduce average life expectancy by six months, with an estimated equivalent health cost of £19 billion each year (DEFRA). These effects have been estimated using statistical models that quantify the impact on human health of both short-term and long-term exposure. However, estimating such effects is challenging because individual-level measures of health and pollution exposure are not available. Therefore, the majority of studies are conducted at the population level, and the resulting inference can only be made about the effects of pollution on overall population health. In addition, the data used in such studies are spatially misaligned, as the health data relate to extended areas such as cities or electoral wards, while pollution concentrations are measured at individual locations. Furthermore, pollution monitors are typically sited where concentrations are thought to be highest, a practice known as preferential sampling, which is likely to result in unrepresentatively high concentrations being recorded. This project aims to develop statistical methodology to address these problems, and thus provide less biased estimates of the effects of pollution on health than those currently available.
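
The bias introduced by preferential sampling can be illustrated with a toy simulation. The sketch below assumes a hypothetical one-dimensional concentration surface (a smooth spatial trend plus noise, in arbitrary units) and compares the average concentration recorded by monitors placed at random with that recorded by monitors placed where concentrations are highest. It only demonstrates the problem; it is not the methodology this project will develop.

```python
import math
import random
import statistics


def concentrations(n_sites=1000, seed=42):
    """Hypothetical pollution surface: smooth spatial trend plus noise (arbitrary units)."""
    rng = random.Random(seed)
    return [20 + 10 * math.sin(2 * math.pi * i / n_sites) + rng.gauss(0, 3)
            for i in range(n_sites)]


def monitor_means(values, n_monitors=20, seed=7):
    """Mean concentration at randomly sited versus preferentially sited monitors."""
    rng = random.Random(seed)
    random_sites = rng.sample(range(len(values)), n_monitors)
    # Preferential siting: monitors go where concentrations are highest
    preferential_sites = sorted(range(len(values)), key=values.__getitem__,
                                reverse=True)[:n_monitors]
    return (statistics.fmean(values[i] for i in random_sites),
            statistics.fmean(values[i] for i in preferential_sites))


field = concentrations()
true_mean = statistics.fmean(field)
random_mean, preferential_mean = monitor_means(field)
print(f"true mean {true_mean:.1f}, random monitors {random_mean:.1f}, "
      f"preferential monitors {preferential_mean:.1f}")
```

Any exposure estimate built naively from the preferentially sited monitors inherits this upward bias, which then propagates into estimated health effects.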

Detection of genomic signals in sequence data (PhD)

Supervisor: Mayetri Gupta
Relevant research groups: Biostatistics and Statistical Genetics, Statistical Modelling

Demarcating functional regions in the genome is essential to gaining insight into the workings of biological systems, from the cellular level to the organism as a whole. One important problem is the accurate detection of transcription factor binding sites, which act as "switches" turning genes on or off as needed. Detecting these sparse signals in genomic sequence data is a significant challenge, because of the high level of noise relative to the actual signal and latent dependencies in the data, such as positional or structural constraints. Augmenting sequence data with additional information, such as knowledge of biological pathways or transcriptional experiments, offers a more powerful alternative but leads to additional challenges in statistical modelling and analysis. In this project, we aim to develop fast, accurate and efficient statistical methods for the detection of transcription factor binding sites. Techniques may include hidden Markov models for segmentation, joint modelling approaches, and fast, efficient Markov chain Monte Carlo methods for model fitting and estimation. A Bayesian statistical framework is well suited to such problems, as it imposes the necessary structure and allows pertinent biological information to be incorporated, leading to more accurate inference. A related problem, with somewhat different challenges, is detecting the boundaries of functionally varying genomic regions, such as nucleosomes or regions of methylation.
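
To make the segmentation idea concrete, the sketch below runs the Viterbi algorithm for a two-state hidden Markov model over a DNA sequence, labelling each position as background (state 0) or enriched (state 1). The start, transition and emission probabilities are hypothetical placeholders (an enriched state that favours C and G), not a fitted binding-site model; in a Bayesian treatment such parameters would themselves be estimated, for example by MCMC.

```python
import math

STATES = (0, 1)            # 0 = background, 1 = enriched segment (hypothetical labels)
START = (0.5, 0.5)
TRANS = ((0.95, 0.05),     # background -> background / enriched
         (0.10, 0.90))     # enriched   -> background / enriched
EMIT = ({"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
        {"A": 0.05, "C": 0.45, "G": 0.45, "T": 0.05})


def viterbi(seq):
    """Most probable hidden-state path for `seq`, computed in log space."""
    log = math.log
    score = [log(START[s]) + log(EMIT[s][seq[0]]) for s in STATES]
    backpointers = []
    for base in seq[1:]:
        new_score, pointers = [], []
        for s in STATES:
            # Best previous state from which to move into state s
            prev = max(STATES, key=lambda r: score[r] + log(TRANS[r][s]))
            pointers.append(prev)
            new_score.append(score[prev] + log(TRANS[prev][s]) + log(EMIT[s][base]))
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


sequence = "AT" * 10 + "GC" * 10 + "AT" * 10
path = viterbi(sequence)
print("".join(map(str, path)))
```

The transition probabilities act as a penalty on switching states, so the decoded path forms contiguous segments rather than flipping at every enriched base.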

Clustering methods to detect genetic associations (PhD)

Supervisor: Mayetri Gupta
Relevant research groups: Biostatistics and Statistical Genetics, Statistical Methodology

Many common diseases, including cardiovascular disease and osteoporosis, are characterized by complex traits, which are determined by the interplay of numerous genetic variants and various environmental factors. Although genetic and phenotypic data may contain the information needed to decipher complex diseases, building global models that associate complex traits with the appropriate genetic profile poses several formidable statistical and computational challenges. Model-based clustering methods offer a promising approach, but are generally difficult to implement here because the number of clusters is unknown and a large part of the data lacks any grouping structure. This project aims to develop a Bayesian model-based framework and methodology for clustering blocks of SNPs and phenotypes that can identify sets of candidate genes associated with traits for different diseases. In the scientific context, it is becoming increasingly important to understand the biological system as a whole, taking into account heterogeneity between populations and individuals simultaneously with individual-level, genome-specific biological characteristics. A longer-term goal would be to develop efficient methods for detecting associations that combine genomic function with genetic variability.
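
One Bayesian way around an unknown number of clusters is to place a prior on partitions instead of fixing the number of groups in advance. The sketch below is an illustrative assumption rather than this project's model: it draws cluster assignments from a Chinese restaurant process (the partition distribution induced by a Dirichlet process prior) with a hypothetical concentration parameter alpha, showing that the number of clusters is itself random. Its prior expectation is the sum of alpha / (alpha + i - 1) over i = 1, ..., n.

```python
import random


def crp_partition(n_items, alpha, rng):
    """Sequentially assign items to clusters under a Chinese restaurant process.
    Each new item joins an existing cluster with probability proportional to its
    size, or starts a new cluster with probability proportional to alpha."""
    assignments, sizes = [], []
    for _ in range(n_items):
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)      # open a new cluster
        else:
            sizes[k] += 1
        assignments.append(k)
    return assignments, sizes


def expected_clusters(n_items, alpha):
    """Prior expected number of clusters: sum_{i=1}^{n} alpha / (alpha + i - 1)."""
    return sum(alpha / (alpha + i - 1) for i in range(1, n_items + 1))


rng = random.Random(3)
n, alpha = 200, 1.5
counts = [len(crp_partition(n, alpha, rng)[1]) for _ in range(500)]
print(f"mean number of clusters over 500 draws: {sum(counts) / len(counts):.2f}")
print(f"prior expectation: {expected_clusters(n, alpha):.2f}")
```

In a full model this prior is combined with a likelihood for the SNP and phenotype data, and posterior sampling (for example Gibbs sampling) then infers both the assignments and the number of occupied clusters.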

Friday 14th June, 15:00-16:00, Maths 203
Sofia Massa (University of Oxford)

Friday 20th September, 15:00-16:00, Maths 204
Paul Fearnhead (Lancaster University)

Friday 27th September, 15:00-16:00, Maths 204
Yee Whye Teh (University of Oxford)
