# Dr Mayetri Gupta

**Reader in Statistics** (Statistics)

**telephone**: 01413307753

**email**: Mayetri.Gupta@glasgow.ac.uk

The Mathematics and Statistics Building, University of Glasgow, University Place, Glasgow, G12 8SQ

## Research interests

My research primarily involves developing novel statistical methodology, in particular Bayesian methodology, for scientific problems arising in computational biology and genetics. Detecting sparse signals in noisy discrete data is a significant challenge in many fields, but especially in genomic data analysis, because of latent positional or structural constraints in such data.

I have been involved in developing novel Bayesian statistical approaches for detecting recurrent conserved patterns (motifs) in DNA sequence data, and for predicting chromatin structure (the positioning of nucleosomes in DNA) through new adaptations of hidden Markov and hidden semi-Markov models. Motifs are often the sites of active gene regulation and tend to be prone to damage from the environment, so locating them accurately is an important challenge for biologists and clinicians alike.

I also work on the development of Bayesian regression mixture models and hidden Markov regression models, together with associated Monte Carlo-based estimation procedures. These provide a robust and efficient way of deciphering regulatory networks of genes and associated motifs by combining different types of genomic data, including genomic sequence, gene expression microarray and tiling array data.

I am also interested in general Bayesian methodology for clustering, classification, and model selection with high-dimensional, correlated data. In previous work, my collaborators and I developed a new information-based prior framework that allows for efficient inference in high-dimensional regression-type models. I am currently working on extending these ideas to the discovery of causal genetic mutations behind complex disease phenotypes.

## Publications

Number of items: **34**.

## 2022

Zhang, H., Swallow, B. and Gupta, M.
(2022)
Bayesian hierarchical mixture models for detecting non-normal clusters applied to noisy genomic and environmental datasets.
*Australian and New Zealand Journal of Statistics*, 64(2),
pp. 313-337.
(doi: 10.1111/anzs.12370)

## 2021

Wu, J., Gupta, M., Hussein, A. I. and Gerstenfeld, L.
(2021)
Bayesian modeling of factorial time-course data with applications to a bone aging gene expression study.
*Journal of Applied Statistics*, 48(10),
pp. 1730-1754.
(doi: 10.1080/02664763.2020.1772733)

## 2020

Redivo, E., Nguyen, H. and Gupta, M.
(2020)
Bayesian clustering of skewed and multimodal data using geometric skewed normal distributions.
*Computational Statistics and Data Analysis*, 152,
107040.
(doi: 10.1016/j.csda.2020.107040)

## 2019

Al Alawi, M., Ray, S. and Gupta, M. (2019) A New Framework for Distance-based Functional Clustering. In: 34th International Workshop on Statistical Modelling, Guimarães, Portugal, 07-12 Jul 2019.

## 2015

Moser, C. B., Gupta, M., Archer, B. N. and White, L. F.
(2015)
The impact of prior information on estimates of disease transmissibility using Bayesian tools.
*PLoS ONE*, 10(3),
e0118762.
(doi: 10.1371/journal.pone.0118762)
(PMID:25793993)
(PMCID:PMC4368801)

## 2014

Bis, J. C. et al.
(2014)
Associations of NINJ2 sequence variants with incident ischemic stroke in the Cohorts for Heart and Aging in Genomic Epidemiology (CHARGE) consortium.
*PLoS ONE*, 9(6),
e99798.
(doi: 10.1371/journal.pone.0099798)
(PMID:24959832)
(PMCID:PMC4069013)

Lin, H. et al.
(2014)
Strategies to design and analyze targeted sequencing data: cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) consortium targeted sequencing study.
*Circulation: Cardiovascular Genetics*, 7(3),
pp. 335-343.
(doi: 10.1161/CIRCGENETICS.113.000350)

Gupta, M.
(2014)
An evolutionary Monte Carlo algorithm for Bayesian block clustering of data matrices.
*Computational Statistics and Data Analysis*, 71,
pp. 375-391.
(doi: 10.1016/j.csda.2013.07.006)

Lin, H. et al.
(2014)
Targeted sequencing in candidate genes for atrial fibrillation: the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) targeted sequencing study.
*Heart Rhythm*, 11(3),
pp. 452-457.
(doi: 10.1016/j.hrthm.2013.11.012)

## 2013

Chanialidis, C., Craigmile, P., Davies, V., Dean, N., Evers, L., Filippone, M., Gupta, M., Ray, S. and Rogers, S. (2013) Discussion of Hennig and Liao: How to find an appropriate clustering for mixed type variables with application to socio-economic stratification. Journal of the Royal Statistical Society: Series C. 62, 309-369. Discussion Paper. Springer. (doi: 10.1111/j.1467-9876.2012.01066.x).

Gelfond, J.A., Ibrahim, J.G., Gupta, M., Cheng, M.-H. and Cody, J.D.
(2013)
Differential expression analysis with global network adjustment.
*BMC Bioinformatics*, 14(258).
(doi: 10.1186/1471-2105-14-258)

## 2012

Gupta, M. and Ray, S.
(2012)
Sequence pattern discovery with applications to understanding gene regulation and vaccine design.
In: Rao, C.R., Chakraborty, R. and Sen, P.K. (eds.)
*Handbook of Statistics.*
Elsevier Press.

Moser, C. and Gupta, M.
(2012)
A generalized hidden Markov model for determining sequence-based predictors of nucleosome positioning.
*Statistical Applications in Genetics and Molecular Biology*, 11(2),
Art. 2.

Hendricks, A.E., Dupuis, J., Gupta, M., Logue, M.W. and Lunetta, K.L.
(2012)
A comparison of gene region simulation methods.
*PLoS ONE*, 7(7),
e40925.
(doi: 10.1371/journal.pone.0040925)

## 2011

Gupta, M., Cheung, C.-L., Hsu, Y.-H., Demissie, S., Cupples, L.A., Kiel, D.P. and Karasik, D.
(2011)
Identification of homogeneous genetic architecture of multiple genetically correlated traits by block clustering of genome-wide associations.
*Journal of Bone and Mineral Research*, 26(6),
pp. 1261-1271.
(doi: 10.1002/jbmr.333)

Mitra, R. and Gupta, M.
(2011)
A continuous-index Bayesian hidden Markov model for prediction of nucleosome positioning in genomic DNA.
*Biostatistics*, 12(3),
pp. 462-477.
(doi: 10.1093/biostatistics/kxq077)

## 2010

Meltzer, M., Long, K., Nie, Y., Gupta, M., Yang, J. and Montano, M.
(2010)
The RNA editor gene ADAR1 is induced in myoblasts by inflammatory ligands and buffers stress response.
*Clinical and Translational Science*, 3(3),
pp. 73-80.
(doi: 10.1111/j.1752-8062.2010.00199.x)

## 2009

Gelfond, J.A.L., Gupta, M. and Ibrahim, J.G.
(2009)
A Bayesian hidden Markov model for motif discovery through joint modeling of genomic sequence and ChIP-chip data.
*Biometrics*, 65(4),
pp. 1087-1095.
(doi: 10.1111/j.1541-0420.2008.01180.x)

Gupta, M.
(2009)
Model selection and sensitivity analysis for sequence pattern models.
*Institute of Mathematical Statistics Collections*, 1(1),
pp. 390-407.

Cheng, F., Hartmann, S., Gupta, M., Ibrahim, J.G. and Vision, T.J.
(2009)
A hierarchical model for incomplete alignments in phylogenetic inference.
*Bioinformatics*, 25(5),
pp. 592-598.
(doi: 10.1093/bioinformatics/btp015)

Gupta, M. and Ibrahim, J.G.
(2009)
An information matrix prior for Bayesian analysis in generalized linear models with high dimensional data.
*Statistica Sinica*, 19(4),
pp. 1641-1663.

Zhou, Q. and Gupta, M.
(2009)
Regulatory motif discovery: from decoding to meta-analysis.
In: Fan, J., Lin, X. and Liu, J.S. (eds.)
*New Developments in Biostatistics and Bioinformatics.*
Series: Frontiers of Statistics (1).
World Scientific, pp. 179-208.
ISBN 9789812837431
(doi: 10.1142/9789812837448_0008)

## 2008

Jeong, Y.-C., Walker, N.J., Burgin, D.E., Kissling, G., Gupta, M., Kupper, L., Birnbaum, L.S. and Swenberg, J.A.
(2008)
Accumulation of M₁dG DNA adducts after chronic exposure to PCBs, but not from acute exposure to polychlorinated aromatic hydrocarbons.
*Free Radical Biology and Medicine*, 45(5),
pp. 585-591.
(doi: 10.1016/j.freeradbiomed.2008.04.043)

## 2007

Gupta, M., Qu, P. and Ibrahim, J.G.
(2007)
A temporal hidden Markov regression model for the analysis of gene regulatory networks.
*Biostatistics*, 8(4),
pp. 805-820.
(doi: 10.1093/biostatistics/kxm007)

Gupta, M. and Ibrahim, J.G.
(2007)
Variable selection in regression mixture modeling for the discovery of gene regulatory networks.
*Journal of the American Statistical Association*, 102(479),
pp. 867-880.
(doi: 10.1198/016214507000000068)

Gupta, M.
(2007)
Generalized hierarchical Markov models for the discovery of length-constrained sequence features from genome tiling arrays.
*Biometrics*, 63(3),
pp. 797-805.
(doi: 10.1111/j.1541-0420.2007.00760.x)

Maki, A., Kono, H., Gupta, M., Asakawa, M., Suzuki, T., Matsuda, M., Fujii, H. and Rusyn, I.
(2007)
Predictive power of biomarkers of oxidative stress and inflammation in patients with hepatitis C virus-associated hepatocellular carcinoma.
*Annals of Surgical Oncology*, 14(3),
pp. 1182-1190.
(doi: 10.1245/s10434-006-9049-1)

## 2006

Giresi, P.G., Gupta, M. and Lieb, J.D.
(2006)
Regulation of nucleosome stability as a mediator of chromatin function.
*Current Opinion in Genetics and Development*, 16(2),
pp. 171-176.
(doi: 10.1016/j.gde.2006.02.003)

Gupta, M. and Liu, J.S.
(2006)
Bayesian modeling and inference for motif discovery.
In: Do, K.-A., Müller, P. and Vannucci, M. (eds.)
*Bayesian Inference for Gene Expression and Proteomics.*
Cambridge University Press: Cambridge, UK.
ISBN 9780521860925

## 2005

Altman, N., Banks, D., Hardwick, J., Roeder, K., Craigmile, P.F., Hardin, J. and Gupta, M.
(2005)
*The IMS New Researchers' Survival Guide.*
The Institute of Mathematical Statistics.

Gupta, M. and Liu, J.S.
(2005)
De novo cis-regulatory module elicitation for eukaryotic genomes.
*Proceedings of the National Academy of Sciences of the United States of America*, 102(20),
pp. 7079-7084.
(doi: 10.1073/pnas.0408743102)

## 2004

Gupta, M. and Liu, J.S.
(2004)
Discussions on "A Bayesian Approach to DNA Sequence Segmentation".
*Biometrics*, 60(3),
pp. 582-583.
(doi: 10.1111/j.0006-341X.2004.206_3.x)

## 2003

Gupta, M. and Liu, J.S.
(2003)
Discovery of conserved sequence patterns using a stochastic dictionary model.
*Journal of the American Statistical Association*, 98(461),
pp. 55-66.
(doi: 10.1198/016214503388619094)

## 2002

Liu, J.S., Gupta, M., Liu, X.L., Mayerhofer, L. and Lawrence, C.L.
(2002)
Statistical models for biological sequence motif discovery.
In:
*Case Studies in Bayesian Statistics.*
Series: Lecture Notes in Statistics, 6 (167).
Springer.

This list was generated on **Fri Aug 19 16:56:41 2022 BST**.

## Supervision

**Li**, Lanxin

Developing Effective Bayesian Variable Selection Methods for Genome-wide Association Studies