Handling Uncertain Auxiliary Covariates in Two-Phase Study Design and Analysis
Friday 5th September 11:00-12:00
Maths 311B
Abstract
The two-phase study is a cost-effective design for collecting and analyzing data on expensive predictors, in which data are accrued in two phases. In the first phase, the observed outcomes and/or inexpensive (auxiliary) covariates available for all subjects are used to identify an informative subset of subjects for measurement of the expensive predictor. In the second phase, all available data are analyzed using missing-data methods.
The existing literature implicitly assumes that auxiliary covariates are certain, i.e., known and well characterized (e.g., a single auxiliary covariate is assumed to relate linearly to the outcome and/or the expensive predictor), whereas uncertain auxiliary covariates have received little to no attention. Here, I present two approaches that challenge this assumption.
The first approach, motivated by post-genome-wide association studies (post-GWAS) and polygenic risk score (PRS) construction, integrates multiple PRS methods into the design of a two-phase re-sequencing study. It solves a convex optimization problem to find the convex combination of PRSs that minimizes the mean squared error. In non-edge cases, the resulting combination yields the same residuals as a linear regression model with all PRSs as covariates, so a residual-dependent sampling (RDS) design based on either one selects the same subjects. The main advantage of the convex optimization approach is that the combined PRS can be stratified and used as the sole auxiliary covariate in maximum likelihood methods, whereas it remains unclear how to stratify on a model with all PRSs as covariates. The method is evaluated against alternative RDS designs based on a single PRS method or on both PRS methods, using simulations and real data.
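A minimal Python sketch of the combination-and-sampling idea, under stated assumptions: the combined score is assumed to enter the outcome model with a positive slope, in which case minimizing the squared error over simplex weights (with a free intercept and scale) reduces to non-negative least squares (NNLS) on centred data, and the weights are the normalized NNLS coefficients. The function names `convex_prs_combination` and `rds_select`, the two-tailed selection rule, and the simulated data are hypothetical illustrations, not the speaker's implementation.

```python
import numpy as np
from scipy.optimize import nnls


def convex_prs_combination(y, prs):
    """Return weights w (w >= 0, sum(w) == 1) for combining PRS columns.

    Assumption: the combined score enters with a positive slope b, so
    min over a, b > 0, w in the simplex of ||y - a - b * prs @ w||^2
    becomes non-negative least squares in gamma = b * w after centring.
    """
    yc = y - y.mean()                  # centring absorbs the intercept a
    Xc = prs - prs.mean(axis=0)
    gamma, _ = nnls(Xc, yc)            # gamma >= 0 elementwise
    if gamma.sum() == 0:
        raise ValueError("no PRS is positively associated with y")
    return gamma / gamma.sum()         # rescale onto the simplex


def rds_select(y, score, n2):
    """Hypothetical RDS rule: the n2 subjects with the most extreme
    residuals (lower and upper tails) from a simple regression of y on score."""
    slope, intercept = np.polyfit(score, y, 1)
    resid = y - (intercept + slope * score)
    order = np.argsort(resid)
    lower = n2 // 2
    upper = n2 - lower
    return np.concatenate([order[:lower], order[len(order) - upper:]])


# Simulated phase-one data with two PRS methods (illustrative only).
rng = np.random.default_rng(0)
n = 1000
prs = rng.normal(size=(n, 2))
y = 0.6 * prs[:, 0] + 0.3 * prs[:, 1] + rng.normal(size=n)

w = convex_prs_combination(y, prs)
combined = prs @ w                     # single auxiliary covariate
idx = rds_select(y, combined, n2=200)  # phase-two subsample indices
```

Under this sketch's assumptions, a non-edge case corresponds to every unconstrained OLS coefficient being positive: NNLS then returns the OLS solution, so the residuals of `y` on the combined score match those of the regression on all PRSs. If some coefficient would be negative, NNLS sets it to zero and the two sets of residuals differ.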
The second approach, motivated by the potential of high-dimensional auxiliary covariates (HACs), evaluates dimension reduction techniques applied to the phase-one HACs to identify informative subsamples for phase-two analysis using RDS. We focus on four techniques: Principal Component Analysis, Uniform Manifold Approximation and Projection, Adaptive Lasso, and Meta-Visualization, which transform the HACs into a single univariable predictor that preserves the essential signal. Altogether, this work aims to shed light on which dimension reduction approaches are appropriate for handling HACs in two-phase studies. Comprehensive simulations and a real data application are presented.
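As an illustration of one of the four techniques, the sketch below reduces a matrix of HACs to a univariable score with Principal Component Analysis (the first principal component of the standardized columns, computed with NumPy's SVD) and then applies a two-tailed residual-dependent selection rule. The simulated single-latent-factor data, the function names `first_pc_score` and `rds_select`, and the selection rule are hypothetical; UMAP, Adaptive Lasso, and Meta-Visualization would each replace `first_pc_score` and are not shown.

```python
import numpy as np


def first_pc_score(hac):
    """Project standardized HACs (n x p) onto their first principal component."""
    Z = (hac - hac.mean(axis=0)) / hac.std(axis=0)
    # Rows of vt are the principal axes; vt[0] explains the most variance.
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ vt[0]


def rds_select(y, score, n2):
    """Hypothetical RDS rule: the n2 subjects with the most extreme
    residuals (lower and upper tails) from a simple regression of y on score."""
    slope, intercept = np.polyfit(score, y, 1)
    resid = y - (intercept + slope * score)
    order = np.argsort(resid)
    lower = n2 // 2
    upper = n2 - lower
    return np.concatenate([order[:lower], order[len(order) - upper:]])


# Simulated phase-one data: 50 HACs driven by one latent factor (illustrative only).
rng = np.random.default_rng(1)
n, p = 500, 50
latent = rng.normal(size=n)
hac = np.outer(latent, rng.normal(size=p)) + rng.normal(size=(n, p))
y = latent + rng.normal(size=n)

score = first_pc_score(hac)            # univariable auxiliary covariate
idx = rds_select(y, score, n2=100)     # phase-two subsample indices
```

PCA is unsupervised, so the first component captures the dominant variation in the HACs whether or not that variation is related to the outcome; supervised reductions such as Adaptive Lasso target the outcome directly, which is one reason to compare several techniques.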