Royal Statistical Society
Statistical Computing Section Meetings
Half-day meeting on the use and abuse of missing data
"Tricks of the trade: what to do with missing data"
Joint Meeting with the Official Statistics Section
"Missing data a pervasive fact of life: a user's perspective."
Dick Wiggins (City University, London) and Gopal Netuveli (Imperial College, London)
Synopsis: This introduction gives a user's perspective on some of the basic terminology used to describe the process of missingness, and on the critical application of various remedies for missing data, in the context of the British Household Panel Survey and using STATA. Topics include the cost of ignoring the problem, ad hoc methods, hot decking, an introduction to multiple imputation, and Heckman's approach.
Dick and Gopal's slides are available in Powerpoint format.
"The role and scope of multiple imputation for incomplete data."
Nick Longford (SNTL, Leicester)
Synopsis: Multiple imputation (MI) was originally designed for the setting with a number of secondary analysts who wish to apply complete-data methods on publicly available databases without requiring any expertise in methods for handling missing values, yet they and the distributor (data constructor) have a stake in good statistical practice (near-efficient estimation). In principle, MI can be applied to any problem that can be formulated as involving missing information; its scope is wider than that of the EM algorithm, which is constrained by our limited ability or capacity to iteratively execute the E-step (estimation of the complete-data sufficient statistics). Direct modelling of the data-generating (sampling) and nonresponse processes is the gold standard, but its application is practical only in one-off analyses, and the expertise it requires is difficult to spread through the community of analysts. The strengths and weaknesses of MI will be reviewed, with references to published examples. The 'difficult' parts of the MI method (modelling of the nonresponse process) will be discussed from the 'missionary' perspective: how good statistical practice can be promoted without the stringent requirements of expertise in statistical theory or any specialised training. NMAR mechanisms will be discussed in connection with sensitivity analysis.
Nick's slides are available in Acrobat format (PDF).
"Practical solutions for dealing with missing data"
Rob Woods (SPSS)
Synopsis: Missing data is a common issue that should be considered when undertaking any applied work. There are a number of ways to deal with missing data. When starting out on this process, it is always imperative that the reasons for missing data are understood. This presentation outlines some practical tips to overcome missing data. This presentation also outlines how some techniques are better at dealing with missing data than others, along with other general approaches to imputing missing data.
Rob's slides are available in Powerpoint format.
"Multiple Imputation for Multilevel Data."
James Carpenter and Mike Kenward (London School of Hygiene and Tropical Medicine)
Synopsis: Multiple imputation has proved to be an invaluable tool for handling incomplete data, especially with large messy datasets with many incomplete explanatory variables. A central role is played by the imputation distribution, which requires modelling multivariate distributions made up of potentially diverse types of variable (e.g. continuous, ordinal and nominal discrete). An important additional complication is the presence of multilevel structure among these variables, in particular hierarchical and longitudinal structure. In this talk we discuss some of the approaches used for modelling such multivariate data, and for drawing appropriate imputations. We focus in particular on a newly developed macro in MLwiN for multilevel multiple imputation.
"SOLAS 3.2 for Missing Data Analysis."
Fiona O'Callaghan (Statistical Solutions Ltd)
Synopsis: SOLAS 3.2 for Missing Data Analysis is a Windows-based software tool for data imputation and exploratory analysis of missing data that provides a choice of both Multiple Imputation and Single Imputation methods.
The Single Imputation methods available in SOLAS include Hot Decking, Regression Imputation, Group Means, and Last Value Carried Forward.
Multiple Imputation was originally proposed by Rubin in the early 1970s as a possible solution to the problem of survey nonresponse, to address the failings of standard analyses of incomplete datasets. The idea behind Multiple Imputation is that for each missing value in a dataset, we impute several values (M) instead of just one, to represent the uncertainty about which values to impute.
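The idea of imputing M values per missing entry can be illustrated with a minimal Python sketch. It is not taken from the talk or from SOLAS: it assumes a single normally distributed variable with missing entries marked as None, and the function names (impute_once, pool, multiply_impute_mean) are hypothetical. Each of the M completed datasets is analysed separately and the results are combined with Rubin's rules.

```python
import random
import statistics


def impute_once(values, rng):
    """Return one completed copy of `values`; None entries are replaced by
    draws from a normal model fitted to the observed entries (assumption)."""
    observed = [v for v in values if v is not None]
    n = len(observed)
    mean = statistics.fmean(observed)
    var = statistics.variance(observed)
    # Draw the model parameters first, so that each imputed dataset also
    # reflects uncertainty about the mean and variance themselves.
    sigma2 = (n - 1) * var / rng.gammavariate((n - 1) / 2, 2)  # chi-square draw
    mu = rng.gauss(mean, (sigma2 / n) ** 0.5)
    return [v if v is not None else rng.gauss(mu, sigma2 ** 0.5) for v in values]


def pool(estimates, variances):
    """Combine M estimates and their variances using Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)      # pooled point estimate
    within = statistics.fmean(variances)     # average within-imputation variance
    between = statistics.variance(estimates) # between-imputation variance
    total = within + (1 + 1 / m) * between
    return q_bar, total


def multiply_impute_mean(values, m=5, seed=0):
    """Estimate the mean of `values` from M imputed datasets."""
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(m):
        completed = impute_once(values, rng)
        estimates.append(statistics.fmean(completed))
        variances.append(statistics.variance(completed) / len(completed))
    return pool(estimates, variances)
```

Calling multiply_impute_mean([1.0, 2.0, None, 4.0, 5.0, None, 3.0], m=5) returns a pooled estimate of the mean and its total variance; the between-imputation term is what carries the extra uncertainty due to the missing values.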
In SOLAS, users have two Multiple Imputation approaches to choose from, namely: a predictive model-based approach, where the predictive information contained in a user-specified set of covariates is used to predict the missing values, or a propensity score-based approach, in which cases are grouped according to their probability of being missing (i.e. propensity score) and then an approximate Bayesian bootstrap is applied to sample observed values to impute the missing values.
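The propensity score-based approach can be sketched in Python under some loud assumptions: the propensity scores are taken as already estimated, cases are split into equal-sized groups by score, and the names abb_impute and impute_by_propensity are hypothetical. This illustrates the general technique only and is not SOLAS's implementation.

```python
import random


def abb_impute(observed, n_missing, rng):
    """Approximate Bayesian bootstrap: first resample the observed values
    with replacement to form a donor pool, then draw imputations from it."""
    donor_pool = [rng.choice(observed) for _ in observed]
    return [rng.choice(donor_pool) for _ in range(n_missing)]


def impute_by_propensity(cases, n_groups=2, seed=0):
    """`cases` is a list of (propensity_score, value_or_None) pairs.
    Returns the values in their original order with missing entries imputed."""
    rng = random.Random(seed)
    # Sort case indices by propensity score and cut them into groups.
    order = sorted(range(len(cases)), key=lambda i: cases[i][0])
    size = -(-len(cases) // n_groups)  # ceiling division
    completed = [value for _, value in cases]
    for start in range(0, len(order), size):
        group = order[start:start + size]
        donors = [cases[i][1] for i in group if cases[i][1] is not None]
        missing = [i for i in group if cases[i][1] is None]
        if not donors:
            continue  # no observed values in this group; leave entries missing
        for i, draw in zip(missing, abb_impute(donors, len(missing), rng)):
            completed[i] = draw
    return completed
```

Running impute_by_propensity with a different seed for each of the M imputations gives M completed datasets, in which every imputed value is an observed value borrowed from a case with a similar propensity score.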
This demonstration will include examples of how SOLAS can be used to perform multiple imputation on datasets containing both continuous and categorical data.
Fiona's slides are available in Powerpoint format.
Date & Time
Wednesday 18th May 2005 at 2:00-5:30 pm
Place
Errol Street
Refreshments
There will be tea and biscuits served at about 3:30 pm.