Abstract
Abstract. Missing data are a pervasive problem in empirical psychological research. From a methodological perspective, traditional procedures such as casewise and pairwise deletion, regression imputation, and mean imputation have distinct weaknesses. Yet modern statistical methods for analyzing datasets with missing values, developed over the past three decades, have not yet gained a significant foothold in research practice. We begin this article by introducing the basic concepts and terminology of missing data, as proposed by Rubin (1976). We then give an overview of the approaches to handling missing data discussed in the literature, distinguishing between three types of procedures: traditional procedures (e.g., casewise deletion); imputation-based procedures, in which missing values are replaced by imputed values; and model-based procedures, in which the model is estimated and the missing data are handled in a single step. In the empirical section of the article, we demonstrate the application of multiple imputation using a dataset from a large-scale educational assessment. Finally, we discuss implications for research practice.
References
Allison, P. D. (2001). Missing data. Thousand Oaks, CA: Sage.
Arbuckle, J. L. (1996). Full information estimation in the presence of incomplete data. In G. A. Marcoulides & R. E. Schumacker (Eds.), Advanced structural equation modeling (pp. 243–277). Mahwah, NJ: Lawrence Erlbaum Publishers.
Baumert, J., Bos, W., & Lehmann, R. (Eds.) (2000). TIMSS/III. Dritte Internationale Mathematik- und Naturwissenschaftsstudie - Mathematische und naturwissenschaftliche Bildung am Ende der Schullaufbahn: Vol. 1. Mathematische und naturwissenschaftliche Grundbildung am Ende der Pflichtschulzeit. Opladen: Leske + Budrich.
Baumert, J., Klieme, E., Neubrand, J., Prenzel, M., Schiefele, U., Schneider, W., Stanat, P., Tillmann, K.-J., & Weiß, M. (Eds.) (2001). PISA 2000. Basiskompetenzen von Schülerinnen und Schülern im internationalen Vergleich. Opladen: Leske + Budrich.
Buck, S. F. (1960). A method of estimation of missing values in multivariate data suitable for use with an electronic computer. Journal of the Royal Statistical Society, 22, 302–306.
Chen, J., & Shao, J. (2000). Nearest-neighbor imputation for survey data. Journal of Official Statistics, 16, 583–599.
Chen, J., & Shao, J. (2001). Jackknife variance estimation for nearest-neighbor imputation. Journal of the American Statistical Association, 96, 260–269.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale: Erlbaum.
Collins, L. M., Schafer, J. L., & Kam, C.-M. (2001). A comparison of inclusive and restrictive strategies in modern missing data procedures. Psychological Methods, 6, 330–351.
De Leeuw, E. D. (1999). Prevention is the better cure: How to reduce missing data. Kwantitative Methoden, 62, 39–55.
Demirtas, H. (2004). Modeling incomplete longitudinal data. Journal of Modern Applied Statistical Methods, 3, 305–321.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39, 1–38.
Dietz, F., Schmid, S., & Fries, S. (2005). Lernen oder Freunde treffen? Lernmotivation unter den Bedingungen multipler Handlungsoptionen. Zeitschrift für Pädagogische Psychologie, 19, 173–189.
Durrant, G. B. (2005). Imputation methods for handling item-nonresponse in the social sciences: A methodological review. National Centre for Research Methods Working Paper Series.
Enders, C. K. (2001a). A primer on maximum likelihood algorithms available for use with missing data. Structural Equation Modeling, 8, 128–141.
Enders, C. K. (2001b). The impact of nonnormality on full information maximum-likelihood estimation for structural equation models with missing data. Psychological Methods, 6, 352–370.
Fay, R. E. (1999). Theory and application of nearest neighbor imputation in Census 2000. Proceedings of the Section on Survey Research Methods, American Statistical Association, 112–121.
Finkbeiner, C. (1979). Estimation for the multiple factor model when data are missing. Psychometrika, 44, 409–420.
Gelman, A., & Carlin, J. B. (2002). Poststratification and weighting adjustments. In R. M. Groves, D. A. Dillman, J. L. Eltinge & R. J. A. Little (Eds.), Survey nonresponse (pp. 289–314). New York: Wiley.
Gelman, A., Carlin, J., Stern, H., & Rubin, D. B. (2003). Bayesian data analysis. London: Chapman & Hall.
Graham, J. W. (2003). Adding missing-data-relevant variables to FIML-based structural equation models. Structural Equation Modeling, 10, 80–100.
Graham, J. W., Cumsille, P. E., & Elek-Fisk, E. (2003). Methods for handling missing data. In J. A. Schinka & W. F. Velicer (Eds.), Handbook of psychology: Research methods in psychology (Vol. 2, pp. 87–114). New York: John Wiley & Sons.
Graham, J. W., & Schafer, J. L. (1999). On the performance of multiple imputation for multivariate data with small sample size. In R. Hoyle (Ed.), Statistical strategies for small sample research (pp. 1–29). Thousand Oaks, CA: Sage.
Graham, J. W., Taylor, B. J., & Cumsille, P. E. (2001). Planned missing data designs in the analysis of change. In L. M. Collins & A. G. Sayer (Eds.), New methods for the analysis of change (pp. 335–353). Washington, DC: American Psychological Association.
Horton, N. J., & Lipsitz, S. R. (2001). Multiple imputation in practice: Comparison of software packages for regression models with missing variables. The American Statistician, 55, 244–254.
Hox, J. J. (1999). A review of current software for handling missing data. Kwantitative Methoden, 62, 123–138.
King, G., Honaker, J., Joseph, A., & Scheve, K. (2001). Analyzing incomplete political science data: An alternative algorithm for multiple imputation. American Political Science Review, 95, 49–69.
Köller, O., Watermann, R., Trautwein, U., & Lüdtke, O. (2004). Wege zur Hochschulreife in Baden-Württemberg. TOSCA - Eine Untersuchung an allgemein bildenden und beruflichen Gymnasien. Opladen: Leske + Budrich.
Little, R. J. A. (1988). A test of missing completely at random for multivariate data with missing values. Journal of the American Statistical Association, 83, 1198–1202.
Little, R. J. A. (1993). Pattern-mixture models for multivariate incomplete data. Journal of the American Statistical Association, 87, 1227–1237.
Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data. New York: Wiley.
Lövdén, M., Ghisletta, P., & Lindenberger, U. (2005). Social participation attenuates decline in perceptual speed in old and very old age. Psychology and Aging, 20, 423–434.
McDonald, R. P., & Ho, M.-H. R. (2002). Principles and practice in reporting structural equation analyses. Psychological Methods, 7, 64–82.
Mislevy, R. J., Beaton, A. E., Kaplan, B., & Sheehan, K. M. (1992). Estimating population characteristics from sparse matrix samples of item responses. Journal of Educational Measurement, 29, 133–161.
Muthén, B., Kaplan, D., & Hollis, M. (1987). On structural equation modeling with data that are not missing completely at random. Psychometrika, 42, 431–462.
Münnich, R., & Rässler, S. (2005). PRIMA: A new multiple imputation procedure for binary variables. Journal of Official Statistics, 21, 325–341.
Newman, D. A. (2003). Longitudinal modeling with randomly and systematically missing data: A simulation of ad hoc, maximum likelihood, and multiple imputation techniques. Organizational Research Methods, 6, 328–362.
Peugh, J. L., & Enders, C. K. (2004). Missing data in educational research: A review of reporting practices and suggestions for improvement. Review of Educational Research, 74, 525–556.
Rost, D. H. (2005). Interpretation und Bewertung pädagogisch-psychologischer Studien. Weinheim: Beltz.
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63, 581–592.
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.
Rubin, D. B. (1996). Multiple imputation after 18+ years. Journal of the American Statistical Association, 91, 473–489.
Rubin, D. B. (2003). Nested multiple imputation of NMES via partially incompatible MCMC. Statistica Neerlandica, 57, 3–18.
Schafer, J. L. (1997). Analysis of incomplete multivariate data. London: Chapman & Hall.
Schafer, J. L. (1999). NORM: Multiple Imputation of Incomplete Data under a Normal Model [Computer Software]. Retrieved from: http://methodology.psu.edu [4.10.2004]
Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147–177.
Schafer, J. L., & Olsen, M. K. (1998). Multiple imputation for multivariate missing-data problems: A data analyst’s perspective. Multivariate Behavioral Research, 33, 545–571.
Schafer, J. L., & Yucel, R. M. (2002). Computational strategies for multivariate linear mixed-effects models with missing values. Journal of Computational and Graphical Statistics, 11, 437–457.
Sijtsma, K., & van der Ark, L. (2003). Investigation and treatment of missing item scores in test and questionnaire data. Multivariate Behavioral Research, 38, 505–528.
Sinharay, S., Stern, H. S., & Russell, D. (2001). The use of multiple imputation for the analysis of missing data. Psychological Methods, 6, 317–329.
Spieß, M. (2004). Analyse von Längsschnittdaten mit fehlenden Werten. Grundlagen, Verfahren und Anwendungen. Habilitation thesis, Bremen. http://nbn-resolving.de/urn:nbn:de:gbv:46-20050620037
Tanner, M. A., & Wong, W. H. (1987). The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association, 82, 528–540.
Toutenburg, H., Heumann, C., & Nittner, T. (2004). Statistische Methoden bei unvollständigen Daten. Discussion Paper 380. Ludwig-Maximilians-Universität, München.
Trautwein, U., Lüdtke, O., Köller, O., & Baumert, J. (2006). Self-esteem, academic self-concept, and achievement: How the learning environment moderates the dynamics of self-concept. Journal of Personality and Social Psychology, 90, 334–349.
Van Buuren, S., & Oudshoorn, K. (1999). Flexible imputation by MICE. Report TNO-PG 99.054, TNO Prevention and Health, Leiden. Retrieved from: www.multiple.imputation.com [20.04.2006]
Verbeke, G., & Molenberghs, G. (2000). Linear mixed models for longitudinal data. New York: Springer.
von Hippel, P. T. (2004). Biases in SPSS 12.0 missing value analysis. The American Statistician, 58, 160–164.
West, S. G. (2001). New approaches to missing data in psychological research: Introduction to the special section. Psychological Methods, 6, 315–316.
Wilkinson, L. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594–604.