Abstract

Summary. Missing values are often a problem in empirical psychological research. Frequently used procedures such as listwise and pairwise deletion, regression imputation, and mean imputation are methodologically deficient. Alternative procedures for analyzing datasets with missing values, developed over the past three decades, are still rarely applied in research practice. This article first introduces the basic terminology of missing values following Rubin (1976). It then gives an overview of the approaches to handling missing values discussed in the literature, distinguishing three types of procedures: classical procedures (e.g., listwise deletion); imputation-based procedures, in which missing values are replaced (imputed); and model-based procedures, in which the model is estimated and the missing values are handled in a single step. The use of multiple imputation is then illustrated with a data example. Finally, implications for research practice are discussed.

Handling of missing data in psychological research: Problems and solutions

Abstract. Missing data are a pervasive problem in empirical psychological research. From a methodological perspective, traditional procedures such as listwise and pairwise deletion, regression imputation, and mean imputation have distinct weaknesses. Yet the modern statistical methods for analyzing datasets with missing values that have been developed over the past three decades have not yet gained a significant foothold in research practice. We begin this article by introducing the basic concepts and terminology of missing data, as proposed by Rubin (1976). We then give an overview of the approaches to handling missing data discussed in the literature, distinguishing three types of procedures: traditional procedures (e.g., listwise deletion); imputation-based procedures, in which missing values are replaced by imputed values; and model-based procedures, in which the model is estimated and the missing data are handled in a single step. In the empirical section of the article, we demonstrate the application of multiple imputation using a dataset from a large-scale educational assessment. Finally, implications for research practice are discussed.
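To make the contrast between the traditional and the imputation-based procedures concrete, the following is a minimal sketch on simulated data. It is not the analysis reported in the article: the data-generating model, the function names (`rubin_pool`, `fit_line`, `impute_once`, `demo`), and the bootstrap-based stochastic regression imputation are all assumptions chosen for illustration. Only the pooling step follows the standard combining rules associated with Rubin (1987).

```python
import math
import random
from statistics import mean, variance


def rubin_pool(estimates, variances):
    """Pool m point estimates and their sampling variances (Rubin's rules)."""
    m = len(estimates)
    pooled = mean(estimates)                # average of the m estimates
    within = mean(variances)                # average within-imputation variance
    between = variance(estimates)           # between-imputation variance (n - 1 denominator)
    total = within + (1 + 1 / m) * between  # total variance of the pooled estimate
    return pooled, total


def fit_line(xs, ys):
    """Least-squares line; returns intercept, slope, residual standard deviation."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, math.sqrt(rss / (len(xs) - 2))


def impute_once(xs, ys, rng):
    """One stochastic regression imputation, refit on a bootstrap of complete cases."""
    complete = [(x, y) for x, y in zip(xs, ys) if y is not None]
    boot = [rng.choice(complete) for _ in complete]
    intercept, slope, sd = fit_line([x for x, _ in boot], [y for _, y in boot])
    return [y if y is not None else intercept + slope * x + rng.gauss(0, sd)
            for x, y in zip(xs, ys)]


def demo(seed=0, n=2000, m=20):
    """Simulated example (hypothetical data): y is missing more often when x is high."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    full = [0.8 * x + rng.gauss(0, 0.6) for x in xs]
    # Missingness depends only on the fully observed x, not on y itself.
    ys = [None if rng.random() < 1 / (1 + math.exp(-2 * x)) else y
          for x, y in zip(xs, full)]

    observed = [y for y in ys if y is not None]
    listwise_mean = mean(observed)  # listwise deletion: complete cases only
    mean_imputed = [listwise_mean if y is None else y for y in ys]  # mean imputation

    estimates, variances = [], []
    for _ in range(m):  # multiple imputation: m completed datasets
        completed = impute_once(xs, ys, rng)
        estimates.append(mean(completed))
        variances.append(variance(completed) / n)
    mi_mean, mi_var = rubin_pool(estimates, variances)

    return {
        "full_mean": mean(full),
        "listwise_mean": listwise_mean,
        "observed_var": variance(observed),
        "mean_imputed_var": variance(mean_imputed),
        "mi_mean": mi_mean,
        "mi_se": math.sqrt(mi_var),
    }


if __name__ == "__main__":
    for name, value in demo().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions, the complete-case mean is pulled away from the true mean of zero because high-x cases are underrepresented, mean imputation additionally shrinks the variance of y, and the pooled multiple-imputation estimate uses the observed x values to recover the mean while carrying the imputation uncertainty into its standard error.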

References

Allison, P. D. (2001). Missing data. Thousand Oaks, CA: Sage.
Arbuckle, J. L. (1996). Full information estimation in the presence of incomplete data. In G. A. Marcoulides & R. E. Schumacker (Eds.), Advanced structural equation modeling (pp. 243–277). Mahwah, NJ: Lawrence Erlbaum Publishers.
Baumert, J., Bos, W., Lehmann, R. (Eds.) (2000). TIMSS/III. Dritte Internationale Mathematik- und Naturwissenschaftsstudie - Mathematische und naturwissenschaftliche Bildung am Ende der Schullaufbahn: Vol. 1. Mathematische und naturwissenschaftliche Grundbildung am Ende der Pflichtschulzeit. Opladen: Leske + Budrich.
Baumert, J., Klieme, E., Neubrand, J., Prenzel, M., Schiefele, U., Schneider, W., Stanat, P., Tillmann, K.-J., Weiß, M. (Eds.) (2001). PISA 2000. Basiskompetenzen von Schülerinnen und Schülern im internationalen Vergleich. Opladen: Leske + Budrich.
Buck, S. F. (1960). A method of estimation of missing values in multivariate data suitable for use with an electronic computer. Journal of the Royal Statistical Society, 22, 302–306.
Chen, J., Shao, J. (2000). Nearest-neighbor imputation for survey data. Journal of Official Statistics, 16, 583–599.
Chen, J., Shao, J. (2001). Jackknife variance estimation for nearest-neighbor imputation. Journal of the American Statistical Association, 96, 260–269.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale: Erlbaum.
Collins, L. M., Schafer, J. L., Kam, C.-M. (2001). A comparison of inclusive and restrictive strategies in modern missing data procedures. Psychological Methods, 6, 330–351.
De Leeuw, E. D. (1999). Prevention is the better cure: How to reduce missing data. Kwantitative Methoden, 62, 39–55.
Demirtas, H. (2004). Modeling incomplete longitudinal data. Journal of Modern Applied Statistical Methods, 3, 305–321.
Dempster, A. P., Laird, N. M., Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39, 1–38.
Dietz, F., Schmid, S., Fries, S. (2005). Lernen oder Freunde treffen? Lernmotivation unter den Bedingungen multipler Handlungsoptionen. Zeitschrift für Pädagogische Psychologie, 19, 173–189.
Durrant, G. B. (2005). Imputation methods for handling item-nonresponse in the social sciences: A methodological review. National Centre for Research Methods Working Paper Series.
Enders, C. K. (2001a). A primer on maximum likelihood algorithms available for use with missing data. Structural Equation Modeling, 8, 128–141.
Enders, C. K. (2001b). The impact of nonnormality on full information maximum-likelihood estimation for structural equation models with missing data. Psychological Methods, 6, 352–370.
Fay, R. E. (1999). Theory and application of nearest neighbor imputation in Census 2000. Proceedings of the Section on Survey Research Methods, American Statistical Association, 112–121.
Finkbeiner, C. (1979). Estimation for the multiple factor model when data are missing. Psychometrika, 44, 409–420.
Gelman, A., Carlin, J. B. (2002). Poststratification and weighting adjustments. In R. M. Groves, D. A. Dillman, J. L. Eltinge & R. J. A. Little (Eds.), Survey nonresponse (pp. 289–314). New York: Wiley.
Gelman, A., Carlin, J., Stern, H., Rubin, D. B. (2003). Bayesian data analysis. London: Chapman & Hall.
Graham, J. W. (2003). Adding missing-data-relevant variables to FIML-based structural equation models. Structural Equation Modeling, 10, 80–100.
Graham, J. W., Cumsille, P. E., Elek-Fisk, E. (2003). Methods for handling missing data. In J. A. Schinka & W. F. Velicer (Eds.), Handbook of psychology: Research methods in psychology (Vol. 2, pp. 87–114). New York: John Wiley & Sons.
Graham, J. W., Schafer, J. L. (1999). On the performance of multiple imputation for multivariate data with small sample size. In R. Hoyle (Ed.), Statistical strategies for small sample research (pp. 1–29). Thousand Oaks, CA: Sage.
Graham, J. W., Taylor, B. J., Cumsille, P. E. (2001). Planned missing data designs in the analysis of change. In L. M. Collins & A. G. Sayer (Eds.), New methods for the analysis of change (pp. 335–353). Washington, DC: American Psychological Association.
Horton, N. J., Lipsitz, S. R. (2001). Multiple imputation in practice: Comparison of software packages for regression models with missing variables. The American Statistician, 55, 244–254.
Hox, J. J. (1999). A review of current software for handling missing data. Kwantitative Methoden, 62, 123–138.
King, G., Honaker, J., Joseph, A., Scheve, K. (2001). Analyzing incomplete political science data: An alternative algorithm for multiple imputation. American Political Science Review, 95, 49–69.
Köller, O., Watermann, R., Trautwein, U., Lüdtke, O. (2004). Wege zur Hochschulreife in Baden-Württemberg. TOSCA - Eine Untersuchung an allgemein bildenden und beruflichen Gymnasien. Opladen: Leske + Budrich.
Little, R. J. A. (1988). A test of missing completely at random for multivariate data with missing values. Journal of the American Statistical Association, 83, 1198–1202.
Little, R. J. A. (1993). Pattern-mixture models for multivariate incomplete data. Journal of the American Statistical Association, 87, 1227–1237.
Little, R. J. A., Rubin, D. B. (2002). Statistical analysis with missing data. New York: Wiley.
Lövdén, M., Ghisletta, P., Lindenberger, U. (2005). Social participation attenuates decline in perceptual speed in old and very old age. Psychology and Aging, 20, 423–434.
McDonald, R. P., Ho, M.-H. R. (2002). Principles and practice in reporting structural equation analyses. Psychological Methods, 7, 64–82.
Mislevy, R. J., Beaton, A. E., Kaplan, B., Sheehan, K. M. (1992). Estimating population characteristics from sparse matrix samples of item responses. Journal of Educational Measurement, 29, 133–161.
Muthén, B., Kaplan, D., Hollis, M. (1987). On structural equation modeling with data that are not missing completely at random. Psychometrika, 52, 431–462.
Münnich, R., Rässler, S. (2005). PRIMA: A new multiple imputation procedure for binary variables. Journal of Official Statistics, 21, 325–341.
Newman, D. A. (2003). Longitudinal modeling with randomly and systematically missing data: A simulation of ad hoc, maximum likelihood, and multiple imputation techniques. Organizational Research Methods, 6, 328–362.
Peugh, J. L., Enders, C. K. (2004). Missing data in educational research: A review of reporting practices and suggestions for improvement. Review of Educational Research, 74, 525–556.
Rost, D. H. (2005). Interpretation und Bewertung pädagogisch-psychologischer Studien. Weinheim: Beltz.
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63, 581–592.
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.
Rubin, D. B. (1996). Multiple imputation after 18+ years. Journal of the American Statistical Association, 91, 473–489.
Rubin, D. B. (2003). Nested multiple imputation of NMES via partially incompatible MCMC. Statistica Neerlandica, 57, 3–18.
Schafer, J. L. (1997). Analysis of incomplete multivariate data. London: Chapman & Hall.
Schafer, J. L. (1999). NORM: Multiple Imputation of Incomplete Data under a Normal Model [Computer Software]. Retrieved from: http://methodology.psu.edu [4.10.2004].
Schafer, J. L., Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147–177.
Schafer, J. L., Olsen, M. K. (1998). Multiple imputation for multivariate missing-data problems: A data analyst’s perspective. Multivariate Behavioral Research, 33, 545–571.
Schafer, J. L., Yucel, R. M. (2002). Computational strategies for multivariate linear mixed-effects models with missing values. Journal of Computational and Graphical Statistics, 11, 437–457.
Sijtsma, K., van der Ark, L. (2003). Investigation and treatment of missing item scores in test and questionnaire data. Multivariate Behavioral Research, 38, 505–528.
Sinharay, S., Stern, H. S., Russell, D. (2001). The use of multiple imputation for the analysis of missing data. Psychological Methods, 6, 317–329.
Spieß, M. (2004). Analyse von Längsschnittdaten mit fehlenden Werten. Grundlagen, Verfahren und Anwendungen. Habilitation thesis, Bremen. http://nbn-resolving.de/urn:nbn:de:gbv:46-20050620037
Tanner, M. A., Wong, W. H. (1987). The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association, 82, 528–540.
Toutenburg, H., Heumann, C., Nittner, T. (2004). Statistische Methoden bei unvollständigen Daten. Discussion Paper 380. Ludwig-Maximilians-Universität, München.
Trautwein, U., Lüdtke, O., Köller, O., Baumert, J. (2006). Self-esteem, academic self-concept, and achievement: How the learning environment moderates the dynamics of self-concept. Journal of Personality and Social Psychology, 90, 334–349.
Van Buuren, S., Oudshoorn, K. (1999). Flexible imputation by MICE. Report TNO-PG 99.054, TNO Prevention and Health, Leiden. Retrieved from: www.multiple.imputation.com [20.04.2006].
Verbeke, G., Molenberghs, G. (2000). Linear mixed models for longitudinal data. New York: Springer.
von Hippel, P. T. (2004). Biases in SPSS 12.0 missing value analysis. The American Statistician, 58, 160–164.
West, S. G. (2001). New approaches to missing data in psychological research: Introduction to the special section. Psychological Methods, 6, 315–316.
Wilkinson, L., Task Force on Statistical Inference (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594–604.

Volume 58, Issue 2, April 2007

ISSN: 0033-3042, eISSN: 2190-6238

Acknowledgments:

The authors thank Michael Becker, Mareike Kunter, and Peter Wittek, as well as two anonymous reviewers, for valuable suggestions and comments on earlier versions of this article.
