Abstract

The chapter discusses statistical techniques for specifying and estimating regression models for continuous dependent variables. These techniques focus on either the level or the change of the dependent variable. While most of them explicitly model the panel design, some only control for serial correlation, e.g., by using robust standard errors. The models are estimated by ordinary or generalized least squares. The discussion covers first differences as well as fixed and random effects estimation and their variants.

Notes

  1. An independent variable is called endogenous when it is not only an explanatory variable of Y but is, at the same time, causally determined by Y itself. For the wagepan data, one could hypothesize that the explanatory variable union is also determined by the dependent variable lwage, because higher income groups may be less inclined to become union members.

  2. It should be stressed, however, that the assumption \(\beta_{0}(t)=\beta_{0}\) is only made for ease of exposition. Since most dependent variables trend over time, it is usually necessary to model the time trend. Otherwise, the estimated effects of the other explanatory variables are biased. The analysis of trends is deferred to Sect. 4.2.

  3. It should be noted that, in most cases, serial correlation is positive (see also footnote 5 on p. 69). Negative serial correlation would imply that negative residuals at one point in time are mostly followed by positive residuals at the next point in time (and vice versa). Such an oscillatory pattern of residuals is rarely observed with panel data.

  4. Since we are using cluster-robust standard errors, the degrees of freedom of the overall F test, \(\mathit{df}_{2}=n-1\), depend on the number of clusters (n=545 units) and not on the number of observations (N=4,360) in the data set.

  5. The effects of effect-coded dummy variables measure differences from the overall mean if (and only if) the categories of the corresponding categorical variable (in this case: country) have identical frequencies. In our case, we have reduced the data to a balanced panel, hence each country is observed the same number of times.
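
The role of equal category frequencies can be illustrated with a minimal sketch. It assumes a model that contains only effect-coded dummies (no other covariates), uses simulated data rather than the garmit data, and the helper name `effect_coded_intercept` is hypothetical. In this simplified setting the constant equals the unweighted mean of the category means, which coincides with the overall mean only when the categories are equally frequent.

```python
import numpy as np

def effect_coded_intercept(y, group):
    """OLS constant from a model with effect-coded dummies only.

    The last category is the omitted one and is coded -1 on every dummy.
    """
    levels = np.unique(group)
    ref = levels[-1]
    cols = []
    for lev in levels[:-1]:
        col = np.where(group == lev, 1.0, 0.0)
        col[group == ref] = -1.0
        cols.append(col)
    X = np.column_stack([np.ones(len(y))] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(beta[0])

def mean_of_group_means(y, group):
    return float(np.mean([y[group == g].mean() for g in np.unique(group)]))

rng = np.random.default_rng(0)

# Balanced case: three categories with four observations each.
group_bal = np.repeat([0, 1, 2], 4)
y_bal = 10.0 * group_bal + rng.normal(size=group_bal.size)
const_bal = effect_coded_intercept(y_bal, group_bal)

# Unbalanced case: category frequencies 2, 4, and 6.
group_unbal = np.repeat([0, 1, 2], [2, 4, 6])
y_unbal = 10.0 * group_unbal + rng.normal(size=group_unbal.size)
const_unbal = effect_coded_intercept(y_unbal, group_unbal)

print("balanced:  ", const_bal, "overall mean:", y_bal.mean())
print("unbalanced:", const_unbal, "overall mean:", y_unbal.mean())
```

In the unbalanced case the constant still equals the unweighted mean of the category means, so the dummy effects are deviations from that quantity and not from the overall mean.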

  6. More specifically, government spending in the US amounts to 35.2 % if all independent variables are zero (i.e., lowwage = trade = unemp = growthpc = depratio = left = cdem = 0). This is not a very realistic situation, and it is perhaps a better strategy to center these variables around their US mean before putting them into the regression model. A similar caveat applies to all the other LSDV specifications.

  7. We did not observe this in our LSDV analysis of the garmit data, because the model did not include any time-constant variables. As mentioned, if there had been any time-constant explanatory variables, it would have been impossible to estimate their effects when specifying country dummies.

  8. Of course, we would get exactly the same picture if we graphed the residuals from a regression including only a dummy for Sweden.

  9. Its \(R^{2}\) and the F statistic for the country effects are not comparable to those of the other models and have no substantive meaning. Basically, the F statistic tests whether the country effects and the constant are zero. Correspondingly, the computation of \(R^{2}\) assumes that the average of Y is zero and uses \(\sum(y_{\mathit{it}}-0)^{2}\) and not \(\sum(y_{\mathit{it}}-\bar{\bar{y}}_{..})^{2}\) in its denominator.
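
The two denominators can be compared in a short sketch. It uses simulated data (not the garmit data) and assumes a specification with a full set of unit dummies and no constant, which has the same fit as a specification with a constant and n−1 dummies; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
n_units, n_periods = 5, 8
unit = np.repeat(np.arange(n_units), n_periods)

# Simulated outcome with unit-specific levels far away from zero.
x = rng.normal(size=unit.size)
alpha = rng.normal(loc=30.0, scale=5.0, size=n_units)
y = alpha[unit] + 0.5 * x + rng.normal(size=unit.size)

dummies = (unit[:, None] == np.arange(n_units)).astype(float)
X_nocons = np.column_stack([x, dummies])                        # all n dummies, no constant
X_cons = np.column_stack([np.ones_like(x), x, dummies[:, 1:]])  # constant + (n-1) dummies

def ssr(X, y):
    """Sum of squared OLS residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

ssr_nocons = ssr(X_nocons, y)
ssr_cons = ssr(X_cons, y)

tss_centered = float(((y - y.mean()) ** 2).sum())  # deviations from the overall mean
tss_uncentered = float((y ** 2).sum())             # deviations from zero

r2_centered = 1.0 - ssr_cons / tss_centered
r2_uncentered = 1.0 - ssr_nocons / tss_uncentered

print("same residual sum of squares:", np.isclose(ssr_nocons, ssr_cons))
print("centered R2:  ", r2_centered)
print("uncentered R2:", r2_uncentered)
```

Both parameterizations fit the data equally well; only the denominator differs, which is why the no-constant \(R^{2}\) is inflated whenever the mean of Y is far from zero.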

  10. Some FE routines also report an “overall” \(R^{2}\), measuring the proportion of the overall variance explained by all independent variables except the unit-specific effects. Therefore, its value is not identical to the \(R^{2}\) from LSDV estimation.

  11. Remember: There are only n−1 country dummies in the unrestricted model, because one dummy had to be excluded for model identification.

  12. Nevertheless, a model including this many dummies is feasible on today’s computers.

  13. At this point, it becomes obvious why some FE programs estimate a regression constant. Otherwise, estimates of the unobserved individual-specific effect, \(u_{i}\), would have to control for the overall level of the dependent variable.

  14. Many programs provide estimates of \(\nu_{i}\). Thus, it is not necessary to perform the computations manually. Unfortunately, some programs label \(\hat{\nu}_{i}\) with the letter u, giving the wrong impression that \(\hat{\nu}_{i}\) is identical to an estimate of unobserved heterogeneity \(u_{i}\). But, as already mentioned, \(\hat{\nu}_{i}\) includes unobserved and observed heterogeneity.

  15. The effects of the time-constant independent variables (e.g., schooling) have exactly the same estimates as those in the pooled OLS regression model, because the estimates \(\hat{\nu}_{i}\) include the effects of \(z_{1i},\ldots,z_{ji}\) (see (4.14)).

  16. You should check that the first-order serial correlations from Table 3.7 are identical to the numbers below the main diagonal in the lower part of Table 4.7.

  17. Since, in both cases, some of the correlations are positive and some are negative, we computed the average of their absolute values.

  18. This estimate is easily derived from the between-unit variance (\(\hat{\sigma}_{b}^{2}=0.3907^{2}\)) and the within-unit variance (\(\hat{\sigma}_{w}^{2}=0.3872^{2}\)) of (log) hourly wages (see Table 3.6). In the FE model, \(\hat{\sigma}_{\nu}^{2}=\hat{\sigma}_{b}^{2}\) and \(\hat{\sigma}_{e}^{2}=\hat{\sigma}_{w}^{2}\) (see Textbox 3.1), hence \(\hat{\rho}_{\mathit{FE}}=0.3907^{2}/(0.3907^{2}+0.3872^{2})=0.5045\). In other words: in the case of an empty FE model (k=0), (4.17) and (4.18) reduce to (3.2) and (3.3).
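
The arithmetic in this note can be reproduced in a few lines. The two standard deviations are the values quoted above; the variable names are illustrative.

```python
# Between- and within-unit standard deviations of (log) hourly wages,
# as quoted in the note (taken there from Table 3.6).
sigma_b = 0.3907
sigma_w = 0.3872

# Share of the between-unit variance in the total variance.
rho_fe = sigma_b**2 / (sigma_b**2 + sigma_w**2)

print(round(rho_fe, 4))  # → 0.5045
```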

  19. We use \(X^{2}\) to symbolize test statistics that are distributed as \(\chi^{2}\). However, since there are several of them, we distinguish them by indices (\(X_{1}^{2}\), \(X_{2}^{2}\), etc.).

  20. Note that the mean of a time-constant explanatory variable Z equals the variable itself (e.g., \(\bar{z}_{1i}=z_{1i}\)). Note, also, that you have to generate a new “variable” constant that equals \((1-\theta)\) to obtain an estimate of the regression constant. Include constant in your model as an additional variable and force your regression program not to estimate a regression constant. The RE constant is then estimated by the regression coefficient of the variable constant.
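
The steps described in this note can be sketched as follows. The sketch assumes a balanced simulated panel, a single time-varying regressor, and a fixed illustrative value of θ (in practice θ is estimated from the variance components); all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, n_periods = 50, 4
unit = np.repeat(np.arange(n_units), n_periods)

x = rng.normal(size=unit.size)
u = rng.normal(size=n_units)
y = 2.0 + 0.7 * x + u[unit] + rng.normal(size=unit.size)

theta = 0.6  # assumed value for illustration only

def unit_mean(v):
    """Unit-specific mean, repeated for every observation of the unit."""
    return (np.bincount(unit, weights=v) / n_periods)[unit]

# Partial demeaning of all variables.
y_star = y - theta * unit_mean(y)
x_star = x - theta * unit_mean(x)

# New "variable" constant that equals (1 - theta) for every observation.
constant = np.full(unit.size, 1.0 - theta)

# (a) Regression without a regression constant, constant entered as a variable.
X_a = np.column_stack([constant, x_star])
b_a, *_ = np.linalg.lstsq(X_a, y_star, rcond=None)

# (b) Regression with an ordinary regression constant, for comparison.
X_b = np.column_stack([np.ones(unit.size), x_star])
b_b, *_ = np.linalg.lstsq(X_b, y_star, rcond=None)

print("RE constant from (a):        ", b_a[0])
print("constant from (b):           ", b_b[0])
print("(b) rescaled by 1/(1-theta): ", b_b[0] / (1.0 - theta))
```

Both regressions give the same slope; the ordinary constant in (b) is the RE constant multiplied by (1−θ), which is why the note recommends entering the rescaled column explicitly.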

  21. The same is true for the OLS estimate, but it is far more similar to the BE estimate than the RE estimate.

  22. It is essential to include the means of all X in the model. Otherwise, the FE estimates will not be replicated. Moreover, you must have a balanced panel. Otherwise, the effects of the time-varying explanatory variables in the hybrid model will only approximate the FE estimates. Depending on the amount of missing panel data, the differences may become quite large.
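
The replication of the FE estimate in a balanced panel can be checked with a minimal sketch. It uses simulated data with a single time-varying regressor and, as a simplifying assumption, estimates the model with unit means by pooled OLS rather than by the estimator used in the chapter; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
n_units, n_periods = 100, 5
unit = np.repeat(np.arange(n_units), n_periods)

# Unobserved heterogeneity u is correlated with x, so pooled OLS is biased.
u = rng.normal(size=n_units)
x = 0.8 * u[unit] + rng.normal(size=unit.size)
y = 1.0 + 0.5 * x + u[unit] + rng.normal(size=unit.size)

def unit_mean(v):
    """Unit-specific mean, repeated for every observation of the unit."""
    return (np.bincount(unit, weights=v) / n_periods)[unit]

x_bar = unit_mean(x)

# FE (within) estimate: OLS on unit-demeaned data.
x_within = x - x_bar
y_within = y - unit_mean(y)
b_fe = float(x_within @ y_within) / float(x_within @ x_within)

# Model with the unit mean of x as an additional regressor.
X_hybrid = np.column_stack([np.ones(unit.size), x, x_bar])
b_hybrid, *_ = np.linalg.lstsq(X_hybrid, y, rcond=None)

# Same model without the unit mean.
X_pooled = np.column_stack([np.ones(unit.size), x])
b_pooled, *_ = np.linalg.lstsq(X_pooled, y, rcond=None)

print("FE estimate:               ", b_fe)
print("with unit mean of x:       ", b_hybrid[1])
print("without unit mean (pooled):", b_pooled[1])
```

Leaving out the unit mean gives a visibly different coefficient in this simulation, which illustrates why the means of all X have to be included.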

  23. Naturally, this test focuses only on the effects of those variables that are estimated in both models, i.e., the time-varying explanatory variables X.

  24. It should also be noted that the inversion of \((\widehat{\varPsi}^{\mathit{FE}}-\widehat{\varPsi}^{\mathit{RE}})\) in (4.32) is sometimes problematic. One such situation arises when the model includes variables that show no between-unit variance, such as functions of time that are the same for all units. If the model included only such variables, RE and FE estimates would be identical (see also the discussion surrounding the following Fig. 4.5). The other situation has to do with the two variance-covariance matrices \(\widehat{\varPsi}^{\mathit{FE}}\) and \(\widehat{\varPsi}^{\mathit{RE}}\). To estimate both of them, an estimate of \(\sigma_{u}\) is needed. As Table 4.11 shows, \(\hat{\sigma}^{\mathit{RE}}_{u}\) and \(\hat{\sigma}^{\mathit{FE}}_{u}\) can be different, and this may cause the problem. Some software implementations of the Hausman test allow the user to specify which one of the two to use in the computation of (4.32). These practical problems do not exist if one uses the hybrid model to test RE–FE differences.

  25. As an example, see the simulation study by Clark and Linzer (2012). The study also analyzes the power of the Hausman test. Unfortunately, it focuses on data sets with limited sample sizes as they are typical in political science research.

  26. Why U is multiplied with a 1 will become clear later.

  27. Technically, it is easy to replicate the RE and FE estimates of Table 4.8 with SEM. See the web site (Sect. 7.3) for the corresponding syntax file.

  28. The figure shows a slightly different trend for men than the corresponding figure in Andreß and Bröckel (2007), in which men, from the third year after separation, have about the same degree of life satisfaction as women. This is due to the larger sample they used, and to the fact that they weighted the data for this descriptive plot.

  29. This is not exactly true for how we defined the time dummies. Since we measured all observations three and more years before separation with the same dummy (\(D_{i,-3}=1\) if \(t\le -3\)), the impact function does not change for the more distant observations before separation. The same is true for the more distant observations after separation (\(t\ge 3\)), for which \(D_{i3}=1\). Hence, the following linear dependence is not perfect and, in principle, we could estimate an FE or FD model that also includes the variable age. We have not done this, because this age effect would be estimated from the observations that are at least three years away from the event of separation.

  30. As already noted in footnote 9 on p. 138, estimating regression models without a constant results in \(R^{2}\) and F statistics that are not very useful. Hence, the \(R^{2}\) of the FD model (0.0223) should not be compared with the overall \(R^{2}\) values of the FE and RE model. If one is only interested in the effects of the time-dependent explanatory variables X, Wooldridge (2009, 466) suggests sticking with the undifferenced time dummies and specifying a model that includes a constant and \((T-2)\) (original) time dummies. The estimated effects of the explanatory variables X will be the same as in the “completely” differenced model.

  31. Again, the largest differences are observed for the variable age (ager) and being divorced (divorce).

  32. We do not interpret the substantive effects of model 3, because they are similar to those of the former model 2.

References

  • Allison, P. (2009). Fixed effects regression models. Thousand Oaks: Sage.

  • Andreß, H.-J., & Bröckel, M. (2007). Income and life satisfaction after marital disruption in Germany. Journal of Marriage and Family, 69(2), 500–512.

  • Annacker, D., & Hildebrandt, L. (2004). Unobservable effects in structural models of business performance. Journal of Business Research, 57(5), 507–517.

  • Baltagi, B. (2008). Econometric analysis of panel data. Chichester: Wiley.

  • Bollen, K., & Curran, P. (2006). Latent curve models: A structural equation perspective. New York: Wiley-Interscience.

  • Bollen, K. A., & Brand, J. E. (2010). A general panel model with random and fixed effects: A structural equations approach. Social Forces, 89(1), 1–34.

  • Breusch, T. S., & Pagan, A. R. (1980). The Lagrange multiplier test and its applications to model specification in econometrics. The Review of Economic Studies, 47(1), 239–253.

  • Cameron, A. C., & Trivedi, P. K. (2005). Microeconometrics: Methods and applications. Cambridge: Cambridge University Press.

  • Clark, T. S., & Linzer, D. A. (2012). Should I use fixed or random effects? Technical report, Emory University: Department of Political Science. http://polmeth.wustl.edu/mediaDetail.php?docId=1315 (accessed 18.4.2012).

  • Duncan, T., Duncan, S., & Strycker, L. (2006). An introduction to latent variable growth curve modeling: Concepts, issues, and applications. Mahwah: Lawrence Erlbaum Associates.

  • Garrett, G., & Mitchell, D. (2001). Globalization, government spending and taxation in the OECD. European Journal of Political Research, 39, 145–177.

  • Greene, W. H. (2008). Econometric analysis. Upper Saddle River: Prentice Hall.

  • Hausman, J. (1978). Specification tests in econometrics. Econometrica, 46, 1251–1271.

  • Hausman, J. A., & Taylor, W. E. (1981). Panel data and unobservable individual effects. Econometrica, 49(6), 1377–1398.

  • Hox, J. (2010). Multilevel analysis: Techniques and applications. London: Routledge.

  • Hsiao, C. (2003). Analysis of panel data (Econometric Society Monographs). Cambridge: Cambridge University Press.

  • Inglehart, R. (1971). The silent revolution in Europe: Intergenerational change in postindustrial societies. American Political Science Review, 65, 991–1017.

  • Johnson, D. R., & Wu, J. (2002). An empirical test of crisis, social selection, and role explanations of the relationship between marital disruption and psychological distress: A pooled time-series analysis of four-wave panel data. Journal of Marriage and Family, 64(1), 211–224.

  • Kittel, B., & Winner, H. (2005). How reliable is pooled analysis in political economy? The globalization-welfare state nexus revisited. European Journal of Political Research, 44(2), 269–293.

  • Klein, M., & Pötschke, M. (2004). Die intra-individuelle Stabilität gesellschaftlicher Wertorientierungen. KZfSS Kölner Zeitschrift für Soziologie und Sozialpsychologie, 56(3), 432–456.

  • Mundlak, Y. (1978). On the pooling of time series and cross section data. Econometrica, 46(1), 69–85.

  • Snijders, T., & Bosker, R. (2011). Multilevel analysis: An introduction to basic and advanced multilevel modeling. Thousand Oaks: Sage.

  • Vella, F., & Verbeek, M. (1998). Whose wages do unions raise? A dynamic model of unionism and wage rate determination for young men. Journal of Applied Econometrics, 13, 163–183.

  • Wooldridge, J. M. (2009). Introductory econometrics. A modern approach. Mason: South-Western.

  • Wooldridge, J. M. (2010). Econometric analysis of cross section and panel data (2nd ed.). Cambridge: MIT Press.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Andreß, HJ., Golsch, K., Schmidt, A.W. (2013). Panel Analysis of Continuous Variables. In: Applied Panel Data Analysis for Economic and Social Surveys. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32914-2_4
