Panel Analysis of Continuous Variables

Andreß, Hans-Jürgen; Golsch, Katrin; Schmidt, Alexander W.

doi:10.1007/978-3-642-32914-2_4

Abstract

The chapter discusses statistical techniques to specify and estimate regression models for continuous dependent variables. These models focus either on the level or on the change of the dependent variable. While most techniques explicitly model the panel design, some only control for the serial correlations, e.g., by using robust standard errors. Ordinary or generalized least squares is used for model estimation. The discussion includes first differences as well as fixed and random effects estimation and their variants.

Notes

1.
An independent variable is called endogenous when it is not only an explanatory variable of Y but is, at the same time, causally determined by Y itself. For the wagepan data, one could hypothesize that the explanatory variable union is also determined by the dependent variable lwage, because higher income groups may be less inclined to become union members.
2.
It should be stressed, however, that the assumption \(\beta_{0}(t)=\beta_{0}\) is made only for ease of exposition. Since most dependent variables trend over time, it is usually necessary to model the time trend. Otherwise, the estimated effects of the other explanatory variables are biased. The analysis of trends is deferred to Sect. 4.2.
3.
It should be noted that, in most cases, serial correlation is positive (see also footnote 5 on p. 69). Negative serial correlation would imply that negative residuals at one point in time are associated with mostly positive residuals at the next point in time (and vice versa). Such an oscillatory pattern of residuals is rarely observed with panel data.
4.
Since we are using cluster-robust standard errors, the denominator degrees of freedom of the overall F test, \(df_{2}=n-1=544\), depend on the number of clusters (n=545 units) and not on the number of observations (N=4,360) in the data set.
5.
The effects of effect-coded dummy variables will measure differences from the overall mean if (and only if) the different categories of the corresponding categorical variable (in this case: country) have identical frequencies. In our case, we have reduced the data to a balanced panel, hence each country is observed the same number of times.
6.
More specifically, government spending in the US amounts to 35.2 % if all independent variables are zero (i.e., lowwage = trade = unemp = growthpc = depratio = left = cdem = 0). This is not a very realistic situation, and it is perhaps a better strategy to center these variables around their US mean before putting them into the regression model. A similar caveat applies to all the other LSDV specifications.
7.
We did not observe this in our LSDV analysis of the garmit data, because the model did not include any time-constant variables. As mentioned, if there had been any time-constant explanatory variables, it would have been impossible to estimate their effects when specifying country dummies.
8.
Of course, we would get exactly the same picture if we graphed the residuals from a regression including only a dummy for Sweden.
9.
Its \(R^{2}\) and the F statistic for the country effects are not comparable to the other models and have no substantive meaning. Basically, the F statistic tests whether country effects and the constant are zero. Correspondingly, the computation of \(R^{2}\) assumes that the average of Y is zero and uses \(\sum(y_{\mathit{it}}-0)^{2}\) and not \(\sum(y_{\mathit{it}}-\bar{\bar{y}}_{..})^{2}\) in its denominator.
10.
Some FE routines also report an “overall” \(R^{2}\), measuring the proportion of the overall variance explained by all independent variables except the unit-specific effects. Therefore, its value is not identical to the \(R^{2}\) from LSDV estimation.
11.
Remember: There are only n−1 country dummies in the unrestricted model, because one dummy had to be excluded for model identification.
12.
Nevertheless, a model including this many dummies is feasible on today’s computers.
13.
At this point, it becomes obvious why some FE programs estimate a regression constant. Otherwise, estimates of the unobserved individual-specific effect, \(u_{i}\), would have to control for the overall level of the dependent variable.
14.
Many programs provide estimates of \(\nu_{i}\). Thus, it is not necessary to perform the computations manually. Unfortunately, some programs label \(\hat{\nu}_{i}\) with the letter u, giving the wrong impression that \(\hat{\nu}_{i}\) is identical to an estimate of unobserved heterogeneity \(u_{i}\). But, as already mentioned, \(\hat{\nu}_{i}\) includes unobserved and observed heterogeneity.
15.
The effects of the time-constant independent variables (e.g., schooling) have exactly the same estimates as those in the pooled OLS regression model, because the estimates \(\hat{\nu}_{i}\) include the effects of \(z_{1i},\ldots,z_{ji}\) (see (4.14)).
16.
You should check that the first-order serial correlations from Table 3.7 are identical to the numbers below the main diagonal in the lower part of Table 4.7.
17.
Since, in both cases, some of the correlations are positive and some are negative, we computed the average of their absolute values.
18.
This estimate is easily derived from the between-unit variance (\(\hat{\sigma}_{b}^{2}=0.3907^{2}\)) and the within-unit variance (\(\hat{\sigma}_{w}^{2}=0.3872^{2}\)) of (log) hourly wages (see Table 3.6). In the FE model, \(\hat{\sigma}_{\nu}^{2}=\hat{\sigma}_{b}^{2}\) and \(\hat{\sigma}_{e}^{2}=\hat{\sigma}_{w}^{2}\) (see Textbox 3.1), hence \(\hat{\rho}_{\mathit{FE}}=0.3907^{2}/(0.3907^{2}+0.3872^{2})=0.5045\). In other words, in the case of an empty FE model (k=0), (4.17) and (4.18) reduce to (3.2) and (3.3).
19.
We use \(X^{2}\) to symbolize test statistics that are distributed as \(\chi^{2}\). However, since there are several of them, we distinguish them by indices (\(X_{1}^{2}\), \(X_{2}^{2}\), etc.).
20.
Note that the mean of a time-constant explanatory variable Z equals the variable itself (e.g., \(\bar{z}_{1i}=z_{1i}\)). Note, also, that you have to generate a new “variable” constant that equals (1−θ) to obtain an estimate of the regression constant. Include constant in your model as an additional variable and force your regression program not to estimate a regression constant. The RE constant is estimated by the regression coefficient of the variable constant.
21.
The same is true for the OLS estimate, but it is far more similar to the BE estimate than the RE estimate.
22.
It is essential to include the means of all X in the model. Otherwise, the FE estimates will not be replicated. Moreover, you must have a balanced panel. Otherwise, the effects of the time-varying explanatory variables in the hybrid model will only approximate the FE estimates. Depending on the amount of missing panel data, the differences may become quite large.
23.
Naturally, this test focuses only on the effects of those variables that are estimated in both models, i.e., the time-varying explanatory variables X.
24.
It should also be noted that the inversion of \((\widehat{\varPsi}^{\mathit{FE}}-\widehat{\varPsi}^{\mathit{RE}})\) in (4.32) is sometimes problematic. One such situation arises when the model includes variables that show no between-unit variance, such as functions of time that are the same for all units. If the model included only such variables, RE and FE estimates would be identical (see also the discussion surrounding the following Fig. 4.5). The other situation concerns the two variance-covariance matrices \(\widehat{\varPsi}^{\mathit{FE}}\) and \(\widehat{\varPsi}^{\mathit{RE}}\). To estimate both of them, an estimate of \(\sigma_{u}\) is needed. As Table 4.11 shows, \(\hat{\sigma}^{\mathit{RE}}_{u}\) and \(\hat{\sigma}^{\mathit{FE}}_{u}\) can be different, and this may cause the problem. Some software implementations of the Hausman test allow the user to specify which one of the two to use in the computation of (4.32). These practical problems do not exist if one uses the hybrid model to test RE–FE differences.
25.
As an example, see the simulation study by Clark and Linzer (2012). The study also analyzes the power of the Hausman test. Unfortunately, it focuses on data sets with limited sample sizes as they are typical in political science research.
26.
Why U is multiplied by 1 will become clear later.
27.
Technically, it is easy to replicate the RE and FE estimates of Table 4.8 with SEM. See the web site (Sect. 7.3) for the corresponding syntax file.
28.
The figure shows a slightly different trend for men than the corresponding figure in Andreß and Bröckel (2007), in which men—from the third year after separation—have about the same degree of life satisfaction as women. This is due to the larger sample they used, and to the fact that they weighted the data for this descriptive plot.
29.
This is not exactly true for how we defined the time dummies. Since we measured all observations three and more years before separation with the same dummy (\(D_{i,-3}=1\) if \(t\le -3\)), the impact function does not change for the more distant observations before separation. The same is true for the more distant observations after separation (\(t\ge 3\)), for which \(D_{i3}=1\). Hence, the following linear dependence is not perfect and, in principle, we could estimate an FE or FD model that also includes the variable age. We have not done this, because this age effect would be estimated from the observations that are at least three years away from the event of separation.
30.
As already noted in footnote 9 on p. 138, estimating regression models without a constant results in \(R^{2}\) and F statistics that are not very useful. Hence, the \(R^{2}\) of the FD model (0.0223) should not be compared with the overall \(R^{2}\) values of the FE and RE model. If one is only interested in the effects of the time-dependent explanatory variables X, Wooldridge (2009, 466) suggests sticking with the undifferenced time dummies and specifying a model that includes a constant and (T−2) (original) time dummies. The estimated effects of the explanatory variables X will be the same as in the “completely” differenced model.
31.
Again, the largest differences are observed for the variable age (ager) and being divorced (divorce).
32.
We do not interpret the substantive effects of model 3, because they are similar to those of the former model 2.
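As a small worked check of the arithmetic in note 18, the following Python sketch recomputes the intraclass correlation of the empty FE model from the two standard deviations quoted there. The function and variable names are illustrative only and do not come from the chapter.

```python
# Standard deviations quoted in note 18 (taken there from Table 3.6).
sd_between = 0.3907  # between-unit standard deviation of log hourly wages
sd_within = 0.3872   # within-unit standard deviation of log hourly wages


def intraclass_correlation(sd_b: float, sd_w: float) -> float:
    """Share of the total variance that lies between units."""
    var_b = sd_b ** 2
    var_w = sd_w ** 2
    return var_b / (var_b + var_w)


rho_fe = intraclass_correlation(sd_between, sd_within)
print(round(rho_fe, 4))  # 0.5045, the value reported in note 18
```

Roughly half of the variance in (log) hourly wages therefore lies between units, which is what the value 0.5045 expresses.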
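The claim in note 22, that a hybrid model containing the unit means of X replicates the FE estimate in a balanced panel, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are made up, there is only one time-varying regressor, and the small `solve` and `ols` helpers are hypothetical stand-ins for a regression routine, not the software used in the chapter.

```python
# Made-up balanced panel: 4 units observed at 3 time points each (illustration only).
x = [[1.0, 2.0, 4.0], [2.0, 5.0, 6.0], [0.0, 1.0, 5.0], [3.0, 3.0, 9.0]]
y = [[2.0, 2.5, 4.1], [5.0, 6.2, 7.1], [1.0, 1.9, 3.2], [8.0, 8.4, 10.9]]


def mean(values):
    return sum(values) / len(values)


def solve(a, b):
    """Solve the small linear system a z = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, targets):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


# FE (within) estimate: regress unit-demeaned y on unit-demeaned x.
num = den = 0.0
for xi, yi in zip(x, y):
    xbar, ybar = mean(xi), mean(yi)
    num += sum((xv - xbar) * (yv - ybar) for xv, yv in zip(xi, yi))
    den += sum((xv - xbar) ** 2 for xv in xi)
b_fe = num / den

# Hybrid model: pooled OLS of y on a constant, x, and the unit mean of x.
rows, targets = [], []
for xi, yi in zip(x, y):
    xbar = mean(xi)
    for xv, yv in zip(xi, yi):
        rows.append([1.0, xv, xbar])
        targets.append(yv)
b_hybrid = ols(rows, targets)[1]  # coefficient of the time-varying x

print(b_fe, b_hybrid)  # the two estimates agree up to rounding error
```

With several time-varying regressors, the same comparison would require the unit means of every X in the hybrid model, as the note stresses.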

References

Allison, P. (2009). Fixed effects regression models. Thousand Oaks: Sage.
Andreß, H.-J., & Bröckel, M. (2007). Income and life satisfaction after marital disruption in Germany. Journal of Marriage and Family, 69(2), 500–512.
Annacker, D., & Hildebrandt, L. (2004). Unobservable effects in structural models of business performance. Journal of Business Research, 57(5), 507–517.
Baltagi, B. (2008). Econometric analysis of panel data. Chichester: Wiley.
Bollen, K., & Curran, P. (2006). Latent curve models: A structural equation perspective. New York: Wiley-Interscience.
Bollen, K. A., & Brand, J. E. (2010). A general panel model with random and fixed effects: A structural equations approach. Social Forces, 89(1), 1–34.
Breusch, T. S., & Pagan, A. R. (1980). The Lagrange multiplier test and its applications to model specification in econometrics. The Review of Economic Studies, 47(1), 239–253.
Cameron, A. C., & Trivedi, P. K. (2005). Microeconometrics: Methods and applications. Cambridge: Cambridge University Press.
Clark, T. S., & Linzer, D. A. (2012). Should I use fixed or random effects? Technical report, Emory University: Department of Political Science. http://polmeth.wustl.edu/mediaDetail.php?docId=1315 (accessed 18.4.2012).
Duncan, T., Duncan, S., & Strycker, L. (2006). An introduction to latent variable growth curve modeling: Concepts, issues, and applications. Mahwah: Lawrence Erlbaum Associates.
Garrett, G., & Mitchell, D. (2001). Globalization, government spending and taxation in the OECD. European Journal of Political Research, 39, 145–177.
Greene, W. H. (2008). Econometric analysis. Upper Saddle River: Prentice Hall.
Hausman, J. (1978). Specification tests in econometrics. Econometrica, 46, 1251–1271.
Hausman, J. A., & Taylor, W. E. (1981). Panel data and unobservable individual effects. Econometrica, 49(6), 1377–1398.
Hox, J. (2010). Multilevel analysis: Techniques and applications. London: Routledge.
Hsiao, C. (2003). Analysis of panel data (Econometric Society Monographs). Cambridge: Cambridge University Press.
Inglehart, R. (1971). The silent revolution in Europe: Intergenerational change in postindustrial societies. American Political Science Review, 65, 991–1017.
Johnson, D. R., & Wu, J. (2002). An empirical test of crisis, social selection, and role explanations of the relationship between marital disruption and psychological distress: A pooled time-series analysis of four-wave panel data. Journal of Marriage and Family, 64(1), 211–224.
Kittel, B., & Winner, H. (2005). How reliable is pooled analysis in political economy? The globalization-welfare state nexus revisited. European Journal of Political Research, 44(2), 269–293.
Klein, M., & Pötschke, M. (2004). Die intra-individuelle Stabilität gesellschaftlicher Wertorientierungen. KZfSS Kölner Zeitschrift für Soziologie und Sozialpsychologie, 56(3), 432–456.
Mundlak, Y. (1978). On the pooling of time series and cross section data. Econometrica, 46(1), 69–85.
Snijders, T., & Bosker, R. (2011). Multilevel analysis: An introduction to basic and advanced multilevel modeling. Thousand Oaks: Sage.
Vella, F., & Verbeek, M. (1998). Whose wages do unions raise? A dynamic model of unionism and wage rate determination for young men. Journal of Applied Econometrics, 13, 163–183.
Wooldridge, J. M. (2009). Introductory econometrics. A modern approach. Mason: South-Western.
Wooldridge, J. M. (2010). Econometric analysis of cross section and panel data (2nd ed.). Cambridge: MIT Press.

Author information

Authors and Affiliations

Lehrstuhl für Empirische Sozial- und Wirtschaftsforschung, Universität Köln, Cologne, Germany
Hans-Jürgen Andreß & Alexander W. Schmidt
Fakultät für Soziologie, Universität Bielefeld, Bielefeld, Germany
Katrin Golsch

About this chapter

Cite this chapter

Andreß, HJ., Golsch, K., Schmidt, A.W. (2013). Panel Analysis of Continuous Variables. In: Applied Panel Data Analysis for Economic and Social Surveys. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32914-2_4

DOI: https://doi.org/10.1007/978-3-642-32914-2_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32913-5
Online ISBN: 978-3-642-32914-2
eBook Packages: Mathematics and Statistics; Mathematics and Statistics (R0)