Background

Services aimed at elderly living in psycho-geriatric (PG) wards, who mainly suffer from advanced dementia, are not often evaluated using cost-effectiveness analysis. Still, in general, trading off costs and benefits is as important in long-term care as it is in curative care. Especially in times of budget cuts or when care innovations find their way into the long-term care sector, considering the costs and benefits of interventions is important. In curative care, this is commonly done using cost-utility analysis, where the primary outcome is health-related quality of life (HrQol). Preference-based health-related quality of life measures attach utility weights to specific health states in order to be able to compute utility gains from health changes. Such gains are then compared to the (incremental) costs of an intervention [1]. Using this approach to evaluate services for the PG elderly, however, is problematic.

A major problem is that health-related quality of life measures aim to detect and value changes in health and functioning, while services for the elderly may (be aimed to) affect quality of life more broadly [2–4]. For example, it is not uncommon to physically restrain the PG elderly to prevent them from falling [5], but doing so restricts freedom of movement, autonomy, and enjoyment of life. Removing such restraints would restore some control over their lives and allow more enjoyment through an increased capacity to fill their day with more varied activities. Whether the health of unrestrained patients would also improve, however, is questionable since freedom of movement may not directly affect existing health problems. Therefore, in determining the value for money of interventions aimed to reduce restraints, HrQol is likely to be too restrictive an evaluative space, since it does not (directly) value self-control or enjoyment of life. HrQoL measures (such as the EQ-5D and SF-6D) may therefore not fully account for all benefits of such interventions, and using them in these contexts could misinform decision makers.

A promising approach to measure Qol more comprehensively in the PG elderly is to use the newly developed ICEpop (Investigating Choice Experiments for the Preferences of Older People) capability measure for older people (ICECAP-O). The ICECAP instruments can be seen as measuring capability Qol [6] achieved by the capacity to perform certain actions and achieve certain states [7]. The ICECAP-O measures five capability dimensions (attachment, security, role, enjoyment, and control) with one question per dimension. Each dimension can be scored on four levels. The ICECAP-O was developed using rigorous qualitative and quantitative approaches [8–11]. In order to obtain tariffs for the ICECAP-O, the attributes were valued using best-worst scaling, a special type of discrete choice analysis. The ICECAP-O has been used in the British general elderly population, demonstrating that it is related to, but not exclusively dependent on, HrQol [9]. The overwhelming majority of the included elderly lived at home and did not receive long-term health or social care. To date, ICECAP-O has not been used in populations receiving long-term care. This lack of validation is especially problematic for the vulnerable PG elderly populations, who consume substantial amounts of health and social services [12].
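To illustrate how a tariff turns the five four-level responses into a single index, the following is a minimal sketch. It assumes that the index is the sum of one weight per dimension and that level 4 denotes full capability. The weights in `PLACEHOLDER_TARIFF` are hypothetical placeholders chosen only so that the index runs from 0 to 1; they are not the published British tariffs, and the function name is likewise illustrative.

```python
from typing import Dict, Mapping

DIMENSIONS = ("attachment", "security", "role", "enjoyment", "control")

# HYPOTHETICAL placeholder weights, NOT the published ICECAP-O tariffs.
# Assumption: level 1 = no capability, level 4 = full capability.
PLACEHOLDER_TARIFF: Dict[str, Dict[int, float]] = {
    dim: {1: 0.0, 2: 0.1, 3: 0.15, 4: 0.2} for dim in DIMENSIONS
}


def icecap_o_index(
    responses: Mapping[str, int],
    tariff: Mapping[str, Mapping[int, float]] = PLACEHOLDER_TARIFF,
) -> float:
    """Sum the tariff weight of the chosen level on each dimension."""
    total = 0.0
    for dim in DIMENSIONS:
        if dim not in responses:
            raise ValueError(f"missing response for dimension: {dim}")
        level = responses[dim]
        if level not in tariff[dim]:
            raise ValueError(f"invalid level {level!r} for dimension: {dim}")
        total += tariff[dim][level]
    return total


if __name__ == "__main__":
    # Hypothetical client: full attachment, reduced capability elsewhere.
    example = {"attachment": 4, "security": 3, "role": 2,
               "enjoyment": 2, "control": 1}
    print(round(icecap_o_index(example), 3))
```

With real tariffs, only the weight table would change; the lookup-and-sum step stays the same under the additive assumption stated above.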

It should be noted that substantial effort has been put in recent years into developing dementia-specific Qol instruments for use in patients with mild to moderate dementia [13–20] and in severe dementia [21]. However, not only do these instruments normally lack related utility weights, limiting their usefulness in cost-utility analysis, they are also, by definition, disease specific rather than generic, which limits their usefulness in decision making across diseases and sectors. Moreover, disease-specific measures can still focus on health-related, rather than general, quality of life. Hence, here we focus on the generic ICECAP-O, with its preference-based tariffs.

The use of QoL instruments in a PG patient population is difficult because, owing to their cognitive limitations, patients may not be able to assess their Qol accurately. It has been shown to be possible to develop user-friendly (disease specific) instruments for self-completion in this context, especially for mild to moderate dementia patients. However, with diminishing cognitive ability, this becomes increasingly difficult. Currently, to our knowledge, there are no generic Qol instruments with accompanying utility weights that are recommended for use in people with dementia. The lack of validation in this particular population, i.e., the PG elderly, is likely to be related to limited cognitive ability due to severe dementia [22], hampering self-completion of questionnaires. We therefore decided to use proxies, who complete the questionnaire on the patient’s behalf. An important issue with proxies is that they may not complete the questionnaire as the client would have. A prerequisite in using proxies is that they can at least provide reasonable approximations of the patient’s Qol [23]. Proxy measurement has been associated with a consistent negative bias in Qol measurement [13], although this may be more typical in the case of informal carers of dementia patients [14]. It has been suggested that such proxy effects can be minimized using substituted judgement [14], asking the proxies to fill out the questionnaire as if they were the person with dementia.

The aim of this study is to explore the validity of the ICECAP-O for the PG elderly. To that end, we first investigated the convergent validity of the ICECAP-O by comparing it to other care-related HrQol and overall Qol instruments. We used a sample of elderly in Dutch psycho-geriatric nursing homes to establish the discriminant validity of the ICECAP-O by (1) comparing a restrained group to a non-restrained group and (2) investigating whether the ICECAP-O was indeed measuring a concept broader than health. To complete the validation exercise, we compared questionnaires filled out by two appropriate proxies, namely nursing staff and family members. This is to our knowledge the first study of its kind.

Method

Design

The ICECAP-O questionnaire was forward–backward translated into Dutch by two independent translators. For our study, we used the baseline measurement from an economic evaluation study of a quality improvement intervention that aimed to reduce restraints in the Care for Better quality collaborative in Dutch long-term care [24, 25]. Four nursing homes and a total of 122 clients from different geographic regions in the Netherlands participated in the study. All 72 clients in restraints participated and 50 randomly selected non-restrained clients in the same departments served as a control group. We distributed two copies of the questionnaire for each client, one to be filled out by nursing staff that personally cared for the client (nursing version) and one for family members (family version) asking proxies to use substituted judgement. Since data collection of the nursing version was carried out in the context of a national quality improvement program, no ethical committee approval was necessary under Dutch law [26, 27]. Informed consent was obtained for the family version. The researchers received no personal information about the clients during the study.

Measures

Besides the ICECAP-O (as shown in Appendix), the questionnaire contained the following Qol measures: the EQ-5D, EQ-VAS instrument, Cantril’s ladder, and overall life satisfaction. It also contained the Hospital Anxiety and Depression Scale (HADS). The nursing version contained the care dependency scale (CDS), which needs to be completed by care professionals. The EQ-5D [28] measures HrQol along five dimensions (mobility; self-care; daily activities; pain and discomfort; and anxiety and depression) with three levels each (1 = no problems, 2 = moderate problems, and 3 = extreme problems). It has been used with proxies in a large number of studies, including clients with Alzheimer’s disease and severe dementia [29]. The EQ-VAS is a one-dimensional HrQol measure frequently used alongside the EQ-5D in validation studies and has also been used with proxies [23]. The EQ-VAS comprises a single scale ranging from zero (worst imaginable health) to 100 (best imaginable health). Cantril’s ladder is a classic one-dimensional overall quality of life scale [30], with the bottom rung representing no quality of life and the top representing full quality of life. It has been used with proxies [31]. We also used an overall life satisfaction scale, a one-dimensional index ranging from zero (completely dissatisfied) to 10 (completely satisfied) [32]. The HADS was originally developed for use in hospitals but has since been used in various populations [33] and with proxies [34] to assess anxiety and depression symptoms. The HADS consists of two 7-item scales, one for depression and one for anxiety, which can also be combined into a composite index (Cronbach’s alpha = 0.82 for the nursing version and 0.87 for the family version in the current study, comparable to self-reported values in Dutch elderly [35]) with values ranging from 0 (no problems) to 42 (severe depression and anxiety).
The care dependency scale (CDS) developed by Dijkstra [36] contains 15 dimensions measuring the amount of independence the patient has retained with regard to dimensions such as eating and drinking, body posture, incontinence, learning ability, ability to structure the day, communication, and autonomy. The CDS has scores that range from 15 (completely care dependent) to 75 (completely care independent). The CDS has been used and validated extensively [37, 38] and is a useful instrument for assessing need for care. CDS scores have been shown to be associated with a number of problems in elderly care, such as fall-risk, pressure ulcers, and so on and are designed to be completed by nurses and professional caregivers [37, 38].
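The score ranges given above follow from simple item sums, which can be sketched as below. The sketch assumes that each of the 14 HADS items is coded 0 to 3 (with higher values already meaning more symptoms) and that each of the 15 CDS items is coded 1 to 5; these item codings are inferred from the stated ranges of 0 to 42 and 15 to 75. The function names are illustrative only.

```python
from typing import Sequence


def sum_score(items: Sequence[int], n_items: int, low: int, high: int) -> int:
    """Validate item count and coding, then return the plain sum score."""
    if len(items) != n_items:
        raise ValueError(f"expected {n_items} items, got {len(items)}")
    for value in items:
        if not low <= value <= high:
            raise ValueError(f"item value {value} outside {low}..{high}")
    return sum(items)


def hads_total(anxiety: Sequence[int], depression: Sequence[int]) -> int:
    """Composite HADS index: two 7-item subscales, assumed coded 0-3 each."""
    return sum_score(anxiety, 7, 0, 3) + sum_score(depression, 7, 0, 3)


def cds_total(items: Sequence[int]) -> int:
    """CDS sum score: 15 items, assumed coded 1 (dependent) to 5 (independent)."""
    return sum_score(items, 15, 1, 5)


if __name__ == "__main__":
    # Hypothetical ratings for one client.
    print(hads_total([1, 2, 0, 1, 3, 2, 1], [0, 1, 1, 2, 0, 1, 1]))
    print(cds_total([2, 3, 1, 2, 2, 4, 3, 1, 2, 2, 3, 1, 2, 2, 3]))
```

Rejecting out-of-range or missing items before summing keeps partially completed questionnaires from silently producing a low total; in the study itself, such gaps were handled by imputation before scoring.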

Hypotheses

For convergent validity, we expect the ICECAP-O to correlate with overall measures (Cantril’s ladder and overall life satisfaction) and HrQoL measures, as well as with CDS scores, since all measurement instruments differentiate between better and worse states. With respect to discriminant validity, we expect to find differences between the non-restrained and restrained groups in terms of ICECAP-O scores and other overall Qol measures, but not in HrQoL measures, since we expect the two groups to be in a similar health state, while their non-health circumstances differ. To test whether capabilities are indeed measuring a concept broader than health, we expect to observe a difference in ICECAP-O scores between the restrained and the non-restrained clients even when controlling for HrQol, demographic variables, and care dependency. For proxy agreement, we expect the nursing and family proxies for each client to be correlated and the scores to be not significantly different from each other.

Analysis

We performed an item-level analysis to determine non-response for all scales in the questionnaire. We used multiple imputations to treat item non-response for nursing and family questionnaires separately with the Markov chain Monte-Carlo method (MCMC) [39]. We also tested the assumption of multivariate normality underlying the MCMC method. Following multiple imputations, utility and sum-scores were computed where relevant (see Appendix). For the CDS, which was only included in the nursing version, the nursing scores for the patients for which a family version was also present were also used in the analysis pertaining to the family version. Remaining missing observations were imputed.

We used descriptive statistics to analyze demographic characteristics. Means and standard deviations were computed for continuous variables, medians for ordinal variables. All comparisons between demographic variables were performed using the Mann–Whitney-U test, except in the case of education, where a Chi-square was performed. Data were analyzed using STATA 11.

Convergent validity was assessed using correlations in the nursing and family versions separately. To test discriminant validity, we employed chi-square tests and Mann–Whitney-U tests to test for differences between the restrained and non-restrained groups. We performed this comparison on the nursing and the family proxy separately. To further investigate whether the ICECAP-O could both discriminate between the groups and measure a concept broader than HrQol, we performed multivariate regressions. For this purpose, we controlled for demographic variables and care dependency. Two multivariate ordinary least squares (OLS) regression models were fitted on the ICECAP-O index, one using nursing proxy variables and one using family proxy variables. Regression assumptions were checked.
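The Mann–Whitney-U comparison of two independent groups can be sketched as follows. This is a minimal illustration, not the STATA routine used in the study: it assumes a two-sided test with a normal approximation, a tie correction, and no continuity correction, so p-values may differ slightly from those of statistical packages. The example data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U, two-sided p) using a tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    n = n1 + n2
    counts = {}
    for value in pooled:
        counts[value] = counts.get(value, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u, 1.0  # all observations identical: no evidence of a difference
    z = (u - n1 * n2 / 2) / math.sqrt(variance)
    return u, math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical index values for two small groups.
    restrained = [0.41, 0.55, 0.38, 0.60, 0.47]
    non_restrained = [0.62, 0.71, 0.58, 0.80, 0.66]
    print(mann_whitney_u(restrained, non_restrained))
```

The tie correction matters here because ordinal items with three or four levels produce many tied ranks.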

To test agreement between the two proxy groups, we used the Mann–Whitney-U tests and correlations for the questionnaires for which both proxies were available.
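A correlation between paired proxy ratings of the same clients can be sketched as below. The text does not state which correlation coefficient was used; Spearman's rank correlation is assumed here because most of the measures are ordinal. The data in the usage example are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_a * var_b)


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on average ranks."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two paired sequences of length >= 2")
    return pearson(average_ranks(a), average_ranks(b))


if __name__ == "__main__":
    # Hypothetical paired scores: nursing proxy vs. family proxy, same clients.
    nursing = [6, 7, 5, 8, 4, 6]
    family = [5, 7, 6, 6, 3, 7]
    print(round(spearman(nursing, family), 3))
```

Pairs must be aligned by client, so clients with only one proxy questionnaire are excluded before the calculation.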

Results

Response

For a total of 122 clients, 96 nurses and 68 family members completed the questionnaires, implying response rates of 78 and 56%, respectively. For the 96 nursing questionnaires, 62 clients (64%) were in restraints; for the 68 family questionnaires, 47 clients (69%) were in restraints. For 56 clients, we received both types of proxy questionnaires. Item non-response was not systematic and averaged around 2% across all items in the nursing questionnaires, and 4% in the family questionnaires. In the nursing version, multiple imputations allowed for using 96 cases instead of 88-91 in bivariate analysis and 87 in multivariate analyses. In the family version, multiple imputations allowed using 68 cases instead of 58-61 in bivariate analysis and 47 in multivariate analysis.

Descriptive characteristics and relationship between different proxies

Client’s demographic and care-related characteristics can be seen in Table 1, split according to the two versions of the questionnaire.

Table 1 Demographic and care-related characteristics

Convergent validity

As can be seen in Tables 2 and 3, there was a significant correlation between capabilities and HrQol, as shown by the significant correlation between the ICECAP-O tariffs and the EQ-5D and the EQ-VAS health measures. The correlation, however, was not particularly strong. The ICECAP-O tariffs were also correlated with Cantril’s ladder and the overall life satisfaction measures. There was also a significant relationship between ICECAP-O tariffs in both versions of the questionnaire and CDS, though the correlation was stronger in the nursing version. The HADS was not correlated with the ICECAP-O tariffs in either the nursing or family questionnaires.

Table 2 Convergent and discriminant validity nursing version
Table 3 Convergent and discriminant validity family version

Discriminant validity

The demographic and care-related characteristics for the respondents of the restrained and unrestrained client groups can be seen in Table 4. Age and gender were not significantly different for the two groups. There was no significant association between education and being in restraints. The mean CDS score was significantly lower in the group in restraints, indicating higher dependency. HADS scores differed significantly for clients in restraints in the nursing version; they were more depressed and anxious. In the nursing version, there was a significant difference between the groups in all ICECAP-O dimensions except for security. In the family version, two dimensions (role and enjoyment) were significantly different. A difference was also observed in the ICECAP-O tariffs. Clients without restraints scored somewhat higher on HrQol as measured by EQ-5D and EQ-VAS, but the difference was not significant at the 5 percent significance level. A Mann–Whitney-U test indicated that there was a significant difference in terms of capabilities. This was also true for the overall Qol as measured by Cantril’s ladder and overall life satisfaction.

Table 4 Comparison of restrained and non-restrained clients

Table 5 shows how the ICECAP-O tariffs discriminated between clients with and without restraints using a multivariate analysis. Being in restraints independently discriminated between capability Qol in the nursing version, but not in the family version, when controlling for HrQol measures, demographic measures, and care dependency. The individual influences of the EQ-5D and CDS on the ICECAP-O tariffs are pronounced in both versions.

Table 5 Regression results

Relationship between the two proxies

Table 6 shows the agreement between the nursing and family assessment of the variables using Mann–Whitney-U and correlations for the 56 clients for whom both proxy versions were available. The results of the Mann–Whitney-U show that three out of five ICECAP-O dimensions had a significantly different distribution between the proxy groups, while the average tariffs were not significantly different. The distributions of the EQ-5D were not significantly different except for the mobility dimension. The EQ-VAS was significantly different. Overall Qol measures were the same in both proxy groups.

Table 6 Analysis of selection of respondents for whom both versions were available

Agreement between the nursing and family proxies was low for the ICECAP-O dimensions. A significant correlation between the two versions existed only for the control dimension. Neither the ICECAP-O tariffs nor the EQ-5D scores of the two proxy groups were significantly correlated. Measures of overall quality of life were also uncorrelated between the two proxy groups; this is true for both Cantril’s ladder and overall life satisfaction. On the other hand, there was a marginally significant correlation between the EQ-VAS scores in both proxy groups. The HADS score was significantly correlated in the two proxy groups.

Discussion

Summary of main results

Our study is the first attempt to measure Qol and capabilities of physically restrained psycho-geriatric nursing home clients. It was performed in the context of a validation exercise of the ICECAP-O. The ICECAP-O seems a promising, generic, preference-based instrument in the context of evaluating interventions in the psycho-geriatric context. Our study showed reasonable convergent and discriminant validity. Although related to HrQol, the relationship did not turn out to be very strong in our study. Given that and the multivariate regression results, the ICECAP-O appears to encompass a broader evaluative space than health alone. As expected, when the two groups were compared, clients in restraints had a lower Qol than clients without restraints. Being in restraints discriminated in capability Qol in the nursing version, even when correcting for the influence of other variables. This was not the case in the family version. In general, little agreement between the family and nursing versions was found for the different variables, raising important questions about which proxy version to consider superior or most reliable.

Methodological limitations

There are some noteworthy methodological limitations to our study. Ours was a relatively small-scale study in a particular setting, limiting the generalization of results. This is especially true since collection of additional data on diagnosis and disease severity was not feasible, given limited space in the questionnaire. Only the functional consequences of the disease were measured through the CDS. Unfortunately, using dementia-specific Qol measures was also not possible. This study was performed alongside a real-world economic evaluation, and adding further instruments to the questionnaire would have decreased the number of participating organizations even further due to the increased burden caused by the study. Therefore, we necessarily restricted the scope of this study, focusing on generic quality of life instruments that are particularly useful in economic evaluation. Another limitation concerns the use and interpretation of the HADS. To our knowledge, this instrument has not been validated in people with severe dementia. We nonetheless opted for inclusion of the HADS because symptoms of depression and anxiety could be particularly important in the context of restraints. Also, the HADS is widely used, making comparisons to other populations straightforward. Moreover, it is a relatively short instrument compared to instruments like the Cornell Scale for Depression in Dementia [40], while showing similar reliability [40].

Moreover, given the limited number of respondents, we used multiple imputation to retain the full sample in the analyses. Multiple imputation allows for more valid statistical inference [39] than full-case analysis, as long as only a small percentage of the data is imputed, even if the assumption of multivariate normality is not met, as was the case here. In the current study, imputed results are comparable with a full-case analysis (not shown here). OLS estimates in the nursing version had non-normally distributed error terms; in our analysis of it, we thus used robust estimation techniques. Repeating studies like this one with larger samples is therefore encouraged.

Security dimension

The nursing version of the ICECAP-O discriminated between restrained and non-restrained clients on all dimension levels except for security. This may be related to the fact that the scores on this item were relatively high. This was somewhat surprising since the average score on the security dimension was low in the study among general British elderly [8]. This difference may have to do with the study setting or item phrasing. Regarding the former, it is quite possible that nursing home clients suffering from dementia really did not seem worried. This may imply that nursing homes provide a safe environment. It is also possible, however, that these patients may not have been (seen as being) able to worry about or have a grasp of their future. In future (proxy) studies, it may be worthwhile to further investigate this by, for instance, using alternative wording. Then, the underlying reasons for indicating being able to think about the future without worry (i.e., because there is nothing to worry about or one has lost the ability to worry) can be distinguished. Additionally, the proximity to death of some clients may influence the security dimension for the proxies completing the questionnaire. We have received anecdotal information from some respondents that the proximity to death makes questions regarding the future difficult to answer. It is also noteworthy in this context that in another version of the ICECAP, the ICECAP-A, the wording of the security dimension reads “feeling settled and secure” rather than “thinking about the future” [41].

Clients in restraints

According to the nursing version of the ICECAP-O, clients in restraints are indeed worse off than non-restrained clients in terms of capabilities, indicating that physical freedom seems to be an empirically important element of Qol. This finding is in line with earlier studies that indicated that being restrained is not beneficial to the elderly [5, 42]. We do note that, since the current study did not measure cognition directly, only indirectly through the CDS, it is possible that unobserved differences in cognition between the two groups may have influenced our results. Cognition, however, is not consistently identified as a predictor of using physical restraints [5]. Our study, in that sense, gives further rationale for efforts toward reducing the use of physical restraints in psycho-geriatric nursing homes [42].

Differences between proxies

The differences between the nursing and family versions of the questionnaire raise important, yet difficult to answer questions regarding suitable (and valid) proxies. The observed differences may well relate to a difference in reference points. Nursing staff might answer the questions with similar clients in mind, while family members may assess the client’s current capability Qol in relation to former capability Qol, i.e., before psycho-geriatric services were necessary. While both viewpoints can be relevant in their own right, for evaluating interventions aimed at improving the situation of clients in the care context, the nursing proxy seems the most logical choice.

An important limitation of the ICECAP-O to date is that its sensitivity to change has not been explored. Indirectly, our study provides some indication of it in that nursing proxies distinguish between restrained and non-restrained clients. The fact that family proxies apparently did not may strengthen the choice for using nursing staff as proxies. Still, it is necessary to test and explain the discrepancy between proxies further, also in relation to sensitivity to change.

Decision making and transferability of tariffs

Concerning the use of ICECAP-O in cost-utility studies, it should be noted that on a theoretical level, the ICECAP instruments are rooted in capability theory rather than utility theory. In capability theory, developed by Sen [7], people’s wellbeing is measured in terms of their capacity to perform certain actions and achieve certain states [7]. Its prescription for societal redistribution may be seen as maximizing capabilities or as guaranteeing basic capabilities for everyone [43]. Until recently, the approach did not have an empirically tested, well-defined list of capabilities [44], which can be a weakness if societal redistribution is an issue. It would also be possible to use capability-based Qol instruments in cost-effectiveness studies [45]. In such an evaluation, the final outcome would be based on capability attributes instead of HrQol attributes, allowing the computation of “capability QALYs” [6]. Such an approach could be considered to be consistent with the extra-welfarist framework [46] underlying cost-effectiveness analysis, which allows the broadening of the evaluative space to include (also) non-utility information [46]. On the other hand, there is considerable theoretical and empirical uncertainty about how such an approach might work [8, 45] with respect to the valuation of health and capabilities.

Our study used the British tariffs to compute capability valuation since Dutch tariffs are not (yet) available. Using Dutch tariffs would probably not have led to vastly different results, since, in the nursing proxy questionnaire, already four of the five dimensions of the ICECAP-O had significantly different scores for the restrained versus non-restrained group. Still, the weights attached to different capabilities may vary between countries.

Besides the problem of tariffs, the transferability of the capability dimensions themselves can also be a point of discussion. According to Sen [7], who does not list specific capabilities, relevant capabilities should be tailored to the local population and hence generating a list should be performed on a more local level. In contrast, Nussbaum [43] proposed that basic capabilities exist and can be used globally. Since the capability measure is a possible outcome used in optimization and redistributive policies, using a standardized descriptive system across health systems (and countries) to evaluate similar interventions aimed at basic capabilities is clearly advantageous. On the other hand, specific (non-basic) capabilities may be valuable for the relevant target group of a particular intervention. The issue here is whether the ICECAP-O measures basic capabilities, or at least transferable capabilities, or more specifically capabilities important to British elderly. The fact that the dimensions of the ICECAP-O resemble frequently reported universal subjective well-being measures [47] is indicative of the former, although the physical dimension is not measured directly. It seems, therefore, that the ICECAP-O is suitable as a more generic outcome measure in elderly care. As such, it may assist decision makers to make choices based on ensuring and enhancing basic capabilities for this group.

Conclusion

The ICECAP-O instrument appears to be a promising tool for use in evaluations of interventions in psycho-geriatric care that do not necessarily or primarily improve health. The nursing proxy version of the questionnaire particularly demonstrated convergent and discriminant validity. Future research will have to confirm these findings in other settings, with particular attention paid to dementia severity, diagnosis, and validation alongside dementia-specific Qol measures. Additional research is also required on (1) the ICECAP-O’s sensitivity to change, especially in evaluating interventions, (2) the relationship between overall quality of life, utilities, and capabilities for different settings, and (3) eliciting valid proxy information. With respect to the clients involved in this study, the ICECAP-O makes it clear that interventions aimed at removing restraints may well be worthwhile if capabilities are deemed important.