Introduction

Many decedents in long-term care (LTC) settings have dementia, with 50–92% of individuals with dementia dying in nursing homes (NHs) across several countries [1, 2]. Unfortunately, numerous shortcomings exist in end-of-life care, including resident suffering and unmet family needs [3, 4]. To identify and address these shortcomings in clinical practice and research, measurement instruments specific for LTC settings are needed. These instruments should be useful for residents with dementia because of the high number of people dying with dementia in these settings and the projected increase of this population [5, 6]. Further, this population is especially at risk for poor care due to challenges of communication and assessment. For example, pain and other symptoms may not be expressed verbally, and so must instead be detected by more subtle behavioral cues.

When evaluating quality at the end-of-life of people with dementia in LTC, two constructs, “quality of care (QOC)” and “quality of dying (QOD)” must be distinguished. QOC reflects elements of the setting in which dying takes place, such as the availability of professionals, types of communication and scope of treatment. QOD refers to symptom burden and other resident experiences potentially influenced by care, and is also partly the result of various resident-related factors. For example, an individual’s QOD may be affected by pain (symptom burden) that was treated with medications and massage (care), as well as by his cognitive status (patient factor). QOD may be considered synonymous with quality of life while dying; although the time period to which this refers cannot be precisely defined, measurement tools usually specify a fixed period of time prior to death. Both QOC and QOD instruments could be used to evaluate quality at the end-of-life toward the development of quality indicators.

Ten instruments are available to measure quality when dying with dementia in LTC; all have been developed or used in populations with a substantial number of residents with dementia [7]. As these measures have become available only after 2000, little is known about their psychometric properties when used with this group. Table 1 presents characteristics and previously published psychometric properties of these ten measures, including findings from samples of residents without dementia. This paper extends our knowledge of these instruments by assessing and comparing the validity and reliability of the measures of perceived QOC and QOD for residents dying with dementia in NHs and residential care homes (RCs).

Table 1 Characteristics and psychometric properties of ten available measurement instruments reported in the literature

Methods

The tested instruments

Instruments that assess QOC

The end-of-life in dementia satisfaction with care (EOLD–SWC) was developed for after-death assessment of satisfaction with care by family of dementia patients [8]. Higher scores reflect higher satisfaction. The family assessment of treatment at the end-of-life–short version (FATE–S) was developed to evaluate respondents’ perceptions of outcomes of care, asking how well the provided care met resident or family needs. FATE–S scores are expressed as a percentage of valid responses for which families provided the best possible response; higher percentages reflect better treatment [11]. The family perception of care scale (FPCS) asks the family’s opinion about the quality of resident care, family support, communication, and rooming [13]. Higher summed total scores reflect more favorable perceptions of end-of-life care. The family perception of physician–family caregiver communication (FPPFC) was designed to assess family perception of physician communication during the dying process, and higher scores reflect higher quality of family–physician communication [14]. The nursing home version of the after-death bereaved family member interview of the toolkit of instruments to measure end-of-life care (TIME) consists of items asking family about seven domains of care [15]. For five of the domains, the questions are summarized as “problem scores,” with higher numbers reflecting more opportunity to improve. For the physical comfort and emotional support domain, questions are not summarized. For the support self-efficacy of the family domain, questions are summarized on a 3-point scale, with higher scores reflecting more support. Additionally, the instrument produces an overall rating scale for patient-focused, family-centered care, with a higher score being favorable. Cut points are not available for any of the QOC instruments.
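The FATE–S scoring rule described above (the percentage of valid responses for which the family gave the best possible response) can be illustrated with a minimal sketch. The response coding is an assumption made for illustration only: each item is represented as True (best possible response), False (any other valid response), or None (missing or not applicable). The function name fate_s_score is hypothetical and is not part of the instrument.

```python
from typing import Optional, Sequence


def fate_s_score(responses: Sequence[Optional[bool]]) -> Optional[float]:
    """Percentage of valid responses that were the best possible response.

    Assumed coding (illustrative, not taken from the instrument manual):
      True  -> best possible response
      False -> any other valid response
      None  -> missing / not applicable (excluded from the denominator)
    """
    valid = [r for r in responses if r is not None]
    if not valid:
        return None  # no valid responses, so no score can be computed
    return 100.0 * sum(valid) / len(valid)


# Hypothetical respondent: four valid items, three with the best response.
example_responses = [True, True, False, None, True]
print(fate_s_score(example_responses))
```

Because invalid responses are dropped from the denominator rather than imputed, respondents with different numbers of valid items remain comparable on the same 0–100 scale.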

Instruments that assess QOD

The EOLD–comfort assessment at death (EOLD–CAD) comprises the subscales physical distress, dying symptoms, emotional symptoms, and well-being; the EOLD–symptom management (EOLD–SM) assesses the extent to which symptoms were experienced [8]. Although no cut-off scores are available for either instrument, higher scores reflect more comfort and better symptom management, respectively. Both were originally developed for dementia patients. The mini-suffering state examination (MSSE) was developed as a prospective assessment to reflect the amount of suffering experienced by dementia patients, with higher scores indicating more suffering [17], but can also be used retrospectively. The developers considered scores of 0–3 as a low level of suffering; 4–6 as intermediate; and 7–10 as a high level of suffering. The palliative care outcome scale (POS) was developed for cancer patients, but has also been used for cognitively impaired patients [19]. It concerns the physical, psychological, and spiritual domains of life within the field of palliative care [18, 19]. The POS developers considered a mean score of 2.0 or lower as favorable.
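The MSSE bands and the POS threshold stated above can be written as a short sketch. The function names are hypothetical; only the cut points (MSSE 0–3, 4–6, 7–10; POS mean of 2.0 or lower) come from the text.

```python
def msse_suffering_level(total_score: int) -> str:
    """Map an MSSE total score (0-10) to the developers' suffering bands."""
    if not 0 <= total_score <= 10:
        raise ValueError("MSSE total score must be between 0 and 10")
    if total_score <= 3:
        return "low"
    if total_score <= 6:
        return "intermediate"
    return "high"


def pos_is_favorable(mean_score: float) -> bool:
    """A mean POS score of 2.0 or lower is considered favorable."""
    return mean_score <= 2.0


print(msse_suffering_level(5))
print(pos_is_favorable(1.4))
```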

Instruments that assess the QOC and QOD

The quality of dying–long-term care (QOD–LTC) was developed for cognitively impaired and intact residents in NHs and residential care/assisted living settings [21]. It assesses perspectives on quality of personhood, closure, and preparatory tasks. Higher mean scores reflect a higher quality of end-of-life in LTC. A cut-off score is not available.

Translation

For the EOLD-instruments and MSSE, valid Dutch versions were available [22, 23]. A Dutch version of the POS was also available [19], but to adhere to standard norms, the translation process was repeated. Two researchers (MvS-P and JTS) independently translated the English versions of the POS and other instruments into Dutch in consultation with the developers when necessary; they resolved differences in translations by reviewing and discussing each differently translated word, iteratively, until agreement was reached. The instruments were then back-translated by a professional translator.

Data collection

The study was conducted between March 2008 and April 2009 in seven NHs and seven RCs in the Netherlands. Eligible resident cases were those who died with dementia within this time period, excluding a single case who stayed in the hospital for more than half of the last month of life.

For each eligible resident, retrospective interviews were completed with a professional (within 2 weeks of death) and a family caregiver (approximately 2 months after death). The 2-week window ensured that professional caregivers would remember that particular patient and have access to records, while 2 months allowed the family time to grieve yet was not so long as to affect recall. These measurement time points were the same as for the CASCADE study [24], and were used in a recent Dutch study [4]. The professional caregiver interview included the EOLD–CAD, EOLD–SM, MSSE, POS, and QOD–LTC, as well as the seven-item Bedford Alzheimer nursing severity–scale (BANS–S) [25], which assessed dementia status 1 month pre-death. BANS–S scores (range 7–28) of 17 and higher are regarded as severe dementia [26]. In NHs, specialty physicians (referred to as elderly care physicians) completed the questionnaires, and in four of the seven NHs, a nurse completed them independently of the physician. In RCs, elderly care physicians completed the questionnaires in consultation with nurses. According to the Dutch model of care, elderly care physicians are on-staff in NHs, and they are also responsible for dementia care units in RCs [27].

Family received an informed consent form by postal mail approximately 6 weeks after death of their family member. When they agreed to participate, they received a mailed questionnaire with the EOLD–SWC, EOLD–SM, EOLD–CAD, FPPFC, MSSE, POS, and QOD–LTC. The MSSE contains 10 items, one of which is the clinician’s opinion about suffering. We adapted the instrument for families, separately asking the physician’s and nurse’s opinion. The instruments were compiled in order of timeframe of reference, beginning with instruments referring to the last month of life, and ending with instruments focusing on the last 3 days of life. To avoid possible bias due to the order of instruments within each timeframe, for half of the population the order within timeframes was reversed. Four overall assessments of family perceptions preceded the other measures: “On a scale of 1–10, where 1 = the worst possible and 10 = the best possible, how would you rate the overall quality of care (quality of life) in the last month (last days) of life?” Additionally, family provided resident demographic data. After the family completed the written survey, they were contacted by telephone to complete the FATE–S, FPCS and TIME. The TIME was administered by phone due to its development as a telephone interview and its complexity; the FATE–S and FPCS were identified midway through the study and so were included only in the last 23 family interviews.

In addition to retrospective questionnaires and interviews, in two NHs, 24 residents who were expected to die were observed prospectively by the coordinating elderly care physicians twice daily when possible; these data were regarded as the gold standard and used to assess differences in retrospective and prospective reporting. The physicians completed the MSSE, the EOLD–CAD, the discomfort scale–dementia of Alzheimer type (DS–DAT) [28], and the pain assessment in advanced dementia (PAINAD) scale [29]. These measures specifically assess discomfort and pain in dementia. The coordinating elderly care physicians were trained in use of the DS–DAT and PAINAD with an instructional video, and during the training session, practiced scoring of observational instruments referring to a gold standard. The DS–DAT has four response options (range 0–3) indicating frequency and intensity of behavior; nine items are summed to calculate a total score (range 0–27) with a higher score reflecting more discomfort. The PAINAD has three response options (range 0–2); five items are summed to calculate a total score (range 0–10) with a higher score reflecting more pain. Mean scores were calculated for all of these instruments by averaging the two assessments per day and then averaging over days.
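The summing and averaging of the observational scores can be sketched as follows. This is an illustration under stated assumptions, not the study's actual procedure: assessments are averaged within each day first and the daily means are then averaged, and a day with a single assessment contributes that single value. The function names and the example item scores are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence

DS_DAT_ITEMS, DS_DAT_ITEM_MAX = 9, 3  # nine items scored 0-3, total 0-27
PAINAD_ITEMS, PAINAD_ITEM_MAX = 5, 2  # five items scored 0-2, total 0-10


def total_score(items: Sequence[int], n_items: int, item_max: int) -> int:
    """Sum item scores after checking the item count and the item range."""
    if len(items) != n_items:
        raise ValueError(f"expected {n_items} items, got {len(items)}")
    if any(not 0 <= value <= item_max for value in items):
        raise ValueError(f"item scores must lie between 0 and {item_max}")
    return sum(items)


def mean_over_days(totals_by_day: Dict[str, List[int]]) -> float:
    """Average the assessments within each day, then average the daily means."""
    daily_means = [mean(totals) for totals in totals_by_day.values() if totals]
    if not daily_means:
        raise ValueError("no assessments available")
    return float(mean(daily_means))


# Hypothetical PAINAD observations: two on day 1, one on day 2.
day_1 = [
    total_score([1, 0, 2, 1, 0], PAINAD_ITEMS, PAINAD_ITEM_MAX),
    total_score([0, 0, 1, 1, 0], PAINAD_ITEMS, PAINAD_ITEM_MAX),
]
day_2 = [total_score([1, 1, 1, 0, 0], PAINAD_ITEMS, PAINAD_ITEM_MAX)]
print(mean_over_days({"day 1": day_1, "day 2": day_2}))
```

Averaging within days first keeps a day with two observations from weighing twice as much as a day with one.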

Data analysis

Validity

Validity was assessed by: (1) comparing prospective observed discomfort and symptoms with prospective and retrospective measurements of the EOLD–CAD and MSSE (concurrent validity); (2) comparing the average (sub)scale scores on the QOC instruments and the overall assessment of QOC and similarly comparing the QOD instruments to the overall assessment of QOD (convergent validity); (3) comparing total scores of instruments measuring similar constructs (QOC or QOD), and total scores of the QOD instruments with the total scores of DS–DAT and PAINAD (convergent validity); and (4) confirmatory factor analysis (CFA) to assess the factor structure of the instruments (factorial validity).

First, we compared total scores on the EOLD–CAD and MSSE administered prospectively before death with total scores on the same instruments retrospectively after death in the same residents. Referring to prospective measurement as the gold standard, we assumed EOLD–CAD and MSSE total scores to be comparable to those assessed in retrospect. To examine the agreement between prospective observations and retrospective ratings by professionals and family, we expressed the differences in effect sizes (d) and interpreted these according to Cohen (1988): small effect when d is between 0.2 and 0.3, moderate effect when d is around 0.5 and large effect when d ≥ 0.8 [30]. Additionally, concordance correlation coefficients (CCC) were calculated for total scores, including 95% bias-corrected and accelerated (BCa) bootstrap confidence intervals (CI) [31, 32]. A CCC of <0.00 represents poor concordance; 0.00–0.20 slight; 0.21–0.40 fair; 0.41–0.60 moderate; 0.61–0.80 substantial; and 0.81–1.00 almost perfect concordance [33].
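The two agreement statistics named above can be sketched in a few lines. Two points are assumptions rather than details taken from the study: Cohen's d is computed here with the pooled sample standard deviation of the two sets of ratings, and the CCC uses Lin's formula with population (1/n) moments. The BCa bootstrap confidence interval is omitted, and all function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def cohens_d(x: Sequence[float], y: Sequence[float]) -> float:
    """Standardized mean difference using the pooled sample SD (assumed variant)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var)


def concordance_cc(x: Sequence[float], y: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient for paired scores."""
    if len(x) != len(y):
        raise ValueError("x and y must be paired")
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # The squared mean difference penalizes systematic shifts between raters.
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


def ccc_label(ccc: float) -> str:
    """Concordance bands as listed in the text."""
    if ccc < 0.0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial")]:
        if ccc <= upper:
            return label
    return "almost perfect"


# Hypothetical paired total scores (prospective vs. retrospective).
prospective = [1, 2, 3]
retrospective = [2, 3, 4]
print(concordance_cc(prospective, retrospective), ccc_label(0.68))
```

Unlike a Pearson correlation, the CCC falls below 1 when one rater scores systematically higher than the other, even if the ratings are perfectly correlated, which is why it suits a comparison against a gold standard.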

Second, the correlation between the average (sub)scale scores on the QOC instruments and the overall QOC assessment was calculated to determine whether the instruments measure the expected construct (high correlation in the expected direction). Similarly, we calculated correlations between the QOD instruments and the overall assessment. We selected the appropriate construct and closest timeframe (1 month or last days for the overall assessments). For instruments about the last week of life, we calculated the correlation with the overall assessment of the last month as well as the last days. Correlation coefficients ≥0.5 represent high correlations, 0.3–0.5 moderate, 0.1–0.3 small, and <0.1 insubstantial correlations [30].
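As a minimal sketch of this step, the following computes a Pearson correlation and applies the magnitude bands given above. The use of Pearson's r is an assumption, since the text does not name the coefficient; the function names are hypothetical. The label is based on the absolute value, so the expected direction has to be checked separately from the sign.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation for paired scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_label(r: float) -> str:
    """Magnitude bands from the text; the sign is ignored here."""
    size = abs(r)
    if size >= 0.5:
        return "high"
    if size >= 0.3:
        return "moderate"
    if size >= 0.1:
        return "small"
    return "insubstantial"


# Hypothetical instrument scores and overall 1-10 assessments.
instrument = [20, 25, 31, 28, 35]
overall = [5, 6, 8, 7, 9]
r = pearson_r(instrument, overall)
print(round(r, 2), correlation_label(r))
```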

Third, we examined the intercorrelation of total scores of instruments measuring similar constructs, as well as the total scores of QOD instruments with DS–DAT and PAINAD total scores.

Lastly, CFA was conducted for the QOD instruments completed by professionals to confirm the structural model as reported by the instruments’ developers. No CFA was performed for the family data due to the small sample size (CFA requires a minimum of 100 respondents). Further, CFA is only relevant when the instrument is based (or likely based) on a reflective model, meaning that the items are reflections of the same construct [34, 35]. For example, we did not apply CFA to the MSSE because this instrument comprises a collection of different items that cause suffering in a resident and are not expected to correlate. Similarly, we did not apply CFA to the POS. We examined whether predetermined scales by the developers (four for the EOLD–CAD, and three for the QOD–LTC) could be confirmed in our analyses. Tucker–Lewis index (TLI)/comparative fit index (CFI) values >0.95 represent good fit [36]. A root mean square error of approximation (RMSEA) value of <0.08 represents acceptable fit and <0.05 represents good fit [36]. For the RMSEA, 90% CIs were also calculated.
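The fit criteria above can be expressed as simple threshold checks. This sketch only classifies fit indices that a CFA program has already produced; it does not fit a model. The function names are hypothetical, and the label for RMSEA values of 0.08 or higher ("not acceptable") is an assumption, as the text defines only the good and acceptable bands.

```python
def incremental_fit_label(value: float) -> str:
    """CFI or TLI: values above 0.95 represent good fit."""
    return "good" if value > 0.95 else "not good"


def rmsea_label(value: float) -> str:
    """RMSEA: below 0.05 is good fit, below 0.08 is acceptable fit."""
    if value < 0.05:
        return "good"
    if value < 0.08:
        return "acceptable"
    return "not acceptable"  # assumed label; the text gives no name for this band


# Hypothetical fit indices from a CFA run.
print(incremental_fit_label(0.97), rmsea_label(0.06))
```

Applying rmsea_label to both bounds of the 90% CI shows whether the interval spans more than one band, which is more informative than classifying the point estimate alone.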

Reliability

Internal consistency was assessed for family and professionals separately, calculating Cronbach’s alpha for all instruments based (or likely based) on a reflective model. We computed alphas for the FPPFC, the subscales of the EOLD–CAD, FPCS and QOD–LTC, and the TIME domains. Alphas between 0.70 and 0.95 were considered adequate [37].
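For reference, Cronbach's alpha for a (sub)scale with k items is k/(k − 1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. A minimal sketch follows, with rows as respondents and columns as items; the function names and the example data are hypothetical.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; each row holds one respondent's item scores."""
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2 or any(len(row) != k for row in rows):
        raise ValueError("need at least two items and equal-length rows")
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def alpha_is_adequate(alpha: float) -> bool:
    """Alphas between 0.70 and 0.95 were considered adequate."""
    return 0.70 <= alpha <= 0.95


# Hypothetical three-item subscale answered by five respondents.
scores = [[2, 3, 3], [4, 4, 5], [1, 2, 2], [5, 4, 5], [3, 3, 4]]
alpha = cronbach_alpha(scores)
print(round(alpha, 3), alpha_is_adequate(alpha))
```

The upper bound matters as well as the lower one: an alpha above 0.95 suggests that items are redundant rather than that the scale is more reliable.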

Additionally, differences in ratings of family, nurses and physicians were tested with paired sample t tests when both professional and family caregiver data were available. To assess the concordance of professional and family caregiver scores in individual residents, CCCs were calculated [31, 32].

Imputations and software

When at least 75% of the items were answered, missing values, except those in the FATE–S, were imputed with the subject’s mean score. A two-way imputation (with subject mean, item mean, and a random factor) was also conducted, but because it yielded similar results, the simpler subject means were used. CFA was conducted using Mplus version 6.1 [38]. We used SPSS 15.0 for all other analyses (SPSS Inc., Chicago, Illinois, USA).
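The subject-mean imputation rule can be sketched as follows. Missing items are represented as None, which is an assumed coding, and the function name is hypothetical. The two-way imputation mentioned above is not shown.

```python
from typing import List, Optional, Sequence


def impute_subject_mean(
    items: Sequence[Optional[float]], min_answered: float = 0.75
) -> Optional[List[float]]:
    """Fill missing items with the respondent's own mean score.

    Returns None when fewer than `min_answered` of the items were answered,
    in which case the instrument is treated as missing for this respondent.
    """
    answered = [value for value in items if value is not None]
    if not items or len(answered) / len(items) < min_answered:
        return None
    subject_mean = sum(answered) / len(answered)
    return [subject_mean if value is None else float(value) for value in items]


# Hypothetical four-item scale with one missing answer (75% answered).
print(impute_subject_mean([1, 2, None, 3]))
```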

Results

The study included 119 decedents on whose behalf 70 family and 103 professionals (physician and/or nurse) completed an evaluation (Fig. 1). Decedents were predominantly female (82%), widowed (69%), native (96%), with a mean age at death of 88.1 (standard deviation [SD] 6.6). Sixty-three percent had severe dementia 1 month pre-death. Sixty-eight percent died in a NH, and 32% in a RC; none died in a hospital. Mean length of stay was 31 months (SD 29 months).

Fig. 1 Overview of the data collection to evaluate properties of instruments

Most family caregivers were female (67%), married (67%), native (99%), and children of the decedent (76%). Their mean age was 60.6 (SD 8.5). In the last week before death, the mean number of visits was 5.2 (SD 2.2), and family caregivers spoke to staff 3.0 times (SD 2.2). In the last month, family caregivers visited the residents 16.8 times (SD 8.6) and spoke to staff 13.5 times (SD 7.9). Of the family caregivers, 39% were present at death. A physician was present in 3% of cases, a nurse in 31%, and another family caregiver was present in 41% of cases. No one was present at death for 27% of the decedents. Death was expected by family caregivers in 39% of cases, whereas professional caregivers expected death in 71% of cases.

Validity

The mean total EOLD–CAD scores (Table 2) and subscale scores (not presented) were not significantly different between prospective and retrospective ratings of physicians, but the retrospective rating of the nurse was significantly different from the prospective physicians’ ratings. For the mean total MSSE scores, there were no significant differences. However, prospective observations tended to be more favorable for all pairs, and were statistically significantly more favorable when compared with family ratings. The CCC of the MSSE by prospective observation and the physicians’ retrospective rating of the MSSE was substantial (CCC = 0.68) and the effect size was small (d = 0.31). For all other pairs, the concordance of scores in individual residents was slight to moderate (CCC = 0.20–0.44) and the effect size was large (d = 0.81–1.55).

Table 2 Prospective observation by elderly care physicians versus retrospective completion of the same QOD instruments, by respondent type

Table 3 shows that correlations with the overall assessment were generally higher for the QOC instruments (r = 0.13–0.70) than for the QOD instruments (r = 0.03–0.47). High correlations with the overall assessment were found for the EOLD–SWC, FPCS (total scale, and the subscales resident care and communication), the TIME overall rating scale, and the QOD–LTC subscale personhood. We found a small correlation for the EOLD–CAD subscale emotional distress and insubstantial correlations for the EOLD–CAD well-being subscale and the QOD–LTC subscale preparatory tasks; the others were moderate. The only direction that was not consistent with our hypothesis was the correlation between the EOLD–CAD scale well-being and the QOD overall assessment. We found mostly moderate to high inter-correlation between instruments measuring the same construct (QOC r = 0.39–0.88; QOD r = 0.38–0.72), except for some TIME domains (Table 4). The instrument that measures two constructs, the QOD–LTC, correlated somewhat better with QOC instruments than with QOD instruments. Correlations of the QOD instruments with the prospective observations of DS–DAT and PAINAD were low for the EOLD–CAD (r = 0.20–0.25), and even more so for the EOLD–SM, MSSE, and POS (r = 0.03–0.15) (Table 5).

Table 3 Correlation of instrument scores completed retrospectively by family caregiver with overall assessments of the QOC/QOD (N = 70)
Table 4 Correlation of instrument scores completed retrospectively
Table 5 Correlation of prospective observation with the DS–DAT and PAINAD by elderly care physicians versus instrument scores completed retrospectively by professional caregivers (N = 24 decedents)

CFA showed that the fit index (according to CFI and TLI) was good for the EOLD–CAD (CFI and TLI = 0.98), but not for the QOD–LTC (CFI = 0.87 and TLI = 0.83) (Table 6). The EOLD–CAD also showed acceptable fit according to the RMSEA criterion (0.08), but the CI spans both good and poor fit (0.05–0.11).

Table 6 Reliability of instruments (and subscales) for instruments completed by family (N = 70a) and professional caregivers (N = 103)a

Internal consistency

Cronbach’s alphas for the QOC instruments were adequate (α = 0.79–0.91), except for the subscale Rooming of the FPCS (α = 0.31), and some domains of the TIME (α = 0.31–0.58) (Table 6). For the QOD instruments, the Cronbach’s alpha was adequate (α = 0.70–0.83), except for the physical distress subscale of the EOLD–CAD (α = 0.64). For the QOD–LTC, only the Cronbach’s alpha of the Personhood subscale was adequate (α = 0.70).

Inter-respondent reliability

Mean QOD–LTC, total EOLD–CAD, and EOLD–SM scores were higher (better) for professionals compared with family (Table 7). The effect size of the difference was small for the EOLD–CAD (d = 0.39) and moderate for the QOD–LTC (d = 0.57) and EOLD–SM (d = 0.75). The CCC was moderate for the MSSE (CCC = 0.50), slight for the QOD–LTC (CCC = 0.18) and fair for the other instruments (CCC = 0.21–0.32).

Table 7 Instrument scores of family caregivers and professional caregivers (N = 54)

Total scores between nurses and physicians in NHs differed significantly for the EOLD–SM, POS and QOD–LTC (Table 8). Physicians rated the quality as more favorable than did nurses for four of five instruments, with the difference being small to moderate (d = 0.24–0.58). Reliability was moderate for the EOLD–CAD (CCC = 0.59) and EOLD–SM (CCC = 0.48), and fair for the other instruments (CCC = 0.25–0.40).

Table 8 Instrument scores of Physicians and Nurses in NHs (N = 33)

Discussion

We examined the validity and reliability of ten after-death instruments that are commonly used to assess family and professional caregivers’ perspectives regarding QOC or QOD with dementia at the end-of-life. Of the instruments measuring QOC, the EOLD–SWC and FPCS showed better validity and internal consistency than the FPPFC, which in turn performed better than the FATE–S and the TIME. Of the QOD instruments, the EOLD–CAD and MSSE were most valid and internally consistent, followed by the EOLD–SM. The POS was not a valid measure to assess the QOD of people with dementia in LTC settings. The QOD–LTC captures both QOC and QOD, and showed moderate validity and internal consistency. The concordance between mean and individual scores given by family, physicians and nurses was low for four of five instruments (the EOLD–CAD, EOLD–SM, POS, and QOD–LTC), likely reflecting the different perspectives, including different values and points of reference, of the family and professional caregivers. Thus, respondent type should be considered when selecting an instrument for use, as each perspective offers valuable information unique to the situation [39]. However, to improve QOC and QOD, both perspectives are important. Therefore, for research and individual assessments, an integrated approach may be best [40].

The correlations across instruments that measure the same construct were moderate to good. The correlation between the total scores of EOLD–CAD and MSSE was good (r = −0.58) and similar to the correlation Aminoff found for people with dementia (r = −0.5 to −0.8) [13]. Internal consistency was similar to literature reports for most instruments, except for the TIME for which it was slightly higher.

The correlation of the QOD instruments with the overall assessment was lower than for the QOC instruments. It is possible that ratings of QOD are more individualized than are ratings of QOC, and so less well captured in standardized measures. For example, the presence of particular symptoms may be expected by some as part of the dying process, while others may find them very disturbing [41]. The QOD instruments assess frequency of symptoms whereas the overall assessment may include the degree to which symptoms are perceived as burdensome, or other constructs that are not reflected in the individual measures.

The mean scores of the prospective observations with the same instruments were more favorable than those of the retrospective assessments. This difference may reflect the shorter time frame of prospective observations (average of five minutes versus last week of life). Further, the difference was largest when the retrospective assessment was performed by a family caregiver (EOLD–CAD and MSSE) or nurse (EOLD–CAD), and both generally assigned less favorable scores than physicians. The CCC for concordance between prospective observer and retrospective physician assessment of the EOLD–CAD was low but the total EOLD–CAD scores were not significantly different, as found in previous work [39]. The prospectively assessed MSSE may compare better to the retrospectively assessed MSSE than the EOLD–CAD, because the MSSE may be more stable over time (i.e., it includes more stable items in addition to symptoms). Further, when comparing slightly different constructs such as observed discomfort and quality of life in retrospect, a CCC of about 0.5 is expected [42]. Nevertheless, our results confirm the validity of the tested instruments in the sense that the retrospective assessments are at least to some degree based on prospectively observed experiences of dying.

Although sample sizes for some of our analyses were small, and the numbers were smaller for the FATE–S and FPCS which we added late, this study offers a well-rounded, in-depth examination of existing measures to assess the QOC and QOD. Exploratory analysis showed no systematic differences between respondents who did and did not complete the FATE–S and FPCS. A strength of our study is that we included prospective observations and professionals’ views, whereas most studies are limited to retrospective family evaluations. Of note, Parker et al. recently published a systematic review of palliative care outcome measures used to assess the quality of palliative care provided in residential aged care facilities [43]. They identified eight of the same instruments evaluated in this analysis, and no additional instruments were identified that fit our inclusion criteria [7]. Finally, we presented only internal consistency and CFA of reflective (sub)scales, although the distinction between reflective (or “psychometric”) and formative (or “clinimetric”) measures is somewhat subjective. We chose to present internal consistency and CFA also when the nature of the measure was in question. A recently performed factor analysis [20] showed that the POS consists of two subscales and some solitary items, and therefore it is not useful to calculate the internal consistency for the whole POS instrument.

The psychometric properties of the instruments presented in this paper are generally consistent with findings in a parallel matched-design study of US family respondents, although there are some differences which will be addressed in detail in future work [44]; still the results of the Dutch study likely apply to other Western countries when using the English versions or properly translated versions. Further work will also consider usefulness and feasibility issues to help decide which instruments are best to use for measuring QOC and QOD for those dying with dementia in LTC settings.