Take-home message

In this cohort, the 2015 ERC/ESICM algorithm reliably predicted poor outcome after out-of-hospital cardiac arrest, but many patients with a poor outcome were not detected. Explorative versions to simplify the algorithm also correctly predicted poor outcome in this study, but our results should be validated, preferably in patients where withdrawal of life-sustaining therapy is uncommon.

Introduction

The European Resuscitation Council (ERC) and the European Society of Intensive Care Medicine (ESICM) published joint guidelines for neurological prognostication after cardiac arrest (CA) in 2014 and 2015 [1, 2]. The included algorithm consists of 4 separate steps and was based on the current level of evidence for individual methods and expert opinions about combinations of methods.

According to ERC/ESICM, prediction of neurological recovery cannot be done with high confidence before 72 h after CA and only after confounding factors such as metabolic derangements or effects of residual sedation and muscle-relaxants are excluded (Step 0) [1]. In Step 1, the patient’s best response to painful stimuli is evaluated according to the Glasgow Coma Scale Motor Score (GCS-M) as a screening criterion. If patients either have no motor response to pain or extend the extremities (GCS-M ≤ 2), they will be assessed further whilst patients with at least stereotypic flexor response (GCS-M 3) or better are excluded from further prognostication. In Step 2 the most robust predictors are considered. Outcome is “very likely poor” if a patient has bilaterally absent corneal and pupillary light reflexes, and/or bilaterally absent N20 response on short-latency somatosensory evoked potentials (SSEP). A patient not fulfilling the Step-2-criteria should be re-examined after ≥ 24 h. If GCS-M is still ≤ 2, Step 3 of the algorithm states that outcome will be “likely poor” if there are ≥ 2 pathological findings of the following: “high” serum neuron-specific enolase (NSE) according to locally established cut-off values, unreactive burst-suppression or unreactive status epilepticus on electroencephalography (EEG), generalized oedema on head computed tomography (CT) ≤ 24 h post-arrest or on magnetic resonance imaging (MRI) or early (≤ 48 h) status myoclonus. Since the publication of the ERC/ESICM guidelines, a standardized classification of post-arrest EEG patterns has been suggested [3,4,5,6] and two large studies on serum NSE levels have been published [7, 8].
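The stepwise logic described above can be sketched in a few lines of Python. This is only an illustration of the decision rules as summarized in this section, not an implementation endorsed by ERC/ESICM: the `Findings` record, its field names and the returned labels are hypothetical, and the ≥ 24 h re-examination between Steps 2 and 3 is not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Findings:
    """Hypothetical per-patient record; True = pathological, None = not examined."""
    gcs_m: int                                   # Glasgow Coma Scale Motor Score (1-6)
    confounders_present: bool = False            # Step 0: sedation, metabolic derangement etc.
    pupillary_and_corneal_absent: Optional[bool] = None
    ssep_n20_absent: Optional[bool] = None
    nse_high: Optional[bool] = None
    eeg_unreactive_bs_or_se: Optional[bool] = None
    imaging_generalized_oedema: Optional[bool] = None
    early_status_myoclonus: Optional[bool] = None


def classify(p: Findings) -> str:
    """Apply Steps 0-3 in order and return an illustrative label."""
    # Step 0: confounders must be excluded before prognostication.
    if p.confounders_present:
        return "not assessable"
    # Step 1: only patients with GCS-M <= 2 are prognosticated further.
    if p.gcs_m >= 3:
        return "no further prognostication"
    # Step 2: the most robust predictors; one fulfilled criterion suffices.
    if p.pupillary_and_corneal_absent or p.ssep_n20_absent:
        return "poor outcome very likely"
    # Step 3: at least two pathological findings are required.
    step3 = [p.nse_high, p.eeg_unreactive_bs_or_se,
             p.imaging_generalized_oedema, p.early_status_myoclonus]
    if sum(1 for finding in step3 if finding) >= 2:
        return "poor outcome likely"
    return "poor outcome not predicted"


print(classify(Findings(gcs_m=2, nse_high=True, eeg_unreactive_bs_or_se=True)))
```

Examinations that were not performed (`None`) simply do not count towards the Step 3 total in this sketch.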

The aim of this study was to assess the performance of the ERC/ESICM algorithm in a large international cohort of patients comparing predicted neurological outcome with the outcome reported 6 months post-arrest. Additionally, we wanted to identify strengths and weaknesses of the current algorithm, and explore possible modifications.

Methods

This was a retrospective descriptive analysis using data from the international multicentre Target Temperature after Out-of-hospital Cardiac Arrest (TTM) Trial, which randomized 939 adult patients with CA of presumed cardiac cause to targeted temperature management at 33 °C or 36 °C between 2010 and 2013. Rationale, design and results have previously been published [9, 10]. Ethical consent was obtained in each participating country [10]. The TTM-database contains information on clinical data, patient demographics, neurological prognostication, withdrawal of life-sustaining-therapy (WLST) and follow-up at 6 months after CA [11, 12]. Poor neurological outcome was defined as Cerebral Performance Category Scale (CPC) 3–5 (severe cerebral disability, vegetative state or brain death) [13]. GCS-M and clinical seizures were evaluated daily; brain stem reflexes were registered at formal neurological prognostication ≥ 108 h post-arrest [10, 11, 14]. In this study, we used GCS-M on day 4 (72–96 h post-arrest), since this is closest to guideline recommendations [1, 2].

A routine EEG was performed in unconscious patients 48–72 h post-arrest. SSEP, where available, was performed during normothermia and was recommended for patients who remained unconscious between 84 and 108 h after cardiac arrest [12]. Blinded retrospective evaluation of original EEG data was based on the terminology of the American Clinical Neurophysiology Society [15], and EEGs were classified as unreactive burst-suppression or unreactive status epilepticus (abundant rhythmic/periodic discharges) according to ERC/ESICM [2]. In an exploratory analysis, the recently proposed standardized highly malignant EEG patterns were applied [3, 4, 15, 16].

Serum samples were collected at 24, 48 and 72 h after CA, stored in a central biobank and analysed after TTM-trial completion [7]. We defined NSE levels as “high” if ≥ 48 pg/mL at 48 h and/or ≥ 38 pg/mL at 72 h, corresponding to 2% false positive rates for poor outcome as previously published [7]. In a sensitivity analysis, we explored an alternative cut-off ≥ 33 pg/mL at 48 and/or 72 h as suggested in previous guidelines [17, 18] and recently used validating the ERC/ESICM algorithm [19, 20]. Both CT and MRI were performed on clinical indication and evaluated for generalized oedema by local radiologists in a pragmatic approach similar to clinical practice. The first available CT was included in this study [21].
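The NSE definition above amounts to a simple threshold rule, sketched below. The function name and signature are hypothetical; the default cut-offs are the values stated in the text (in the units reported there), and passing 33 for both reproduces the alternative cut-off used in the sensitivity analysis. Returning `None` when no sample is available is an assumption of this sketch.

```python
from typing import Optional


def nse_is_high(nse_48h: Optional[float],
                nse_72h: Optional[float],
                cutoff_48h: float = 48.0,
                cutoff_72h: float = 38.0) -> Optional[bool]:
    """Return True if NSE meets the cut-off at 48 h and/or 72 h.

    Returns None when neither sample is available (not examined).
    """
    if nse_48h is None and nse_72h is None:
        return None
    high_48 = nse_48h is not None and nse_48h >= cutoff_48h
    high_72 = nse_72h is not None and nse_72h >= cutoff_72h
    return high_48 or high_72


# Main definition versus the alternative cut-off of 33 at both time points.
print(nse_is_high(40.0, 35.0))
print(nse_is_high(40.0, 35.0, cutoff_48h=33.0, cutoff_72h=33.0))
```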

According to trial protocol, WLST was permitted when any of the following criteria were fulfilled: (1) status myoclonus ≤ 24 h post-arrest and bilaterally absent N20 potentials after rewarming, (2) persisting coma (GCS-M ≤ 2) AND bilaterally absent N20 potentials OR a treatment refractory status epilepticus at ≥ 108 h post-arrest, (3) brain death according to national legislation or (4) ethical reasons (also including treatment refractory shock or end-stage multiorgan failure) [12]. If applicable, the presumed cause of death was reported by the physician responsible for patient care [12].

Statistical analyses

Continuous variables are expressed as median (interquartile range) and categorical variables as numbers (percentages). Analyses were performed using two different cohorts. Prognostic accuracies of the ERC/ESICM algorithm and explorative variations thereof were calculated in patients examined with GCS-M on day 4 (n = 585). Missing diagnostic examinations were regarded as “negative/non-pathological”, allowing evaluation according to remaining ERC/ESICM criteria similar to clinical practice.

To reduce selection bias, prognostic accuracies of single and combinations of diagnostic methods were calculated using all patients with 6-month outcome registered (n = 933). Only patients actually examined were included when calculating prognostic accuracies. Prognostic accuracies of methods were compared to each other using McNemar’s test.
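As an illustration of the paired comparison, McNemar's test uses only the discordant pairs, that is, patients classified as pathological by one method but not by the other. The sketch below uses the continuity-corrected chi-square form, which is the default of R's `mcnemar.test`; whether this or an exact variant was used in the study is not stated, and the counts in the example are hypothetical.

```python
from math import erfc, sqrt


def mcnemar_p_value(b: int, c: int) -> float:
    """Continuity-corrected McNemar test on discordant pair counts.

    b: pairs positive with method A only; c: pairs positive with method B only.
    """
    if b < 0 or c < 0:
        raise ValueError("counts must be non-negative")
    if b + c == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return erfc(sqrt(statistic / 2))


# Hypothetical example: 10 patients detected by method A only, 2 by method B only.
print(f"p = {mcnemar_p_value(10, 2):.3f}")
```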

The term “true” was used when predicted outcome and reported outcome were identical, whilst “false” indicated that outcome prediction was contrary to the reported outcome. “Negative” referred to good outcome (CPC1–2), and “positive” referred to poor outcome (CPC3–5). For example, “true positive” (TP) indicated a patient where both predicted and reported outcome were poor. 95% confidence intervals were calculated with Wilson’s method. Analyses were performed using R version 3.5.1 (The R Foundation for Statistical Computing) [22].
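For readers who wish to check the reported intervals, a minimal sketch of the Wilson score interval follows. The analyses in this study were performed in R; the Python function below is only an illustration with a hypothetical name, applied to the overall counts reported in the Results (103 of 266 poor outcome patients identified; 319 of 319 good outcome patients without a false positive prediction).

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95):
    """Wilson score interval for a binomial proportion; returns (lower, upper)."""
    if n <= 0:
        raise ValueError("n must be positive")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot at the boundaries.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


sens_lo, sens_hi = wilson_interval(103, 266)   # sensitivity, TP / (TP + FN)
spec_lo, spec_hi = wilson_interval(319, 319)   # specificity, TN / (TN + FP)
print(f"sensitivity 95% CI: {100 * sens_lo:.1f}-{100 * sens_hi:.1f}")
print(f"specificity 95% CI: {100 * spec_lo:.1f}-{100 * spec_hi:.1f}")
```

Unlike the simple normal approximation, the Wilson interval stays informative when the observed proportion is 100%, which is why a lower bound can be given for a specificity with no false positives.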

Results

Prognostic performance of the ERC/ESICM algorithm

Of 939 patients, 180 were sedated (Step 0), 140 died before day 4 and 34 had missing data (eFig. 1, Table 1). Of the included patients, 205/585 (35%) had GCS-M ≤ 2 (Table 2).

Table 1 Characteristics of the two cohorts of patients used for statistical analyses
Table 2 Sensitivities and specificities of single prognostic methods as recommended by ERC/ESICM and variations thereof

Prediction of poor outcome

Figure 1 shows predicted and reported outcome in each step of the algorithm. In the first step, 380 patients had GCS-M ≥ 3, of whom 305 had good outcome (true negative, TN) and 75 had poor outcome (false negative, FN). Patients with GCS-M ≤ 2 were examined further. In Step 2, all 67 patients fulfilling either one or both criteria for “poor outcome very likely” had poor outcome (TP). Of 138 patients evaluated in Step 3, 36 additional patients were correctly identified as TP by fulfilling criteria for ≥ 2 pathological findings. Among the remaining 102 patients, 14 had good outcome (TN) and 88 had poor outcome (FN). The majority of FN patients had died at 6 months: 137/163 (84%). Presumed cause of death was neurological in 79/137 (57.7%). WLST due to neurological futility was performed in 59.2–87.7% of poor outcome patients with pathological prognostic findings (eTable 2). Of patients with bilaterally absent SSEP N20, 46/53 (86.8%) had ≥ 1 additional pathological Step 2 or 3 finding (eFig. 2).

Fig. 1

This flow chart demonstrates the number of patients with 6-month outcome (n = 933), and patients excluded or included (n = 585) when assessing overall prognostic performance of the ERC/ESICM algorithm. In patients with day 4 Glasgow Coma Scale Motor Score (GCS-M), we present numbers of predicted and reported outcome when applying the current ERC/ESICM algorithm. PR & CR –/– bilaterally absent pupillary light reflexes and bilaterally absent corneal reflexes; SSEP N20 –/– bilaterally absent N20 response on short-latency somatosensory evoked potentials; NSE, elevated serum neuron-specific enolase ≥ 48 pg/mL at 48 h and/or ≥ 38 pg/mL at 72 h after cardiac arrest; EEG, unreactive status epilepticus (abundant rhythmic/periodic discharges) or unreactive burst-suppression on EEG according to ERC/ESICM criteria [2]; CT or MRI, generalized oedema on head computed tomography OR on magnetic resonance imaging; S. myoclonus, generalized status myoclonus ≤ 48 h after cardiac arrest; true positive, TP; predicted and reported outcome poor (CPC3–5); true negative, TN; predicted and reported outcome good (CPC1–2); false negative, FN; predicted good and reported poor. There were no false positive, FP, predictions of poor outcome in patients with reported good outcome

In this cohort, the ERC/ESICM algorithm predicted poor outcome (CPC 3–5) with 100% specificity (95% CI 98.8–100) and 38.7% sensitivity (95% CI 33.1–44.7). Applying an alternative definition of poor outcome (CPC 4–5), the overall sensitivity was 41.8% (95% CI 35.8–48.1) and specificity was 99.7% (95% CI 98.4–100) due to 1 FP patient with CPC 3 (eFig. 3).
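The overall sensitivity and specificity follow directly from the per-step counts reported above and in Fig. 1. The short tally below is a sketch for checking that arithmetic; the dictionary layout and step labels are illustrative only.

```python
# Per-step counts as reported in the text and Fig. 1 (n = 585).
steps = {
    "Step 1: GCS-M >= 3, not prognosticated further": {"TN": 305, "FN": 75},
    "Step 2: poor outcome very likely": {"TP": 67},
    "Step 3: >= 2 pathological findings": {"TP": 36},
    "Remaining after Step 3": {"TN": 14, "FN": 88},
}

totals = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
for counts in steps.values():
    for label, n in counts.items():
        totals[label] += n

# "Positive" refers to poor outcome (CPC 3-5), as defined in the Methods.
sensitivity = totals["TP"] / (totals["TP"] + totals["FN"])
specificity = totals["TN"] / (totals["TN"] + totals["FP"])

print(totals)
print(f"sensitivity = {100 * sensitivity:.1f}%, specificity = {100 * specificity:.1f}%")
```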

Prediction of good outcome

Three hundred and nineteen (54.5%) patients had good outcome. GCS-M ≥ 3 on day 4 predicted good outcome with 95.6% sensitivity (95% CI 92.8–97.4) and 71.8% specificity (95% CI 66.1–76.9). Good outcome patients with GCS-M ≥ 3 were significantly younger, had shorter time to return of spontaneous circulation and more often had an initial shockable rhythm on ECG compared to GCS-M ≥ 3 patients with a poor outcome (eTable 3). All 14 patients with GCS-M ≤ 2 on day 4 and good outcome were male, most had ventricular fibrillation as the initial rhythm on ECG, and their NSE levels were below the cut-off for poor outcome applied in this study in all but one patient (eTable 4). Overall, the ERC/ESICM algorithm predicted good outcome for patients who did not fulfil the criteria for a “very likely” or “likely” poor outcome (TN) with 100% sensitivity (95% CI 98.8–100) and 38.7% specificity (95% CI 33.1–44.7, Fig. 1). The majority of good outcome patients, 310/319 (97.2%), had no pathological findings in the examinations performed within this study. Nine had single pathological findings: one had early status myoclonus, one fulfilled ERC/ESICM criteria for unreactive status epilepticus on EEG 75 h after CA, and the remaining 7 patients had elevated NSE levels at 48 h and/or 72 h, of whom 3 had decreasing levels of NSE from 24 to 72 h (eTable 5). Most patients (6/9) with single FP findings were awake and obeying commands (GCS-M 6) on day 4. Details on GCS-M levels and numbers of pathological findings are displayed in eTables 6A + B.

Exploratory variations of the current ERC/ESICM algorithm

Alternative cut-off values for NSE

Two additional poor outcome patients were identified by the algorithm when using NSE cut-off ≥ 33 pg/mL at 48 and/or 72 h post-arrest, increasing the overall sensitivity of the algorithm slightly to 39.5% (95% CI 33.8–45.5) without FP predictions.

Highly malignant patterns on EEG

By evaluating EEGs according to criteria for “highly malignant” patterns [3], three additional poor outcome patients were identified, increasing overall sensitivity to 39.8% (95% CI 34.2–45.8) with preserved 100% specificity (95% CI 98.8–100). 44/210 (21%) patients fulfilled ERC/ESICM criteria for pathological EEG, 62/210 (29.5%) the criteria for “highly malignant” EEG pattern, and 18/210 (8.6%) fulfilled both criteria.

GCS-M

Twenty-three patients had GCS-M = 3 on day 4, 9 with good outcome and 14 with poor outcome. Using GCS-M ≤ 3 as the screening criterion, 6 additional poor outcome patients were correctly identified (Fig. 2a), increasing the overall sensitivity of the algorithm to 41% (95% CI 35.2–47) while specificity remained 100%.

Fig. 2

Modified versions of Fig. 1 with exploratory alterations of the ERC/ESICM algorithm. Step 0 has been removed for clarity and is identical to Fig. 1. The figures a + b demonstrate how alterations of GCS-M as a screening criterion in Step 1 impact prognostic accuracy of the algorithm. In a, patients with day 4 GCS-M ≤ 3 are prognosticated further, and in b, patients are prognosticated irrespective of GCS-M. In c, any ≥ 2 pathological findings in Steps 2 and 3 combined are considered indicative of poor outcome (as in the TTM2 and TAME Trials [39, 40], but we here used the ERC/ESICM definitions of pathological EEG [41] as stated in the methods section). d Represents the simplest model of multimodal prognostication, with Steps 2 and 3 combined (as in c), but without considering GCS-M in Step 1. Pathological findings were defined according to ERC/ESICM criteria [2] as described in the legend of Fig. 1 and in the methods section. True positive, TP; predicted and reported outcome poor (CPC3–5), True negative, TN; predicted and reported outcome good (CPC1–2), False negative, FN; predicted good and reported poor outcome. There were no false positive, FP, predictions of poor outcome in patients with reported good outcome. 95% confidence intervals (CI) were calculated with Wilson’s method

When prognosticating all patients according to Step 2 and 3 criteria irrespective of the GCS-M score, sensitivity increased further to 42.5% (95% CI 36.7–48.5) without FP predictions (Fig. 2b). All TP patients had GCS-M ≤ 4.

Combining Steps 2 and 3

We explored combining Steps 2 and 3 where any ≥ 2 pathological findings were considered indicative of “poor outcome likely”, either with GCS-M as a screening criterion (Fig. 2c) or without (Fig. 2d). The overall sensitivity of the algorithm decreased to 34.6% (95% CI 29.1–40.5), but increased slightly again to 38.3% (95% CI 32.7–44.3) when GCS-M in Step 1 was removed. This decrease in sensitivity was due to 11 patients who had bilaterally absent SSEP N20-potentials (n = 7) or bilaterally absent pupillary and corneal reflexes (n = 4), but no other pathological findings. 8/11 (72.7%) had WLST due to neurological futility (WLST-N). All TP patients according to these criteria had GCS-M ≤ 4 (eTable 6B).
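The combined rule can be expressed compactly, which also makes clear why patients with a single Step 2 finding are no longer identified. The sketch below is illustrative only: the finding names, the function and its parameters are hypothetical, pupillary and corneal reflexes are counted as one finding, and missing examinations are treated as non-pathological, as in the Methods.

```python
from typing import Mapping, Optional

# Step 2 and Step 3 findings pooled into one list (hypothetical names).
FINDINGS = (
    "pr_cr_absent", "ssep_n20_absent",                      # former Step 2
    "nse_high", "eeg_pathological",                         # former Step 3
    "imaging_oedema", "early_status_myoclonus",
)


def poor_outcome_likely(findings: Mapping[str, Optional[bool]],
                        gcs_m: int,
                        gcs_m_screen: Optional[int] = 2,
                        min_findings: int = 2) -> bool:
    """Any >= min_findings pathological findings predict poor outcome.

    gcs_m_screen=2 keeps GCS-M as a screening criterion (as in Fig. 2c);
    gcs_m_screen=None prognosticates irrespective of GCS-M (as in Fig. 2d).
    """
    if gcs_m_screen is not None and gcs_m > gcs_m_screen:
        return False
    n_pathological = sum(1 for name in FINDINGS if findings.get(name))
    return n_pathological >= min_findings


# A patient with absent N20 potentials only is not identified by the combined rule.
print(poor_outcome_likely({"ssep_n20_absent": True}, gcs_m=1))
```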

Single and combined prognostic accuracies

Data from 933 patients were used when analysing the accuracies of methods for prognostication included in the ERC/ESICM algorithm (eFig. 1).

Among individual methods, early status myoclonus had the lowest sensitivity (6.8%) to identify poor outcome patients, whilst NSE and GCS-M ≤ 2 had the highest sensitivities for predicting poor outcome (60.2–67.3% for NSE cut-off and 71.8% for GCS-M ≤ 2, respectively) (Table 2). All pairs of methods predicted poor outcome with 100% specificity, with the exception of GCS-M ≤ 2 in combination with elevated NSE, both methods with limited individual specificity (Fig. 3). When patients subjected to WLST-N were censored, sensitivities of individual methods decreased by 3–26.5%, whilst specificities remained unchanged (eTable 7).

Fig. 3

Sensitivities and specificities of single and combined methods for prediction of poor outcome (CPC 3–5 at 6 months) in percentages, numbers of examined patients in (). The overall cohort is described in eFig. 1 and in the right column of Table 1 (n = 933). Only patients examined with a single method (bold font) or with both methods within a combination (regular font) were included; therefore, sensitivities of single methods may differ between combinations. Significance levels of single prognostic accuracies within combinations in Step 2/3 were calculated using McNemar’s test (eTables 1A + B) and are indicated by asterisks; *p < 0.05, **p < 0.01, ***p < 0.001. The absence of an asterisk (*) in Step 2/3 methods indicates that single sensitivities or specificities within combinations did not differ significantly. For example, in the combined model PR/CR and SSEP, *** signifies p < 0.001, therefore one method had significantly higher sensitivity than the other method when calculated in patients examined with both methods. GCS-M ≤ 2, Glasgow Coma Scale Motor Score on day 4 after cardiac arrest; PR/CR, bilaterally absent pupillary light reflexes AND bilaterally absent corneal reflexes

Discussion

Applied to the cohort of a large pragmatic international trial, the ERC/ESICM algorithm predicted poor neurological outcome without false positive predictions and correctly identified 38.7% of poor outcome patients. Despite various exploratory modifications with the same outcome definitions used, specificity remained at 100% in this cohort. No good outcome patient had ≥ 2 pathological findings and only 3% had 1 pathological finding, elevated NSE being the most common.

The ERC/ESICM algorithm failed to identify 60% of the poor outcome patients, among whom the “presumed cause of death” was neurological in approximately 60%. An algorithm intended for identifying cerebral injuries cannot be expected to identify the remaining patients with other causes of death. Maintaining a very high specificity is essential for an algorithm predicting poor outcome, but improved sensitivity is nevertheless desirable. Two single-centre studies recently validated the ERC/ESICM algorithm and both reassuringly also concluded on 100% specificity, but WLST was permitted due to specified criteria in both studies [19, 20]. The reported sensitivities for the ERC/ESICM algorithm were 18–26% and 32%, respectively [19, 20].

The sensitivity of a prognostic method or an algorithm will vary depending on the extent to which the different prognostic methods are available and which definitions are used for pathological findings, as well as the selection of patients included. In our study, this is illustrated by the low sensitivity of the strictly defined status myoclonus (6.9%) and the relatively high sensitivity of an elevated NSE (60.2–67.3%), frequently analysed due to the common biobank.

The ERC/ESICM recommends GCS-M 1–2 as a screening criterion [1]. Our results confirm that GCS-M should not be used to make decisions on level-of-care due to limited specificity. Evaluation of persisting coma by motor score may nevertheless be adequate to differentiate between unconscious patients in need of further prognostication and those with a presumed good outcome. In this study, a considerable fraction of patients with GCS-M 3–4 had poor outcome and ≥ 2 pathological prognostic findings, and it may be considered whether the current dichotomization should remain between GCS-M 2–3 in future guideline algorithms.

As supported by our results, false positive findings may occur with all methods currently used for prognostication, emphasizing the importance of a multimodal approach to reduce the risk of overly pessimistic predictions [8, 23,24,25,26]. Six patients with single pathological findings were awake and obeying commands on day 4, illustrating that sufficient time for recovery post-arrest is also an important part of neurological prognostication [1, 27].

The ERC/ESICM algorithm permits unimodal prognostication using SSEP, whilst the TTM-trial protocol permitted unimodal prognostication for patients fulfilling specific SSEP or EEG criteria [1, 11]. Applying a stricter multimodal approach, any ≥ 2 pathological findings predicted poor outcome with maximal specificity in unconscious patients irrespective of GCS-M level (Fig. 2c, d). In this multimodal approach, overall sensitivity was slightly decreased and we speculate that sensitivity may have been higher with an increased use of diagnostic methods.

The evidence for the ERC/ESICM algorithm consists largely of studies where withdrawal of therapy was permitted, and hence influence from the self-fulfilling prophecy cannot be excluded [1, 28, 29]. Whilst WLST was common in the TTM-trial, the trial protocol was designed to avoid premature decisions, applying conservative rules for prognostication [11]. The pre-specified TTM-criteria permitting WLST (SSEP, status myoclonus and EEG) partly overlap with the ERC/ESICM recommendations published 2 years after trial completion [2, 11]. Bilaterally absent N20-potentials in combination with early status myoclonus or in isolation were considered predictive of poor outcome in the TTM-trial. Whilst SSEP is considered a very robust method for prognostication after cardiac arrest, the self-fulfilling prophecy may have affected most previous studies [25, 28, 29]. One patient with absent N20-potentials in the TTM-trial awoke before scheduled prognostication [11] and the majority of our patients lacking N20 potentials had ≥ 1 additional prognostic finding indicating severe brain injury. We still cannot exclude the self-fulfilling prophecy affecting our results but find it reassuring that bilaterally absent N20 predicted poor outcome with 100% specificity in two recent studies where WLST was not practiced [30, 31].

When we calculated prognostic accuracies of the ERC/ESICM algorithm using an alternative definition of poor outcome (CPC 4–5), one patient who fulfilled the criteria for “poor outcome likely” and survived with severe cerebral disability (CPC 3) was reclassified. Definitions of poor outcome vary between studies, and survival with CPC 3 includes a wider range of severe cerebral disabilities, some of which may be considered acceptable outcome by patients or caregivers.

Strengths of this study include the large cohort, a conservative and protocolized approach to prognostication and the extensive clinical information available [9, 10]. Results of EEG and NSE have been obtained after the study in a blinded fashion, and all results on clinical tests, SSEPs, CT-examinations and decisions on level-of-care have been presented in separate projects [3, 4, 7, 11, 14, 16, 21].

There are several limitations to this retrospective study. Some of the co-authors who designed the TTM-trial also participated in the ERC/ESICM recommendations for prognostication after cardiac arrest, creating a risk for inherent bias. Clinical neurological examinations in the TTM-trial were performed according to local routines. We acknowledge that an increased focus on neurological examination techniques might have improved prognostic performance, since imprecise testing may be common [32]. Neuroimaging and SSEP were often performed on clinical indication in patients with presumed poor neurological prognosis, likely leading to selection bias. We included the first reported CT-examination despite guidelines only recommending CT ≤ 24 h post-arrest. A stricter application of the 24-h time limit would presumably have reduced overall sensitivity of the algorithm. However, recent studies indicate improved performance of brain CT after 24 h [21, 33, 34]. NSE cut-off values were defined from the same TTM-study cohort [7]. Despite NSE being analysed after trial completion, its prognostic performance may still be indirectly affected by WLST based on other predictors. For statistical reasons, we excluded the 24-h delay between Steps 2 and 3 and only included each prognostic method once in our analyses, which is an approximation of clinical practice where patients are continuously re-examined. In the future, quantitative methods such as pupillometry [35, 36], standardized evaluation of neuroimaging and electrophysiology or novel serum biomarkers [37, 38] might prove themselves valuable additions to the current algorithm.

Conclusion

In this cohort the ERC/ESICM algorithm and exploratory variations thereof predicted poor outcome without false positive predictions. The ERC/ESICM algorithm identified 38.7% of poor outcome patients, and patients not identified often had a non-neurological presumed cause of death. Our results should be validated in patients where withdrawal of life-sustaining therapy is uncommon to reduce the risk of self-fulfilling prophecies.