Key Points for Decision Makers

Monetary incentives are widely used in healthcare to improve the quality of care and reduce costs. They are becoming increasingly important for the cooperation and integration of providers.

Results of studies concerning the effects of different reimbursement systems are ambiguous.

Decision makers should carefully consider this weak level of evidence, as implementing new reimbursement mechanisms often involves high costs and uncertainties.

1 Introduction

In health systems all over the world, new reimbursement strategies and systems are tested to improve quality of care and/or reduce expenses. Reimbursement systems not only serve as payment mechanisms but also provide control and incentive functions [1]. Thus, the design of reimbursement systems is extremely important.

When providing funding and allocating financial resources, it is important to maintain a balance between appropriate provider remuneration and an adequate financial burden on payers. The control function mentioned above is based on the economic incentives of these payments, which can influence provider behavior [1]. This function often receives special attention when a reimbursement system is designed. Incentive systems are deliberately chosen stimuli implemented to achieve a certain degree of behavioral control [2]. They can pursue different objectives, such as inducing motivation, selection, or cooperation [3].

Various classifications of incentives exist in the literature. In this review, we differentiate between monetary and nonmonetary incentives. Salaries and wages are classical monetary incentives. So are other direct financial benefits, such as pensions, childcare allowances, and health insurance benefits, and indirect financial benefits, such as subsidized transport [4]. Nonmonetary incentives are defined negatively, as all incentives that do not belong to the monetary category [5].

Given the increasing emphasis on teamwork, the implementation level of an incentive should also be considered [6, 7]. Within this review, we distinguish between individual- and group-based incentives. Management research has found that incentives at the individual and group level both have specific advantages and disadvantages according to the setting in which they are applied [7, 8]. Barnes et al. [9] noted that a combination of individual- and group-level incentives can be useful but does not provide a comprehensive solution; an exact analysis of positive and negative effects in view of the goal is required.

Reimbursement systems that foster the cooperation and integration of providers have become increasingly important in the healthcare sector [10]. They are called for by policymakers, society, and the medical profession itself [11]. Against this background, it is worthwhile to draw on the findings of management research and analyze the effects of monetary incentives in physician group settings.

The aims of this systematic review were to describe and gain a better understanding of the effects of monetary incentives in the setting of physician groups. Many systematic reviews have focused on single reimbursement schemes (e.g., pay-for-performance [P4P]) and specific care settings (e.g., cancer care) or compared different reimbursement schemes. Our review builds on this existing evidence by reviewing reviews rather than individual studies.

This research is intended to help avoid undesirable effects of reimbursement systems that might occur in a group setting. This would allow reimbursement systems to be used in a targeted way as control or management instruments.

To achieve this objective, we focused on two research subjects: the effects of monetary incentives on healthcare services and the influence of the level at which monetary incentives are applied (individual vs. group).

The term “group” was deliberately defined broadly to cover a wide range of collaboration types. We consider a group as any collaboration of physicians that does not restrict therapeutic freedom.

2 Methods

2.1 Search Strategy

The search strategy followed the PICO scheme [12] with the following components:

  • Population: physician groups

  • Intervention: monetary incentives

  • Comparison: not applicable

  • Outcome: changes in therapy-oriented, economic, or behavior-related indicators.

2.1.1 Types of Physician Group

Besides common general terms, such as “physician group” or “group practice,” we also included the following specific initiatives, which aim at collaborative medical practice and use payment mechanisms as key elements:

  • Managed care organizations (MCOs)

  • Health maintenance organizations (HMOs)

  • Preferred provider organizations (PPOs)

  • Accountable care organizations (ACOs)

  • Physician group practice demonstrations (PGPDs).

2.1.2 Types of Monetary Incentives

As described above, different types of monetary incentives exist. The search strategy for this aspect was therefore developed in two steps. First, we included terms describing the incentive/reimbursement character, such as incentive, reward, bonus, or reimbursement. Second, we expanded the search strategy to include the terms of an internationally established classification of healthcare reimbursement options. This classification differentiates the following categories: salary, fee for service (FFS), bundled payment/global fee/case rate, P4P, and capitation [1, 13, 14]. Each incentive category is described in Sect. 3.3.

2.1.3 Types of Outcome Indicators

Effects of monetary incentives can be found in many areas, so we chose a broad set of indicators. These range from quality-oriented aspects (quality and outcome) to economic aspects (effectiveness, productivity, and performance) and behavior-related aspects (behavior and adherence [regarding compliance with standards, guidelines, etc.]).

2.1.4 Additional Criteria

We expanded the PICO scheme by adding a category for study type. Given the solid research base on various reimbursement systems, we limited our search to systematic reviews by applying validated search filters from search filter resources.

No restrictions were applied to language or time to allow us to gain a comprehensive overview of the available data.

For the MEDLINE (PubMed) and Cochrane Library databases, we added relevant medical subject heading (MeSH) terms; see Appendix A in the electronic supplementary material (ESM) for an example of the MeSH term selection procedure. The search strategy was reviewed using the PRESS evidence checklist [15]; refer to Appendix B in the ESM for the documentation.

The search was conducted on 8 January 2020. We searched the MEDLINE (PubMed), Cochrane Library, CINAHL, PsycINFO, EconLit, and ISI Web of Science databases. The search strategy was adapted according to the syntax of each database and can be found in Appendix C in the ESM.

Our search for gray literature covered the websites of the following key organizations: European Observatory on Health Systems and Policies, The Health Systems and Policy Monitor, Robert Graham Center, The Commonwealth Fund, Centre for Reviews and Dissemination (University of York), and Social Science Research Network (Economics Research Network). We also screened the reference lists of the reviews included in this systematic review.

2.2 Information Extraction and Analysis

First, AH screened the titles and abstracts to identify relevant reviews. Because cultural differences might influence results, we excluded reviews focusing exclusively on non-Western regions at this stage.

Reviews classified as potentially relevant were assessed as to whether they met the following inclusion criteria:

  • Systematic literature review with a transparent description of the review process

  • Examined the effects of monetary incentives

  • Explicitly included the setting of physician groups.

The remaining reviews underwent full-text screening by AH and HM. Appendix D in the ESM provides an overview of reviews excluded in this stage, with the rationale for exclusion, and Appendix E provides the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flowchart.

The quality of the included reviews was assessed using the AMSTAR (Assessment of Multiple SysTemAtic Reviews) checklist [16].

The 21 remaining reviews were analyzed in terms of specific characteristics such as goals and outcome indicators. We extracted information about each review’s search period, study type, geographical location, and key results.

To structure the presentation of the results, we introduced two grouping levels. First, we focused on the type of monetary incentive, mainly represented by the healthcare reimbursement systems and initiatives. Second, we concentrated on the type of outcome indicator, as most studies had outcome parameters related to the three Donabedian quality dimensions: structure, process, and outcome quality.

3 Results

3.1 Description of Studies

Searching the databases yielded 1106 results after duplicates were removed. An additional six reviews were identified from the gray literature and the authors’ personal collections. After title and abstract screening, potentially relevant reviews were assessed against the predefined inclusion criteria. The final evaluation comprised 21 reviews. Table 1 shows the geographic setting of the studies included in the reviews.

Table 1 Geographic settings of the studies in the included reviews

Six reviews [17,18,19,20,21,22] focused on ACOs or HMOs, so the studies they included already involved a group setting. The other reviews explicitly included “provider groups” in their analyses. However, only a few differentiated between individual- and group-specific incentives, which limits how reliably their results can be interpreted with respect to the incentive level. Petersen et al. [23], Van Herck et al. [24], and Kondo et al. [25] explicitly considered the incentive level (individual vs. group).

Fig. 1 Number of reviews by AMSTAR quality level

Within the reviews, we found seven different incentive schemes/initiatives (Table 2); Appendix F in the ESM provides a detailed overview.

Table 2 Number of reviews per reimbursement scheme

3.2 Risk of Bias

To identify the risk of bias among the included reviews, we applied the validated AMSTAR checklist [16]. Information about the scores for each review can be found in Appendix F. A higher score in the AMSTAR quality rating indicates a lower risk of bias. Figure 1 shows a summary of the results.

High-quality reviews were identified for P4P, salaried payment, and bundled payment, with no apparent trend in review quality by incentive type. We also analyzed whether the geographic areas covered (USA only vs. various countries) were associated with review quality. Six reviews included only studies from the USA; five of them were of moderate quality and one was of low quality. All high-quality reviews comprised studies from various countries. By contrast, two of the four low-quality reviews did not report the geographic origin of the included studies.

Only one review performed a meta-analysis, on results for P4P [26]. Many of the other reviews reported that heterogeneous settings or outcomes were major obstacles that prevented them from aggregating results in a meta-analysis.

Reviews with low AMSTAR quality scores shared several shortcomings:

  • They lacked a priori planning and gray literature searches.

  • They did not address potential publication bias.

  • They did not provide information about included and excluded studies.

  • Often, no comprehensive literature search was performed.

  • It remained unclear how studies were selected, how data were extracted, and how the scientific quality of studies was assessed.

  • Conflicts of interest were not sufficiently addressed.

The major concerns that led to a “moderate” quality rating were missing a priori planning, incomplete information about included or excluded studies, or missing consideration of publication bias.

3.3 Effects of Interventions

3.3.1 Salary

The unit of remuneration for salaries is the period of working time. Reimbursement does not depend on the type or number of patients treated and is therefore relatively easy to apply. This reimbursement scheme provides no incentive for (unnecessary) extension of services, for quality improvement, or for cost consciousness.

Chaix-Couturier et al. [27] and Scott et al. [28] examined, among other payment types, the effects of paying physicians a fixed salary. These reviews were of low and high quality, respectively. Both focused on measures of process and outcome quality. Chaix-Couturier et al. [27] used process and outcome indicators as well as costs, whereas Scott et al. [28] used patient-reported outcomes, changes in physician behavior, and physiological indicators. Chaix-Couturier et al. [27] concluded that salaries were associated with a lower referral rate and fewer activities than FFS. By contrast, Scott et al. [28] did not report any statistically significant changes in patient-reported outcomes. Because the indicators were so heterogeneous, no overall conclusions could be drawn from these two reviews.

3.3.2 Fee for Service

In FFS, remuneration is based on the services actually provided. This direct link between services and reimbursement can provide an incentive to reduce costs but might also induce an extension or selection of certain low-cost/high-margin services.
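The contrast between salary (Sect. 3.3.1) and FFS can be made concrete with a small calculation. The following minimal Python sketch computes provider income under each scheme for two hypothetical service volumes. The function names, fee schedule, and amounts are illustrative assumptions, not figures from the included reviews. The sketch ignores provider costs and all other details of real fee schedules.

```python
def salary_payment(months_worked: int, monthly_salary: float) -> float:
    """Salary: paid per period of working time, independent of patients or services."""
    return months_worked * monthly_salary


def ffs_payment(services: dict[str, int], fee_schedule: dict[str, float]) -> float:
    """FFS: paid for each service actually provided, at the scheduled fee."""
    return sum(count * fee_schedule[service] for service, count in services.items())


fees = {"visit": 30.0, "lab_test": 12.0}      # hypothetical fee schedule
low_volume = {"visit": 100, "lab_test": 20}   # hypothetical monthly services
high_volume = {"visit": 150, "lab_test": 60}

for services in (low_volume, high_volume):
    print(salary_payment(1, 5000.0), ffs_payment(services, fees))
# 5000.0 3240.0
# 5000.0 5220.0
```

Under salary, income is the same at both volumes. Under FFS, it rises with the number of services, which is the volume-related incentive described above.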

Five reviews analyzed the effects of FFS reimbursement using structure, process, and outcome indicators [21, 22, 27, 29, 30]; four were of moderate quality and one was of low quality.

Chaix-Couturier et al. [27] found a higher level of activity under FFS. For example, a higher fee for visits led to an increase in the number of visits made by the physicians themselves instead of deputies. Another study [31] reported that FFS resulted in more elective procedures.

Steiner and Robinson [29] focused on managed care, with FFS as the main comparator. Overall, the results varied among the indicators analyzed. For example, compared with managed care, FFS showed lower preventive screening, higher hospital admission rates, and virtually identical health outcomes (see Sect. 3.3.6 for details). Generally, FFS seemed less favorable than managed care. However, a closer look at the medical indication is warranted. Results suggested that specific conditions, such as depression and other mental health conditions, were treated better under FFS.

Keyhani et al. [21] analyzed the effects of the two types of reimbursement on oversupply of services and found only a slight difference between FFS and managed care. Nejati et al. [22] compared FFS with per-diem reimbursement and bundled payment. They reported less favorable results for FFS in length of stay and costs compared with per-diem payments, and in 5-year costs and quality outcomes compared with bundled payments. Wranik et al. [30] assumed that team characteristics influence outcomes and found some evidence that FFS had a negative effect on teamwork.

Overall, FFS seemed to have neither a clear positive nor a negative impact on structure or outcome of care. On the other hand, process quality might be negatively affected by FFS compared with other reimbursement types.

3.3.3 Bonus Payments

Bonus payments are intended to incentivize specific services and are therefore paid in addition to the underlying reimbursement system.

Two reviews of moderate quality [32, 33] examined whether bonus payments had a positive impact on the provision of certain services.

Hamilton et al. [32] focused on smoking cessation, especially on process indicators such as recording of smoking status or referral to smoking-cessation services. Studies that provided detailed information about the bonus payments reported bonuses of $US24–152 per patient advised or referred. Most of the studies showed improvements in process indicators. However, the results of studies evaluating the quit rate did not allow clear conclusions about the effects of bonus payments.

Sabatino et al. [33] focused on screening for breast, cervical, and colorectal cancer. The bonuses ranged from a quarterly practice bonus of approximately 5% of capitation to year-end physician bonuses, for which no further details about bonus potential were provided. Because the study results were inconsistent, the authors could not draw any clear conclusions.

Overall, the impact of bonus payment cannot be clearly classified.

3.3.4 Bundled Payments

Bundled payments are much more sophisticated than salaried and FFS payments. They define cases based on diagnosis or therapy and provide a single payment for an episode of care or multiple services. By facilitating the comparison between payments received and costs, transparency is increased and efficiency might be incentivized. However, bundled payments also bear some risks, e.g., cost shifting to other sectors or complete omission of services.
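The incentive structure described above can be illustrated with a minimal sketch. It assumes a single fixed payment per episode of care and uses hypothetical cost figures. Because the payment is fixed, every service that is omitted or shifted to another sector raises the provider's margin, which corresponds to the risks named above. The function `bundled_margin` and all numbers are illustrative assumptions.

```python
def bundled_margin(bundle_payment: float, episode_service_costs: list[float]) -> float:
    """Margin for one episode: the predetermined payment minus the cost of all services."""
    return bundle_payment - sum(episode_service_costs)


payment = 10_000.0  # hypothetical predetermined payment for a defined episode of care
full_episode = [6_000.0, 2_500.0, 1_000.0]  # hypothetical: procedure, follow-up, rehabilitation
trimmed_episode = [6_000.0, 2_500.0]        # rehabilitation omitted or shifted to another sector

print(bundled_margin(payment, full_episode))     # 500.0
print(bundled_margin(payment, trimmed_episode))  # 1500.0
```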

Within bundled payment systems, providers receive predetermined payments based on the expected costs for a defined episode of care. Three reviews, two of moderate and one of high quality, reported results on this type of payment.

Aviki et al. [17] focused on oncological care. Their review indicated positive effects, but the evidence was insufficient. For example, one of the studies [34] showed an increase in guideline adherence, but only for two of the five types of cancer analyzed. Another study [35] found a reduction in hospitalization and radiotherapy, but the cost of chemotherapy drugs increased.

Hussey et al. [36] analyzed the effects of bundled payments on costs and quality of care. The authors identified 20 different designs of bundled payments and concluded that the effects were weak but consistent: bundled payments led to cost reductions but showed no significant effects on quality.

Nejati et al. [22] focused on cancer care. Their results showed significant improvements in 5-year costs for bundled payments compared with FFS but were heterogeneous regarding outcome quality.

The results of these three reviews reporting on bundled payments are mixed regarding process and outcome quality.

3.3.5 Pay for Performance

With the application of P4P, a new unit of remuneration was introduced: treatment success. Success is determined by the achievement of defined quality indicators, which sets incentives for quality improvement. One of the challenges of P4P is the selection of valid indicators.

The concept of P4P has recently attracted widespread interest. It was examined in ten reviews [22, 23, 25, 32, 33, 35, 65,66,67,68], of which three were of high, six were of moderate, and one was of low quality.

Huang et al. [26] conducted an indication-based review with meta-analysis to analyze the effects of P4P on the management of diabetes. Study heterogeneity limited the analysis to some extent. Physician behavior, measured mainly by process indicators, and outcomes were both positively influenced by P4P.

Mendelson et al. [38] focused on the effects of P4P on the process of care, utilization of services, and outcomes. No clear results could be obtained for ambulatory care. Methodologically sound controlled before–after studies assessing the effects of P4P on the process of care did not show improvements. By contrast, six other studies, three of which were at high risk of selection bias, found positive results. A randomized controlled study reported appropriate management of blood pressure, although this was not accompanied by guideline adherence in terms of medication [41]. Available studies were inconsistent regarding the utilization of services: Mendelson et al. [38] noted that studies with a higher-quality design found no effects. For blood pressure control and cholesterol levels as intermediate outcomes, no statistically significant effects were reported.

Petersen et al. [23] and Schatz [37] considered the levels at which incentives were provided. Schatz’s [37] work in the ambulatory setting showed contradictory results. Petersen et al. [23] drew more positive conclusions: most studies found at least a partially positive impact for both individual- and group-level P4P.

Christianson et al. [39] reviewed evaluations of P4P plans. Incentive size varied widely, from approximately 0.5% to 12% of a physician’s total compensation. Most of the studies reported process quality measures, and few contained outcome quality measures. Overall, each study found at least partial quality improvements.

Van Herck et al. [24] studied the impact of P4P on clinical effectiveness and equity of care. As in Christianson et al. [39], most included studies applied process indicators, and outcome indicators were used less frequently. This high-quality review reported weak evidence regarding coordination, patient centeredness, continuity, and cost effectiveness.

Kondo et al. [25] analyzed P4P in veterans’ care and community settings, where the evidence for the effectiveness of P4P was limited and insufficient for clear conclusions.

Scott et al. [28] conducted a very detailed analysis of the effects of blended payment schemes, including schemes that directly rewarded performance and quality. They identified three schemes that followed the P4P logic: tournament-based pay, single-threshold target payments, and a fixed fee for each patient achieving a certain outcome.

Tournament-based pay rewards medical groups according to their relative performance. The Cochrane review by Scott et al. [28] included one study [42] that examined the effects of tournament-based pay on the provision of diabetes-related services (glycated hemoglobin testing, urinalysis, low-density lipoprotein testing, and eye examination). The tournament-based pay depended on clinical quality, patient satisfaction, and practice efficiency and accounted for approximately 5% of each physician’s annual fees. The study showed better adherence only to eye examination guidelines.

Single-threshold target payments are conditional on reaching certain targets. The studies included by Scott et al. [28] measured their effects using process indicators [43,44,45]. Results were mixed, so no conclusion regarding the effects of these payment methods could be drawn. Mullen et al. [45] evaluated the effects of a combination of tournament-based pay and single-threshold target payments. Indicators included screening rates and appropriate asthma medication. Only one of the indicators, the screening rate for cervical cancer, showed a statistically significant change (an increase).

Another study [46] from the review by Scott et al. [28] examined the effects of paying a fixed fee for each patient achieving an outcome, in this case being “smoke free” at 12-month follow-up. This type of incentive had no effect.
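The three P4P variants identified by Scott et al. [28] differ mainly in what triggers payment: relative rank, an absolute target, or each individual patient outcome. The following Python sketch illustrates these triggers under simplified, hypothetical rules:

  • A fixed tournament pool is shared equally among the top-ranked groups.

  • A single threshold pays all or nothing.

  • A flat fee is paid per successful patient.

It is not a model of the specific programs evaluated in the cited studies. All names and values are illustrative.

```python
def tournament_pay(scores: dict[str, float], pool: float, top_n: int) -> dict[str, float]:
    """Tournament-based pay: the top_n groups by relative performance share a fixed pool."""
    ranked = sorted(scores, key=lambda group: scores[group], reverse=True)
    winners = set(ranked[:top_n])
    return {group: (pool / top_n if group in winners else 0.0) for group in scores}


def threshold_pay(rate: float, target: float, bonus: float) -> float:
    """Single-threshold target payment: all or nothing, depending on an absolute target."""
    return bonus if rate >= target else 0.0


def outcome_fee_pay(patients_with_outcome: int, fee_per_patient: float) -> float:
    """Fixed fee paid for each patient who achieves the defined outcome."""
    return patients_with_outcome * fee_per_patient


# Hypothetical screening rates for three medical groups
scores = {"group_a": 0.82, "group_b": 0.75, "group_c": 0.68}

print(tournament_pay(scores, pool=9000.0, top_n=2))
# {'group_a': 4500.0, 'group_b': 4500.0, 'group_c': 0.0}
print({g: threshold_pay(r, target=0.70, bonus=3000.0) for g, r in scores.items()})
# {'group_a': 3000.0, 'group_b': 3000.0, 'group_c': 0.0}
print(outcome_fee_pay(12, fee_per_patient=50.0))
# 600.0
```

Under tournament pay, a group's reward also depends on the other groups' performance. The threshold and per-outcome schemes depend only on the group's own results.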

The P4P approach has also been applied in England, Wales, Scotland, and Northern Ireland, where the NHS introduced the Quality and Outcomes Framework (QOF).

Hamilton et al. [32] evaluated monetary incentive systems in the field of smoking cessation. They reported the following impacts of the QOF: increased recording of smoking status, provision of cessation advice, and referrals to smoking-cessation services. However, no reduction in smoking rates could be demonstrated.

The review by Mendelson et al. [38] had no indication-specific focus. The authors’ conclusions regarding the effects of the QOF in ambulatory care were ambiguous. The included studies showed a tendency toward improved process and outcome indicators, but this tendency was not found in methodologically stronger studies. Mendelson et al. [38] reported that incentive payments accounted for up to 30% of practice income.

Forbes et al. [40] analyzed the effects of the QOF on long-term conditions and reported that incentive-dependent payments accounted for 10–15% of practice income. The five studies they included reported modest improvements regarding emergency admissions and consultations in severe mental illness. Process quality of diabetes care was also positively affected. No clear results were found regarding mortality.

Overall, P4P seemed to have a positive impact on process quality. Outcome quality may also partially benefit, but results were inconclusive and dependent on the outcome measure applied.

3.3.6 Capitation

In capitation, a cross-sector lump sum is paid per patient to cover the patient’s expected healthcare utilization. This is intended to incentivize continuity of care and service provision by the most efficient provider. However, it also creates an incentive for risk selection.
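The risk-selection incentive mentioned above can be illustrated with a minimal sketch. It assumes a flat, non-risk-adjusted lump sum per enrolled patient and hypothetical expected costs. Income depends only on the number of enrolled patients, so a panel of the same size but with healthier patients yields a higher margin. Many real capitation schemes risk-adjust their payments, which this sketch deliberately omits. The function name and all figures are illustrative assumptions.

```python
def capitation_margin(rate_per_patient: float, expected_costs: list[float]) -> float:
    """Income is the lump sum times the number of enrolled patients, independent of
    services actually provided; the margin is that income minus expected care costs."""
    income = rate_per_patient * len(expected_costs)
    return income - sum(expected_costs)


rate = 1200.0  # hypothetical annual lump sum per enrolled patient (not risk-adjusted)
mixed_panel = [300.0, 800.0, 1500.0, 2600.0]   # hypothetical expected annual costs
healthy_panel = [300.0, 400.0, 800.0, 900.0]   # same size, lower-risk patients

print(capitation_margin(rate, mixed_panel))    # -400.0
print(capitation_margin(rate, healthy_panel))  # 2400.0
```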

Many different forms of capitated payment exist. Five reviews dealt with this type of payment, three of low and two of moderate quality.

For example, Chaix-Couturier et al. [27] distinguished between capitation, managed care initiatives, and fund-holding models, in which capitated payments were made for each registered patient:

  • Capitation: the authors’ conclusions concerned gynecology patients and reported a reduction in elective procedures in this setting.

  • Managed care: Chaix-Couturier et al. [27] found a reduction in resource use due to shorter hospital stays, fewer diagnostic services, and higher-quality decision making. Guideline adherence also improved significantly. However, outcomes of care did not show significant overall improvements.

  • Fund holding: both positive effects (reduced prescribing costs, fewer drugs per prescription, and reduced referral rates for elective surgery and to private clinics) and negative effects (no reduction in physician workload) were observed.

Steiner and Robinson [29] conducted a very detailed review of the effects of managed care, which they compared mainly with FFS in seven categories.

Utilization: they found less use of hospital care in managed care, mostly due to lower admission rates. They also found more frequent physician visits, at least for patients outside mental health care. In mental health, studies found fewer physician office visits and less specialized treatment in managed care. Results regarding the use of prescription medication were mixed.

Charges and expenditures: the type of payment seemed to have no significant effects.

Preventive screening and health promotion: Steiner and Robinson [29] reported higher activity rates in managed care.

Quality of care: this was measured in terms of structure, process, and outcome quality. Results for process quality were inconsistent, and outcome quality did not differ. However, the authors found that access to treatment (structural quality) was more difficult for managed care enrollees.

Enrollee satisfaction: satisfaction was mostly lower in managed care.

Equity of care: this category required a differentiated view. Children seemed to benefit at least partially from managed care. The care they received was reported to be as good as or even better than in an FFS environment, as shown especially by increases in doctor visits, specialist referrals, laboratory tests, and preventive screening. Preventive screening for low-income women was similar in managed care and FFS. Antenatal care was worse, although no differences in childbirth outcomes were observed. Findings for care of elderly people were mixed.

Specific conditions: this category covered conditions such as cancer and chronic diseases. The studies analyzed by Steiner and Robinson [29] indicated mainly better or similar cancer care in managed care compared with FFS, whereas the treatment of depression was either worse or no different. Chronic disease management showed equivalent results for managed care and FFS. Results for myocardial infarction were mixed but showed no differences in mortality.

Overall, there seemed to be favorable tendencies for managed care, but the results were too inconclusive to establish an overall benefit.

The moderate-quality review by Wranik et al. [30] aimed to determine the effect of capitation on team expansion. Compared with FFS, team expansion tended to increase under capitated payments. However, the evidence level was weak.

Hodgson et al. [18] analyzed, among other things, how FFS and HMO reimbursement affected the treatment and outcomes of patients with colorectal cancer. HMOs are a special care model in which coverage of care is usually limited to physicians who work for or contract with the HMO. Regarding medical treatment, Hodgson et al. [18] found little evidence of a statistically significant impact of reimbursement type, and outcomes did not differ substantially.

Johri et al. [19] focused on social HMOs (S/HMOs), a special type of HMO aimed at care for elderly patients. The S/HMOs assessed in the review placed the financial risk for care provision with a single organizational structure and received capitated payments in advance. For this kind of S/HMO, Johri et al. [19] drew negative conclusions: the analyzed studies showed negative results regarding costs, utilization of services, and outcomes.

At the level of quality dimensions, the summarized results provided no clear evidence regarding the effect of capitation.

3.3.7 Accountable Care Organizations

ACOs were introduced to the US healthcare system in 2010 by the Patient Protection and Affordable Care Act. They have been implemented in the Medicare and Medicaid programs as well as with private payers. A key element of ACOs is that provider networks assume responsibility for medical care. Payments are based on FFS and complemented by additional elements intended to ensure the quality and efficiency of care. In “shared savings” programs, an ACO receives a certain proportion of the savings it achieves as a bonus payment. In “shared risk” programs, the ACO also shares in losses, in return for a higher share of the savings. In addition, ACOs must meet certain quality criteria.

Three reviews, all of moderate quality, dealt with ACOs. Aviki et al. [17] analyzed the value of care per dollar spent in cancer care. Two of the three studies they included found a reduction in inpatient hospital treatment, particularly in length of stay [47, 48]. The third study, focusing on 30-day mortality, readmission and complication rates, and inpatient length of stay, did not report any effects of ACO participation [49].

Kaufman et al. [20] examined the effect of ACOs on the utilization of services, the process of care, and outcomes, differentiating between Medicare, Medicaid, and private payer ACOs. Overall, the authors reported an association between ACO participation and reductions in both inpatient care and emergency department visits. The process of care improved, especially for chronic diseases and preventive care services. For outcomes, no generally valid conclusions on the effect of ACO participation could be drawn. Some studies reported partially positive effects for patient experience [51, 52] and mortality [53], whereas others found no effects [49, 50, 54, 55].

Nejati et al. [22] evaluated the impact of ACOs in cancer care and reported mixed results. However, they did find some improvements in process quality due to decreased utilization of low-value services within the Medicare Pioneer ACO.
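The difference between “shared savings” and “shared risk” arrangements can be sketched as a simple settlement rule. The following Python function is a minimal illustration under hypothetical assumptions:

  • There is a single spending benchmark.

  • Sharing rates are flat.

  • Quality criteria act as a yes/no gate on savings.

Real ACO programs add features such as minimum savings rates, caps, and risk adjustment, all of which are omitted here. The function name and all parameter values are illustrative.

```python
def aco_settlement(benchmark: float, actual_spending: float, quality_met: bool,
                   savings_share: float, loss_share: float = 0.0) -> float:
    """Return the payment to the ACO (positive) or the amount it owes (negative).

    One-sided "shared savings": loss_share == 0.
    Two-sided "shared risk": loss_share > 0, typically with a higher savings_share.
    """
    difference = benchmark - actual_spending
    if difference > 0:
        # Savings are shared only if the quality criteria are met
        return savings_share * difference if quality_met else 0.0
    if loss_share == 0.0:
        return 0.0  # one-sided model: the ACO bears no losses
    return loss_share * difference  # negative: the ACO repays part of the overspending


benchmark = 10_000_000.0  # hypothetical annual spending benchmark
for actual in (9_500_000.0, 10_400_000.0):  # hypothetical: below and above benchmark
    one_sided = aco_settlement(benchmark, actual, quality_met=True, savings_share=0.5)
    two_sided = aco_settlement(benchmark, actual, quality_met=True,
                               savings_share=0.6, loss_share=0.6)
    print(actual, one_sided, two_sided)
```

With these hypothetical inputs, the one-sided ACO receives 250,000 in the savings year and owes nothing in the loss year. The two-sided ACO receives 300,000 in the savings year but repays 240,000 in the loss year.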

At the level of Donabedian’s quality dimensions, ACOs seemed to have a positive impact on process quality. However, this effect did not translate into better outcome quality. Table 3 provides an overview of the results reported within this section.

Table 3 Results for structure, process, and outcome quality by category

3.4 Influence of Application Level (Group vs. Individual)

The included reviews addressed this research question only insufficiently. Only three reviews in the field of P4P [23,24,25] provided more detailed information about the difference between group and individual incentives. Petersen et al. [23] and Van Herck et al. [24] distinguished between studies of physician-level and group-level incentives, and both reported that most studies showed positive results. However, Petersen et al. [23] found weaker effects for group-level incentives. This was supported by Kondo et al. [25], who also stated that physician-level incentives were more effective. Petersen et al. [23] suggested that this may be because the link between individual performance and the incentive is less direct when incentives are paid at the group level.

4 Discussion

This systematic review aimed to examine the effects of monetary incentives within physician groups. We included 21 reviews, of which four were rated as low, 13 as moderate, and four as high quality according to AMSTAR. Given this heterogeneity, results should be interpreted cautiously.

The included reviews covered six different types of monetary incentives (salary, FFS, bonus payments, bundled payments, P4P, capitation) and one initiative in which monetary incentives play an important role (ACOs).

No clear, generalizable conclusion about the effects of monetary incentives on quality of care can be drawn from these reviews. This mirrors the findings of a Cochrane review from 2000, which assessed the effects of capitation, salary, FFS, and combined payment systems on physicians without focusing on physician groups [56].

However, tendencies emerged for two types of incentives. P4P, a reimbursement system that depends on performance measured by selected indicators, seems to encourage quality improvements in certain settings. Six reviews reported positive effects on process or outcome quality. However, the remaining reviews were inconclusive, so the overall evidence is weak. This is consistent with Eijkenaar et al. [58], who, in their systematic review of systematic reviews on P4P in healthcare in general, found potential for, but no clear and convincing evidence of, the (cost-)effectiveness of this reimbursement mechanism. One of the main challenges of this reimbursement type is the selection of valid quality indicators, as indicators can also create disincentives. For example, selection effects may occur, i.e., providers may concentrate their services on the incentivized indicators [57]. Minchin et al. [59] observed this effect: when certain indicators were no longer relevant for payment, performance on these indicators decreased immediately.

ACOs, although relatively recent, have already attracted considerable attention. The central aim of ACOs is to achieve certain quality goals in combination with cost reductions by letting providers share in savings. Regarding process quality, Aviki et al. [17] and Kaufman et al. [20] found reductions in inpatient and emergency department services, and Nejati et al. [22] reported decreased utilization of low-value services.

Heterogeneous results for a single type of payment, as we found for most payment mechanisms included in this review, can occur for various reasons. First, results depend on the indicators of interest. This is also a limitation of this review: because the search strategy covered a broad range of outcomes, the analyzed reviews addressed different settings; e.g., some focused on a specific indication such as cancer care, whereas others examined a specific target group such as the elderly. Second, aspects such as the period or duration of observation may influence the results. The observation periods of the individual studies spanned 1984 to 2017. Given that generation-specific factors might also affect the impact of monetary incentives, this aspect is also relevant [60,61,62,63]. Additionally, studies often covered a very short period of time, whereas some effects, e.g., habituation or the consequences of varying payments among physicians, take longer to emerge and therefore require longer observation periods.

Cultural differences were not addressed in this review, although the studies included in the analyzed reviews were carried out in different cultural areas. The question of how cultural aspects influence the effect of monetary incentives remains unanswered [64, 65].

Overall, the design of physician payment systems is very complex, not only because of the range of different reimbursement systems but also because, within each system, additional design aspects, e.g., the size of the incentive or the level at which it is implemented, need to be considered. Most of the included reviews lacked detailed information about these aspects, so we could not determine their impact on the results.

Regarding incentive size, Hamilton et al. [32] indicated that higher bonuses increased the likelihood of improvement but might be financially impractical. This was supported by Christianson et al. [39], who argued that “In some instances, P4P will be ineffective because the performance reward is ‘too small’, while in other cases the size of the reward will be ‘more than necessary’ to bring about change.” A qualitative study by Hillman et al. [66] found that 5% of capitation income was the minimum level for an incentive to have an impact on behavior. Eijkenaar [67] presented some evidence that the positive relation between incentive size and improvement level may hold only up to a certain point: once a certain income level is reached, additional payments may no longer have an effect.

Eijkenaar [67], Park et al. [68], and Conrad [69] identified characteristics associated with the success of monetary incentive systems. Figure 2 lists the success factors for the reimbursement mechanisms covered in this review.

Fig. 2 Success factors for different reimbursement schemes found by Eijkenaar [67], Park et al. [68], and Conrad [69]. FFS fee for service

In the group context, insights into the level at which an incentive is implemented are of special interest. As some studies have indicated, monetary incentives might be more effective when provided at the physician level rather than at the group level. However, this aspect has not yet been examined in detail. Further research is necessary to gain more detailed insight into the mechanisms of group incentives.

5 Conclusion

Monetary incentives in healthcare are often implemented as a control instrument and are intended to increase quality of care and reduce costs. The heterogeneity of the study results indicates that these goals are not always achieved.

Implementing new reimbursement mechanisms is often linked to high costs and imponderables. Against this background, decision makers in healthcare need to be aware of the sparse evidence in this field.

In group settings, the level at which the incentive is implemented also matters. Many initiatives provide incentives based on group performance, which seems to be less effective than incentives for individual physician performance. Therefore, how incentives are allocated within the group should be carefully considered.

Our heterogeneous results reveal a need for further research on the effects of monetary incentives in healthcare. In particular, isolating certain effects and attributing them to a single reimbursement system can be challenging. At the same time, an isolated view can be problematic because many types of nonmonetary incentives exist alongside monetary ones. Phipps-Taylor and Shortell [70] reported that, for ACOs, focusing on monetary aspects is insufficient. Research into the interaction of monetary and nonmonetary incentives is particularly needed.