A more technical post for those evaluating risk scores for care management, coupled with a real-world example.

As healthcare organizations move toward taking on greater financial risk for keeping people healthy, it is critical that they match people to the interventions they’re most likely to benefit from. This has traditionally been done by applying risk scores based on claims data.

To choose the right approach, organizations need an objective and reliable method of evaluating accuracy.

One major pitfall we help clients avoid is relying on metrics that are familiar but inappropriate for care management/population health - specifically, for the task of finding the individuals most likely to benefit from an intervention (e.g., falls prevention, readmission prevention, disease management).

In healthcare, the C-stat (area under ROC) and R-squared grew up as ways to assess the “risk” of populations and are typically treated as the “industry standard” for evaluating predictive performance. Because familiar benchmarks help make customers comfortable, we calculate these statistics as part of every engagement. However, we also help our customers understand that positive predictive value is a more meaningful metric for designing care management efforts.

Statistical measures are tools. Much like the screwdriver versus the hammer, which one is “right” depends on the task at hand. So let’s compare the intended use of each to the needs of care management/population health.

Area Under ROC / C-Statistic1 - At first glance, this seems like a good statistic because it measures how accurately a model sorts low-need and high-need patients into their respective categories. The measure is created by plotting the true-positive rate against the false-positive rate and is typically applied across the entire population.

The first misalignment is that care management typically focuses on the small subset of a population in need of intervention. For example, when helping a Medicare Advantage program find the right people for a falls prevention program, it doesn’t matter how well the model predicts the likelihood of each of its 100,000 members needing help. Knowing that the c-stat is .73 across 100,000 members isn’t all that helpful when what really matters is finding the top 1,000 members most in need of help.

The second issue is that area under ROC rewards the identification of true negatives just as much as the identification of true positives. In other words, the metric looks better in larger populations where fewer people need help. Many vendors use this characteristic of the c-stat to make their performance look better. Claiming that models built on millions of rows of claims data achieve a c-stat of .9 is pretty meaningless on a number of levels. Not untrue, just irrelevant.
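To make this concrete, here is a small, self-contained sketch (the risk scores are invented, not from any real model) showing how padding a population with easily identified true negatives inflates the c-stat even though the top of the ranked list - and therefore who you would actually reach - is unchanged:

```python
def c_statistic(pos_scores, neg_scores):
    """AUC via its rank interpretation: the probability that a randomly
    chosen positive case is scored above a randomly chosen negative case."""
    wins = ties = 0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1
            elif p == n:
                ties += 1
    return (wins + 0.5 * ties) / (len(pos_scores) * len(neg_scores))

# Hypothetical scores for 3 members who were admitted (positives)
# and 2 who were not (negatives).
positives = [0.9, 0.8, 0.4]
negatives = [0.7, 0.6]
print(round(c_statistic(positives, negatives), 3))  # 0.667

# Pad the population with 10 obviously healthy members scored near zero.
padded_negatives = negatives + [0.1] * 10
print(round(c_statistic(positives, padded_negatives), 3))  # 0.944

# The c-stat jumped from .667 to .944, but the top of the list (0.9, 0.8)
# is identical - an outreach program targeting the highest-risk members
# is no better off than before.
```

The jump comes entirely from the positives outranking the new easy negatives; nothing about the model's ability to find the people who need help has changed.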

R-Squared2 - Actuaries have long used R-squared as the standard for evaluating how accurately risk scores predict aggregate population costs. Technically, this statistic measures the proportion of variance in the dependent variable that is predictable from the independent variables. Unfortunately, it suffers from many of the same issues as area under ROC: it looks at aggregate fit instead of evaluating whether the individuals recommended for intervention actually benefited.

The task of predicting a group’s total medical cost is simply a different problem than predicting the probability of any one individual experiencing a specific event. Even the latest Society of Actuaries report on risk scoring models cautions users that “R-Squared values alone are not sufficient to explain the predictive abilities of a risk scoring model”3. They warn that R-squared values are easily influenced by outliers and recommend using additional measures.
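The SOA’s caution about outliers is easy to demonstrate. In this toy sketch (invented annual costs, in thousands of dollars), a model that predicts one catastrophic outlier well but everyone else poorly earns a far higher R-squared than a model that predicts every typical member well but misses the outlier:

```python
def r_squared(actual, predicted):
    """Proportion of variance in the actual values explained by the predictions."""
    mean = sum(actual) / len(actual)
    ss_tot = sum((y - mean) ** 2 for y in actual)
    ss_res = sum((y, p)[0] - (y, p)[1] for y, p in ()) if False else \
             sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return 1 - ss_res / ss_tot

# Annual costs (in $1,000s) for six members; one is a catastrophic outlier.
actual  = [1, 2, 3, 4, 5, 100]

# Model A: predicts the five typical members perfectly, misses the outlier.
model_a = [1, 2, 3, 4, 5, 19]

# Model B: mediocre on every typical member, but nails the outlier.
model_b = [3, 3, 3, 3, 3, 100]

print(round(r_squared(actual, model_a), 3))  # 0.164 - looks "bad"
print(round(r_squared(actual, model_b), 3))  # 0.999 - looks "great"
```

Model B’s near-perfect R-squared says almost nothing about how well it would rank the five ordinary members for an intervention - the single outlier dominates the statistic.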

For more on the origins of risk scores and how care management requirements differ from actuarial science, check out our post: You’re not from around here: Risk Scores in Care Management / Population Health.

Positive Predictive Value4 - Also known as precision, this is a far more useful measure. In care management resource-allocation decisions, the question that matters is this: if you visit 100 people, how many are the right people? 10? 40? 80? It’s about as straightforward a measure as you can use because it reports the percentage of predictions that are correct.

Positive predictive value also has the advantage of showing the likely success of an outreach campaign at different levels of effort when applied incrementally down the ranked population.
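As a sketch of how this works in practice (toy scores and outcomes, not real data), PPV can be computed at successive depths of an outreach list:

```python
def ppv_at_k(scores, outcomes, k):
    """Positive predictive value among the k highest-scored individuals."""
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    return sum(outcome for _, outcome in ranked[:k]) / k

# Hypothetical risk scores and actual outcomes (1 = the event occurred).
scores   = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20]
outcomes = [1,    1,    0,    1,    0,    0,    1,    0,    0,    0]

for k in (2, 5, 10):
    print(k, ppv_at_k(scores, outcomes, k))
# 2  1.0  -> the very top of the list is highly accurate
# 5  0.6
# 10 0.4  -> accuracy dilutes as outreach goes deeper into the list
```

Computing the same curve on your own historical data tells you how far down the list an outreach program stays worthwhile.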

Using Positive Predictive Value to Design Interventions

To illustrate, suppose Cyft is asked to predict hospital admissions within the next 90 days. A typical Cyft model might have a positive predictive value of .5 for the first 400 people it identifies as most likely to be admitted. That means that of the first 400 people Cyft identifies for intervention, 200 would end up having that admission.

It’s then easy to calculate the potential return on investment by working with the care management leadership team to estimate some basic costs and potential savings.

For example, if it costs $500 per person to visit their homes and 60% of the top 400 visits reach someone who would otherwise be headed for a $12,000 admission, the math becomes quite compelling. No intervention program is perfect, but it doesn’t need to be: even if only one-third of those admissions can be prevented, the ROI will be appreciated by CMO and CFO alike.
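Plugging the figures above into a quick sketch makes the arithmetic explicit:

```python
visits = 400              # people visited (the top of the ranked list)
cost_per_visit = 500      # $ per home visit
admission_cost = 12_000   # $ per admission avoided
reach_rate = 0.60         # share of visits reaching someone headed for admission
prevention_rate = 1 / 3   # share of those admissions the program prevents

outreach_cost = visits * cost_per_visit                       # $200,000
admissions_reached = visits * reach_rate                      # 240
admissions_prevented = admissions_reached * prevention_rate   # 80
savings = admissions_prevented * admission_cost               # $960,000

print(f"ROI: {savings / outreach_cost:.1f}x")  # ROI: 4.8x
```

Every input here comes straight from the example in the text; swapping in your own cost and effectiveness estimates is a one-line change.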

Clinical and financial effectiveness of providing first 400 interventions

Quantifying accuracy at different points in the rank-ordered list can then direct how resources are allocated. For example, if accuracy (measured by positive predictive value) drops to 40% for those ranked between 400 and 800 by likelihood of admission, the cost/benefit analysis may show that home visits are too costly when they reach the right people only 40% of the time. Instead, the care management team can use less expensive interventions for this group; even a phone-based intervention that is effective with 10% of the people headed for an admission generates a 3.8x ROI within this group.
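The same arithmetic can be sketched for this second tier. The post does not state the per-call cost of the phone program, so the $125 figure below is a hypothetical assumption chosen to illustrate the calculation; the other inputs come from the text:

```python
group_size = 400          # members ranked 401-800 on the list
ppv = 0.40                # accuracy (PPV) in this tier
phone_cost = 125          # $ per phone outreach - HYPOTHETICAL, not stated in the post
effectiveness = 0.10      # share of at-risk members the phone program helps
admission_cost = 12_000   # $ per admission avoided

outreach_cost = group_size * phone_cost                  # $50,000
at_risk_reached = group_size * ppv                       # 160
admissions_prevented = at_risk_reached * effectiveness   # 16
savings = admissions_prevented * admission_cost          # $192,000

print(f"ROI: {savings / outreach_cost:.1f}x")  # ROI: 3.8x
```

Under that assumed per-call cost the tier still returns roughly 3.8x, which is why a cheaper, lower-touch intervention can make economic sense where a home visit cannot.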

These examples show how positive predictive value maps directly to care management and can be used to design programs around projected ROI. By applying this approach to your own historical data, you can evaluate any care management / individual risk assessment scenario instead of simply accepting advertised performance.

Beyond statistical performance, positive predictive value makes it easy to focus on what care management stakeholders really care about - the return on the investment of effort.

Key Takeaways

  • It’s critical to have objective measures of predictive performance in care management/population health.

  • Statistical measures are tools that need to be matched to the jobs they’re best suited for.

  • C-stat / area under ROC and R-squared are not suited for care management applications.

  • Positive predictive value is not only more appropriate but can be used to design care management/outreach efforts based on ROI / cost/benefit.

Leonard D’Avolio, PhD

CEO & Co-founder

  1. “Receiver operating characteristic,” Wikipedia. Accessed 27 Jun. 2017.
  2. “Coefficient of determination,” Wikipedia. Accessed 27 Jun. 2017.
  3. “Accuracy of Claims-Based Risk Scoring Models,” Society of Actuaries. Accessed 27 Jun. 2017.
  4. “Positive and negative predictive values,” Wikipedia. Accessed 27 Jun. 2017.