Measuring the quality of care and improving it over time is a fundamental obligation of healthcare providers. Increasingly, quality is also tied to reimbursement and is reported publicly. While I strongly agree with both trends, three recent articles point out some of the challenges ahead.
The common theme among them is that “risk-adjustment” is a hard thing to do. First, a brief digression to provide some context.
There are two main ways to measure and compare quality. One is to assess processes of care, such as adherence to established best practices and evidence-based treatment guidelines. This is relatively easy to do, but is by definition highly reductionist. Clinicians understand that “good care” is more than the sum of a handful of isolated activities. Does anyone really think that good diabetes care is equivalent to measuring the HgbA1c level annually and making sure that everyone is screened for diabetic retinopathy? The other way is to assess patient outcomes, or how patients actually fare at the hands of different providers. This allows for comparison of endpoints that providers and patients find important, and frees providers to innovate. The challenge is that, in determining outcomes, it is very difficult to separate the impact of patients’ baseline characteristics from the impact of the care they received.
So, for example, an 89-year-old woman with a large anterior myocardial infarction has a high risk of dying or developing heart failure or other complications despite state-of-the-art care, whereas a 50-year-old man with a small inferior myocardial infarction may do fine even if the care he receives is substandard. To tease out the impact of the clinical circumstances, the outcomes have to be adjusted to reflect the baseline severity of the illness in question. The science of risk-adjustment is all about doing that, using statistical models that predict outcomes from clinical characteristics. It is conceptually straightforward, but tricky to do. Here’s where the three papers come in.
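To make the adjustment idea concrete before turning to the papers, here is a minimal sketch of one common approach, the observed-to-expected (O/E) ratio. Everything in it is an assumption for illustration: the function name, the two hospitals, and the per-patient predicted risks are made up, and the risk model that would produce those predictions is not specified here.

```python
def observed_to_expected(predicted_risks, died):
    """Compare observed deaths with the deaths a risk model expected.

    predicted_risks: model-predicted probability of death per patient (hypothetical)
    died: 1 if the patient died, 0 otherwise, in the same order
    Returns (observed, expected, observed / expected).
    """
    if len(predicted_risks) != len(died):
        raise ValueError("predicted_risks and died must have the same length")
    observed = sum(died)
    expected = sum(predicted_risks)  # expected deaths = sum of individual risks
    if expected == 0:
        raise ValueError("expected deaths is zero; ratio is undefined")
    return observed, expected, observed / expected


# Made-up example: both hospitals lose 1 of 5 patients (20% crude mortality),
# but Hospital A treats much sicker patients than Hospital B.
hospital_a = observed_to_expected([0.30, 0.25, 0.20, 0.15, 0.10], [1, 0, 0, 0, 0])
hospital_b = observed_to_expected([0.02, 0.03, 0.05, 0.05, 0.05], [1, 0, 0, 0, 0])

for name, (obs, exp, ratio) in [("A", hospital_a), ("B", hospital_b)]:
    print(f"Hospital {name}: observed={obs}, expected={exp:.2f}, O/E={ratio:.2f}")
```

In this toy case the crude mortality is identical, yet Hospital A's ratio is about 1 (as many deaths as predicted) while Hospital B's is well above 1. The comparison is only as good as the predicted risks that go into it, which is where the trouble starts.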
In Circulation, Weintraub and Garratt point out that inconsistencies in recording clinical data can make similar patients appear more or less “sick” and that statistical comparisons among providers are hard (or meaningless) if they are based on a very small absolute number of adverse clinical outcomes. In their example:
“If a hospital does 400 percutaneous coronary interventions annually and the expected mortality is ≈1.5% (6 deaths), then how can we evaluate whether the hospital is truly an outlier if it reports, say 9 deaths? The situation becomes much worse for the individual: if an operator performs 80 percutaneous coronary interventions annually and has that same expected mortality rate of 1.5%, then she is expected to lose 1.2 patients each year of practice. Does she become an outlier if she experiences 2 deaths (66% over target)? Three deaths (150% over target)? Is she truly providing superior care if she experiences no deaths?”
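The small-numbers problem in that example can be put into rough figures. The sketch below is my own illustration, not the authors' method: it assumes each procedure is an independent event with the same 1.5% risk of death (a simple binomial model), and the function name is hypothetical.

```python
from math import comb


def prob_at_least(n, p, k):
    """P(X >= k) when X ~ Binomial(n, p): chance of k or more deaths by luck alone."""
    prob_fewer = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - prob_fewer


RATE = 0.015  # expected mortality from the quoted example

# Hospital: 400 procedures a year, 6 deaths expected, 9 reported.
hospital_9_plus = prob_at_least(400, RATE, 9)

# Individual operator: 80 procedures a year, 1.2 deaths expected.
operator_2_plus = prob_at_least(80, RATE, 2)
operator_3_plus = prob_at_least(80, RATE, 3)
operator_none = (1 - RATE) ** 80  # chance of a year with no deaths at all

print(f"Hospital, 9 or more deaths: {hospital_9_plus:.3f}")
print(f"Operator, 2 or more deaths: {operator_2_plus:.3f}")
print(f"Operator, 3 or more deaths: {operator_3_plus:.3f}")
print(f"Operator, no deaths:        {operator_none:.3f}")
```

Under these assumptions, a hospital whose true mortality is exactly the expected 1.5% would still report nine or more deaths in around 15% of years, and an operator with that same true rate would go a full year with no deaths about 30% of the time. Neither result, on its own, says much about the quality of care.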
In The New England Journal of Medicine the authors point out that patients’ apparent comorbidities depend not just on the true differences in clinical circumstances, but also on how aggressively physicians look. So, for example, patients who live in areas of the country with more “diagnostic intensity” tend to accumulate more diagnoses and therefore appear to be sicker, even though they are not truly at higher risk for adverse outcomes. They suggest ways in which to “adjust the adjustment” based on regional differences in diagnostic intensity.
Finally, Kronick points out in Health Affairs that these differences in diagnostic and coding intensity have huge financial implications. Insurance companies that offer Medicare Advantage plans get reimbursed by CMS according to the measured “risk score” of each beneficiary. This score is based on a complex formula that takes into account all of the patient’s diagnoses. Not surprisingly, these plans have robust programs in place to capture as much data as possible to maximize the risk score. He reports that as a result of this coding intensity, “the average risk score for a Medicare Advantage (MA) enrollees has risen steadily relative to that for fee-for-service (FFS) Medicare beneficiaries, by approximately 1.5% per year.” Consequently, he estimates that CMS will spend about $200 billion more over 10 years than it would if the risk scores were the same for MA and FFS Medicare beneficiaries.
So what are we to conclude?
First, I still believe we should be measuring and aiming to improve patient outcomes. Second, it is clear that basing rankings and payments on those outcomes is hard, and that doing so requires common standards of measurement and reporting. And, third, complexity invites gaming, so we should continue to look for ways to simplify payments for the care of populations of patients. Capitation, anyone?
What do you think?