Another Step Down the Rabbit Hole
By Data Nerd
On Tuesday, May 7 at 9:53 p.m., the Centers for Medicare & Medicaid Services released a new open dataset to shed light on hospital pricing variations. The Times and The Washington Post (among others) published lengthy online articles (presumably overnight), complete with data visualizations to help consumers understand the vast differences between what hospitals charge Medicare for their services. CMS released state and national averages a week later, after The Washington Post published an article aggregating the data for state-level comparison.
On the first day of its release, the dataset was downloaded over 100,000 times, displaying the large appetite that the public has for open healthcare pricing data. What is unfortunate is that this data set is fundamentally flawed for the purpose for which it was made public.
In the age of high(er)-tech journalism, I was disappointed to read article after article that overlooked the data documentation and went straight to the numbers and visualizations that could be concocted. Even HHS's own chief technology officer got it wrong when he referred to the data as "the actual prices that hospitals charge Medicare for the top 100 procedures across the country."
The data given are not the top 100 procedures. They are the top 100 DRGs, which means that in any given claim, there could have been anywhere between one and 25 procedures performed (and they do vary, wildly).
If the goal is to compare hospitals' charge rates, you need a normalized cohort. Or in layman's terms, you need to compare apples to apples instead of kumquats to grapefruits. People with the same DRG suffer from the same diagnosis and often share similar courses of treatment, but wouldn't a better analysis look at patients who all had the same procedures?
A DRG is a diagnosis-related group, a very broad categorization of the primary diagnosis that the hospital is treating. A claim has only one DRG but can have anywhere between one and 25 procedure codes. The data as currently presented are inherently incapable of pointing to charging discrepancies, because a claim could reflect charges for one procedure or for 25.
Personally, I think the move was more of an administrative muscle flex going into the healthcare exchanges set to open in October — fueled by the threat of public perception rather than an attempt to shed (non-refracted) light on the subject. A more accurate approach would have been to isolate claims where only one procedure was performed and provide the average charge or reimbursement data for those. Unfortunately, CMS charges nearly $4,000 for the data in a format that would allow this type of analysis.
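To make the proposed alternative concrete, here is a minimal sketch of that single-procedure analysis. The claim records and field names below are entirely illustrative (they are not CMS's actual claim schema, which sits behind that paywall); the point is simply the filtering step: keep only claims with exactly one procedure code, then average charges per hospital and procedure so the comparison is apples to apples.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical claim records -- field names and codes are illustrative,
# not the actual CMS claims format.
claims = [
    {"hospital": "A", "procedures": ["0DB64Z3"], "charge": 12000.0},
    {"hospital": "A", "procedures": ["0DB64Z3", "0DJ08ZZ"], "charge": 25000.0},
    {"hospital": "B", "procedures": ["0DB64Z3"], "charge": 18000.0},
    {"hospital": "B", "procedures": ["0DB64Z3"], "charge": 16000.0},
]

# Keep only single-procedure claims, so hospitals are compared on like work.
single = [c for c in claims if len(c["procedures"]) == 1]

# Average charge per (hospital, procedure) pair.
charges = defaultdict(list)
for c in single:
    charges[(c["hospital"], c["procedures"][0])].append(c["charge"])

avg_charge = {key: mean(vals) for key, vals in charges.items()}
# Multi-procedure claims (like hospital A's $25,000 claim) never enter the
# averages, so remaining differences point at pricing, not case complexity.
```

With the toy data above, hospital A's $25,000 two-procedure claim is excluded, and the remaining single-procedure averages can be compared directly across hospitals.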
This open dataset is another unfortunate example of our exuberance for "big data" giving way to our human propensity to under-analyze and take misinformed baby steps toward a greater goal, however noble it may be. As more and more data is presented for public digestion, it must be properly documented and cited if it is to drive sound analytical outcomes.