Every hospital in the US that accepts Medicare (which includes nearly every hospital in the US) is required to give a detailed financial report to the Centers for Medicare and Medicaid Service (CMS) each year. These financial reports are called Medicare Cost Reports and all of the data from these reports going back to 1996 can be retrieved from CMS at this link. In other words, all of the financial data for nearly every hospital in the US since 1996 can be retrieved from a single source.
That’s the good news. Retrieving this data can be challenging though. In this section I will attempt to explain as clearly as I can exactly how to retrieve data that’s been submitted in the Medicare cost reports for anyone who wishes to do so.
First, all of the cost report forms hospitals are required to fill out can all be downloaded here. There are two separate sets of cost report forms. The 1996 cost report forms were used from 1996 until about April 2010 and the 2010 forms have been used since then. Each set contains several dozen forms for the hospitals to fill out.
These forms request an extensive amount of information about hospitals. This information ranges from basic financial questions about billing, revenue and operating costs to specific questions like how many square feet of floor space each unit has. Since not every form is copletely filled out by every hospital each year, the amount of information provided by each hospital varies.
The scope of my research centered on the questions hospitals answered on only three of the forms for each set of cost reports, so this explanation will focus mostly on that information. The process I’m explaining here can be applied to analyzing all cost report data though.
How the data is filed
Once the hospitals fill out their cost reports and submit them to CMS, the information from the forms is filed on two enormous CSV files for all hospitals in the US each year. The first CSV file is called the alpha. It contains all of the worded answers the hospitals provided such as answers to yes/no questions, in which city the hospital is located, etc… The second CSV document is the numeric document and contains all of the answers that are numbers, such as the total amount a hospital issued in billed charges that year.
The size of each of these documents presents the first real challenge to extracting the data they provide. For example, the numeric document contains about 350-400 megabytes of data each year or about 11 million lines of data. Documents this large can’t be opened as standard spreadsheets so you’ll need special software that can retrieve data from such huge documents. If you try opening either document in Excel, about 90% of the data will be missing from the spreadsheet. Numbers won’t even try to open them.
Interpreting the data
The next challenge is interpreting the data you’ve retrieved. Every hospital in the US is filed under a different filing number each year. As an example, here is the numeric data from the cost report provided by a hospital in 2012. The information for this hospital was filed under the number 537439.
Of note: Medicare files these documents based on the beginning of the fiscal year for each hospital. So a hospital that has a fiscal year of 7/1/2012-6/30/2013 would be placed in the 2012 document.
So, which hospital provided this data? To find out, you first need to find that hospital’s filing number on a report that comes with the alpha and numeric documents. On this report you’ll find a hospital code number that corresponds to the hospital filing number for that year. Every hospital has a permanent code number that can be used to identify it. This permanent code number needs to be matched to the hospital’s filing code number for identification.
In the case of our example, hospital filing code 537439 can be found on line 5161 of the report (column A) which corresponds to hospital 50625 (column C). Going to the list of hospital codes, we can see that hospital 50625 is Cedars-Sinai hospital in Los Angeles.
On the first line, you can see that column A has the filing code, Column B shows an A followed by six zeros, column C shows a 100, D a 200 and E the number 23591105.
What do these entries mean? Column B lists the specific form which this entry was taken. In this case it was worksheet A. Column C lists the line on worksheet A and Column D shows the column for this entry. For both the line and column entries an extra two zeros are added to the listing.
So the first line of our data shows us that the answer for line one, column two of worksheet A for Cedars-Sinai hospital in 2012 was 23591105. Looking at worksheet A, you can see that column 2 line 1 is for “Capital related costs-building and fixtures. This means that Cedars-Sinai spent about $23.6 million on their building and fixtures between 7/1/2012 and 6/30/2013.
An index for the worksheet codes for all the cost report forms is here. The main focus of my research was on billing charges, total operating costs, actual hospital revenue and hospital census data. To get this information I used the data from three of the 1996 forms: Worksheets G2, G3 and S31 and three of the 2010 forms: Worksheets G2, G3, and S301.
For example, line one of column one on worksheet G3 asks for total patient revenue (also called gross patient revenue). This is the total amount the hospital billed for that year for all of the services it provided for both inpatients and outpatients. On line 8801 of the numeric document for Cedars-Sinai there’s an entry for G300000 (form G3), 100, 100 (line 1, column 1) for 10589887733. That means the total amount Cedars-Sinai billed for every service they provided that year was about $10.6 billion.
The code key for all of the data I retrieved for all of the hospitals is here.