- HIStalk - https://histalk2.com -

Readers Write: Seven Safety Checks Before Diving into the Big Data Ocean

By Frank Poggio

When I last visited the topic of big data (BD) and analytics, I proposed that big data could easily become a wasteland for health providers and the next EHR boondoggle that could generate wads of cash for system vendors. I noted that a large investment in big data could easily go for naught if we do not pay attention to at least two key issues: employing bad data as a foundation, and blindly accepting analytics or mathematical models that do not correctly represent your world.

I received several responses to that piece, some stating that I was opposed to big data and analytics. Not true. As a one-time practitioner of analytics, back when it was called operations research in commercial industry, I saw firsthand the value of BD but also the very large expense and pitfalls. At the close of my first writing, I promised to follow up with a list of safety checks you should employ to avoid drowning in the big data ocean. Here they are.

Bad data. Big data and bad data do not mix. Before you jump in, you should get clear answers to these questions. Do you thoroughly understand what is in your data? How old is it? Where and how was it originally generated? What coding structures were used? How have the coding structures changed over time? How many system conversions and mutations has the data gone through? What is the consistency and integrity of your data?

Scrubbing your data, particularly if it goes back several years and/or spans different information systems, is critical. A recent HIStalk piece written by Dan Raskin, MD covered this topic well. If you can’t answer these questions before you apply analytics, then all the conclusions you draw from your sophisticated analytics will rest on a foundation of quicksand. And be aware, scrubbing historical data can be very time-consuming and costly, which leads us to the next safety check.
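As a rough illustration of the questions above, here is a minimal Python sketch that profiles a data extract for age, source system, missing codes, and changes in code format over time. The record layout, the system names, and the `profile` function are all hypothetical. A real extract would need checks tailored to its own fields.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical extract: (service_date, source_system, diagnosis_code)
records = [
    (date(2012, 3, 1), "legacy_his", "250.00"),
    (date(2012, 7, 9), "legacy_his", ""),
    (date(2016, 1, 5), "new_ehr", "E11.9"),
    (date(2016, 2, 11), "new_ehr", "E11.9"),
]


def profile(rows):
    """Summarize data age, provenance, missing codes, and code formats per year."""
    dates = [r[0] for r in rows]
    by_system = Counter(r[1] for r in rows)
    missing = sum(1 for r in rows if not r[2].strip())

    # Crude format signature (letters -> A, digits -> 9) to expose
    # coding-structure changes, such as a switch between code sets.
    formats = defaultdict(set)
    for svc_date, _system, code in rows:
        if code.strip():
            sig = "".join(
                "A" if c.isalpha() else "9" if c.isdigit() else c for c in code
            )
            formats[svc_date.year].add(sig)

    return {
        "oldest": min(dates),
        "newest": max(dates),
        "rows_by_system": dict(by_system),
        "missing_code_rate": missing / len(rows),
        "code_formats_by_year": {y: sorted(s) for y, s in sorted(formats.items())},
    }


report = profile(records)
print(report)
```

A report like this does not scrub anything. It only shows where the scrubbing effort is likely to go, for example a year in which the code format changes or a source system with a high share of blank codes.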

Focus. Keep your focus as narrow as possible. When you jump in the BD ocean, keep your eyes on that floating life preserver. If you do not, you’ll get overwhelmed and sink fast. Most big data projects will fail because you tried to do too much or were too broad in your goals, which led to loss of control, missed target dates, and budget overruns.

It’s very easy to fall into this riptide. For example, with a sea of data at our disposal, we surely should be able to predict census or institution-wide patient volumes for the next five or 10 years. The complexity of such an analytical model could easily overwhelm you. As an alternative, try something more restricted and focused, such as predicting volumes for a narrow specialty practice or identifying the three primary causes of re-admits. With a narrow focus, the probability of your model being useful will be far greater, which takes us to our next safety check.

Validate your model. Run simulations against past time periods with known outcomes. Did you get the answer you expected? If not, revise or replace the algorithm(s). Smaller models are easier to validate. Apply basic common sense against any prediction. Remember, the end user, usually an executive or physician group, must buy in to the model logic and have full trust in the data before they can accept any predictions. If they do not understand it, they will not trust the forecasts and the model will never be used. Once smaller models are validated, you can link multiple ones together to create larger organization-wide models.
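Running a model against past periods with known outcomes can be sketched in a few lines of Python. In this minimal example, the moving-average forecaster is only a stand-in for whatever model you are validating, and the visit counts are made up. The `backtest` function replays history, forecasting each period from earlier data only, and reports the mean absolute percentage error.

```python
def moving_average_forecast(history, window=3):
    """Stand-in model: forecast the next period as the mean of the last `window` values."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def backtest(series, forecast_fn, warmup=3):
    """Forecast each past period from earlier data only and return the mean
    absolute percentage error (assumes every actual value is nonzero)."""
    errors = []
    for t in range(warmup, len(series)):
        predicted = forecast_fn(series[:t])  # the model never sees period t or later
        actual = series[t]
        errors.append(abs(predicted - actual) / actual)
    return sum(errors) / len(errors)


# Hypothetical monthly visit counts for one specialty clinic
monthly_visits = [100, 110, 120, 130, 140, 150]

mape = backtest(monthly_visits, moving_average_forecast)
print(f"Mean absolute percentage error: {mape:.1%}")
```

If the error on known history is larger than the decision can tolerate, that is the signal to revise or replace the algorithm before anyone is asked to trust its forecasts.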

Change can sink your analytics. One of the primary reasons to apply models to big data is to predict change, then use that new knowledge to deal with the change before it becomes a problem. Unfortunately, there are some changes that your historical big data can’t predict. You need to understand them and factor them into any decisions you make. For example, can your model anticipate changes within the practice of medicine? Medical protocols change almost every month due to new research and new technologies. Hardly a week goes by without reading about a new protocol for medications, diagnostic testing, and chronic disease management. Your ocean of big data cannot predict these changes, and yet if you are planning a new medical service, you need to somehow factor in these elements.

Another unpredictable element is government regulations. A good deal of industry change will be driven by which party wins each election. Today it’s MU, ACOs, P4P, value-based purchasing, and many other regulations that did not exist five years ago. Tomorrow it will be something else. If you can predict those changes, you probably would do better in another profession. The analytics and models you build will only reflect past practices and governmental policies, and as they say on Wall Street, past performance may not be indicative of future results. In model building, these are known as ad hoc or exogenous variables. You take the model’s output, then make a one-time swag adjustment to reflect your best guess for exogenous factors.

Pick the low-hanging fruit first. There are two major kinds of analytics: strategic models and operational models. Strategic analytics try to predict enterprise-wide outcomes and volumes five to 10 years out. They focus on questions such as: What are the population trends in our market? What patient programs should we be moving towards? Can they be financially viable? Where should they be located? What are the competitive factors?

Operational models deal with more immediate issues, such as: How can we handle higher patient volumes using fewer resources? What can we do to reduce re-admits? What is the ROI on a large capital investment? They are by nature near term and usually address efficiency questions.

Due to their complexity and time horizon, strategic analytics are tough to measure in terms of efficacy. Operational models are far easier to measure, while strategic models are sexier and costlier to build. Until you have had repeated good results with operational models, you should stay away from strategic models. The low-hanging fruit are in operational analytics. Moreover, there are a myriad of them that could quickly generate real ROI and may only require “little data.”

Paralysis by analysis. You could spend a long time drifting in the big data ocean and paralysis by analysis could easily set in. Remember, there will always be flaws in your historical data, and no model can be perfect, so do not let perfection become the enemy of good. This is not an academic exercise and you do not have an unlimited budget. All analytics need to be improved, so do it incrementally. Lastly, after many iterations and revisions and based on your real-life experiences, if the model still does not make sense to you, toss it out and move on.

Educate and understand. What problems are you really trying to solve? Many organizations waste time and money building models for problems they really do not have or understand. Due to hype, department managers come to believe the model will fix operational problems. Department managers need to be trained in how to use and interpret these powerful tools. Understand what the tool can and can’t do and what the real limitations of the model are. This step must come first or analytics projects can easily run amok.

If you use outside resources, make sure they understand the healthcare industry and your particular venue. Being an expert in quantitative tools is not enough. Having a sound footing in the complex relationships that drive the delivery of patient care is critical to the success of employing analytical tools.

The annual budget is an excellent example of an operational model. Before you jump into BD, take this test. How effective is your organization at budgeting? How close do you routinely come to hitting budget targets? Have you used variable budgeting successfully?

If you can’t answer these questions positively, you are not ready to swim in the BD ocean. Big data and analytics can be powerful tools when used with foresight and care. Applying BD without clearly identifying your objectives, knowing the weaknesses of your data, and understanding the limits of mathematical modeling or analytical tools will be a costly and fruitless exercise.
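To make the budgeting test concrete, here is a small Python sketch of a variable (flexible) budget check. All figures and function names are hypothetical. The idea is to restate the budget at the volume actually experienced, then compare actual cost against both the original and the restated budget.

```python
def flexible_budget(fixed_cost, variable_cost_per_unit, volume):
    """Variable (flexible) budget: fixed cost plus a per-unit cost times volume."""
    return fixed_cost + variable_cost_per_unit * volume


def variance_pct(actual_cost, budgeted_cost):
    """Signed variance as a fraction of budget; positive means over budget."""
    return (actual_cost - budgeted_cost) / budgeted_cost


# Hypothetical department figures for one year
fixed = 200_000.0        # costs that do not move with volume
rate = 50.0              # assumed variable cost per patient visit
planned_volume = 4_000
actual_volume = 4_400
actual_cost = 430_000.0

static_budget = flexible_budget(fixed, rate, planned_volume)
flexed_budget = flexible_budget(fixed, rate, actual_volume)

static_var = variance_pct(actual_cost, static_budget)
flexed_var = variance_pct(actual_cost, flexed_budget)

print(f"Variance vs. original budget: {static_var:.1%}")
print(f"Variance vs. volume-adjusted budget: {flexed_var:.1%}")
```

In this made-up case, much of the apparent overrun comes from higher volume and not from poor cost control. An organization that cannot separate the two in its own budget is probably not ready for more ambitious models.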

Frank Poggio is president of The Kelzon Group.