Big Data, Small Data, Meta Data, See Ya Latah
By Jim Fitzgerald
It’s the RESTful, object store, file and block make me snore, it’s still bits and bytes to me… (sorry, Billy)
I just got back from HIMSS. Big data, like savoir faire, is everywhere. The cynical side of me says that technology vendors just want to sell more disk or flash drives. The analytical technical businessperson somewhere inside me says that the real play for the people trying to sell you and me on big data is in the tool suites for managing, monitoring, sorting, searching, and processing big data. We will be lured in with open source tools like Hadoop, and then when the hook is deep enough, the vendor community will point out to us why we need their quasi-proprietary toolkit to enhance the “limited feature set” and “programmer required” aspects of Hadoop.
Don’t read me wrong. I think I am a fan of this. Why the qualification? Big data, taken to its logical extreme and paired with some artificial intelligence, can help my doctor process all the environmental, social, and lifestyle data related to me and correlate it with the highly structured “small data” in my electronic health record to zero in on, and advise on, the real underlying issues behind my health that go well beyond the “sick care symptom” I am presenting that day.
The vague and slowly clarifying healthcare zeitgeist around population health and “well care” probably won’t be realized without employing big data management techniques as an everyday tool. This apparent service to humankind will be aided and abetted by small and large chunks of data streaming up to the cloud from the “personal Internet of things” that I already own and the things I am considering, like Apple Watch.
The cautionary note comes from my informed-paranoid fear of Big Brother. I have Orwellian visions of the healthcare police showing up at my house and herding me into the quarantine van for a stint of “voluntary rehab” after some warehouse full of seemingly disconnected Facebook posts, Yelp reviews, sensor numbers, and Whole Foods Market receipts mistakenly puts me on a high-risk list for the next pandemic. I won’t even go off into the potential side rant on all my voluntary and involuntary surrenders of my privacy rights along the way, although I do think the court system should brace itself for the onslaught.
Let’s hope my paranoia amounts to nothing more than the receptionist not being a bit surprised that I showed up in the doctor’s office that day, because the data-lake-fed AI predicted I would, had already authorized my insurance, and had sucked all the available fresh data on me into a useful visualization for my clinicians.
What’s the difference between big data and small data? The short version is that big data is generally considered to be an unstructured collection of data objects. Unstructured in this usage implies that there is no classic structured database format imposed on the data. The unstructured data could be a song captured as MP3 or AAC, a simple list of my last 20 temperatures stored in my Apple Watch, or a photo just taken in the ED of the festering wound on my right leg.
Big data is generally big because it is a vast collection of objects. Sometimes big data is big because the individual objects are prodigious on their own, in which case they are also known as BLOBs, or binary large objects – for example, your favorite “Breaking Bad” episodes that are still sitting on your iPad. It could really be anything, including a file that has a structure and order of its own, but is being considered as part of a greater set of data molecules in a “data lake.”
Storing data as objects, most commonly done on the Internet with RESTful storage protocols, is an increasingly normal trick in the world of data storage and management. When we store data as objects, we don’t care all that much about structure, or about the nature of the data, or about its accessibility by a particular file system or operating system. That problem is shifted from its traditional place in the OS or the storage array and is moved to the app. (Notice I did not say “application.”)
To the extent that we care about the objects in an object store (an allegedly safe place to put objects), we may tag them as they go in with metadata, which everyone who has followed the Edward Snowden story knows is “data about the data.” In fact, the object might get multiple tags. One might be a lookup address or unique ID in the object store, and one or more others might be some common descriptor of what is in the object itself. Hence the chaos of unstructured data may, in fact, have some external structure imposed on it by some rules-based system ingesting the data objects.
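For the programmers in the audience, here is a minimal sketch of that tagging idea in Python. It is a toy, in-memory stand-in and not any vendor's product or API: the names `ObjectStore`, `put`, `get`, and `find` are hypothetical, and so are the tag keys. The store never looks inside the payload; it only hands back a unique ID and remembers whatever descriptive tags were attached on the way in.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes                               # opaque payload; the store never inspects it
    tags: dict = field(default_factory=dict)  # descriptive metadata ("data about the data")


class ObjectStore:
    """Toy in-memory object store (hypothetical, for illustration only)."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, **tags) -> str:
        # First "tag": a unique ID that acts as the lookup address.
        object_id = uuid.uuid4().hex
        # Remaining tags: common descriptors of what is in the object.
        self._objects[object_id] = StoredObject(data, dict(tags))
        return object_id

    def get(self, object_id: str) -> bytes:
        return self._objects[object_id].data

    def find(self, **criteria):
        # Return the IDs of objects whose tags match every criterion given.
        return [
            oid
            for oid, obj in self._objects.items()
            if all(obj.tags.get(key) == value for key, value in criteria.items())
        ]


store = ObjectStore()
photo_id = store.put(b"\xff\xd8 fake jpeg bytes", kind="photo", source="ED")
temps_id = store.put(b"98.6,98.7,99.1", kind="temperature_list", source="watch")

# The external structure lives entirely in the tags, not in the objects.
ed_photos = store.find(kind="photo", source="ED")
```

Asking for `kind="photo"` returns only the ID of the ED photo, even though the store has no idea what a photo is. All the meaning sits in the tags the ingesting system chose to apply.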
In truth, small data is still where the rubber meets the road in today’s healthcare information systems. The organization or structure of that data by the HCIS in a pre-defined database provides the accuracy and confidence clinicians need to treat me and administrators need to bill me. It also generates the endless arguments and the grossly inefficient cottage industry that has sprung up around HIEs. (Do we really need to argue about what the “first name” field means?)
Big data can provide inferential context for small data, but it cannot supplant the precise articulation or definitive metrics collected and presented, in context, to help treat me. Small data is so important that we protect it not only in the context of its integral structure in a database, but also in some cases at the file system, operating system, and storage subsystem levels. In many cases, via RAID technology, backups, and replicas, we have so many copies of the same small data that it is really not very small at all; but hey, in the days of petabyte and zettabyte data lakes, a few terabytes looks more like a data puddle.
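To put rough arithmetic behind the "not very small at all" point, here is a back-of-the-envelope sketch in Python. Every number in it is an assumption chosen for illustration, not a measurement from any real site: a 5 TB production database, mirrored RAID (which doubles raw capacity), one replica on a similarly mirrored array, and three retained full backups. The function name `total_footprint_tb` is likewise hypothetical.

```python
def total_footprint_tb(primary_tb, raid_overhead, replicas, backup_copies):
    """Estimate the raw capacity consumed by one logical data set.

    primary_tb     -- logical size of the production data (TB)
    raid_overhead  -- raw-to-usable ratio of the array (2.0 for mirroring)
    replicas       -- number of replica arrays, assumed to use the same RAID level
    backup_copies  -- number of full backup copies retained, stored without RAID
    """
    on_one_array = primary_tb * raid_overhead
    arrays_total = on_one_array * (1 + replicas)  # production array plus replicas
    backups_total = primary_tb * backup_copies
    return arrays_total + backups_total


# Illustrative assumptions only: 5 TB database, mirrored RAID,
# one replica, three retained full backups.
footprint = total_footprint_tb(
    primary_tb=5.0, raid_overhead=2.0, replicas=1, backup_copies=3
)
multiplier = footprint / 5.0
```

Under those assumptions the 5 TB of "small" data occupies 35 TB of raw capacity, seven times its logical size. Different RAID levels, deduplication, or backup retention policies would move that number considerably in either direction.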
There is, however, an economic force in play here. Depending on whose numbers you believe, big data on object stores is four to 20 times cheaper to manage than an equivalent amount of small data being managed by a production application in a Tier 1 SAN. The “apps” which are slowly arriving in healthcare (and may continue to arrive) may be happy just to slam a bunch of tags on an object and call it a day. Then we will have “tag oceans” and “tag bagging” toolsets with cute animal logos, and the circle of data will continue to self-perpetuate.
Jim Fitzgerald is technology strategist and EVP at Park Place International.