Readers Write: Big Data, Small Data, Meta Data, See Ya Latah

May 6, 2015 Readers Write 3 Comments

By Jim Fitzgerald

It’s the RESTful, object store, file and block make me snore, it’s still bits and bytes to me… (sorry, Billy)

I just got back from HIMSS. Big data, like savoir faire, is everywhere. The cynical side of me says that technology vendors just want to sell more disk or flash drives. The analytical technical businessperson somewhere inside me says that the real play for the people trying to sell you and me on big data is in the tool suites for managing, monitoring, sorting, searching, and processing big data. We will be lured in with open source tools like Hadoop, and then when the hook is deep enough, the vendor community will point out to us why we need their quasi-proprietary toolkit to enhance the “limited feature set” and “programmer required” aspects of Hadoop.

Don’t read me wrong. I think I am a fan of this. Why the qualification? Big data, taken to its logical extreme and paired with some artificial intelligence, can help my doctor process all the environmental, social, and lifestyle data related to me and correlate it with the highly structured “small data” in my electronic health record to zero in on, and advise on, the real underlying issues behind my health that go well beyond the “sick care symptom” I am presenting that day.

The vague and slowly clarifying healthcare zeitgeist around population health and “well care” probably won’t be realized without employing big data management techniques as an everyday tool. This apparent service to humankind will be aided and abetted by small and large chunks of data streaming up to the cloud from the “personal Internet of things” that I already own and the things I am considering, like Apple Watch.

The cautionary note comes from my informed-paranoid fear of Big Brother. I have Orwellian visions of the healthcare police showing up at my house and herding me into the quarantine van for a stint of “voluntary rehab” after some warehouse full of seemingly disconnected Facebook posts, Yelp reviews, sensor numbers, and Whole Foods Market receipts mistakenly puts me on a high-risk list for the next pandemic. I won’t even go off into the potential side rant on all my voluntary and involuntary surrenders of my privacy rights along the way, although I do think the court system should brace itself for the onslaught.

Let’s hope my paranoia amounts to nothing more than the receptionist not being a bit surprised that I showed up in the doctor’s office that day, because the data-lake-fed AI predicted I would, had already authorized my insurance, and had sucked all the available fresh data on me into a useful visualization for my clinicians.

What’s the difference between big data and small data? The short version is that big data is generally considered to be an unstructured collection of data objects. Unstructured in this usage implies that there is no classic structured database format imposed on the data. The unstructured data could be a song captured as MP3 or AAC, a simple list of my last 20 temperatures stored in my Apple Watch, or a photo just taken in the ED of the festering wound on my right leg.

Big data is generally big because it is a vast collection of objects. Sometimes big data is big because the individual objects are prodigious on their own; these are also known as BLOBs, or binary large objects. Think of your favorite “Breaking Bad” episodes that are still sitting on your iPad. An object could really be anything, including a file that has a structure and order of its own but is being considered as part of a greater set of data molecules in a “data lake.”

Storing data as objects, most commonly done on the Internet with RESTful storage protocols, is an increasingly normal trick in the world of data storage and management. When we store data as objects, we don’t care all that much about structure, or about the nature of the data, or about its accessibility by a particular file system or operating system. That problem is shifted from its traditional place in the OS or the storage array and is moved to the app. (Notice I did not say “application.”)

To the extent that we care about the objects in an object store (an allegedly safe place to put objects), we may tag them as they go in with meta data, which everyone who has followed the Edward Snowden story knows is “data about the data.” In fact, the object might get multiple tags. One might be a lookup address or unique ID in the object store, and one or more others might be some common descriptor of what is in the object itself. Hence the chaos of unstructured data may, in fact, have some external structure imposed on it by some rules-based system ingesting the data objects.
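For the technically curious, the tagging idea can be sketched in a few lines of Python. This is a toy, in-memory stand-in and not any vendor's object store or API. The class name `ToyObjectStore`, its methods, and the tag names (`kind`, `source`, `size_bytes`) are all made up for illustration, and deriving the unique ID from a content hash is just one assumed way of producing a lookup address.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ToyObjectStore:
    """Hypothetical in-memory object store; not a real product's API."""
    blobs: dict = field(default_factory=dict)  # object ID -> raw bytes
    tags: dict = field(default_factory=dict)   # object ID -> metadata tags

    def put(self, data: bytes, **descriptors: str) -> str:
        # First tag: a unique ID, here (by assumption) a hash of the content.
        object_id = hashlib.sha256(data).hexdigest()
        self.blobs[object_id] = data
        # Other tags: descriptors supplied by the ingesting app or its rules.
        self.tags[object_id] = {"size_bytes": str(len(data)), **descriptors}
        return object_id

    def get(self, object_id: str) -> bytes:
        return self.blobs[object_id]

    def find(self, **wanted: str) -> list:
        # The store never looks inside the bytes; it only matches tags.
        return [oid for oid, t in self.tags.items()
                if all(t.get(k) == v for k, v in wanted.items())]


store = ToyObjectStore()
photo_id = store.put(b"\xff\xd8 fake JPEG bytes", kind="photo", source="ED")
temps_id = store.put(b"98.6,98.7,99.1", kind="temperature-list", source="watch")

print(store.find(kind="photo") == [photo_id])  # True
```

Note that the store treats the photo and the temperature list identically. Whatever order exists lives in the tags, which is the "external structure" described above.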

In truth, small data is still where the rubber meets the road in today’s healthcare information systems. The organization or structure of that data by the HCIS in a pre-defined database provides the accuracy and confidence clinicians need to treat me and administrators need to bill me. That same structure also generates the endless arguments and the grossly inefficient cottage industry that has sprung up around HIEs. (Do we really need to argue about what the “first name” field means?)
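By way of contrast with an object store, here is a minimal sketch of what "pre-defined structure" buys you, using Python's built-in `sqlite3` module. The table and column names (`patient`, `mrn`, `temp_f`) and the temperature bounds are invented for illustration and are not taken from any real HCIS schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The structure is declared up front; every row must fit it.
conn.execute("""
    CREATE TABLE patient (
        mrn        TEXT PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL,
        temp_f     REAL CHECK (temp_f BETWEEN 80 AND 115)
    )
""")
conn.execute("INSERT INTO patient VALUES (?, ?, ?, ?)",
             ("000123", "Jim", "Example", 98.6))

# A row with no first name is refused by the database itself.
try:
    conn.execute("INSERT INTO patient VALUES (?, ?, ?, ?)",
                 ("000124", None, "Example", 98.6))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

rows = conn.execute("SELECT first_name, temp_f FROM patient").fetchall()
print(rows, rejected)  # [('Jim', 98.6)] True
```

The database, not the app, enforces what a valid record looks like. That is the source of the confidence, and also of the arguments over what each field means.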

Big data can provide inferential context for small data, but it cannot supplant the precise articulation or definitive metrics collected and presented, in context, to help treat me. Small data is so important that we protect it not only in the context of its integral structure in a database, but also in some cases at the file system, operating system, and storage subsystem levels. In many cases, via RAID technology, backups, and replicas, we have so many copies of the same small data that it is really not very small at all; but hey, in the days of petabyte and zettabyte data lakes, a few terabytes look more like a data puddle.

There is, however, an economic force in play here. Depending on whose numbers you believe, big data on object stores is four to 20 times cheaper to manage than an equivalent amount of small data being managed by a production application in a Tier 1 SAN. The “apps” that are slowly arriving in healthcare (and may continue to arrive) may be happy just to slam a bunch of tags on an object and call it a day. Then we will have “tag oceans” and “tag bagging” toolsets with cute animal logos, and the circle of data will continue to self-perpetuate.
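To put the "four to 20 times cheaper" range in concrete terms, here is a back-of-the-envelope sketch. The Tier 1 figure of 1,000 cost units per terabyte per year is a purely hypothetical placeholder, not a quoted price, and the function name is made up. Only the 4x and 20x factors come from the range cited above.

```python
def object_store_cost_range(tier1_cost, low_factor=4.0, high_factor=20.0):
    """Return (best case, worst case) object store cost for a given Tier 1 cost."""
    return tier1_cost / high_factor, tier1_cost / low_factor


# Hypothetical Tier 1 SAN cost: 1,000 units per terabyte per year.
best, worst = object_store_cost_range(1000.0)
print(best, worst)  # 50.0 250.0
```

Even at the pessimistic end of the range, the same terabyte costs a quarter as much to manage, which is the economic pull toward tagging objects and calling it a day.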

Jim Fitzgerald is technology strategist and EVP at Park Place International.

Mike Lucey says:

May 7, 2015 at 1:14 pm

Well said Jim,

I echo your concern about the collusion of big data and implied permissions that I scatter around my “things”. I fully expect the day will come when my thermostat will auto-order my meat lovers pizza while my scale is auto-requesting a Lipitor prescription.

Matt Durkin says:

May 7, 2015 at 6:22 pm

Jim, I enjoyed your blog. My only question now is if your doctor is going to call me to remind you to get your flu shot!

dave hughes says:

December 9, 2015 at 2:22 pm

My, my – Jim, as insightful as I remember you from many years ago. Relatively unassailable premise and conclusion.

all the best
d
