Readers Write: Almost Real, But Not Quite: Synthetic Data and Healthcare

December 20, 2017

By David Watkins

David Watkins, MS is a data scientist at PCCI in Dallas, TX.

We all want to make clinical prediction faster and better so we can rapidly translate the best models into the best outcomes for patients. At the same time, we know from experience that no organization can single-handedly transform healthcare. Valuable information hidden in data silos across the healthcare landscape could help demystify the complexities of cost and outcomes in the United States. However, privacy and compliance concerns around those silos limit transparency and collaboration, making data access difficult, expensive, and resource-intensive for many innovators.

Until recently, the only way to share clinical research data has been de-identification: selectively removing the most sensitive elements so that records cannot be traced back to the actual patient. This is a fair compromise, with some important caveats.

With any de-identified data, we are making a tradeoff between confidentiality and richness, and there are several practical approaches spanning that spectrum. The most automated and private method, so-called “Safe Harbor” de-identification, is also the strictest about what elements to remove. Records de-identified in this way can be useful for many research cases, but not for time-sensitive predictions, since all date/time fields are reduced to the year only.
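To make the date limitation concrete, here is a minimal sketch in Python. It assumes a hypothetical encounter record with `admit` and `discharge` timestamps (the field names, values, and the `year_only` helper are illustrative, not from any real schema or tool) and covers only the date rule mentioned above, not the full set of elements a de-identification process removes.

```python
from datetime import datetime

def year_only(record, date_fields):
    """Return a copy of the record with each listed date field reduced to its year."""
    out = dict(record)
    for field in date_fields:
        out[field] = record[field].year
    return out

# Hypothetical encounter; names and values are illustrative only.
encounter = {
    "admit": datetime(2017, 3, 1, 22, 15),
    "discharge": datetime(2017, 3, 4, 9, 30),
    "diagnosis": "pneumonia",
}

deidentified = year_only(encounter, ["admit", "discharge"])

# With full timestamps, length of stay in hours is recoverable.
stay_hours = (encounter["discharge"] - encounter["admit"]).total_seconds() / 3600

# After reduction both fields hold the same year, so the interval is lost.
stay_after = deidentified["discharge"] - deidentified["admit"]
```

A model that depends on length of stay, time since last visit, or hour of admission would have nothing to work with once those fields collapse to a single year.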

At the other extreme, it is possible to share more sensitive and rich data as a “Limited Data Set” to be used for research. Data generated under this standard still contains protected health information and can only be shared between institutions that have signed an agreement governing its use. This model works for long-term research projects, but it can require lengthy contracting up front, and the data remains locked within partner institutions, too sensitive to share widely.

What would be a novel yet pragmatic way to catalyze analytics advancement in the healthcare industry? We are exploring “synthetic data”: data created from a real data set to reflect its clinical and statistical properties without exposing any identifying information.

Pioneering work is being done to create synthetic data that is clinically and statistically equivalent to a real data source without recreating any of the original observations. This notion has been around for a while, but its popularity has grown as we’ve seen impressive demonstrations that implement deep learning techniques to generate images and more. If it’s possible to generate endless realistic cat faces, could we also generate patient records to enable transparent, reproducible data science?

The deep learning approach works by setting up two competing networks: a generator that learns to create realistic records and a discriminator that learns to distinguish between real and fake records. As these two networks are trained together, they learn from their mistakes and the quality of the synthesized data improves. Newer approaches even allow us to further constrain the training of these networks to match specific properties of the input data, and to guarantee a designated level of privacy for patients in the training data.
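The two-network idea can be illustrated with a deliberately tiny sketch in Python. Everything here is an assumption made for illustration: the "record" is a single number, the generator is a linear function of noise rather than a deep network, the discriminator is a logistic regression, and the function name `train_toy_gan` is hypothetical. It shows only the alternating update pattern, not a production method, and it includes no privacy guarantee.

```python
import math
import random

def sigmoid(s):
    # Numerically stable logistic function.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def train_toy_gan(real_values, steps=2000, lr=0.01, seed=0):
    """Alternate discriminator and generator updates on 1-D data.

    Generator:      G(z) = a * z + b, with z drawn from N(0, 1)
    Discriminator:  D(x) = sigmoid(w * x + c)
    """
    rng = random.Random(seed)
    a, b = 1.0, 0.0   # generator parameters
    w, c = 0.0, 0.0   # discriminator parameters
    for _ in range(steps):
        x_real = rng.choice(real_values)
        z = rng.gauss(0.0, 1.0)
        x_fake = a * z + b

        # Discriminator step: minimize -log D(x_real) - log(1 - D(x_fake)).
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        w -= lr * ((d_real - 1.0) * x_real + d_fake * x_fake)
        c -= lr * ((d_real - 1.0) + d_fake)

        # Generator step: minimize -log D(G(z)), the non-saturating loss.
        d_fake = sigmoid(w * x_fake + c)
        grad_x = (d_fake - 1.0) * w   # gradient of the loss w.r.t. x_fake
        a -= lr * grad_x * z
        b -= lr * grad_x
    return {"a": a, "b": b, "w": w, "c": c}

# Illustrative "real" data: 500 draws from N(4, 1).
data_rng = random.Random(1)
real = [data_rng.gauss(4.0, 1.0) for _ in range(500)]

params = train_toy_gan(real)
synthetic = [params["a"] * data_rng.gauss(0.0, 1.0) + params["b"] for _ in range(5)]
```

The discriminator is updated to separate real from generated values, and the generator is then updated to make its output look more real to the discriminator. Training of this kind is not guaranteed to converge, which is one reason careful evaluation of the generated data matters.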

We are investigating state-of-the-art methodologies to evaluate how effective the available techniques are at creating data sets. We are also devising strategies for overcoming technological and scientific barriers to open up an easy-access, realistic data platform that enables a rapid expansion of data-driven solutions in healthcare.
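One way to picture such an evaluation is a pair of very simple checks, sketched here in Python under loud assumptions: records are tuples of numeric features, statistical similarity is reduced to the gap between per-column means, and privacy is reduced to counting synthetic rows that exactly duplicate a real row. The function names and sample values are hypothetical, and a real evaluation would need much stronger tests on both fronts.

```python
from statistics import mean

def column_mean_gaps(real_rows, synthetic_rows):
    """Absolute difference between per-column means of the two data sets."""
    real_cols = list(zip(*real_rows))
    synth_cols = list(zip(*synthetic_rows))
    return [abs(mean(r) - mean(s)) for r, s in zip(real_cols, synth_cols)]

def exact_copies(real_rows, synthetic_rows):
    """Count synthetic rows that reproduce a real row verbatim."""
    real_set = set(real_rows)
    return sum(1 for row in synthetic_rows if row in real_set)

# Illustrative values only: (age, systolic blood pressure).
real = [(54, 130), (61, 142), (47, 118), (70, 155)]
synthetic = [(55, 131), (60, 140), (47, 118), (69, 152)]

gaps = column_mean_gaps(real, synthetic)   # one gap per column
copies = exact_copies(real, synthetic)     # rows leaked verbatim
```

Small gaps suggest the synthetic set preserves basic statistics, while any nonzero copy count flags a record that was recreated rather than synthesized.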

Can synthetic data be used to accelerate clinical research and innovation under strong privacy constraints?

In other data-intensive areas of research, new technologies and practices have enabled a culture of transparency and collaboration that is lacking in clinical prediction. The most impactful models are built on confidential patient records, so sharing data is vanishingly rare. Protecting patient privacy is an essential obligation for researchers, but privacy also creates a bottleneck for fast, open, and broad-based clinical data science. Synthetic data may be the solution healthcare has been waiting for.
