The expression “big data” leads to some pretty reasonable assumptions: 1) you need huge volumes of data for machine learning and 2) more is more. Neither is particularly helpful in healthcare.

We usually deal with smaller sets of rich but messy data (sample sizes in the hundreds or thousands). 10k rows vs 10M rows of claims data tend to be equally useful (or useless) for most problems. Smaller sets of data that capture a patient’s condition, clinical concerns, and actions are more useful than billions of rows of data captured elsewhere and used to build generic models for “average” patients. In healthcare the opportunity is in making the most of relevant data.

Ex. We’re helping a surgical hospital identify patients who are likely to have longer lengths of stay post-laminectomy, using only the data they’ll have at decision time: pre-operative surveys. 70% of these surveys are free text; the rest is Likert scale data (1 to 5). We have 429 cases to learn from. The results beat not only claims-based approaches but also the validated measures of acuity used by surgeons today (ASA scores). Unlike standard scores, where almost everyone is a 3 or 4, we can offer back lists of people that we’re 95% confident will have longer stays, then 85%, and so on, to inform decision making*. I’d be thrilled to have more relevant data, and we’re working on getting better pre- and post-op info based on this experience. But that’s a very different approach than “big data” implies.
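To make the "lists by confidence level" idea concrete, here is a minimal sketch of how predicted probabilities could be turned into tiered lists. It is an illustration, not the pipeline we actually run: the function name `tier_by_confidence`, the patient IDs, and the scores are all hypothetical, and it assumes "95% confident" means a model-predicted probability of at least 0.95.

```python
from typing import Dict, List, Tuple


def tier_by_confidence(
    scores: Dict[str, float],
    thresholds: Tuple[float, ...] = (0.95, 0.85),
) -> Dict[float, List[str]]:
    """Group patient IDs into confidence tiers.

    Each patient lands in the highest tier whose threshold they clear;
    patients below every threshold are left out of the lists.
    """
    ordered = sorted(thresholds, reverse=True)
    tiers: Dict[float, List[str]] = {t: [] for t in ordered}
    # Walk patients from most to least confident so each list is already ranked.
    for patient_id, p in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        for t in ordered:
            if p >= t:
                tiers[t].append(patient_id)
                break
    return tiers


# Hypothetical predicted probabilities of a longer stay (made-up values, not study data).
example_scores = {
    "pt-001": 0.97,
    "pt-002": 0.88,
    "pt-003": 0.52,
    "pt-004": 0.96,
    "pt-005": 0.85,
}
tiers = tier_by_confidence(example_scores)
for threshold, patients in tiers.items():
    print(f">= {threshold:.0%}: {patients}")
```

The point of tiers instead of a single score is that the surgical team can decide how far down the list to act, depending on how much capacity they have.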

One more example with even “smaller” data: We did work at the Department of Veterans Affairs to determine whether veterans with PTSD were receiving “best practice” care at reported rates. Here’s the link to the study. There we had therapy notes for 300 cases.

Big takeaway: Those hoping to use their data to solve specific problems in healthcare would be better off focusing on how to make the most of the relevant (albeit messy) data they have access to, rather than gathering and formatting large collections of data with an “if we build it they will come” approach.

*I can share more details if helpful. Just trying to walk that fine line between making a point with experience and blatant marketing.