Handling missing data in surgical metrics

  • User experience research
  • Data quality
  • Missing data

How I improved trust in data aggregations for clinical insights despite data gaps

Context

When does missing data impact interpretation and decision-making? Hospital supply chain users need to compare surgical cases on metrics like spend or patient outcomes, but missing data can skew those analyses, and even sporadic gaps may lead users to draw invalid conclusions without realizing it. It's important to identify when missing data changes the interpretation, and to be transparent about the data provided and about any aggregations or transformations applied to it.

Process

To ensure reliable data interpretation given missing values in surgical case metrics, I led the development of a new data check feature. I didn’t have direct access to end users for this study, but I did have access to internal professionals who either had direct experience in roles similar to our end users or who deeply understood the client context. I investigated the impact of missing data on aggregate values by comparing medians calculated from the available data against medians recalculated after imputing the missing values with extreme values. I then asked these experts how acceptable discrepancies of that size would be for clinical and administrative decisions.
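The comparison described above can be sketched in Python. This is a minimal illustration, not the analysis code used in the study: the function names are hypothetical, and using the observed minimum and maximum as the "extreme" fill values, along with a linear-interpolation percentile, are assumptions made for the example.

```python
from typing import Sequence, Tuple


def percentile(values: Sequence[float], q: float) -> float:
    """Return the q-th percentile (0-100) using linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    pos = (len(ordered) - 1) * q / 100
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def extreme_imputation_bounds(
    observed: Sequence[float], n_missing: int, q: float = 50.0
) -> Tuple[float, float]:
    """Recompute a percentile with every missing value set to an extreme.

    Assumption: the extremes are the smallest and largest observed values.
    Returns (value if all missing were low, value if all missing were high).
    """
    low_fill = min(observed)
    high_fill = max(observed)
    low = percentile(list(observed) + [low_fill] * n_missing, q)
    high = percentile(list(observed) + [high_fill] * n_missing, q)
    return low, high


# Illustrative data only: 100 observed values and 25 missing (20% of 125).
observed = list(range(1, 101))
print(percentile(observed, 50))                    # median of observed data
print(extreme_imputation_bounds(observed, 25, 50))  # worst-case median range
```

With this made-up data, the observed median is 50.5 and the worst-case medians are 38.0 and 63.0. The width of that range is the kind of discrepancy that experts would then be asked to judge as acceptable or not.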

Solution

My analysis revealed that professionals could trust medians with up to 20% missing values, but for the 10th and 90th percentiles that acceptability faded once more than 4% of values were missing. Based on these findings, I recommended two display rules: mask a calculation when the share of missing data exceeds its threshold, and mark any aggregate that includes missing data but stays within its threshold with an asterisk and an explanation.
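The recommendation can be read as a small decision rule. The sketch below is one possible encoding, not the shipped feature: the metric keys, the state names, and the choice to measure missingness as missing cases divided by all cases are assumptions for illustration. Only the 20% and 4% thresholds come from the findings above.

```python
# Maximum share of missing values at which each aggregate is still shown.
# The threshold values are from the study; the metric keys are hypothetical.
MASK_THRESHOLDS = {
    "median": 0.20,
    "p10": 0.04,
    "p90": 0.04,
}


def display_state(metric: str, n_missing: int, n_total: int) -> str:
    """Decide how to present an aggregate given its missing-data share.

    Returns "mask" (hide the value), "asterisk" (show it with a note),
    or "plain" (show it as-is because nothing is missing).
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_missing <= n_total:
        raise ValueError("n_missing must be between 0 and n_total")
    share_missing = n_missing / n_total
    if share_missing > MASK_THRESHOLDS[metric]:
        return "mask"
    if n_missing > 0:
        return "asterisk"
    return "plain"


print(display_state("median", 0, 50))  # plain
print(display_state("median", 5, 50))  # asterisk (10% missing)
print(display_state("p90", 5, 50))     # mask (10% missing exceeds 4%)
```

Keeping the thresholds in one lookup table makes it easy to adjust them per metric if the same assessment is repeated in another data context.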

Outcome

The feature changes introduced from this assessment greatly increased transparency into data quality for our users, and gave them confidence that the data displayed is actionable and not biased by missing data.

Insights

I gained a greater appreciation for statistical vs. clinical/administrative significance. In this case, clinical/administrative significance refers to the level of difference from an expected value that would cause a potential change in action.

Medians are far more robust than outlier percentiles. This makes sense: the exercise assumes missing values are extreme (“outlier”) values, and such values must make up a significant fraction of the set before they meaningfully shift the middle of a typical distribution. Outlier percentiles (such as the 10th or 90th) are defined closer to those extreme points, so they are less stable.
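A rough way to see this is to ask which percentile of the observed data a reported percentile could really correspond to if every missing value sat at one extreme. The sketch below is a large-sample approximation that ignores ties and interpolation. The function name is hypothetical, and the calculation is an illustration of the reasoning rather than part of the original analysis.

```python
from typing import Tuple


def observed_quantile_range(q: float, f: float) -> Tuple[float, float]:
    """Range of observed-data quantiles that a full-data quantile q could map to.

    q: target quantile as a fraction (0.5 for the median).
    f: fraction of all values that are missing (0 <= f < 1).
    If all missing values are low, q maps to (q - f) / (1 - f) of the
    observed data; if all are high, it maps to q / (1 - f).
    """
    if not 0 <= f < 1:
        raise ValueError("f must be in [0, 1)")
    if not 0 <= q <= 1:
        raise ValueError("q must be in [0, 1]")
    low = max(0.0, (q - f) / (1 - f))
    high = min(1.0, q / (1 - f))
    return low, high


print(observed_quantile_range(0.5, 0.20))  # median, 20% missing
print(observed_quantile_range(0.9, 0.04))  # 90th percentile, 4% missing
print(observed_quantile_range(0.9, 0.10))  # 90th percentile, 10% missing
```

Under these assumptions, a median with 20% missing data stays between roughly the 37.5th and 62.5th percentiles of the observed data, which is still the middle of the distribution. A 90th percentile with 10% missing data can already reach the observed maximum, and in a long-tailed metric even a small shift in rank near the tail can mean a large change in value.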

What I would do differently next time

I would have loved to spend more time working directly with clients to dig upstream and understand the cause of these missing data points.

While I think the method I developed is replicable in similar contexts, I would caution others against directly applying my bottom-line results elsewhere. That said, I’d love to replicate this process in other data contexts to understand whether any generalizations can be made about trust in aggregate values when there are known missing data points.

One unknown before this research was how sensitive aggregate metrics would be to missing data, and whether any missing data at all could make the resulting aggregates suspect. If that had been the case, we might have needed to rethink this product feature to maintain trust in the product itself. Fortunately, missing data wasn’t common within our clients’ data, and aggregate metrics were relatively robust to missingness, so this assessment didn’t challenge the product but rather strengthened it. Given enough time and resources, we would have wanted to confirm the level of data trust around this feature with our users.