Depression Incidence and Vitamin D Concentration in NHANES Survey Data
A Case Study in Collaboration and Causal Inference
The giant ‘God Warriors’ from Nausicaä were mindless monsters unleashed by the peoples of that story for self-defense, but they ultimately consumed and burned the world they were created to protect. There may be a lesson here about how we deploy statistical models.
Talk given at the Fall 2024 University of Idaho IIDS seminar series.
Abstract
While planned trials are the best option for evaluating causal pathways, observational data contain a wealth of information whose use does not have to be restricted to describing correlations. Causal inference is a valuable statistical methodology for establishing causality. It relies on deep domain knowledge, a clearly stated statistical hypothesis, and a transparent process that can be interrogated by other experts. Here we report on a successful collaboration to understand the impact of serum vitamin D concentration on depression risk among pregnant and postpartum women using public survey data from the CDC. We defined our estimand as the slope of vitamin D concentration on depression outcomes and constructed a directed acyclic graph of intervening variables to determine how to estimate the parameter of interest. Propensity scores were calculated to adjust for confounding variables, and we estimated our parameter using a Bayesian cumulative logit model for the ordinal depression categories. The results indicate how increasing vitamin D concentration impacts the risk of depression across the study cohorts. Statistical inference, while powerful, relies on carefully considered and clearly articulated hypothesized relationships to return accurate results.
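To make the role of the slope in a cumulative logit model concrete, the following is a minimal sketch in Python. It assumes the common parameterization logit P(Y <= k | x) = c_k - beta * x, which may differ from the one used in the talk. The cutpoints, the slope value, the three-category outcome, and the function names are all hypothetical illustrations, not estimates or code from this study.

```python
import math


def inv_logit(z):
    """Inverse logit (logistic) function."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probs(cutpoints, beta, x):
    """Category probabilities for an ordinal outcome under a cumulative
    logit model with logit P(Y <= k | x) = c_k - beta * x.

    cutpoints must be increasing; K cutpoints give K + 1 categories.
    """
    # Cumulative probabilities P(Y <= k | x); the last one is 1 by definition.
    cum = [inv_logit(c - beta * x) for c in cutpoints] + [1.0]
    # Successive differences give the probability of each category.
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]


# Hypothetical values chosen only for illustration (NOT study results):
# two cutpoints separate three ordered depression categories, and x is a
# standardized vitamin D concentration.
cutpoints = [0.5, 2.0]
beta = -0.3

for x in (-1.0, 0.0, 1.0):
    probs = category_probs(cutpoints, beta, x)
    print(x, [round(p, 3) for p in probs])
```

Under this parameterization, a negative slope shifts probability toward the lowest category as x increases, which is the sense in which a single slope summarizes the effect across all ordinal categories. A full analysis along the lines described above would additionally place priors on the cutpoints and slope and adjust for confounding, for example through the propensity scores.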