Wednesday, April 01, 2015, 2:00 PM - 3:00 PM, CNLS Conference Room (TA-3, Bldg 1690)
Causality Streamlines: Uncovering Disease Etiology From Zero-knowledge Machine Inference of Statistical Causality
Ishanu Chattopadhyay, University of Chicago
While correlation measures are used extensively to discern statistical relationships in almost every branch of data-driven scientific inquiry, what we are really interested in is the existence of causal dependence. A principled causality detection algorithm, if available, would transform clinical informatics. We could then use the rapidly expanding databases of detailed medical information to ask questions not previously deemed answerable. Do co-occurrences of different pathologies indicate causally related etiologies? Can we isolate the causal contributions of different variables to a particular disease progression from large-scale clinical databases of patient history? Given the reported incidences of a particular disease phenotype (e.g., Staphylococcus aureus infections) over time, at the spatial resolution of US counties, can we computationally discern whether hydrological features such as streams, lakes, and rivers causally impact the spread of that infection? For similar data on influenza, can we identify the causal precursors of the flu season? Does the migrating avian population (particularly mallards), in which human influenza viruses have been shown to be present, play a causal role in triggering the flu season every year? Notably, many such questions relate to emergent phenomena: these make sense only at the population level and are answerable only if we can confidently reverse-engineer causal influence from data.
This talk delineates a non-parametric computation of such driving influences from sequential data streams, essentially solving Granger's prima facie causality inference without assuming any specific dynamical structure beyond ergodicity and stationarity. In contrast to regression-based implementations of Granger's idea, no specific model structure needs to be injected a priori; and unlike existing nonparametric tests, e.g., the Hiemstra-Jones correlation integrals, which test for causal influence at a pre-specified significance level, we can actually infer generative models of cross-dependence from data. This combines the advantage of obtaining generative models, enjoyed by parametric regression-based approaches, with the non-parametric inference achieved by the HJ test, creating a powerful new tool for causal inquiry with uncharted implications for biomedical informatics.
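For readers unfamiliar with the regression-based baseline that the abstract contrasts against, here is a minimal sketch of a linear Granger test. It is not the speaker's non-parametric method. It fits y on its own p lags (restricted model) and on its own lags plus p lags of x (full model), then compares residual sums of squares with an F statistic. The function name `granger_f_stat`, the default lag order `p=2`, and the inclusion of an intercept are illustrative assumptions. The sketch assumes stationary series and returns only the statistic, not a p-value. The fixed linear lag structure is exactly the kind of a priori model injection the abstract aims to avoid.

```python
import numpy as np


def granger_f_stat(x, y, p=2):
    """F statistic for the hypothesis 'x Granger-causes y' (linear OLS, lag order p).

    Illustrative sketch only: assumes stationary series and a linear lag model.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    target = y[p:]  # y_t for t = p .. n-1

    # Column k-1 holds the series lagged by k steps, aligned with `target`.
    y_lags = np.column_stack([y[p - k : n - k] for k in range(1, p + 1)])
    x_lags = np.column_stack([x[p - k : n - k] for k in range(1, p + 1)])
    ones = np.ones((n - p, 1))

    restricted = np.hstack([ones, y_lags])        # y's own past only
    full = np.hstack([ones, y_lags, x_lags])      # y's past plus x's past

    def rss(design):
        # Ordinary least squares fit; return the residual sum of squares.
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        return float(resid @ resid)

    rss_r, rss_f = rss(restricted), rss(full)
    df_num = p                          # number of added x-lag coefficients
    df_den = (n - p) - full.shape[1]    # residual degrees of freedom of full model
    return ((rss_r - rss_f) / df_num) / (rss_f / df_den)


if __name__ == "__main__":
    # Synthetic example: y is driven by the previous value of x, not vice versa.
    rng = np.random.default_rng(0)
    x = rng.standard_normal(500)
    y = np.zeros(500)
    y[1:] = 0.8 * x[:-1] + 0.5 * rng.standard_normal(499)
    print("F(x -> y):", granger_f_stat(x, y))
    print("F(y -> x):", granger_f_stat(y, x))
```

On this synthetic data the statistic for x driving y should be much larger than the reverse direction. Converting the statistic to a significance level would require the F(p, df_den) distribution, which this sketch leaves out.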
Host: Sara del Valle