Center for Nonlinear Studies
Thursday, June 05, 2014
10:00 AM - 11:00 AM
CNLS Conference Room (TA-3, Bldg 1690)

Seminar

Data Smashing: Universal Similarity To Computational Causality In Complex Systems

Ishanu Chattopadhyay
Cornell University

From gravitational fluctuations in space-time to signaling events in living cells, natural phenomena are often driven by complex stochastic processes. The ability to disambiguate the underlying physics or biology from realistic finite-time observations, in the absence of a priori system knowledge, is key to unraveling some of the most challenging scientific mysteries of our time. Such investigation depends crucially on the ability to compare and contrast data: to identify connections and spot outliers. The discriminating characteristics to look for in data are often determined by heuristics designed by experts; e.g., distinct shapes of “folded” lightcurves may be used as “features” to classify variable stars, while determining pathological brain states might require a Fourier analysis of brainwave activity. Finding good features is non-trivial, and is the key bottleneck in automating the search for novel phenomena. Here, we propose a universal solution to this problem: we delineate a principle for quantifying universal causal similarity between sources of arbitrary data streams, without a priori knowledge, features or training. We uncover an algebraic structure on a space of symbolic models for quantized data, and show that such stochastic generators may be added and uniquely inverted, and that a model and its inverse always sum to the generator of flat white noise. Therefore, every data stream has an anti-stream: data generated by the inverse model. Similarity between two streams, then, is the degree to which one, when summed with the other’s anti-stream, mutually annihilates all statistical structure to noise. We call this data smashing. We present diverse applications, including disambiguation of brainwaves pertaining to epileptic seizures, detection of anomalous cardiac rhythms, cognitive fingerprinting, and classification of astronomical objects from raw photometry.
In our examples, the data smashing principle, without access to any domain knowledge, meets or exceeds the performance of specialized algorithms tuned by domain experts. Finally, we show how such zero-knowledge techniques lay the framework for seeking out incipient causality networks in complex systems, with primary examples being emerging high-order positional correlations in the molecular evolution of retroviral genomes, causal connections in high-frequency price fluctuations in the financial market, and long-range spatial dependencies in seismic events.
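
The anti-stream idea can be illustrated with a toy sketch for the simplest possible case: memoryless (i.i.d.) binary streams. The operations below are assumptions made for illustration only and are not taken from the abstract: "inversion" is assumed to swap the two symbols, and "summation" is assumed to keep only the positions where two streams agree. The function names (biased_stream, invert, smash, ones_fraction) are hypothetical. Under these assumptions, a stream summed with the anti-stream of an independent stream from the same source should have symbol frequencies close to uniform, while streams from different sources should leave residual structure.

```python
import random

def biased_stream(p_one, n, rng):
    """Draw n i.i.d. binary symbols with P(symbol == 1) = p_one."""
    return [1 if rng.random() < p_one else 0 for _ in range(n)]

def invert(stream):
    """Assumed anti-stream for the binary memoryless case: swap 0 and 1."""
    return [1 - s for s in stream]

def smash(a, b):
    """Assumed stream summation: keep a symbol only where both streams agree."""
    return [x for x, y in zip(a, b) if x == y]

def ones_fraction(stream):
    """Fraction of 1s; 0.5 means no structure for a memoryless binary stream."""
    return sum(stream) / len(stream)

rng = random.Random(0)                  # fixed seed for repeatability
s1 = biased_stream(0.8, 200_000, rng)   # source A
s2 = biased_stream(0.8, 200_000, rng)   # independent stream, also source A
s3 = biased_stream(0.6, 200_000, rng)   # different source B

same = smash(s1, invert(s2))            # same source: structure should cancel
diff = smash(s1, invert(s3))            # different sources: bias should remain

print("same-source deviation from 0.5:", abs(ones_fraction(same) - 0.5))
print("diff-source deviation from 0.5:", abs(ones_fraction(diff) - 0.5))
```

In this toy setting the only statistical structure is the symbol bias, so the deviation of the 1s fraction from 0.5 serves as the dissimilarity score. Sources with memory would need a test for flat white noise that also checks dependence on symbol history, which this sketch does not attempt.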

Host: Marian Anghel