Center for Nonlinear Studies

Monday, March 02, 2009
3:00 PM - 4:00 PM
CNLS Conference Room (TA-3, Bldg 1690)
Colloquium
Untangling Visual Object Recognition in the Brain
Jim DiCarlo
McGovern Institute for Brain Research and Dept. of Brain and Cognitive Sciences, Massachusetts Institute of Technology
Although object recognition is fundamental to our behavior and seemingly effortless, it is a remarkably challenging computational problem because the visual system must somehow tolerate tremendous image variation produced by different views of each object (the “invariance” problem). In this talk, I will present a framework for thinking about that computational crux of object recognition and how it might be solved (“untangling” object manifolds). Our current neurophysiological evidence suggests that the primate brain accomplishes this untangling by gradually transforming its initial neuronal population representation (essentially a photograph on the retina) into a new, explicit form of neuronal population representation at the highest level of the primate ventral visual stream (inferior temporal cortex, IT). We have recently discovered that unsupervised learning of naturally occurring temporal contiguity cues in the visual environment can play a key role in constructing the untangling solution in IT.
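The abstract does not say which learning rule IT neurons might use. One classic computational model of learning invariance from temporal contiguity is Földiák's trace rule, swapped in here purely for illustration: a Hebbian update driven by a temporally smoothed copy of a unit's output, so that views which follow one another in time come to drive the same unit. The sketch below is a minimal NumPy version under stated assumptions: it uses explicit weight normalization in place of Földiák's decay term, the helper `train_trace_unit` and its parameters `eta` and `alpha` are hypothetical, and the one-hot "views" `A0` and `A1` are toy stand-ins for images.

```python
import numpy as np

def train_trace_unit(frames, w_init, eta=0.2, alpha=0.05):
    """Train one linear unit with a trace-rule variant (hypothetical helper).

    frames: array of shape (T, n_inputs), presented in temporal order.
    eta:    trace update rate; eta=1.0 makes the trace equal the current
            output, i.e. plain Hebbian learning with no temporal memory.
    alpha:  learning rate.
    """
    w = np.asarray(w_init, dtype=float)
    w = w / np.linalg.norm(w)
    trace = 0.0
    for x in frames:
        y = float(w @ x)                        # current response
        trace = (1.0 - eta) * trace + eta * y   # low-pass filtered response
        w = w + alpha * trace * x               # Hebbian step gated by the trace
        w = w / np.linalg.norm(w)               # keep weights on the unit sphere
    return w

# Hypothetical toy "images": two views of object A as one-hot inputs.
A0 = np.array([1.0, 0.0, 0.0, 0.0])
A1 = np.array([0.0, 1.0, 0.0, 0.0])
frames = np.array([A0, A1] * 200)   # the two views of A alternate in time

# The unit starts out responding only to view A0.
w_trace = train_trace_unit(frames, w_init=A0, eta=0.2)  # with temporal trace
w_hebb = train_trace_unit(frames, w_init=A0, eta=1.0)   # no temporal memory
print("trace rule:", w_trace.round(3))
print("plain Hebb:", w_hebb.round(3))
```

With `eta=1.0` the unit never acquires a response to view A1, because its output is zero whenever A1 is shown and the Hebbian step therefore vanishes. With the trace, activity evoked by A0 carries over into the following A1 frame, which is what binds the two views to the same unit.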

The only way to know whether such neuroscience results can explain visual recognition is to incorporate them into instantiated computational models. But the challenges are formidable: 1) neuroscience data do not fully constrain many of the important parameters (“details”) of such models, 2) the primate visual system operates at high dimensionality and with years of natural experience, and 3) the community lacks well-defined methods of assessing the progress of such models. To approach these problems, we and our collaborators are leveraging recent advances in stream processing hardware (high-end GPUs and the PlayStation 3's Cell processor). In analogy to high-throughput screening approaches in molecular biology, we are screening among thousands of network architectures using appropriate recognition benchmarks. We have found that this approach yields reproducible gains in recognition performance and can offer insight into which model parameters are most important. As available computational power continues to expand and new neuroscience data are acquired, this approach has the potential to greatly accelerate our understanding of how the visual system accomplishes object recognition.
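The screening idea can be sketched as a random search: sample many candidate architectures from a parameter space, score each on a benchmark, and keep the best. This is a minimal sketch under stated assumptions, not the authors' pipeline: the parameter names in `SEARCH_SPACE`, the helpers `sample_config` and `screen`, and `toy_benchmark` are all hypothetical, and the toy score stands in for the expensive step of building, training, and testing each model (the step that GPU or Cell hardware would accelerate).

```python
import random

# Hypothetical search space: names and values are illustrative, not the
# parameters used in the work described in the talk.
SEARCH_SPACE = {
    "n_layers": [1, 2, 3],
    "filter_size": [3, 5, 7, 9],
    "n_filters": [16, 32, 64, 128],
    "pool_size": [3, 5, 9],
}

def sample_config(space, rng):
    """Draw one candidate architecture uniformly at random from the space."""
    return {name: rng.choice(values) for name, values in space.items()}

def screen(space, evaluate, n_candidates=1000, top_k=5, seed=0):
    """Score n_candidates random configs; return the top_k (score, config) pairs."""
    rng = random.Random(seed)  # fixed seed so the screen itself is repeatable
    scored = []
    for _ in range(n_candidates):
        config = sample_config(space, rng)
        scored.append((evaluate(config), config))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # best score first
    return scored[:top_k]

def toy_benchmark(config):
    """Stand-in for building, training and testing a model on a recognition benchmark."""
    return (config["n_filters"] / 128
            + 0.1 * config["n_layers"]
            - 0.01 * config["filter_size"])

best = screen(SEARCH_SPACE, toy_benchmark, n_candidates=200, top_k=3)
for score, config in best:
    print(f"{score:.3f}  {config}")
```

Because the abstract stresses reproducible gains, a natural follow-up (omitted here) would be to re-evaluate the top-ranked candidates on independent data before trusting their ranking, since a single noisy benchmark score can favor lucky configurations.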

Host: Vadas Gintautas, T-4/CNLS