Center for Nonlinear Studies
Thursday, June 23, 2011
2:00 PM - 3:00 PM
CNLS Conference Room (TA-3, Bldg 1690)

Postdoc Seminar

Task-specific saliency from sparse, hierarchical models of visual cortex compared to eye-tracking data for object detection in natural video sequences, now with color!

Michael Ham

HMAX/Neocognitron models of visual cortex use learned hierarchical (sparse) representations to describe visual scenes. State-of-the-art accuracy has been reported for these models on whole-image labeling tasks using natural still imagery (Serre et al. [4]). Generalizations of these models (e.g., Brumby et al., AIPR 2009) allow localized detection of objects within a scene. Itti and Koch [21] have proposed non-task-specific models of visual attention ("saliency maps"), which have been compared to human and animal data using eye-tracking systems. Chikkerur et al. [25] have used eye-tracking to compare human visual fixations on objects in detection tasks within still images (finding pedestrians and vehicles in urban scenes) with an extension of an HMAX model that adds a model of attention in parietal cortex. Here, we describe new work comparing human eye-tracking data for object detection in natural video sequences to task-specific saliency maps generated by PANN (Petascale Artificial Neural Network), our high-performance implementation of an HMAX/Neocognitron-type sparse, hierarchical model of the ventral pathway of visual cortex. We explore specific object detection tasks, including vehicle detection in aerial video from a low-flying aircraft, for which we collect eye-tracking data from several human subjects. We train our model using hand-marked training data on a few frames and compare our results to eye-tracking data over an independent set of test video sequences. We also compare our task-specific saliency maps to non-task-specific saliency maps (Itti et al., PAMI 1998 [22]; Harel et al., NIPS 2006 [23]).
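The abstract does not say which metric is used to compare saliency maps with eye-tracking data. As an illustration only, one widely used score is the normalized scanpath saliency (NSS): the saliency map is z-scored, and the score is the mean of the z-scored values at the fixated pixels. The sketch below assumes a saliency map stored as a list of rows and fixations given as (row, column) pairs; the function name `nss`, the toy map, and the fixation lists are hypothetical and are not taken from the talk.

```python
from statistics import mean, pstdev


def nss(saliency, fixations):
    """Normalized scanpath saliency (illustrative sketch).

    saliency  : list of rows of floats, one saliency map for one frame
    fixations : list of (row, col) pixel coordinates of fixations (non-empty)
    """
    values = [v for row in saliency for v in row]
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation over the whole map
    if sigma == 0:
        return 0.0  # a flat map says nothing about where people look
    # Average the z-scored saliency at each fixated pixel.
    return mean((saliency[r][c] - mu) / sigma for r, c in fixations)


# Hypothetical 3x3 saliency map with a single salient pixel in the center.
saliency_map = [
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0],
]
on_target = [(1, 1)]           # made-up fixation on the salient pixel
off_target = [(0, 0), (2, 2)]  # made-up fixations on the background

print(nss(saliency_map, on_target))
print(nss(saliency_map, off_target))
```

A positive score means fixations fall on regions the map marks as more salient than average, and a negative score means they fall on less salient regions. Under these assumptions, a task-specific map and a non-task-specific map could be scored against the same fixations frame by frame.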

Host: Peter Loxley