Center for Nonlinear Studies

Wednesday, March 21, 2012
3:00 PM - 4:00 PM
CNLS Conference Room (TA-3, Bldg 1690)
Seminar
Eyes Wide Open: Iterative Discovery in Large Data Sets without Premature Specialization
Kiri Wagstaff
Jet Propulsion Laboratory
What is the best way to dive in and explore a new data set? We pose a new machine learning problem, iterative discovery, that seeks to enable users to interactively explore a large data set and quickly identify items of interest. At each iteration, the system selects an item for the user to review, and the user provides feedback as to whether it is interesting or not. The system must retain a strong exploratory bias to avoid premature specialization (seeking items similar to those known to be interesting, and missing out on different but equally interesting items). Unlike active learning, the goal is to select items of most use to the user, not to the system. I will describe a solution called Discovery through Eigenbasis Modeling of Uninteresting Data (DEMUD) that avoids premature specialization on the positive (interesting) class by modeling the negative (uninteresting) feedback only. DEMUD is especially effective when the class of interest is rare and/or heterogeneous. I will share results of experiments with image, planetary science, and astronomy data, in which we find that DEMUD discovers items of interest faster than methods that model the positive class, including active learning.
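
The select-review-update loop described above can be illustrated with a minimal Python sketch. It is not the speaker's implementation. It assumes that "eigenbasis modeling" means fitting the mean and the leading principal direction of the items marked uninteresting, and that the next item shown is the one this model reconstructs worst. The function names, the rank-1 basis, the arbitrary first pick, and the toy data are all hypothetical choices made for illustration.

```python
import math

def fit_model(rows, iters=100):
    """Mean plus leading principal direction of rows, via power iteration.
    Assumption: a rank-1 basis stands in for the 'eigenbasis'."""
    dim = len(rows[0])
    mean = [sum(r[j] for r in rows) / len(rows) for j in range(dim)]
    centered = [[r[j] - mean[j] for j in range(dim)] for r in rows]
    v = [float(j + 1) for j in range(dim)]  # arbitrary non-zero start vector
    for _ in range(iters):
        # w = (X^T X) v, computed without forming the covariance matrix
        w = [0.0] * dim
        for r in centered:
            s = sum(a * b for a, b in zip(r, v))
            for j in range(dim):
                w[j] += s * r[j]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # no variance to model yet
            v = [0.0] * dim
            break
        v = [x / norm for x in w]
    return mean, v

def reconstruction_error(x, model):
    """Squared residual of x after projecting onto the modeled direction."""
    mean, v = model
    c = [a - m for a, m in zip(x, mean)]
    s = sum(a * b for a, b in zip(c, v))
    return sum((a - s * b) ** 2 for a, b in zip(c, v))

def discover(items, is_interesting, budget):
    """Show up to `budget` items; return indices the user marked interesting."""
    remaining = list(range(len(items)))
    uninteresting, found = [], []
    for _ in range(min(budget, len(items))):
        if uninteresting:
            model = fit_model(uninteresting)
            pick = max(remaining,
                       key=lambda i: reconstruction_error(items[i], model))
        else:
            pick = remaining[0]  # assumption: arbitrary first pick
        remaining.remove(pick)
        if is_interesting(pick):
            found.append(pick)  # positive feedback is recorded, never modeled
        else:
            uninteresting.append(items[pick])  # only negatives update the model
    return found

# Toy data (hypothetical): five background points on a line, two outliers.
items = [[0, 0], [1, 0], [2, 0], [3, 0], [4, 0], [2, 5], [1, -6]]
found = discover(items, lambda i: i in {5, 6}, budget=4)
```

Because interesting items never enter the model, finding one outlier does not pull later selections toward items that resemble it, which is the premature specialization the abstract warns against.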

Host: Garrett Kenyon, gkenyon@lanl.gov, 7-1900, IS & T