Center for Nonlinear Studies
Wednesday, July 19, 2017
11:00 AM - 12:00 PM
CNLS Conference Room (TA-3, Bldg 1690)

Seminar

Visual Question Answering: Algorithms, Datasets, & Challenges

Christopher Kanan
Rochester Institute of Technology

Algorithms for extracting semantic information from images and video have improved dramatically over the past five years, with today's best deep convolutional neural networks (CNNs) now rivaling humans at image recognition. These successes have prompted researchers to build new systems capable of a multitude of tasks. In Visual Question Answering (VQA), an algorithm is given a text-based question about an image and must produce an answer. Although the first VQA datasets were released less than three years ago, algorithms are already approaching human performance. However, these results may be misleading due to biases in existing benchmarks. In this talk, I review the current state of VQA algorithms, including algorithms from my lab. I then analyze existing VQA datasets and demonstrate that they have severe flaws and limitations. Lastly, I discuss what a better dataset would look like, and examine which kinds of questions are easy and which are hard for today's best algorithms.
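The benchmark-bias concern can be illustrated with a toy sketch (not from the talk; the data and helper names are invented for illustration): a "blind" baseline that never looks at the image, and simply returns the most frequent training answer for each question type, can score surprisingly well when a dataset's answer distribution is skewed.

```python
from collections import Counter

# Toy training set of (question, answer) pairs with a skewed answer
# distribution, mimicking the language priors in early VQA benchmarks.
train = [
    ("what color is the banana", "yellow"),
    ("what color is the taxi", "yellow"),
    ("what color is the sky", "blue"),
    ("is there a dog", "yes"),
    ("is there a cat", "yes"),
    ("is there a horse", "no"),
]

def question_type(question):
    """Crude question-type key: the first two words of the question."""
    return " ".join(question.split()[:2])

# Tally answers per question type; the image is never consulted.
prior = {}
for q, a in train:
    prior.setdefault(question_type(q), Counter())[a] += 1

def blind_answer(question):
    """Answer with the most common training answer for this question type."""
    counts = prior.get(question_type(question))
    return counts.most_common(1)[0][0] if counts else "unknown"

# The blind baseline "answers" unseen questions plausibly without any image:
print(blind_answer("is there a bird"))          # -> "yes"
print(blind_answer("what color is the ocean"))  # -> "yellow"
```

On a biased benchmark such a baseline inflates apparent performance, which is why accuracy alone can make VQA systems look closer to human-level than they are.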

Host: Amy Larson