Algorithms for extracting semantic information from images and video have dramatically improved over the past five years, with today's best deep convolutional neural networks (CNNs) now rivaling humans at image recognition. These successes have prompted researchers to build new systems capable of a multitude of tasks. In Visual Question Answering (VQA), an algorithm is given a text-based question about an image, and it must produce an answer. Although the first VQA datasets were released less than three years ago, algorithms are already approaching human performance. However, these results may be misleading due to biases in existing benchmarks. In this talk, I review the current state of VQA algorithms, including algorithms from my lab. I then analyze existing datasets for VQA and demonstrate that they have severe flaws and limitations. Lastly, I discuss what a better dataset would look like, and examine which kinds of questions are easy and which are hard for today's best algorithms.
Host: Amy Larson
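The VQA task described above pairs an image with a free-form question and expects a short textual answer. As a minimal sketch of that interface, the toy below stands in for a real model: the "image" is pre-extracted structured data (a set of detected object labels) and the answering logic is a trivial keyword match. All names here (`answer_question`, the object-set representation) are hypothetical illustrations, not part of any system discussed in the talk.

```python
# Toy sketch of the VQA interface: map an (image, question) pair to
# a short text answer. The image is represented by a hypothetical
# set of detected object labels, and the "model" is a trivial
# pattern match for existence questions -- not a real VQA system.

def answer_question(image_objects: set, question: str) -> str:
    """Return a short answer to a simple existence question about the image."""
    q = question.lower().rstrip("?")
    if q.startswith("is there a "):
        target = q.removeprefix("is there a ")
        return "yes" if target in image_objects else "no"
    return "unknown"  # anything beyond this template is out of scope

image = {"dog", "frisbee", "grass"}
print(answer_question(image, "Is there a dog?"))  # prints "yes"
print(answer_question(image, "Is there a cat?"))  # prints "no"
```

Even this stub hints at the bias problem raised in the abstract: if a benchmark's existence questions are overwhelmingly answered "yes", a model can score well by exploiting the answer distribution rather than looking at the image.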