Center for Nonlinear Studies

Wednesday, July 10, 2013
3:00 PM - 4:00 PM
CNLS Conference Room (TA-3, Bldg 1690)
Seminar
Error Estimation and Classification in the Context of Prior Knowledge
Edward Dougherty
Texas A&M University
Epistemologically, the most important aspect of a classifier is its error rate because this rate characterizes its predictive capacity. Since, absent knowledge of the feature-label distribution, the error rate must be estimated, error estimation is critical to classification. Absent any prior knowledge whatsoever, that is, in a completely distribution-free scenario, accurate error estimation in small-sample design is highly problematic. In particular, because the sample is too small to hold out test data, the training data must also be used for error estimation. Very rarely are any performance bounds known and, when they are known, they require very large samples, so they are not applicable to small-sample settings. The alternative to distribution-free methods is to utilize prior knowledge in the form of a prior distribution on an uncertainty class of feature-label distributions. In this setting, given sufficient prior knowledge, good error estimation can be achieved. Moreover, since it is necessary to utilize prior knowledge for error estimation, one may as well forgo classification rules altogether and simply find the MSE-optimal classifier, given the prior knowledge and the data. This talk discusses the difficulties of small-sample error estimation, error estimation in the context of prior knowledge, and MSE-optimal classifier design.

Host: Josephine Olivas