Assessing Active Learning Performance at Runtime

Daniel Kottke, Jim Schellinger, Denis Huseljic, and Bernhard Sick

Classification algorithms aim to predict an unknown label (e.g., a quality class) for a new instance (e.g., a product). To this end, training samples (instances and labels) are used to infer classification hypotheses. Often, instances are relatively easy to capture, but acquiring the corresponding labels remains difficult or expensive. Active learning algorithms select the most beneficial instances to be labeled in order to reduce cost. In research, this labeling procedure is simulated, so a ground truth is available. During deployment, however, active learning is a one-shot problem: acquiring additional labels only to evaluate the performance of the algorithm would increase cost and is therefore counterproductive. In this article, we formalize the task and review existing strategies to assess the performance of an actively trained classifier during training. Furthermore, we identify three major challenges: 1) to derive a performance distribution, 2) to preserve the representativeness of the labeled subset, and 3) to correct for the sampling bias induced by an intelligent selection strategy. A qualitative analysis and experiments with differently biased selections evaluate the identified assessment approaches, and their advantages and drawbacks are discussed. All plots and experiments are implemented and explained in a Jupyter notebook that is available for download.
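The estimators studied in the article are not reproduced here. As a minimal sketch of two of the challenges named above, the following code assumes that the probability with which the selection strategy picked each labeled instance is known (a hypothetical input). It uses self-normalized importance weighting, one common way to correct for sampling bias, and a bootstrap over the labeled subset, one way to obtain a performance distribution instead of a single point estimate. The function names `weighted_accuracy`, `bootstrap_distribution`, and `percentile_interval` are illustrative and not part of any published implementation.

```python
import random


def weighted_accuracy(correct, probs):
    """Self-normalized importance-weighted accuracy (illustrative sketch).

    correct: 0/1 flags, 1 if the classifier's prediction matched the acquired label.
    probs:   assumed selection probability of each labeled instance; instances the
             strategy rarely picks receive a larger weight (1 / probability).
    """
    weights = [1.0 / p for p in probs]
    return sum(w * c for w, c in zip(weights, correct)) / sum(weights)


def bootstrap_distribution(correct, probs, n_boot=1000, seed=0):
    """Resample the labeled subset with replacement to approximate a
    distribution over the (bias-corrected) accuracy estimate."""
    rng = random.Random(seed)
    n = len(correct)
    samples = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        samples.append(weighted_accuracy([correct[i] for i in idx],
                                         [probs[i] for i in idx]))
    return samples


def percentile_interval(samples, level=0.95):
    """Simple percentile interval from bootstrap samples."""
    s = sorted(samples)
    k = int((1 - level) / 2 * len(s))
    return s[k], s[len(s) - 1 - k]


if __name__ == "__main__":
    # Hypothetical toy data: the misclassified instance was selected less often.
    correct = [1, 1, 0, 1]
    probs = [0.5, 0.5, 0.25, 0.5]
    print("plain accuracy:", sum(correct) / len(correct))          # 0.75
    print("weighted accuracy:", weighted_accuracy(correct, probs))  # 0.6
    dist = bootstrap_distribution(correct, probs)
    print("95% interval:", percentile_interval(dist))
```

In this toy case the plain accuracy (0.75) is higher than the weighted estimate (0.6), because the only misclassified instance was undersampled and therefore receives a larger weight. With only a handful of labels the bootstrap interval is very wide, which illustrates why a point estimate alone can be misleading.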
