Learning to Validate the Predictions of Black Box Classifiers on Unseen Data


Machine Learning (ML) models are difficult to maintain in production settings. In particular, deviations of the unseen serving data (for which we want to compute predictions) from the source data (on which the model was trained) pose a central challenge, especially when model training and prediction are outsourced via cloud services. Errors or shifts in the serving data can negatively affect the predictive quality of a model, but are hard to detect, especially for non-ML experts such as software engineers operating ML deployments. We propose a simple approach to automate the validation of pretrained ML models by estimating the model's predictive performance on unseen, unlabeled serving data. In contrast to existing work, we (i) do not require explicit distributional assumptions on the dataset shift between the source and serving data, (ii) do not require access to the ground truth labels for serving data, and (iii) do not force the user to define distance functions and thresholds between source and serving data. Instead, a domain expert specifies typical cases of dataset shift and data errors that are observed in real-world data. We use this information to learn a 'performance predictor' for a pretrained black box model that automatically raises alarms when it detects performance drops on unseen, unlabeled serving data. We experimentally evaluate our approach on various datasets, models and error types, including cloud and AutoML use cases. We find that it reliably predicts the performance of black box models in the majority of cases, and outperforms several baselines in the performance prediction task, even in the presence of unspecified data errors.
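The core idea can be sketched as follows. This is a minimal illustration under assumptions of our own: the corruption functions, the featurization (percentiles of the black box's predicted class probabilities), and the alarm threshold are all hypothetical stand-ins for the components the paper describes, not the authors' exact implementation.

```python
# Sketch: learn a 'performance predictor' for a black box classifier from
# simulated dataset shifts on labeled held-out data, then use it to estimate
# accuracy on unlabeled serving data. All concrete choices here (corruption
# types, features, threshold) are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=3000, n_features=10, random_state=0)
X_train, X_held, y_train, y_held = train_test_split(X, y, random_state=0)

# The pretrained 'black box' model; we only ever call predict_proba/score on it.
black_box = RandomForestClassifier(random_state=0).fit(X_train, y_train)

def corrupt(X, noise_scale, missing_frac, rng):
    """Simulate a domain-expert-specified shift: additive Gaussian noise
    plus randomly zeroed-out ('missing') feature values."""
    Xc = X + rng.normal(0.0, noise_scale, X.shape) if noise_scale > 0 else X.copy()
    mask = rng.rand(*X.shape) < missing_frac
    Xc[mask] = 0.0
    return Xc

def featurize(X):
    """Summary statistics of the black box's confidence distribution,
    one cheap label-free way to characterize a serving batch."""
    confidences = black_box.predict_proba(X).max(axis=1)
    return np.percentile(confidences, [5, 25, 50, 75, 95])

# Build meta-training data: (batch features -> true accuracy) pairs over
# a grid of simulated error magnitudes on the labeled held-out set.
meta_X, meta_y = [], []
for noise in [0.0, 0.5, 1.0, 2.0]:
    for miss in [0.0, 0.1, 0.3]:
        Xc = corrupt(X_held, noise, miss, rng)
        meta_X.append(featurize(Xc))
        meta_y.append(black_box.score(Xc, y_held))  # labels available here

perf_predictor = RandomForestRegressor(random_state=0).fit(meta_X, meta_y)

# At serving time: no labels, just an estimated accuracy and an alarm rule.
X_serving = corrupt(X_held, noise_scale=1.5, missing_frac=0.2, rng=rng)
est_acc = perf_predictor.predict([featurize(X_serving)])[0]
print(f"estimated serving accuracy: {est_acc:.2f}")
if est_acc < 0.8:  # hypothetical alarm threshold
    print("ALARM: predicted performance drop on serving data")
```

The performance predictor never sees serving labels; it generalizes from the relationship between batch-level confidence statistics and accuracy observed under the simulated shifts.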