Towards Automated ML Model Monitoring: Measure, Improve and Quantify Data Quality

Abstract

Machine learning (ML) has become a central component in modern software applications, giving rise to many new challenges. Tremendous progress has been made in this context with respect to model serving, experiment tracking, model diagnosis and data validation. In this paper, we focus on the emerging challenge of automating the operation of deployed ML applications, especially with respect to monitoring the quality of their input data. Existing approaches for this problem have not yet reached broad adoption. One reason is that they often require a large amount of domain knowledge, e.g., to define data unit tests and the corresponding similarity metrics and thresholds for detecting data shifts. Additionally, it is very challenging to test data at early stages of a pipeline (e.g., during integration) without explicit knowledge of how the data will be processed by downstream applications. In other cases, the engineers in charge of operating a deployed ML model may not have access to the model internals, for example if they leverage a popular cloud ML service such as Google AutoML for training and inference. Integrating and automating data quality monitoring into ML applications is also difficult due to the lack of agreed-upon abstractions for defining and deploying such applications. We present three approaches from recent work to tackle data quality in ML applications: (i) measuring data quality with "data unit tests" using the deequ library; (ii) improving data quality with missing value imputation using the DataWig library; and (iii) quantifying the impact of data quality issues on the predictive performance of a deployed ML model. Finally, we outline challenges and potential directions for combining these approaches and for automating their configuration in real-world deployment settings.
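To make the first approach concrete, the sketch below shows what a "data unit test" with deequ can look like. It uses PyDeequ, the Python interface to deequ; the dataset path, DataFrame, column names and constraints are illustrative assumptions and are not taken from the paper.

```python
# Minimal sketch of a "data unit test" with PyDeequ (Python API for deequ).
# The dataset path and the column names of `reviews_df` are hypothetical.
from pyspark.sql import SparkSession
import pydeequ
from pydeequ.checks import Check, CheckLevel
from pydeequ.verification import VerificationSuite, VerificationResult

spark = (SparkSession.builder
         .config("spark.jars.packages", pydeequ.deequ_maven_coord)
         .config("spark.jars.excludes", pydeequ.f2j_maven_coord)
         .getOrCreate())

reviews_df = spark.read.parquet("s3://my-bucket/reviews/")  # hypothetical input

check = (Check(spark, CheckLevel.Error, "Reviews data unit test")
         .isComplete("review_id")       # no missing identifiers
         .isUnique("review_id")         # identifiers are unique
         .isNonNegative("star_rating")  # ratings are never negative
         .isContainedIn("marketplace", ["US", "UK", "DE", "JP"]))

result = (VerificationSuite(spark)
          .onData(reviews_df)
          .addCheck(check)
          .run())

# Inspect which constraints failed before letting the data flow downstream.
VerificationResult.checkResultsAsDataFrame(spark, result).show(truncate=False)
```

When such a check fails, the pipeline can stop the batch from reaching downstream consumers and alert the engineers operating the application.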
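For the second approach, the following minimal sketch shows how missing values can be imputed with DataWig's SimpleImputer. The product table, its column names and the train/test split are hypothetical.

```python
# Minimal sketch of missing value imputation with DataWig.
# `product_df` and its column names are hypothetical.
import datawig

df_train, df_test = datawig.utils.random_split(product_df)

imputer = datawig.SimpleImputer(
    input_columns=["title", "description"],  # columns used as features
    output_column="color",                   # column whose missing values we impute
    output_path="imputer_model"              # directory for the trained model
)

imputer.fit(train_df=df_train)

# Adds "color_imputed" (and a confidence score) to the returned DataFrame.
predictions = imputer.predict(df_test)
```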
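The third approach quantifies how data errors translate into degraded predictions. The sketch below illustrates the general idea only, not the specific method from the paper: inject a synthetic data quality issue (here, missing values in a single column) into held-out data and measure the resulting drop in predictive performance. The model, data and column names are assumptions, and the model is expected to tolerate missing inputs (e.g., via an imputation step in its pipeline).

```python
# Illustrative sketch (not the paper's exact method): estimate the impact of a
# data quality issue by injecting it into held-out data and measuring the drop
# in predictive performance. Model, data and column names are hypothetical.
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score


def corrupt_missing(df: pd.DataFrame, column: str, fraction: float,
                    rng: np.random.Generator) -> pd.DataFrame:
    """Return a copy of df with `fraction` of the values in `column` set to NaN."""
    corrupted = df.copy()
    mask = rng.random(len(corrupted)) < fraction
    corrupted.loc[mask, column] = np.nan
    return corrupted


def performance_drop(model, X_test: pd.DataFrame, y_test, column: str,
                     fraction: float = 0.3, seed: int = 0) -> float:
    """Accuracy on clean test data minus accuracy on corrupted test data."""
    rng = np.random.default_rng(seed)
    clean_acc = accuracy_score(y_test, model.predict(X_test))
    corrupted_acc = accuracy_score(
        y_test, model.predict(corrupt_missing(X_test, column, fraction, rng)))
    return clean_acc - corrupted_acc
```

In practice one would sweep over different corruption types and fractions to see which data quality issues the deployed model is most sensitive to.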

Publication
MLOps workshop at the Conference on Machine Learning and Systems (MLSys)