Declarative Metadata Management: A Missing Piece in End-to-End Machine Learning


We argue that managing the metadata and lineage of common artifacts in machine learning (ML) is a necessity. We discuss a recently presented lightweight system built for this task, which speeds up users' ML workflows and provides a basis for the comparability and repeatability of ML experiments. The system tracks the lineage of artifacts produced in ML workloads and automatically extracts metadata such as the hyperparameters of models, the schemas of datasets, and the layouts of deep neural networks. It provides a general declarative representation of common ML artifacts, is integrated with popular frameworks such as MXNet, SparkML, and scikit-learn, and meets the demands of various production use cases at Amazon.
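
To make the idea of a declarative artifact record with lineage concrete, the following is a minimal sketch and not the system's actual API. All names in it (Artifact, MetadataStore, extract_hyperparameters, ToyRegressor) are hypothetical and chosen for illustration only. It assumes that an artifact can be described by an identifier, a kind, a metadata dictionary, and the identifiers of the artifacts it was derived from, and that hyperparameters can be read from a model object's public attributes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Artifact:
    """Hypothetical declarative record for one ML artifact."""
    artifact_id: str
    kind: str  # e.g. "dataset" or "model"
    metadata: Dict[str, Any] = field(default_factory=dict)
    derived_from: List[str] = field(default_factory=list)  # ids of direct inputs


class MetadataStore:
    """Hypothetical in-memory store; a real system would persist records."""

    def __init__(self) -> None:
        self._artifacts: Dict[str, Artifact] = {}

    def record(self, artifact: Artifact) -> Artifact:
        self._artifacts[artifact.artifact_id] = artifact
        return artifact

    def get(self, artifact_id: str) -> Artifact:
        return self._artifacts[artifact_id]

    def lineage(self, artifact_id: str) -> List[str]:
        """Return ids of all transitive ancestors, nearest first."""
        seen: List[str] = []
        queue = list(self._artifacts[artifact_id].derived_from)
        while queue:
            current = queue.pop(0)
            if current in seen:
                continue
            seen.append(current)
            queue.extend(self._artifacts[current].derived_from)
        return seen


def extract_hyperparameters(model: Any) -> Dict[str, Any]:
    """Assumption: hyperparameters are the model's public, non-callable attributes."""
    return {
        name: value
        for name, value in vars(model).items()
        if not name.startswith("_") and not callable(value)
    }


class ToyRegressor:
    """Stand-in for a framework model; not a real library class."""

    def __init__(self, learning_rate: float = 0.1, max_depth: int = 3) -> None:
        self.learning_rate = learning_rate
        self.max_depth = max_depth


store = MetadataStore()
schema = {"age": "int", "label": "float"}
store.record(Artifact("raw", "dataset", {"schema": schema}))
store.record(Artifact("train", "dataset", {"schema": schema}, derived_from=["raw"]))

model = ToyRegressor(learning_rate=0.05)
store.record(
    Artifact("model-1", "model", extract_hyperparameters(model), derived_from=["train"])
)

print(store.lineage("model-1"))
print(store.get("model-1").metadata)
```

Keeping records as plain data, separate from the framework objects they describe, is one way such metadata could stay comparable across frameworks; whether the system discussed here is structured this way is not stated in this abstract.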

SysML Conference (extended abstract)