Fairness-Aware Instrumentation of Preprocessing Pipelines for Machine Learning

Ke Yang, Biao Huang, Julia Stoyanovich, Sebastian Schelter

Abstract

Surfacing and mitigating bias in ML pipelines is a complex problem, and data scientists are in dire need of system-level support for it. Humans should be empowered to debug these pipelines in order to control for bias and to improve data quality and representativeness. We propose fair-DAGs, an open-source library that extracts directed acyclic graph (DAG) representations of the data flow in preprocessing pipelines for ML. The library then instruments the pipelines with tracing and visualization code that captures changes in data distributions and identifies distortions with respect to protected group membership as the data travels through the pipeline. We illustrate the utility of fair-DAGs with experiments on publicly available ML pipelines.
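The following minimal Python sketch illustrates the tracing idea. It records protected-group proportions after each preprocessing step and flags steps that distort them. It is not the fair-DAGs API. The names `trace_pipeline` and `group_proportions`, the row-as-dict data model, the default `threshold` of 0.1, and the maximum-absolute-shift metric are all illustrative assumptions. The pipeline is also simplified to a linear chain of steps rather than a general DAG with joins and branches.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical data model: each record is a dict; each step maps records to records.
Row = Dict[str, object]
Step = Tuple[str, Callable[[List[Row]], List[Row]]]


def group_proportions(rows: List[Row], attr: str) -> Dict[object, float]:
    """Share of records belonging to each value of the protected attribute."""
    counts = Counter(row[attr] for row in rows)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()} if total else {}


def trace_pipeline(rows: List[Row], steps: List[Step], attr: str, threshold: float = 0.1):
    """Run a linear chain of steps (a simplification of a DAG), record group
    proportions after every step, and flag steps whose largest per-group
    shift in share exceeds `threshold` (an illustrative metric)."""
    trace = [("input", group_proportions(rows, attr))]
    flagged = []
    for name, step in steps:
        before = trace[-1][1]
        rows = step(rows)
        after = group_proportions(rows, attr)
        trace.append((name, after))
        groups = set(before) | set(after)
        shift = max((abs(after.get(g, 0.0) - before.get(g, 0.0)) for g in groups), default=0.0)
        if shift > threshold:
            flagged.append((name, shift))
    return rows, trace, flagged


# Toy usage with a hypothetical protected attribute "group".
data = [
    {"group": "A", "income": 50}, {"group": "A", "income": 60},
    {"group": "A", "income": 55}, {"group": "B", "income": None},
    {"group": "B", "income": 40}, {"group": "B", "income": None},
]
steps: List[Step] = [
    ("dropna", lambda rs: [r for r in rs if r["income"] is not None]),
    ("scale_income", lambda rs: [{**r, "income": r["income"] / 1000} for r in rs]),
]
output, trace, flagged = trace_pipeline(data, steps, attr="group")
for name, props in trace:
    print(name, props)
print(flagged)  # [('dropna', 0.25)]
```

In this toy input, dropping rows with a missing `income` value removes two of the three group-B records. Group B's share therefore falls from 0.5 to 0.25, and the step is flagged. The rescaling step leaves the group distribution untouched. Comparing proportions before and after each step, rather than only at the end of the pipeline, lets a check like this attribute a distortion to the specific operator that introduced it.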

Type

Conference paper

Publication

Human-In-the-Loop Data Analytics workshop at ACM SIGMOD

Date

April 2020
