I am currently a Faculty Fellow with the Center for Data Science at New York University. My research focuses on the intersection of data management and machine learning.
My work covers a wide spectrum, such as automating data quality verification, optimizing programs that combine operations from linear and relational algebra or tracking the lineage of machine learning pipelines.
I am active in open source as an elected member of the Apache Software Foundation, and have extensive experience in building real-world systems from my time at Amazon Research, Twitter, IBM Research and Zalando.
Moreover, I am convinced that modern data analysis can have a huge impact in answering pressing societal questions. In related research, I have analyzed the rise of the German far-right in social media, and mapped the online tracking sphere from a web crawl of several billion pages.
I will return to Europe to join the University of Amsterdam as a tenure-track Assistant Professor in April, working with the Information and Language Processing Systems group and the Intelligent Data Engineering Lab. I will additionally be appointed as Director of Engineering at Ahold Delhaize, an international retailer based in the Netherlands, and manage the AI for Retail Lab Amsterdam. This new type of joint position is part of the KickstartAI initiative.
SIGMOD’20: We have a paper on Learning to Validate the Predictions of Black Box Classifiers on Unseen Data accepted, as well as a short industry paper on Elastic Machine Learning Algorithms in Amazon SageMaker. Moreover, I am co-chairing the 4th edition of the International Workshop on Data Management for End-to-End Machine Learning (DEEM) with Steven Whang and Julia Stoyanovich.
CIDR’20: I presented our work on “Amnesia” - A Selection of Machine Learning Models That Can Forget User Data Very Fast.
EDBT’20: We have a short paper on FairPrep: Promoting Data to a First-Class Citizen in Studies on Fairness-Enhancing Interventions and a full paper on Zooming Out on an Evolving Graph accepted.
I am hosting Ji Zhang from Huazhong University of Science and Technology (HUST) as a visiting Ph.D. student at NYU. We conduct research on machine learning for data-intensive systems.
I am collaborating with Prof. Julia Stoyanovich from New York University on research regarding the impact of data preprocessing on the fairness of machine-assisted decision making.
I am co-supervising Sergey Redyuk, who is a Ph.D. student at Technische Universität Berlin. We conduct research on novel systems for reproducibility and automated documentation of data science experiments.
I am conducting research on data validation and data cleaning for machine learning with Prof. Felix Biessmann from Beuth University, Berlin.
I am consulting for Amazon AI as a part-time Senior Applied Scientist, and work on open source software for large-scale data quality verification with a team from Berlin.
I regularly discuss my research on data quality and model validation with Immuta, a company building a data management platform for data science.
Before joining New York University, I was a Senior Applied Scientist at Amazon Core AI in Berlin, where I worked on data management-related issues of machine learning applications, such as demand forecasting, metadata and provenance tracking of machine learning pipelines and automating data quality verification.
I received my Ph.D. from TU Berlin in 2015, where I was advised by Volker Markl, head of the database systems and information management group. My co-supervisors were Klaus-Robert Müller from the machine learning group at TU Berlin and Reza Zadeh from Stanford.
I am engaged in open source as an elected member of the Apache Software Foundation, where I currently mentor the Apache TVM project on behalf of the Apache Incubator. In the past, I have been involved in the Apache Mahout, Apache Flink, Apache Giraph and Apache MXNet projects.
I am the founder and chair of the workshop series on Data Management for End-To-End Machine Learning (DEEM) at ACM SIGMOD, which started in 2017.
I regularly review submissions to top-tier data management conferences. I have served on the program committees of SIGMOD 2017, 2019 & 2020, VLDB 2021, ICDE 2018, 2019 & 2020, EDBT 2017, the workshop on Exploiting Artificial Intelligence Techniques for Data Management at SIGMOD 2019 and the Large-Scale Recommender Systems workshop at ACM RecSys 2013-2015. Additionally, I have reviewed journal submissions for IEEE TKDE, ACM TIST, IEEE TPDS, IEEE TNNLS, VLDB Journal, the journal track of ECML/PKDD and the open source track of JMLR. I have also been a reviewer for the Amazon Research Awards.