I am currently a Moore-Sloan Data Science Fellow with the Center for Data Science at New York University. My research focuses on the intersection of data management and machine learning, with interdisciplinary applications to computational social science and web science.
My work covers a wide spectrum. I enjoy tackling technical challenges in data management, such as automating data quality verification, optimizing programs that combine operations from linear and relational algebra, or tracking the lineage of machine learning pipelines.
At the same time, I am convinced that modern data analysis can have a huge impact on pressing societal questions. I am currently building a platform that recommends political news with diverse viewpoints and thereby helps us break out of our filter bubbles. In the past, I have analyzed the rise of the German far-right in social media and mapped the online tracking sphere from a web crawl of several billion pages.
Our poster paper on Deequ - Data Quality Validation for Machine Learning Pipelines has been accepted at the workshop on “Machine Learning Systems” at NeurIPS 2018.
Our proposal for the 3rd workshop on “Data Management for End-to-End Machine Learning (DEEM)” has been accepted at SIGMOD 2019.
Our paper on “Deep” Learning for Missing Value Imputation in Tables with Non-Numerical Data has been accepted for publication in the case study track of CIKM 2018.
Before joining New York University, I was a Senior Applied Scientist at Amazon Core AI in Berlin, where I worked on data management issues in machine learning applications, such as demand forecasting, metadata and provenance tracking of machine learning pipelines, and automating data quality verification.
I received my Ph.D. from TU Berlin in 2015, where I was advised by Volker Markl, head of the database systems and information management group. My co-supervisors were Klaus-Robert Müller from the machine learning group at TU Berlin and Reza Zadeh from Stanford.
I am engaged in open source as an elected member of the Apache Software Foundation, where I currently mentor the Apache MXNet project on behalf of the Apache Incubator. In the past, I have been involved in the Apache Mahout, Apache Flink and Apache Giraph projects.
I am the originator and chair of the workshop series on Data Management for End-to-End Machine Learning (DEEM) at ACM SIGMOD, which started in 2017.
I regularly review submissions to top-tier data management conferences. I have served on the program committees of SIGMOD 2019, ICDE 2019 (demo track), ICDE 2018, SIGMOD 2017, EDBT 2017 and the Large-Scale Recommender Systems workshop at ACM RecSys 2013-2015. Additionally, I have reviewed journal submissions for IEEE TKDE, ACM TIST, IEEE TPDS, the journal track of ECML/PKDD and the open source track of JMLR. I have also been a reviewer for the Amazon Research Awards.