Publications

Deequ - Data Quality Validation for Machine Learning Pipelines. Machine Learning Systems workshop at the conference on Neural Information Processing Systems (NeurIPS), 2018.

Deep Learning for Missing Value Imputation in Tables with Non-Numerical Data. ACM Conference on Information and Knowledge Management (CIKM), 2018.

Benchmarking Distributed Data Processing Systems for Machine Learning Workloads. TPC Technology Conference on Performance Evaluation & Benchmarking (TPCTC), 2018.

BlockJoin: Efficient Matrix Partitioning Through Joins. International Conference on Very Large Data Bases (VLDB), 2018.

Automating Large-Scale Data Quality Verification. International Conference on Very Large Data Bases (VLDB), 2018.

On the Ubiquity of Web Tracking: Insights from a Billion-Page Web Crawl. Journal of Web Science (JWS), 2018.

Declarative Metadata Management: A Missing Piece in End-to-End Machine Learning. SysML Conference (extended abstract), 2018.

Automatically Tracking Metadata and Provenance of Machine Learning Experiments. Machine Learning Systems workshop at the conference on Neural Information Processing Systems (NIPS), 2017.

Dark Germany: Hidden Patterns of Participation in Online Far-Right Protests Against Refugee Housing. International Conference on Social Informatics (SocInfo), 2017.

Probabilistic Demand Forecasting at Scale. International Conference on Very Large Data Bases (VLDB), 2017.

Dark Germany: Hidden Patterns of Participation in Online Far-Right Protests Against Refugee Housing. ACM Web Science Conference (WebSci), 2017.

Gilbert: Declarative Sparse Linear Algebra on Massively Parallel Dataflow Systems. Fachtagung für Business, Technologie und Web (BTW), 2017.

Structural Patterns in the Rise of Germany’s New Right on Facebook. Data Mining in Politics workshop at the International Conference on Data Mining (ICDM), 2016.

Samsara: Declarative Machine Learning on Distributed Dataflow Systems. Machine Learning Systems workshop at the conference on Neural Information Processing Systems (NIPS), 2016.

Doubly stochastic large scale kernel learning with the empirical kernel map. arXiv, 2016.

Predicting Political Party Affiliation from Text. International Conference on the Advances in Computational Analysis of Political Text (PolText), 2016.

Tracking The Trackers: A Large-Scale Analysis of Embedded Web Trackers. AAAI International Conference on Web and Social Media (ICWSM), 2016.

Scaling Data Mining in Massively Parallel Dataflow Systems. Technische Universität Berlin, 2015.

Optimistic Recovery for Iterative Dataflows in Action. ACM SIGMOD (demo), 2015.

Efficient Sample Generation for Scalable Meta Learning. IEEE International Conference on Data Engineering (ICDE), 2015.

Factorbird - a Parameter Server Approach to Distributed Matrix Factorization. Distributed Machine Learning and Matrix Computations workshop at the conference on Neural Information Processing Systems (NIPS), 2014.

The Stratosphere platform for big data analytics. The VLDB Journal: The International Journal on Very Large Data Bases, 2014.

Scaling Data Mining in Massively Parallel Dataflow Systems. PhD Symposium at ACM SIGMOD, 2014.

'All Roads Lead to Rome:' Optimistic Recovery for Distributed Iterative Data Processing. ACM Conference on Information and Knowledge Management (CIKM), 2013.

Distributed Matrix Factorization with MapReduce using a series of Broadcast-Joins. ACM Conference on Recommender Systems (RecSys), 2013.

Iterative Parallel Data Processing with Stratosphere: An Inside Look. ACM SIGMOD (demo), 2013.

Collaborative Filtering with Apache Mahout. Recommender Systems Challenge Workshop in conjunction with ACM RecSys, 2012.

Scalable Similarity-Based Neighborhood Methods with MapReduce. ACM Conference on Recommender Systems (RecSys), 2012.
