Efficient Incremental Cooccurrence Analysis for Item-Based Collaborative Filtering


Recommender systems are ubiquitous in the modern internet, where they help users find items they might like. A widely-deployed recommendation approach is item-based collaborative filtering. This approach internally relies on analyzing large item cooccurrence matrices that denote how many users interacted with a pair of items. The potentially quadratic number of items to compare poses a scalability bottleneck in analyzing such item cooccurrences. Additionally, this problem intensifies in real-world use cases with incrementally growing datasets, especially when the recommendation model is regularly re-computed from scratch. We highlight the connection between the growing cost of item-based recommendation and densification processes in common interaction datasets. Based on our findings, we propose an efficient incremental algorithm for item-based collaborative filtering based on cooccurrence analysis. This approach restricts the number of interactions to consider from ‘power users’ and ‘ubiquitous items’ to guarantee a provably constant amount of work per user-item interaction to process. We discuss efficient implementations of our algorithm on a single machine as well as on a distributed stream processing engine, and present an experimental evaluation. Our results confirm the asymptotic benefits of the incremental approach. Furthermore, we find that our implementation is an order of magnitude faster than existing open source recommender libraries on many datasets, and at the same time scales to high dimensional datasets which these existing recommenders fail to process.

International Conference on Scientific and Statistical Database Management (SSDBM)