RowSimilarityJob on Steroids

Mahout, Recommender Systems, 4 August 2011

I’m currently working on improving RowSimilarityJob, one of my first contributions to Mahout. It’s a Map/Reduce job that computes the pairwise similarities between the row vectors of a sparse matrix. Although the problem has quadratic worst-case runtime, linear scalability can be achieved when the matrix satisfies certain sparsity constraints and appropriate downsampling is applied.
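To make the idea concrete, here is a minimal single-machine sketch in Python. It is not the Mahout implementation, only an illustration under stated assumptions: the matrix is inverted by column, every pair of rows that shares a column contributes to a dot product, and columns touched by more than a hypothetical `max_per_column` rows are randomly downsampled. The function name, the parameter, the choice of cosine similarity, and the sampling strategy are all my own assumptions for this example.

```python
import math
import random
from collections import defaultdict
from itertools import combinations


def row_similarities(rows, max_per_column=500, seed=42):
    """Cosine similarity for every pair of sparse rows that share a column.

    rows: dict mapping row id -> dict mapping column id -> value.
    max_per_column: hypothetical cap on rows considered per column.
    """
    rng = random.Random(seed)

    # "Map" step: invert the matrix so each column lists the rows touching it.
    columns = defaultdict(list)
    for row_id, row in rows.items():
        for col_id, value in row.items():
            if value != 0:
                columns[col_id].append((row_id, value))

    # Row norms use the full rows, before any downsampling.
    norms = {
        row_id: math.sqrt(sum(v * v for v in row.values()))
        for row_id, row in rows.items()
    }

    # "Reduce" step: each column contributes partial dot products for the
    # row pairs that co-occur in it. Rows that share no column never meet.
    dots = defaultdict(float)
    for entries in columns.values():
        if len(entries) > max_per_column:
            # Assumed downsampling: keep a random subset of a dense column,
            # which makes the resulting similarities approximate.
            entries = rng.sample(entries, max_per_column)
        # Sorting by row id gives each pair a canonical (smaller, larger) key.
        for (r1, v1), (r2, v2) in combinations(sorted(entries), 2):
            dots[(r1, r2)] += v1 * v2

    return {
        (r1, r2): dot / (norms[r1] * norms[r2])
        for (r1, r2), dot in dots.items()
    }


rows = {
    "a": {0: 1.0, 1: 1.0},
    "b": {0: 1.0, 1: 1.0},
    "c": {1: 1.0, 2: 1.0},
    "d": {3: 1.0},
}
sims = row_similarities(rows)
```

With the cap in place, each column produces at most `max_per_column * (max_per_column - 1) / 2` pairs, so the total work grows linearly with the number of columns rather than quadratically with the number of rows. In the toy input, row `d` shares no column with any other row and therefore never appears in a pair.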

This is part of my work for the ROBUST research project, where this algorithm can be used to find near-duplicates of user posts in forums or to predict missing links in social graphs.

Here’s a picture of my current approach; more details to come:
