CF
From Chaehyun
Collaborative Filtering
- input.txt
  - (user_id, item_id)
- map reduce 1 - change input format
  - user_id, Vector(item1, item2, ...)
  - user_item_list.txt
- map reduce 2 - minhash clustering
  - minhash_id \t user1, user2, user3 ...
  - clustering.txt
- map reduce 3 - generate recommendations
  - read clustering.txt and build an in-memory map
    - user1 - list(minhash1, minhash2, minhash3 ...)
    - user2 - list(minhash2, minhash3 ...)
  - mapper
    - read user_item_list.txt
    - emit each purchase record to every minhash cluster the user belongs to
  - reducer
    - collect all records in a cluster and make recommendations
  - output
    - user \t item1, item2 ...
- map reduce 4 - merge the recommendation lists, sort items by similarity, and print them
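
The four steps above can be sketched in memory, without Hadoop, roughly as follows. This is a minimal sketch under assumptions, not the actual jobs: all function names, NUM_HASHES, and the md5-based hash are made up for illustration; a cluster id is assumed to be "hash index:item with the smallest hash"; and since the notes do not define "similarity", the step 4 score is assumed to be the number of cluster co-members who bought the item.

```python
import hashlib
from collections import defaultdict

NUM_HASHES = 3  # assumed number of minhash functions per user


def stable_hash(seed, item):
    """Deterministic hash of (seed, item); built-in hash() is salted per process."""
    digest = hashlib.md5(f"{seed}:{item}".encode("utf-8")).hexdigest()
    return int(digest, 16)


def group_items_by_user(records):
    """Step 1: (user_id, item_id) pairs -> {user_id: set of items}."""
    user_items = defaultdict(set)
    for user_id, item_id in records:
        user_items[user_id].add(item_id)
    return dict(user_items)


def minhash_clusters(user_items, num_hashes=NUM_HASHES):
    """Step 2: {minhash_id: [users]}, one cluster per hash function and min item."""
    clusters = defaultdict(list)
    for user in sorted(user_items):
        for seed in range(num_hashes):
            # The item with the smallest hash value is the minhash for this seed.
            min_item = min(user_items[user], key=lambda it: stable_hash(seed, it))
            clusters[f"{seed}:{min_item}"].append(user)
    return dict(clusters)


def recommend_per_cluster(user_items, clusters):
    """Step 3: per cluster, count co-members' items that the user does not have."""
    # In-memory map: user -> list of minhash ids (clustering.txt in the notes).
    user_clusters = defaultdict(list)
    for cluster_id, users in clusters.items():
        for user in users:
            user_clusters[user].append(cluster_id)

    # "Mapper": emit each purchase record to every cluster the user belongs to.
    cluster_records = defaultdict(list)
    for user, items in user_items.items():
        for cluster_id in user_clusters[user]:
            cluster_records[cluster_id].append((user, items))

    # "Reducer": one partial recommendation (user, {item: count}) per cluster.
    partial = []
    for members in cluster_records.values():
        for user, items in members:
            counts = defaultdict(int)
            for other, other_items in members:
                if other != user:
                    for item in other_items - items:
                        counts[item] += 1
            if counts:
                partial.append((user, dict(counts)))
    return partial


def merge_recommendations(partial):
    """Step 4: sum scores per user, sort by score (desc), then by item id."""
    merged = defaultdict(lambda: defaultdict(int))
    for user, counts in partial:
        for item, count in counts.items():
            merged[user][item] += count
    return {
        user: [it for it, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]
        for user, scores in merged.items()
    }


if __name__ == "__main__":
    records = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "c"), ("u3", "c")]
    user_items = group_items_by_user(records)
    clusters = minhash_clusters(user_items)
    for user, items in merge_recommendations(
        recommend_per_cluster(user_items, clusters)
    ).items():
        print(f"{user}\t{', '.join(items)}")
```

Each function corresponds to one map reduce job, so the intermediate values play the role of user_item_list.txt and clustering.txt. Which users end up in the same cluster depends on the hash, so results on small inputs can be sparse; more hash functions give more chances to collide.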