Mahout is a Java library from the Apache Software Foundation for implementing machine learning algorithms. At present it provides implementations for classification, clustering, and recommendation. Recommendations are used to personalize what each user sees, for example:
- Recommendations on Amazon.
- Video recommendations on YouTube.
- Movie recommendations on Netflix.
Mahout uses the collaborative filtering approach for making recommendations (http://en.wikipedia.org/wiki/Collaborative_filtering). Collaborative filtering produces recommendations based on user preferences for items and does not require knowledge of the specific properties of the items. In contrast, content-based recommendation relies on detailed knowledge of the properties of items. This implies that content-based recommendation engines are domain-specific, whereas Mahout’s collaborative filtering approach can work in any domain, provided it has sufficient user-item preference data to work with.
Mahout relies on the observation that if two users hold the same views on a range of items, their tastes are similar, and the preferences of one user can serve as recommendations for the other.
For example, if two users, u and w, both like the movies Titanic and Notebook and do not like documentaries such as “World without oil”, then if u also likes Pretty Woman there is a high chance that w would be interested in it too.
This approach is termed user-based recommendation. Its main advantage is that it does not require the attributes of the items to be known; user preferences alone are enough. For a given user u, the algorithm is:
    for every other user w
        compute a similarity s between u and w
    retain the top users, ranked by similarity, as a neighborhood n
    for every item i that some user in n has a preference for, but that u has no preference for yet
        for every other user v in n that has a preference for i
            compute a similarity s between u and v
            incorporate v’s preference for i, weighted by s, into a running average
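The user-based steps above can be sketched in plain Java, without Mahout on the classpath. This is only an illustration under stated assumptions, not Mahout's implementation: the class `UserBasedSketch`, its method names, and the sample data are hypothetical, Pearson correlation over co-rated items is just one possible similarity measure, and users with non-positive similarity are simply left out of the neighborhood.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of Mahout.
final class UserBasedSketch {

    // Pearson correlation over the items both users rated; 0.0 when undefined (assumption).
    static double similarity(Map<Long, Double> a, Map<Long, Double> b) {
        List<Long> common = new ArrayList<>();
        for (Long item : a.keySet()) {
            if (b.containsKey(item)) common.add(item);
        }
        if (common.size() < 2) return 0.0;
        double meanA = 0, meanB = 0;
        for (Long item : common) { meanA += a.get(item); meanB += b.get(item); }
        meanA /= common.size();
        meanB /= common.size();
        double cov = 0, varA = 0, varB = 0;
        for (Long item : common) {
            double da = a.get(item) - meanA;
            double db = b.get(item) - meanB;
            cov += da * db; varA += da * da; varB += db * db;
        }
        double denom = Math.sqrt(varA * varB);
        return denom == 0.0 ? 0.0 : cov / denom;
    }

    // prefs maps userId -> (itemId -> rating). Returns itemId -> estimate, best first.
    static Map<Long, Double> recommend(Map<Long, Map<Long, Double>> prefs,
                                       long u, int neighborhoodSize, int howMany) {
        Map<Long, Double> mine = prefs.getOrDefault(u, Collections.emptyMap());

        // Similarity between u and every other user w; keep only positive scores.
        Map<Long, Double> sims = new HashMap<>();
        for (Map.Entry<Long, Map<Long, Double>> e : prefs.entrySet()) {
            if (e.getKey() == u) continue;
            double s = similarity(mine, e.getValue());
            if (s > 0) sims.put(e.getKey(), s);
        }

        // Retain the top users, ranked by similarity, as the neighborhood n.
        List<Long> neighbors = new ArrayList<>(sims.keySet());
        neighbors.sort((x, y) -> Double.compare(sims.get(y), sims.get(x)));
        if (neighbors.size() > neighborhoodSize) {
            neighbors = neighbors.subList(0, neighborhoodSize);
        }

        // Similarity-weighted running average for each item u has not rated yet.
        Map<Long, Double> weighted = new HashMap<>();
        Map<Long, Double> weights = new HashMap<>();
        for (Long v : neighbors) {
            double s = sims.get(v); // reuse the similarity computed above
            for (Map.Entry<Long, Double> p : prefs.get(v).entrySet()) {
                if (mine.containsKey(p.getKey())) continue;
                weighted.merge(p.getKey(), s * p.getValue(), Double::sum);
                weights.merge(p.getKey(), s, Double::sum);
            }
        }
        Map<Long, Double> estimates = new HashMap<>();
        for (Long i : weighted.keySet()) estimates.put(i, weighted.get(i) / weights.get(i));

        // Rank items by estimate and keep the first howMany.
        List<Long> items = new ArrayList<>(estimates.keySet());
        items.sort((x, y) -> Double.compare(estimates.get(y), estimates.get(x)));
        Map<Long, Double> top = new LinkedHashMap<>();
        for (Long i : items) {
            if (top.size() == howMany) break;
            top.put(i, estimates.get(i));
        }
        return top;
    }

    static void put(Map<Long, Map<Long, Double>> prefs, long user, long item, double rating) {
        prefs.computeIfAbsent(user, k -> new HashMap<>()).put(item, rating);
    }

    // Tiny made-up data set: user 2 agrees with user 1, user 3 disagrees.
    static Map<Long, Map<Long, Double>> sampleData() {
        Map<Long, Map<Long, Double>> prefs = new HashMap<>();
        put(prefs, 1, 10, 5.0); put(prefs, 1, 11, 1.0);
        put(prefs, 2, 10, 5.0); put(prefs, 2, 11, 1.0); put(prefs, 2, 12, 4.0);
        put(prefs, 3, 10, 1.0); put(prefs, 3, 11, 5.0); put(prefs, 3, 12, 1.0); put(prefs, 3, 13, 5.0);
        return prefs;
    }

    public static void main(String[] args) {
        System.out.println(recommend(sampleData(), 1L, 2, 3));
    }
}
```

On the made-up sample data, user 1 is perfectly correlated with user 2 and negatively correlated with user 3, so only user 2 enters the neighborhood and item 12 is recommended with an estimate of 4.0. The sketch reuses the similarity computed when building the neighborhood instead of computing it a second time, which gives the same result as the steps above.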
Item-based recommendation works by finding similar items instead of similar users. By default it derives item similarity from user ratings or from users' associations with the items, so the main concept is similar to the user-based approach. However, we can customize it to compute similarity from domain knowledge. For a given user u, the algorithm is:
    for every item i that u has no preference for yet
        for every item j that u has a preference for
            compute a similarity s between i and j
            add u’s preference for j, weighted by s, to a running average
    return the top items, ranked by weighted average
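The item-based steps can be sketched the same way in plain Java. Again this is an illustration rather than Mahout's code: `ItemBasedSketch` and its sample data are hypothetical, and the similarity used here is the Tanimoto (Jaccard) coefficient over the sets of users associated with each item, which ignores the rating values. Items that have no similarity to anything the user has rated are skipped, which is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration class; not part of Mahout.
final class ItemBasedSketch {

    // Invert userId -> (itemId -> rating) into itemId -> set of users who rated it.
    static Map<Long, Set<Long>> usersByItem(Map<Long, Map<Long, Double>> prefs) {
        Map<Long, Set<Long>> out = new HashMap<>();
        for (Map.Entry<Long, Map<Long, Double>> e : prefs.entrySet()) {
            for (Long item : e.getValue().keySet()) {
                out.computeIfAbsent(item, k -> new HashSet<>()).add(e.getKey());
            }
        }
        return out;
    }

    // Tanimoto coefficient: users who rated both items / users who rated either.
    static double similarity(Set<Long> a, Set<Long> b) {
        Set<Long> both = new HashSet<>(a);
        both.retainAll(b);
        int union = a.size() + b.size() - both.size();
        return union == 0 ? 0.0 : (double) both.size() / union;
    }

    // Returns up to howMany item IDs for user u, best estimate first.
    static List<Long> recommend(Map<Long, Map<Long, Double>> prefs, long u, int howMany) {
        Map<Long, Double> mine = prefs.getOrDefault(u, Collections.emptyMap());
        Map<Long, Set<Long>> raters = usersByItem(prefs);
        Map<Long, Double> estimates = new HashMap<>();

        for (Long i : raters.keySet()) {
            if (mine.containsKey(i)) continue;            // u already has a preference for i
            double weighted = 0, total = 0;
            for (Map.Entry<Long, Double> p : mine.entrySet()) {
                double s = similarity(raters.get(i), raters.get(p.getKey()));
                weighted += s * p.getValue();              // u's preference for j, weighted by s
                total += s;
            }
            if (total > 0) estimates.put(i, weighted / total);
        }

        List<Long> items = new ArrayList<>(estimates.keySet());
        items.sort((x, y) -> Double.compare(estimates.get(y), estimates.get(x)));
        return new ArrayList<>(items.subList(0, Math.min(howMany, items.size())));
    }

    static void put(Map<Long, Map<Long, Double>> prefs, long user, long item, double rating) {
        prefs.computeIfAbsent(user, k -> new HashMap<>()).put(item, rating);
    }

    // Tiny made-up data set.
    static Map<Long, Map<Long, Double>> sampleData() {
        Map<Long, Map<Long, Double>> prefs = new HashMap<>();
        put(prefs, 1, 10, 5.0); put(prefs, 1, 11, 1.0);
        put(prefs, 2, 10, 4.0); put(prefs, 2, 12, 5.0);
        put(prefs, 3, 11, 2.0); put(prefs, 3, 13, 3.0);
        put(prefs, 4, 10, 5.0); put(prefs, 4, 12, 4.0);
        return prefs;
    }

    public static void main(String[] args) {
        System.out.println(recommend(sampleData(), 1L, 2));
    }
}
```

On the sample data, item 12 shares raters only with item 10 (which user 1 rated 5), and item 13 shares raters only with item 11 (which user 1 rated 1), so item 12 is ranked ahead of item 13. Item-to-item similarities do not depend on the user being served, so a real system could precompute them; the sketch recomputes them on each call for brevity.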
Steps for making a recommendation engine using Mahout
1) Data Transformation
Collect the data and transform it into the form userId,itemId,rating. The rating value is optional. Mahout can read this data from a database, a file, or a program. Using a file is generally more efficient because it avoids the overhead of marshalling and serializing data.
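To make the input format concrete, here is a minimal plain-Java sketch that parses userId,itemId,rating lines into an in-memory map. In Mahout itself a file-backed data model such as `FileDataModel` does this job; the `PreferenceParser` class below is a hypothetical stand-in, and both the default rating of 1.0 for lines without a rating and the skipping of lines starting with `#` are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of Mahout.
final class PreferenceParser {

    // Assumed value for lines that carry no rating (the rating is optional).
    static final double DEFAULT_RATING = 1.0;

    // Parses "userId,itemId[,rating]" lines into userId -> (itemId -> rating).
    static Map<Long, Map<Long, Double>> parse(List<String> lines) {
        Map<Long, Map<Long, Double>> prefs = new HashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) continue; // skip blanks and comments
            String[] parts = line.split(",");
            if (parts.length < 2) {
                throw new IllegalArgumentException("Expected userId,itemId[,rating]: " + raw);
            }
            long user = Long.parseLong(parts[0].trim());
            long item = Long.parseLong(parts[1].trim());
            boolean hasRating = parts.length > 2 && !parts[2].trim().isEmpty();
            double rating = hasRating ? Double.parseDouble(parts[2].trim()) : DEFAULT_RATING;
            prefs.computeIfAbsent(user, k -> new HashMap<>()).put(item, rating);
        }
        return prefs;
    }

    public static void main(String[] args) {
        System.out.println(parse(List.of("1,101,5.0", "1,102", "2,101,2.5")));
    }
}
```

The resulting map has the same shape a user-based or item-based recommender needs: one entry per user, holding that user's rating for each item.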
2) Prepare Model
Experiment with different similarity (distance) measures and neighborhood definitions to find the best fit for the given problem. You also need to vary the parameters of each approach and evaluate the resulting scores to find the model that gives the minimum error.
This step yields the algorithm that provides the best recommendations.
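One way to picture the "evaluate and pick the minimum error" loop is the plain-Java sketch below. It assumes that some preferences have been held out of the training data and scores each candidate by the average absolute difference between its estimate and the held-out rating. The names `EvaluationSketch`, `Estimator`, and `Pref` are hypothetical and the candidates in `main` are trivial constant estimators, so treat it as an outline of the idea rather than Mahout's evaluation API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of Mahout.
final class EvaluationSketch {

    // Anything that can estimate a user's rating for an item.
    interface Estimator {
        double estimate(long user, long item);
    }

    // One held-out preference: the true rating a candidate should try to predict.
    static final class Pref {
        final long user;
        final long item;
        final double rating;
        Pref(long user, long item, double rating) {
            this.user = user; this.item = item; this.rating = rating;
        }
    }

    // Average absolute difference between estimated and actual ratings (lower is better).
    static double meanAbsoluteError(List<Pref> heldOut, Estimator estimator) {
        if (heldOut.isEmpty()) return Double.NaN;
        double total = 0;
        for (Pref p : heldOut) {
            total += Math.abs(estimator.estimate(p.user, p.item) - p.rating);
        }
        return total / heldOut.size();
    }

    // Returns the name of the candidate with the smallest error, or null if there are none.
    static String best(Map<String, Estimator> candidates, List<Pref> heldOut) {
        String bestName = null;
        double bestError = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, Estimator> c : candidates.entrySet()) {
            double error = meanAbsoluteError(heldOut, c.getValue());
            if (error < bestError) {
                bestError = error;
                bestName = c.getKey();
            }
        }
        return bestName;
    }

    public static void main(String[] args) {
        List<Pref> heldOut = List.of(new Pref(1, 10, 4.0), new Pref(2, 11, 5.0));
        Map<String, Estimator> candidates = new LinkedHashMap<>();
        candidates.put("always-3.0", (u, i) -> 3.0);
        candidates.put("always-4.5", (u, i) -> 4.5);
        System.out.println(best(candidates, heldOut));
    }
}
```

In practice each candidate would be a recommender built with a particular similarity measure, neighborhood size, and parameter setting, trained on the data that was not held out.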
3) Add customization
4) Integrate it with web application
Mahout provides a servlet and a web service that take userId and noOfRecommendations as parameters. We can integrate our recommender with the Mahout source distribution and deploy it on Tomcat.
The recommender component of Mahout discussed above can handle up to 10 million preferences. But when we have billions of preferences and heap requirements of 32 GB, we need a distributed recommender system. Mahout provides this by integrating with Hadoop, which uses MapReduce to distribute the work over multiple machines in a cluster.
“Mahout In Action” By Sean Owen, Robin Anil, Ted Dunning, and Ellen Friedman