<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>JTeam Blog &#187; Collaborative Filtering</title>
	<atom:link href="http://blog.jteam.nl/tag/collaborative_filtering/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.jteam.nl</link>
	<description>Keep updated on what we&#039;re doing!</description>
	<lastBuildDate>Wed, 01 Sep 2010 17:48:21 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Mahout – Taste :: Part Three – Estimators</title>
		<link>http://blog.jteam.nl/2010/07/08/mahout-%e2%80%93-taste-part-three-%e2%80%93-estimators/</link>
		<comments>http://blog.jteam.nl/2010/07/08/mahout-%e2%80%93-taste-part-three-%e2%80%93-estimators/#comments</comments>
		<pubDate>Thu, 08 Jul 2010 15:46:22 +0000</pubDate>
		<dc:creator>Frank Scholten</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Enterprise Search]]></category>
		<category><![CDATA[Collaborative Filtering]]></category>
		<category><![CDATA[Mahout]]></category>
		<category><![CDATA[Recommendations]]></category>
		<category><![CDATA[Taste]]></category>

		<guid isPermaLink="false">http://blog.jteam.nl/?p=2317</guid>
		<description><![CDATA[In Taste, estimators are the bridge between the generic item- or user recommendation logic and the specific similarity algorithm. Estimators are mainly used as part of the recommendation process, however, they are also used for evaluating recommenders. Additionally, the &#8216;recommended because&#8217; feature is also powered by an estimator. This blog covers some Taste internals and [...]]]></description>
			<content:encoded><![CDATA[<p>In Taste, estimators are the bridge between the generic item- or user-based recommendation logic and the specific similarity algorithm. Estimators are mainly used as part of the recommendation process, but they are also used for evaluating recommenders. The &#8216;recommended because&#8217; feature is powered by an estimator as well. This blog covers some Taste internals and shows you how estimators are used within Taste via a few code samples.</p>
<p><span id="more-2317"></span></p>
<h3><strong>Estimators for recommendations</strong></h3>
<p>Let&#8217;s start with the main usage of estimators: providing recommendations. Suppose we create a <span style="font-family: 'courier new'; font-size: medium;">GenericItemBasedRecommender</span>, provide it with a <span style="font-family: 'courier new'; font-size: medium;">DataModel</span> and one of Taste&#8217;s <span style="font-family: 'courier new'; font-size: medium;">ItemSimilarity</span> implementations.</p>
<p>To fetch a few recommendations we call <span style="font-family: 'courier new'; font-size: medium;">GenericItemBasedRecommender.mostSimilarItems(long itemID, int howMany)</span>, as shown in the snippet below:</p>
<pre class="brush: java;">
  @Override
  public List&lt;RecommendedItem&gt; mostSimilarItems(long itemID, int howMany) throws TasteException {
    return mostSimilarItems(itemID, howMany, null);
  }

  @Override
  public List&lt;RecommendedItem&gt; mostSimilarItems(long itemID, int howMany,
                                                Rescorer&lt;LongPair&gt; rescorer) throws TasteException {
    TopItems.Estimator&lt;Long&gt; estimator = new MostSimilarEstimator(itemID, similarity, rescorer);
    return doMostSimilarItems(new long[] {itemID}, howMany, estimator);
  }
</pre>
<p>After delegating the call to the more general <span style="font-family: 'courier new'; font-size: medium;">mostSimilarItems</span> overload, which also accepts a <span style="font-family: 'courier new'; font-size: medium;">Rescorer</span>, a <span style="font-family: 'courier new'; font-size: medium;">MostSimilarEstimator</span> is constructed and passed to the private method <span style="font-family: 'courier new'; font-size: medium;">doMostSimilarItems</span>. Recommending thus comes down to two parts: an estimator that scores individual items, and recommender-specific logic that decides which items get scored.</p>
<p>Now let&#8217;s zoom in on the <span style="font-family: 'courier new'; font-size: medium;">doMostSimilarItems</span> method. See the snippet below:</p>
<pre class="brush: java;">
  private List&lt;RecommendedItem&gt; doMostSimilarItems(long[] itemIDs,
                                                   int howMany,
                                                   TopItems.Estimator&lt;Long&gt; estimator) throws TasteException {
    DataModel model = getDataModel();
    FastIDSet possibleItemsIDs = new FastIDSet();
    for (long itemID : itemIDs) {
      PreferenceArray prefs = model.getPreferencesForItem(itemID);
      int size = prefs.length();
      for (int i = 0; i &lt; size; i++) {
        long userID = prefs.get(i).getUserID();
        possibleItemsIDs.addAll(model.getItemIDsFromUser(userID));
      }
    }
    possibleItemsIDs.removeAll(itemIDs);
    return TopItems.getTopItems(howMany, possibleItemsIDs.iterator(), null, estimator);
  }
</pre>
<p>The snippet above describes the core logic for item-based recommendation. This process consists of three steps:</p>
<ol>
<li>Fetch all preferences for the given item(s)</li>
<li>For each preference get the corresponding user and fetch all their other preferences</li>
<li>From this set of preferences, minus the given item, get the corresponding items and determine the top items based on the given estimator</li>
</ol>
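<p>The three steps can also be mimicked without any Taste classes. The sketch below is a simplification of my own, not Taste code: the class name, the <span style="font-family: 'courier new'; font-size: medium;">candidateItems</span> method and the sample data are made up for illustration, and a plain map from user IDs to item IDs stands in for the <span style="font-family: 'courier new'; font-size: medium;">DataModel</span>.</p>
<pre class="brush: java;">
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class CandidateItemsSketch {

  /** Collects all items rated by users who also rated the given item, minus the item itself. */
  static Set&lt;Long&gt; candidateItems(long itemID, Map&lt;Long, Set&lt;Long&gt;&gt; itemsByUser) {
    Set&lt;Long&gt; candidates = new TreeSet&lt;Long&gt;();
    for (Set&lt;Long&gt; items : itemsByUser.values()) {
      // Steps 1 and 2: only users with a preference for the item contribute their other items.
      if (items.contains(itemID)) {
        candidates.addAll(items);
      }
    }
    // Step 3: the item we started from is not a candidate for itself.
    candidates.remove(itemID);
    return candidates;
  }

  public static void main(String[] args) {
    // Hypothetical data: user ID mapped to the IDs of the items that user rated.
    Map&lt;Long, Set&lt;Long&gt;&gt; itemsByUser = new HashMap&lt;Long, Set&lt;Long&gt;&gt;();
    itemsByUser.put(1L, new HashSet&lt;Long&gt;(Arrays.asList(10L, 20L, 30L)));
    itemsByUser.put(2L, new HashSet&lt;Long&gt;(Arrays.asList(10L, 40L)));
    itemsByUser.put(3L, new HashSet&lt;Long&gt;(Arrays.asList(50L, 60L)));

    System.out.println(candidateItems(10L, itemsByUser));
  }
}
</pre>
<p>For item 10 this yields the candidates 20, 30 and 40. Items 50 and 60 are never scored, because no user who rated item 10 also rated them.</p>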
<p><span style="font-family: 'courier new'; font-size: medium;">TopItems</span> is a helper class that, given a set of items and an estimator, returns the top ranked items.</p>
<p>Now on to the estimator. All estimators implement the <span style="font-family: 'courier new'; font-size: medium;">TopItems.Estimator&lt;T&gt;</span> interface, which is really simple: it returns an estimate for a &#8216;thing&#8217; as a double.</p>
<pre class="brush: java;">
  public interface Estimator&lt;T&gt; {
    double estimate(T thing) throws TasteException;
  }
</pre>
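<p>To see how such an estimator drives the ranking, here is a stand-alone sketch of a top-N helper. It is not the actual <span style="font-family: 'courier new'; font-size: medium;">TopItems</span> implementation: the <span style="font-family: 'courier new'; font-size: medium;">ScoredItem</span> class, the <span style="font-family: 'courier new'; font-size: medium;">topItems</span> method and the toy estimator are hypothetical, the interface is redeclared so the example compiles on its own, and skipping <span style="font-family: 'courier new'; font-size: medium;">NaN</span> estimates is my reading of how filtered items are meant to drop out of the result.</p>
<pre class="brush: java;">
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class TopItemsSketch {

  /** Same shape as Taste's TopItems.Estimator, redeclared to keep the sketch self-contained. */
  interface Estimator&lt;T&gt; {
    double estimate(T thing);
  }

  /** Hypothetical holder for an item ID and its estimate. */
  static final class ScoredItem {
    final long itemID;
    final double score;

    ScoredItem(long itemID, double score) {
      this.itemID = itemID;
      this.score = score;
    }
  }

  /** Scores every candidate, drops NaN estimates and returns the howMany best, highest first. */
  static List&lt;ScoredItem&gt; topItems(int howMany, Iterable&lt;Long&gt; candidates, Estimator&lt;Long&gt; estimator) {
    List&lt;ScoredItem&gt; scored = new ArrayList&lt;ScoredItem&gt;();
    for (Long itemID : candidates) {
      double estimate = estimator.estimate(itemID);
      if (!Double.isNaN(estimate)) {
        scored.add(new ScoredItem(itemID, estimate));
      }
    }
    Collections.sort(scored, new Comparator&lt;ScoredItem&gt;() {
      @Override
      public int compare(ScoredItem a, ScoredItem b) {
        return Double.compare(b.score, a.score); // descending by score
      }
    });
    return scored.subList(0, Math.min(howMany, scored.size()));
  }

  public static void main(String[] args) {
    // Toy estimator: item 3 is filtered out (NaN), every other item scores 1 / itemID.
    Estimator&lt;Long&gt; estimator = new Estimator&lt;Long&gt;() {
      @Override
      public double estimate(Long itemID) {
        return itemID == 3L ? Double.NaN : 1.0 / itemID;
      }
    };
    for (ScoredItem item : topItems(2, Arrays.asList(4L, 3L, 2L, 5L), estimator)) {
      System.out.println(item.itemID + " -&gt; " + item.score);
    }
  }
}
</pre>
<p>With these candidates the sketch returns item 2 (score 0.5) followed by item 4 (score 0.25). A real implementation would keep a bounded queue instead of sorting every candidate, but the contract is the same: the recommender supplies candidates, the estimator supplies scores.</p>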
<p>Now on to the <span style="font-family: 'courier new'; font-size: medium;">MostSimilarEstimator<span>:</span></span></p>
<pre class="brush: java;">
  public static class MostSimilarEstimator implements TopItems.Estimator&lt;Long&gt; {

    private final long toItemID;
    private final ItemSimilarity similarity;
    private final Rescorer&lt;LongPair&gt; rescorer;

    public MostSimilarEstimator(long toItemID, ItemSimilarity similarity, Rescorer&lt;LongPair&gt; rescorer) {
      this.toItemID = toItemID;
      this.similarity = similarity;
      this.rescorer = rescorer;
    }

    @Override
    public double estimate(Long itemID) throws TasteException {
      LongPair pair = new LongPair(toItemID, itemID);
      if ((rescorer != null) &amp;&amp; rescorer.isFiltered(pair)) {
        return Double.NaN;
      }
      double originalEstimate = similarity.itemSimilarity(toItemID, itemID);
      return rescorer == null ? originalEstimate : rescorer.rescore(pair, originalEstimate);
    }
  }
</pre>
<p>This estimator does three things:</p>
<ol>
<li>Use the <span style="font-family: 'courier new'; font-size: medium;">Rescorer</span> to filter item pairs; a filtered pair gets the estimate <span style="font-family: 'courier new'; font-size: medium;">NaN</span>. <span style="font-family: 'courier new'; font-size: medium;">Rescorers</span> can be used to implement domain-specific filtering of items</li>
<li>Use the <span style="font-family: 'courier new'; font-size: medium;">ItemSimilarity</span> to calculate the similarity between the given item and the candidate item</li>
<li>Optionally adjust the similarity value, for example boost it, with the <span style="font-family: 'courier new'; font-size: medium;">Rescorer</span></li>
</ol>
<p>This setup allows you to plug arbitrary <span style="font-family: 'courier new'; font-size: medium;">ItemSimilarity</span> algorithms into the recommender.</p>
<h3><strong>Recommended because&#8230;</strong></h3>
<p>Another interesting feature of the <span style="font-family: 'courier new'; font-size: medium;">GenericItemBasedRecommender</span> is the &#8216;Recommended because&#8217; feature. With this feature you can determine <em>why</em> a certain item was recommended to you, i.e. <em>which of your preferences were largely responsible for giving you this recommendation</em>.</p>
<p>To use this feature call <span style="font-family: 'courier new'; font-size: medium;">recommendedBecause(long userID, long itemID, int howMany)</span>, see snippet below:</p>
<pre class="brush: java;">
  @Override
  public List&lt;RecommendedItem&gt; recommendedBecause(long userID, long itemID, int howMany) throws TasteException {
    if (howMany &lt; 1) {
      throw new IllegalArgumentException(&quot;howMany must be at least 1&quot;);
    }

    DataModel model = getDataModel();
    TopItems.Estimator&lt;Long&gt; estimator = new RecommendedBecauseEstimator(userID, itemID, similarity);

    PreferenceArray prefs = model.getPreferencesFromUser(userID);
    int size = prefs.length();
    FastIDSet allUserItems = new FastIDSet(size);
    for (int i = 0; i &lt; size; i++) {
      allUserItems.add(prefs.getItemID(i));
    }
    allUserItems.remove(itemID);

    return TopItems.getTopItems(howMany, allUserItems.iterator(), null, estimator);
  }
</pre>
<p>It takes all items the given user has a preference for, minus the given item, and passes them to <span style="font-family: 'courier new'; font-size: medium;">TopItems</span> along with the <span style="font-family: 'courier new'; font-size: medium;">RecommendedBecauseEstimator</span>, see the code below:</p>
<pre class="brush: java;">
  private class RecommendedBecauseEstimator implements TopItems.Estimator&lt;Long&gt; {

    private final long userID;
    private final long recommendedItemID;
    private final ItemSimilarity similarity;

    private RecommendedBecauseEstimator(long userID, long recommendedItemID, ItemSimilarity similarity) {
      this.userID = userID;
      this.recommendedItemID = recommendedItemID;
      this.similarity = similarity;
    }

    @Override
    public double estimate(Long itemID) throws TasteException {
      Float pref = getDataModel().getPreferenceValue(userID, itemID);
      if (pref == null) {
        return Float.NaN;
      }
      double similarityValue = similarity.itemSimilarity(recommendedItemID, itemID);
      return (1.0 + similarityValue) * pref;
    }
  }
</pre>
<p>This <span style="font-family: 'courier new'; font-size: medium;">RecommendedBecauseEstimator</span> determines the ranking by multiplying the user&#8217;s preference value for each item by one plus the similarity between that item and the recommended item. After this process the top ranked items are those items that contributed most to the recommendation of the given item.</p>
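<p>A small worked example makes the ranking tangible. The sketch below only reproduces the formula from <span style="font-family: 'courier new'; font-size: medium;">estimate()</span>; the class name, the three items and their preference and similarity values are invented for illustration. My guess is that the <span style="font-family: 'courier new'; font-size: medium;">1.0 +</span> term is there to keep the factor non-negative when a similarity lies between -1 and 1, but that is an assumption on my part.</p>
<pre class="brush: java;">
public class RecommendedBecauseSketch {

  /** Mirrors the formula in RecommendedBecauseEstimator.estimate(): (1 + similarity) * preference. */
  static double score(double similarity, float preference) {
    return (1.0 + similarity) * preference;
  }

  public static void main(String[] args) {
    // Hypothetical items the user has rated, with their similarity to the recommended item.
    String[] names = {"A", "B", "C"};
    float[] preferences = {5.0f, 4.0f, 2.0f};
    double[] similarities = {0.25, 0.75, 0.5};

    for (int i = 0; i &lt; names.length; i++) {
      System.out.println("item " + names[i] + ": " + score(similarities[i], preferences[i]));
    }
  }
}
</pre>
<p>Item A scores 6.25, item B 7.0 and item C 3.0. Item B ends up on top even though the user rated item A higher, because B is much more similar to the recommended item.</p>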
<h3><strong>Conclusions</strong></h3>
<p>This concludes the overview of some Taste internals, which has hopefully given you a clearer picture of how recommendations and estimators work inside Taste. In future posts I will probably expand on this topic, especially in the context of evaluating recommenders. If you have any questions regarding Taste in general or estimators in particular, feel free to leave a comment.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.jteam.nl/2010/07/08/mahout-%e2%80%93-taste-part-three-%e2%80%93-estimators/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Mahout &#8211; Taste at Lucene Eurocon and Berlin Buzzwords</title>
		<link>http://blog.jteam.nl/2010/07/01/mahout-taste-at-lucene-eurocon-and-berlin-buzzwords/</link>
		<comments>http://blog.jteam.nl/2010/07/01/mahout-taste-at-lucene-eurocon-and-berlin-buzzwords/#comments</comments>
		<pubDate>Thu, 01 Jul 2010 08:40:51 +0000</pubDate>
		<dc:creator>Frank Scholten</dc:creator>
				<category><![CDATA[Enterprise Search]]></category>
		<category><![CDATA[Collaborative Filtering]]></category>
		<category><![CDATA[Conference]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[Mahout]]></category>
		<category><![CDATA[Recommendations]]></category>
		<category><![CDATA[Taste]]></category>

		<guid isPermaLink="false">http://blog.jteam.nl/?p=2308</guid>
		<description><![CDATA[A little while ago, I was delighted to present two introductory Mahout &#8211; Taste talks, at Lucene Eurocon and Berlin Buzzwords. I received quite a lot of good feedback about the presentations and have been asked by a few attendees to post them. If you&#8217;re one of those attendees or you missed the presentation, you [...]]]></description>
			<content:encoded><![CDATA[<p>A little while ago, I was delighted to present two introductory Mahout &#8211; Taste talks, at <a href="http://lucene-eurocon.org/">Lucene Eurocon</a> and <a href="http://www.berlinbuzzwords.de/">Berlin Buzzwords</a>. I received quite a lot of good feedback about the presentations and have been asked by a few attendees to post them.</p>
<p>If you&#8217;re one of those attendees or you missed the presentation, you can download the slides here:</p>
<ul>
<li><a href="http://lucene-eurocon.org/slides/Introduction-To-Collaborative-Filtering-Using-Mahout_Frank-Scholten.pdf">Lucene Eurocon Slides</a></li>
<li><a href="http://blog.jteam.nl/wp-content/uploads/2010/07/scholten_bbuzz2010.pdf">Berlin Buzzwords Slides</a></li>
</ul>
<p>At Lucene Eurocon, the first European conference on Lucene and Solr, there were interesting presentations, ranging from practical relevance to language analysis. For me it was fun to give a practical presentation about recommendations as a complementary feature to search applications. I hope you find the presentation useful if you&#8217;re trying to work out how to build a recommender &#8211; I used the Movielens dataset as an example in the presentation and based the code on my earlier <a href="http://blog.jteam.nl/2010/04/15/mahout-taste-part-two-getting-started/">&#8216;getting started&#8217;</a> blog.</p>
<p>I also really enjoyed doing the Berlin Buzzwords presentation and meeting up with people from the Mahout community and other attendees. This conference focused mainly on NoSQL, scalability and Hadoop. However, from my talks with people there I sense that there&#8217;s growing interest in Mahout. You should find the presentation useful if you want to know more about different algorithms and how to evaluate them. I will blog about this topic in more detail soon.</p>
<p>Until then, I&#8217;d love to hear some feedback on what you think of the presentations!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.jteam.nl/2010/07/01/mahout-taste-at-lucene-eurocon-and-berlin-buzzwords/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mahout &#8211; Taste :: Part Two &#8211; Getting started</title>
		<link>http://blog.jteam.nl/2010/04/15/mahout-taste-part-two-getting-started/</link>
		<comments>http://blog.jteam.nl/2010/04/15/mahout-taste-part-two-getting-started/#comments</comments>
		<pubDate>Thu, 15 Apr 2010 09:08:49 +0000</pubDate>
		<dc:creator>Frank Scholten</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Enterprise Search]]></category>
		<category><![CDATA[Collaborative Filtering]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[Mahout]]></category>
		<category><![CDATA[Recommendations]]></category>
		<category><![CDATA[Taste]]></category>

		<guid isPermaLink="false">http://blog.jteam.nl/?p=1842</guid>
		<description><![CDATA[This blog is a &#8216;getting started&#8217; article and shows you how to build a simple web-based movie recommender with Mahout / Taste, Wicket and the Movielens dataset from Grouplens research group at the University of Minnesota. I will discuss which components you need, how to wire them up in Spring, and how to create a [...]]]></description>
			<content:encoded><![CDATA[<div>This blog is a &#8216;getting started&#8217; article and shows you how to build a simple web-based movie recommender with Mahout / Taste, Wicket and the Movielens dataset from Grouplens research group at the University of Minnesota. I will discuss which components you need, how to wire them up in Spring, and how to create a Wicket frontend for displaying movies and their recommendations. Along the way I give some tips and pointers about developing a recommender. Additionally I show the <span style="font-family: 'courier new'; font-size: medium;">ResourceDataModel</span>, a Mahout <span style="font-family: 'courier new'; font-size: medium;">DataModel</span> implementation which reads preferences from a Spring <span style="font-family: 'courier new'; font-size: medium;">Resource</span>.<br />
<span id="more-1842"></span><strong> </strong></p>
<h3><strong>Online movie store</strong></h3>
<p>Our running example for this post is an online DVD shop in which you can view and rent movies. Visitors go to a movie&#8217;s page to check out the plot description and will be shown a list of similar movies. These are other movies that were watched by people who also rented that specific movie. Because of space and time constraints, this application only provides a view on the dataset and the recommended movies. Other features expected of an online movie rental service, such as registration and payment, are left out.</p>
<h3><strong>Movielens Dataset</strong></h3>
<p>The movies and their ratings originate from the <a href="http://www.grouplens.org/node/73">Movielens dataset</a> of the <a href="http://www.grouplens.org/">Grouplens research group</a> from the University of Minnesota. The dataset comes in three sizes, containing 100,000, 1 million and 10 million ratings. Note that these ratings are <em>explicit</em>, ranging from 1 to 5. This is different from the example in <a id="icdy" title="my earlier blog" href="../2009/12/09/mahout-taste-part-one-introduction/">my earlier blog</a>, which featured <em>implicit</em> ratings. Implicit ratings only indicate if a user purchased or liked an item but not how much someone liked it. In this example I used the file with 100,000 ratings.</p>
<h3><strong>Item-based or user-based algorithms</strong></h3>
<p>For this application we use an item-based recommender, which is a good choice performance-wise if the number of items is less than the number of users. This is likely to be the case for an online movie rental store. Item-based recommendation is different from user-based recommendation, which works by identifying a neighbourhood of similar users and recommending items from that neighbourhood to the user. User-based recommendations are <em>personalized</em> while with item-based recommendations each user will get <em>the same recommendations for a given item</em>. In our case, all users that visit a specific movie page will see the same recommended movies for that movie. An example of user-based recommendation is <a href="http://www.stumbleupon.com">Stumbleupon</a>, which provides personalized website recommendations. Stumbleupon requires you to log in first so that it can perform its user-based recommendation based on your profile of likes and dislikes of certain sites.</p>
<h3><strong>EuclideanDistanceSimilarity</strong></h3>
<p>Now we need to select an item-based algorithm that fits our dataset.  The <span style="font-family: 'courier new'; font-size: medium;">EuclideanDistanceSimilarity</span> is one of Taste&#8217;s algorithms that is suitable for explicit ratings and will be used for our demonstration. Before we construct our recommender, let&#8217;s look at the Euclidean distance similarity in more detail. The algorithm computes the Euclidean distance between the preference vectors of each pair of items. The shorter the distance between these vectors, the greater the similarity. For instance, suppose we have users u1, u2 and u3 and item i1, and these users rated i1 with 2, 4 and 5 respectively. We now have a preference vector [2,4,5] for item i1. The Euclidean distance between two such vectors can be computed and used as a measure of their similarity. The Euclidean distance between the vectors of two items i and j is the square root of the sum of the squared differences between their coordinates. See the formula below:<br />
<center><img src="http://blog.jteam.nl/wp-content/cache/tex_5b23879f5cdb88c80c1d0b55b1bbd52b.png" align="absmiddle" class="tex" alt="d_{ij}=\sqrt{\sum\limits_{k=1}^n \left(x_{ik} - x_{jk}\right)^2}" /></center></p>
<p>The <span style="font-family: 'courier new'; font-size: medium;">EuclideanDistanceSimilarity</span> calculates this distance for each pair of items and then returns</p>
<p><center><img src="http://blog.jteam.nl/wp-content/cache/tex_04966097412fbd5618c399029a650ff5.png" align="absmiddle" class="tex" alt="\frac{1}{1 - d_{ij}}" /></center></p>
<p>which is intended to yield a value between 0 and 1. That only holds when the distance is added in the denominator, i.e. 1 / (1 + d<sub>ij</sub>): identical preference vectors have distance 0 and similarity 1, and the similarity approaches 0 as the distance grows. The <span style="font-family: 'courier new'; font-size: medium;">EuclideanDistanceSimilarity</span> can also be weighted. If you pass the <span style="font-family: 'courier new'; font-size: medium;">Weighting.WEIGHTED</span> enum value to the constructor of <span style="font-family: 'courier new'; font-size: medium;">EuclideanDistanceSimilarity</span>, the algorithm will weight the values based on the number of users and the number of co-occurring preferences.</p>
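<p>To make the arithmetic concrete, here is a stand-alone sketch that computes the distance and the resulting similarity for two preference vectors. It is not Taste&#8217;s implementation: the class and method names are made up, the second vector is a hypothetical item i2, the similarity is assumed to be 1 / (1 + d<sub>ij</sub>), which is the form that stays between 0 and 1, and both weighting and users who rated only one of the two items are ignored.</p>
<pre class="brush: java;">
import java.util.Locale;

public class EuclideanSimilaritySketch {

  /** Euclidean distance between two preference vectors of equal length. */
  static double distance(double[] x, double[] y) {
    if (x.length != y.length) {
      throw new IllegalArgumentException(&quot;Vectors must have the same length&quot;);
    }
    double sumOfSquares = 0.0;
    for (int k = 0; k &lt; x.length; k++) {
      double diff = x[k] - y[k];
      sumOfSquares += diff * diff;
    }
    return Math.sqrt(sumOfSquares);
  }

  /** Maps a distance of 0 or more to a similarity between 0 (exclusive) and 1 (inclusive). */
  static double similarity(double[] x, double[] y) {
    return 1.0 / (1.0 + distance(x, y));
  }

  public static void main(String[] args) {
    double[] i1 = {2, 4, 5}; // ratings of u1, u2 and u3 for item i1, as in the text
    double[] i2 = {3, 4, 3}; // hypothetical ratings of the same users for a second item i2

    // The differences are -1, 0 and 2, so the distance is sqrt(1 + 0 + 4) = sqrt(5).
    System.out.println(String.format(Locale.ROOT, &quot;distance   = %.3f&quot;, distance(i1, i2)));
    System.out.println(String.format(Locale.ROOT, &quot;similarity = %.3f&quot;, similarity(i1, i2)));
  }
}
</pre>
<p>For these two vectors the distance is about 2.236 and the similarity about 0.309. Comparing i1 with itself gives a distance of 0 and a similarity of exactly 1.</p>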
<h3><strong>Creating a web-based recommender</strong></h3>
<p>Below is a list of components we need to create a small web-based recommendation engine, using Taste, Wicket and JPA/Hibernate. I won&#8217;t cover all the details of building this webapp, just the main building blocks. You can download the code for this example <a href="http://blog.jteam.nl/wp-content/uploads/2010/04/taste-getting-started.zip">here</a> and look at the specific details. We need the following components:</p>
<ul>
<li><span style="font-family: 'courier new'; font-size: medium;">DataModel</span><br />
A <span style="font-family: 'courier new'; font-size: medium;">FileDataModel</span> that reads movie ids, user ids and ratings from the Movielens dataset directly into memory.</li>
<li><span style="font-family: 'courier new'; font-size: medium;">EuclideanDistanceSimilarity</span><br />
Computes item similarities for each pair of items in the dataset.</li>
<li><span style="font-family: 'courier new'; font-size: medium;">GenericItemBasedRecommender</span><br />
Uses both the data model and the similarity algorithm to compute similar items in memory.</li>
<li><span style="font-family: 'courier new'; font-size: medium;">MovieRepository</span><br />
JPA repository for retrieving <span style="font-family: 'courier new'; font-size: medium;">Movie</span> objects.</li>
<li><span style="font-family: 'courier new'; font-size: medium;">MovieService</span><br />
Uses the recommender and the <span style="font-family: 'courier new'; font-size: medium;">MovieRepository</span> to retrieve most similar movies for a given movie id.</li>
<li>Wicket <span style="font-family: 'courier new'; font-size: medium;">MoviePage</span>, HTML + CSS<br />
This includes a page for viewing a movie along with similar movies, a few model classes, some HTML and CSS, and a few code tweaks to the original <a id="hsdj" title="wicket quickstart" href="http://wicket.apache.org/quickstart.html">wicket quickstart</a> project.</li>
</ul>
<p>Note that Taste ships with a preconfigured <span style="font-family: 'courier new'; font-size: medium;">MovielensRecommender</span>. For the purpose of this article however, I wanted to show you how to build a recommender from the ground up.</p>
<h3><strong>Configuring the ResourceDataModel, EuclideanDistanceSimilarity and GenericItemBasedRecommender</strong></h3>
<p>Because of license restrictions the Movielens data cannot be shipped with this demo, so you need to download it yourself from the Movielens dataset page linked above. Place the <span style="font-family: 'courier new'; font-size: medium;">u.data</span> file under <span style="font-family: 'courier new'; font-size: medium;">src/main/resources/grouplens/100K/ratings/</span> and place the <span style="font-family: 'courier new'; font-size: medium;">u.item</span> file under <span style="font-family: 'courier new'; font-size: medium;">src/main/resources/grouplens/100K/data/</span>. Now that you have the ratings data set up, you need to feed it into a <span style="font-family: 'courier new'; font-size: medium;">DataModel</span> class. You can use a <span style="font-family: 'courier new'; font-size: medium;">FileDataModel</span> for this, but that requires an absolute path. Instead I implemented a <span style="font-family: 'courier new'; font-size: medium;">ResourceDataModel</span>, which reads ratings files from the classpath. Below is the implementation of the <span style="font-family: 'courier new'; font-size: medium;">ResourceDataModel</span>, which is a wrapper around a <span style="font-family: 'courier new'; font-size: medium;">FileDataModel</span>. How to wire it up is shown further below.</p>
<pre class="brush: java;">
package nl.jteam.mahout.gettingstarted.datamodel;

// Imports omitted.

/**
 * DataModel implementation which reads a Spring {@link org.springframework.core.io.Resource} into a
 * {@link org.apache.mahout.cf.taste.impl.model.file.FileDataModel} delegate.
 *
 * @author Frank Scholten
 */
public class ResourceDataModel implements DataModel {
    FileDataModel delegate;

    ResourceDataModel() { // For testing
    }

    /**
     * Reads the preferences from the given {@link org.springframework.core.io.Resource}
     *
     * @param resource with user IDs, items IDs and preferences
     */
    public ResourceDataModel(Resource resource) {
        try {
            this.delegate = new FileDataModel(resource.getFile());
        } catch (IOException e) {
            throw new RuntimeException(&quot;Could not read resource &quot; + resource.getDescription(), e);
        }
    }

    @Override
    public LongPrimitiveIterator getUserIDs() throws TasteException {
        return delegate.getUserIDs();
    }

    @Override
    public PreferenceArray getPreferencesFromUser(long userID) throws TasteException {
        return delegate.getPreferencesFromUser(userID);
    }

    @Override
    public FastIDSet getItemIDsFromUser(long userID) throws TasteException {
        return delegate.getItemIDsFromUser(userID);
    }

    @Override
    public LongPrimitiveIterator getItemIDs() throws TasteException {
        return delegate.getItemIDs();
    }

    @Override
    public PreferenceArray getPreferencesForItem(long itemID) throws TasteException {
        return delegate.getPreferencesForItem(itemID);
    }

    @Override
    public Float getPreferenceValue(long userID, long itemID) throws TasteException {
        return delegate.getPreferenceValue(userID, itemID);
    }

    @Override
    public int getNumItems() throws TasteException {
        return delegate.getNumItems();
    }

    @Override
    public int getNumUsers() throws TasteException {
        return delegate.getNumUsers();
    }

    @Override
    public int getNumUsersWithPreferenceFor(long... itemIDs) throws TasteException {
        return delegate.getNumUsersWithPreferenceFor(itemIDs);
    }

    @Override
    public void setPreference(long userID, long itemID, float value) throws TasteException {
        delegate.setPreference(userID, itemID, value);
    }

    @Override
    public void removePreference(long userID, long itemID) throws TasteException {
        delegate.removePreference(userID, itemID);
    }

    @Override
    public boolean hasPreferenceValues() {
        return delegate.hasPreferenceValues();
    }

    @Override
    public void refresh(Collection&lt;Refreshable&gt; alreadyRefreshed) {
        delegate.refresh(alreadyRefreshed);
    }
}
</pre>
<p>Next you need to initialize the movie database. Run the <span style="font-family: 'courier new'; font-size: medium;">initialize_movielens_db.sql</span> from the <span style="font-family: 'courier new'; font-size: medium;">src/main/resources/sql</span> folder. This will create the <span style="font-family: 'courier new'; font-size: medium;">movielens</span> database and a user with username and password <span style="font-family: 'courier new'; font-size: medium;">movielens</span>. Additionally, it creates the movie database and loads it with the movie titles from the <span style="font-family: 'courier new'; font-size: medium;">u.item</span> file.</p>
<p>Now we wire up the <span style="font-family: 'courier new'; font-size: medium;">GenericItemBasedRecommender</span> and its dependencies, the <span style="font-family: 'courier new'; font-size: medium;">EuclideanDistanceSimilarity</span> and the <span style="font-family: 'courier new'; font-size: medium;">ResourceDataModel</span>, in the following Spring context:</p>
<pre class="brush: xml;">
&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
&lt;beans xmlns=&quot;http://www.springframework.org/schema/beans&quot;
       xmlns:xsi=&quot;http://www.w3.org/2001/XMLSchema-instance&quot;
       xsi:schemaLocation=&quot;http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd&quot;&gt;

    &lt;!-- Recommender --&gt;

    &lt;bean id=&quot;euclidianDistanceRecommender&quot; class=&quot;org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender&quot;&gt;
        &lt;constructor-arg index=&quot;0&quot; ref=&quot;movielensDataModel100K&quot;/&gt;
        &lt;constructor-arg index=&quot;1&quot; ref=&quot;euclidianDistanceSimilarity&quot;/&gt;
    &lt;/bean&gt;

    &lt;!-- DataModel --&gt;

    &lt;bean id=&quot;movielensDataModel100K&quot; class=&quot;nl.jteam.mahout.gettingstarted.datamodel.ResourceDataModel&quot;&gt;
        &lt;constructor-arg value=&quot;classpath:/grouplens/100K/ratings/u.data&quot;/&gt;
    &lt;/bean&gt;

    &lt;!-- Similarity --&gt;

    &lt;bean id=&quot;euclidianDistanceSimilarity&quot; class=&quot;org.apache.mahout.cf.taste.impl.similarity.EuclideanDistanceSimilarity&quot;&gt;
        &lt;constructor-arg ref=&quot;movielensDataModel100K&quot;/&gt;
        &lt;constructor-arg value=&quot;WEIGHTED&quot;/&gt;
    &lt;/bean&gt;

&lt;/beans&gt;
</pre>
<h3><strong>MovieRepository, MovieService &amp; MoviePage</strong></h3>
<p>Our recommender is now ready to determine similar movies for a given movie ID. However, the <span style="font-family: 'courier new'; font-size: medium;">GenericItemBasedRecommender</span> only returns <span style="font-family: 'courier new'; font-size: medium;">RecommendedItem</span> objects, which carry item IDs of the type <span style="font-family: 'courier new'; font-size: medium;">long</span>. In order to display the actual movie information to users we need to create a <span style="font-family: 'courier new'; font-size: medium;">MovieRepository</span> which fetches recommended <span style="font-family: 'courier new'; font-size: medium;">Movie</span> objects. Additionally, we need a <span style="font-family: 'courier new'; font-size: medium;">MovieService</span> which coordinates the <span style="font-family: 'courier new'; font-size: medium;">MovieRepository</span> and the <span style="font-family: 'courier new'; font-size: medium;">GenericItemBasedRecommender</span> so that we can retrieve recommended <span style="font-family: 'courier new'; font-size: medium;">Movie</span> objects for a given movie ID. Below is a snippet of a JPA implementation of a <span style="font-family: 'courier new'; font-size: medium;">MovieRepository</span>.</p>
<pre class="brush: java;">
package nl.jteam.mahout.gettingstarted.repository;

// Imports omitted.

/**
 * Repository for retrieving {@link Movie}s
 *
 * @author Frank Scholten
 */
@Repository
public class JpaMovieRepository implements MovieRepository {

    @PersistenceContext
    private EntityManager entityManager;

    /** {@inheritDoc} */
    @Override
    public Movie getMovieById(long id) {
        return entityManager.find(Movie.class, id);
    }

    /** {@inheritDoc} */
    @Override
    @SuppressWarnings(&quot;unchecked&quot;)
    public List&lt;Movie&gt; getMoviesById(List&lt;Long&gt; movieIds) {
        return (List&lt;Movie&gt;) entityManager.createQuery(&quot;SELECT m FROM movie m WHERE m.id IN (:movieIds)&quot;)
                .setParameter(&quot;movieIds&quot;, movieIds)
                .getResultList();
    }
}
</pre>
<p>Below is a code snippet of a default implementation of the <span style="font-family: 'courier new'; font-size: medium;">MovieService</span>.</p>
<pre class="brush: java;">
package nl.jteam.mahout.gettingstarted.service;

// Imports omitted.

/**
 * Service for retrieving and recommending {@link Movie}s.
 *
 * @author Frank Scholten
 */
@Transactional
@Service
public class DefaultMovieService implements MovieService {

    @Autowired
    private MovieRepository movieRepository;

    @Autowired
    private ItemBasedRecommender movieRecommender;

    public Movie getMovieById(long id) {
        return movieRepository.getMovieById(id);
    }

    public List&lt;Movie&gt; moreLikeThis(long movieId) {
        try {
            List&lt;RecommendedItem&gt; recommendedItems = movieRecommender.mostSimilarItems(movieId, 5);

            // Collect the IDs of the recommended items so the repository can load the movies.
            List&lt;Long&gt; ids = new ArrayList&lt;Long&gt;();
            for (RecommendedItem r : recommendedItems) {
                ids.add(r.getItemID());
            }

            return movieRepository.getMoviesById(ids);

        } catch (TasteException e) {
            // No recommendations available: fall back to an empty list.
            return Collections.&lt;Movie&gt;emptyList();
        }
    }
}
</pre>
<p>Finally, below is a snippet of the Wicket <span style="font-family: 'courier new'; font-size: medium;">MoviePage</span>, which displays the current movie and similar movies fetched through specially created Wicket models.</p>
<pre class="brush: java;">
package nl.jteam.mahout.gettingstarted.web.page;

// Imports omitted.

/**
 * Page for showing a single {@link Movie} from the Movielens dataset along
 * with recommended movies i.e. 'more like this'.
 *
 * @author Frank Scholten
 */
public class MoviePage extends WebPage {

    private static final String MOVIE_ID = &quot;0&quot;;

    public MoviePage(PageParameters pageParameters) {
        final long movieId = pageParameters.getLong(MOVIE_ID, 1);

        MovieModel model = new MovieModel(movieId);
        add(new Label(&quot;title&quot;, model.getObject().getTitle()));

        PropertyListView&lt;Movie&gt; recommendedMovies = new PropertyListView&lt;Movie&gt;(&quot;moreLikeThis&quot;, new RecommendedMoviesModel(movieId)) {
            @Override
            protected void populateItem(ListItem&lt;Movie&gt; listItem) {
                Movie movie = listItem.getModelObject();
                PageParameters pageParameters = new PageParameters();
                pageParameters.put(MOVIE_ID, movie.getId());

                BookmarkablePageLink&lt;MoviePage&gt; movieLink = new BookmarkablePageLink&lt;MoviePage&gt;(&quot;link&quot;, MoviePage.class, pageParameters);
                listItem.add(movieLink);
                Label movieTitle = new Label(&quot;title&quot;);
                movieTitle.setRenderBodyOnly(true);
                movieLink.add(movieTitle);
            }
        };
        add(recommendedMovies);
    }
}
</pre>
<h3><strong>Running the web application</strong></h3>
<p>First you need to download the Movielens dataset and add the ratings file to the classpath under <span style="font-family: 'courier new'; font-size: medium;">grouplens/100K/ratings</span>, as referenced in the Spring context above. Since this example is based on the Wicket quickstart project, you can start the application with Jetty by running the <span style="font-family: 'courier new'; font-size: medium;">Start</span> class in your favourite IDE. Go to <span style="font-family: 'courier new'; font-size: medium;">http://localhost:9090/</span> and you can browse through movies via recommendations. Alternatively, you can build the WAR and drop it into Tomcat.</p>
<h3><strong>Performance tweaks</strong></h3>
<ul>
<li>If you would like to experiment with the larger datasets, for example the 10 million ratings dataset, add <span style="font-family: 'courier new'; font-size: medium;">-Xmx512m</span> as a VM parameter. Also, Taste&#8217;s <span style="font-family: 'courier new'; font-size: medium;">FileDataModel</span> uses commas and tabs as delimiters. You may need to run the Movielens files through <span style="font-family: 'courier new'; font-size: medium;">sed</span> before feeding them into a <span style="font-family: 'courier new'; font-size: medium;">FileDataModel</span>.</li>
<li>The <span style="font-family: 'courier new'; font-size: medium;">FileDataModel</span> reads everything into memory before computation. This is much faster than using a <span style="font-family: 'courier new'; font-size: medium;">MySQLJDBCDataModel</span>, since the latter requires around O(n<sup>2</sup>) database queries to compute similarities between all pairs. Reading all data into memory is not always feasible, however. An alternative is to precompute the similarities for the item pairs, store the results in the database and read them via the <span style="font-family: 'courier new'; font-size: medium;">MySQLJDBCItemSimilarity</span>.</li>
<li>You can also sample the dataset and/or remove noise elements to speed things up a little more.</li>
</ul>
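<p>The delimiter conversion mentioned in the first tip can also be done in Java instead of <span style="font-family: 'courier new'; font-size: medium;">sed</span>. Below is a minimal sketch that is not part of the demo project. It assumes the ratings file separates its fields with <span style="font-family: 'courier new'; font-size: medium;">::</span>, which you should verify against your own download; the class <span style="font-family: 'courier new'; font-size: medium;">DelimiterSketch</span> and its methods are made up for illustration.</p>

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper, not part of Taste or the demo project. */
class DelimiterSketch {

    /** Assumption: fields are separated by "::", e.g. user::movie::rating::timestamp. */
    static String toCommaSeparated(String line) {
        return line.replace("::", ",");
    }

    /** Copies source to target line by line, rewriting the delimiter on each line. */
    static void convert(Path source, Path target) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(source, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(toCommaSeparated(line));
                out.newLine();
            }
        }
    }
}
```

<p>The conversion streams the file line by line, so it does not add to the memory pressure described above.</p>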
<p>These performance aspects are an interesting topic for a later blogpost. If any of you reading this has experience with these types of issues, please post a comment and we&#8217;ll discuss them.</p>
<h3><strong>Conclusions</strong></h3>
<p>This concludes the getting started post on Mahout / Taste. What we didn&#8217;t cover was how to update the recommender and how to customize your recommender with boosting of items. These are all subjects for future blog posts.</p>
<h3><strong>References</strong></h3>
<ul>
<li>Getting started demo &#8211; source code
<p>You can download the source code of this example <a href="http://blog.jteam.nl/wp-content/uploads/2010/04/taste-getting-started.zip">here</a>.</p>
</li>
<li>Grouplens datasets
<p>The <a href="http://www.grouplens.org/">Grouplens</a> research group of the University of Minnesota has made a few <a href="http://www.grouplens.org/node/73">datasets</a> publicly available for research purposes.</p>
</li>
<li><a href="http://www.manning.com/owen/">Mahout in Action EAP</a>
<p>This is a great resource on Mahout. It explains a lot about performance issues and details how the algorithms stack up against each other. It also provides many examples and case studies on how to use Mahout in practice.</p>
</li>
</ul>
</div>
]]></content:encoded>
			<wfw:commentRss>http://blog.jteam.nl/2010/04/15/mahout-taste-part-two-getting-started/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Mahout &#8211; Taste :: Part 1 &#8211; Introduction</title>
		<link>http://blog.jteam.nl/2009/12/09/mahout-taste-part-one-introduction/</link>
		<comments>http://blog.jteam.nl/2009/12/09/mahout-taste-part-one-introduction/#comments</comments>
		<pubDate>Wed, 09 Dec 2009 10:06:36 +0000</pubDate>
		<dc:creator>Frank Scholten</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Enterprise Search]]></category>
		<category><![CDATA[Apache]]></category>
		<category><![CDATA[Collaborative Filtering]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Mahout]]></category>
		<category><![CDATA[Recommendations]]></category>
		<category><![CDATA[Taste]]></category>

		<guid isPermaLink="false">http://blog.jteam.nl/?p=324</guid>
		<description><![CDATA[This post is the first in a series on Taste, a Java framework for providing personalized recommendations. Taste is part of the larger Mahout framework, which features various scalable machine-learning algorithms. In this post I introduce you to the concepts of personalized recommendations, also known as collaborative filtering. After this introduction, Taste&#8217;s architecture and extension [...]]]></description>
			<content:encoded><![CDATA[<p>This post is the first in a series on Taste, a Java framework for providing personalized recommendations. Taste is part of the larger <a href="http://lucene.apache.org/mahout/taste.html">Mahout</a> framework, which features various scalable machine-learning algorithms. In this post I introduce you to the concepts of personalized recommendations, also known as <a href="http://en.wikipedia.org/wiki/Collaborative_filtering">collaborative filtering</a>. After this introduction, Taste&#8217;s architecture and extension points are explained. I finish this post by demonstrating and explaining the <code>TanimotoCoefficientSimilarity</code>, one of Taste&#8217;s implementations used for computing recommendations.<br />
<span id="more-324"></span></p>
<p><strong>Personalized Recommendations</strong></p>
<p>Today the web is full of services for recommending books, websites, music, applications, movies and so on. <a href="http://www.amazon.com">Amazon</a>, <a href="http://www.last.fm">Last.fm</a> and <a href="http://www.stumbleupon.com">StumbleUpon</a> all provide these personalized recommendations for internet users. These features can be quite useful for customers and profitable for many e-commerce sites.</p>
<p><strong>Collaborative Filtering</strong></p>
<p>Let&#8217;s first review some basic concepts. The theory that powers the websites mentioned above is the process of <a href="http://en.wikipedia.org/wiki/Collaborative_filtering">Collaborative Filtering</a>, or pattern recognition in large datasets of multiple users. These datasets can contain preferences of users for certain items. For example, YouTube members can rate a video by assigning a number of stars. The number of stars is a user&#8217;s preference value, a value from 1 to 5. Based on this collection of personal preferences and a &#8216;similarity function&#8217; you can recommend videos to users or determine similar users, that is, users with similar taste in videos. In this case, recommending videos is an example of an &#8216;item-based recommendation&#8217; and determining users with similar tastes is an example of a &#8216;user-based recommendation&#8217;.</p>
<p><strong>Introducing Mahout &#8211; Taste</strong></p>
<p>So how can we use this theory to build recommenders? This is where Mahout &#8211; Taste comes in. <a href="http://lucene.apache.org/mahout/">Mahout</a> is a Java framework for running scalable machine learning algorithms on top of Hadoop. <a href="http://lucene.apache.org/mahout/taste">Taste</a> is a sub-framework of Mahout for building recommendation engines. Taste has been part of Mahout since April 4th 2008. Below is a figure with Taste&#8217;s architecture and the building blocks you need to configure a recommender.</p>
<div id="attachment_1479" class="wp-caption aligncenter" style="width: 364px"><img class="size-full wp-image-1479" title="taste-architecture" src="http://blog.jteam.nl/wp-content/uploads/2009/12/taste-architecture.png" alt="taste-architecture" width="354" height="448" /><p class="wp-caption-text">Taste architecture diagram</p></div>
<p>The main building block in Taste is the <code>Recommender</code>. The <code>Recommender</code> recommends items based on a given item or it determines users with similar tastes. It works as follows: the recommender applies a similarity function on a subset of pairs of items (or users) in the dataset. A similarity function usually returns a value between 0 and 1, with 1 representing two completely similar items and 0 completely dissimilar items. When the similarity function processes pairs in the dataset the resulting similarity values are collected and are either kept in memory or stored on the filesystem or a database. When the Java application requests a few recommendations for a given item, the <code>Recommender</code> returns the items with the highest similarity.</p>
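<p>The last step, returning the items with the highest similarity, can be sketched in plain Java. This is only an illustration under my own assumptions, not how Taste implements it: the class <code>TopNSketch</code>, the method <code>mostSimilar</code> and the array of precomputed similarity values are all hypothetical.</p>

```java
import java.util.stream.IntStream;

/** Hypothetical illustration of the final step; Taste's own classes are not used. */
class TopNSketch {

    /**
     * Returns the indexes of the n most similar items, best first.
     * similarities[i] is the similarity between the query item and item i;
     * the query item itself (index self) is skipped.
     */
    static int[] mostSimilar(double[] similarities, int self, int n) {
        return IntStream.range(0, similarities.length)
                .filter(i -> i != self)
                .boxed()
                // Sort by descending similarity value.
                .sorted((x, y) -> Double.compare(similarities[y], similarities[x]))
                .limit(n)
                .mapToInt(Integer::intValue)
                .toArray();
    }
}
```

<p>With the made-up similarities 0.2, 0.9, 1.0, 0.0 and 0.6 for five items and item 2 as the query, <code>mostSimilar(values, 2, 2)</code> returns the indexes 1 and 4.</p>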
<p>The Recommender retrieves items and users through the <code>DataModel</code> abstraction. Taste contains <code>DataModel</code> implementations for retrieving and storing your dataset through the filesystem or a database. In addition, the <code>DataModel</code> provides methods that count the total number of users, total number of items, number of users that prefer a certain item, and many more functions. Similarity functions use these numbers to compute a similarity value for pairs of items or users. This will be shown later in the example of the <code>TanimotoCoefficientSimilarity</code>, which determines a ratio based on some of these figures. You can build a recommender with Taste by adding a <code>DataModel</code> and a similarity function to a <code>Recommender</code>. You can also define your own similarity function by implementing <code>UserSimilarity</code> or <code>ItemSimilarity</code> to recommend users or items, respectively.</p>
<p><strong>Tanimoto coefficient similarity</strong></p>
<p>Taste contains around a dozen similarity algorithms you can choose from to build a recommender. For this introductory post I will explain Taste&#8217;s <code>TanimotoCoefficientSimilarity</code>, a relatively straightforward similarity algorithm that is widely used in <a href="http://books.google.nl/books?id=7VofXg_d5Y8C&amp;pg=PA105&amp;lpg=PA105&amp;dq=tanimoto+coefficient++introduction&amp;source=bl&amp;ots=l9ExLXcgz5&amp;sig=IB9VD9G87lSKqcADb0GcTz1QTms&amp;hl=nl&amp;ei=4tMWS6_2D9Wd4Qb1wcDcBg&amp;sa=X&amp;oi=book_result&amp;ct=result&amp;resnum=3&amp;ved=0CBYQ6AEwAg#v=onepage&amp;q=&amp;f=false">chemo-informatics</a> for discovering similarities between molecules. Let&#8217;s illustrate the algorithm in the context of a webshop. Suppose there are 3 customers, A, B and C and 5 products, numbered 1 up to 5. Say each customer has bought a few products. For this algorithm it does not matter how many products are purchased, only which products are purchased by which customer. See the table below.</p>
<div id="attachment_1429" class="wp-caption aligncenter" style="width: 331px"><img class="size-full wp-image-1429" title="tanimoto" src="http://blog.jteam.nl/wp-content/uploads/2009/11/tanimoto1.png" alt="Customer purchases" width="321" height="102" /><p class="wp-caption-text">Customer purchases</p></div>
<p>Intuitively you may see that the similarity between two products can be expressed as a ratio of customer purchases. To be more precise, the Tanimoto coefficient is computed by the following formula:</p>
<p><center><img src="http://blog.jteam.nl/wp-content/cache/tex_b50b726af23d4e66783f753a752f4eb8.png" align="absmiddle" class="tex" alt="\frac{c}{a + b - c}" /></center></p>
<p><img src="http://blog.jteam.nl/wp-content/cache/tex_36c971674a761c6a91990f7343614843.png" align="absmiddle" class="tex" alt="c = " /> Number of customers that purchased p1 and p2<br />
<img src="http://blog.jteam.nl/wp-content/cache/tex_c880245cc43ede64b71ddd5b30f6238d.png" align="absmiddle" class="tex" alt="a = " /> Number of customers that purchased p1<br />
<img src="http://blog.jteam.nl/wp-content/cache/tex_bac4a7fe320df1d892715ede7817a9f9.png" align="absmiddle" class="tex" alt="b = " /> Number of customers that purchased p2</p>
<p>This means that if many customers have bought both products, the numerator is higher and so is the similarity value. Conversely, if many people have bought p1 and many have bought p2, but very few people bought both, p1 and p2 are probably dissimilar. Below is a table with calculated Tanimoto coefficients for each product pair:</p>
<div id="attachment_1558" class="wp-caption aligncenter" style="width: 502px"><img class="size-full wp-image-1558" title="tanimoto_calculations" src="http://blog.jteam.nl/wp-content/uploads/2009/12/tanimoto_calculations.png" alt="tanimoto_calculations" width="492" height="102" /><p class="wp-caption-text">Tanimoto coefficients for all product pairs</p></div>
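<p>To make the formula concrete, here is a minimal plain-Java sketch that does not use Taste at all. The purchases are the ones the test code further down sets up: A bought products 1, 2, 4 and 5, B bought 2, 3 and 5, and C bought 1 and 5. The class <code>TanimotoSketch</code> and its members are hypothetical names chosen for this illustration.</p>

```java
/** Plain-Java sketch of the Tanimoto coefficient; not Taste code. */
class TanimotoSketch {

    // Rows are customers A, B and C; columns are products 1 to 5.
    static final boolean[][] PURCHASES = {
        {true,  true,  false, true,  true},  // A bought 1, 2, 4, 5
        {false, true,  true,  false, true},  // B bought 2, 3, 5
        {true,  false, false, false, true},  // C bought 1, 5
    };

    /** Returns c / (a + b - c) for the products at column indexes p1 and p2. */
    static double tanimoto(boolean[][] purchases, int p1, int p2) {
        int a = 0; // customers who bought p1
        int b = 0; // customers who bought p2
        int c = 0; // customers who bought both
        for (boolean[] customer : purchases) {
            if (customer[p1]) {
                a++;
                if (customer[p2]) {
                    c++;
                }
            }
            if (customer[p2]) {
                b++;
            }
        }
        int union = a + b - c;
        // Two products that nobody bought are treated as dissimilar here.
        return union == 0 ? 0.0 : (double) c / union;
    }
}
```

<p>For products 1 and 2 this gives a = 2, b = 2 and c = 1, so the coefficient is 1 / (2 + 2 - 1) = 1/3. Products 3 and 4 share no customers and score 0.</p>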
<p><strong>Demonstrating Taste</strong></p>
<p>In this section I demonstrate how to express the previous example with Taste.  If you would like to try this example at home, download the <a href="http://lucene.apache.org/mahout/#17+Nov.+2009+-+Apache+Mahout+0.2+released">brand new</a> 0.2 Mahout jar <a href="http://lucene.apache.org/mahout/releases.html">here</a> and add it to your classpath. If you are using Maven, add the following snippet to your pom.xml:</p>
<pre class="brush: xml;">
&lt;dependency&gt;
    &lt;groupId&gt;org.apache.mahout&lt;/groupId&gt;
    &lt;artifactId&gt;mahout-core&lt;/artifactId&gt;
    &lt;version&gt;0.2&lt;/version&gt;
&lt;/dependency&gt;
</pre>
<p>Below is a code snippet that shows how to build the <code>DataModel</code> and how to compute similarities with the <code>TanimotoCoefficientSimilarity</code>.  In the <code>setup()</code> method I create a <code>BooleanUserPreferenceArray</code> for each customer. I then fill these arrays, put all of them in a <code>FastByIDMap</code> and pass that map to the <code>GenericDataModel</code>. Next I create the <code>TanimotoCoefficientSimilarity</code> and pass in the <code>DataModel</code>. I then write a few tests that check whether the similarities computed by Taste match the values I expect, given the formula above.</p>
<pre class="brush: java;">
/**
 * Demonstrates TanimotoCoefficientSimilarity + recommender.
 *
 * @author Frank Scholten
 */
public class TanimotoDemo {

 private DataModel dataModel;
 private ItemSimilarity tanimoto;

 private long CUSTOMER_A = 0;
 private long CUSTOMER_B = 1;
 private long CUSTOMER_C = 2;

 private long productOne = 0;
 private long productTwo = 1;
 private long productThree = 2;
 private long productFour = 3;
 private long productFive = 4;

 @Before
 public void setup() {
   FastByIDMap&lt;PreferenceArray&gt; userIdMap = new FastByIDMap&lt;PreferenceArray&gt;();

   BooleanUserPreferenceArray customerAPrefs = new BooleanUserPreferenceArray(4);
   customerAPrefs.set(0, new BooleanPreference(CUSTOMER_A, productOne));
   customerAPrefs.set(1, new BooleanPreference(CUSTOMER_A, productTwo));
   customerAPrefs.set(2, new BooleanPreference(CUSTOMER_A, productFour));
   customerAPrefs.set(3, new BooleanPreference(CUSTOMER_A, productFive));

   BooleanUserPreferenceArray customerBPrefs = new BooleanUserPreferenceArray(3);
   customerBPrefs.set(0, new BooleanPreference(CUSTOMER_B, productTwo));
   customerBPrefs.set(1, new BooleanPreference(CUSTOMER_B, productThree));
   customerBPrefs.set(2, new BooleanPreference(CUSTOMER_B, productFive));

   BooleanUserPreferenceArray customerCPrefs = new BooleanUserPreferenceArray(2);
   customerCPrefs.set(0, new BooleanPreference(CUSTOMER_C, productOne));
   customerCPrefs.set(1, new BooleanPreference(CUSTOMER_C, productFive));

   userIdMap.put(CUSTOMER_A, customerAPrefs);
   userIdMap.put(CUSTOMER_B, customerBPrefs);
   userIdMap.put(CUSTOMER_C, customerCPrefs);

   dataModel = new GenericDataModel(userIdMap);
   tanimoto = new TanimotoCoefficientSimilarity(dataModel);
 }

 @Test
 public void testSimilarities() throws TasteException {
   assertEquals((double) 1, tanimoto.itemSimilarity(productOne, productOne), 0.01);
   assertEquals((double) 1 / 3, tanimoto.itemSimilarity(productOne, productTwo), 0.01);
   assertEquals((double) 0, tanimoto.itemSimilarity(productOne, productThree), 0.01);
   assertEquals((double) 1 / 2, tanimoto.itemSimilarity(productOne, productFour), 0.01);
   assertEquals((double) 2 / 3, tanimoto.itemSimilarity(productOne, productFive), 0.01);

   assertEquals((double) 1 / 1, tanimoto.itemSimilarity(productTwo, productTwo), 0.01);
   assertEquals((double) 1 / 2, tanimoto.itemSimilarity(productTwo, productThree), 0.01);
   assertEquals((double) 1 / 2, tanimoto.itemSimilarity(productTwo, productFour), 0.01);
   assertEquals((double) 2 / 3, tanimoto.itemSimilarity(productTwo, productFive), 0.01);

   assertEquals((double) 1, tanimoto.itemSimilarity(productThree, productThree), 0.01);
   assertEquals((double) 0, tanimoto.itemSimilarity(productThree, productFour), 0.01);
   assertEquals((double) 1 / 3, tanimoto.itemSimilarity(productThree, productFive), 0.01);

   assertEquals((double) 1, tanimoto.itemSimilarity(productFour, productFour), 0.01);
   assertEquals((double) 1 / 3, tanimoto.itemSimilarity(productFour, productFive), 0.01);

   assertEquals((double) 1, tanimoto.itemSimilarity(productFive, productFive), 0.01);
 }

 @Test
 public void testRecommendProducts() throws TasteException {
   ItemBasedRecommender recommender = new GenericItemBasedRecommender(dataModel, tanimoto);

   List&lt;RecommendedItem&gt; similarToProductThree = recommender.mostSimilarItems(productThree, 2);

   assertEquals(productTwo, similarToProductThree.get(0).getItemID());
   assertEquals(productFive, similarToProductThree.get(1).getItemID());
 }
}
</pre>
<p>I also added tests for the recommendations themselves. The <code>testRecommendProducts()</code> method uses <code>mostSimilarItems</code> to determine items similar to product three, in terms of customer preferences. The second parameter of this method is the number of similar items to compute. The result of this method is a list of items in descending order of similarity. The test confirms that product two is most similar to product three and product five is second most similar. Note that the code above is only meant to demonstrate the algorithm. In a production-ready implementation, for instance, a database-backed <code>DataModel</code> can be used, or the algorithm can be run on Hadoop.</p>
<p>This concludes this introduction to Taste. There are a lot of interesting parts that I haven&#8217;t explored myself yet. I am learning this one piece at a time, including the algorithms and mathematics behind it.  Some things to check out: the <a href="http://lucene.apache.org/mahout/javadoc/core/org/apache/mahout/cf/taste/impl/similarity/package-summary.html">similarity Javadocs</a> for the similarity algorithms. And of course there are classes to run your recommender on <a href="http://blog.jteam.nl/2009/08/04/introduction-to-hadoop/">Hadoop</a> and to evaluate the quality of your recommender with training sets. In the next few posts I will go into some of these subjects, use examples with a larger dataset, look into Hadoop and compare algorithms.</p>
<p><strong>References</strong></p>
<ul>
<li><a href="http://lucene.apache.org/mahout/taste.html">http://lucene.apache.org/mahout/taste.html</a> &#8211; Taste homepage</li>
<li><a href="http://lucene.apache.org/mahout/mailinglists.html">http://lucene.apache.org/mahout/mailinglists.html</a> &#8211; Mahout mailing list</li>
<li><a href="http://www.ibm.com/developerworks/java/library/j-mahout/">http://www.ibm.com/developerworks/java/library/j-mahout/</a> &#8211; Mahout Article by Grant Ingersoll</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://blog.jteam.nl/2009/12/09/mahout-taste-part-one-introduction/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
	</channel>
</rss>
