<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>JTeam Blog &#187; Development</title>
	<atom:link href="http://blog.jteam.nl/category/development/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.jteam.nl</link>
	<description>Keep updated on what we&#039;re doing!</description>
	<lastBuildDate>Fri, 23 Jul 2010 07:32:56 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Connecting to FTP server with Spring Integration</title>
		<link>http://blog.jteam.nl/2010/07/21/connecting-to-ftp-server-with-spring-integration/</link>
		<comments>http://blog.jteam.nl/2010/07/21/connecting-to-ftp-server-with-spring-integration/#comments</comments>
		<pubDate>Wed, 21 Jul 2010 13:07:50 +0000</pubDate>
		<dc:creator>Roberto van der Linden</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[FTP]]></category>
		<category><![CDATA[Spring Integration]]></category>

		<guid isPermaLink="false">http://blog.jteam.nl/?p=2389</guid>
		<description><![CDATA[For one of our projects I needed to read zip files from an FTP server and import the content in a system. In this post I will explain how I used Spring Integration to connect with an FTP server and retrieve zip files. FTP Client Factory As the FTP extension for Spring Integration [...]]]></description>
			<content:encoded><![CDATA[<p>For one of our projects I needed to read zip files from an FTP server and import the content in a system. In this post I will explain how I used Spring Integration to connect with an FTP server and retrieve zip files.<br />
<span id="more-2389"></span></p>
<h2>FTP Client Factory</h2>
<p>As the FTP extension for Spring Integration has no official release yet, I have used the latest build which can be found at the <a href="http://www.springsource.org/extensions/se-sia" target="_blank">Spring Integration Adapters site</a>.</p>
<p>The extension provides a client factory that allows you to connect with a client. The class I have used is the <a href="https://src.springframework.org/svn/se-sia/trunk/org.springframework.integration.ftp/src/main/java/org/springframework/integration/ftp/DefaultFTPClientFactory.java" target="_blank">DefaultFTPClientFactory</a> which implements the interface <a href="https://src.springframework.org/svn/se-sia/trunk/org.springframework.integration.ftp/src/main/java/org/springframework/integration/ftp/FTPClientFactory.java" target="_blank">FTPClientFactory</a>. </p>
<p>When you have copied these files to your own project, you can configure a client by using the following code:</p>
<pre class="brush: xml;">
    &lt;bean id=&quot;defaultClient&quot; class=&quot;nl.jteam.importer.ftp.DefaultFTPClientFactory&quot;&gt;
        &lt;property value=&quot;${ftp.remotedir}&quot; name=&quot;remoteWorkingDirectory&quot;&gt;&lt;/property&gt;
        &lt;property value=&quot;${ftp.username}&quot; name=&quot;username&quot;&gt;&lt;/property&gt;
        &lt;property value=&quot;${ftp.password}&quot; name=&quot;password&quot;&gt;&lt;/property&gt;
        &lt;property value=&quot;${ftp.port}&quot; name=&quot;port&quot;&gt;&lt;/property&gt;
        &lt;property value=&quot;${ftp.host}&quot; name=&quot;host&quot;&gt;&lt;/property&gt;
    &lt;/bean&gt;
</pre>
<h2>Reading the files</h2>
<p>Now that the client is configured, we can read the files from the FTP server. In the method <em>getFilesFromFTPClient()</em> we get an <a href="http://commons.apache.org/net/api/org/apache/commons/net/ftp/FTPClient.html" target="_blank">FTPClient</a> by calling the <em>getClient()</em> method on the client factory. The client API lets you retrieve, delete, rename or store files. It offers a lot more, but I won&#8217;t discuss all the methods here. In our case we wanted to retrieve only zip files. As the client does not provide a way to retrieve only files with a specific extension, you have to do it yourself by, for example, checking the extension of each file.</p>
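<p>The extension check could look roughly like the sketch below. It is not taken from the FTPInboundAdapter: the class name <em>ZipFileNameFilter</em> and its method are hypothetical, and it assumes you already have the remote file names as plain strings (for example from the client&#8217;s file listing).</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of the FTP extension or Commons Net.
final class ZipFileNameFilter {

    private ZipFileNameFilter() {
    }

    // Returns only the names that end in ".zip", ignoring case.
    static List<String> onlyZipFiles(List<String> remoteFileNames) {
        List<String> zipFiles = new ArrayList<String>();
        for (String name : remoteFileNames) {
            if (name != null && name.toLowerCase(Locale.ROOT).endsWith(".zip")) {
                zipFiles.add(name);
            }
        }
        return zipFiles;
    }
}
```

<p>Filtering on the name alone is cheap, but it trusts the extension; if the server may contain mislabelled files you would still have to validate the content after downloading.</p>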
<p>Don’t forget to close the connection with the FTP client once you are done with handling files.</p>
<p>Because we want to use Spring Integration to send the files to the class that handles the zip files, we create a <a href="http://static.springsource.org/spring-integration/reference/htmlsingle/spring-integration-reference.html#overview-components" target="_blank">Message</a> with the list of zip files. </p>
<p>The (partial) code of the FTPInboundAdapter class:</p>
<pre class="brush: java;">
    public FTPInboundAdapter(FTPClientFactory clientFactory, String localTmpDirName) throws IOException {
        Assert.notNull(localTmpDirName, &quot;The directory name to write the files to can not be null&quot;);
        this.clientFactory = clientFactory;
        localDirectory = ImportUtils.ensureTempDirExists(localTmpDirName);
    }

    public Message&lt;List&lt;File&gt;&gt; getFilesFromFTPClient() {
        FTPClient client = null;
        try {
            client = clientFactory.getClient();

            List&lt;File&gt; localZipFiles = retrieveRemoteZipFiles(client);

            return MessageBuilder.withPayload(localZipFiles).build();
        } catch (IOException e) {
            throw new MessagingException(&quot;Problem occurred while trying to retrieve files.&quot;, e);
        } finally {
            // Always release the connection, even when retrieval fails
            closeFtpClient(client);
        }
    }

    private void closeFtpClient(FTPClient client) {
        if (client != null &amp;&amp; client.isConnected()) {
            try {
                client.disconnect();
            } catch (IOException e) {
                logger.warn(&quot;Error occurred when disconnecting FTP client&quot;, e);
            }
        }
    }
</pre>
<p>The configuration code for the adapter:</p>
<pre class="brush: xml;">
    &lt;bean id=&quot;ftpInboundAdapter&quot; class=&quot;nl.jteam.importer.ftp.FTPInboundAdapter&quot;&gt;
        &lt;constructor-arg ref=&quot;defaultClient&quot; /&gt;
        &lt;constructor-arg value=&quot;someLocalDirectory&quot; /&gt;
    &lt;/bean&gt;
</pre>
<h2>Checking the FTP directory</h2>
<p>If you want Spring Integration to check the FTP server on a regular basis for new files, you can wire up an inbound channel adapter with a cron expression like this:</p>
<pre class="brush: xml;">
    &lt;si:inbound-channel-adapter id=&quot;zipInboundChannelAdapter&quot; ref=&quot;ftpInboundAdapter&quot; method=&quot;getFilesFromFTPClient&quot; channel=&quot;zipFilesChannel&quot;&gt;
        &lt;si:poller max-messages-per-poll=&quot;1&quot;&gt;
            &lt;si:cron-trigger expression=&quot;0 0/5 * ? * *&quot; /&gt;
        &lt;/si:poller&gt;
    &lt;/si:inbound-channel-adapter&gt;
</pre>
<p>The channel attribute of the inbound channel adapter specifies the output channel. In our case, the message containing the list of zip files is put onto this channel, from where it can be sent to wherever you want.</p>
<p>As you can see, it is relatively easy to connect with an FTP server and retrieve the files. I hope this post helps you set up your own FTP connection with Spring Integration. All that remains is to wait for the official release of the Spring Integration Adapters.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.jteam.nl/2010/07/21/connecting-to-ftp-server-with-spring-integration/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Parsing HTML with Jericho</title>
		<link>http://blog.jteam.nl/2010/07/14/parsing-html-with-jericho/</link>
		<comments>http://blog.jteam.nl/2010/07/14/parsing-html-with-jericho/#comments</comments>
		<pubDate>Wed, 14 Jul 2010 11:00:59 +0000</pubDate>
		<dc:creator>Roberto van der Linden</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[html]]></category>
		<category><![CDATA[jericho]]></category>
		<category><![CDATA[parsing]]></category>

		<guid isPermaLink="false">http://blog.jteam.nl/?p=2368</guid>
		<description><![CDATA[In one of our projects I had to parse and manipulate HTML. After searching for a nice HTML parser, I ended up using the open source library Jericho HTML Parser. Jericho provides you a lot of features including text extraction from HTML markup, rendering, formatting or compacting HTML. In this post I will show you [...]]]></description>
			<content:encoded><![CDATA[<p>In one of our projects I had to parse and manipulate HTML. After searching for a nice HTML parser, I ended up using the open source library <a href="http://jericho.htmlparser.net/docs/index.html" target="_blank">Jericho HTML Parser</a>. Jericho provides you a lot of features including text extraction from HTML markup, rendering, formatting or compacting HTML. In this post I will show you a few of the features I have used.<br />
<span id="more-2368"></span></p>
<h2>Maven dependency</h2>
<p>If you use Maven, you can simply add the following dependency to use the library.</p>
<pre class="brush: xml;">
&lt;dependency&gt;
    &lt;groupId&gt;net.htmlparser.jericho&lt;/groupId&gt;
    &lt;artifactId&gt;jericho-html&lt;/artifactId&gt;
    &lt;version&gt;3.1&lt;/version&gt;
&lt;/dependency&gt;
</pre>
<h2>API</h2>
<p>I don’t want to explain all classes, but the following classes are basically the starting point of all your parsing.</p>
<ul>
<li><strong>Source</strong> – Represents a source HTML document. This is always the first step in parsing an HTML document.</li>
<li><strong>OutputDocument</strong> &#8211; Represents a modified version of an original Source document or Segment.</li>
<li><strong>Element</strong> &#8211; Represents an element  in a specific source document, which encompasses a start tag, an optional end tag and all content  in between.</li>
</ul>
<p>For a complete overview of all classes you can view the <a href="http://jericho.htmlparser.net/docs/javadoc/index.html" target="_blank">javadoc</a>.</p>
<h2>Extract all text</h2>
<p>To extract all the text from the HTML markup, all you have to do is the following:</p>
<pre class="brush: java;">
    public String extractAllText(String htmlText){
        Source source = new Source(htmlText);
        return source.getTextExtractor().toString();
    }
</pre>
<p>You create a new Source object, which in our case takes a String as input, but it also accepts, for example, an InputStream or a URL. The Source object has a method getTextExtractor that allows you to, how surprising, extract the text. The TextExtractor class gives you a few options to configure the extraction. One of the options is to exclude the text of a specified Element. You can also include an attribute, in which case the value of that attribute is included in the output.</p>
<h2>Manipulating HTML</h2>
<p>Manipulating HTML is very easy with Jericho. In the code example below I want to add an id attribute to all H2 elements to create anchor navigation. Once again I create a Source document. From this Source document I create an OutputDocument.</p>
<p>The OutputDocument represents a modified version of the original Source document. With the list of all H2 elements retrieved from the Source, we can now ask for all the attributes of a single H2 element. If the attribute <em>id</em> already exists we do nothing, but if it does not we recreate the start tag with a new id attribute and all the other existing attributes of that H2 element.</p>
<p>As you can see in the example, it is relatively easy to manipulate the attributes of an element. With the <a href="http://jericho.htmlparser.net/docs/javadoc/net/htmlparser/jericho/Attributes.html">Attributes</a> object you can get a List of Attribute objects that are found in the source document or in a start tag. These attributes are not modifiable. The OutputDocument has a convenience method that allows us to replace the specific start tag with our newly created H2 start tag in order to add our <em>id</em> attribute.</p>
<pre class="brush: java;">
    public String addIdAttributeToH2Elements(String html) {
        Source source = new Source(html);
        OutputDocument outputDocument = new OutputDocument(source);
        List&lt;Element&gt; h2Elements = source.getAllElements(&quot;h2&quot;);

        for (Element element : h2Elements) {
            StartTag startTag = element.getStartTag();
            Attributes attributes = startTag.getAttributes();
            Attribute idAttribute = attributes.get(&quot;id&quot;);

            if (idAttribute == null) {
                String elementValue = element.getTextExtractor().toString();
                String validAnchorId = AnchorUtils.getLowerCasedValidAnchorTitle(elementValue);

                StringBuilder builder = new StringBuilder();
                builder.append(&quot;&lt;h2&quot;).append(&quot; &quot;).append(&quot;id=\&quot;&quot;).append(validAnchorId).append(&quot;\&quot;&quot;);
                for (Attribute attribute : attributes) {
                    builder.append(&quot; &quot;);
                    builder.append(attribute);
                }
                builder.append(&quot;&gt;&quot;);

                outputDocument.replace(startTag, builder);
            }
        }

        return outputDocument.toString();
    }
</pre>
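<p>The <em>AnchorUtils</em> helper used above is not shown in this post. As a rough idea of what such a helper might do, here is a hypothetical stand-in (the class name <em>AnchorIds</em> and its rules are assumptions, not the original implementation): it lower-cases the heading text and replaces everything that is not a letter or digit with a hyphen.</p>

```java
import java.util.Locale;

// Hypothetical stand-in for the AnchorUtils helper, which the post does not show.
final class AnchorIds {

    private AnchorIds() {
    }

    // Lower-cases the heading text and collapses every run of characters
    // other than a-z and 0-9 into a single hyphen.
    static String toAnchorId(String headingText) {
        String lower = headingText.trim().toLowerCase(Locale.ROOT);
        String hyphenated = lower.replaceAll("[^a-z0-9]+", "-");
        // Strip a leading or trailing hyphen left over from punctuation.
        return hyphenated.replaceAll("^-|-$", "");
    }
}
```

<p>With these rules the heading &#8220;Extract all text&#8221; would become <em>extract-all-text</em>. Note that this sketch does nothing to keep ids unique when two headings have the same text.</p>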
<h2>Remove Elements</h2>
<p>Just like me, you may want to remove a few tags from your HTML. Here is an example that shows you how you can achieve that.</p>
<pre class="brush: java;">
    private static final Set&lt;String&gt; ALLOWED_HTML_TAGS = new HashSet&lt;String&gt;(Arrays.asList(
            HTMLElementName.ABBR,
            HTMLElementName.ACRONYM,
            HTMLElementName.SPAN,
            HTMLElementName.SUB,
            HTMLElementName.SUP)
    );

    private static String removeNotAllowedTags(String htmlFragment) {
        Source source = new Source(htmlFragment);
        OutputDocument outputDocument = new OutputDocument(source);
        List&lt;Element&gt; elements = source.getAllElements();

        for (Element element : elements) {
            if (!ALLOWED_HTML_TAGS.contains(element.getName())) {
                outputDocument.remove(element.getStartTag());
                if (!element.getStartTag().isSyntacticalEmptyElementTag()) {
                    outputDocument.remove(element.getEndTag());
                }
            }
        }

        return outputDocument.toString();
    }
</pre>
<p>In the example above you see that after checking whether the tag is allowed, we need to remove the start tag and the end tag separately. If you removed the complete element, you would also remove the text within these tags. The API allows you to check for elements that are empty. This can be handy to remove redundant empty elements or, in my case, to check whether the start tag is a self-closing tag.</p>
<h2>Conclusion</h2>
<p>In this post I showed you how I have used Jericho, but Jericho has a lot more interesting features. On their webpage they provide more examples on how to use those features. Jericho provides a nice and clean API and makes the parsing of HTML really easy!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.jteam.nl/2010/07/14/parsing-html-with-jericho/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mahout – Taste :: Part Three – Estimators</title>
		<link>http://blog.jteam.nl/2010/07/08/mahout-%e2%80%93-taste-part-three-%e2%80%93-estimators/</link>
		<comments>http://blog.jteam.nl/2010/07/08/mahout-%e2%80%93-taste-part-three-%e2%80%93-estimators/#comments</comments>
		<pubDate>Thu, 08 Jul 2010 15:46:22 +0000</pubDate>
		<dc:creator>Frank Scholten</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Enterprise Search]]></category>
		<category><![CDATA[Collaborative Filtering]]></category>
		<category><![CDATA[Mahout]]></category>
		<category><![CDATA[Recommendations]]></category>
		<category><![CDATA[Taste]]></category>

		<guid isPermaLink="false">http://blog.jteam.nl/?p=2317</guid>
		<description><![CDATA[In Taste, estimators are the bridge between the generic item- or user recommendation logic and the specific similarity algorithm. Estimators are mainly used as part of the recommendation process, however, they are also used for evaluating recommenders. Additionally, the &#8216;recommended because&#8217; feature is also powered by an estimator. This blog covers some Taste internals and [...]]]></description>
			<content:encoded><![CDATA[<p>In Taste, estimators are the bridge between the generic item- or user recommendation logic and the specific similarity algorithm. Estimators are mainly used as part of the recommendation process, however, they are also used for evaluating recommenders. Additionally, the &#8216;recommended because&#8217; feature is also powered by an estimator. This blog covers some Taste internals and shows you how estimators are used within Taste via a few code samples.</p>
<p><span id="more-2317"></span></p>
<h3><strong>Estimators for recommendations</strong></h3>
<p>Let&#8217;s start with the main usage of estimators: providing recommendations. Suppose we create a <span style="font-family: 'courier new'; font-size: medium;">GenericItemBasedRecommender</span>, provide it with a <span style="font-family: 'courier new'; font-size: medium;">DataModel</span> and one of Taste&#8217;s <span style="font-family: 'courier new'; font-size: medium;">ItemSimilarity</span> implementations.</p>
<p>To fetch a few recommendations we call <span style="font-family: 'courier new'; font-size: medium;">GenericItemBasedRecommender.mostSimilarItems(long itemID, int howMany)</span>, as shown in the snippet below:</p>
<pre class="brush: java;">
  @Override
  public List&lt;RecommendedItem&gt; mostSimilarItems(long itemID, int howMany) throws TasteException {
    return mostSimilarItems(itemID, howMany, null);
  }

  @Override
  public List&lt;RecommendedItem&gt; mostSimilarItems(long itemID, int howMany,
                                                Rescorer&lt;LongPair&gt; rescorer) throws TasteException {
    TopItems.Estimator&lt;Long&gt; estimator = new MostSimilarEstimator(itemID, similarity, rescorer);
    return doMostSimilarItems(new long[] {itemID}, howMany, estimator);
  }
</pre>
<p>After delegating the method call to a more generic <span style="font-family: 'courier new'; font-size: medium;">mostSimilarItems</span> method, a <span style="font-family: 'courier new'; font-size: medium;">MostSimilarEstimator</span> is constructed and passed to the protected method <span style="font-family: 'courier new'; font-size: medium;">doMostSimilarItems</span>. The whole process of estimating and recommending is implemented via an estimator and algorithm specific logic within a recommender.</p>
<p>Now let&#8217;s zoom in on the <span style="font-family: 'courier new'; font-size: medium;">doMostSimilarItems</span> method. See the snippet below:</p>
<pre class="brush: java;">
  private List&lt;RecommendedItem&gt; doMostSimilarItems(long[] itemIDs,
                                                   int howMany,
                                                   TopItems.Estimator&lt;Long&gt; estimator) throws TasteException {
    DataModel model = getDataModel();
    FastIDSet possibleItemsIDs = new FastIDSet();
    for (long itemID : itemIDs) {
      PreferenceArray prefs = model.getPreferencesForItem(itemID);
      int size = prefs.length();
      for (int i = 0; i &lt; size; i++) {
        long userID = prefs.get(i).getUserID();
        possibleItemsIDs.addAll(model.getItemIDsFromUser(userID));
      }
    }
    possibleItemsIDs.removeAll(itemIDs);
    return TopItems.getTopItems(howMany, possibleItemsIDs.iterator(), null, estimator);
  }
</pre>
<p>The snippet above describes the core logic for item-based recommendation. This process consists of three steps:</p>
<ol>
<li>Fetch all preferences for the given item(s)</li>
<li>For each preference get the corresponding user and fetch all their other preferences</li>
<li>From this set of preferences, minus the given item, get the corresponding items and determine the top items based on the given estimator</li>
</ol>
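<p>To make the first two steps and the removal of the given item more concrete, here is a minimal sketch that uses plain Java collections instead of Taste&#8217;s <em>DataModel</em> and <em>FastIDSet</em>. The class <em>CandidateItems</em> and the map layout are assumptions made for illustration only.</p>

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: plain collections stand in for Taste's DataModel and FastIDSet.
final class CandidateItems {

    private CandidateItems() {
    }

    // itemsByUser maps a user ID to the IDs of the items that user has a preference for.
    static Set<Long> candidatesFor(long itemID, Map<Long, Set<Long>> itemsByUser) {
        Set<Long> candidates = new TreeSet<Long>();
        for (Set<Long> userItems : itemsByUser.values()) {
            // Steps 1 and 2: every user who has a preference for the given item
            // contributes all of their items.
            if (userItems.contains(itemID)) {
                candidates.addAll(userItems);
            }
        }
        // Step 3, first half: the given item itself is never a candidate.
        candidates.remove(itemID);
        return candidates;
    }
}
```

<p>If users 1 and 2 both have a preference for item 10, and also for items 11 and 12 respectively, the candidates for item 10 are 11 and 12. Items that only appear with users who never touched item 10 are left out.</p>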
<p>The <span style="font-family: 'courier new'; font-size: medium;">TopItems</span> is a helper class for fetching the top ranked items of a set of items for a given estimator.</p>
<p>Now on to the estimator. All estimators implement the <span style="font-family: 'courier new'; font-size: medium;">TopItems.Estimator&lt;T&gt;</span> interface, which is really simple: it returns an estimate for a &#8216;thing&#8217; as a double.</p>
<pre class="brush: java;">
  public interface Estimator&lt;T&gt; {
    double estimate(T thing) throws TasteException;
  }
</pre>
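<p>To show how an estimator of this shape can drive a ranking, here is a simplified sketch of a top-N selection. It is not the actual <em>TopItems</em> code: the class <em>TopN</em> is hypothetical, the checked exception is dropped, and skipping NaN estimates is an assumption based on the estimators in this post returning NaN for items they do not want ranked.</p>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Illustrative only: a simplified stand-in for what a helper like TopItems does.
final class TopN {

    // Same shape as the Estimator interface above, minus the checked exception.
    interface Estimator<T> {
        double estimate(T thing);
    }

    private TopN() {
    }

    static List<Long> topItems(int howMany, List<Long> candidates, final Estimator<Long> estimator) {
        List<Long> scored = new ArrayList<Long>();
        for (Long candidate : candidates) {
            // Assumption: a NaN estimate means "do not rank this item", so it is skipped.
            if (!Double.isNaN(estimator.estimate(candidate))) {
                scored.add(candidate);
            }
        }
        // Highest estimate first.
        Collections.sort(scored, new Comparator<Long>() {
            @Override
            public int compare(Long a, Long b) {
                return Double.compare(estimator.estimate(b), estimator.estimate(a));
            }
        });
        return new ArrayList<Long>(scored.subList(0, Math.min(howMany, scored.size())));
    }
}
```

<p>The sketch calls the estimator again while sorting, which keeps it short but is wasteful; a real implementation would compute each estimate once and keep only the best <em>howMany</em> items.</p>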
<p>Now on to the <span style="font-family: 'courier new'; font-size: medium;">MostSimilarEstimator<span>:</span></span></p>
<pre class="brush: java;">
  public static class MostSimilarEstimator implements TopItems.Estimator&lt;Long&gt; {

    private final long toItemID;
    private final ItemSimilarity similarity;
    private final Rescorer&lt;LongPair&gt; rescorer;

    public MostSimilarEstimator(long toItemID, ItemSimilarity similarity, Rescorer&lt;LongPair&gt; rescorer) {
      this.toItemID = toItemID;
      this.similarity = similarity;
      this.rescorer = rescorer;
    }

    @Override
    public double estimate(Long itemID) throws TasteException {
      LongPair pair = new LongPair(toItemID, itemID);
      if ((rescorer != null) &amp;&amp; rescorer.isFiltered(pair)) {
        return Double.NaN;
      }
      double originalEstimate = similarity.itemSimilarity(toItemID, itemID);
      return rescorer == null ? originalEstimate : rescorer.rescore(pair, originalEstimate);
    }
  }
</pre>
<p>This estimator does three things:</p>
<ol>
<li>Use the <span style="font-family: 'courier new'; font-size: medium;">Rescorer</span> to filter items. <span style="font-family: 'courier new'; font-size: medium;">Rescorers</span> can be used to create domain specific filtering of items</li>
<li>Use the <span style="font-family: 'courier new'; font-size: medium;">ItemSimilarity</span> to calculate the similarity between the given item and the candidate item</li>
<li>Optionally boost the similarity value with the <span style="font-family: 'courier new'; font-size: medium;">Rescorer</span></li>
</ol>
<p>This setup allows you to plug arbitrary <span style="font-family: 'courier new'; font-size: medium;">ItemSimilarity</span> algorithms into the recommender.</p>
<h3><strong>Recommended because&#8230;</strong></h3>
<p>Another interesting feature of the <span style="font-family: 'courier new'; font-size: medium;">GenericItemBasedRecommender</span> is the &#8216;Recommended because&#8217; feature. With this feature you can determine <em>why</em> a certain item was recommended to you, i.e. <em>which of your preferences were largely responsible for giving you this recommendation</em>.</p>
<p>To use this feature call <span style="font-family: 'courier new'; font-size: medium;">recommendedBecause(long userID, long itemID, int howMany)</span>, see snippet below:</p>
<pre class="brush: java;">
  @Override
  public List&lt;RecommendedItem&gt; recommendedBecause(long userID, long itemID, int howMany) throws TasteException {
    if (howMany &lt; 1) {
      throw new IllegalArgumentException(&quot;howMany must be at least 1&quot;);
    }

    DataModel model = getDataModel();
    TopItems.Estimator&lt;Long&gt; estimator = new RecommendedBecauseEstimator(userID, itemID, similarity);

    PreferenceArray prefs = model.getPreferencesFromUser(userID);
    int size = prefs.length();
    FastIDSet allUserItems = new FastIDSet(size);
    for (int i = 0; i &lt; size; i++) {
      allUserItems.add(prefs.getItemID(i));
    }
    allUserItems.remove(itemID);

    return TopItems.getTopItems(howMany, allUserItems.iterator(), null, estimator);
  }
</pre>
<p>It takes all items the given user has a preference for, minus the given item, and passes them to <span style="font-family: 'courier new'; font-size: medium;">TopItems</span> along with the <span style="font-family: 'courier new'; font-size: medium;">RecommendedBecauseEstimator</span>; see the code below:</p>
<pre class="brush: java;">
  private class RecommendedBecauseEstimator implements TopItems.Estimator&lt;Long&gt; {

    private final long userID;
    private final long recommendedItemID;
    private final ItemSimilarity similarity;

    private RecommendedBecauseEstimator(long userID, long recommendedItemID, ItemSimilarity similarity) {
      this.userID = userID;
      this.recommendedItemID = recommendedItemID;
      this.similarity = similarity;
    }

    @Override
    public double estimate(Long itemID) throws TasteException {
      Float pref = getDataModel().getPreferenceValue(userID, itemID);
      if (pref == null) {
        return Float.NaN;
      }
      double similarityValue = similarity.itemSimilarity(recommendedItemID, itemID);
      return (1.0 + similarityValue) * pref;
    }
  }

}
</pre>
<p>This <span style="font-family: 'courier new'; font-size: medium;">RecommendedBecauseEstimator</span> determines the ranking by multiplying the user&#8217;s preference value for an item by one plus the similarity between that item and the recommended item. After this process the top ranked items are those items that were most important in causing a recommendation of the given item.</p>
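<p>A small worked example of this formula may help. The class below is only an illustration: it isolates the arithmetic from the estimator and assumes similarity values between -1 and 1.</p>

```java
// Worked example of the ranking formula used by RecommendedBecauseEstimator.
final class RecommendedBecauseScore {

    private RecommendedBecauseScore() {
    }

    // similarity is assumed to lie in [-1, 1]; preference is the user's preference value.
    static double score(double similarity, double preference) {
        return (1.0 + similarity) * preference;
    }
}
```

<p>An item the user rated 4.0 with a similarity of 0.5 to the recommended item scores (1.0 + 0.5) * 4.0 = 6.0, while an item rated 5.0 with a similarity of -0.5 scores only 2.5. Adding 1.0 shifts the similarity into a non-negative range, so a higher preference never lowers the score.</p>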
<h3><strong>Conclusions</strong></h3>
<p>This concludes the overview of some Taste internals and has hopefully given you a clearer picture of how recommendations and estimators work inside Taste. In future posts I will probably expand on this topic, especially in the context of the evaluation of recommenders. If you have any questions regarding Taste in general or this topic of estimators, feel free to leave a comment.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.jteam.nl/2010/07/08/mahout-%e2%80%93-taste-part-three-%e2%80%93-estimators/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Implementing RSS Feeds with new features of Spring 3</title>
		<link>http://blog.jteam.nl/2010/05/05/implementing-rss-feeds-with-new-features-of-spring-3/</link>
		<comments>http://blog.jteam.nl/2010/05/05/implementing-rss-feeds-with-new-features-of-spring-3/#comments</comments>
		<pubDate>Wed, 05 May 2010 13:27:51 +0000</pubDate>
		<dc:creator>Roberto van der Linden</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[feed]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Rome]]></category>
		<category><![CDATA[RSS]]></category>
		<category><![CDATA[spring 3]]></category>
		<category><![CDATA[Spring Framework]]></category>

		<guid isPermaLink="false">http://blog.jteam.nl/?p=2223</guid>
		<description><![CDATA[In this post I explain how we created the RSS feeds in a project and the challenges we ran into during the set-up. My colleague Jettro Coenradie explained in a previous post how you can create a feed using Rome and Spring 3, but didn’t elaborate on the Spring 3 part. [...]]]></description>
			<content:encoded><![CDATA[<p>In this post I explain how we created the RSS feeds in a project and the challenges we ran into during the set-up.</p>
<p>My colleague Jettro Coenradie explained in a previous <a href="http://www.gridshore.nl/2010/02/16/creating-a-w3c-validated-rss-feed-using-rome-and-spring-3/" target="_blank">post</a> how you can create a feed using Rome and Spring 3, but didn’t elaborate on the Spring 3 part. I will explain how we used Spring 3 to create the feeds.</p>
<p><span id="more-2223"></span></p>
<h2>Context</h2>
<p>In this project we need to support a large number of different feeds. As an example I will use the news feed that we need to create for different subjects. The subjects are created in a CMS and therefore people can add subjects whenever they want. This means that we want to support dynamically created feeds.</p>
<h2>Controller</h2>
<p>The controller is the place where the request mapping takes place.</p>
<pre class="brush: java;">
@Controller
public class SubjectRssController extends AbstractRssController {

    @SessionManagedDaos
    @RequestMapping(value = &quot;/subjects/{subjectName}/news.rss&quot;, method = RequestMethod.GET)
    public String generateSubjectNewsFeedItems(ModelMap modelMap, @PathVariable final String subjectName,
                                               HttpServletRequest request, RssSubjectDao subjectDao) throws ServletException {
        SubjectBean subject = findSubject(subjectName, request, subjectDao);

        RssQueryResultMapper mapper = getSubjectQueryResultMapper();
        List&lt;Item&gt; items = subjectDao.findNewsForSubject(subject, mapper);

        modelMap.addAttribute(&quot;items&quot;, items);
        modelMap.addAttribute(&quot;subjectName&quot;, subject.getTitle());
        return &quot;subjectRssFeed&quot;;
    }
}
</pre>
<h3>Annotations</h3>
<p>The first thing you might notice when you look at the code above are the two annotations (@SessionManagedDaos and @RequestMapping) on the method. The @SessionManagedDaos is an annotation that is created by us. This annotation manages the session that we use from the CMS connection pool.</p>
<p>The second annotation is the @RequestMapping. This annotation has the value <em>&#8220;/subjects/{subjectName}/news.rss&#8221; </em>which means that every path that matches this pattern will be mapped to this method. This pattern uses a new feature of Spring 3, namely the variable in the path. The @PathVariable annotation in front of the parameter <em>subjectName </em>indicates that the variable in the path needs to be bound to the method parameter.</p>
<p>This helps us to support any request with a given subject name and therefore we do not need to change the code if a new subject is created.</p>
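<p>To illustrate what binding a path variable amounts to, the sketch below extracts the subject name from a request path with a regular expression. This is not how Spring MVC resolves path variables internally; the class <em>SubjectPath</em> is hypothetical and only mimics the effect for this one pattern.</p>

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: this is not how Spring MVC resolves path variables internally.
final class SubjectPath {

    // One path segment between "/subjects/" and "/news.rss" is captured as the subject name.
    private static final Pattern NEWS_FEED = Pattern.compile("^/subjects/([^/]+)/news\\.rss$");

    private SubjectPath() {
    }

    // Returns the subject name, or null when the path does not match the pattern.
    static String subjectNameFrom(String path) {
        Matcher matcher = NEWS_FEED.matcher(path);
        return matcher.matches() ? matcher.group(1) : null;
    }
}
```

<p>For the path <em>/subjects/economy/news.rss</em> this returns <em>economy</em>, whatever subjects exist in the CMS. That is exactly why the controller still has to check that the subject is real.</p>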
<h3>Wrong request</h3>
<p>The freedom of the path pattern also brings a problem: a visitor can try any subject name he wants. To avoid running the news query for a subject that does not exist, we first execute a query that looks up the subject. If the subject is not found, we throw a NoSuchRequestHandlingMethodException. This exception is handled by the DefaultHandlerExceptionResolver. This resolver handles certain standard Spring MVC exceptions by setting a specific response status code. In our case the status code will be a 404.</p>
<pre class="brush: java;">
protected SubjectBean findSubject(String subjectName, HttpServletRequest request, RssSubjectDao subjectDao)
                                  throws NoSuchRequestHandlingMethodException {
    SubjectBean subject = subjectDao.findSubjectBeanByName(subjectName);
    if (subject == null) {
        throw new NoSuchRequestHandlingMethodException(request);
    }

    return subject;
}
</pre>
<h3>Items</h3>
<p>The implementation of the controller is not special. We execute a query that finds some news and with the help of a mapper, we map the retrieved beans to RSS <a href="https://rome.dev.java.net/apidocs/0_3/com/sun/syndication/feed/rss/Item.html" target="_blank">items</a>. These items, as well as the subject title, are added to the model.</p>
<h2>View</h2>
<p>Now that our model is filled, we can create the view. Spring 3 provides us with the <a href="http://static.springsource.org/spring/docs/3.0.x/javadoc-api/org/springframework/web/servlet/view/feed/AbstractRssFeedView.html" target="_blank">AbstractRssFeedView</a>. This view has a few entry points to create the feed.</p>
<p>We created a BaseRssFeedView which extends the AbstractRssFeedView. The BaseRssFeedView overrides three methods that build the feed.</p>
<pre class="brush: java;">
public abstract class BaseRssFeedView extends AbstractRssFeedView {

    private LocalizedMessageResolver messages;

    @Override
    protected void prepareResponse(HttpServletRequest request, HttpServletResponse response) {
        super.prepareResponse(request, response);
        response.setCharacterEncoding(CharacterEncodingConstants.UTF8);
    }

    @Override
    protected void buildFeedMetadata(Map&lt;String, Object&gt; model, Channel feed, HttpServletRequest request) {
        RssMetaData rssMetaData = new RssMetaData();
        configureRssMetaData(rssMetaData, model);

        feed.setTitle(rssMetaData.getFeedTitle());
        feed.setDescription(rssMetaData.getFeedDescription());
        feed.setLink(feedLink);
        feed.setWebMaster(feedWebmaster);
        feed.setLanguage(feedLanguage);
        feed.setCopyright(feedCopyright);

        Image image = resolveImage(rssMetaData.getFeedTitle());
        feed.setImage(image);
        feed.setLastBuildDate(new Date());
        feed.setTtl(TIME_TO_LIVE);

        AtomNSModule module = new AtomNSModuleImpl();
        module.setLink(rssMetaData.getFeedUrl());
        feed.getModules().add(module);
    }

    @Override
    protected List&lt;Item&gt; buildFeedItems(Map&lt;String, Object&gt; stringObjectMap, HttpServletRequest request, HttpServletResponse response) {
        return (List&lt;Item&gt;) stringObjectMap.get(&quot;items&quot;);
    }

    protected abstract void configureRssMetaData(RssMetaData metaData, Map&lt;String, Object&gt; viewModel);
}
</pre>
<p>I don’t want to discuss the implementation of the methods, as I think that they are pretty self-explanatory. But I do want to discuss the one abstract method that needs to be implemented by the subclasses. This method allows the subclass to specify the specific metadata for the feed. The code below shows the view that would belong to our SubjectRssController.</p>
<pre class="brush: java;">
@Component(&quot;subjectRssFeed&quot;)
public class SubjectRssFeedView extends BaseRssFeedView {

    @Override
    protected void configureRssMetaData(RssMetaData metaData, Map&lt;String, Object&gt; viewModel) {
        String subjectName = (String) viewModel.get(&quot;subjectName&quot;);

        LocalizedMessageResolver messages = getMessageResolver();
        metaData.setFeedUrl(messages.getMessage(&quot;subject.news.rss.feed.url&quot;, subjectName));
        metaData.setFeedTitle(messages.getMessage(&quot;subject.news.rss.feed.title&quot;, subjectName));
        metaData.setFeedDescription(messages.getMessage(&quot;subject.news.rss.feed.description&quot;, subjectName));
    }
}
</pre>
<p>The class is annotated with @Component(&#8220;subjectRssFeed&#8221;). The name of the view is therefore <em>subjectRssFeed</em>, and it must match the view name you return in the controller.</p>
<h3>Properties</h3>
<p>In this case we assume that every subject has the same message for the specific metadata. So the only thing that differs is the subject name. For example, the description of a feed of the subject cars could be: <em>This feed contains the news items of the subject cars. </em></p>
<p>So this means that we need to parameterize the messages. In order to do that we have created a class that resolves the message for us from the correct property file. This way we can still use the templates in the property files.</p>
<p><em>subject.news.rss.feed.description=This feed contains the news items of the subject {0}</em></p>
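<p>As a rough illustration of how such a template gets filled in, the sketch below uses the JDK class <em>java.text.MessageFormat</em>, which understands the same {0} placeholder syntax. The resolver class itself is not shown in this post, so the class and method names here are hypothetical and the template is hard-coded instead of being read from a property file.</p>
<pre class="brush: java;">
import java.text.MessageFormat;

/** Hypothetical stand-in for a message resolver; the template is hard-coded for the example. */
class FeedMessageSketch {

    // Same template as the property subject.news.rss.feed.description above.
    static final String DESCRIPTION_TEMPLATE = "This feed contains the news items of the subject {0}";

    /** Fills the {0} placeholder with the given subject name. */
    static String describe(String subjectName) {
        return MessageFormat.format(DESCRIPTION_TEMPLATE, subjectName);
    }

    public static void main(String[] args) {
        // prints: This feed contains the news items of the subject cars
        System.out.println(describe("cars"));
    }
}
</pre>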
<p>Spring 3 also lets you annotate a field or a method/constructor parameter with @Value, which indicates a default value expression for the affected argument. This means you could inject a property like this:</p>
<pre class="brush: java;">
@Value(&quot;#{rssProperties.getProperty('subject.news.rss.feed.url')}&quot;)
public void setFeedUrl(String feedUrl) {
    this.feedUrl = feedUrl;
}
</pre>
<p>or</p>
<pre class="brush: java;">
@Value(&quot;#{rssProperties.getProperty('subject.news.rss.feed.url')}&quot;)
private String feedUrl;
</pre>
<h2>Conclusion</h2>
<p>I hope you got an idea of how Spring 3 can help you with creating RSS feeds. In our case it was really useful, as we get all the advantages of Spring MVC and adding new feeds is a piece of cake!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.jteam.nl/2010/05/05/implementing-rss-feeds-with-new-features-of-spring-3/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Axon Framework 0.5 released</title>
		<link>http://blog.jteam.nl/2010/04/24/axon-framework-0-5-released/</link>
		<comments>http://blog.jteam.nl/2010/04/24/axon-framework-0-5-released/#comments</comments>
		<pubDate>Sat, 24 Apr 2010 15:30:19 +0000</pubDate>
		<dc:creator>Allard Buijze</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Axon Framework]]></category>
		<category><![CDATA[Open Source]]></category>

		<guid isPermaLink="false">http://blog.jteam.nl/2010/04/24/axon-framework-0-5-released/</guid>
		<description><![CDATA[Today, I finalized the 0.5 release of the Axon Framework. There is quite a number of changes since the 0.4 version. The 0.5 version is a major step towards production readiness of the framework. Besides some changes to existing building blocks, such as the event bus, which is now much more powerful, the 0.5 version [...]]]></description>
			<content:encoded><![CDATA[<p><img style="margin: 5px; display: inline;" src="http://www.gridshore.nl/wp-content/uploads/axon_logo.png" alt="" align="left" /> Today, I finalized the 0.5 release of the Axon Framework. There is quite a number of changes since the 0.4 version. The 0.5 version is a major step towards production readiness of the framework.</p>
<p>Besides some changes to existing building blocks, such as the event bus, which is now much more powerful, the 0.5 version also includes some new features.</p>
<p>Read on to find out more.</p>
<p><span id="more-2218"></span></p>
<h2><strong>New features</strong></h2>
<ul>
<li><strong>Code restructuring.</strong> The package structure has been changed to better reflect the different architectural components Axon Framework provides. It should be easier to find the one you&#8217;re looking for.</li>
<li><strong>Command Bus.</strong> A command bus has been added to Axon. It lets you explicitly define commands and dispatch them to your command handlers. Furthermore, it lets you process commands regardless of their type using interceptors. This is useful for logging, authorization and correlation of incoming commands, for example.</li>
<li><strong>JPA Event Store. </strong>The easiest CQRS configuration is one that uses full consistency. That means everything should run within a single transaction. Since transactions over multiple data sources involve a huge performance penalty, Axon provides a JPA Event Store. Its performance is not as good as the FileSystem version, but it does provide transaction support.</li>
<li><strong>Easy switching between full-consistency and eventual consistency.</strong> You can easily choose to process all commands and related events inside a single transaction, or to handle events asynchronously. Choosing between consistency and high performance is just a matter of configuration. No coding required.</li>
<li><strong>Per-event listener configuration of asynchronous processing.</strong> It is now possible to decide on synchronous vs asynchronous event processing for each event handler individually, just by adding an annotation. If you configure a transaction manager for your event listeners, Axon will process the events in batches and manage the transactions around them.</li>
<li><strong>Support for rolling snapshots.</strong> All event stores will automatically pick up snapshot events. Snapshot events are an important performance booster when aggregates generate a lot of events. Instead of reading all past events, the event store just needs to read the last snapshot event and the regular events created since the snapshot.</li>
<li><strong>Transactional Event Processing.</strong> Configuring transactions in asynchronous event processing is now a lot easier. 0.5 includes a <tt>SpringTransactionManager</tt> you can use in combination with Spring&#8217;s <tt>PlatformTransactionManager</tt>.</li>
<li><strong>Major documentation update.</strong> The documentation has been restructured to make it easier to find what you&#8217;re looking for.</li>
</ul>
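<p>To illustrate the idea behind the rolling snapshots mentioned in the list above, here is a minimal sketch in plain Java. It deliberately does not use the Axon API: every name in it is hypothetical, and an &#8220;event&#8221; is simply a deposit amount. It only shows that replaying from a snapshot gives the same state as replaying everything, while reading fewer events.</p>
<pre class="brush: java;">
import java.util.Arrays;

/** Hypothetical illustration only; none of these names come from Axon. */
class SnapshotReplaySketch {

    /** Rebuilds a balance by replaying every event (here: a deposit amount) from the start. */
    static long replayAll(long[] deposits) {
        long balance = 0;
        for (long deposit : deposits) {
            balance += deposit;
        }
        return balance;
    }

    /** Starts from a snapshot taken after snapshotIndex events and replays only the later ones. */
    static long replayFromSnapshot(long snapshotBalance, int snapshotIndex, long[] deposits) {
        long[] eventsSinceSnapshot = Arrays.copyOfRange(deposits, snapshotIndex, deposits.length);
        return snapshotBalance + replayAll(eventsSinceSnapshot);
    }

    public static void main(String[] args) {
        long[] deposits = {10, 20, 30, 40, 50};
        // State after the first three events, as a snapshot would have stored it.
        long snapshotBalance = replayAll(Arrays.copyOfRange(deposits, 0, 3));
        System.out.println(replayAll(deposits));                              // prints 150
        System.out.println(replayFromSnapshot(snapshotBalance, 3, deposits)); // prints 150
    }
}
</pre>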
<h2><strong>Maven Central</strong></h2>
<p>Where the 0.4 version required configuration of a repository in your project’s pom.xml, the 0.5 version doesn’t. All required artifacts are available in the Maven Central repository.</p>
<h2>Workshop and professional support</h2>
<p>We believe that the 0.5 version of Axon Framework is a major step towards production readiness. Therefore, JTeam has decided to provide professional support for the Axon Framework and organize workshops to get you acquainted with the numerous features and choices involved with CQRS.</p>
<p>The first workshop is planned for Friday May 21st in our office in Amsterdam. For more information, visit <a href="http://www.jteam.nl/training/workshop/cqrs-axon-framework-training-workshop.html">http://www.jteam.nl/training/workshop/cqrs-axon-framework-training-workshop.html</a>.</p>
<h2>Getting started</h2>
<p>Want to get started? Visit <a href="http://www.axonframework.org">www.axonframework.org</a> and download the <a href="http://axonframework.googlecode.com/files/reference-guide-0.5.pdf">reference guide</a>. That should contain enough information to get you started. If you still have questions, drop me a message.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.jteam.nl/2010/04/24/axon-framework-0-5-released/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Mahout &#8211; Taste :: Part Two &#8211; Getting started</title>
		<link>http://blog.jteam.nl/2010/04/15/mahout-taste-part-two-getting-started/</link>
		<comments>http://blog.jteam.nl/2010/04/15/mahout-taste-part-two-getting-started/#comments</comments>
		<pubDate>Thu, 15 Apr 2010 09:08:49 +0000</pubDate>
		<dc:creator>Frank Scholten</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Enterprise Search]]></category>
		<category><![CDATA[Collaborative Filtering]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[Mahout]]></category>
		<category><![CDATA[Recommendations]]></category>
		<category><![CDATA[Taste]]></category>

		<guid isPermaLink="false">http://blog.jteam.nl/?p=1842</guid>
		<description><![CDATA[This blog is a &#8216;getting started&#8217; article and shows you how to build a simple web-based movie recommender with Mahout / Taste, Wicket and the Movielens dataset from Grouplens research group at the University of Minnesota. I will discuss which components you need, how to wire them up in Spring, and how to create a [...]]]></description>
			<content:encoded><![CDATA[<div>This blog is a &#8216;getting started&#8217; article and shows you how to build a simple web-based movie recommender with Mahout / Taste, Wicket and the Movielens dataset from Grouplens research group at the University of Minnesota. I will discuss which components you need, how to wire them up in Spring, and how to create a Wicket frontend for displaying movies and their recommendations. Along the way I give some tips and pointers about developing a recommender. Additionally I show the <span style="font-family: 'courier new'; font-size: medium;">ResourceDataModel</span>, a Mahout <span style="font-family: 'courier new'; font-size: medium;">DataModel</span> implementation which reads preferences from a Spring <span style="font-family: 'courier new'; font-size: medium;">Resource</span>.<br />
<span id="more-1842"></span><strong> </strong></p>
<h3><strong>Online movie store</strong></h3>
<p>Our running example for this post is an online DVD shop in which you can view and rent movies. Visitors go to a movie&#8217;s page to check out the plot description and will be shown a list of similar movies. These are other movies that were watched by people who also rented that specific movie. Because of space- and time-constraints, this application only provides a view on the dataset and the recommended movies. Other features expected of an online movie rental service, such as registration and payment are left out.</p>
<h3><strong>Movielens Dataset</strong></h3>
<p>The movies and their ratings originate from the <a href="http://www.grouplens.org/node/73">Movielens dataset</a> of the <a href="http://www.grouplens.org/">Grouplens research group</a> from the University of Minnesota. There are datasets containing 100,000, 1 million and 10 million ratings. Note that these ratings are <em>explicit</em>, ranging from 1 to 5. This is different from the example in <a id="icdy" title="my earlier blog" href="../2009/12/09/mahout-taste-part-one-introduction/">my earlier blog</a>, which featured <em>implicit</em> ratings. Implicit ratings only indicate whether a user purchased or liked an item, not how much someone liked it. In this example I used the 100,000 ratings file.</p>
<h3><strong>Item-based or user-based algorithms</strong></h3>
<p>For this application we use an item-based recommender, which is a good choice performance-wise if the number of items is less than the number of users. This is likely to be the case for an online movie rental store. Item-based recommendation is different from user-based recommendation, which works by identifying a user neighbourhood of similar users and recommending items from the user neighbourhood to other users. User-based recommendations are <em>personalized</em>, while with item-based recommendations each user will get <em>the same recommendations for a given item</em>. In our case, all users that visit a specific movie page will see the same recommended movies for that movie. An example of user-based recommendation is <a href="http://www.stumbleupon.com">Stumbleupon</a>, which provides personalized website recommendations. Stumbleupon requires you to log in first so that it can base its recommendations on your profile of likes and dislikes of certain sites.</p>
<h3><strong>EuclideanDistanceSimilarity</strong></h3>
<p>Now we need to select an item-based algorithm that fits our dataset.  The <span style="font-family: 'courier new'; font-size: medium;">EuclideanDistanceSimilarity</span> is one of Taste&#8217;s algorithms that is suitable for explicit ratings and will be used for our demonstration. Before we construct our recommender we introduce the Euclidean distance similarity in more detail. The algorithm computes the Euclidean distance between the preference vectors of each pair of items. The shorter the distance between these vectors, the greater the similarity. For instance, suppose we have users u1, u2 and u3 and item i1. Let&#8217;s say the preferences of these users for i1 are 2, 4 and 5 respectively. We now have a preference vector [2,4,5] for item i1. The Euclidean distance between two such vectors can now be computed and used as a measure of their similarity. The Euclidean distance between the vectors of two items i and j equals the square root of the sum of squared differences between their coordinates. See the formula below:<br />
<center><img src="http://blog.jteam.nl/wp-content/cache/tex_5b23879f5cdb88c80c1d0b55b1bbd52b.png" align="absmiddle" class="tex" alt="d_{ij}=\sqrt{\sum\limits_{k=1}^n \left(x_{ik} - x_{jk}\right)^2}" /></center></p>
<p>The <span style="font-family: 'courier new'; font-size: medium;">EuclideanDistanceSimilarity</span> calculates this distance for each pair of items and then returns</p>
<p><center>1 / (1 + d<sub>ij</sub>)</center></p>
<p>which results in a value between 0 and 1: identical vectors (distance 0) score 1, and the score approaches 0 as the distance grows. The <span style="font-family: 'courier new'; font-size: medium;">EuclideanDistanceSimilarity</span> can also be weighted. If you pass the <span style="font-family: 'courier new'; font-size: medium;">Weighting.WEIGHTED</span> enum value to the constructor of <span style="font-family: 'courier new'; font-size: medium;">EuclideanDistanceSimilarity</span>, the algorithm will weight the values based on the number of users and the number of co-occurring preferences.</p>
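<p>The calculation can be worked through in a few lines of plain Java. This is a minimal sketch, not Mahout&#8217;s implementation: it ignores weighting and missing preferences, and it assumes the similarity is 1 / (1 + d<sub>ij</sub>), which is the form that keeps the result between 0 and 1. The class name and the ratings for item i2 are made up for the example.</p>
<pre class="brush: java;">
/** Hypothetical worked example; unweighted and without handling of missing preferences. */
class EuclideanSimilaritySketch {

    /** Euclidean distance between two preference vectors of equal length. */
    static double distance(double[] x, double[] y) {
        double sumOfSquares = 0.0;
        for (int k = 0; k != x.length; k++) {
            double diff = x[k] - y[k];
            sumOfSquares += diff * diff;
        }
        return Math.sqrt(sumOfSquares);
    }

    /** Turns a distance of zero or more into a similarity that is at most 1. */
    static double similarity(double[] x, double[] y) {
        return 1.0 / (1.0 + distance(x, y));
    }

    public static void main(String[] args) {
        double[] i1 = {2, 4, 5}; // ratings of u1, u2 and u3 for item i1, as in the text
        double[] i2 = {2, 1, 1}; // made-up ratings of the same users for a second item i2
        System.out.println(distance(i1, i2));   // sqrt(0 + 9 + 16), prints 5.0
        System.out.println(similarity(i1, i2)); // 1 / (1 + 5), i.e. one sixth
    }
}
</pre>
<p>Two items with identical preference vectors have a distance of 0 and therefore a similarity of exactly 1.</p>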
<h3><strong>Creating a web-based recommender</strong></h3>
<p>Below is a list of components we need to create a small web-based recommendation engine, using Taste, Wicket and JPA/Hibernate. I won&#8217;t cover all the details of building this webapp, just the main building blocks. You can download the code for this example <a href="http://blog.jteam.nl/wp-content/uploads/2010/04/taste-getting-started.zip">here</a> and look at the specific details. We need the following components:</p>
<ul>
<li><span style="font-family: 'courier new'; font-size: medium;">DataModel</span><br />
A <span style="font-family: 'courier new'; font-size: medium;">FileDataModel</span> that reads movie ids, user ids and ratings from the Movielens dataset directly into memory.</li>
<li><span style="font-family: 'courier new'; font-size: medium;">EuclideanDistanceSimilarity</span><br />
Computes item similarities for each pair of items in the dataset.</li>
<li><span style="font-family: 'courier new'; font-size: medium;">GenericItemBasedRecommender</span><br />
Uses both the datamodel and the similarity algorithm to compute similar items in memory.</li>
<li><span style="font-family: 'courier new'; font-size: medium;">MovieRepository</span><br />
JPA repository for retrieving <span style="font-family: 'courier new'; font-size: medium;">Movie</span> objects</li>
<li><span style="font-family: 'courier new'; font-size: medium;">MovieService</span><br />
Uses the recommender and the <span style="font-family: 'courier new'; font-size: medium;">MovieRepository</span> to retrieve most similar movies for a given movie id.</li>
<li>Wicket <span style="font-family: 'courier new'; font-size: medium;">MoviePage</span>, HTML + CSS<br />
This includes a page for viewing a movie along with similar movies, a few model classes, some HTML and CSS, and a few code tweaks to the original <a id="hsdj" title="wicket quickstart" href="http://wicket.apache.org/quickstart.html">wicket quickstart</a> project.</li>
</ul>
<p>Note that Taste ships with a preconfigured <span style="font-family: 'courier new'; font-size: medium;">MovielensRecommender</span>. For the purpose of this article however, I wanted to show you how to build a recommender from the ground up.</p>
<h3><strong>Configuring the ResourceDataModel, EuclideanDistanceSimilarity and GenericItemBasedRecommender</strong></h3>
<p>Because of license restrictions the Movielens data cannot be shipped with this demo, so you need to download it from the <a href="http://www.grouplens.org/node/73">Grouplens datasets page</a>. Place the <span style="font-family: 'courier new'; font-size: medium;">u.data</span> file under <span style="font-family: 'courier new'; font-size: medium;">src/main/resources/grouplens/100K/ratings/</span> and place the <span style="font-family: 'courier new'; font-size: medium;">u.item</span> file under <span style="font-family: 'courier new'; font-size: medium;">src/main/resources/grouplens/100K/data/</span>. Now that you have the ratings data set up, you need to feed it into a <span style="font-family: 'courier new'; font-size: medium;">DataModel</span> class. You can use a <span style="font-family: 'courier new'; font-size: medium;">FileDataModel</span> for this, but that requires an absolute path. Instead I implemented a <span style="font-family: 'courier new'; font-size: medium;">ResourceDataModel</span>, which reads ratings files from the classpath. Below is the implementation of the <span style="font-family: 'courier new'; font-size: medium;">ResourceDataModel</span>, which is a wrapper around a <span style="font-family: 'courier new'; font-size: medium;">FileDataModel</span>. More on how to wire this below.</p>
<pre class="brush: java;">
package nl.jteam.mahout.gettingstarted.datamodel;

// Imports omitted.

/**
 * DataModel implementation which reads a Spring {@link org.springframework.core.io.Resource} into a
 * {@link org.apache.mahout.cf.taste.impl.model.file.FileDataModel} delegate.
 *
 * @author Frank Scholten
 */
public class ResourceDataModel implements DataModel {
    FileDataModel delegate;

    ResourceDataModel() { // For testing
    }

    /**
     * Reads the preferences from the given {@link org.springframework.core.io.Resource}
     *
     * @param resource with user IDs, items IDs and preferences
     */
    public ResourceDataModel(Resource resource) {
        try {
            this.delegate = new FileDataModel(resource.getFile());
        } catch (IOException e) {
            throw new RuntimeException(&quot;Could not read resource &quot; + resource.getDescription(), e);
        }
    }

    @Override
    public LongPrimitiveIterator getUserIDs() throws TasteException {
        return delegate.getUserIDs();
    }

    @Override
    public PreferenceArray getPreferencesFromUser(long userID) throws TasteException {
        return delegate.getPreferencesFromUser(userID);
    }

    @Override
    public FastIDSet getItemIDsFromUser(long userID) throws TasteException {
        return delegate.getItemIDsFromUser(userID);
    }

    @Override
    public LongPrimitiveIterator getItemIDs() throws TasteException {
        return delegate.getItemIDs();
    }

    @Override
    public PreferenceArray getPreferencesForItem(long itemID) throws TasteException {
        return delegate.getPreferencesForItem(itemID);
    }

    @Override
    public Float getPreferenceValue(long userID, long itemID) throws TasteException {
        return delegate.getPreferenceValue(userID, itemID);
    }

    @Override
    public int getNumItems() throws TasteException {
        return delegate.getNumItems();
    }

    @Override
    public int getNumUsers() throws TasteException {
        return delegate.getNumUsers();
    }

    @Override
    public int getNumUsersWithPreferenceFor(long... itemIDs) throws TasteException {
        return delegate.getNumUsersWithPreferenceFor(itemIDs);
    }

    @Override
    public void setPreference(long userID, long itemID, float value) throws TasteException {
        delegate.setPreference(userID, itemID, value);
    }

    @Override
    public void removePreference(long userID, long itemID) throws TasteException {
        delegate.removePreference(userID, itemID);
    }

    @Override
    public boolean hasPreferenceValues() {
        return delegate.hasPreferenceValues();
    }

    @Override
    public void refresh(Collection&lt;Refreshable&gt; alreadyRefreshed) {
        delegate.refresh(alreadyRefreshed);
    }
}
</pre>
<p>Next you need to initialize the movie database. Run the <span style="font-family: 'courier new'; font-size: medium;">initialize_movielens_db.sql</span> from the <span style="font-family: 'courier new'; font-size: medium;">src/main/resources/sql</span> folder. This will create the <span style="font-family: 'courier new'; font-size: medium;">movielens</span> database and a user with username and password <span style="font-family: 'courier new'; font-size: medium;">movielens</span>. Additionally, it creates the movie database and loads it with the movie titles from the <span style="font-family: 'courier new'; font-size: medium;">u.item</span> file.</p>
<p>Now we wire up the <span style="font-family: 'courier new'; font-size: medium;">GenericItemBasedRecommender</span> and its dependencies, the <span style="font-family: 'courier new'; font-size: medium;">EuclideanDistanceSimilarity</span> and the <span style="font-family: 'courier new'; font-size: medium;">ResourceDataModel</span>, in the following Spring context:</p>
<pre class="brush: xml;">
&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
&lt;beans xmlns=&quot;http://www.springframework.org/schema/beans&quot;
       xmlns:xsi=&quot;http://www.w3.org/2001/XMLSchema-instance&quot;
       xsi:schemaLocation=&quot;http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd&quot;&gt;

    &lt;!-- Recommender --&gt;

    &lt;bean id=&quot;euclidianDistanceRecommender&quot; class=&quot;org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender&quot;&gt;
        &lt;constructor-arg index=&quot;0&quot; ref=&quot;movielensDataModel100K&quot;/&gt;
        &lt;constructor-arg index=&quot;1&quot; ref=&quot;euclidianDistanceSimilarity&quot;/&gt;
    &lt;/bean&gt;

    &lt;!-- DataModel --&gt;

    &lt;bean id=&quot;movielensDataModel100K&quot; class=&quot;nl.jteam.mahout.gettingstarted.datamodel.ResourceDataModel&quot;&gt;
        &lt;constructor-arg value=&quot;classpath:/grouplens/100K/ratings/u.data&quot;/&gt;
    &lt;/bean&gt;

    &lt;!-- Similarity --&gt;

    &lt;bean id=&quot;euclidianDistanceSimilarity&quot; class=&quot;org.apache.mahout.cf.taste.impl.similarity.EuclideanDistanceSimilarity&quot;&gt;
        &lt;constructor-arg ref=&quot;movielensDataModel100K&quot;/&gt;
        &lt;constructor-arg value=&quot;WEIGHTED&quot;/&gt;
    &lt;/bean&gt;

&lt;/beans&gt;
</pre>
<h3><strong>MovieRepository, MovieService &amp; MoviePage</strong></h3>
<p>Our recommender is now ready to determine similar movies for a given movie ID. However, the <span style="font-family: 'courier new'; font-size: medium;">GenericItemBasedRecommender</span> only returns movie IDs of the type <span style="font-family: 'courier new'; font-size: medium;">long</span>. In order to display the actual movie information to users we need to create a <span style="font-family: 'courier new'; font-size: medium;">MovieRepository</span> which fetches recommended <span style="font-family: 'courier new'; font-size: medium;">Movie</span> objects. Additionally, we need a <span style="font-family: 'courier new'; font-size: medium;">MovieService</span> which coordinates the <span style="font-family: 'courier new'; font-size: medium;">MovieRepository</span> and the <span style="font-family: 'courier new'; font-size: medium;">GenericItemBasedRecommender</span> so that we can retrieve recommended <span style="font-family: 'courier new'; font-size: medium;">Movie</span> objects for a given movie ID. Below is a snippet of a JPA implementation of a <span style="font-family: 'courier new'; font-size: medium;">MovieRepository</span>.</p>
<pre class="brush: java;">
package nl.jteam.mahout.gettingstarted.repository;

// Imports omitted.

/**
 * Repository for retrieving {@link Movie}s
 *
 * @author Frank Scholten
 */
@Repository
public class JpaMovieRepository implements MovieRepository {

    @PersistenceContext
    private EntityManager entityManager;

    /** {@inheritDoc} */
    @Override
    public Movie getMovieById(long id) {
        return entityManager.find(Movie.class, id);
    }

    /** {@inheritDoc} */
    @Override
    @SuppressWarnings(&quot;unchecked&quot;)
    public List&lt;Movie&gt; getMoviesById(List&lt;Long&gt; movieIds) {
        return (List&lt;Movie&gt;) entityManager.createQuery(&quot;SELECT m FROM movie m WHERE m.id IN (:movieIds)&quot;)
                .setParameter(&quot;movieIds&quot;, movieIds)
                .getResultList();
    }
}
</pre>
<p>Below is a code snippet of a default implementation of the <span style="font-family: 'courier new'; font-size: medium;">MovieService</span>.</p>
<pre class="brush: java;">
package nl.jteam.mahout.gettingstarted.service;

// Imports omitted.

/**
 * Service for retrieving and recommending {@link Movie}s.
 *
 * @author Frank Scholten
 */
@Transactional
@Service
public class DefaultMovieService implements MovieService {

    @Autowired
    private MovieRepository movieRepository;

    @Autowired
    private ItemBasedRecommender movieRecommender;

    public Movie getMovieById(long id) {
        return movieRepository.getMovieById(id);
    }

    public List&lt;Movie&gt; moreLikeThis(long movieId) {
        try {
            // Ask the recommender for the five items most similar to this movie.
            List&lt;RecommendedItem&gt; recommendedItems = movieRecommender.mostSimilarItems(movieId, 5);

            List&lt;Long&gt; ids = new ArrayList&lt;Long&gt;();
            for (RecommendedItem r : recommendedItems) {
                ids.add(r.getItemID());
            }

            return movieRepository.getMoviesById(ids);

        } catch (TasteException e) {
            // Fall back to an empty list when the recommender fails.
            return Collections.emptyList();
        }
    }
}
</pre>
<p>Finally, below is the snippet of the Wicket <span style="font-family: 'courier new'; font-size: medium;">MoviePage</span> which displays the current movie and similar movies fetched through specially created Wicket models.</p>
<pre class="brush: java;">
package nl.jteam.mahout.gettingstarted.web.page;

// Imports omitted.

/**
 * Page for showing a single {@link Movie} from the Movielens dataset along
 * with recommended movies i.e. 'more like this'.
 *
 * @author Frank Scholten
 */
public class MoviePage extends WebPage {

    private static final String MOVIE_ID = &quot;0&quot;;

    public MoviePage(PageParameters pageParameters) {
        final long movieId = pageParameters.getLong(MOVIE_ID, 1);

        MovieModel model = new MovieModel(movieId);
        add(new Label(&quot;title&quot;, model.getObject().getTitle()));

        PropertyListView&lt;Movie&gt; recommendedMovies = new PropertyListView&lt;Movie&gt;(&quot;moreLikeThis&quot;, new RecommendedMoviesModel(movieId)) {
            @Override
            protected void populateItem(ListItem listItem) {
                Movie movie = (Movie) listItem.getModelObject();
                PageParameters pageParameters = new PageParameters();
                pageParameters.put(MOVIE_ID, movie.getId());

                BookmarkablePageLink&lt;MoviePage&gt; movieLink = new BookmarkablePageLink&lt;MoviePage&gt;(&quot;link&quot;, MoviePage.class, pageParameters);
                listItem.add(movieLink);
                Label movieTitle = new Label(&quot;title&quot;);
                movieTitle.setRenderBodyOnly(true);
                movieLink.add(movieTitle);
            }
        };
        add(recommendedMovies);
    }
}
</pre>
<h3><strong>Running the web application</strong></h3>
<p>First you need to download the Movielens dataset and add the ratings file to the classpath under <span style="font-family: 'courier new'; font-size: medium;">grouplens/100K/ratings</span>. See the Spring context above. Since this example is based on the Wicket quickstart project you can start the application via Jetty through the <span style="font-family: 'courier new'; font-size: medium;">Start</span> class and run it in your favourite IDE. Go to <span style="font-family: 'courier new'; font-size: medium;">http://localhost:9090/</span> and you can browse through movies via recommendations. Alternatively you can build the WAR and drop it into Tomcat.</p>
<h3><strong>Performance tweaks</strong></h3>
<ul>
<li>If you would like to experiment with the larger datasets, add <span style="font-family: 'courier new'; font-size: medium;">-Xmx512m</span> as a VM parameter when you run this application with the 10 million ratings dataset. Also, Taste&#8217;s <span style="font-family: 'courier new'; font-size: medium;">FileDataModel</span> uses commas and tabs as delimiters. You may need to run the Movielens files through <span style="font-family: 'courier new'; font-size: medium;">sed</span> before feeding them into a <span style="font-family: 'courier new'; font-size: medium;">FileDataModel</span>.</li>
<li>The <span style="font-family: 'courier new'; font-size: medium;">FileDataModel</span> reads everything into memory before computation. This is much faster than using a <span style="font-family: 'courier new'; font-size: medium;">MySQLJDBCDataModel</span>, since the latter requires around O(n<sup>2</sup>) database queries to compute similarities between all pairs. Reading all data into memory is not always feasible, however. An alternative is to precompute the similarities for the item pairs, store the results in the database and read them via the <span style="font-family: 'courier new'; font-size: medium;">MySQLJDBCItemSimilarity</span>.</li>
<li>You can also sample the dataset and/or remove noise elements to speed things up a little more.</li>
</ul>
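<p>If you do not have <span style="font-family: 'courier new'; font-size: medium;">sed</span> at hand, the delimiter conversion mentioned above can also be done in a few lines of Java. This is only a minimal sketch: it assumes the rating lines use <span style="font-family: 'courier new'; font-size: medium;">::</span> between fields, and the class name and file arguments are made up for this example.</p>

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Mahout / Taste.
class RatingsDelimiterConverter {

    // Assumption: fields are separated by "::", e.g. "1::122::5::838985046".
    static String toCommaDelimited(String line) {
        return line.replace("::", ",");
    }

    // Streams the file line by line so a large ratings file never has to fit in memory.
    static void convert(Path source, Path target) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(source, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(toCommaDelimited(line));
                out.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: RatingsDelimiterConverter <source> <target>");
            return;
        }
        convert(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

<p>The converted file can then be placed on the classpath instead of the original ratings file.</p>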
<p>These performance aspects are an interesting topic for a later blog post. If any of you reading this has experience with these types of issues, please post a comment and we&#8217;ll discuss them.</p>
<h3><strong>Conclusions</strong></h3>
<p>This concludes the getting started post on Mahout / Taste. What we didn&#8217;t cover was how to update the recommender and how to customize your recommender with boosting of items. These are all subjects for future blog posts.</p>
<h3><strong>References</strong></h3>
<ul>
<li>Getting started demo &#8211; source code
<p>You can download the source code of this example <a href="http://blog.jteam.nl/wp-content/uploads/2010/04/taste-getting-started.zip">here</a>.</p></li>
<li>Grouplens datasets
<p>The <a href="http://www.grouplens.org/">Grouplens</a> research group of the University of Minnesota has made a few <a href="http://www.grouplens.org/node/73">datasets</a> publicly available for research purposes.</p></li>
<li><a href="http://www.manning.com/owen/">Mahout in Action EAP</a>
<p>This is a great resource on Mahout that explains a lot about performance issues and details how the algorithms stack up against each other. It also provides a lot of examples and case studies on how to use Mahout in practice.</p></li>
</ul>
</div>
]]></content:encoded>
			<wfw:commentRss>http://blog.jteam.nl/2010/04/15/mahout-taste-part-two-getting-started/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>iPhone OS 4.0</title>
		<link>http://blog.jteam.nl/2010/04/11/iphone-os-4-0/</link>
		<comments>http://blog.jteam.nl/2010/04/11/iphone-os-4-0/#comments</comments>
		<pubDate>Sun, 11 Apr 2010 17:00:53 +0000</pubDate>
		<dc:creator>Tom van Zummeren</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[iPhone]]></category>
		<category><![CDATA[Objective C]]></category>

		<guid isPermaLink="false">http://blog.jteam.nl/?p=2112</guid>
		<description><![CDATA[This is just a quick blog post to share my excitement for the new iPhone OS firmware version 4.0, which was announced by Apple just a few days ago. Yesterday I installed the beta version of this new OS on my phone, only to find out that a few of the new features are not [...]]]></description>
			<content:encoded><![CDATA[<p>This is just a quick blog post to share my excitement for the <a href="http://www.apple.com/iphone/preview-iphone-os/">new iPhone OS firmware version 4.0</a>, which was announced by Apple just a few days ago. Yesterday I installed the beta version of this new OS on my phone, only to find out that a few of the new features are not supported on iPhone 3G <img src='http://blog.jteam.nl/wp-includes/images/smilies/icon_sad.gif' alt=':(' class='wp-smiley' />  Unfortunately, those features will be available only on iPhone 3GS. Nonetheless, I am very excited about all this new stuff and want to go over the most interesting features, both from a user’s and a developer’s perspective.<br />
<span id="more-2112"></span></p>
<h2>From a user’s perspective</h2>
<p>I want to highlight a few of the new features I think are most interesting which come with the new iPhone OS 4.0.</p>
<h3>Multitasking</h3>
<p>I was always one of the people who believed that the iPhone doesn’t <em>need</em> multitasking at all. I was perfectly happy with the fact that an app can manually save its state on exit, and restore that same state when you start the app again, like nothing happened. A while ago I was using a phone that did support multitasking. It was the HTC Magic, which was running on Google’s Android platform. I soon noticed the downsides of the ability to multitask. The battery runs out very quickly (even quicker than an iPhone’s battery) and the phone tends to get really sluggish when you have just a few apps running in the background.</p>
<p>The downsides of multitasking I just described were one of the reasons why Apple didn’t support multitasking in the first place. They wanted to make a phone that provided a great user experience which runs smoothly in every circumstance. And also they didn’t want the battery to die in just half a day. But now with OS 4.0, Apple includes multitasking. It’s implemented in a way which preserves battery life and performance! Sounds very promising, doesn’t it? Let&#8217;s see if it is too good to be true.</p>
<p>I ran OS 4.0 on the iPhone simulator, which comes with the iPhone SDK, to play around with this new functionality because, like I said, I can&#8217;t use the multitasking feature on my iPhone 3G. So I can only demonstrate what the user interface looks like on the simulator. I deployed a little iPhone app which I created a while ago for an existing website. This website was created by a friend of mine and is kind of like a forum with just one thread, used by a group of people that know each other in real life. The screenshots below show how you can easily switch between that app and a Safari web page, as an example. To switch, you double-click the home button and the icons of all currently running apps slide into the screen. Quickly switch to one of the running apps by tapping its icon.</p>
<table border="0">
<tr>
<td><img src="http://blog.jteam.nl/wp-content/uploads/2010/04/Screen-shot-Yert-app.png" alt="Screen shot Yert app" width="207" height="385" class="alignnone size-full wp-image-2117" /></td>
<td><img src="http://blog.jteam.nl/wp-content/uploads/2010/04/Screen-shot-Yert-+-running-apps.png" alt="Screen shot Yert + running apps" width="207" height="385" class="alignnone size-full wp-image-2119" /></td>
<td><img src="http://blog.jteam.nl/wp-content/uploads/2010/04/Screen-shot-Safari-+-running-apps.png" alt="Screen shot Safari + running apps" width="207" height="385" class="alignnone size-full wp-image-2120" /></td>
</tr>
</table>
<p>The way they implemented this is once an app is put in the background, the state of that app freezes. Once you switch back to that app, the app instantly continues to run from the same state you left it in. This way the app doesn’t keep running in the background which would negatively affect battery consumption and performance. Of course, some apps still want to be able to do stuff while in the background. Apple provides those apps with a bunch of services they can utilize to be able to do their thing, while preserving battery life. Here are a few examples:</p>
<ul>
<li>Playing music in the background. For example the Pandora Radio app (which I believe isn’t available outside of America unfortunately) is now able to continue to play music even when the app is not running. It’s just like what the iPod app already could do, so now that same technique is available for all music playing apps.</li>
<li>Voice over IP in the background. For example the Skype app can now receive calls while it’s in the background! This service is kind of similar to what the Phone app already could do. This is awesome because this was the main drawback of using Skype on iPhone. You really needed to have the app running to be able to receive phone calls. Once you quit the app, you could no longer receive phone calls. This service makes it possible so that’s exciting news!</li>
<li>Receive GPS data while in the background. This still consumes some of the battery&#8217;s life, but it allows TomTom, for example, to continue giving the user directions even when the user switches to other apps. And since you usually have the iPhone connected to a power adapter when using TomTom, the extra battery consumption of this service is no problem at all.</li>
</ul>
<p>There are a few more services available like this, but the ones I described above are the ones I think are most interesting.</p>
<h3>Setting backgrounds</h3>
<p>Yes, Apple finally gave in and added support for displaying a background image behind your application icons. So now people who didn’t jailbreak their phones can enjoy a nice little picture in the background too <img src='http://blog.jteam.nl/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  This feature is also only supported on iPhone 3GS. Why it isn&#8217;t on 3G? I have no clue at all.</p>
<h3>Organizing apps with folders</h3>
<p>Another new feature is the ability to create folders to organize your apps! Each folder can contain up to 12 apps. This is a feature that IS supported on my iPhone 3G so I made some screenshots without the need for the simulator this time.</p>
<table width="100%">
<tr>
<td align="right"><img src="http://blog.jteam.nl/wp-content/uploads/2010/04/Screen-shot-closed-folders.png" alt="Screen shot closed folders" width="160" height="240" class="alignnone size-full wp-image-2123" /></td>
<td align="left"><img src="http://blog.jteam.nl/wp-content/uploads/2010/04/Screen-shot-open-Music-folder.png" alt="Screen shot open Music folder" width="160" height="240" class="alignnone size-full wp-image-2124" /></td>
</tr>
</table>
<p>Before OS 4.0 you could “only” install  a total of 180 apps on your phone, so now in theory if you would replace every icon with a folder and stuff it with apps, you could install a total of 2160 apps! Which is totally ridiculous if you would do that, but it IS possible now <img src='http://blog.jteam.nl/wp-includes/images/smilies/icon_razz.gif' alt=':P' class='wp-smiley' /> </p>
<h3>Game Center</h3>
<p>Apple included a preview of Game Center in the OS 4.0 beta, which will not be in the first release of OS 4.0. It’s supposed to be available later this year for everyone to use. As far as I understand, Game Center is a way for all iPhone games to easily find other users to play with and to keep online scoreboards and stuff like that. You can even challenge other players who are not currently playing the game. They get a push notification with the challenge, and if the user accepts, the game starts. To me this whole thing sounds just like the iPhone’s version of the PlayStation Network and Xbox Live, which I think is awesome <img src='http://blog.jteam.nl/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<h2>From a developer’s perspective</h2>
<p>OS 4.0 also includes a lot of new APIs for developers to work with to make their apps even more awesome.</p>
<h3>Local notifications</h3>
<p>Since OS 3.0 we already have push notifications, which let a server send messages to your phone when certain events occur. Now, we also get local notifications. The main difference is that these don’t require a server. An app can now simply schedule a local notification to be displayed at a certain time. I think this is a really cool feature since you don’t always need a server to display notifications.</p>
<h3>Calendar access</h3>
<p>By using the new EventKit framework, apps can now access the user’s calendar. So apps can now get existing events from your calendar and add new events to it. This was always something I was missing, but now it’s here!</p>
<h3>Quick Look</h3>
<p>With the Quick Look framework, apps are now able to show a preview of certain files, for example attachments to an e-mail or files in your Dropbox. The following file types are supported: iWork documents, Microsoft Office documents, RTF, PDF, images, text and CSV files.</p>
<h3>Sending SMS messages</h3>
<p>Previously you could send SMS messages only by leaving the app completely and going into the Messages app (this could be directly triggered from within the app). Now SMS messages can be sent without leaving the app by using the Message UI framework. You stay in the same app while typing the text message.</p>
<h3>Block object</h3>
<p>This actually isn’t a new framework, it’s a new language feature in Objective-C! The developer documentation describes it best: <em>“A block object is a mechanism for creating an ad hoc function body, something which in other languages is sometimes called a closure or lambda”</em>. A block can, for example, be used as a replacement for a delegate or as a callback function.</p>
<p>An example to sort a NSString array using a block object:</p>
<pre class="brush: cpp;">NSArray *stringsArray = [NSArray arrayWithObjects:@&quot;string 1&quot;, @&quot;String 21&quot;, @&quot;string 12&quot;, @&quot;String 11&quot;, @&quot;String 02&quot;, nil];
// Ignore case and character width, and compare embedded numbers numerically.
static NSStringCompareOptions comparisonOptions = NSCaseInsensitiveSearch | NSNumericSearch | NSWidthInsensitiveSearch | NSForcedOrderingSearch;
NSLocale *currentLocale = [NSLocale currentLocale];

// The block can use comparisonOptions and currentLocale from the enclosing scope.
NSComparator finderSortBlock = ^(id string1, id string2) {
    NSRange string1Range = NSMakeRange(0, [string1 length]);
    return [string1 compare:string2 options:comparisonOptions range:string1Range locale:currentLocale];
};
NSArray *finderSortArray = [stringsArray sortedArrayUsingComparator:finderSortBlock];
NSLog(@&quot;finderSortArray: %@&quot;, finderSortArray);</pre>
<p>The <em>NSComparator finderSortBlock</em> is the block object in this example. It describes a function which can compare two strings with each other.</p>
<h2>That&#8217;s all folks!</h2>
<p>That’s all I wanted to share for now. I am very excited about all this new development. Let me know what you think by leaving a comment in the comment section below.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.jteam.nl/2010/04/11/iphone-os-4-0/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>JTeam and rijksoverheid.nl</title>
		<link>http://blog.jteam.nl/2010/04/05/jteam-en-rijksoverheid-nl/</link>
		<comments>http://blog.jteam.nl/2010/04/05/jteam-en-rijksoverheid-nl/#comments</comments>
		<pubDate>Mon, 05 Apr 2010 14:47:56 +0000</pubDate>
		<dc:creator>Jettro Coenradie</dc:creator>
				<category><![CDATA[Business]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[CMS]]></category>
		<category><![CDATA[Hippo]]></category>
		<category><![CDATA[rijksoverheid]]></category>

		<guid isPermaLink="false">http://blog.jteam.nl/?p=2044</guid>
		<description><![CDATA[www.rijksoverheid.nl is live. I do not know how long ago the idea of a single government-wide website was born. I do know how long I have been involved in the project. It is now about a year and a half since I started as Software Architect on the project Overheid Nieuwe Stijl. It is an ambitious project to bring 16 websites [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.rijksoverheid.nl">www.rijksoverheid.nl</a> is live. I do not know how long ago the idea of a single government-wide website was born. I do know how long I have been involved in the project.</p>
<div style="text-align:center;"><img src="http://blog.jteam.nl/wp-content/uploads/2010/04/Screen-shot-2010-04-04-at-08.52.08.png" alt="Screen shot 2010-04-04 at 08.52.08.png" border="0" width="450" height="281" /></div>
<p>It is now about a year and a half since I started as Software Architect on the project <strong>Overheid Nieuwe Stijl</strong>. It is an ambitious project to bring 16 websites together in a single government-wide website, <a href="http://www.rijksoverheid.nl">www.rijksoverheid.nl</a>. With the first go-live, 7 websites have been moved over: Postbus 51, regering.nl, minaz.nl, minez.nl, minocw.nl, minvws.nl and jeugdengezin.nl. Naturally, most of the content of those websites had to be available on the new website as well.</p>
<p>The government has chosen to use Open Source software as much as possible. The project Overheid Nieuwe Stijl (ONS) also chose multiple Open Source solutions. Besides the use of Open Source software, Open Standards and Open Data are considered important. <strong>Open Data</strong> in particular is seen as very positive by everyone. The <a href="http://www.rijksoverheid.nl/copyright">license for all content</a> is Creative Commons.</p>
<p>If you would like more information about this enormous project, <a href="http://www.communicatieplein.nl/Onderwerpen/Corporate_communicatie/Project_ONS">you can find it here</a>.</p>
<p>In this blog post I want to focus on JTeam&#8217;s involvement in this ambitious project for the Dutch government.</p>
<p><span id="more-2044"></span><br />
<h2>Proof of Concept (PoC)</h2>
<p>For me the project started somewhere in September 2009. Together with two people from the government and someone from Hippo, we were going to put the Hippo CMS through its paces during a Proof of Concept (PoC). Before this PoC a lot of research into content management systems had already been done, and Hippo came out as the best choice for the requirements of the new rijksoverheid website. Already during the PoC it became clear that Hippo was keen to help make this project a success. During the PoC, and of course during the rest of the project, Hippo delivered many features that we needed. During the PoC I helped to build the solution. Where necessary I was critical towards Hippo, and sometimes I even asked for improvements. My colleagues and I did this more often during the project: we reported bugs and sometimes created patches to improve the final solution.</p>
<p>A small part of the PoC was importing content and making data available. Using spring-ws, Spring Integration and a home-made Hippo connector, this turned out to work very well.</p>
<p>With hard work the PoC eventually succeeded, and the plans for the real project could be made.</p>
<h2>The tender</h2>
<p>While waiting for the tender, the project team continued making plans and preparing the project. After JTeam turned out to be one of the 6 parties, the team was expanded, and by now Roberto, Rob and Tom are part of the team as well. With that, JTeam has made a substantial contribution to the final solution.</p>
<h2>Open Source</h2>
<p>As mentioned before, there is a preference for working with Open Source as much as possible. It is well known that we use <a href="http://www.onehippo.com/en/products/cms">Hippo</a> as the content management system. Hippo takes part in the project, and I have experienced the cooperation with Hippo as very pleasant. We keep each other sharp and clearly complement each other&#8217;s knowledge.</p>
<p>Besides Hippo we use other Open Source technologies as well. The most important ones I want to mention are two Spring Framework projects. We used <a href="http://static.springsource.org/spring-ws/sites/1.5/">Spring Web Services</a> for accepting new content and content updates. Handling new content is a complex part that we eventually solved with <a href="http://www.springsource.org/spring-integration">Spring Integration</a>. These technologies are indispensable for the whole content migration. In addition, we use the RSS capabilities that the Spring Framework supports out of the box.</p>
<h2>Data migration</h2>
<p>The data of the websites is imported using a commercial tool. Kapow from <a href="http://kapowtech.com/">kapowtech</a> is a specialized tool for reading existing websites. It then calls a web service we built to get the content into Hippo. So far this has proven to be very efficient.</p>
<h2>Web guidelines</h2>
<p>From the Ministry of Health, Welfare and Sport a number of people joined who do a lot with the web guidelines. Two of them are closely involved in creating, and therefore also complying with, the <a href="http://www.webrichtlijnen.nl/">web guidelines</a>. I have to admit that I never spent much time on this. Of course I knew the basics, but I have already learned quite a lot from these guys. I understand that we already almost comply with the new version of the web guidelines that is currently being drawn up.</p>
<h2>Hosting and infrastructure</h2>
<p>Without a good infrastructure it is impossible to deliver a good site. For a long time we have been working with an infrastructure specialist. He comes from the company <a href="http://www.prolocation.nl/">prolocation</a>. Together with a colleague he has really managed to amaze me. The knowledge these two guys have is enormous. I have learned a lot from them.</p>
<p>I will not say too much about the hosting. There has been quite a bit of fuss about it. Do know that we had to put a lot of energy into getting the infrastructure to work the way it should for our project.</p>
<h2>Open Data</h2>
<p>I said earlier that Open Data is very important for the project. In the current version we have not done that much yet to make data available. The use of Creative Commons is perhaps the most important part. We did start offering RSS feeds. Of course more features will be delivered here in the future.</p>
<p>Personally, I still have to get used to it. I was therefore rather skeptical about the initiative of another party to offer more feeds anyway. The project, however, did not agree with me at all and actually considered this a great initiative.</p>
<p><a href="http://rijksoverheid.jijendeoverheid.nl/">http://rijksoverheid.jijendeoverheid.nl/</a></p>
<h2>The people</h2>
<p>Finally, I want to say something about the people in the project. I really think we have managed to create an enormously good team with people who came from all over. Civil servants and external staff together go all out for a result to be proud of.</p>
<h2>The result</h2>
<p>A super fast website and a great project.</p>
<p>What we achieve within this project is only possible with a team, and that is exactly what we have become together. I am enormously proud to have been able to work on this project, and I think I will keep working on it for a while to experience the next phases as well.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.jteam.nl/2010/04/05/jteam-en-rijksoverheid-nl/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Searching your Java CMS using Apache Solr: Introduction</title>
		<link>http://blog.jteam.nl/2010/03/31/searching-your-java-cms-using-apache-solr-introduction/</link>
		<comments>http://blog.jteam.nl/2010/03/31/searching-your-java-cms-using-apache-solr-introduction/#comments</comments>
		<pubDate>Wed, 31 Mar 2010 06:08:36 +0000</pubDate>
		<dc:creator>Ralph Benjamin Ruijs</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Enterprise Search]]></category>
		<category><![CDATA[CMS]]></category>
		<category><![CDATA[Java Content Repository]]></category>
		<category><![CDATA[JCR]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://blog.jteam.nl/?p=1982</guid>
		<description><![CDATA[All Content Management Systems (CMS) provide the capability for users to search the content and browse the result. However, commonly this functionality turns out to be insufficient. This can be either because you want to allow users to search over multiple sources (the content repository, but also some external system) and combine the result. Or [...]]]></description>
			<content:encoded><![CDATA[<p>All Content Management Systems (CMS) provide the capability for users to search the content and browse the result. However, commonly this functionality turns out to be insufficient. This can be either because you want to allow users to search over multiple sources (the content repository, but also some external system) and combine the result, or because you want to offer your users more advanced search functionality like &#8220;Did you mean&#8230;&#8221; functionality or faceted navigation. Therefore, you might want to consider using an advanced, open source search solution like Apache Solr. This blog post is the first in a series that will introduce searching different CMS solutions using Apache Solr.</p>
<p><span id="more-1982"></span><span style="font-size: large;"><strong>Introduction</strong></span></p>
<p>To finish the last part of my Bachelor of ICT, I recently started my internship at JTeam. JTeam has been doing a lot of projects using either a CMS (e.g. Magnolia and Hippo) or requiring a search solution, typically using Apache Lucene and Apache Solr. My assignment is to investigate the problem of making the information from different CMS solutions searchable using Solr and hopefully come up with a good solution.</p>
<p>In order for the content in a CMS to be available for searching, it needs to be indexed by Solr. The problem is that many of the Java-based CMS solutions do not provide an easy way to get their content to Solr. We need to get the information out of the CMS and feed it to Solr. In this blog post I will discuss the synchronization problem at a more generic level and look at some possible solutions.</p>
<p><span style="font-size: large;"><strong>Synchronization problem</strong></span></p>
<p>Before diving into CMS solutions specifically, let&#8217;s look at a more classic problem: the data replication problem. When we get information from a specific source and replicate it into another system, the data may get out of date when changes in the original source are not immediately propagated. This is a common problem that has been studied extensively. Junghoo Cho and Hector Garcia-Molina studied this problem with a focus on Web crawlers. They started their study by looking at how to measure the problem and defined the freshness and age of a database.</p>
<blockquote><p>Intuitively, we consider a database “fresher” when the database has more up-to-date elements. For instance, when database A has 10 up-to-date elements out of 20 elements, and when database B has 15 up-to-date elements, we consider B to be fresher than A. Also, we have a notion of “age”: Even if all elements are obsolete, we consider database A “more current” than B, if A was synchronized 1 day ago, and B was synchronized 1 year ago.</p></blockquote>
<p>With this notion of Freshness and Age we can look at the potential quality of the solutions to this problem.</p>
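<p>To make these two measures a bit more concrete, here is a minimal sketch in Java. It follows the informal description in the quote rather than the exact definitions from the paper: freshness is the fraction of up-to-date elements, and age is simply the time since the last synchronization. The class and method names are made up for this example, and I assume database B also holds 20 elements.</p>

```java
import java.util.Arrays;

// Hypothetical helper to illustrate freshness and age; not taken from the paper.
class FreshnessSketch {

    // Fraction of replicated elements that still match the source (0.0 to 1.0).
    static double freshness(boolean[] upToDate) {
        if (upToDate.length == 0) {
            return 1.0; // assumption: an empty replica counts as fully fresh
        }
        int fresh = 0;
        for (boolean current : upToDate) {
            if (current) {
                fresh++;
            }
        }
        return (double) fresh / upToDate.length;
    }

    // Simplified age: time elapsed since the replica was last synchronized.
    static long ageMillis(long lastSyncMillis, long nowMillis) {
        return nowMillis - lastSyncMillis;
    }

    // Builds a replica of `total` elements of which the first `fresh` are up to date.
    static boolean[] replica(int total, int fresh) {
        boolean[] elements = new boolean[total];
        Arrays.fill(elements, 0, fresh, true);
        return elements;
    }

    public static void main(String[] args) {
        double a = freshness(replica(20, 10)); // database A from the quote
        double b = freshness(replica(20, 15)); // database B, assuming 20 elements
        System.out.println("freshness A = " + a + ", freshness B = " + b);

        long day = 24L * 60 * 60 * 1000;
        long now = System.currentTimeMillis();
        boolean aMoreCurrent = ageMillis(now - day, now) < ageMillis(now - 365 * day, now);
        System.out.println("A is more current than B: " + aMoreCurrent);
    }
}
```

<p>A higher freshness and a lower age are better, so a synchronization approach can be judged by how well it keeps both numbers in check as the CMS grows.</p>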
<p><span style="font-size: medium;"><strong>Possible ways</strong></span></p>
<p>Basically, there are three ways of replicating the content in a CMS to a search solution like Apache Solr:</p>
<ol>
<li>Look at the CMS&#8217;s data to see what has changed</li>
<li>Listen to the CMS and hear what has changed</li>
<li>Let the CMS update Solr</li>
</ol>
<p>The first option, looking at the CMS&#8217;s data to see what has changed, can be done by creating a crawler. The crawler will iterate over all the content in the CMS and compare the last known version with the current version of the content in the CMS. Such a crawler is typically implemented by repeating the following steps:</p>
<ul>
<li>Put all the known elements in a Queue</li>
<li>While the Queue is not Empty
<ul>
<li>Take an element from the Queue</li>
<li>Check if the element has changed</li>
</ul></li>
</ul>
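<p>The steps above could be sketched in Java as follows. This is only an illustration: the <code>ContentSource</code> interface is a hypothetical stand-in for whatever API your CMS offers, and I assume every element carries some kind of version number that changes on each edit.</p>

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

class CrawlerSketch {

    // Hypothetical view of the CMS: reports the current version of an element.
    interface ContentSource {
        long currentVersion(String elementId);
    }

    // Runs one crawl pass and returns the ids whose version differs from the last known one.
    // lastKnownVersions is updated in place, so the next pass only reports new changes.
    static List<String> crawlOnce(Map<String, Long> lastKnownVersions, ContentSource cms) {
        // Put all the known elements in a queue.
        Queue<String> queue = new ArrayDeque<>(lastKnownVersions.keySet());
        List<String> changed = new ArrayList<>();
        while (!queue.isEmpty()) {
            // Take an element from the queue and check if it has changed.
            String id = queue.remove();
            long current = cms.currentVersion(id);
            if (current != lastKnownVersions.get(id)) {
                lastKnownVersions.put(id, current);
                changed.add(id); // a real crawler would re-index this element in Solr
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Map<String, Long> known = new LinkedHashMap<>();
        known.put("/news/a", 1L);
        known.put("/news/b", 1L);
        // Pretend that only /news/b was edited since the last crawl.
        ContentSource cms = id -> id.equals("/news/b") ? 2L : 1L;
        System.out.println("changed: " + crawlOnce(known, cms));
    }
}
```

<p>Note that every pass visits every known element, whether it changed or not. That is exactly why the cost of crawling grows with the size of the CMS.</p>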
<p>This is fairly straightforward to implement. Luckily, the same study shows there is little to gain in trying to optimize the order in which we look at the elements, or the amount of resources we use for specific elements (e.g. the rate at which specific elements are checked). However, in order to keep your Solr data fresh and young, a crawler will need more and more resources when the CMS grows.</p>
<p>The second option, listening for changes in the CMS, would be better. In order to listen for changes the CMS should push the changes to external systems. A push, or observation, mechanism can be implemented in one of two ways: asynchronous or journaled. The simplest form, asynchronous observation, will send a notification on every change that occurs in the CMS. Getting notified of the change and being able to incorporate that change will give us a fresh and young Solr index. Although asynchronous observation looks perfect at first sight, it has two flaws: it doesn&#8217;t guarantee that all changes are actually sent and does not guarantee the right ordering of the changes. Missing a change, or getting two updates in the wrong order, leaves an element out-of-date, and the Solr index not fresh.</p>
<p>Journal-based observation solves the problems of asynchronous observation, as it can guarantee complete information and ordering. Instead of being informed of every change, we get a list, a journal, of changes since the last time we checked. Using journal-based observation, the age is only influenced by the time it takes to process the changes, and the freshness by the number of changes in each journal.</p>
<p>The third option, letting the CMS update Solr, is obviously the ideal one. Instead of crawling or observing the CMS and updating Solr when something changes, the CMS could directly propagate all changes to Solr. The age and freshness of Solr would be perfect, as changes to data can be made available at the same time to both the CMS and Solr. Having a CMS that is built using an event-driven, highly decoupled architecture, for instance using a CQRS framework like <a href="http://code.google.com/p/axonframework/">Axon Framework</a>, would make extending the CMS with the capability for updating Solr very easy. However, currently no Java-based CMS provides this capability out of the box.</p>
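<p>A rough sketch of the journal idea in Java is shown below. All class names are hypothetical and the journal lives in memory, whereas a real CMS would persist it. The point is only that the consumer remembers the last sequence number it processed, so it can always ask for everything it missed, in order.</p>

```java
import java.util.ArrayList;
import java.util.List;

class JournalSketch {

    // One recorded change; sequence numbers follow the order in which changes happened.
    static final class Change {
        final long sequence;
        final String elementId;

        Change(long sequence, String elementId) {
            this.sequence = sequence;
            this.elementId = elementId;
        }
    }

    // Hypothetical append-only journal kept on the CMS side.
    static final class Journal {
        private final List<Change> entries = new ArrayList<>();

        void record(String elementId) {
            entries.add(new Change(entries.size() + 1, elementId));
        }

        // Returns every change after the given sequence number, oldest first.
        List<Change> since(long lastSeenSequence) {
            List<Change> result = new ArrayList<>();
            for (Change change : entries) {
                if (change.sequence > lastSeenSequence) {
                    result.add(change);
                }
            }
            return result;
        }
    }

    // Consumer side: remembers how far it has read, so no change is skipped or reordered.
    static final class IndexUpdater {
        private long lastSeenSequence = 0;
        final List<String> reindexed = new ArrayList<>();

        void poll(Journal journal) {
            for (Change change : journal.since(lastSeenSequence)) {
                reindexed.add(change.elementId); // stand-in for updating the Solr index
                lastSeenSequence = change.sequence;
            }
        }
    }

    public static void main(String[] args) {
        Journal journal = new Journal();
        IndexUpdater updater = new IndexUpdater();
        journal.record("/news/a");
        journal.record("/news/b");
        updater.poll(journal);
        journal.record("/news/a");
        updater.poll(journal);
        System.out.println("re-indexed in order: " + updater.reindexed);
    }
}
```

<p>How often <code>poll</code> runs is the knob to turn: polling more often means smaller journals and a younger index, at the cost of more requests to the CMS.</p>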
<p><span style="font-size: medium;"><strong>Conclusion</strong></span></p>
<p>Trying to solve the problem of making the information from different CMS solutions searchable using Solr gave some interesting insights into the classic data replication problem. Using Age and Freshness as a guideline, letting the CMS update Solr looks like the best solution, but is currently not usable when looking at different CMS solutions. Both asynchronous observation and crawling have upsides and downsides, but seem to be the most generic solutions. Journaled observation, when looking at the Age and Freshness and different CMS systems, looks the most promising.</p>
<p>In the next blog posts as part of this series, I will go into more detail of actually implementing this integration for a number of widely used CMS solutions like Magnolia and Hippo. Stay tuned&#8230;</p>
<p><span style="font-size: medium;"><strong>References</strong></span></p>
<p><a href="http://oak.cs.ucla.edu/~cho/papers/cho-tods03.pdf">Effective Page Refresh Policies For Web Crawlers</a><br />
<a href="http://www.axonframework.org">Axon framework</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.jteam.nl/2010/03/31/searching-your-java-cms-using-apache-solr-introduction/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Language analysis comparable to Fast / Endeca for Solr</title>
		<link>http://blog.jteam.nl/2010/03/30/language-analysis-comparable-fast-endeca-available-solr/</link>
		<comments>http://blog.jteam.nl/2010/03/30/language-analysis-comparable-fast-endeca-available-solr/#comments</comments>
		<pubDate>Tue, 30 Mar 2010 08:15:21 +0000</pubDate>
		<dc:creator>Martijn van Groningen</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Enterprise Search]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://blog.jteam.nl/?p=1892</guid>
		<description><![CDATA[Good, solid language analysis is a very important asset for the quality of your search results. It is one of the features that for instance Microsoft Fast and Endeca are using as one of their unique selling points. However, you can get the same powerful analysis when using Apache Solr to implement your search. The [...]]]></description>
			<content:encoded><![CDATA[<p>Good, solid language analysis is a very important asset for the quality of your search results. It is one of the features that for instance Microsoft Fast and Endeca are using as one of their unique selling points. However, you can get the same powerful analysis when using Apache Solr to implement your search.</p>
<p><span id="more-1892"></span><br />
The thing is that neither Microsoft Fast nor Endeca implemented their language analysis themselves. Under the hood, both use an existing commercial solution called the Rosette Linguistics Platform (RLP), provided by <a href="http://basistech.com/">Basis Technology</a>, for their sophisticated language analysis capabilities. This is a good thing, as RLP also provides integration components for Apache Solr, which allow anyone using Solr to easily plug RLP&#8217;s advanced language capabilities into their solution.</p>
<p><strong>What is RLP?</strong><br />
The Rosette Linguistics Platform (RLP) is a commercial solution that performs linguistic analysis of text in many languages (English as well as dozens of major European, Asian, and Middle Eastern languages). In addition, RLP supports advanced entity extraction, base noun detection, sentence boundary detection and even part-of-speech tagging. As RLP is a commercial product, it comes with a price tag.<br />
Still, we feel that RLP has a lot of potential when you are in need of sophisticated language capabilities. That&#8217;s why we want to show you how to integrate RLP into your solution based on Apache Solr.</p>
<p><strong>Installing the RLP platform for Solr</strong><br />
RLP and Solr are two separate systems, but integrating them is quite straightforward. Basis provides extensive documentation on how to set up RLP on your machine. The actual language analysis is customized through the RLP configuration, not directly via Solr. Installing and configuring RLP is explained in the documentation that is included with the RLP bundle.</p>
<p>Configuring Solr to use RLP for analyzing your documents is also relatively easy to set up. To use RLP&#8217;s language analysis in Solr, you have to configure the <code>RLPTokenizerFactory</code> as the tokenizer for the specific fields you want to use it on. As RLP is not written in Java, JNI is used to integrate a Java application with it. For Solr to integrate with RLP, you must therefore set certain environment variables that point to the RLP installation. Basis has quite extensive documentation for the Solr integration as well, which is included in the RLP Solr bundle.</p>
<p><strong>RLP&#8217;s stemming capabilities</strong><br />
Terms like verbs, nouns and adjectives appear in different forms. Searching for one form of a term can miss documents that contain another form. For example, if you search for the term customize, but the token in the index is customization, you are unlikely to find results with this term. The solution to this is <a href="http://en.wikipedia.org/wiki/Stemming">stemming</a>. Stemming reduces a term to a base form. There are many ways to do this, but the most common way is <a href="http://en.wikipedia.org/wiki/Stemming#Suffix_Stripping_Algorithms">to remove the term&#8217;s suffix</a> according to some basic rules. In our case customize and customization would both be stemmed to customiz. Stemming is usually applied both during indexing and during searching (stemming the search query). So whether we search for customization or customize does not matter; for both terms we&#8217;ll get the same result. Another example is go and going: both forms are stemmed to go.</p>
<p>All languages have irregular terms. Take the verb to buy: the present tense is buy and the past tense is bought. A suffix-based stemmer will not create a common base form for these tokens. To create a proper base form, the stemmer needs to have some basic knowledge of the language. Stemmers that do this usually use <a href="http://en.wikipedia.org/wiki/Stemming#Lemmatisation_Algorithms">lemmatisation</a>. Using this algorithm, buy and bought would both be stemmed to buy. Lemmatisation-based stemmers will stem more terms to a common base form and therefore increase the findability of documents in your index.</p>
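<p>As a rough illustration of suffix stripping, here is a minimal Java sketch. The class <code>SuffixStemmer</code> and its three-entry rule list are hypothetical and were chosen only to reproduce the examples above; real suffix-stripping stemmers use far larger, carefully ordered rule sets.</p>

```java
import java.util.Locale;

// Toy suffix-stripping stemmer; the rule list is an assumption made for this example.
final class SuffixStemmer {

    // Suffixes are tried in this order, longest first.
    private static final String[] SUFFIXES = {"ation", "ing", "e"};

    // Never strip a term down to fewer characters than this.
    private static final int MIN_STEM_LENGTH = 2;

    static String stem(String term) {
        String lower = term.toLowerCase(Locale.ROOT);
        for (String suffix : SUFFIXES) {
            if (lower.endsWith(suffix)) {
                String stem = lower.substring(0, lower.length() - suffix.length());
                if (stem.length() >= MIN_STEM_LENGTH) {
                    return stem;
                }
            }
        }
        // No rule matched, so the lower-cased term is its own base form.
        return lower;
    }

    public static void main(String[] args) {
        System.out.println(stem("customize"));     // customiz
        System.out.println(stem("customization")); // customiz
        System.out.println(stem("going"));         // go
        System.out.println(stem("bought"));        // bought
    }
}
```

<p>Note that bought is left untouched: no suffix rule can turn it into buy, so a search for buy would still miss it.</p>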
<p>RLP&#8217;s stemming capabilities fall into the latter category of stemming algorithms, and as the table below shows, the stemming is quite powerful.</p>
<table>
<tr>
<th>Unstemmed token</th>
<th>Stemmed token</th>
</tr>
<tr>
<td>mice</td>
<td>mouse</td>
</tr>
<tr>
<td>mouse</td>
<td>mouse</td>
</tr>
<tr>
<td>been</td>
<td>be</td>
</tr>
<tr>
<td>is</td>
<td>be</td>
</tr>
</table>
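<p>To make the contrast with suffix stripping concrete, the following Java sketch maps the irregular forms from the table, plus bought, to their base forms with a hard-coded lookup. It is only a toy stand-in for the linguistic knowledge a lemmatiser needs; the class <code>ToyLemmatizer</code> is hypothetical and says nothing about how RLP is implemented.</p>

```java
import java.util.Locale;

// Toy dictionary lookup for irregular forms; not RLP's actual algorithm.
final class ToyLemmatizer {

    static String lemma(String term) {
        String lower = term.toLowerCase(Locale.ROOT);
        switch (lower) {
            case "mice":
                return "mouse";
            case "been":
            case "is":
                return "be";
            case "bought":
                return "buy";
            default:
                // Unknown terms are returned unchanged (lower-cased).
                return lower;
        }
    }

    public static void main(String[] args) {
        System.out.println(lemma("mice"));   // mouse
        System.out.println(lemma("been"));   // be
        System.out.println(lemma("is"));     // be
        System.out.println(lemma("bought")); // buy
    }
}
```

<p>A real lemmatiser replaces the hard-coded cases with a full dictionary and grammatical rules for each supported language.</p>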
<p><strong>Advantages of using RLP</strong><br />
Why would you use RLP in conjunction with Solr? RLP provides you with really powerful language analysis, and Solr is an extensible open-source search engine. By integrating RLP with Solr, RLP will complement the language analysis provided by Solr and increase the quality of your search results. As seen in the table above, the better stemming capabilities increase the likelihood that relevant documents are found.</p>
<p>If you are considering RLP for your project or solution, please feel free to <a href="http://www.jteam.nl/contact.html">contact us</a>, so we can help you both make a decision and implement it.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.jteam.nl/2010/03/30/language-analysis-comparable-fast-endeca-available-solr/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
