Author Archive
Improved field collapse response
November 11th, 2009 by Martijn van Groningen (http://blog.jteam.nl/2009/11/11/improved-field-collapse-response/)
In my most recent contribution to field collapsing I have improved the response format. The old format was not properly structured, the element names were not self-explanatory, and in some situations the response was even flawed. In my opinion a better response format was necessary to improve the stability of the patch and to make the response easier to parse.
Result grouping / Field Collapsing with Solr
October 20th, 2009 by Martijn van Groningen (http://blog.jteam.nl/2009/10/20/result-grouping-field-collapsing-with-solr/)
In a number of search projects that I have done using Lucene and Solr, a lot of the data was almost identical. From a user's perspective, the first result pages were full of documents that looked very similar. For instance, a search for a specific car brand would return a full page of the same car model, where only the edition differs. What is actually desired is to show only the different models. Then, and only when a user is interested in a certain model, the user can view all the editions of that model by clicking on the result. We simply want to group our search results based on some criteria. Although this is not supported out of the box by Lucene/Solr, luckily it is possible using a patch that I’ve created and contributed to Solr. This blog entry explains what result grouping (also known as field collapsing) is and how you can start using it in your own projects.
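To make the idea concrete, here is a minimal sketch of collapsing a ranked result list on one field. It is plain Python working on in-memory dictionaries and is not the Solr patch or its API; the function name `collapse`, the `max_per_group` parameter and the sample documents are made up for illustration.

```python
def collapse(results, field, max_per_group=1):
    """Keep at most `max_per_group` documents per distinct value of `field`.

    `results` is assumed to be already sorted by relevance, so the
    documents that survive are the best-ranked ones of each group.
    Returns the collapsed list and, per group, how many documents were hidden.
    """
    seen = {}        # field value -> number of documents seen so far
    collapsed = []   # documents that stay visible
    for doc in results:
        key = doc[field]
        count = seen.get(key, 0)
        if count < max_per_group:
            collapsed.append(doc)
        seen[key] = count + 1
    hidden = {key: max(n - max_per_group, 0) for key, n in seen.items()}
    return collapsed, hidden


if __name__ == "__main__":
    # Hypothetical search results for one car brand, best match first.
    results = [
        {"id": 1, "model": "Golf", "edition": "GTI"},
        {"id": 2, "model": "Golf", "edition": "TDI"},
        {"id": 3, "model": "Polo", "edition": "Comfortline"},
        {"id": 4, "model": "Golf", "edition": "R32"},
    ]
    visible, hidden = collapse(results, "model")
    for doc in visible:
        print(doc["model"], doc["edition"], "hidden:", hidden[doc["model"]])
```

The hidden count per group is what lets a result page offer a link to the remaining editions of a model.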
Introduction to Hadoop
August 4th, 2009 by Martijn van Groningen (http://blog.jteam.nl/2009/08/04/introduction-to-hadoop/)
Recently I was playing around with Hadoop, and after a while I realized that it is a great technology. Hadoop allows you to write and run your application in a distributed manner and process large amounts of data with it. It consists of a MapReduce implementation and a distributed file system. Personally I did not have any experience with distributed computing beforehand, but I found MapReduce quite easy to understand.
In this blog post I will give an introduction to Hadoop by showing a relatively simple MapReduce application. This application will count the unique tokens inside text files. With this example I will try to explain how Hadoop works. Before we start creating our example application we need to know the basics of MapReduce itself.
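As a rough illustration of the MapReduce idea behind such a token counter, the sketch below runs the map, shuffle and reduce steps in a single Python process. It does not use Hadoop or its API and is not the application from the post; the function names and the whitespace-based, lowercasing tokenization are assumptions made for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (token, 1) pair for every whitespace-separated token."""
    for token in line.lower().split():
        yield token, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key (the token)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(token, counts):
    """Reduce step: sum the values collected for one token."""
    return token, sum(counts)


def count_tokens(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(token, counts) for token, counts in grouped.items())


if __name__ == "__main__":
    sample = ["the quick brown fox", "the lazy dog", "The fox"]
    for token, count in sorted(count_tokens(sample).items()):
        print(token, count)
```

In a real cluster the map and reduce steps would run on many machines and the framework would take care of the grouping in between; here everything happens in one process so that the data flow is easy to follow.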