Archive for the ‘Enterprise Search’ Category

Mahout – Taste :: Part Three – Estimators

July 8th, 2010 by Frank Scholten
(http://blog.jteam.nl/2010/07/08/mahout-%e2%80%93-taste-part-three-%e2%80%93-estimators/)

In Taste, estimators are the bridge between the generic item- or user-based recommendation logic and the specific similarity algorithm. Estimators are mainly used as part of the recommendation process, but they are also used for evaluating recommenders, and the ‘recommended because’ feature is powered by an estimator as well. This blog covers some Taste internals and shows you how estimators are used within Taste via a few code samples.
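
To make the "bridge" idea concrete, here is a minimal, self-contained sketch. It does not use Mahout's own classes: the `Estimator` interface, the `topN` and `weightedAverage` helpers, and the toy similarity values are all hypothetical names and data invented for illustration. The generic ranking code only sees an estimator, while the estimator is the one place that knows about similarity; a similarity-weighted average is assumed here as one common way to estimate a preference.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleBiFunction;

// Hypothetical interface for illustration; not Mahout's API.
interface Estimator<T> {
    double estimate(T candidate);
}

final class EstimatorSketch {

    // Generic recommendation logic: rank candidates by their estimate and
    // keep the best n. It knows nothing about how estimates are computed.
    static <T> List<T> topN(Collection<T> candidates, Estimator<T> estimator, int n) {
        List<T> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble((T c) -> estimator.estimate(c)).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    // Similarity-specific part: estimate a preference for a candidate item as
    // the similarity-weighted average of the user's known preferences.
    static Estimator<String> weightedAverage(Map<String, Double> knownPrefs,
                                             ToDoubleBiFunction<String, String> similarity) {
        return candidate -> {
            double weightedSum = 0.0;
            double totalWeight = 0.0;
            for (Map.Entry<String, Double> pref : knownPrefs.entrySet()) {
                double sim = similarity.applyAsDouble(candidate, pref.getKey());
                weightedSum += sim * pref.getValue();
                totalWeight += Math.abs(sim);
            }
            // No similar items at all: fall back to 0.0 in this sketch.
            return totalWeight == 0.0 ? 0.0 : weightedSum / totalWeight;
        };
    }

    public static void main(String[] args) {
        // Toy data: the user rated items A and B.
        Map<String, Double> prefs = new HashMap<>();
        prefs.put("A", 5.0);
        prefs.put("B", 1.0);

        // Toy item-item similarities, keyed as "candidate|ratedItem".
        Map<String, Double> sims = new HashMap<>();
        sims.put("C|A", 0.9);
        sims.put("C|B", 0.1);
        sims.put("D|A", 0.2);
        sims.put("D|B", 0.8);

        Estimator<String> estimator =
                weightedAverage(prefs, (a, b) -> sims.getOrDefault(a + "|" + b, 0.0));

        // C is most similar to the highly rated A, so it ranks first.
        System.out.println(topN(List.of("C", "D"), estimator, 1));
    }
}
```

Swapping in a different similarity function changes the estimates but leaves `topN` untouched, which is the separation the paragraph above describes.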

Read the rest of this entry »

Mahout – Taste at Lucene Eurocon and Berlin Buzzwords

July 1st, 2010 by Frank Scholten
(http://blog.jteam.nl/2010/07/01/mahout-taste-at-lucene-eurocon-and-berlin-buzzwords/)

A little while ago, I was delighted to present two introductory Mahout – Taste talks at Lucene Eurocon and Berlin Buzzwords. I received quite a lot of good feedback about the presentations and have been asked by a few attendees to post them.

If you’re one of those attendees or you missed the presentation, you can download the slides here:

At Lucene Eurocon, the first European conference on Lucene and Solr, there were interesting presentations, ranging from practical relevance to language analysis. For me it was fun to give a practical presentation about recommendations as a complementary feature to search applications. I hope you find the presentation useful if you’re trying to work out how to build a recommender. I used the MovieLens dataset as an example in the presentation and based the code on my earlier ‘getting started’ blog.

I also really enjoyed doing the Berlin Buzzwords presentation and meeting up with people from the Mahout community and other attendees. This conference focused mainly on NoSQL, scalability and Hadoop. However, from my talks with people there I sense that there’s growing interest in Mahout. You should find the presentation useful if you want to know more about different algorithms and how to evaluate them. I will blog about this topic in more detail soon.

Until then, I’d love to hear some feedback on what you think of the presentations!

Introduction to Lucene Connectors Framework – Part 1

April 16th, 2010 by Ralph Benjamin Ruijs
(http://blog.jteam.nl/2010/04/16/introduction-to-lucene-connectors-framework-part-1/)

In my previous blog, Searching your Java CMS using Apache Solr: Introduction, I looked at how to synchronize the information in a Java CMS with a Solr index. This blog is an introduction to the Lucene Connectors Framework, a crawler framework I will use to solve the problem of making the information from a Java CMS searchable using Solr. I will show you how to build, deploy and get it running as a web crawler. In part 2 of this introduction I will extend LCF with a new Connector.

The Lucene Connectors Framework, an incubator project at Apache, provides a framework for connecting a source content repository to target repositories or indexes, such as Apache Solr. Last month the Lucene Connectors Framework project published its first buildable sources.
Read the rest of this entry »

Mahout – Taste :: Part Two – Getting started

April 15th, 2010 by Frank Scholten
(http://blog.jteam.nl/2010/04/15/mahout-taste-part-two-getting-started/)
This blog is a ‘getting started’ article and shows you how to build a simple web-based movie recommender with Mahout / Taste, Wicket and the MovieLens dataset from the GroupLens research group at the University of Minnesota. I will discuss which components you need, how to wire them up in Spring, and how to create a Wicket frontend for displaying movies and their recommendations. Along the way I give some tips and pointers about developing a recommender. Additionally, I show the ResourceDataModel, a Mahout DataModel implementation which reads preferences from a Spring Resource.
Read the rest of this entry »

State of Solr

April 14th, 2010 by Chris Male
(http://blog.jteam.nl/2010/04/14/state-of-solr/)

What happened to Solr 1.5? What is Solr 3.1? And what about Solr Cloud? In the last few months, there have been many changes to Solr that can leave users confused about which version to use, what features each version provides, and when (and if) they will be released. This blog entry will try to clarify the State of Solr.
Read the rest of this entry »

Enterprise Search using Solr and Lucene

April 1st, 2010 by Bram Smeets
(http://blog.jteam.nl/2010/04/01/enterprise-search-using-solr-and-lucene/)

The Enterprise Search market has long been dominated by commercial vendors and their products (e.g. Autonomy and Fast). We at JTeam feel that this era is finally over. At least for certain customers and requirements, there is finally a good Open Source alternative: Apache Solr, which is the Enterprise Search server based on Apache Lucene. In this blog post we’ll give our view on enterprise search and explain how Lucene and Solr can help you realize your projects.

Read the rest of this entry »

Searching your Java CMS using Apache Solr: Introduction

March 31st, 2010 by Ralph Benjamin Ruijs
(http://blog.jteam.nl/2010/03/31/searching-your-java-cms-using-apache-solr-introduction/)

All Content Management Systems (CMS) let users search the content and browse the results. However, this functionality often turns out to be insufficient, either because you want to allow users to search over multiple sources (the content repository, but also some external system) and combine the results, or because you want to offer your users more advanced search functionality such as “Did you mean…” suggestions or faceted navigation. Therefore, you might want to consider using an advanced, open source search solution like Apache Solr. This blog post is the first in a series that will introduce searching different CMS solutions using Apache Solr.

Read the rest of this entry »

Language analysis comparable to Fast / Endeca for Solr

March 30th, 2010 by Martijn van Groningen
(http://blog.jteam.nl/2010/03/30/language-analysis-comparable-fast-endeca-available-solr/)

Good, solid language analysis is very important for the quality of your search results. Vendors such as Microsoft Fast and Endeca, for instance, present it as one of their unique selling points. However, you can get the same powerful analysis when using Apache Solr to implement your search.

Read the rest of this entry »

Spatial Solr Plugin 1.0-RC4

March 30th, 2010 by Chris Male
(http://blog.jteam.nl/2010/03/30/spatial-solr-plugin-1-0-rc4/)

I am pleased to announce the latest release of our Spatial Solr Plugin, v1.0-RC4. This release is backwards compatible with RC3 and contains the following changes:

  • PDF documentation has been improved to remove inconsistencies in request parameter and source code package names
  • SpatialFilter now includes hashCode and equals implementations, facilitating storage of the filter in caches
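
Why `hashCode` and `equals` matter for caching can be shown with a small sketch. The `RadiusFilter` class below is a hypothetical stand-in invented for illustration; it is not the plugin's `SpatialFilter`, and its fields are assumptions. The point is only that a hash-based cache can recognise a freshly constructed but logically identical filter when both methods are implemented on the filter's values.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical filter class for illustration; NOT the plugin's SpatialFilter.
final class RadiusFilter {
    private final double lat;
    private final double lng;
    private final double radiusKm;

    RadiusFilter(double lat, double lng, double radiusKm) {
        this.lat = lat;
        this.lng = lng;
        this.radiusKm = radiusKm;
    }

    // Two filters are equal when they describe the same circle.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof RadiusFilter)) {
            return false;
        }
        RadiusFilter that = (RadiusFilter) other;
        return Double.compare(lat, that.lat) == 0
                && Double.compare(lng, that.lng) == 0
                && Double.compare(radiusKm, that.radiusKm) == 0;
    }

    // Must agree with equals: equal filters produce equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(lat, lng, radiusKm);
    }
}

final class FilterCacheDemo {
    public static void main(String[] args) {
        // A plain HashMap stands in for a filter cache here.
        Map<RadiusFilter, String> cache = new HashMap<>();
        cache.put(new RadiusFilter(52.37, 4.89, 10.0), "cached result");

        // A new instance with the same values finds the cached entry.
        System.out.println(cache.get(new RadiusFilter(52.37, 4.89, 10.0)));

        // A different radius is a different key, so this lookup misses.
        System.out.println(cache.get(new RadiusFilter(52.37, 4.89, 11.0)));
    }
}
```

Without the two overrides, both lookups would miss, because `Object`'s default implementations compare identity rather than values.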

JTeam’s Solr Spatial Plugin (SSP) is a standalone plugin that provides efficient and extensible spatial search support to both Solr and Lucene. You can find more information about the plugin here

Spatial Lucene 2.0

December 31st, 2009 by Chris Male
(http://blog.jteam.nl/2009/12/31/spatial-lucene-2-0/)

In a number of blog entries we have spoken about the spatial search functionality that we have been developing here at JTeam. In the last two weeks, I have had a chance to contribute much of this work back to the Apache Lucene project with the goal of furthering the development of Lucene’s open source spatial search support. If you want to dive immediately into the code, then jump to LUCENE-2139; if you want more details, then read on.

Read the rest of this entry »