Publishing “The Sesame Lucene Sail: RDF Queries with Full-text Search”

We have written a Technical Report on our integration of Sesame2 with Lucene.

Enrico Minack, Leo Sauermann, Gunnar Aastrand Grimnes, Christiaan Fluit, Jeen Broekstra: The Sesame Lucene Sail: RDF Queries with Full-text Search.
download PDF (alternate link)

In short:
PREFIX search:
SELECT ?x ?score ?snippet WHERE {
  ?x search:matches ?match .
  ?match search:query "person" ;
         search:score ?score ;
         search:snippet ?snippet .
}
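To make the roles of the query variables concrete, here is a toy Python model of the kind of bindings such a query returns: for every resource whose literal text matches the keyword query, one result carrying the resource (?x), a relevance score (?score) and a text snippet around the hit (?snippet). This is only a sketch under stated assumptions, not the Lucene Sail code: the triples, the function names `build_index` and `search`, and the term-frequency score are all hypothetical stand-ins for what the real implementation delegates to Lucene.

```python
import re
from collections import defaultdict

# Hypothetical toy data: (subject, predicate, literal) triples.
TRIPLES = [
    ("ex:alice", "rdfs:comment", "Alice is a person who likes RDF."),
    ("ex:bob", "rdfs:comment", "Bob met a person, then another person."),
    ("ex:doc1", "rdfs:label", "Lucene full-text search"),
]


def tokenize(text):
    # Lowercase and split into alphanumeric words.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(triples):
    # Inverted index: term -> {subject: term frequency}; also keep each subject's text.
    index = defaultdict(lambda: defaultdict(int))
    texts = defaultdict(str)
    for subj, _pred, literal in triples:
        texts[subj] += literal + " "
        for term in tokenize(literal):
            index[term][subj] += 1
    return index, texts


def snippet(text, term, width=20):
    # A window of characters around the first occurrence of the term.
    pos = text.lower().find(term)
    return text[max(0, pos - width): pos + len(term) + width].strip()


def search(index, texts, query):
    # One result per matching subject, mirroring the ?x / ?score / ?snippet bindings.
    # Toy ranking: summed term frequencies (hypothetical; Lucene's scoring is richer).
    terms = tokenize(query)
    scores = defaultdict(int)
    for term in terms:
        for subj, tf in index.get(term, {}).items():
            scores[subj] += tf
    results = []
    for subj, score in scores.items():
        hit = next(t for t in terms if subj in index.get(t, {}))
        results.append((subj, score, snippet(texts[subj], hit)))
    # Highest score first; ties broken by subject name for a stable order.
    return sorted(results, key=lambda r: (-r[1], r[0]))


index, texts = build_index(TRIPLES)
for subj, score, snip in search(index, texts, "person"):
    print(subj, score, snip)
```

With this data, `ex:bob` (two occurrences of "person") is ranked above `ex:alice` (one occurrence), and `ex:doc1` is not returned at all.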

With the growth of the Semantic Web, the requirements for storing and querying RDF have become more sophisticated. When larger amounts of data have to be managed, queries in structured query languages such as SPARQL are not always powerful enough. Additional query keywords could narrow the result set down to the actually relevant answers; however, SPARQL only provides matching of complete strings or filtering based on regular expressions, which is a very slow operation. In contrast, state-of-the-art Information Retrieval (IR) techniques provide sophisticated features such as keyword search, lemmatisation, stemming and ranking. In this paper we present a combination of structured RDF queries and full-text search. It is implemented by extending an established RDF store (Sesame) with IR capabilities using the text search library Lucene, without requiring modifications to existing RDF query languages.
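The abstract contrasts regular-expression filtering with IR-style keyword search and proposes combining the latter with structured patterns. The toy Python sketch below illustrates why, assuming a hand-written graph and a deliberately crude suffix-stripping stemmer (the data and every function name are hypothetical; this is not how the Lucene Sail or Lucene work internally). `regex_filter` mimics a SPARQL `FILTER regex(?comment, "stores", "i")`: it scans every literal and only finds the literal substring. `fulltext` answers from a stemmed inverted index, so "stores" also finds "store". Intersecting the hits with the structured pattern `?x rdf:type foaf:Person` keeps only people.

```python
import re

# Hypothetical toy graph: (subject, predicate, object) triples.
GRAPH = [
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:alice", "rdfs:comment", "Alice maintains several RDF stores."),
    ("ex:store1", "rdf:type", "ex:Software"),
    ("ex:store1", "rdfs:comment", "An RDF store with full-text indexing."),
    ("ex:bob", "rdf:type", "foaf:Person"),
    ("ex:bob", "rdfs:comment", "Bob writes about search engines."),
]


def stem(token):
    # Crude suffix stripping; a stand-in for a real stemmer such as Porter's.
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def terms(text):
    # Tokenize, lowercase and stem each word.
    return {stem(t) for t in re.findall(r"[a-z0-9]+", text.lower())}


def build_index(graph):
    # Inverted index over comment literals: stemmed term -> set of subjects.
    index = {}
    for s, p, o in graph:
        if p == "rdfs:comment":
            for t in terms(o):
                index.setdefault(t, set()).add(s)
    return index


def regex_filter(graph, pattern):
    # SPARQL-style FILTER regex: scans every literal, no stemming, no index.
    rx = re.compile(pattern, re.IGNORECASE)
    return {s for s, p, o in graph if p == "rdfs:comment" and rx.search(o)}


def fulltext(index, query):
    # Subjects whose comments contain every stemmed query term (AND semantics).
    sets = [index.get(t, set()) for t in terms(query)]
    return set.intersection(*sets) if sets else set()


def of_type(graph, cls):
    # Structured triple pattern: ?x rdf:type cls.
    return {s for s, p, o in graph if p == "rdf:type" and o == cls}


index = build_index(GRAPH)
print(regex_filter(GRAPH, "stores"))
print(fulltext(index, "stores"))
print(fulltext(index, "stores") & of_type(GRAPH, "foaf:Person"))
```

The set intersection plays the role of the join between the full-text hits and the structured part of the query; a real system would also carry the ranking scores through that join.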

(These files contain all my publications, including this one.)
BibTeX / RDF

The implementation lives here: