headlines: can URIs be ambiguous – democracy prevails!

A recurring question about the Semantic Web is whether it will be a centralized, dictated system or an open one. Is there “one weird standard to rule us all”?

Update: Roy Fielding, who motivated me to write this post yesterday, answered in a comment, so I reconsidered my post and rewrote it (29.2.2008). Updates are in italics, deleted text is struck through.

The fundamental question as such – is the Semantic Web a controllable system or a distributed (more chaotic) structure – shows up in different manifestations. I interpret the question of unambiguous URIs – one URI for one concept, not multiple – as a subtopic of this.

As you could guess already, the answer is no. The Semantic Web is as free, open, uncontrolled, unreliable as the web today, but with more features.

Roy Fielding said (actually cited) that one of the Semantic Web’s goals is to unambiguously identify resources. He also cited another quote by Tim Berners-Lee:
I don’t want the Web to constrain what people do: the Web
is not there to constrain society. It’s there to model society
in its completeness, in its entirety. [Tim Berners-Lee, 1994]

What does this mean? Unambiguous means that when you talk about the Tesla car, you must always use the same identifier (in our case, a URI) to refer to it. As could be expected, this idea is not a requirement of the Semantic Web, and in practice it is neither needed nor used much. Some people state it as a nice scientific goal, but deployers don’t have to care about it, as the W3C recommendation has something else to say.

Instead, people continue to say things about the world in blog posts, Wikipedia, and elsewhere as always, minting new URIs for things as they want. In the Semantic Web, the standard properties “rdfs:seeAlso” and “owl:sameAs” are then used to link the different views about the same thing, such as the Tesla. If you need non-ambiguity, perhaps use Sindice (or any other Semantic Web search engine). Hooray, freedom of expression and scalability prevail.
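To make the owl:sameAs idea a bit more tangible, here is a minimal Python sketch of how a consumer could group URIs that different authors minted for the same thing. It only assumes that the owl:sameAs statements have already been collected as pairs of URIs; the URIs below are made up for illustration, and a real application would probably use an RDF library or a service like Sindice instead.

```python
from collections import defaultdict

# Hypothetical owl:sameAs statements as (uri, uri) pairs; none of these
# URIs come from a real dataset.
SAME_AS = [
    ("http://example.org/blog/tesla", "http://example.com/wiki/Tesla_Roadster"),
    ("http://example.com/wiki/Tesla_Roadster", "http://example.net/cars#tesla"),
    ("http://example.org/people/roy", "http://example.com/id/fielding"),
]


def same_as_clusters(pairs):
    """Group URIs into sets that are connected by owl:sameAs statements."""
    parent = {}

    def find(uri):
        # Follow parent links up to the representative of the cluster.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    # Each sameAs statement merges the clusters of its two URIs.
    for left, right in pairs:
        parent[find(left)] = find(right)

    clusters = defaultdict(set)
    for uri in list(parent):
        clusters[find(uri)].add(uri)
    return list(clusters.values())


clusters = same_as_clusters(SAME_AS)
for cluster in clusters:
    print(sorted(cluster))
```

Everybody keeps their own URI, and the merge happens on the consumer side, only when somebody actually needs it.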

And yes, the Semantic Web is already there, for example on openlinkeddata, or on GoPubMed. So maybe Roy’s statement “the semantic web will never happen” indicates Roy is living in the past? We will see in the future…

Sorry Mr Fielding, this sarcasm now rebounds on me. I was wrong, and you are right in citing both positions.

DFKI at CeBIT 2008

I will be presenting NEPOMUK at CeBIT 2008 next week,
at Booth B37, Hall 9.

I will be there on 4 and 5 March. If you want an appointment, please phone me beforehand (I won’t read mail that much).
(Germany) 0176 24548974

Here is the full press release (originally in German):

DFKI at CeBIT 2008

DFKI is represented at CeBIT 2008 (04.03. – 09.03.2008) with its own booth as part of the CeBIT “future parc” concept. Booth B37 in Hall 9 covers an area of 72 m² and is located right next to the booth of the Federal Ministry of Education and Research, BMBF (Hall 9, Booth B40). In addition, DFKI presents exhibits at a further DFKI booth, the DFKI – John Deere booth (Hall 9, Booth C07), at the BMBF booth, and at the joint booth of Saarland University (Hall 9, Booth B35).

Exhibits at the main DFKI booth (B37)

Exhibits of the Image Understanding and Pattern Recognition research department

  • OCRopus – open source text recognition
  • InViRe – intelligent video retrieval

Exhibits of the Intelligent User Interfaces and Language Technology research departments

  • BabbleTunes – talk to your iPod
  • i2home – mobile multimodal access to the digital home for everyone
  • Ideas for Games – A.I. poker in the virtual casino

Exhibits of the Knowledge Management research department

  • Nepomuk – The Social Semantic Desktop
  • ALOE – A Socially Aware Resource and Metadata Hub
  • iDocument – Intelligent document information extraction
  • Eye-Book – multimedia reading

Exhibits of the Safe and Secure Cognitive Systems and Robotics research departments

  • SAMS – safety component for autonomous mobile systems
  • Robotics videos

Exhibits of the Institute for Information Systems at DFKI

  • Pipe – hybrid value creation in mechanical and plant engineering
  • R4eGov – cross-organizational collaboration between public administrations

Exhibits of the Deduction and Multiagent Systems research department

  • CASCOM – intelligent service agents for medical emergency operations
  • MAS-Dispo XT – multiagent technology in steel production
  • Scallops – Secure Agent-Based Pervasive Computing

DFKI – John Deere booth (C07)

At DFKI booth C07, DFKI presents the project IVIP in cooperation with John Deere, in the context of “Green IT”.

Exhibit of the Knowledge Management research department

  • IVIP – intelligent networking of distributed information sources for
    farm- and site-specific planning of energy crop production

At its booth B40, the BMBF presents the federal government’s funding programme “IKT 2020”.

DFKI exhibits:

Center for Human-Machine Interaction

  • Smartfactory – the intelligent factory of the future

Intelligent User Interfaces research department

  • SoKNOS – service-oriented architectures supporting
    networks in the context of public security

Robotics research department

  • SentryBot – an autonomous, cooperative multi-robot system for security and property protection


KWT booth (B35)

Deduction and Multiagent Systems research department

  • Verisoft_XT (Tom in der Rieden)
  • ActiveMath (Erika Melis)

Floor plan of Hall 9

A detailed floor plan
in PDF format can be
downloaded here.

Publishing “The Sesame Lucene Sail: RDF Queries with Full-text Search”

We have written a Technical Report on our integration of Sesame2 with Lucene.

Enrico Minack, Leo Sauermann, Gunnar AAstrand Grimnes, Christiaan Fluit, Jeen Broekstra: The Sesame Lucene Sail: RDF Queries with Full-text Search.
download PDF (alternate link)

In short:
PREFIX search:
SELECT ?x ?score ?snippet WHERE {
  ?x search:matches ?match .
  ?match search:query "person" ;
         search:score ?score ;
         search:snippet ?snippet .
}

Abstract:
With the growth of the Semantic Web, the requirements on storing and querying RDF has become more sophisticated. When a larger amount of data has to be managed, queries in structured query languages, such as SPARQL, are not always powerful enough. Use of additional keywords for querying can further reduce the result set towards the actual relevant answers, however, SPARQL only provides complete string matching or filtering based on regular expressions, which is a very slow operation. In contrast, state of the art Information Retrieval (IR) techniques provide sophisticated features such as keyword search, lemmatisation, stemming and ranking. In this paper we present a combination of structured RDF queries and full-text search. It is implemented as an extension of an established RDF store (Sesame) with IR capabilities using the text search library Lucene, without requiring modifications to existing RDF query languages.

Bibtex
(in these files you find all my publications, including this one)
bibtex
bibtex / rdf

The implementation lives here:
http://dev.nepomuk.semanticdesktop.org/wiki/LuceneSail

call for tutorial proposals at KI 2008

Submit your proposal! The conference is in my hometown, and if you come, we can have a beer.

KI 2008 — 31st German Conference on Artificial Intelligence
23 – 26 September 2008, Kaiserslautern, Germany

http://ki2008.dfki.uni-kl.de

CALL FOR TUTORIAL PROPOSALS

KI-2008 will include a small number of tutorials, to be held the day
before the technical program starts. Tutorials will be free of charge
for conference participants. We aim at a small number of high-quality
tutorials suitable for a large percentage of conference participants,
including graduate students as well as experienced researchers and
practitioners.

Contents

Tutorials should give a comprehensive, in-depth perspective on
innovative AI methods or technologies that have an obvious potential for
research and/or application and are not covered by typical AI textbooks.
We especially encourage tutorial proposals covering cognitive aspects of
AI. Do not hesitate to contact the conference or tutorial Chairs if you
are in doubt about the suitability of a particular topic for the
purpose. Both full-day (6 hours) and half-day (3 hours) tutorials are of
interest.

Technicalities

KI-2008 tutorials will be held on September 23, 2008. They should
preferably be given in English. The revenue for tutor(s) consists of
free participation at the KI-2008 conference, plus reasonable travel
support if necessary. Teaching material will be printed and distributed
by the conference organization.

How to Propose a Tutorial

Proposals must contain the necessary information to judge the
importance, quality and community interest in the proposed tutorial as
well as the expertise of its tutor(s). Proposals must include
descriptions of the tutorial topic, goals, the intended audience, an
outline of the contents, and brief CVs of the tutor(s), including their
expertise and teaching experience in the field, and the intended length
(half- or full- day). Proposers are encouraged to include excerpts of
material from recent teaching about the proposed topic as an annex of
their submission, if available.
Proposal texts should be submitted by e-mail to the Tutorial Chair in
plain text format. Annexes may be sent as .pdf, .ps, .ppt, or .doc
format.

Important Dates

Proposal deadline: Mar 28, 2008
Acceptance Notification: May 30, 2008
Camera-ready teaching material due: July 2, 2008
Tutorial: Sep 23, 2008

KI-2008 Tutorial Chair

Prof. Dr. Frank Bomarius
frank.bomarius (at) iese.fraunhofer.de

Author needed for German Article on Semantic Web

The German magazine createordie.de is covering the topic “next generation web” in an upcoming issue and is searching for an author for the topic of the Semantic Web. (The rest was originally in German.)

The publisher asked me whether I would write an article myself, but unfortunately I have too little time at the moment (we have a project review soon) to write a good one. But one or another of my readers is an author, too.

Specifically, four pages are planned for the article. One page holds 4,000 characters and two images, so 16,000 to 17,000 characters and 8 to 10 images in total. The deadline is 25 February 2008. The publisher pays for the text, so this is not to be confused with scientific publications.

If you are a Semantic Web expert and have already published articles, please contact the editor-in-chief, Felix Schrader, directly; contact details are at createordie. As always, this is about the public view of the Semantic Web, so it is important to paint a good picture.

Google Data API and GData

Since 2006, Google has been collecting its programming interfaces in the Google Data (GData) project. On their website, you find links to the APIs for Google Docs, Calendar, Spreadsheets, YouTube, and more.

http://code.google.com/apis/gdata/

It is a one-stop place to find interfaces for the various Google services. For Semantic Web developers, it is also a good overview of how Google shapes the interfaces to its web-based applications. Get inspired by the pros.

The GData protocol and data format are especially worth a look. GData is a generic API for getting and querying data, based on RSS 2.0 and Atom.
In the GData reference, you find a description of the Atom extensions and a simple query format extending them.

Assuming a feed is hosted at the URI http://www.example.com/feeds/jo, then elements within the feed can be queried with the following URI:

http://www.example.com/feeds/jo?q=Darcy&updated-min=2005-04-19T15:30:00

A kind of “easy going sparql”.
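As a rough sketch of how such a query URI can be assembled on the client side, the following Python snippet builds the example above from its parts. The function name and its arguments are my own invention for illustration; only the parameter names `q` and `updated-min` are taken from the example URI.

```python
from urllib.parse import urlencode


def gdata_query_uri(feed_uri, text=None, updated_min=None):
    """Build a GData-style query URI from a feed URI and optional filters."""
    params = {}
    if text is not None:
        params["q"] = text  # full-text query
    if updated_min is not None:
        params["updated-min"] = updated_min  # lower bound on the update time
    if not params:
        return feed_uri
    # Keep ":" unescaped so timestamps stay readable, as in the example.
    return feed_uri + "?" + urlencode(params, safe=":")


uri = gdata_query_uri(
    "http://www.example.com/feeds/jo",
    text="Darcy",
    updated_min="2005-04-19T15:30:00",
)
print(uri)
```

The result is an ordinary HTTP GET on the feed URI, so any Atom or RSS client can issue the query.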

In their own words:
The Google data APIs provide a simple standard protocol for reading and writing data on the web.

These APIs use either of two standard XML-based syndication formats: Atom or RSS. They also have a feed-publishing system that consists of the Atom publishing protocol plus some extensions (using Atom’s standard extension model) for handling queries.

Many Google services support the Google data API protocol.

Combining Rules with SPARQL

A recent blog post by Dan Brickley reminded me that we have a Jena rule engine augmented with SPARQL gathering dust on our shelves. It’s years old, but may be interesting for you.

We augmented Jena’s rules by adding SPARQL.
Passing parameters is easy: you just use the variables of the rules. Within Jena rules, you can always express graphs using N-Triple axioms, so it’s also possible to write RDF files.
Only caveat: no quads.

code is in this folder
http://gnowsis.opendfki.de/browser/tags/0.8.3-alpha/gnowsis_retrieval/WEB-INF/src/org/gnowsis/retrieval/RuleQuerySparql.java

download SVN URI:
http://gnowsis.opendfki.de/repos/gnowsis/tags/0.8.3-alpha/gnowsis_retrieval/WEB-INF/src/org/gnowsis/retrieval/RuleQuerySparql.java

documentation:
http://gnowsis.opendfki.de/browser/tags/0.8.3-alpha/gnowsis_retrieval/doc/help.html

Here is a snippet of that documentation for you:

Inference

After the results have been gathered, inference rules are evaluated against the results.
This means that you can define rules that generate new information,
using the declarative syntax described in the
Jena Rule Engine DOC.
An example file for these rules can be found in the source, ideally directly in SVN here:
retrieval.rules.txt.
You can use the existing Jena rules and a special rule that was created by Leo Sauermann to
load additional triples from gnowsis. This query-rule is defined as follows:

# Load triples from the search storage by a triple pattern.
# The search storage is crawled by gnowsis IF you enabled it in the
# configuration and have crawled a few datasources. If not, this query
# will return nothing.

queryStorage(?subject, ?predicate, ?object, 'search')

# ?subject, ?predicate, ?object: a triple pattern.
# Leave one of them empty (= an unbound variable like ?_x) and it will be
# matched as a wildcard. The variables are not bound in the pattern and
# cannot be used in the same rule. You have to write additional rules to work
# on the queried triples.

usage example:
# load all project members and managers
(?project rdf:type org:Project) ->
queryStorage(?project, org:containsMember, ?_y, 'search'),
queryStorage(?project, org:managedBy, ?_z, 'search').

Binding the variables and using them in the same rule is not possible. See the
statement of the Jena developers
about this. But this is not a big problem, you can work around it easily.
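The wildcard matching that queryStorage performs can be illustrated with a small Python sketch. This is not the gnowsis implementation; it is a toy in-memory store with made-up data, where `None` plays the role of an unbound variable like ?_x.

```python
# A tiny in-memory triple store; the data is made up for illustration.
TRIPLES = [
    ("ex:projectA", "rdf:type", "org:Project"),
    ("ex:projectA", "org:containsMember", "ex:alice"),
    ("ex:projectA", "org:managedBy", "ex:bob"),
    ("ex:projectB", "org:containsMember", "ex:alice"),
]


def query_storage(triples, subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    pattern = (subject, predicate, obj)
    return [
        triple
        for triple in triples
        if all(want is None or want == have for want, have in zip(pattern, triple))
    ]


# Mirrors the usage example: load members and managers of one project.
members = query_storage(TRIPLES, "ex:projectA", "org:containsMember")
managers = query_storage(TRIPLES, "ex:projectA", "org:managedBy")
print(members)
print(managers)
```

As in the rule version, the matched triples are simply added to the working set; using them requires a further step (here: reading the returned list, in Jena: an additional rule).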

Debugging Inference

If you want to tweak your inference rules and don’t want to have gnowsis run the query at all times,
you can use our built-in inference debugging tricks.

  • first: when you run the query for debugging, click the ‘rerun only rules’ link at the bottom of your search
  • second: open the inference file by clicking ‘edit rules file’
    (also found at the bottom of each search result)

The first step puts you into a query mode where pressing “reload” in the browser
only reruns the inference and the clustering, but not the search itself. This speeds up
your debugging of inference rules. You will only spot the difference in the address bar
of the browser, which now contains something like http://…/retrieval?cmd=runrules&query=YourQuery

Also, note that syntax errors in your inference code will be logged to the gnowsis
logging system. This is either the message window, pane ‘org.gnowsis.retrieval.RetrievalService’
or your file logging in ~/.gnowsis/data/… .
You will not see syntax errors in your query results, sorry.

Inference and SPARQL combined

SPARQL reference

You can also use SPARQL queries to refine and expand the search results.
The basic syntax is to run a SPARQL query in the head of a rule: the first argument
is the query, enclosed in single quotes ('); the following arguments are variables that will
be used in the query. The passed variables are interpreted as named variables
in the SPARQL query, named ?x1, ?x2, ?x3, etc.

Example for querySparql:

(?a ?b ?c) ->
querySparql('
 PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
 CONSTRUCT   { ?x1 rdfs:label ?label }
 WHERE       { ?x1 rdfs:label ?label }
', ?c).

The variable named ?x1 will be replaced with the value of ?c.
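To illustrate the binding, here is a small Python sketch that substitutes the placeholders textually. This is an assumption made for illustration: the real RuleQuerySparql code may bind the values differently (for example through the query engine's own variable bindings), and the helper name and example URI are hypothetical.

```python
import re


def bind_placeholders(query, *args):
    """Replace ?x1, ?x2, ... in a query template with the given URIs."""

    def substitute(match):
        index = int(match.group(1))
        if not 1 <= index <= len(args):
            return match.group(0)  # no argument passed: leave the variable
        return "<" + args[index - 1] + ">"

    # \b ensures the placeholder number is read completely (?x10 is not ?x1).
    return re.sub(r"\?x(\d+)\b", substitute, query)


template = "CONSTRUCT { ?x1 rdfs:label ?label } WHERE { ?x1 rdfs:label ?label }"
bound = bind_placeholders(template, "http://example.org/project/42")
print(bound)
```

Other variables such as ?label are left alone and stay free in the query.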

Note the following tips:

  • literals are escaped using the \'text\' markup.
  • All arguments passed after the query will be bound into the query using names ?x1, ?x2, …
  • querySparql can only be used in the head of rules.
  • Attention: if you are querying the gnowsis biggraph, you have to add the graph-name to your
    sparql queries.
  • Try out your queries on the debug interface before you use them.
  • Only ‘construct’ queries are supported, not select or describe.
  • Namespace prefixes: inside the SPARQL query, you can use the namespace prefixes defined in the rule file

Here is an example that does this. The task is to retrieve the members of a project if a project
was in the result.

#note that these namespace prefixes are available in the sparql query
@prefix skos: <http://www.w3.org/2004/02/skos/core#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix owl:  <http://www.w3.org/2002/07/owl#>.
@prefix retrieve: <http://www.gnowsis.org/ont/gnoretrieve#>.
@prefix tag: <http://www.gnowsis.org/ont/gnoretrievetag#>.

# get members with SPARQL
# note the special namespace defined inside
(?hit retrieve:item ?project),
(?project rdf:type org:Project) ->
querySparql('
PREFIX org: <http://km.dfki.de/model/org#>
CONSTRUCT   { 
  ?x1 org:containsMember ?m. ?m rdfs:label ?labelm. ?m rdf:type ?typem.
  ?x1 tag:todoRelateHitTo _:hit .
  _:hit rdf:type retrieve:InferedHit .
  _:hit retrieve:item ?m .
  _:hit retrieve:textSnippet \'member of project\'.
}
WHERE       { graph ?g {
  ?x1 org:containsMember ?m. ?m rdfs:label ?labelm. ?m rdf:type ?typem.
} }
', ?project).

# make the missing relations to the hits
# this is needed because you cannot pass blank nodes into the SPARQL engine.
(?item tag:todoRelateHitTo ?tohit),
(?hit retrieve:item ?item) ->
(?hit retrieve:related ?tohit).
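
What the last rule does is a plain join over two triple patterns. As a sketch (not the Jena engine, just an illustration in Python with made-up data), the same derivation looks like this:

```python
def relate_hits(triples):
    """Apply the rule
    (?item tag:todoRelateHitTo ?tohit), (?hit retrieve:item ?item)
      -> (?hit retrieve:related ?tohit)
    once and return the derived triples."""
    todo = [(s, o) for s, p, o in triples if p == "tag:todoRelateHitTo"]
    items = [(s, o) for s, p, o in triples if p == "retrieve:item"]
    derived = set()
    for item, tohit in todo:
        for hit, hit_item in items:
            if hit_item == item:  # join on ?item
                derived.add((hit, "retrieve:related", tohit))
    return derived


# Made-up facts: hit1 found a project, the SPARQL step created hit2 for a member.
facts = [
    ("_:hit1", "retrieve:item", "ex:projectA"),
    ("ex:projectA", "tag:todoRelateHitTo", "_:hit2"),
    ("_:hit2", "retrieve:item", "ex:alice"),
]
new_facts = relate_hits(facts)
print(new_facts)
```
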
querySparql runs a SPARQL query, replacing the placeholders (?x1, ?x2, ...) in the query with the passed arguments;
the arguments have to be bound. Only 'construct' queries are supported.
 
# retrieve a test sparql
[test:
(<http://xmlns.com/foaf/0.1/> ?x ?type)
-> querySparql('
CONSTRUCT   { ?x1 rdfs:label ?label }
WHERE       { ?x1 rdfs:label ?label. FILTER (?x1 = foaf:name) }
')
]

# retrieve with param - bind ?ont to the foaf ontology, it is called x1 in the query
[testbind:
(?ont rdf:type owl:Ontology)
-> querySparql('
CONSTRUCT   { ?p rdfs:isDefinedBy ?x1. ?p rdfs:label ?label. }
WHERE       { ?p rdfs:isDefinedBy ?x1. ?p rdfs:label ?label. }
', ?ont)
]


# example for gnowsis. note the use of "....{ graph ?g { ...."
[test:
(<http://xmlns.com/foaf/0.1/> ?x ?type)
-> querySparql('
CONSTRUCT   { ?x1 rdfs:label ?label }
WHERE       { graph ?g {  ?x1 rdfs:label ?label. FILTER (?x1 = foaf:name) } }
')
]

Article in “Entwickler Magazin” 2008.1

For the German audience: “Entwickler Magazin” has published an article of mine about the Semantic Desktop in issue 2008.1.

In four pages, it explains the basics of the Semantic Web and the Semantic Desktop and gives a few links to projects.
Available for € 6.50 at newsstands in Germany/Austria/Switzerland; one of the 20k copies can soon be yours.

Introduction:
The Semantic Desktop turns the PC into a tool for thinking. wink wink
We have enough space to store all our e-mails, MP3s, photos, videos, and documents on the desktop. The problem is managing this information. File systems offer only rigid hierarchies. Tim Berners-Lee and the W3C have already thought further: people think in concepts, and with RDF and ontologies the Semantic Web offers a standard for annotation and search that builds on HTTP, URIs, and HTML. The Semantic Desktop thus moves operating systems and applications away from files, to the level of thoughts.

To get there, first a small crash course on the Semantic Web,…

Dataportability.org brings together google, plaxo, and facebook

A storm in a teacup that is gathering more and more momentum: dataportability.org welcomes new members. Before, they were heating up the storm; now individual corporate representatives of Google, Plaxo, and Facebook are sitting at one virtual table.

As announced here, blogged here, and then slashdotted, people working for some interesting ventures have today joined dataportability.org.

For those who missed the event: in the last weeks, Robert Scoble used an unreleased app, “pulse” from Plaxo, to gather his contacts from Facebook and got blocked by Facebook for it. He contacted them and after a while was back in, but the problem is obvious: social websites, and the companies running them, have one main asset, the data created by us. As “we” were “man of the year” in Time, the data of such a celebrity is worth a lot.

Scoble joined dataportability.org (DP) and blogged this, which made me curious to also look at their site and add a few notes about how RDF and Semantic Web may help them out instead of creating their own standards.
Now that people working for Plaxo, Google, and Facebook have joined the already impressive list of individuals at dataportability, they can really talk about the mission:
To put all existing technologies and initiatives in context to create a reference design for end-to-end Data Portability. To promote that design to the developer, vendor and end-user community.

My biggest fear was that the standards created by DP would be useless because no big companies were on their board (unlike the W3C, where nearly all big companies are on board). This has changed now, and I would expect that the effort is indeed relevant to the future of the web.