Leobard's blog

2005-08-06 / 2017-11-04

watched the terminal

just watched “The Terminal”, a movie directed by Steven Spielberg, with Tom Hanks playing a guy stuck at a NY airport for 9 months.

we checked a little and found the true-story aspect in this nytimes article and a nice wikipedia entry about Mehran Karimi Nasseri, who still lives at Roissy airport in Paris. The wikipedia entry says that this is still true as of 17th July 2005, so probably the web keeps this story alive.

Funny thing: somebody has to have checked this and entered it in wikipedia. This alone is a weird fact, but it also gives me a feeling of security: when I want to know something like this, many, many other people have written stuff down.

2005-08-04 / 2019-11-16

matatour 2005

finally we have photos of the matatour 2005. as already mentioned in http://leobard.twoday.net/stories/832666/ we were sailing the wide sea of Croatia, and this is a tiny collection of 400 photos from the 4GB we made in total.

2005-08-03 / 2017-11-04

sparql fast as hell

In the last two months we shifted the gnowsis search services to SPARQL. Our problem was that the common ARQ implementation is slow and does not work in a named graph scenario.

A fine twist in history brought Richard Cyganiak to work at HP Labs for a while, and he hacked sparql2sql there, a mapping of SPARQL to the Jena database schema with support for named graphs.

I asked him if I could use it in gnowsis and he was quite happy to have this as a test case. The result is that Richard added a special framework for fulltext search and I hacked some stuff regarding MySQL’s fine FULLTEXT index functions (if you don’t know it: it’s similar to Lucene, can do relevance and query expansion and so on).

outcome: an hour ago I transformed the SQL index to the new fulltext format and got hit in the face with the new answer times:

SPARQL of this kind works practically instantly (10-20msec):
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?x ?label WHERE
{ GRAPH ?source {
?x rdfs:label ?label.
FILTER REGEX(?label, "test", "i")}}

But the really astonishing thing is that SPARQL of this kind is also fast as hell (10-500ms):
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?x ?p ?label WHERE
{ GRAPH ?source {
?x ?p ?label.
FILTER REGEX(?label, "test", "i")}}

In simple words: this is a fulltext scan over all properties of all statements. Don’t get bothered by the warmup time: the thing will need about 10-20 sec of warmup, but then it’s great.
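To make clear which statements the two queries pick, here is a minimal Python sketch of the same selection done as a naive scan. The sample quads and the function name `regex_scan` are made up for illustration; this only shows what gets matched, not how sparql2sql or the MySQL index evaluate it.

```python
import re

# Hypothetical sample data: (graph, subject, predicate, literal) quads.
QUADS = [
    ("urn:g1", "urn:a", "rdfs:label", "Test document"),
    ("urn:g1", "urn:a", "dc:creator", "Leo"),
    ("urn:g2", "urn:b", "rdfs:comment", "my latest project"),
    ("urn:g2", "urn:c", "rdfs:label", "Holiday photos"),
]

def regex_scan(quads, pattern, predicate=None):
    """Return (subject, predicate, literal) for every literal that matches.

    Matching is case-insensitive, like REGEX(?label, "test", "i").
    With predicate set, only that property is scanned (first query);
    with predicate=None, all properties are scanned (second query).
    """
    rx = re.compile(pattern, re.IGNORECASE)
    return [
        (s, p, o)
        for (_graph, s, p, o) in quads
        if (predicate is None or p == predicate) and rx.search(o)
    ]

labels_only = regex_scan(QUADS, "test", predicate="rdfs:label")
all_properties = regex_scan(QUADS, "test")
```

A scan like this touches every statement on every query, which is why getting the second query answered in well under a second is the surprising part.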

triplecount:
371994

for the freaks we will probably pack a little executable jar that bundles everything into a nice demo. for the real freaks:
http://gnowsis.opendfki.de/repos/gnowsis/trunk/sparql2sql/

demo app

2005-07-31 / 2017-11-04

pixelart and isometric art

nice thing found on web zen: pixelart

again, web zen made my day.
the tutorial suggested there is nice, I have to post my childish results of it.

in addition to the tutorial posted there, I found these:

list of sources: http://www.inthefaith.com/blog/archives/000437.php
nice tutorial: http://www.deviantart.com/view/2866522/
small tutorial: http://www.pixelfreak.com/v3/en/tutorial0.html

2005-07-29 / 2017-11-04

smushing

I am putting together more about smushing, which will be a key factor in the global semantic web: to connect annotations that were made by different people.

A typical smushing algorithm would be:

take a large datastore DS that contains a set of triples Tset = {Ta, Tb, Tc, …}
iterate through the known InverseFunctionalProperties IFPset = {Ia, Ib, Ic, …}
for each InverseFunctionalProperty Iy that occurs in Tset as a predicate, do a check for smushing:
find all triples TxIy that have Iy as predicate and share the same value (object)
pick one triple Txc of TxIy whose subject is the grounding resource / canonical resource (see below)
use the subject Sx of the triple Txc and aggregate all triples of the other subjects in TxIy onto Sx. This means: change the subject in those triples to Sx.
add owl:sameAs triples to connect all Subjects(TxIy) to Sx
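The steps above can be sketched in a few lines of Python. This is only a rough sketch under my own assumptions: triples are plain string tuples, the function name `smush` and the `choose_canonical` parameter are invented here, and instead of looping property by property it first groups the subjects with a small union-find, so that chains (A equals B through one IFP, B equals C through another) end up in one group. Like the algorithm above, it only rewrites subjects, not objects.

```python
from collections import defaultdict

OWL_SAME_AS = "owl:sameAs"

def smush(triples, ifps, choose_canonical=min):
    """Merge subjects that share a value for any inverse functional property.

    triples: iterable of (subject, predicate, object) string tuples
    ifps: set of predicates treated as InverseFunctionalProperties
    choose_canonical: picks the canonical subject from a set of equal
        subjects (the default, min, is just an arbitrary placeholder)
    """
    triples = set(triples)
    parent = {}

    def find(node):
        # Follow parent links up to the representative of the node's group.
        while parent.get(node, node) != node:
            node = parent[node]
        return node

    # Find all subjects that share the same (IFP, value) pair.
    by_key = defaultdict(set)
    for s, p, o in triples:
        if p in ifps:
            by_key[(p, o)].add(s)

    # Union them into groups of subjects that should be the same.
    for subjects in by_key.values():
        first, *rest = sorted(subjects)
        for other in rest:
            parent[find(other)] = find(first)

    # Collect the groups and pick one canonical subject per group.
    groups = defaultdict(set)
    for subjects in by_key.values():
        for s in subjects:
            groups[find(s)].add(s)
    canonical = {}
    for members in groups.values():
        if len(members) > 1:
            chosen = choose_canonical(members)
            for member in members:
                canonical[member] = chosen

    # Change the subject of every triple to its canonical subject,
    # then add owl:sameAs triples from the merged subjects to it.
    result = {(canonical.get(s, s), p, o) for s, p, o in triples}
    result |= {(s, OWL_SAME_AS, c) for s, c in canonical.items() if s != c}
    return result

example = smush(
    {
        ("urn:a", "foaf:mbox", "mailto:leo@example.org"),
        ("urn:a", "foaf:name", "Leo"),
        ("urn:b", "foaf:mbox", "mailto:leo@example.org"),
        ("urn:b", "foaf:phone", "tel:123"),
        ("urn:c", "foaf:mbox", "mailto:other@example.org"),
    },
    ifps={"foaf:mbox"},
)
```

In the example, urn:a and urn:b share a mailbox, so the phone number of urn:b moves to urn:a and a sameAs triple from urn:b to urn:a is added, while urn:c is left alone.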

The problem: when a set of triples TxIy has several subjects that should be the same (as defined by the IFP), you have to choose which subject is the “canonical” one that will now be filled with the triples.

There are different approaches to find the canonical resource:

pick one at random
prefer the resource that is annotated in a special ontology (i.e. prefer SKOS concepts over foaf:Persons)
prefer the more public resource (googlefight, public URLs win over private URIs)
prefer the best annotated resource (the resource with the most triples – attention, this is self-amplification of single resources)
prefer the resource with the shortest / the longest URI
prefer named resources over anonymous resources (this is very important, you must not smush onto anonymous resources)
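Several of these preferences can be combined into one ranking. The following Python sketch is one possible combination, not a recommendation: the order of the criteria is my own choice, blank nodes are assumed to be written with a "_:" prefix, and "public" is crudely approximated as "starts with http:// or https://". The function name `choose_canonical` is hypothetical.

```python
from collections import Counter

def choose_canonical(subjects, triples):
    """Pick the canonical subject among subjects that should be the same.

    Criteria, in order (each one only breaks ties of the previous one):
    named before anonymous, public URL before private URI,
    most triples first, shortest URI first.
    """
    counts = Counter(s for s, _p, _o in triples)

    def rank(subject):
        return (
            subject.startswith("_:"),                           # anonymous sorts last
            not subject.startswith(("http://", "https://")),    # private sorts later
            -counts[subject],                                   # more triples first
            len(subject),                                       # shorter URI first
            subject,                                            # deterministic tie-break
        )

    return min(subjects, key=rank)

triples = [
    ("_:b1", "foaf:name", "Leo"),
    ("_:b1", "foaf:mbox", "mailto:leo@example.org"),
    ("_:b1", "foaf:phone", "tel:123"),
    ("urn:local:leo", "foaf:name", "Leo"),
    ("http://example.org/leo", "foaf:name", "Leo"),
]
best = choose_canonical({"_:b1", "urn:local:leo", "http://example.org/leo"}, triples)
```

Putting "named before anonymous" first means that the anonymous node never wins, even though it is the best annotated resource in the example.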

Another question is what to do with the result of the smushing. Different approaches:

1. store the smushing in an extra graph
2. delete the old triples, add the smushing
3. add the smushing in addition to the old triples (tricky)

Each has obvious advantages and disadvantages. For gnowsis I would prefer (1), smushing into an extra graph, which is similar to (3) but separates the data.

In gnowsis we have the problem of incremental smushing, which means that we crawl thousands of emails per day and then would like to smush the persons in the addresses, but only for the new messages.
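One way to think about the incremental case is to remember, for every IFP value seen so far, which subject became canonical, and to check only the new triples against that index. The Python sketch below is my assumption of how this could look, not what gnowsis does; the class name `IncrementalSmusher` and the message URIs are invented. It is deliberately limited: the first subject seen for a value stays canonical, and two subjects that were already stored are never merged afterwards.

```python
class IncrementalSmusher:
    """Smush each new batch of triples against an index of known IFP values."""

    def __init__(self, ifps):
        self.ifps = set(ifps)
        self.canonical_by_key = {}  # (ifp, value) -> canonical subject

    def add(self, new_triples):
        """Return the triples to store for this batch, already smushed."""
        new_triples = list(new_triples)
        # Map each new subject to the known canonical subject,
        # if its IFP value was seen before; otherwise register it.
        rename = {}
        for s, p, o in new_triples:
            if p in self.ifps:
                canonical = self.canonical_by_key.setdefault((p, o), rename.get(s, s))
                if canonical != s:
                    rename[s] = canonical
        # Rewrite the subjects of the new triples only and record the identity.
        out = {(rename.get(s, s), p, o) for s, p, o in new_triples}
        out |= {(s, "owl:sameAs", c) for s, c in rename.items()}
        return out

smusher = IncrementalSmusher({"foaf:mbox"})
first = smusher.add([
    ("urn:msg1#from", "foaf:mbox", "mailto:leo@example.org"),
    ("urn:msg1#from", "foaf:name", "Leo"),
])
second = smusher.add([
    ("urn:msg2#from", "foaf:mbox", "mailto:leo@example.org"),
    ("urn:msg2#from", "foaf:name", "Leo S."),
])
```

The second message never touches the triples stored for the first one: only its own triples are rewritten onto urn:msg1#from, so the cost per day depends on the number of new messages and not on the size of the store.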

I have posted this algorithm also in the ESW wiki, where you can comment on it.

2005-07-25 / 2017-11-04

geo-me

as I said recently, I added my geopos to leobard.net

now using this, I found myself on google maps.
http://maps.google.com/maps?ll=49.441200,7.766700&spn=0.05157,0.006615&t=k&hl=en

2005-07-25 / 2017-11-04

d/l the internet

http://www.w3schools.com/downloadwww.htm

2005-07-22 / 2017-11-04

new diploma thesis topics available

I updated my diploma thesis website, where I offer topics related to the Semantic Desktop for Master’s theses.

http://www.dfki.uni-kl.de/~sauermann/projekte/index.html

it always takes so much time to write down my ideas and the things that will be needed. The next student starts a thesis in October, so there are still places open.

here are the current topics. Well, I should add another one for GUI!

your own idea – “I have an idea for the Semantic Web, it’s X and, Leo, you will like it!” Mail me.
business card annotation – annotating business events
managing concepts using SKOS – concepts for the masses
semantic tools – evaluate semantic tools for our future
semantic search – search inside the semantic desktop (taken by student)
semantic email – how to annotate emails and how to classify emails. Eat Spam!
text annotation – using the semantic desktop to annotate documents or write notes about sport events
metadata integration – integrate information using RDF
cyberspace in Ultima Online – Are you a mage or a hacker? Or both?

2005-07-22 / 2017-11-04

Langton’s Ant

Langton’s ant was used as an example of scientific ideas in Terry Pratchett’s book about science in his fantasy world (which my brother gave me as a present last Christmas, thank you).

I searched and found implementations:
http://www.mathematische-basteleien.de/ameise.htm

and a definition:
http://en.wikipedia.org/wiki/Langton's_ant

and an applet that works (yeah, java rocks)
http://users.libero.it/acnard/ant.html

2005-07-22 / 2017-11-04

ytmnd.com

http://ytmnd.com

http://whatisnes.ytmnd.com/

!!!!!! GREAT ZOOOT !!!!!

stumbled onto it through SWHACK

another reason to hack the semantic web…