piggybank VS gnowsis – towards a better Semantic Desktop

In this article I point out how the experience gained with piggy-bank and gnowsis could fit together, and how the code of both could be made compatible to get a better Semantic Desktop. First I describe what inspired this article and give some technical details, then I sketch some plans for the future.

I met Eric Miller from the SIMILE team at ISWC and he suggested that Libby, I and others install Piggy-Bank. That surprised me, as I had been watching this piece of software for a long time but had never dived deep into it.

So, when I first had a look at it, I found many interesting points. Then at ISWC we tried to annotate “this talk had a bad typo in the powerpoints”: I wanted to enter this information as a comment, not as a tag. As Piggy-Bank does not support easy entering of arbitrary RDF, I first blogged the typo and then combined the blog post URI with some RDF from a file, and it worked. So the starting point is that Piggy-Bank is primarily for scraping RDF from web pages and browsing this RDF. Editing options are limited, and even Eric and I had to do some mind-bending things before I could reach my goal: stating the funny typo in RDF.

During the last weeks I exchanged mail with Stefano Mazzocchi, who is with the SIMILE team and also with Apache, talking primarily about Aperture. Aperture is a Java framework for extracting and querying full-text content and metadata from various information systems (file systems, web sites, mail boxes, …) and the file formats (documents, images, …) occurring in these systems.

So, editing arbitrary RDF is something we have tried again and again in gnowsis, and we have two major approaches to it: first the linker, a user interface for linking two resources with each other, and second Enquire2, a user interface with more editing options (adding wiki text, editing any RDF using the ThingDialog).

I have now taken a deeper look at Piggy-Bank and started thinking about how we can join Piggy-Bank and gnowsis, and I decided to publish some of my thoughts, following the good principle I try to keep in mind: “Building the Semantic Web is easier together”.

So, some facts:

  • Piggy-Bank offers great options to easily browse bigger amounts of RDF
  • Piggy-Bank is separated into several sub-projects, like longwell for browsing.
  • Piggy-Bank team wants to extract more information from PDFs, etc.
  • Piggy-Bank combines Java with Mozilla
  • gnowsis extracts data from Mozilla.
  • gnowsis has some plugins for Mozilla Firefox and Thunderbird
  • Gnowsis is written in Java against Jena
  • Aperture is written in Java against Sesame (and planned for Jena)
  • Gnowsis should move to Sesame2 as data store, I already entered a ticket for this
  • Piggy-Bank uses Velocity for displaying its information as HTML. This is a nice framework anyway, but fattens the distribution by a few kilobytes (no problem).
  • both use Jetty for their internal web-servers.
  • Piggy-Bank stores their triples into your Firefox-Profile dir. There you find your native-SAIL sesame data. Example on my Mac: ~/Library/Application Support/Firefox/Profiles/xxxyyy.default/piggy-bank/my-piggy-bank
  • Gnowsis stores its data into ~/.gnowsis/data using jena

Gnowsis and Piggy-Bank are, seen from high up, close to each other and in principle compatible. Both are based on RDF, licensed under BSD, written in Java and integrated with Mozilla. The main difference is that gnowsis is programmed against Jena while Piggy-Bank stores its data in Sesame, but that’s not a problem in principle.

So what are the possible benefits and pay-offs? Some wild ideas:

  • piggy-bank may profit from the ThingDialog and other editing functions of gnowsis.
  • piggy-bank and gnowsis can both integrate Aperture to extract RDF from data sources and applications like address books
  • gnowsis could grab a lot of piggy-bank code for page scraping
  • gnowsis could use longwell as its RDF browsing thing
  • both could move on to the long-sought “Semantic Bus” architecture, more about this below. Basically, they would share their data stores.
  • piggy-bank has a nice development tutorial that gives much info on how it works inside.
  • gnowsis has a development wiki that shows how to get started with gnowsis, and some outdated documentation that is still interesting.
  • the gnowsis documentation could be improved

But hey, what is it worth taking code from piggy-bank and making it run in gnowsis (and vice versa) when both projects could live together in parallel? Wouldn’t it be the right thing to make connections between semantic applications, so that they all talk to each other? Together they would provide a semantic desktop architecture that is open and based on collaboration and web communication: distributed applications working hand in hand for a better user experience.

The Semantic Bus

So, to bring these two desktop applications closer together, they could both use a Semantic Bus, a buzzword also known as Semantic Desktop and seen on some TimBL slides recently.
Basically, a Semantic Bus is a system that connects several services running on one machine. Database, GUI, data sources and adapters, web browser and applications are all connected via this semantic bus and use snippets of RDF and some kind of RESTful API to talk to each other.

The bus is a fabled thing that will probably take a little longer to materialize, but it’s part of our upcoming NEPOMUK plans anyway, and we will try to implement something and then standardize and align with others. The SIMILE project is aiming at the same ideas, and Patrick Stickler also had an idea in the direction of a semantic desktop integration bus.

For the piggy-bank VS gnowsis question, the Semantic Bus is important for Storage and Adapters (Aperture), and I think we have to move to a more federated architecture here.

Personal Semantic Storage
At the moment, the storage of piggy-bank is far from what we want: it’s hidden inside an obscure directory and can only be accessed while Firefox runs. That’s bad. Same with gnowsis: it’s slow because of Jena and only available while gnowsis runs (which is often, as gnowsis is designed as a server).

For our mutual goals, we would have to agree on a kind of neutral “personal sesame” edition and installer that ships with both Piggy-Bank and gnowsis0.9. It can be a requirement for both and is a kind of “basic” architecture. And it is the first cornerstone of a Semantic Bus.

So, Personal Semantic Storage using Sesame2 as server in a small Jetty bag would serve great as first cornerstone of the Personal Semantic Desktop / Semantic Bus. *yeah* *buzzwordsgalore*

Why is this plan so good? Because it speaks the language of two major Semantic Web projects – piggy-bank and gnowsis. They both say that Sesame is a useful storage. When we keep talking about Sesame we can also say that it contains a RESTful web server and a simple web-API for it. All running stand-alone inside Jetty, if you want.

So, if piggy-bank and gnowsis could be configured to use an external personal semantic storage – sesame2Personal – we have begun.
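
To make the idea a bit more concrete, here is a toy sketch in Python of two applications writing into one shared store and reading each other's data. It is not Sesame2 and not code from either project: the `PersonalStore` class, the context names and the example URIs are all made up for illustration, and a real setup would talk to a server process over HTTP instead of sharing an in-memory object.

```python
from collections import defaultdict


class PersonalStore:
    """Toy stand-in for a shared 'personal semantic storage' (hypothetical)."""

    def __init__(self):
        # context name -> set of (subject, predicate, object) triples
        self._contexts = defaultdict(set)

    def add(self, context, s, p, o):
        """Store one triple in the named context (one context per application)."""
        self._contexts[context].add((s, p, o))

    def match(self, s=None, p=None, o=None):
        """Return all triples from all contexts matching the pattern.

        None acts as a wildcard for that position.
        """
        pattern = (s, p, o)
        result = set()
        for triples in self._contexts.values():
            for triple in triples:
                if all(want is None or want == got
                       for want, got in zip(pattern, triple)):
                    result.add(triple)
        return sorted(result)


store = PersonalStore()

# "piggy-bank" stores something scraped from the web ...
store.add("piggy-bank", "http://example.org/talk/1", "dc:title", "A talk at ISWC")
# ... and "gnowsis" adds a local annotation about the same resource.
store.add("gnowsis", "http://example.org/talk/1", "rdfs:comment",
          "bad typo on the slides")

# Either application now sees both statements about the talk.
about_talk = store.match(s="http://example.org/talk/1")
```

The point of the separate contexts is the naming question raised further down: each application keeps its own corner of the repository, but queries run over everything.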

From my perspective – what are the problems we face here

  • Sesame2 moves on to Java 1.5. The installation of piggy-bank is quite tricky already; dealing with Java 1.5 under Mac OS X will be interesting, but solvable. If Piggy-Bank wants to use Aperture and Sesame2, they will probably have to move to Java 1.5. Gnowsis has these plans too, and enough problems of its own.
  • can the developers of both projects gather with Aduna and design a Personal Semantic Database as first cornerstone of the Personal Semantic Desktop? The task is to find a common standard how to deploy RDF applications on desktop computers and how to name those repositories – where to put piggy-bank treasures, where to put gnowsis links, where to store all this buffered RDF from Aperture?
  • Adapters to local data sources. The aperture project gives easy access to local data stores, development is running at the moment. Can this be the project to integrate various local and remote data sources to the storage?

The Vision for a Semantic Desktop
So, what is the vision of a Semantic Desktop that emerges here? You install your personal sesame2 server. You install Aperture and gnowsis, configure some datasources and let it crawl. You can now use Piggy-Bank and open its faceted browsing interface on these vast datasources. Without storing anything from the web, your local gnowsis already contains around 400.000 triples, extracted from your own files, emails, address books, bibtex files, etc. These desktop triples are ripe for piggy-bank or Autofocus.

You gather new data from the web using piggy-bank and its page scrapers written in JavaScript. Here you can do all the nice things of piggy-bank. Store weblogs’ RSS feeds. Tag all data related to a friend of yours. See the last hiking trip on google maps via piggy-bank. You view the stored info from your piggy-bank data using gnowsis and have the possibility to edit each literal or value. You create new instances of any class in gnowsis and store them in your personal semantic web store. For example, you want to create a new recipe in the style of epicurious. You have already piggy-banked some epicurious recipes from the web, and when you click on “new recipe” in Enquire2006 (the upcoming user interface of gnowsis0.9) you can create such a recipe for yourself. So, based on the existing recipes (a kind of ontology) you make up your new recipe and store it. Done. At the end, using piggy-bank, you could upload your recipe to a public piggy-bank, and others can facet-search it there, tag it and be happy.

Also Patrick Stickler’s URIQA approach would fit into this vision. Any application could use the personal semantic storage to quickly create concise bounded descriptions, or use Aperture to get fresh data from various sources.
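
For readers who have not met concise bounded descriptions: roughly, the CBD of a resource is every statement about it, plus, recursively, every statement about the blank nodes those statements point to. Below is a simplified Python sketch of that rule. It deliberately ignores the reification part of the CBD definition, treats every string starting with `_:` as a blank node, and the recipe data and `ex:` property names are invented for the example.

```python
def concise_bounded_description(triples, start):
    """Simplified CBD: all triples with `start` as subject, plus, recursively,
    all triples whose subject is a blank node reached as an object.

    Assumption of this sketch: blank nodes are strings starting with "_:".
    Reified statements are not followed.
    """
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((s, p, o))

    description = []
    seen = set()
    todo = [start]
    while todo:
        node = todo.pop()
        if node in seen:
            continue  # guards against cycles between blank nodes
        seen.add(node)
        for s, p, o in by_subject.get(node, []):
            description.append((s, p, o))
            if o.startswith("_:"):
                todo.append(o)
    return description


# Invented example data: two recipes, one with a blank-node ingredient.
triples = [
    ("http://example.org/recipe/1", "dc:title", "Apple strudel"),
    ("http://example.org/recipe/1", "ex:ingredient", "_:b1"),
    ("_:b1", "ex:name", "apples"),
    ("_:b1", "ex:amount", "1 kg"),
    ("http://example.org/recipe/2", "dc:title", "Goulash"),
]

cbd = concise_bounded_description(triples, "http://example.org/recipe/1")
```

Here `cbd` holds the four statements that belong to the first recipe, including the ingredient details behind the blank node, and nothing about the second recipe.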

2006 will be the year in which we get closer to contact: contact with the original use case of the semantic web, sharing cooking recipes!

Summary and Outlook

I had a look at two current Semantic Web projects, gnowsis and piggy-bank, and played with ideas how they might mix together. Gnowsis and Piggy-bank have distinct features and distinct use cases. But they both rely on storage like Sesame, and next year both will probably be based on Sesame2 native storage and lucene for indexing. So the two projects could integrate very well if they both contact a Personal Semantic Desktop Data Server, a sesame2 thing running in the background (have to find a name for that).

Piggy-Bank has perfected the integration of data from web sites, and gnowsis has a tradition of contacting local sources like IMAP or Microsoft Outlook. If both applications could decide on a common Sesame2 storage in the middle, and some little framework around it for configuration etc., the Semantic Desktop at large would come nearer. Both might lose some performance (there is another communication layer in the middle), but this can be overcome by pushing some of the velocity work to the server directly. I also wrote a little vision of how using piggy-bank and gnowsis could look at the end of 2006.

sparqling the opera community

as suggested on the swig scratchpad, we should search for examples.

ok, what we have is a SPARQL-conformant query interface to 2 million triples that come from a community website (users, forums, photos, etc)

I took a few blind shots, not very amusing. By mistake, I once did a full scan over all instances (* type *) and got a 100 MB file with uris to play with. DESCRIBE works for none of the uris (argh, why isn’t CBD in SPARQL yet).

so, this for example doesn’t work:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
DESCRIBE <http://my.opera.com/butchevans/homes/albums/10149/PA140033.JPG>

typical uris and types I found:
– these are of type foaf:Person

– these are of type foaf:Image

this has got type gallery:

these are blogs:
type: http://purl.org/dc/elements/1.1/Collection

a document:
type: http://xmlns.com/foaf/0.1/Document

type: http://xmlns.com/foaf/0.1/Group

ok, looking at the opera community site we see that there is a guy named Words who is member of the week, meaning he is very active. So sparqling for Words looks like this:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?p ?o WHERE {<http://my.opera.com/Words/xml/foaf#Words> ?p ?o.}

Well, this brought us his foaf:knows relations and his weblog uri. cool. Lets make a backlink search:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?s ?p WHERE { ?s ?p <http://my.opera.com/Words/xml/foaf#Words>. }

this brought some pictures he created, connected via dc:creator uri: http://purl.org/dc/elements/1.1/creator.

and much more….
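
If you want to script these lookups, the forward and the backlink search are the same triple pattern with the uri in a different position. A small Python sketch that builds both queries and the GET url for them follows. Careful: `ENDPOINT` is a placeholder I made up, not the address of the opera service, and passing the query in a `query` parameter is an assumption about how the endpoint wants it.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Placeholder endpoint, NOT the real opera community service.
ENDPOINT = "http://example.org/sparql"
WORDS = "http://my.opera.com/Words/xml/foaf#Words"


def forward_query(uri):
    """Everything the resource says about itself: uri as subject."""
    return "SELECT ?p ?o WHERE { <%s> ?p ?o . }" % uri


def backlink_query(uri):
    """Everything that points at the resource: uri as object."""
    return "SELECT ?s ?p WHERE { ?s ?p <%s> . }" % uri


def request_url(query):
    """URL-encode the query into a GET request url (assumed 'query' parameter)."""
    return ENDPOINT + "?" + urlencode({"query": query})


url = request_url(backlink_query(WORDS))

# Round trip: decoding the url gives back the exact query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

Nothing is sent over the network here; fetch `url` with whatever HTTP client you like once you know the real endpoint.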

ok, this should be enough to get people on the track to build guis and web 2.0 stuff on top of opera community.

thanks to opera: the users data is users data now, not theirs. cool move.

abusing the semantic web

Rickard Öberg blogs here about “abusing” piggy bank to do something with spreadsheets. First of all, Jean Rhomer said that “Excel is the competition of the Semantic Web” which is exactly where we all are heading.

Second, I wonder if Aperture would help Rickard in his task: he mentions automating the import of the Excel sheets. In Aduna Metadata Server, using a DataSource, this would be possible. In Aperture (+Sesame2) it will be possible, soon.

The idea is to create an RDF datasource out of an OpenOffice XML file (a spreadsheet of people and how long they work) and use XSLT for the translation to RDF (hm, sounds like an obvious idea, now buzzworded scioc?). Then this datasource is queried from time to time (doing the XSLT magic) and the result is stored on a server (piggybank or gnowsis or whatever).
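
To show what the translation step boils down to, here is a rough Python sketch that turns spreadsheet rows into N-Triples lines. It swaps the XSLT for plain Python and skips reading the OpenOffice file altogether: the rows are hard-coded, and the base uri, the `hoursWorked` property and the people are invented for the example.

```python
def literal(value):
    """Format a string as an N-Triples literal (escapes backslash and quote)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"%s"' % escaped


def rows_to_ntriples(rows, base="http://example.org/people/"):
    """Turn (name, hours) rows into N-Triples lines.

    The subject uri is derived naively from the name; this is fine for the
    demo names below, real data would need proper uri encoding.
    """
    lines = []
    for name, hours in rows:
        subject = "<%s%s>" % (base, name.lower().replace(" ", "-"))
        lines.append("%s <http://xmlns.com/foaf/0.1/name> %s ."
                     % (subject, literal(name)))
        # hoursWorked is a made-up property for this sketch
        lines.append("%s <http://example.org/terms#hoursWorked> %s ."
                     % (subject, literal(str(hours))))
    return lines


# Stand-in for the spreadsheet: one row per person.
rows = [("Ada Example", 32), ("Bob Sample", 40)]
ntriples = rows_to_ntriples(rows)
```

Run on a schedule, the resulting lines are what would get pushed to the server, whichever one it is.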


jause – brotzeit

This Sunday there is a Jause at Mum’s. My German office colleague, once again, doesn’t understand that, hence this attempt at an explanation.

A Jause is a short meal that is often eaten on the go. A typical Jause would be stopping with the family on the bank of the Danube during a bicycle trip to eat buttered bread and radishes. After half an hour we ride on. You can also have a Jause at home.

Wikipedia: Jause

uri crisis – what do URIs identify?

Still, we don’t use the Semantic Web broadly, and I think one problem is that we don’t find the right uris to identify ideas/people/things, concepts from the real world. The discussion about the URI crisis does not happen at conferences and in articles, but every time somebody proposes a new uri scheme to identify books, life science terms, etc. Then masses of people flame each other on mailing lists.

There are nifty approaches around to cure the identity crisis (like this here), but they all miss the problem, because the problem is much deeper.

I usually then write one email saying “uri crisis again” to point out that the problem is unsolved.

So, what is the Uri crisis about?

basically, this text TimBl has written describes it best:

What does “http://www.amazon.com/exec/obidos/ASIN/0679600108/qid=1027958807/sr=2-3/ref=sr_2_3/103-4363499-9407855” identify?

  1. A whale
  2. “Moby Dick or the Whale” by Herman Melville
  3. A web page on Amazon offering a book for sale
  4. A URI string
  5. All the above

When was the thing it identified last changed?
Have you read the thing it identifies?

it is part of the article “what do HTTP Uris identify?”

So we don’t have good uris to identify people, concepts, books, etc., because a Uri has more than one meaning.

This is explained very well in David Booth’s article “Four Uses of a URL: Name, Concept, Web Location and Document Instance”, which comes to this conclusion:
One point seems clear. In using URLs to identify concepts (such as “http://x.org/love”), we need conventions for denoting each of these four things: name, concept, Web location and document instance.

Then there was also an article by Steve Pepper on how the semantics of Topic Maps could help, titled “Curing the Web’s Identity Crisis”.
In his introduction Pepper writes:

In an important recent article on XML.com entitled “Identity Crisis” [Clark 2002], Kendall Clark addresses the issue of “identity” as it pertains to the World Wide Web. Clark quotes the description of the Web by the W3C’s Technical Architecture Group (TAG) in Architecture of the World Wide Web [Jacobs 2002], as a “universe of resources”, where “resource” is to be understood according to the definition given in [RFC 2396] as being “anything that has identity”. Clark points out that the concept of “identity” itself is nowhere defined and moreover is severely problematic.

He cites the article “Identity Crisis” by Kendall Grant Clark. In his introduction to the problem, Clark says:
The Identity of Resources.
In the APW’s view, the Web is a “universe of resources”. So far, so good. But what is a resource? The APW adopts the definition of resource from RFC
, a definition which has always made me uneasy, though probably because I’m still more inclined to think of these things like a philosopher than like a programmer or software system architect.

So, it is a philosophical problem. Ah. Now we are getting somewhere. Sadly, every time leobard tried to get a philosopher on this track, saying things like “I think that URIs will change the way we identify abstract concepts, a change that is fundamental to our constructivistic worldview”, the philosophers answered: you young nerd, read 10 kilos of philosophy books and come back. Sure, but I won’t spend any time on that.

So, face it. The meaning of what a URI identifies is not defined. Hence, when TimBl announces he has a URI now and a Foaf file, what does this mean?

That we should identify the concept “Person named Tim Berners-Lee” using this uri?

perhaps, and perhaps that’s the way it works: you explicitly say that you identify the concepts you have in your mind using the URI you find most appropriate. When other humans copy your behaviour (and copy/paste your uri), URIs will identify concepts. Hm, perhaps.

So, next time I shout “You are facing the Uri crisis”, don’t answer “I never heard there was a crisis” or “are we out of uris?”; think of a solution instead.

the great escape 2

took place this evening at the Glocke in Kaiserslautern.

The topics were:

Stephan Baumann – Cultural Hacking.
A presentation made of slides and videos, a mix of words, images and a little sound. Delivered without speaking, but with the slides cut to the rhythm of the music and mixed live. Good topic.

Martin Wasniowski – Urban Calligraphy and Beyond

projects from a book, urban calligraphy. Then photos of street art. Later he presented the project germany.yellowarrow.net and handed out a few stickers right away.

Florian Groß – liebe-deine-stadt.de

a project that is still being set up. It comes from the direction of architecture and has a good theoretical background. liebe-deine-stadt.de

there are photos too, as always collectively on flickr:


(careful, there are also pics from older events)

ISWC2005 trip: Friday, 11. November 2005

going home. just a relaxing day in galway, making writeups of some ideas, photos, etc. At shannon airport we had wireless, which explains this post. and this photo:

Three hackers found wifi at shannon airport

During the flight I had an idea for a nice paper, perhaps this will be on ESWC, we’ll see.

Arrived safely at Frankfurt Hahn Airport, Bertin joined me in my car and we had a nice drive home. I think not a single semantic topic was raised during this car drive. relaxing. At about 00:19 I got to the train station where Ingrid had arrived at 00:17, so our love is worldwide and brings us together on time, she had just arrived from Vienna.

after all: got to see all these people from the community, love them all, many topics to research and many places to keep on hacking the semantic web together. All these new foaf:knows triples should help me to include more projects in gnowsis and replace some of my work with work by others. Hope the semantic web comes alive soon. At least these blog entries have urls.

special greetings to stefan decker and his family.
greetings to captsolo for showing us town, Ina, danbri, libby, kao, chrisb, kim, jack, sibel, christine, york, danc, peter, alistair, sergio, stefano, haibo yu, alfredo, emmanuel, malte, ansgar, bertin, michael, dfki, and everybody else out there.