Mozilla RDF Javascript support

Part II of the series diving into Mozilla.

Simple XUL example

To get going with RDF in Mozilla, it helps to build a Hello World kind of application. I did this by way of “jslib”, a JavaScript library that helps Mozilla developers.

Step1 – Download and Install jslib
from http://jslib.mozdev.org/downloads/index.html
and install it in your Mozilla (it is an XPI, so no problem there).
Test the library by opening this URL: chrome://jslib/content/
See also the installation doc.

Step2 – write a XUL file to test
I did it with this ugly file that extracts the first name from my public FOAF file: rdflib_hello (xul, 1 KB).

Step3 – configure it to run
The problem is that the XUL file must be placed where XUL files are usually placed. If you know how to do this, fine. If you don’t, you have to configure jslib so that it accepts files outside the chrome. This may be a security risk. See the description of how to use jslib from local XUL files.

Step4 – run rdflib_hello.xul
Start Mozilla, go to “open file” and open the XUL file (or use chrome:// if you managed to put it in your chrome).
You should see a single button. Press it and the string “Leo” should appear.

What it does:
It loads my FOAF file from my public homepage and extracts a literal property from a resource. Mozilla has fine XPCOM objects for this, but XPCOM is hard to use, so the jslib guys built a set of JavaScript helper objects to handle RDF. The script loads these jslib functions. Then, in the function testresult(), it uses an RDF object to fetch the RDF data from the homepage, select a resource from it and query an attribute of that resource.
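
For readers without a Mozilla at hand, the same idea (load RDF, pick a resource, read one literal) can be sketched in a few lines of Python. This is not the jslib API and not the code in rdflib_hello.xul. The FOAF snippet, the resource URI and the function name first_name are made up for illustration, and the file is read as plain XML instead of going through a real RDF parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical FOAF document; a real one would be fetched from a homepage.
SAMPLE_FOAF = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person rdf:about="http://example.org/people#leo">
    <foaf:firstName>Leo</foaf:firstName>
  </foaf:Person>
</rdf:RDF>"""


def first_name(rdf_xml, resource_uri):
    """Return the foaf:firstName literal of the given resource, or None."""
    root = ET.fromstring(rdf_xml)
    for node in root:
        # Select the resource by its rdf:about attribute.
        if node.get("{%s}about" % RDF) == resource_uri:
            literal = node.find("{%s}firstName" % FOAF)
            if literal is not None:
                return literal.text
    return None


print(first_name(SAMPLE_FOAF, "http://example.org/people#leo"))
```

Reading RDF/XML as plain XML only works for this one simple shape of document; a real RDF parser is the safer choice for anything beyond a toy.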

You can also use the XPCOM RDF objects directly. So, at first glance Mozilla proves to be RDF-capable.

starting to dive into Mozilla

I started today to work through the programming examples from the lovely book Rapid Application Development with Mozilla by Nigel McFarlane. Reading software books without doing the programming is like watching The Simpsons without sound: senseless. So I will spend some time doing the examples.

At DFKI we are using Mozilla as a browser and adding some nice RDF features, so I had to learn Mozilla anyway. The book is rather fresh; the author announced it on RDF-IG some time ago and I have had it lying around for weeks.

RDF in Mozilla

For those of you who don’t know yet: Mozilla is soaked with RDF!
And it has got lots of XML and JavaScript, an environment where I will feel cosy.

Mozilla is a very fine example of “how to use RDF and benefit”. It uses RDF to configure the platform and to communicate between backend and user interface. Great! To get started, look into your MOZILLA_HOME/chrome directory and the file chrome.rdf. They use seq tags in a nice and easy way.

Perhaps this is the beginning of a longer-lasting relationship between Mozilla and me, we’ll see…

Magpie

Hm, I just had another look at Magpie.

Being a Semantic Web hacker I had to download it and try it out.

Hm, it looks like you have to download “all knowledge there is” (aka the ontology), and then Magpie uses the literals in this ontology to find similar strings in the document.

Have a look at the video; we all have to make videos for our projects. Inspiring.

What I also found out is that the “right click” menu on found items uses a web service. Magpie then opens a website like this: magpie link.
If you look at the parameters of the URL, you will see that the ontology is identified by a string value, and so is the requested resource.

– HM –

The idea is good and I think I can learn something here. Have to check out the publications though. Also the Internet Explorer integration is nifty. The user interface is fine. I think the architecture does not scale. Best part: it adapts to IE and does not need a whole haystack of applications 🙂

public calendars

Coordinating my appointments with someone else is hard enough.

Some sites offer solutions.

http://icalshare.com/ – public iCalendar files

http://www.eventsherpa.com/ – proprietary calendaring application that is iCal based. They want you to host your calendar on their site. Which is also ok.

I also looked at MozCal’s calendar sharing ability and thought about using KDE’s KOrganizer. The two didn’t agree on the same iCal lingo, so I have to wait for newer versions.

Nice things for gnowsis

I stumbled across some nice features I could use for www.gnowsis.com.

Features like Samsung contact. They are also on Freshmeat.

This leads to the question: Is it feasible to implement an Open Source MAPI interface? By doing this, we could fool Outlook into storing its information in an RDF store, perhaps server based.
otlkcon tries to do it.

Bynari Insight Connector is the “bring two good ideas together” approach. It uses an IMAP server as a storage device to host MS-Outlook data. That is plain GREAT. It implements MAPI and fools Outlook into storing all its data into Bynari. Bynari then forwards everything to IMAP. Don’t wanna know what tweaks they did on the IMAP side, anyhow.

kSpaces

kSpaces is a metadata-driven, distributed knowledge management platform. It was designed to be lightweight, transparent and extensible. The kSpaces proof-of-concept allows files to be described with arbitrary RDF metadata. These descriptions can then be easily shared with and queried by other nodes in the system. Finally, kSpaces-managed files can be made available to all other nodes participating in the same kSpace.

Finally someone with good ideas and a practical implementation. We have to see if kSpaces can be plugged together with gnowsis. I am looking forward to digging deeper into this code.

The Holy Bible as Placeless Content

Today we hype about blogs and are crazy about hyperlinks, URIs and the web. We quote content of others using hyperlinks. Messages spread through hyperlinks. If a cool blog has some content, it will spread.
Christian Bible usage has some interesting similarities to Semantic Web stuff. The Holy Bible can be seen as a small Semantic Web itself. Some social and cultural practices around the Bible are similar to Semantic Web practice.

Protestant Way

From the Protestant view, only the Bible itself gives authoritative information about the faith. Secondary literature is not authoritative; it can only illuminate and go deeper into what is already written in the Bible. If you want to know the real stuff, you have to use a real Bible (not a Firebible).

So knowledge of the Bible is a precondition for knowing and living the faith the Protestant way.

Biblical Authority

For a Protestant, the meaning of biblical terms can only be defined by the Bible itself and the little information we have from secondary literature of the biblical age. And of course by the personal experience of the Holy Ghost, but I will exclude Him from this essay.
William Barclay used all the historic sources he could get to write his Bible Commentary, but in the religious world the Bible is always seen as more trusted than other texts from the ancient world. This may be good or not; I will not discuss this, you surely have your own faith about it.
But certain tools have been built to make it easier for Bible enthusiasts to live their Christian faith.

  • Normative Identifiers (aka URIs)
  • Cross References (aka Hyperlinks)
  • Blogging
  • Indices (aka Search Engines)
  • Chain Systems (aka Link Collections)
  • Excessive Quoting (aka blogrolling)

I will now show what these points mean for a practicing Christian and how they are related to current hypertext systems.

Normative Identifiers

Some wise people had the idea to give all books in the Bible names and use them to identify the books. The books are called “John, Luke, Psalms, Revelation” etc. If there are two books with the same name, they get a unique integer prefix, ordered by publishing date. That’s where “1Moses, 2Moses, 1Joh, 2Joh” come from. At first the Latin and Greek names were enough; today we can map them to all languages.
Inside a book, a division into chapters was made. So we have John 1, 2, 3, 4, 5 … 21. In each chapter the verses were numbered. The verse number is written after the chapter number, normally like “John 3,15”. And voilà, with this system we can identify Bible passages in all languages and cultures, and over the last 1000 years. Great, isn’t it? If you read age-old books about the Bible, they use the same URIs as we do. Where do you find this persistence anymore? If I go to St. Stephen’s Cathedral in the centre of Vienna, there are stones from the 15th century quoting Bible passages, using the same identifiers as I am using here to quote Joh 3,15 (this is in German). So referencing works in all Bible translations, in all languages and cultures. Great.
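
To show how regular this identifier scheme is, here is a small Python sketch that splits a reference like “John 3,15” into book, chapter and verse. The names BibleRef and parse_ref are made up for this post, and the pattern only covers the “Book chapter,verse” form used here, not ranges or other notations.

```python
import re
from typing import NamedTuple


class BibleRef(NamedTuple):
    book: str      # e.g. "John" or "1Joh"
    chapter: int
    verse: int


# Optional one-digit book number, book name, then "chapter,verse".
_REF = re.compile(r"^\s*(\d?\s?[A-Za-z]+)\s+(\d+)\s*,\s*(\d+)\s*$")


def parse_ref(text):
    """Parse a reference such as 'John 3,15' into its three parts."""
    match = _REF.match(text)
    if match is None:
        raise ValueError("not a Bible reference: %r" % text)
    # Normalise "1 Joh" and "1Joh" to the same book identifier.
    book = match.group(1).replace(" ", "")
    return BibleRef(book, int(match.group(2)), int(match.group(3)))


print(parse_ref("John 3,15"))
print(parse_ref("1Joh 4,8"))
```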

Cross References

The nifty thing about Bible reading is that when you don’t get it, you switch to another passage about the same thing. The authors quote all the time. Even Jesus does. And they quote passages that were already age-old in their time. When I as a reader do not understand a passage or want to know more about it, I can usually find these neat references in the margin of my Bible. In the online Bible, they are footnoted at the end of a chapter.
Sometimes people quote well-known parts of the Old Testament, and for the modern reader the references to these parts are given. For example, in Acts 7 Stephen gives a talk about the bad things that happened to the prophets before him. After the passage you find many cross references. (Sadly the audience did not listen and killed him.) In a paper Bible you would be very fast at finding the referenced passages. Win a Bible quiz. Some online Bibles are slower than paper-based systems….

Blogging

As we see, Bible content is not really structured by topic. Most of the books are a collection of stories written by different people. Most are about what these people did, like a diary, or what they heard that other people did, like a newspaper. The newer articles in the Bible often quote the older articles. In the New Testament, the four gospels and Acts form a blogosphere of four people. (Acts was written by Luke, we assume.)
The Bible a blogosphere? Sure. They all write down what happened, from their point of view or as they heard it. They quote the historical persons or each other. They write chronologically. Some Bible-savvy guys assume that some apostles had the wise idea of taking small notes while things happened and later wrote the gospels. Isn’t this enough to state that the four gospels are blogs? At least they are heavily linked in the letters and in literature.

Indices

If you don’t know where to start your Bible study about “Wine”, you can rely on a Bible index. The idea of an index is to have an alphabetical list of all relevant words in the Bible and then have a list of all links to passages where this word appears. My own paper-based index book has the advantage that it only quotes the most relevant passages and it has the context around the word, to know what the passage is about. So when I search for “Wine” I see part of the sentences containing the term, which helps a lot. Hm, don’t modern search engines do this as well?

There are different types of indices, from Full Blown Bible Blasters with all words in every sentence to smaller, slightly slimmer scriptures that fit in your pocket. Anyway, indices that also list a little context are great in the Christian and in the Semantic Web world. As in Google, the often-referenced passages have a higher chance of being placed in a selective index.

Chain System

My Scofield Bible is organised in a fascinating way. In addition to the use of cross references (hyperlinks), it has so-called “chains”. A chain is a connection of several Bible passages about the same topic. For example, there is a chain about “grace”. It starts at Joh 1,4 and ends at Revelation 22,21.
There are 72 selected terms in my Bible, like “Antichrist”, “Christ”, “Sabbath” etc.
Surely these chains were selected by a non-divine author, and therefore there is more than one chain system. It reminds me very strongly of Vannevar Bush’s idea of “trails” in the memex article.
Doing trails is OK, but when you write your own ideas about a certain topic, you get heavily flamed in the Christian world: Scofield Conspiracy Theory.

Excessive Quoting – Blogrolling

If a Christian teacher or preacher writes some text and publishes it, she or he has to cross-link the text to the Bible. You won’t find a text that isn’t filled with “I tell you this and that, as it is said in Luke X.Y”. If Christian content is discussed, there will always be Bible references. This is a good practice, as it allows the recipient of the content to integrate the new information into her or his existing knowledge about faith. Like anchors, the references allow us to attach the new, contemporary content to historic places we know. If someone talks about God’s grace, he has to quote some of the famous Bible passages that mention grace.
This reminds me of us modern researchers: if you don’t make one reference every five lines of text, you aren’t considered cool and worth reviewing. But in Semantic Web times, you have to link to related popular articles to get yourself a good place in the search results.

Revelation

So what has all this got to do with the current discussion on RDF-IG and #rdfig? Well, hyperlinks and URIs are very old stuff and we can be happy that we use them. The Holy Bible has been around for about 1900 years and people using it have invented some cool tools and social practices. There is a globally agreed identification system, aka Bible references. Protestant belief denies the authority of non-biblical material, so the only way to really know what the Bible says about Love is to read all the Bible passages about it. This is a healthy view of objectivity: You have to read all cross-linked related material to get a view.

Heavy quoting and cross-hyperlinking is good; it helps the reader of the Bible to find passages that are related to the passage in the current context. Index systems and full-text indices have been in the arsenal of witty, Bible-proof Christians for the last thousand years. That’s why they always find these Bible quotes so amazingly fast.

In contemporary christian literature, hyperlinking to the Bible is used to Semantically annotate the new content and thereby classify it. Related material can be found by searching for other work that cites the Bible passages.

A single web document alone is not authoritative. Only all related material together gives a good impression of what people think about a topic. Link collections, cross-linking and quoting help us to find this related material. “Famous and historical” resources form some kind of anchor for us, like biblical passages do for contemporary Christian literature.

To a savvy Christian, all this Semantic Web stuff is a thousand years old, and we relax by quoting Solomon (Ecc 1,9):
“What has been will be again,
what has been done will be done again;
there is nothing new under the sun.”

And thanks to Michael Zeltner for bugging me to blog this crazy idea.

Query Languages Report

RDF-Query

A report by AIFB and by Jeen Broekstra from the Sesame crew. The authors know what they are talking about, as they are SemWeb developers themselves.

Although it contains a little self-advertisement and misses some languages, it’s a good read. If you need info about RDF query languages, read it.

My previous demand about “optional joins in queries” is answered by SeRQL.

why I love Patrick Stickler’s URIQA approach

Ever tried to convert data into RDF? Extract something from iCalendar or an MP3 file and then use a bit of RDF? Have it all in a graph? Then you may be interested in how to choose your weapons wisely: if you want a fast and easy way to do RDF integration, follow Patrick Stickler and his URIQA ideas.

Today I had another day of fighting with gnowsis, my desktop integration framework. The task was to extract data from MS-Outlook, on demand. Both the output format and the request format are RDF; I used iCalendar/RDF as the ontology.

Gnowsis wraps all read access to Outlook in a Jena Model. Outlook is represented as a Jena Model; each resource in Outlook gets a URL and is an RDF resource. So, for example, a query like
SELECT ?x, ?y WHERE (<rdfp://leo.gnowsis.com/msoutlook/appointment/00000000B2CDC30BFF2EED4ABA9C61436A07FE3384002000> ?x ?y)

returns RDF/XML like this: QueryResult.xml (xml, 1 KB)

As you may note, there are some “in between resources” where usually you would find anonymous resources: the properties dtstart and dtend have as object the (normally anonymous) resources “…#Start” and “…#End”.

This helps with Jena internals: my Jena model is dynamic, it has no storage backend. Whenever a query hits the model, it searches for resources and creates new triples. Anonymous nodes in between would break this model: if I got an anonymous object in dtstart, I could not easily issue a request for the cal:dateTime value inside.

At this point, consider reading some example RDF/iCal. If you don’t have any clue what I am talking about, check out this: TestTermin.rdf. It is a longer version of the same VEvent entry.

So how can I get the anonymous nodes and will it be efficient?

This is a flaw in gnowsis: each triple is generated by a Java class that represents the property in the triple. Triples containing the “summary” property are created by a corresponding “summary” Java class; the class checks out the subject and finds a correct object value (and vice versa).

So gnowsis adapts every property with a Java class. This can get too big when an ontology uses many properties and classes. iCalendar is already too much for me to program.
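
To make the one-class-per-property design concrete, here is a rough Python sketch (gnowsis itself is Java, and this is not its code). The names PropertyAdapter, ADAPTERS, describe and the FAKE_OUTLOOK dictionary are invented for illustration; the namespace is the RDF Calendar one, assumed here to match the iCalendar/RDF ontology mentioned above.

```python
CAL = "http://www.w3.org/2002/12/cal/ical#"

# Stand-in for the Outlook store: appointment URI -> raw fields (made up).
FAKE_OUTLOOK = {
    "rdfp://example.org/msoutlook/appointment/1": {
        "Subject": "Project meeting",
        "Location": "Room 2",
    },
}


class PropertyAdapter:
    """Generates the triples for exactly one RDF property."""

    def __init__(self, prop, outlook_field):
        self.prop = prop
        self.outlook_field = outlook_field

    def triples(self, subject):
        fields = FAKE_OUTLOOK.get(subject, {})
        if self.outlook_field in fields:
            yield (subject, self.prop, fields[self.outlook_field])


# One adapter per property: this list grows with the ontology.
ADAPTERS = [
    PropertyAdapter(CAL + "summary", "Subject"),
    PropertyAdapter(CAL + "location", "Location"),
]


def describe(subject):
    """Ask every adapter in turn; triples are created on demand, not stored."""
    return [t for adapter in ADAPTERS for t in adapter.triples(subject)]


for triple in describe("rdfp://example.org/msoutlook/appointment/1"):
    print(triple)
```

Two properties are harmless, but an ontology with many properties needs one entry in ADAPTERS (and one piece of extraction logic) for each of them, and that is where the approach gets too big.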

So enter Patrick Stickler and his URIQA

Most applications need – on the client side – only data about one or a few resources. You want to see an email, a person, an event. Then you may want to write an email, add the event to your calendar or do similar stuff.

To extract the needed data from a Jena Model you will (based on RDQL) need many queries, even for a single resource, as RDQL has no optional joins. This is a problem, and other people have it too. See, for example, the Veudas Announcement.
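
Why the missing optional join hurts can be shown with a toy example in Python. This is neither RDQL nor Jena; the triples and the functions required_join and optional_join are made up. A plain conjunctive query drops every resource that lacks one of the requested properties, so without optional joins you end up asking for each property separately.

```python
TRIPLES = {
    ("ex:meeting", "cal:summary", "Project meeting"),
    ("ex:meeting", "cal:location", "Room 2"),
    ("ex:lunch", "cal:summary", "Lunch"),  # no location recorded
}


def value(subject, prop):
    """Return one object for (subject, prop), or None if there is none."""
    for s, p, o in TRIPLES:
        if s == subject and p == prop:
            return o
    return None


def subjects_with(prop):
    return sorted({s for s, p, _ in TRIPLES if p == prop})


def required_join(props):
    """Plain conjunctive query: a row needs a value for every property."""
    rows = []
    for s in subjects_with(props[0]):
        row = [value(s, p) for p in props]
        if None not in row:
            rows.append((s, *row))
    return rows


def optional_join(required, optional):
    """Rows need the required property; optional ones may stay None."""
    return [
        (s, value(s, required), *[value(s, p) for p in optional])
        for s in subjects_with(required)
    ]


# The lunch appointment silently disappears from the first result.
print(required_join(["cal:summary", "cal:location"]))
print(optional_join("cal:summary", ["cal:location"]))
```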

The solution is to get bigger chunks of RDF in one request.

What is an RDF chunk and how do I get it?

When you want to work with RDF data, you normally need the data about one resource (at best, the resource is identified by URL or downloadable). If the resource points to other resources via RDF triples, you may want to load the other resources, too.

For example, if I have an appointment with JohnDoe, I may start at the appointment and then load the resource containing data about JohnDoe.

So when I load the chunks, how would I load them and what should they contain?

Most people will want “RDF subgraphs”, and I greatly prefer them to “variable-bound result sets” (like RDQL will give you). You will get such a subgraph from your RDF server via a protocol. A possible protocol may be URIQA. You may also do it with Joseki or Sesame.

And what should be contained in the RDF subgraph? Patrick Stickler has an answer that matches my wishes: he defined the Concise Bounded Description.
Everything that I need immediately, with the option to get more if I want.
Patrick says:
A concise bounded description of a resource is a body of knowledge about a named resource which does not include any explicit knowledge about any other named resource.
More about Concise Bounded Descriptions is on the URIQA page.
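
My reading of that definition can be sketched in Python: take all triples whose subject is the resource, and follow objects only when they are anonymous (blank) nodes, never when they are named resources. This is a simplified toy, not Patrick’s specification, which may cover more cases than this. The function names and the sample graph are made up, and blank nodes are simply strings starting with “_:”.

```python
def is_blank(node):
    """Blank nodes are written with an N-Triples style '_:' prefix here."""
    return isinstance(node, str) and node.startswith("_:")


def concise_bounded_description(triples, resource):
    """Collect the triples about `resource`, following only blank nodes."""
    result = set()
    todo = [resource]
    seen = set()
    while todo:
        subject = todo.pop()
        if subject in seen:
            continue
        seen.add(subject)
        for triple in triples:
            s, _, o = triple
            if s == subject:
                result.add(triple)
                # A blank node has no name of its own, so its description
                # belongs to the chunk; a named resource is only referenced.
                if is_blank(o):
                    todo.append(o)
    return result


GRAPH = {
    ("ex:meeting", "cal:summary", "Project meeting"),
    ("ex:meeting", "cal:dtstart", "_:start"),
    ("_:start", "cal:dateTime", "2004-03-01T10:00:00"),
    ("ex:meeting", "cal:attendee", "ex:JohnDoe"),
    ("ex:JohnDoe", "foaf:name", "John Doe"),
}

for triple in sorted(concise_bounded_description(GRAPH, "ex:meeting")):
    print(triple)
```

The chunk for the meeting contains the link to ex:JohnDoe but not his name; to get that, you ask for the description of ex:JohnDoe in a second request.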

So Concise Bounded Descriptions are the kind of chunks we would like to retrieve from any RDF host or RDF publishing application. Why this is so good I will tell now.

Using chunks instead of accessing single triples has several implications:

  1. Easier to write extraction algorithms
  2. Much faster and more efficient
  3. Addressable chunks
  4. Access is restricted to chunks; single triples cannot be retrieved

Now for a deeper look at the implications:

1 Easy Extraction algorithms

For gnowsis, I needed to wrap every single rdfs:Property and rdfs:Class. This is good for some applications but not for all. Complicated tasks are better done by “hand-written” extractors, programs like the many Perl/Python scripts that convert stuff to RDF. You can find many examples written by TimBL, DanC and others at SWAP and some in cal-space, for example fromIcal.py.

Most of these converters and tools supply single resources; they convert a single file or similar.

It is easier to write RDF integration for “chunks” of RDF; the many existing adapters and the experience with them show this.
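
As an illustration of how small such a hand-written extractor can be, here is a Python sketch that turns one VEVENT into a chunk of triples. It is not fromIcal.py and it skips most of iCalendar (no line folding, no recurrence, parameters are dropped). The sample data, the base URI and the function name vevent_to_triples are made up, and using the RDF Calendar namespace is an assumption.

```python
CAL = "http://www.w3.org/2002/12/cal/ical#"

SAMPLE_ICAL = """BEGIN:VCALENDAR
BEGIN:VEVENT
UID:event-1@example.org
SUMMARY:Project meeting
DTSTART:20040301T100000Z
DTEND:20040301T110000Z
END:VEVENT
END:VCALENDAR"""


def vevent_to_triples(ical_text, base_uri):
    """Turn the first VEVENT into a list of (subject, predicate, object)."""
    fields = {}
    in_event = False
    for line in ical_text.splitlines():
        name, _, value = line.strip().partition(":")
        if (name, value) == ("BEGIN", "VEVENT"):
            in_event = True
        elif (name, value) == ("END", "VEVENT"):
            break
        elif in_event:
            # Drop parameters such as ";TZID=..." and lower-case the name.
            fields[name.split(";")[0].lower()] = value
    # The UID becomes part of the resource URL (raises KeyError if missing).
    subject = base_uri + fields.pop("uid")
    return [(subject, CAL + prop, value) for prop, value in sorted(fields.items())]


for triple in vevent_to_triples(SAMPLE_ICAL, "http://example.org/cal/"):
    print(triple)
```

The whole event comes out as one chunk in one pass, with no per-property class in sight.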

2 Much Faster and More Efficient

I have written adapters to extract RDF from the filesystem, MP3 files, iCal and Outlook. I have gone so far as to write a specific Java class for every property and class around. This gives me a big overhead. Especially when you use ActiveX or SOAP bridges to communicate with your data sources, you will have a problem.

So it is better to write an extractor very “near” to the source data. Imagine writing an MS-Outlook data extractor in Java or as a C++ DLL: which will be faster? Surely the C++ DLL.
But when you have written your adapter as a DLL, how can the DLL understand queries when they come from an environment like Jena or Sesame?
So the best is to say: DLL, give me the Concise Bounded Description of resource X.

Then the DLL may extract all the stuff and return a chunk of RDF/XML. And if you are really clever, you may have built it in a way that you can pass the XML directly to a calling client.

Think of implementing an Enterprise Integration Server that needs RDF data (read-only); this can be a very neat way to do it. Just use any protocol and Concise Bounded Descriptions.

3 Addressable Chunks

A main problem in Semantic Web is the question: How in the world will I get information about the resource X?

Many people propose to use indirect addressing:
To get a FOAF of Person X, search for
?x <foaf:mbox> <mailto:leo@test.com>

This is a way to address people with the email address “leo@test.com”. Hm. But which server shall I run this query against? www.test.com? smtp.test.com? And which interface should I use? http, smtp, rdfp, uriqa, … ?

So if you use indirect addressing you have to think of a search engine or something like a central register. (which is ok, I respect that and I know many people who do and I like them).

But I prefer another approach, where every resource is addressable by URL. (Perhaps I will write something about this, too lazy now).

It is much easier to work with RDF chunks when the resource identifier URI is also a URL and contains information about the protocol and server where I can get the Concise Bounded Description of the resource.
This even helps when you are only working on desktop integration of data on a single host: there may be a Sesame server and a Joseki server running happily on your machine, so how will you decide which server to contact when you need something?

So I propose: use the power of URLs and fill them with information about how to get data about the resources.
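
A minimal Python sketch of what “the URL tells you where to ask” can mean: split the identifier into scheme and host and look up who is responsible. The SERVICES table, its port numbers and the function service_for are all invented for this example; only the rdfp URL pattern is taken from the query shown earlier.

```python
from urllib.parse import urlsplit

# Hypothetical registry: (scheme, host) -> name of the service to ask.
SERVICES = {
    ("rdfp", "leo.gnowsis.com"): "local gnowsis adapter",
    ("http", "localhost:8080"): "local Sesame server",
    ("http", "localhost:2020"): "local Joseki server",
}


def service_for(uri):
    """Pick the service from the scheme and host part of the URL itself."""
    parts = urlsplit(uri)
    return SERVICES.get((parts.scheme, parts.netloc))


print(service_for("rdfp://leo.gnowsis.com/msoutlook/appointment/42"))
print(service_for("http://localhost:2020/books/1"))
```

With indirect addressing there is nothing to split: the mailbox in the query says who is meant, but not where the data lives.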

4 Only chunk access, no single triples

If you work only on Concise Bounded Description chunks and cannot query a server for individual triples, this may be inefficient! Yep, it may be, especially if you have “directory” resources, for example a folder resource that is connected to thousands of files.

In this case your adapters may always have to build the whole chunk and then return it to you, so you can filter it on client side. But this is not the last word. You can use an adaptive Framework for such queries (like gnowsis framework). Or you can use another query interface only to do the selection of resources.

For example, the selection of “all appointments I have in the next seven days” may be a complicated search in your iCalendar data. If you are a badass hacker, you may want to write a query engine that does the trick for you; as long as you support SeRQL or RDQL, I am happy hitting your server with queries.

When I then have my list of needed Resource Identifiers, I can use URIQA or any other Concise Bounded Description compatible server to get my RDF chunks.
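
The two steps (first select identifiers, then fetch chunks) might look like this in Python. Everything here is a stand-in: APPOINTMENTS replaces the calendar, select_upcoming replaces a real SeRQL or RDQL query, and fetch_chunk replaces a request to URIQA or another server that returns Concise Bounded Descriptions.

```python
from datetime import datetime, timedelta

# Made-up calendar: resource URI -> start time.
APPOINTMENTS = {
    "ex:dentist": datetime(2004, 3, 2, 9, 0),
    "ex:review": datetime(2004, 3, 5, 14, 0),
    "ex:holiday": datetime(2004, 4, 1, 0, 0),
}


def select_upcoming(now, days=7):
    """Step 1: the query interface returns identifiers only."""
    end = now + timedelta(days=days)
    return sorted(uri for uri, start in APPOINTMENTS.items() if now <= start < end)


def fetch_chunk(uri):
    """Step 2: stand-in for asking a server for the chunk of one resource."""
    return [(uri, "cal:dtstart", APPOINTMENTS[uri].isoformat())]


now = datetime(2004, 3, 1, 12, 0)
for uri in select_upcoming(now):
    print(fetch_chunk(uri))
```

The selection can be as clever as the server likes; the client only ever sees a list of identifiers and then whole chunks.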

to sum it up

Building RDF aggregators is not an easy task. I have tried it in many different ways, with varying degrees of success. If you want an easy approach that does it, think about concise bounded descriptions.