kSpaces

kSpaces is a metadata-driven, distributed knowledge management platform. It was designed to be lightweight, transparent and extensible. The kSpaces proof-of-concept allows files to be described with arbitrary RDF metadata. These descriptions can then be easily shared with and queried by other nodes in the system. Finally, kSpaces-managed files can be made available to all other nodes participating in the same kSpace.

Finally, someone with good ideas and a practical implementation. We will have to see whether kSpaces can be plugged together with gnowsis. I am looking forward to digging deeper into this code.

The Holy Bible as Placeless Content

Today we hype blogs and are crazy about hyperlinks, URIs and the web. We quote other people's content using hyperlinks. Messages spread through hyperlinks. If a cool blog has some content, it will spread.
Christian Bible usage has some interesting similarities to Semantic Web stuff. The Holy Bible can be seen as a small Semantic Web itself, and some social and cultural practices around the Bible are similar to Semantic Web practices.

Protestant Way

From the Protestant view, only the Bible itself gives authoritative information about the faith. Secondary literature is not authoritative; it can only illuminate and go deeper into what is already written in the Bible. If you want to know the real stuff, you have to use a real Bible (not a Firebible).

So knowledge of the Bible is a precondition for knowing and living the faith the Protestant way.

Biblical Authority

For a Protestant, the meaning of biblical terms can only be defined by the Bible itself and the little information we have from secondary literature dating from the biblical age. And of course by the personal experience of the Holy Ghost, but I will exclude Him from this essay.
William Barclay used all the historic sources he could get to write his Bible Commentary, but in the religious world, the Bible is always seen as more trusted than other texts from the ancient world. Whether this is good or not I will not discuss; you surely have your own faith about it.
But certain tools have been built to make it easier for Bible enthusiasts to live their Christian faith:

  • Normative Identifiers (aka URIs)
  • Cross References (aka Hyperlinks)
  • Blogging
  • Indices (aka Search Engines)
  • Chain Systems (aka Link Collections)
  • Excessive Quoting (aka blogrolling)

I will now show what these points mean for a practicing Christian and how they are related to current hypertext systems.

Normative Identifiers

Some wise people had the idea of giving all books in the Bible names and using those names to identify the books. The books are called “John, Luke, Psalms, Revelation” etc. If there are two books with the same name, they get a unique integer id prefix, ordered by publishing date. That's where “1Moses, 2Moses, 1Joh, 2Joh” come from. At first the Latin and Greek names served well; today we can map them to all languages.
Inside a book, a division into chapters was made, so we have John 1, 2, 3, 4, 5 … 21. In each chapter the verses were numbered. The verse number is written after the chapter number, normally like this: “John 3,15”. And voilà, we can identify Bible passages across all languages and cultures and over the last 1000 years with this system. Great, isn't it? If you read age-old books about the Bible, they use the same URIs as we do; where else do you find such persistence? If I go to St. Stephen's Cathedral in the centre of Vienna, there are stones from the 15th century quoting Bible passages, using the same identifiers as I am using here to quote Joh 3,15 (this link is in German). So referencing works across all Bible translations in all languages and cultures. Great.

Cross References

The nifty thing about Bible reading is that when you don't get a passage, you switch to another passage about the same topic. They quote all the time. Even Jesus does. And they quote passages that were age-old even in their time. When I, as a reader, do not understand a passage or want to know more about it, I can usually find these neat references in the margin of my Bible. In the online Bible, they are footnoted at the end of a chapter.
Sometimes people quote well-known parts of the Old Testament; for the modern reader, the references to these parts are given. For example, in Acts 7 Stephanus gives a talk about the bad things that happened to the prophets before. After the passage you find many cross references. (Sadly, the audience did not listen and killed him.) With a paper Bible you would be very fast at finding the referenced passages and could win a Bible quiz. Some online Bibles are slower than paper-based systems…

Blogging

As we see, Bible content is not really structured by topic. Most of the books are collections of stories written by different people. Most are about what these people did, like a diary, or what they heard that other people did, like a newspaper. The newer articles in the Bible often quote the older articles. In the New Testament, the four gospels and Acts form a blogosphere of four people. (Acts is written by Luke, we assume.)
The Bible, a blogosphere? Sure. They all write down what happened, from their point of view or as they heard it. They quote the historical persons or each other. They write chronologically. Some Bible-witty guys assume that some apostles had the wise idea of taking small notes while things happened and later wrote the gospels. Isn't this enough to state that the four gospels are blogs? At least they are heavily linked in the letters and in the literature.

Indices

If you don't know where to start your Bible study about “Wine”, you can rely on a Bible index. The idea of an index is to have an alphabetical list of all relevant words in the Bible, each with a list of links to the passages where the word appears. My own paper-based index book has the advantage that it quotes only the most relevant passages and shows the context around the word, so you know what a passage is about. So when I search for “Wine”, I see part of each sentence containing the term, which helps a lot. Hm, don't modern search engines do this too?

There are different types of indices, from full-blown Bible blasters with all the words of every sentence to slightly slimmer scriptures that fit in your pocket. Anyway, indices that also list a little context are great in the Christian world and in the Semantic Web world alike. As in Google, often-referenced passages have a higher chance of being placed in a selective index.

Chain Systems

My Scofield Bible is organised in a fascinating way. In addition to the use of cross references (hyperlinks), it has so-called “chains”. A chain is a connection of several Bible passages about the same topic. For example, there is a chain about “grace”: it starts at Joh 1,4 and ends at Revelation 22,21.
There are 72 selected terms in my Bible, like “Antichrist”, “Christ”, “Sabbath” etc.
Surely these chains are selected by a non-divine author, and therefore there is more than one chain system. It reminds me very strongly of Vannevar Bush's idea of “trails” in the Memex article.
Doing trails is OK when you write down your own ideas about a certain topic, but you can get heavily flamed for it in the Christian world: Scofield Conspiracy Theory.

Excessive Quoting – Blogrolling

If a Christian teacher or preacher writes some text and publishes it, she or he has to cross-link the text to the Bible. You won't find a text that isn't filled with “I tell you this and that, as it is said in Luke X,Y”. If Christian content is discussed, there will always be Bible references. This is a good practice, as it allows the recipient of the content to integrate the new information into her or his existing knowledge about the faith. Like anchors, the references allow us to attach the new, contemporary content to historic places we know. If someone talks about God's grace, he has to quote some of the famous Bible passages that mention grace.
This reminds me of us modern researchers: if you don't make one reference every five lines of text, you aren't considered cool and worth reviewing. And in Semantic Web times, you have to link to related popular articles to get yourself a good place in the search results.

Revelation

So what has all this got to do with the current discussion on RDF-IG and #rdfig? Well, hyperlinks and URIs are very old stuff, and we can be happy that we use them. The Holy Bible has been around for about 1900 years, and the people using it have invented some cool tools and social practices. There is a globally agreed identification system, aka Bible references. Protestant belief denies the authority of non-biblical material, so the only way to really know what the Bible says about love is to read all the Bible passages about it. This is a healthy view of objectivity: you have to read all the cross-linked related material to get a view.

Heavy quoting and cross-hyperlinking is good; it helps the reader of the Bible find passages that are related to passages in the current context. Index systems and full-text indices have been in the arsenal of witty Bible-proof Christians for the last thousand years. That's why they always find these Bible quotes so amazingly fast.

In contemporary Christian literature, hyperlinking to the Bible is used to semantically annotate the new content and thereby classify it. Related material can be found by searching for other work that cites the same Bible passages.

A single web document alone is not authoritative. Only all the related material together gives a good impression of what people think about a topic. Link collections, cross-linking and quoting help us find this related material. “Famous and historical” resources form a kind of anchor for us, like biblical passages do for contemporary Christian literature.

To a savvy Christian, all this Semantic Web stuff is a thousand years old, and we can relax by quoting Solomon (Ecc 1,9):
“What has been will be again,
what has been done will be done again;
there is nothing new under the sun.”

And thanks to Michael Zeltner for bugging me to blog this crazy idea.

Query Languages Report

RDF-Query

A report by AIFB and Jeen Broekstra from the Sesame crew. The authors know what they are talking about, as they are SemWeb developers themselves.

Despite a little self-advertisement and some missing languages, it is a good thing to read. If you need info about RDF query languages, read it.

My previous demand about “optional joins in queries” is answered by SeRQL.

Why I love Patrick Stickler's URIQA approach

Ever tried to convert data into RDF? Extract something from iCalendar or an MP3 file and then use a bit of RDF? Have it all in a graph? Then you may be interested in how to choose your weapons wisely: if you want a fast and easy way to do RDF integration, follow Patrick Stickler and his URIQA ideas.

Today I had another day of fighting with gnowsis, my desktop integration framework. The task was to extract data from MS Outlook, on demand. The output and request format is RDF; I used iCalendar/RDF as the ontology.

Gnowsis wraps all read access to Outlook in a Jena Model. Outlook is represented as a Jena Model: each resource in Outlook gets a URL and is an RDF resource. So, for example, a query like
SELECT ?x, ?y WHERE (<rdfp://leo.gnowsis.com/msoutlook/appointment/
00000000B2CDC30BFF2EED4ABA9C61436A07FE3384002000> ?x ?y)

gives an RDF/XML result like this: QueryResult.xml (xml, 1 KB)
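
If you have no RDQL engine at hand, the same lookup can also be sketched with the plain Jena Model API: a minimal sketch listing all (property, value) pairs of the appointment resource. The imports use today's Apache Jena package names (the Jena of that era lived under com.hp.hpl.jena.*):

    import org.apache.jena.rdf.model.*;

    public class ListAppointment {
        public static void main(String[] args) {
            // stands in for the gnowsis Outlook model
            Model model = ModelFactory.createDefaultModel();
            Resource appointment = model.createResource(
                "rdfp://leo.gnowsis.com/msoutlook/appointment/"
                + "00000000B2CDC30BFF2EED4ABA9C61436A07FE3384002000");
            // every statement with this subject: what the query above asks for
            StmtIterator it = appointment.listProperties();
            while (it.hasNext()) {
                Statement stmt = it.nextStatement();
                System.out.println(stmt.getPredicate() + " -> " + stmt.getObject());
            }
        }
    }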

As you may note, there are some “in-between resources” where you would usually find anonymous resources: the properties dtstart and dtend have as objects the (normally anonymous) resources “…#Start” and “…#End”.

This helps with Jena internals: my Jena model is dynamic; it has no storage backend. Whenever a query hits the model, it searches for resources and creates the triples on the fly. Anonymous nodes in between would break this model: if I got an anonymous object in dtstart, I could not easily make a follow-up request for the cal:dateTime value inside it.
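
To make the “dynamic model” idea concrete, here is a minimal sketch of how a storage-less model can be built on Jena's GraphBase extension point. The OutlookConnector class is a hypothetical stand-in for the gnowsis internals, not actual gnowsis code:

    import java.util.Collections;
    import java.util.List;
    import org.apache.jena.graph.Triple;
    import org.apache.jena.graph.impl.GraphBase;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;
    import org.apache.jena.util.iterator.ExtendedIterator;
    import org.apache.jena.util.iterator.WrappedIterator;

    // A graph that stores nothing: every find() computes its answers
    // freshly from the underlying data source.
    class DynamicOutlookGraph extends GraphBase {
        @Override
        protected ExtendedIterator<Triple> graphBaseFind(Triple pattern) {
            // ask the (hypothetical) Outlook wrapper for triples
            // matching the (subject, predicate, object) pattern
            return WrappedIterator.create(OutlookConnector.lookup(pattern).iterator());
        }
    }

    class OutlookConnector {
        // stub; the real code would call into MS Outlook here
        static List<Triple> lookup(Triple pattern) {
            return Collections.emptyList();
        }
    }

    class Demo {
        public static void main(String[] args) {
            // wrap the graph as a Model so queries can hit it
            Model dynamicModel = ModelFactory.createModelForGraph(new DynamicOutlookGraph());
            System.out.println(dynamicModel.size());
        }
    }

Giving the in-between nodes URLs like “…#Start” keeps every node in this scheme individually requestable.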

At this point, consider reading some example RDF/iCal. If you don't have any clue what I am talking about, check out this: TestTermin.rdf. It is a longer version of the same VEvent entry.

So how can I get the anonymous nodes and will it be efficient?

This is a flaw in gnowsis: each triple is generated by a Java class that represents the property in the triple. Triples containing the “summary” property are created by a corresponding “summary” Java class; the class inspects the subject and finds the correct object value (and vice versa).

So gnowsis adapts all properties with Java classes. This gets too big when an ontology uses many properties and classes; iCalendar is already too much for me to program. A sketch of the pattern follows below.
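
For illustration, the per-property pattern looks roughly like this; the names are hypothetical, not the actual gnowsis classes:

    import java.util.List;
    import org.apache.jena.rdf.model.RDFNode;
    import org.apache.jena.rdf.model.Resource;

    // One such class per RDF property; with an ontology the size of
    // iCalendar this multiplies into dozens of hand-written classes.
    interface PropertyAdapter {
        // given a subject, compute the object value(s) of this property
        List<RDFNode> findObjects(Resource subject);

        // given an object value, find the subjects (the "vice versa" case)
        List<Resource> findSubjects(RDFNode object);
    }

    // e.g. a SummaryAdapter implements this for cal:summary, a
    // DtStartAdapter for cal:dtstart, and so on for every property.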

So enter Patrick Stickler and his URIQA

Most applications need, on the client side, only data about one or a few resources. You want to see an email, a person, an event. Then you may want to write an email, add the event to your calendar or do similar stuff.

To extract the needed data from a Jena Model you will, with RDQL, need many queries, even for a single resource, as RDQL has no optional joins. This is a problem, and other people have it too; see, for example, the Veudas announcement.

The solution is to get bigger chunks of RDF in one request.

What is an RDF chunk and how do I get it?

When you want to work with RDF data, you normally need the data about a resource (at best, the resource is identified by a URL or is downloadable). If the resource points to other resources via RDF triples, you may want to load those resources, too.

For example, if I have an appointment with JohnDoe, I may start at the appointment and then load the resource containing data about JohnDoe.

So when I load the chunks, how would I load them and what should they contain?

Most people will want “RDF subgraphs”, and I greatly prefer them over “variable-bound result sets” (like RDQL gives you). You get such a subgraph from your RDF server via a protocol. A possible protocol is URIQA; you may also do it with Joseki or Sesame.

And what should be contained in the RDF subgraph? Patrick Stickler has an answer that conforms to my wishes: he defined the Concise Bounded Description.
Everything that I need immediately, with the option to get more if I want.
Patrick says:
A concise bounded description of a resource is a body of knowledge about a named resource which does not include any explicit knowledge about any other named resource.
More about the Concise Bounded Description is on the URIQA page.
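
As a sketch of what this means in code: take all statements whose subject is the named resource and recurse into anonymous (blank-node) objects, stopping at named resources. This ignores the reification clauses of Patrick's full definition, and the package names are today's Apache Jena:

    import java.util.HashSet;
    import java.util.Set;
    import org.apache.jena.rdf.model.*;

    public class CBD {
        // copy the Concise Bounded Description of 'resource' out of 'source'
        public static Model concise(Model source, Resource resource) {
            Model chunk = ModelFactory.createDefaultModel();
            collect(source, resource, chunk, new HashSet<>());
            return chunk;
        }

        private static void collect(Model source, Resource subject,
                                    Model chunk, Set<Resource> visited) {
            if (!visited.add(subject)) return; // guard against blank-node cycles
            StmtIterator it = source.listStatements(subject, null, (RDFNode) null);
            while (it.hasNext()) {
                Statement stmt = it.nextStatement();
                chunk.add(stmt);
                // recurse into anonymous nodes, stop at named resources
                if (stmt.getObject().isAnon()) {
                    collect(source, stmt.getObject().asResource(), chunk, visited);
                }
            }
        }
    }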

So Concise Bounded Descriptions are the kind of chunks we would like to retrieve from any RDF host or RDF-publishing application. Why this is so good, I will tell you now.

Using chunks instead of accessing single triples has several implications:

  1. Easier to write extraction algorithms
  2. Much faster and more efficient
  3. Addressable chunks
  4. Access is restricted to chunks; single triples cannot be retrieved

Now for a deeper look at these implications:

1 Easy Extraction algorithms

For gnowsis, I needed to wrap every single rdfs:Property and rdfs:Class. This is good for some applications, but not for all. Complicated tasks are better done by “hand-written” extractors: programs like the many Perl/Python scripts that convert stuff to RDF. You can find many examples written by TimBL, DanC and others at SWAP, and some in cal-space, e.g. fromIcal.py.

Most of these converters and tools supply single resources; they convert a single file or similar.

It is easier to write RDF integration for “chunks” of RDF; the many existing adapters, and experience, prove this.
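
What such a chunk-producing converter boils down to, as a sketch in Java with Jena: the event URL and values are hard-coded here where a real converter would parse an .ics file, and the property names follow the RDF Calendar vocabulary used above:

    import org.apache.jena.rdf.model.*;

    public class EventToRdf {
        static final String CAL = "http://www.w3.org/2002/12/cal/ical#";

        public static void main(String[] args) {
            Model model = ModelFactory.createDefaultModel();
            model.setNsPrefix("cal", CAL);

            // one event resource per run: the "chunk" this converter supplies
            Resource event = model.createResource("http://example.org/events/1");
            event.addProperty(model.createProperty(CAL, "summary"),
                              "Dinner with JohnDoe");

            // structured dtstart, mirroring the in-between node discussed earlier
            Resource start = model.createResource(); // anonymous node
            start.addProperty(model.createProperty(CAL, "dateTime"),
                              "2004-02-01T19:00:00");
            event.addProperty(model.createProperty(CAL, "dtstart"), start);

            model.write(System.out, "RDF/XML-ABBREV");
        }
    }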

2 Much Faster and Efficient

I have written adapters to extract RDF from the file system, MP3 files, iCal and Outlook. I have gone so far as to write a specific Java class for every property and class around, and the overhead is big. Especially when you use ActiveX or SOAP bridges to communicate with your data sources, you will have a problem.

So it is better to write an extractor very “near” to the source data. Imagine writing an MS Outlook data extractor in Java or as a C++ DLL: which will be faster? Surely the C++ DLL.
But if you write your adapter as a DLL, how can the DLL understand queries coming from surroundings like Jena or Sesame?
So the best is to say: DLL, give me the Concise Bounded Description of resource X.

Then the DLL can extract everything and return a chunk of RDF/XML. And if you are really clever, you may have built it in a way that lets you pass the XML directly through to a calling client.
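
A sketch of the Java side of such a setup; the native method and library name are hypothetical, and the point is that the returned RDF/XML can either be handed to the client untouched or parsed into a Model only when the server itself needs the triples:

    import java.io.StringReader;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;

    public class NativeExtractorBridge {
        // hypothetical JNI call into the C++ DLL
        public static native String conciseDescription(String resourceUri);

        static {
            System.loadLibrary("outlookextractor"); // hypothetical DLL name
        }

        // parse only if we need the triples ourselves...
        public static Model fetchAsModel(String resourceUri) {
            Model model = ModelFactory.createDefaultModel();
            model.read(new StringReader(conciseDescription(resourceUri)), resourceUri);
            return model;
        }

        // ...otherwise pass the XML straight through to the client
        public static String fetchRaw(String resourceUri) {
            return conciseDescription(resourceUri);
        }
    }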

Think of implementing an enterprise integration server that needs RDF data (read-only): this can be a very neat way to do it. Just use any protocol and Concise Bounded Descriptions.

3 Addressable Chunks

A main problem in the Semantic Web is the question: how in the world will I get information about resource X?

Many people propose to use indirect addressing:
to get the FOAF description of person X, search for
?x <foaf:mbox> <mailto:leo@test.com>

This is a way to address the person with the email address “leo@test.com”. Hm. But which server shall I run this query against? www.test.com? smtp.test.com? And which interface should I use? http, smtp, rdfp, uriqa, …?

So if you use indirect addressing, you have to think of a search engine or something like a central registry. (Which is OK; I respect that, I know many people who take this route, and I like them.)

But I prefer another approach, where every resource is addressable by URL. (Perhaps I will write something about this too; I am too lazy now.)

It is much easier to work with RDF chunks when the resource identifier URI is also a URL and carries information about the protocol and server from which I can get the Concise Bounded Description of the chunk.
This is useful even when you are only doing desktop integration of data on a single host: there may be a Sesame server and a Joseki server running happily on your machine; how will you decide which server to contact when you need something?

So I propose: use the power of URLs and fill them with information about how to get data about the resources.
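
A sketch of what “the URL tells you where to ask” can look like: issue a URIQA-style MGET request against the resource URL itself. The HttpClient used here is the modern java.net.http API, and whether the server at that URL actually answers MGET is of course an assumption:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class UriqaFetch {
        // ask the server named in the resource URL for the resource's
        // Concise Bounded Description, using URIQA's MGET method
        public static String fetchDescription(String resourceUrl) throws Exception {
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder(URI.create(resourceUrl))
                    .method("MGET", HttpRequest.BodyPublishers.noBody())
                    .header("Accept", "application/rdf+xml")
                    .build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            return response.body(); // the RDF/XML chunk
        }
    }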

4 Only chunk access, no single triples

If you work only on Concise Bounded Description chunks and cannot query a server for individual triples, this may be inefficient! Yep, it may be, especially if you have “directory” resources, e.g. a folder resource that is connected to thousands of files.

In this case, your adapters may always have to build the whole chunk and return it to you, so that you can filter it on the client side. But this is not the last word: you can use an adaptive framework for such queries (like the gnowsis framework), or you can use another query interface just for the selection of resources.

For example, selecting “all appointments I have in the next seven days” may be a complicated search in your iCalendar data. If you are a badass hacker, you may want to write a query engine that does the trick; as long as you support SeRQL or RDQL, I am happy hitting your server with queries.

Once I have my list of needed resource identifiers, I can use URIQA or any other server that speaks Concise Bounded Descriptions to get my RDF chunks.
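
Putting the two steps together, as a sketch: the query service is a hypothetical stand-in for an RDQL/SeRQL endpoint, and fetchDescription is the MGET helper sketched above:

    import java.util.List;

    public class TwoStepRetrieval {
        public static void main(String[] args) throws Exception {
            // step 1: a query server does the selection and returns
            // only resource identifiers
            List<String> uris = QueryServer.appointmentsInNextSevenDays();

            // step 2: fetch the Concise Bounded Description of each hit
            for (String uri : uris) {
                System.out.println(UriqaFetch.fetchDescription(uri));
            }
        }
    }

    class QueryServer {
        // hypothetical stub for a SeRQL/RDQL query service
        static List<String> appointmentsInNextSevenDays() {
            return List.of("http://example.org/appointments/42");
        }
    }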

To sum it up

Building RDF aggregators is not an easy task. I have tried it in many different ways and had varying degrees of success. If you want an easy approach that works, think about Concise Bounded Descriptions.

Semantic Web Stone Age – AAARRGHHH

Uargh, today's Semantic Web tech drives me crazy.

I want to build a framework that is capable of integrating all data in the world in a single Jena Model. That is the goal.

Now, what happens is that I miss some things; you, dear reader, may miss them too, and while you are at it, perhaps you are going to implement them:

* A good query language
I want a query language that can do optional triples. It should retrieve subgraphs.

* An update language
Ever thought about how you are going to update RDF sources? I remember very useful UPDATE and INSERT SQL statements. How can we do this in RDF? And yes, ever thought about how to update or change the contents of a collection? And what if triggers are bound to changes in that collection?
Deleting triples and then re-entering the changed version of them is no option for me.

* A protocol on the web
And how will I get my triples from www.yahoo.com? Via URIQA? Please, tell me.

* A protocol on the desktop
Ah yes, and what about when I want triples from the neat driver that adapts my big, big database? Imagine you have a lovely .o file or .dll: how will I get triples out of this driver? I demand an ODBC equivalent for RDF!

* Bonus question
Everything should be implemented and available in Jena.

So. I will surely find more stuff that bothers me.

hello world

Hello, dear world, I am going to get started.

So, the thing is: I would really like to have cyberspace, because then we could all play together there so merrily and exchange information. Since nothing happens by itself, I will help with the building. Logical, right?

The first step towards cyberspace is the Semantic Web. Let's see when we will have that.

On the side, I will also write a bit about fun things and whatever else interests me.