My Dissertation published

In October 2006 I started blogging about my PhD; here is the aftermath: it is published!

Get the PDF or the printed version directly from the publisher:
http://www.dissertation.de/buch.php3?buch=5954

The ISBN is 978-3-86624-449-8

The BibTeX entry is online at BibSonomy:
http://www.bibsonomy.org/bibtex/22b669dcf81fefda8ba233ded900aec9d/leobard

An online version is available here:
http://www.dfki.uni-kl.de/~sauermann/papers/Sauermann2009phd.pdf

At last, you can cite it, and it is free to take its place in science.
And I am free, too…

awesome Touch&Write technology demo by DFKI

Working at DFKI is awesome, and now there is a video showing the work done by Marcus Liwicki to bring touch-tables and collaborative work to the next level. This is a top result from the people in our Knowledge Management (KM) group, led by Professor Andreas Dengel. Saher El-Neklawy did part of the programming, and many people contributed.

Others say:
If you’ve ever had to sit down with people at the collaboration table, I think you’ll recognize how well this system matches that experience. I really appreciate the approach they’ve taken, and I look forward to seeing where else it will go.

Our website says:
The touch&write pen-enabled interactive touchtable combines infrared technology for normal touching and moving with digital pen technology for high-resolution handwriting. This allows an intuitive switch between the modes of object manipulation and content editing.

Touch & Write is an innovative new platform for creating applications that users find natural to use. It seamlessly integrates the paper world into the digital world. Editing, arranging and writing tasks can be performed easily and intuitively.

The construction of Touch & Write as a tabletop environment encourages collaborative settings. Users surround the table, discuss their ideas and work together. Since intuitive handwriting is allowed as an input metaphor, creativity is supported as well.

Some parts shown in the demo are based on technology related to what we do in the semantic part of DFKI; our department is a really nice environment to work in 😉

NEPOMUK and Semantic Desktop in the midst of KDE and gnome integration

The co-located Akademy/KDE and GUADEC/GNOME “desktop summit” is over, and various bloggers report on the outcome, among them pro-linux news and Sebastian Trüg, who witnessed the joint conference.

A summary on the KDE blogs by Richard Moore reports that NEPOMUK and the Semantic Desktop are key elements to build upon (mentioned even before the new GUI) and that they distinguish the KDE desktop from others. During his keynote, Sebastian Kügler, board member of KDE e.V., announced that more funding will go into the Semantic Desktop area:
KDE e.V. will sponsor a series of developer meetings that focus on integrating these features into the desktop. We invite teams inside KDE to think about how their software can benefit from the semantic framework introduced in KDE 4. The semantic desktop has the potential of being a game-changer for the Free Desktop, as it provides a way to model the user’s data closer to how the human brain does it. It will move the computer’s user interface one step closer to the user. (copied from Richard Moore’s post; I cannot verify whether Sebastian really said that)

Most apparent and important for me: GNOME and KDE people work in parallel, but cooperate more and more closely. An interesting aspect of this cooperation is our contribution from the Semantic Desktop side:
KDE’s Strigi library is getting closer to GNOME’s Tracker. As Sebastian Trüg writes, Tracker now uses RDF and SPARQL.

As common ground, developers from both projects agreed in April to use the NEPOMUK ontologies, and the will to cooperate has grown steadily since then. I am helping a bit through www.oscaf.org, an organization founded by DERI, DFKI, and KDE as a place for standardization work. There I represent DFKI, and we had to rethink our process a lot to cater to the needs of open-source projects.

So I am watching, open-mouthed, how ontologies really help build a common model among domain experts (hooray, they work!), and I cannot imagine what will happen soon in the KDE/GNOME community around NEPOMUK. I really can’t, because with this many brains now cooperating, innovation is speeding up by the minute, and it is no longer predictable which semantic desktop applications will exist a year from now. Looking forward to it, though 🙂

phd step7: Defending the PhD

On Friday, 5th June 2009, I defended my PhD on “The Gnowsis Semantic Desktop approach to Personal Information Management”. I defined the PIMO ontology and an architecture built on gnowsis 0.9, and evaluated them. My conclusion is: the Semantic Desktop, as I define it, supports users in filing, finding, and thinking about information.

I defended the thesis in a 30-minute talk. I concentrated on one storyline, “knowing more than you can remember”, and on knowledge articulation. Thomas Roth-Berghofer passed on a tip from Professor Richter: have one slide in the presentation that is really complex, to show that you did something challenging. So I drilled down into the dropbox application to show the innards of the system. There was a misunderstanding between one professor and the School of Informatics about the date, so we had to wait a bit until he finally arrived, but luckily everything went very well. After the talk, the professors debated the grade and then called me back in, which is a very effective way to raise one’s anxiety. They decided to grade me “sehr gut”, which translates to “magna cum laude” and is the second-best grade (after “summa cum laude”).

For preparation, I collected the major arguments that needed to be in the presentation, cut away all the details, reused many slides from my 85 previous talks on the topic, and reworked everything into one coherent story with coherent examples. I also used a few structuring tricks, such as “in-between” slides to separate sections, and nice rounded corners. Here is the presentation on SlideShare:

Our tradition here continues with a get-together at the institute, where we drink some sparkling wine and toast the candidate. Professor Andreas Dengel, my supervisor, gave a very nice speech about my work and my personality. I gave thanks to my peers, to God and Jesus, and to Ingrid, my wife. Then, by tradition, the candidate receives a doctoral hat built by his peers. Here is the moment:

Leobard with doctoral hat

My hat is awesome: it has a burning man in the middle, is a tape sculpture, and glows under blacklight:
Hat

We then ate good food from the catering company “Klein-Partyservice”, a local business from Kaiserslautern. I also brought three crates of beer, which was more than enough for the 30 guests. Part of the celebration was opening a bottle of Barolo from my best friend Ebo, which tasted excellent. At night, a few of us went to the frohlocker.de party at Kramladen, where we had a lot of fun with the crew and shared some good wine. And the robot:
Robot dancing frohlocker

DJ Frohlocker

Max and Heiko stayed overnight in Kaiserslautern, and we all had a great breakfast together at my place:
Breakfast

This is the last major step of my three-year project of blogging about doing a dissertation on the Semantic Desktop at DFKI (on that page you will find trackback links to all steps).
The final step will be publication as a book and then receiving the title.

phd step6: preparing the presentation, last minute panic

Today I will defend my PhD, another milestone in the long story of doing a PhD that I have been blogging about.

Things to do two hours before going to the defence:

  • test your talk again. Do it in front of good friends you trust and who will give you positive feedback. In my case: Thomas Roth-Berghofer and Olaf Grebner
  • put 3 crates of beer and 12 bottles of sparkling wine into the office refrigerator
  • use a lot of Axe deodorant spray to fight the cold sweat of panic
  • print your presentation slides in case Armageddon happens, no projector is available and you have to give your talk without one (thx to Olaf for the tip)
  • blog

Then, at 4pm, go and defend your thesis. To put it in StarCraft speak: the attackers will watch you build your base while they have plenty of time to gather resources, and then they do a zergling rush. So put your arguments into bunkers and use your tanks for covering fire. Anyway, they are not in it to win; it’s the joy of attacking you. Remember that.

See you on the other side…

Interview on webinale

Back from Berlin! At webinale.de, I was interviewed by three journalists/bloggers about my Semantic Desktop idea and startup. I also gave a talk about the Semantic Web, which was one of the better-attended talks. Two of the journalists/bloggers have already put their reports online; read more on our blog:
http://www.gnowsis.com/about/node/8

At this year’s “webinale” Web 2.0 conference in Berlin, Leo Sauermann gave a talk about the Semantic Web and had time to talk about gnowsis.com.

Viktoria Trosien from tiburon-tv interviewed Leo Sauermann about the core idea of our startup; read Viktoria’s article and watch the video.

For more information about Semantic Web and Semantic Desktop, there is also an interview with Leo Sauermann on Sian-Ru Lai’s blog.

Part of that interview is about gnowsis.com (translated from German):

Sian: You also have your own start-up. What is it about?

· Helping people remember. Today we have so much information that you cannot remember everything anymore. Yet computers still behave like Bene binders, like rigid paper into which I file my notes. So how do you bring order into e-mails, files, web pages, projects and contacts? With gnowsis.com we are building a company that tackles this problem with Web 3.0 technologies. You can use it to build a personal knowledge network that is then available in every application. I can then see where I know a person from, or which things I wanted to pack for my holiday. The hard part now is to read the existing texts and build this network automatically; we are working on that. The topic itself is very exciting; we have already been researching it for 6 years and will soon go to market with a simple version. I’m curious to see how it develops.

Austrian Science Minister publicly announces plan to leave CERN – mountain folk withdraw from the Swiss caves into their own

The Austrian Science Minister Johannes Hahn wants to free up a budget of EUR 20 million per year by leaving the CERN consortium. This would allow him to move the money directly to his buddies at local universities to support “European research”.

Well, then please also close down your WWW servers at http://www.bmwf.gv.at/ because we are currently celebrating 20 years of this CERN invention.

In their own words: “Durch die frei werdenden CERN-Mittel bieten wir den Universitäten eine europäische Forschungsperspektive”. – “By freeing funds from CERN we offer the universities a European research perspective”. Well, I wonder why the universities are currently cut off from EU funds: because they are not writing enough proposals? And CERN is international, not just European. All the more reason to stay there.

Here is the press release (7 May 2009).

Here is the protest platform:
http://sos.teilchen.at/

“Der wissenschaftliche Output ist unbestritten, aber die Sichtbarkeit kleiner Staaten in Experimenten mit über 2.000 Mitgliedern eher gering.” – “The scientific output is undisputed, but the visibility of small states in experiments with over 2,000 members is rather low.” In other words: with over 2,000 scientists, the Austrians don’t always appear in the front row, and that is not enough publicity for my ministry. What the fuck? It’s about science, not about a minister showing off in front of cameras. Be happy that you can send your students to meet the other 2,000 top physicists in the world, in a highly competitive atmosphere.

So, let’s leave the underground caves in Switzerland, filled with mysterious magnets, and go back to our own caves at the Austrian Research Centers, a political wonderland of science funding.

Annotating files – but where to store the metadata?

An interesting thread about file metadata for KDE got my attention: Portable Meta-Information. I waited a month until it cooled down and re-read it to draw my own conclusions.

The author, zwabel, correctly identified the problem: the Semantic Desktop must be compatible with the past – and with the future!

I think, for the future, we need to find a way to keep the user’s metadata together with the data, so it is as persistent and approachable as the files themselves:
– When the user copies his photo archive or backs it up to a CD, no matter what application he uses, meta-information like ratings, comments, or tags has to move together with the photos
– When the user has a fresh install, and copies his photo archive from a CD to the disk, the meta-information for the photos should be just there
– User-generated meta-data should _never_ be lost just because a file/directory was renamed, a mount-point changed, or whatever
– User-generated meta-data should not be lost when a file completely unrelated to the item (e.g. a database) is damaged or deleted
– In 20 years, when KDE4 is history for a long time, and I find an old photo backup CD, the meta-data should still be readable

zwabel then suggested storing the metadata, in addition to the central store (which NEPOMUK needs for the search engine and is essential anyway), in many “.meta” files placed in the same directory as the files they describe. For the file picture1.png, the metadata would be in picture1.png.meta. I think this is a pragmatic idea, and I would say:

Let’s store it in picture1.png.rdf

As serialization, I suggest the W3C RDF standard, which we use in the central NEPOMUK store anyway (in the database) and which has well-readable serialization formats, either in XML or in plain text. To achieve Linux-geek compatibility, I suggest the plain-text format (Turtle). For example, to add authorship information about picture1.png, it would be:

@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
<> dc:creator "Dave Beckett" ;
    dc:date "2002-07-31" ;
    dc:publisher "ILRT, University of Bristol" ;
    dc:title "Dave Beckett's Home Page" .

Note that <> is a known shortcut for “this document”; the RDF/XML equivalent is rdf:about=””.

Sebastian Trüg also argues in a way that leaves both options open for the future, database and filesystem:
“you need a database anyway. Thus, in the end, the only solution I see at the moment is a kind of copy wrapper that makes sure metadata is copied with the file. Then one could also send information like a person or a project to a friend and the system would pick up all interesting metadata.”
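The “copy wrapper” idea could look roughly like the following. This is a minimal Python sketch of my own, not NEPOMUK or KDE code: the function name `copy_with_metadata` and the list of sidecar suffixes are hypothetical, and it only handles sidecar files lying next to the data file, not the central database.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical: the sidecar suffixes discussed in the thread and in this post.
SIDECAR_SUFFIXES = (".meta", ".rdf", ".turtle")


def copy_with_metadata(src: Path, dst: Path) -> list[Path]:
    """Copy src to dst and carry along any sidecar metadata files found next to src.

    dst may be a directory (the file keeps its name) or a full target path.
    Returns the destination paths that were written.
    """
    src, dst = Path(src), Path(dst)
    if dst.is_dir():
        dst = dst / src.name
    written = [Path(shutil.copy2(src, dst))]
    for suffix in SIDECAR_SUFFIXES:
        sidecar = src.with_name(src.name + suffix)
        if sidecar.is_file():
            # picture1.png.turtle follows picture1.png to its new location/name
            target = dst.with_name(dst.name + suffix)
            written.append(Path(shutil.copy2(sidecar, target)))
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "photos").mkdir()
        (root / "backup").mkdir()
        picture = root / "photos" / "picture1.png"
        picture.write_bytes(b"not really a PNG")
        (root / "photos" / "picture1.png.turtle").write_text("# metadata\n", encoding="utf-8")
        for path in copy_with_metadata(picture, root / "backup"):
            print(path.name)
```

A real wrapper would also have to update the central store. And if a sidecar names its subject by file name, copying under a new name would require rewriting that name inside the sidecar; this sketch copies the bytes unchanged.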

So how do we format the metadata inside the files? The same way as in the NEPOMUK RDF repository, where we use the NIE and NAO ontologies. Pushing Dublin Core is also a good way to go, but do it the W3C way, standardized.

Using the RDF encoding of Dublin Core with, for example, Turtle/N3 as the serialization format gives a rock-solid, W3C-standardized (or at least well-implemented) way.

Because the world is not perfect and needs many possible ways to evolve, we can store the metadata redundantly, in as many places as possible, but in one format. For freedesktop and NEPOMUK, RDF is the best choice, in my (not so humble) opinion. It is serializable, it can be stored in a database, and it can be hosted on the web. No other standard has all of this. It is already embedded in PDF, in the XMP format.

I propose “.turtle” files to indicate that the content is RDF/Turtle, but if you insist, “.rdf” is also fine with me (though it implies RDF/XML storage, which is a bit sluggish), and “.meta” is also fine with me if you store RDF/Turtle inside. Making up a new microformat would be stupid.
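To make the proposal concrete, here is a minimal Python sketch (standard library only, no RDF library) that writes Dublin Core properties into a “picture1.png.turtle” sidecar. The function names are hypothetical. One design choice differs from the Turtle example earlier in this post: instead of <>, which resolves to the sidecar document itself, the sketch names the data file with a relative IRI such as <picture1.png>, which resolves against the sidecar’s location to its sibling file.

```python
import tempfile
from pathlib import Path
from urllib.parse import quote

DC_NS = "http://purl.org/dc/elements/1.1/"


def turtle_literal(value: str) -> str:
    """Quote a string as a Turtle literal, escaping backslash, quote and line breaks."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def sidecar_path(data_file: Path, suffix: str = ".turtle") -> Path:
    """picture1.png -> picture1.png.turtle, in the same directory (hypothetical helper)."""
    return data_file.with_name(data_file.name + suffix)


def write_sidecar(data_file: Path, dc_props: dict[str, str], suffix: str = ".turtle") -> Path:
    """Write Dublin Core properties (e.g. creator, date) about data_file into its sidecar."""
    if not dc_props:
        raise ValueError("need at least one property to write a valid Turtle statement")
    # Relative IRI: resolved against the sidecar's own location, it names the
    # sibling data file. A bare <> would name the sidecar document instead.
    subject = f"<{quote(data_file.name)}>"
    pairs = [f"dc:{prop} {turtle_literal(val)}" for prop, val in sorted(dc_props.items())]
    statement = subject + " " + " ;\n    ".join(pairs) + " ."
    out = sidecar_path(data_file, suffix)
    out.write_text(f"@prefix dc: <{DC_NS}> .\n\n{statement}\n", encoding="utf-8")
    return out


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        picture = Path(tmp) / "picture1.png"
        picture.write_bytes(b"")  # stand-in for a real image
        sidecar = write_sidecar(picture, {"creator": "Dave Beckett", "date": "2002-07-31"})
        print(sidecar.name)
        print(sidecar.read_text(encoding="utf-8"))
```

Property names are assumed to be valid Dublin Core element names; the sketch does not check them. Swapping the suffix argument for “.rdf” or “.meta” only changes the file name, not the Turtle content.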

My Summary:

  • storing it in the filesystem is nice, but not a killer argument. Just storing it in the central NEPOMUK repository works ™ for 90% of all use cases, so start hacking applications that save users time and improve their experience with what is there today.
  • do not store it in .meta, but in .turtle, which is the rock-solid industry standard by W3C, human-readable, and a simple microformat-like text format (smoother than XML)
  • also store it in the files themselves wherever possible, so as not to lock others out. Use EXIF fields, use XMP fields in PDF, use ID3v2 fields, use those metadata fields!
  • also index it in the central search engine, be it NEPOMUK or beagle++ (beagle++ is the RDF-enabled Beagle; check it out if you are not aware of it)
  • storing it in file metadata attributes (xattr/channels/…) is the goal, but I propose to extend these standards with RDF to achieve cross-system compatibility. What worked for the web may also work here.

OrganiK project: working on testdata collection

As blogged in January, Gunnar, Remzi and I are working for DFKI on the OrganiK project. As true hard-blogging scientists, we keep on reporting.

In the next two weeks, I will gather a comprehensive test-data collection of texts that we will use for ontology learning. I hope to gather around 10,000 documents from various sources that overlap in topic. We need e-mails, office documents (contracts, etc.) and news documents. There are a lot of test data sets out there; the question now is which one to pick. Also, in OrganiK we have SME partners who could provide some data.

After this, the next step will be to create a taxonomy learning module that analyses the documents and semi-automatically (or fully automatically) creates a taxonomy from them for future classification. If it’s fully automatic, I expect that the taxonomy will have probabilistic elements in it (“it thinks that this is a customer, but only 60%”). If we work with a probabilistic model throughout the whole project, we can rank everything all the time; maybe this will reduce manual work. We will see.
Does anyone have experience with taxonomies that have weights attached? It’s similar to a TF/IDF rank.
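For the weighted-taxonomy question, one simple baseline (my assumption, not the OrganiK method) is to weight terms by TF-IDF and score each taxonomy category by the summed weight of its terms, normalised so the scores sum to 1. The documents and category term lists below are hypothetical, and the normalised scores are a ranking signal, not calibrated probabilities.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Classic TF-IDF: (count / doc length) * log(N / document frequency)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        # A term that occurs in every document gets log(1) = 0, i.e. no weight.
        weights.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in counts.items()})
    return weights


def category_scores(doc_weights: dict[str, float], taxonomy: dict[str, set[str]]) -> dict[str, float]:
    """Score each category by the summed TF-IDF weight of its terms, normalised to sum to 1.

    The result ranks categories for a document; it is not a calibrated probability.
    """
    raw = {cat: sum(doc_weights.get(t, 0.0) for t in terms) for cat, terms in taxonomy.items()}
    total = sum(raw.values())
    if total == 0:
        return {cat: 0.0 for cat in taxonomy}
    return {cat: score / total for cat, score in raw.items()}


if __name__ == "__main__":
    # Hypothetical, already tokenised documents and a hand-made two-category taxonomy.
    docs = [
        ["invoice", "customer", "order", "the"],
        ["contract", "partner", "the"],
        ["customer", "contract", "the"],
    ]
    taxonomy = {"customer": {"customer", "invoice", "order"}, "partner": {"contract", "partner"}}
    for i, weights in enumerate(tf_idf(docs)):
        print(i, category_scores(weights, taxonomy))
```

In this toy example the third document contains one customer term and one partner term with equal weight, so the sketch splits it 50/50 between the two categories, the kind of “it thinks this is a customer, but only partly” answer described in the post.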