PhD step 3: writing the PhD

People with a good memory (who aren’t hooked on semantic systems like me) may remember that roughly a year ago I started blogging about my PhD. In Step 1 I described the general idea around the Semantic Desktop and PIMO, in Step 2 I described my struggle for a research question and the actual scientific goal. Now, that was a year ago and something should have happened in the meantime…

Doing a PhD is a permanent struggle against yourself, god, life as such. If not true, this is at least a view that excuses a student for feeling overwhelmed by the task, sitting back, opening (a can of beer|a bottle of wine) and yearning for life before and after the PhD. In the last year, I did my share of struggling, being overwhelmed, and working on the PhD. I entangled myself in a lot of other things (such as Aperture, Nepomuk, Burning Man decompression, SWEO, Sesame, my 30th birthday), which somehow ended up being a lot of things, some of them suffering from diminished attention. Overwhelmed, we come back to the reaction that beats the alcoholic beverage: finishing the PhD by writing it, then improving quality in the other tasks.

The core of the problem is that writing a PhD involves the task of writing. In general, a student can write one page of quality work per work-day. Given that I was involved in a gazillion things, this drops to one page a week. Oh, so a 150-page PhD will then be written in three years. But as I started last year and want to get finished SOON, things had to speed up.

If you do the math for your own work, you can easily finish a PhD by cutting down the page count to a reasonable length (less than 100 pages would be excellent). But this means that the pages have to be really good. If you are really good, try writing a PhD in 82 pages like the web-shaking Human Computation by Luis von Ahn. He was able to say the following in his abstract (in my eyes a ticket to finish the PhD right after the first page and graduate):
In addition, I introduce CAPTCHAs, automated tests that humans can pass but computer programs cannot. CAPTCHAs take advantage of human processing power in order to differentiate humans from computers, an ability that has important applications in practice.
The results of this thesis are currently in use by hundreds zillions of Web sites and companies around the world, and some of the games presented here have been played by over 100,000 people.
(strikethrough and correction of numbers by me)
Basically, this translates to: my work is pure innovation, or shorter: I rule.

So, not being witty enough, I aimed for a larger 150-page thesis with the typical parts:

  • introduction: state of the art (web, knowledge management, ontologies, semantic desktop), problem
  • my idea (PIMO + Semantic Desktop for PIM)
  • my implementation and algorithms and ontologies
  • evaluation
  • conclusion: this work is pure innovation, may I rule?

The main parts of the work have been published before in other papers I wrote on my own and with others; see my publication list for a glimpse of what I refer to.
I took the argumentation, the citations, and the key results from this work and integrated them into one view.
Before that, I had a period of “writing it from scratch” and “planning the layout in sections”; I even made a MindManager mindmap of the whole structure.
I ended up with a 200-page document which I called “PhD draft”, but which was actually a copy/pasted Frankenstein of my papers, croaking “Please kill me, every moment I live is agony!“
So instead of finishing early, this approach actually made things take a bit longer.

The really tricky part was “finishing something”, so I started by finishing the introduction. I wrote it, improved it, printed it, thought about it, sent it to my mother (who has a PhD in International Business plus a degree in English translation) to check the English (Mom, thank you so much and I am really lucky to have you!), had it checked by my colleague Thomas Roth-Berghofer and then by my professor Andreas Dengel.
And lo, everyone was OK with it. Away Frankenstein, welcome results. I was motivated to continue.

This was in August, and knowing that I was on the right track, the rest was easy … sort of.
I plowed through section after section, improving the content, adding missing bits, removing redundancy. Today I have a 250-page version that contains most of the things that must be in; missing are a few parts I wanted to add on literature and on software engineering lessons learned.
From time to time I talked with my buddy Sven Schwarz, who is on a similar quest; exchanging our status and talking helped to get organized.

So, the next sections can go into review; I am looking forward to getting it done … soon. A good motivation to speed up is the fact that Gunnar has also reached the equilibrium of “it’s done, only minor changes now”.

Property names: nouns preferred

When constructing ontologies, labels are needed for properties. This is a crucial part of the work: the names will appear in the namespace URIs and are visible as labels in the user interface.

We had a discussion at work about names for predicates. To illustrate it, a bad example (in N3):
:isKnowingperson a rdf:Property; rdfs:range :Person.

The name is too long, contains a verb (“is”), has mixed uppercasing (the “p” should be uppercase to ease reading) and contains too much information (Person can be removed, it is already stated in the range).
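Applying that critique, a cleaner version of the same statement could look like this (the name is only my illustration; whether a verb is a good choice at all is what the rest of this post argues about):

:knows a rdf:Property; rdfs:range :Person.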

The practical community prefers nouns (part, location, topic, related); there is a brief one-sentence recommendation towards using nouns in the SWAP primer.

The popular FOAF ontology uses nouns for literal properties and a verb for knows. Similar relations could be modelled: loves, hates.
SKOS [2] also uses the isXOf pattern for inverses that were hard to define, as does rdfs:isDefinedBy.

It seems that the trend is toward shorter forms, gerunds, verbs or nouns.
My summary is: as I want to define inverses, I will try to use nouns
without verb prepositions whenever possible. When this does not capture
the semantics in a satisfying way, I will use “isXOf” and “hasX” or
search for a gerund. In practice the guideline is: use “name” instead of
“hasName”.

For NEPOMUK’s PIMO ontology, this would result in:

  • part – partOf
  • location – locationOf
  • related
  • topic – topicOf
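To make this concrete, here is a minimal N3 sketch of how such a noun-named property pair with an explicit inverse could be declared. The pimo: namespace URI and the domain/range class are placeholders of mine, not the actual NEPOMUK definitions:

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix pimo: <http://example.org/pimo#> .   # placeholder namespace

# noun-named property, following "name" instead of "hasName"
pimo:part a rdf:Property ;
    rdfs:label "part" ;
    rdfs:domain pimo:Thing ;   # placeholder class
    rdfs:range  pimo:Thing .

# its inverse, using the noun plus "Of"
pimo:partOf a rdf:Property ;
    rdfs:label "part of" ;
    owl:inverseOf pimo:part .

Declaring the inverse explicitly keeps both directions available while the visible labels stay short nouns.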

I would appreciate feedback based on published guidelines on building
ontologies or on other ontologies that had similar problems.

ANNOUNCEMENT of Aperture 1.0.1-beta release

Aperture is a Java framework for extracting full-text content and
metadata from various information systems (e.g. file systems, web sites,
mail boxes) and the file formats (e.g. documents, images) occurring in
these systems.

http://aperture.sourceforge.net/

Today, on 12 November 2007, we publish the first beta release, marking the point where Aperture is being applied in projects. The new version number is 1.0.1.

This release bears the mark of the Nepomuk Social Semantic Desktop, a major initiative combining research institutes and commercial companies from around Europe. Aperture is used as one of the pillars of a next-generation platform that changes the way people can organize and use the data stored on their computers. The input from the Nepomuk community drove us to implement a host of new features that make Aperture more useful, more flexible and more powerful.

New Features

  • Aperture has been migrated to use the newly developed
    Nepomuk Information Element (NIE) Ontology framework. This
    added a new level of precision and expressiveness to the
    output of Aperture components (see the sketch after this
    list). The ontology itself is endorsed by the Nepomuk
    Consortium, well documented and maintained.
  • The output is now thoroughly tested with an extensible
    RDF validator for compliance with the ontology. This
    allowed us to fix a number of bugs that made certain
    properties appear in places they didn’t belong.
  • The data source configuration API has been overhauled and
    is now much easier to use.
  • A new facility allows clients to implement dynamic
    GUIs for data source configuration.
  • A new JpgExtractor extracts EXIF annotations from JPG
    files.
  • Four new experimental crawlers (Flickr, Bibsonomy,
    del.icio.us and Apple iPhoto).
  • A host of small improvements and bug fixes.
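To give an impression of what NIE-based output looks like, here is an illustrative N3 sketch of the metadata a crawled file might yield. The file URI and literal values are made up, and the exact property and class names should be checked against the published NIE/NFO ontology documentation rather than taken from this sketch:

@prefix nie: <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#> .
@prefix nfo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#> .

# illustrative metadata for one crawled file (names assumed, see above)
<file:///home/leo/thesis/introduction.pdf>
    a nfo:FileDataObject ;
    nfo:fileName "introduction.pdf" ;
    nie:mimeType "application/pdf" ;
    nie:plainTextContent "… full text extracted by Aperture …" .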

Another improvement is a public wiki for documentation,
tutorials and FAQ.
http://aperture.wiki.sourceforge.net/

Updated dependencies

  • Sesame 2.0 beta-6 (was beta-4)
  • RDF2Go 4.4.6
  • RDF2Go driver for Sesame
  • metadata-extractor-2.4.0-beta-1, the library used by the
    JpgExtractor
  • POI has been updated to 3.0-FINAL
  • flickrapi-1.0b4, used by the FlickrCrawler
  • DFKIUtils 2.0, some XML-related utilities used by the website crawlers
  • nrlvalidator-0.1, the validator used by the unit tests
  • infsail-0.1 and unionsail-0.1, dependencies of the
    validator

Download here.

Best regards
Antoni Mylka
Christiaan Fluit
Leo Sauermann

now with barcodes

The semantic web is about URIs, and every idea needs a URI. Each of my blog posts already got one, and now it also comes in printed form. Look to the lower right: a mobile-phone-readable QR code.

barcode for this post

Technically, all of this is age-old and I loved it already years ago when Semapedia had its “uh-ah” moment. The Japanese, in the meantime, went for the full monty and covered their island coast-to-coast with QR codes, so we have to catch up.

For my mother of all cell phones, the Nokia E70, I use the i-nigma reader (ha, another Enigma pun) because it was listed on Nokia’s page on barcode apps.

To hack twoday.net into rendering them, I added a table after the blog posts: in the “manage” menu – change template (html) – story.display – add this code somewhere (generated using the i-nigma generator):

<!-- barcode -->
<img style="float:right"
     src="http://212.179.113.209/QRCode/img.php?d=URL%3Ahttp%3A%2F%2Fleobard.twoday.net%2Fstories%2F<% story.id %>&c=blogcode&s=4"
     alt="QR barcode linking to this story. Useful when printed." />

The leobard.twoday.net part needs to be replaced with your own blog address. It could probably be simplified by replacing the whole URI with <% story.href %>, but I feared that the URL encoding would break then.

Next task for all Semantic Web lovers: print out and glue barcodes onto all physical things that deserve a URI. And don’t fuck it up by picking an uncool URI, pick a cool URI for the Semantic Web.

Wiki on the Semantic Desktop – HiWi job in Kaiserslautern

We are offering a

HiWi job / internship: A wiki for the Semantic Desktop.

The Semantic Desktop allows users to link the documents of their workplace (files, web pages, addresses, appointments…) in arbitrary ways and to annotate them with further information. A personal wiki, which will be extended accordingly, serves as the interface to the user.

The work is done in collaboration with other developers and is mostly practical in nature. The ability to implement independently and to quickly familiarize yourself with existing frameworks is a prerequisite. A software development portal is used to coordinate the development. The resulting software is largely open source.
Requirements

  • Experience with Java, JSP, JavaScript, Tomcat
  • Experience with open source projects (documentation etc.)

Contact
Leo Sauermann
Deutsches Forschungszentrum für Künstliche Intelligenz
Forschungsbereich Wissensmanagement
Trippstadter Straße 122, Room 3.04
Tel.: +49 631 20575-116
sauermann@dfki.unikl.de

Holy Grail of PIM

Walter Rafelsberger, who runs the meta portal of media polemic, has blogged about his Holy Grail of PIM.

Excerpt:
One thing I have in mind would be a microformats/AJAX/xmlrpc/API/etc. powered Wiki with focus on ease of use and service integration. The SemanticWiki implementations are heading in the right direction, but they still feel more like proofs of concept and not really usable in terms of usability.

That said, for the moment I decided to switch from one evil to many small little devils. Here’s what I use for PIM for now.

Yep! Read on …
link

socialising web 2.0 musical performed at big brother awards by monochrom

lim(nerdness) > 1 when the Austrian art dada fluxus group monochrom hits the topic of social networking (facebook, myspace, youtube, etc.) and mashes it up in a live musical performance at the Big Brother Awards 2007.

The experienced and award-winning group has hit the musical genre again.
Here is the video; you need to scroll to about half-length to get to the actual musical (the beginning is 10 minutes of gamejew boredom, skip it). I would bet that there is a version of this video on youtube. Well, and if you don’t understand my utmost appreciation of this: I am a fan.

http://s1.video.blip.tv/1010000946590/Gamejew-GamejewSeason2Ep4326p.mov

naturally, via the monochrom blog.

SOA = Software Oriented Architecture (ah, web 3.0)

What is SOA? Software oriented architecture! That is an accidentally mixed-up meaning of the acronym, just uttered by a speaker in a talk at this year’s “Akademische Jahresfeier” at the TU Kaiserslautern.

“service oriented architecture” is the buzzword that really drives the techies at the moment, but often software oriented architecture is the outcome.

But the speaker looks into the future and his talk is very good: the semantic web is the next thing, web 2.0 is it, the web of things, international communication, business software is at the beginning of a Kondratiev cycle, etc. A nice overview. Flattering was a slide that linked to Nepomuk and the Technology Review article about me.

Here is a moblog pic of that slide, taken shortly before the talk ended (yes, the Nokia E70 makes the moblogger ecstatic):

http://m.flickr.com/photo.gne?id=1747446165&

The tempting rule to break

W3C has published new Semantic Web logos (as blogged before) and with them rules on how to use them. One of the rules, which caused an uproar in some blog posts and questions on mailing lists, was:

  • The logo may not be used to disparage W3C, its Member organizations, services, or products.

Also, the logo is not as distinctive as the triples.

That phrase alone triggers free-minded web enthusiasts to do the right thing, namely to question the other rules about photoshopping the logo 🙂

semweb chasing microformats by burningbird Shelley Powers

sparql logo by Danny Ayers

the gladiator by Danny Ayers

update (22.10.2007)
your own combinations by gridinoc

W3C published new Semantic Web logos

We have a visual identity now! SWEO and the W3C communications team worked on different designs for a Semantic Web logo; to the right you see the nicely colored result.

They are available in different colors and tastes, like this:
owl in magenta

The three sides of the tri-color cube in these logos evoke the triplet of the RDF model. The peeled back lid invites you to Open Your Data to the Semantic Web!

W3C anticipates using the Semantic Web cube in conjunction with other imagery related to the Semantic Web. However, until the imagery is more widely recognized, it should not be used on its own. Please use the full logos above rather than the cube alone.

So, could someone please photoshop a picture of my head “opened up” instead of the triple on my forehead?

All the other logos here:
www.w3.org/2007/10/sw-logos.html