PhD step 3: writing the PhD

People with a good memory (who aren’t hooked on semantic systems like me) may remember that I started blogging about my PhD roughly a year ago. In Step 1 I described the general idea around the Semantic Desktop and PIMO; in Step 2 I described my struggle for a research question and the actual scientific goal. Now, that was a year ago, and something should have happened in the meantime…

Doing a PhD is a permanent struggle against yourself, God, and life as such. If not true, this is at least a view that excuses a student for feeling overwhelmed by the task, sitting back, opening (a can of beer|a bottle of wine), and yearning for life before and after the PhD. In the last year, I did my share of struggling, of being overwhelmed, and of working on the PhD. I also entangled myself in a lot of other things (such as Aperture, Nepomuk, the Burning Man decompression, SWEO, Sesame, my 30th birthday), which somehow ended up being a lot, with some of them suffering from diminished attention. Overwhelmed, we come back to the better reaction than the alcoholic beverage: finishing the PhD by writing it, then improving quality in the other tasks.

The core of the problem is that writing a PhD involves the task of writing. In general, a student can write one page of quality work per work-day. Given that I was involved in a gazillion things, this drops to one page a week. Oh dear: at that rate, a 150-page PhD takes three years to write. But as I started last year and want to be finished SOON, things had to speed up.

If you do the math for your own work, you can finish a PhD much faster by cutting the page count down to a reasonable length (less than 100 pages would be excellent). But this means that the pages have to be really good. If you are really good, try writing a PhD in 82 pages, like the web-shaking Human Computation by Luis von Ahn. He was able to say the following in his abstract (in my eyes, a ticket to finish the PhD right after the first page and graduate):
In addition, I introduce CAPTCHAs, automated tests that humans can pass but computer programs cannot. CAPTCHAs take advantage of human processing power in order to differentiate humans from computers, an ability that has important applications in practice.
The results of this thesis are currently in use by ~~hundreds~~ zillions of Web sites and companies around the world, and some of the games presented here have been played by over 100,000 people.
(strikethrough and correction of numbers by me)
Basically, this translates to: my work is pure innovation, or shorter: I rule.

So, not being witty enough, I aimed for a larger, 150-page thesis with the typical parts:

  • introduction: state of the art (web, knowledge management, ontologies, semantic desktop), problem
  • my idea (PIMO + Semantic Desktop for PIM)
  • my implementation and algorithms and ontologies
  • evaluation
  • conclusion: this work is pure innovation, may I rule?

The main parts of the work have been published before in papers I wrote on my own and with others; see my publication list for a glimpse of what I refer to.
I took the argumentation, the citations, and the key results from this work and integrated them into one view.
Before that, I went through a phase of “writing it from scratch” and “planning the layout in sections”; I even made a MindManager mindmap of the whole structure.
I ended up with a 200-page document which I called the “PhD draft”, but which was actually a copy/pasted Frankenstein of my papers, croaking “Please kill me. Every moment I live is agony!”.
So instead of finishing early, this approach actually made things take a bit longer.

The really tricky part was “finishing something”, so I started by finishing the introduction. I wrote it, improved it, printed it, thought about it, sent it to my mother (who has a PhD in International Business plus a degree in English translation) to check the English (Mom, thank you so much, and I am really lucky to have you!), had it checked by my colleague Thomas Roth-Berghofer, and then by my professor, Andreas Dengel.
And lo and behold, everyone was OK with it. Away, Frankenstein; welcome, results. I was motivated to continue.

This was in August, and knowing that I was on the right track, the rest was easy… sort of.
I plowed through section after section, improving the content, adding missing bits, removing redundancy. Today I have a 250-page version that contains most of the things that must be in; still missing are a few parts I wanted to write on related literature and software-engineering lessons learned.
From time to time I talked with my buddy Sven Schwarz, who is on a similar quest; exchanging our status and talking helped us get organized.

So the next sections can go into review, and I am looking forward to getting it done… soon. A good motivation to speed up is the fact that Gunnar has also reached the equilibrium of “it’s done, only minor changes now”.

Eclipse WTP Server synchronisation

Category: random problems with Java. Mainly interesting for Google searchers who have similar problems.

Are you developing for Eclipse WTP (perhaps using Eclipse 3.2) and feel that the code you write, the servlets and JSPs, is never actually synchronized with the web application, so that the running code differs from the development version?

I experience this sometimes. I don’t know exactly what causes it, but deleting the server’s temp files seems a good way to go.
The temp files are in:
<your workspace folder>\.metadata\.plugins\org.eclipse.wst.server.core\tmp

Actually, in my case this broke the server, so I deleted the server and created it again (go to the J2EE perspective, find the server in the package explorer, delete it, and run the web application again on a new server).

Mobile clubbing flashmob in Kaiserslautern

Today at quarter to six, it was again flashmob time in Kaiserslautern. I enjoy these sparse breakouts from the town’s normality, especially a mobile clubbing: bring your own favorite music, shake freely. It was the best clubbing in town ever; seldom have I seen so many happy people dancing in one place. No wonder, given the great music. I added some of my own to the soundtrack; bystanders heard nothing and wondered. Pics tomorrow.

thx and kudos to Katja, our flashmob queen.

But alas, I wish I lived in London, where they have really impressive mobile clubbings, like this one: (hey, there’s a poi dancer in it)

Property names: nouns preferred

When constructing ontologies, labels are needed for properties. This is a crucial part of the work: the names form part of the XML namespace and are visible as labels in the user interface.

We had a discussion at work about names for predicates. To illustrate it, a bad example (in N3):
:isKnowingperson a rdf:Property; rdfs:range :Person.

The name is too long, contains a verb (“is”), has mixed uppercasing (the “p” should be uppercase to ease reading), and contains too much information (“person” can be removed; it is also in the range).

The practical community prefers nouns (part, location, topic, related); there is a slight, one-sentence recommendation towards using nouns in the SWAP primer.

The popular FOAF ontology uses nouns for literal properties and a verb for knows. Similar relations can be modelled: loves, hates. SKOS [2] also uses isXOf names for inverses that were hard to define, as does rdfs:isDefinedBy.

It seems that the trend is toward shorter forms: gerunds, verbs, or nouns. My summary is: as I want to define inverses, I will try to use nouns without verb prepositions whenever possible. When this does not capture the semantics in a satisfying way, I will use “isXOf” and “hasX” or search for a gerund. In practice the guideline is: use “name” instead of “hasName”.

For NEPOMUK’s PIMO ontology, this would result in:

  • part – partOf
  • location – locationOf
  • related
  • topic – topicOf
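
Expressed in N3, such a noun/inverse pair could look like this minimal sketch; the namespace is a placeholder of mine (not the real PIMO URI), and declaring the inverse via owl:inverseOf is just one possible modelling choice:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix :     <http://example.org/pimo#> .  # placeholder namespace

# a noun instead of “hasPart”
:part a owl:ObjectProperty ;
    rdfs:label "part" .

# the inverse gets the “Of” suffix
:partOf a owl:ObjectProperty ;
    owl:inverseOf :part ;
    rdfs:label "part of" .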

I would appreciate feedback based on published guidelines on building
ontologies or on other ontologies that had similar problems.

ANNOUNCEMENT of Aperture 1.0.1-beta release

Aperture is a Java framework for extracting full-text content and
metadata from various information systems (e.g. file systems, web sites,
mail boxes) and the file formats (e.g. documents, images) occurring in
these systems.

http://aperture.sourceforge.net/

Today, on 12 November 2007, we publish the first beta release, marking the point where Aperture is being applied in projects. The new version number is 1.0.1.

This release bears the mark of the Nepomuk Social Semantic Desktop, a major initiative combining research institutes and commercial companies from around Europe. Aperture is used as one of the pillars of a next-generation platform that changes the way people can organize and use the data stored on their computers. The input from the Nepomuk community drove us to implement a host of new features that make Aperture more useful, more flexible, and more powerful.

New Features

  • Aperture has been migrated to use the newly developed Nepomuk Information Element (NIE) ontology framework. This added a new level of precision and expressiveness to the output of Aperture components (see the sketch after this list). The ontology itself is endorsed by the Nepomuk Consortium, well documented, and maintained.
  • The output is now thoroughly tested with an extensible RDF validator for compliance with the ontology. This allowed us to fix a number of bugs that made certain properties appear in places they didn’t belong.
  • The data source configuration API has been overhauled and is now much easier to use.
  • A new facility allows clients to implement dynamic GUIs for data source configuration.
  • A new JpgExtractor extracts EXIF annotations from JPG files.
  • Four new experimental crawlers (Flickr, Bibsonomy, del.icio.us, and Apple iPhoto).
  • A host of small improvements and bug fixes.
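
To give a feeling for the first point, here is a rough N3 sketch of the kind of statements a crawler now emits for a file using NIE/NFO vocabulary. The resource and the literal values are made up for illustration, and the exact property names and namespace URIs should be double-checked against the ontology documentation:

@prefix nie: <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#> .
@prefix nfo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#> .

# a crawled file, described with NIE/NFO terms (illustrative values)
<file:///home/user/report.pdf> a nfo:FileDataObject ;
    nfo:fileName "report.pdf" ;
    nie:mimeType "application/pdf" ;
    nie:plainTextContent "..." .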

Another improvement is a public wiki for documentation,
tutorials and FAQ.
http://aperture.wiki.sourceforge.net/

Updated dependencies

  • Sesame 2.0 beta-6 (was beta-4)
  • RDF2Go 4.4.6
  • RDF2Go driver for Sesame
  • metadata-extractor-2.4.0-beta-1 – library used by the JpgExtractor
  • POI has been updated to 3.0-FINAL
  • flickrapi-1.0b4 – used by the FlickrCrawler
  • DFKIUtils 2.0 – XML-related utilities used by the website crawlers
  • nrlvalidator-0.1 – the validator used by the unit tests
  • infsail-0.1 and unionsail-0.1 – dependencies of the validator

Download here.

Best regards
Antoni Mylka
Christiaan Fluit
Leo Sauermann

now with barcodes

The Semantic Web is about URIs, and every idea needs a URI. Each of my blog posts already got one, and now also in printed form: look to the lower right, a mobile-phone-readable QR code.

[Image: QR barcode for this post]

Technically this is all age-old, and I loved it already years ago when semapedia had its “aha” moment. The Nipponese, in the meantime, went for the full monty and covered their island coast-to-coast with QR codes, so we have to catch up.

For my mother of all cell phones, the E70, I use the i-nigma reader (ha, another Enigma pun) because it was listed on Nokia’s page on barcode apps.

To hack twoday.net to render them, I added a table after the blog posts: in the “manage” menu, choose change template (html), then story.display, and add this code somewhere (generated using the i-nigma generator):

<!-- barcode -->
<img style="float:right" src="http://212.179.113.209/QRCode/img.php?d=URL%3Ahttp%3A%2F%2Fleobard.twoday.net%2Fstories%2F<% story.id %>&c=blogcode&s=4" alt="QR barcode linking to this story. Useful when printed." />

The leobard.twoday.net part needs to be replaced with your own blog’s address. The code could probably be simplified by replacing the whole encoded URI with <% story.href %>, but I feared that would break the URL encoding.

Next task for all Semantic Web lovers: print out and glue barcodes onto all physical things that deserve a URI. And don’t fuck it up by picking an uncool URI; pick a cool URI for the Semantic Web.

Wiki on the Semantic Desktop – HiWi job in Kaiserslautern

We are offering a

HiWi job / internship: a wiki for the Semantic Desktop.

The Semantic Desktop allows users to freely link the documents of their workplace (files, web pages, addresses, appointments…) and to annotate them with further information. The user interface is a personal wiki, which will be extended accordingly.

The work is carried out in collaboration with other developers and is mostly practical in nature. The ability to implement independently and to quickly familiarize yourself with existing frameworks is a prerequisite. A software development portal is used to coordinate the development. The resulting software is mostly open source.

Requirements

  • Experience with Java, JSP, JavaScript, Tomcat
  • Experience with open source projects (documentation etc.)

Contact
Leo Sauermann
Deutsches Forschungszentrum für Künstliche Intelligenz
Knowledge Management Research Department
Trippstadter Straße 122, Room 3.04
Tel.: +49 631 20575-116
sauermann@dfki.unikl.de