Aperture 2006.1 alpha 3 RELEASED

We are pleased to announce the third alpha release of the Aperture framework.

Aperture is a Java framework for extracting and querying full-text content and metadata from various information systems (e.g. file systems, web sites, mail boxes) and the file formats (e.g. documents, images) occurring in these systems.

The most notable feature in this release is a new IcalCrawler. It works with
iCal files generated by many calendaring applications (Apple iCal, KOrganizer,
Lotus Notes …). It uses an iCal-to-RDF mapping developed by the W3C RDF
Calendaring group. Apart from that, there are numerous small improvements and
bug fixes. The tutorial has been expanded with more code examples and UML
diagrams to make it easier for new users to get started.

This is the last release before the switch to the RDF2Go framework.
(The curious can already examine the RDF2Go branch in CVS.)

The project homepage:
http://aperture.sourceforge.net

Aperture 2006.1-alpha-3 can be downloaded from here:
http://sourceforge.net/project/showfiles.php?group_id=150969&package_id=166878&release_id=460471

What’s new in alpha-3?

– new IcalCrawler

– added MIME type detection for many formats:

– improved MIME type detection of MHTML files (web archives)

– introduced HtmlParserUtil, containing large parts of the HtmlExtractor
implementation, as HTML (fragments) may occur in other document types
as well (e.g. saved mails, see MimeExtractor)

– added ThreadedExtractorWrapper class, for catching and interrupting
hanging Extractors

– added RepositoryAccessData, an AccessData implementation storing its
information in a Repository

– added ability to specify a port number for an IMAP source

– set target platform to Java 5
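
To illustrate the idea behind the ThreadedExtractorWrapper entry above (catching and interrupting hanging Extractors), here is a minimal sketch in modern Java. It is not Aperture's actual API: the `TextExtractor` interface, the `extractWithTimeout` method and the class name are hypothetical names invented for this example, and the real wrapper may work differently. The sketch runs the extraction on a worker thread, waits for a limited time, and interrupts the worker if it does not finish.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical illustration only; this is NOT the Aperture ThreadedExtractorWrapper API. */
public class TimeoutExtractionSketch {

    /** Hypothetical stand-in for an extractor: turns some input into plain text. */
    public interface TextExtractor {
        String extract(String input) throws Exception;
    }

    /**
     * Runs the extractor on a worker thread and waits at most timeoutMillis.
     * Returns the extracted text, or null if the extractor did not finish in time.
     */
    public static String extractWithTimeout(TextExtractor extractor, String input, long timeoutMillis)
            throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Callable<String> task = () -> extractor.extract(input);
        Future<String> result = executor.submit(task);
        try {
            return result.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // The extractor hangs: interrupt the worker thread and give up on this input.
            result.cancel(true);
            return null;
        } catch (ExecutionException e) {
            // The extractor itself failed: rethrow its own exception where possible.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // A well-behaved extractor finishes within the time limit.
        String fast = extractWithTimeout(in -> in.toUpperCase(), "some text", 1000);
        System.out.println("fast extractor returned: " + fast);

        // A hanging extractor is interrupted once the time limit has passed.
        String hung = extractWithTimeout(in -> {
            Thread.sleep(10_000);
            return in;
        }, "some text", 200);
        System.out.println("hanging extractor returned: " + hung);
    }
}
```

Note that interruption only helps if the hanging code reacts to it (for example while sleeping or waiting); a thread stuck in a tight loop or in blocking I/O that ignores interrupts cannot be stopped this way.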

Leo Sauermann
Christiaan Fluit
Gunnar Grimnes
Antoni Mylka

More on the Semantic Web Congress by Benjamin Nowack

Two weeks ago I gave a talk at ZGDV.
Benjamin Nowack blogged about the ZGDV Semantic Web congress and was kind enough to put his slides on the web. He also published some nice pictures of me having fun while giving my talk.

I can only follow his example, so here are my slides on the Semantic Desktop (in German):
www.dfki.uni-kl.de/~sauermann/2006/10/19/009_sauermann.pdf

I took the liberty of copying the pictures to flickr, so as not to strain his bandwidth too much 🙂 Here they are:
trying to look like Minority Report
Nepomuk slide and Leo

PhD step 2: the research question and how I can answer it (is it possible to write a PhD on gnowsis?)

I will be blogging about my Semantic Web PhD over the next months, until I am finished. You will learn what I did in the last years and what I plan to do in the coming months to write my thesis. Perhaps you can reuse something for your own work or point me to information I missed – criticism, positive and negative, is very welcome.

The topic of my PhD thesis is derived from my Diploma Thesis “The Gnowsis: Using Semantic Web technologies to build a Semantic Desktop”. The work I did in 2003 was to create a Semantic Web Server for a single user, running on the desktop, which turns the desktop into a Semantic Desktop. The abstract ends with:
Using the gnowsis prototype, which is a result of this work, applications have access to all important information stored in a single computer. Users are able to classify and structure their information in any way they want by creating bidirectional links between resources. A prototype information management tool GnoGno based on a wiki /weblog was built to explore this possibility.

So, what am I going to do for my PhD? Continue! I got different reactions to that from others:

  • That was a diploma thesis? After reading it, I thought it was your PhD
  • Just write down, we will see then…
  • You can never write a thesis about an implementation, that’s not science

Note that I worked on this diploma thesis for 18 months, from June 2002 to December 2003, which is far more time than any thesis student has here at DFKI, so it may contain enough to be accepted as a PhD thesis at some universities in the world. At the very least, I published a description of an implemented Semantic Wiki, a Semantic Blog, and a way to extract data from Outlook with find(SPO) queries, using a mapping language like D2RQ. All these topics are still very hot, years after my work. Also, I published them piece by piece in peer-reviewed conferences or journals. Nothing to hide there.

So, I am positive that my work is science. Coincidentally, I googled today for websites that are like mine and stumbled across Dennis Quan. His thesis, done with David Karger at MIT, on Designing End User Information Environments Built on Semistructured Data Models is a good example of the direction I want to go: describing how to build Semantic Web environments for the real world. And I interpret Dennis’ thesis as evidence that you can indeed write a PhD thesis about implementation matters: half his thesis is about Adenine, Ozone and the RDF bits and pieces he created (which are very good, btw).

So the research question I have is on the borders between Semantic Web, Artificial Intelligence, and Knowledge Management:

If Personal Information Management is the main use of Personal Computers, why is it then not part of the Operating System of the computer? Why does the Operating System only handle files and folders, and not Persons, Projects and Topics?

We need a system in the spirit of the memex – a personal extension of the brain. Such a system can then be used to write down notes in a “new” way. My diploma thesis ended with the idea that users are able to classify and structure their information in any way they want by creating bidirectional links between resources. But “any way” has to be specified further. We are missing an answer to the question: what is the best way to write down information on a Semantic Desktop?

So my PhD will contain a round trip through the Semantic Desktop – the idea of a central server and applications around it – and then go into the Personal Information Model (PIMO) we use to manage information. At the end, I will shed light on how to automatically generate the PIMO, something that is addressed a lot in our group.

The way to answer these questions and challenges is (for me) clear: Personal Information Management cannot be handled by a single application like MindManager or Microsoft Outlook. It has to include all information items that come to our attention every day: my web browsing, my e-mails, my project management tools, my co-workers, my employees and students, my projects and my tasks there, my SVN commits, my papers, travel to conferences, giving talks, PowerPoint slides, blog posts.

So it has to include all the applications in this chain: blogging, flickr, PowerPoint, e-mail, MS Word, etc. And what we did in gnowsis and the EPOS project was to make sure that all these applications can be enhanced with plugins so that they can capture the information behind them. What we need is a unified tagging scheme for each person, a “personal Technorati”. If I use the tag “burning man 2006” in delicious, I will also use it on flickr, and on my e-mails. So simple: I am always the same person, so my PIMO is the same regardless of the application. Simple in theory, tricky in practice.
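
To make the “personal Technorati” idea a bit more concrete, here is a minimal sketch in Java. It is not code from gnowsis or EPOS: the class `PersonalTagIndex`, its methods and the example.org URIs are all hypothetical and only illustrate the principle that one person's tag points to resources in many applications.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of one person's unified tag index; not part of gnowsis. */
public class PersonalTagIndex {

    /** A tagged item, identified by the application it lives in and its URI. */
    public static final class Resource {
        public final String application;
        public final String uri;

        public Resource(String application, String uri) {
            this.application = application;
            this.uri = uri;
        }

        @Override
        public String toString() {
            return application + ": " + uri;
        }
    }

    // One shared tag vocabulary for the person, independent of the application.
    private final Map<String, List<Resource>> resourcesByTag = new LinkedHashMap<>();

    // Treat "Burning Man 2006 " and "burning man 2006" as the same tag.
    private static String normalize(String tag) {
        return tag.trim().toLowerCase(Locale.ROOT);
    }

    /** Records that the given resource in the given application carries the tag. */
    public void tag(String tag, String application, String uri) {
        resourcesByTag
                .computeIfAbsent(normalize(tag), key -> new ArrayList<>())
                .add(new Resource(application, uri));
    }

    /** Returns all resources carrying the tag, across all applications. */
    public List<Resource> find(String tag) {
        List<Resource> found = resourcesByTag.getOrDefault(normalize(tag), Collections.emptyList());
        return Collections.unmodifiableList(found);
    }

    public static void main(String[] args) {
        PersonalTagIndex pimo = new PersonalTagIndex();
        // The URIs below are placeholders, not real bookmarks, photos or mails.
        pimo.tag("burning man 2006", "delicious", "http://example.org/bookmarks/1");
        pimo.tag("burning man 2006", "flickr", "http://example.org/photos/2");
        pimo.tag("Burning Man 2006", "e-mail", "http://example.org/mail/3");

        for (Resource resource : pimo.find("burning man 2006")) {
            System.out.println(resource);
        }
    }
}
```

The tricky part in practice is exactly what this sketch leaves out: getting each application to report its tags to the shared index, which is what the plugins are for.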

Practice will follow.