Aperture: what it can do, and what we do next

What is the current status of Aperture, where are we heading, and what needs to be done?

Aperture is a Java framework for extracting full-text content and metadata from various information systems (e.g. file systems, web sites, mailboxes) and from the file formats (e.g. documents, images) that occur in these systems. Many people already use Aperture, and we are compiling a list of users on this wiki page:

http://aperture.wiki.sourceforge.net/ProjectsUsingAperture

If you are using Aperture, please add a link to your project there.

We want to make a release in the coming months, probably sooner rather than later. If you have free time and want to fix bugs, you are welcome to help speed this up 😉

The most urgent things I would like to do next are:

  • Fix bugs in Aperture: there is one for everyone.
  • Write a better Lucene handler, allowing Lucene developers to use Aperture as a framework.
  • Make e-mails openable in Thunderbird (this already works for Outlook, but Thunderbird’s programming model is an open secret that can only be read in its C code). I have already tried to understand Thunderbird’s API; any new clues are welcome.
  • Fix the IMAP URIs: they were OK in gnowsis but are broken in Aperture. That’s an old one.

Danny Ayers blogged about using a Mork parser to extract Firefox’s browsing history.

I replied that we may want to have this in Aperture too.

The demork code is here: http://gnowsis.opendfki.de/browser/trunk/demork

It’s also packaged as a JAR with Aperture and used in the ThunderbirdCrawler.

Does anyone want to take this code and improve it to crawl the browsing history? Write a note to Aperture Developers (you have to register first).