Aperture 1.2.0 out


Aperture is a Java framework for extracting full-text content and
metadata from various information systems (e.g. file systems, web sites,
mail boxes) and the file formats (e.g. documents, images) occurring in
these systems.


Download URL:

After three years of development Aperture is stable enough
to drop the .beta suffix from the release. 1.2.0 leverages
architectural improvements made in 1.1.0.beta to bring
support for compressed archives and to streamline
email processing. A completely new service, the
DataSourceDetector, allows applications to provide
suggestions to users about the data sources on their
desktops. A host of bugfixes and minor improvements rounds
out the picture of the leanest and meanest version of
Aperture ever made. Enjoy.

What’s new?

  • a completely new Aperture service – the
    DataSourceDetectors – can be used to provide advice to
    the user about the data sources on the desktop
  • new subcrawlers for .zip, .gzip, tar and bzip2 compressed
    archives
  • unification of the email handling – now the ImapCrawler,
    MboxCrawler and the MimeSubCrawler use the same code in
    the DataObjectFactory to convert emails to RDF. The
    MimeExtractor has been deprecated, switch to
  • some bugfixes in the email handling code: plain text and
    XML attachments are treated correctly, and threads are
    reflected in the resulting RDF
  • the PDF extractor has some basic support for XMP metadata
    (thanks to JempBox)
  • a completely new XmlSafetyUtil class that helps to deal
    with characters that are valid in RDF but invalid in XML,
    and that would therefore break the serialization
  • the URIs of subcrawled resources follow the pattern
    established by the Apache Commons VFS project
  • the new Sesame 2.2.1 bundled with Aperture features dramatic
    performance optimizations, e.g. the Aperture test suite
    is 2 times faster, this may also be a boost for your
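
To illustrate the kind of problem XmlSafetyUtil addresses, here is a
minimal sketch of filtering out characters that the XML 1.0
specification does not allow. It is not the Aperture API: the class
name XmlSafeText and the methods isValidXmlChar and makeXmlSafe are
hypothetical, and replacing each invalid character with a space is an
assumption made for this example only.

```java
// Hypothetical helper, NOT Aperture's XmlSafetyUtil: it only shows
// how text can be checked against the XML 1.0 "Char" production.
class XmlSafeText {

    // True if the code point is allowed in an XML 1.0 document:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    public static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns a copy of the input in which every code point that is
    // invalid in XML 1.0 (including unpaired surrogates) is replaced
    // by a single space. The replacement policy is an assumption.
    public static String makeXmlSafe(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            out.appendCodePoint(isValidXmlChar(cp) ? cp : ' ');
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A literal extracted from a document may contain control
        // characters such as BEL (U+0007), which XML 1.0 forbids.
        String raw = "subject with a bell\u0007 inside";
        System.out.println(makeXmlSafe(raw));
    }
}
```

Applying such a filter to literal values before serialization keeps
an RDF/XML writer from producing documents that XML parsers reject.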

Best regards
Leo Sauermann
Christiaan Fluit
Antoni Mylka