“Beautiful Data” – a must-read

As philosopher Daniel Dennett puts it, “a scholar is just a library’s way of making another library.”
Jeff Hammerbacher tells us how facebook.com analyses 55 terabytes of new data per day, but he tells it against the background of Business Intelligence Systems, Information Platforms and Brains, and the profession of the Data Scientist. Read his chapter in “Beautiful Data”, and all the others, if you are working in a business that crunches 0s and 1s. You can peek into the chapter “Information Platforms and the Rise of the Data Scientist” at Google Books.

Why are this book and this chapter relevant to the Semantic Web? Because in data warehousing and data analysis a convergence of relational data, temporal data, and documents is already happening, and the solution for it is still being built. With RDFa, RDF, and the Linked Data principles, we have a data format that can bridge between documents and structured data.
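To make that bridge concrete, here is a minimal sketch with a made-up document URI and the Dublin Core and FOAF vocabularies: an RDFa-annotated HTML fragment, and the equivalent RDF triples built by hand with Python’s rdflib so the snippet stays self-contained (it does not run an RDFa extractor).

    # Sketch: the same statement once as RDFa inside a web page, once as RDF
    # triples in a graph. URIs and properties are illustrative only.
    from rdflib import Graph, Literal, URIRef
    from rdflib.namespace import DC, FOAF, RDF

    # What an RDFa-annotated page could look like:
    rdfa_fragment = """
    <div about="http://example.org/reports/2009-q3" typeof="foaf:Document">
      <span property="dc:title">Q3 Usage Report</span>, written by
      <span property="dc:creator">Jane Doe</span>.
    </div>
    """

    # The triples an RDFa processor would extract, built here by hand:
    g = Graph()
    doc = URIRef("http://example.org/reports/2009-q3")
    g.add((doc, RDF.type, FOAF.Document))
    g.add((doc, DC.title, Literal("Q3 Usage Report")))
    g.add((doc, DC.creator, Literal("Jane Doe")))

    print(g.serialize(format="turtle"))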

For semantic personal information management, which is what we do over at Gnowsis.com with Cluug, the same applies: how can we turn a personal information model into a tool that you can use to analyse your own data? A direction for this analysis is given in the “Total Recall” book, where you can learn how individuals will manage their data. With the Semantic Desktop standards we have a technological basis to start from, but it will only turn into “Beautiful Data” once this has been done for a few years. I expect to see a mixture of data in personal semantic web systems:

  • relational data: databases, RDF
  • temporal data: clickstreams, logfiles, …
  • documents: HTML, PDF, RDFa
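As a small, hypothetical sketch of that mixture (all URIs and property names are invented for illustration), a personal RDF graph can hold a relational-style fact, a timestamped clickstream entry, and a link to a document side by side, and a single SPARQL query can range over all of them:

    # Mixing relational, temporal, and document data in one personal graph.
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDF, XSD

    MY = Namespace("http://example.org/me/")  # hypothetical personal namespace

    g = Graph()
    # relational data: a plain attribute/value fact
    g.add((MY.project42, MY.budget, Literal(12000, datatype=XSD.integer)))
    # temporal data: one clickstream entry from a logfile
    visit = MY.visit1
    g.add((visit, RDF.type, MY.PageVisit))
    g.add((visit, MY.timestamp, Literal("2009-08-30T10:15:00", datatype=XSD.dateTime)))
    g.add((visit, MY.page, URIRef("http://example.org/docs/spec.html")))
    # documents: the visited page is itself described as a document
    g.add((URIRef("http://example.org/docs/spec.html"), RDF.type, MY.Document))

    # One SPARQL query ranges over all three kinds of data at once:
    q = """
    PREFIX my: <http://example.org/me/>
    SELECT ?when ?page WHERE {
      ?v a my:PageVisit ; my:timestamp ?when ; my:page ?page .
      ?page a my:Document .
    }
    """
    for row in g.query(q):
        print(row.when, row.page)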

And we will need the same technologies that corporate data warehousing and business intelligence use to wrestle with it:

  • map/reduce analysis (a toy sketch follows this list),
  • temporal analysis, trend detection,
  • faceted browsing,
  • information retrieval and text retrieval,
  • and social tools: the exchange of good tools and useful statistics within the community of semantic personal information management practitioners.
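As a toy illustration of the map/reduce item above (the log format and URLs are made up), this is the general shape of such an analysis on a personal clickstream: the map phase turns raw log lines into (day, url) keys, the reduce phase counts them. A real system would run the same two phases distributed over far more data.

    # Toy map/reduce over clickstream log lines of the form "timestamp<TAB>url".
    from collections import Counter
    from datetime import datetime

    log_lines = [
        "2009-08-30T10:15:00\thttp://example.org/docs/spec.html",
        "2009-08-30T11:02:00\thttp://example.org/docs/spec.html",
        "2009-08-31T09:40:00\thttp://example.org/notes/todo.html",
    ]

    def map_phase(line):
        # map: turn one raw log line into a (day, url) key
        stamp, url = line.split("\t")
        return (datetime.fromisoformat(stamp).date().isoformat(), url)

    def reduce_phase(pairs):
        # reduce: count how often each (day, url) key occurs
        return Counter(pairs)

    counts = reduce_phase(map_phase(line) for line in log_lines)
    for (day, url), n in counts.most_common():
        print(day, url, n)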

Luckily, this is already happening and I can be part of it, so I am happy to be able to read “Beautiful Data” to learn more about what we need to do.