Which Data Should Be Open?

Following the recent theme of open access to data, I thought I’d wade in with a bit of a challenge.

There has been a lot of discussion recently about which bits of public data should be made available, and in what form. The London Gazette has a great way of publishing many facts of public record via RDF, for example. The Civil Service publishes its vacancies in RDFa, making it easy to scrape info from its vacancies site. But these are cases of already-public information being made usable through Linked Data. This is undoubtedly a useful step, and one which will make a huge difference to services and mashups, but what about other public data?

What I’m wondering is whether there are data generated by the public sector that shouldn’t be published. Sure, it would be foolish for the heads of a nation’s security services to publish the locations of their operatives (and if they did, they should at least publish them in PDF so they’re effectively hidden anyway!). But what about post codes and geographic co-ordinates, for a handy example?

In the UK, the Royal Mail owns post code data, and it licenses the information for a fee whenever a service wants to make use of an official post code lookup. The Guardian published a story last week about a couple of developers who are running a barely legal setup in order to make these codes free for organisations and apps to use. Is it right that UK taxpayers’ pounds have gone into the development of post codes, and that other UK taxpayers must then pay more pounds to access them? Or is this simply part of running a public service, which needs to recover its costs in order to run effectively?

I’m not sure, really. There is an interesting trend in the UK for the public sector to be seen to be transparent, and for “publishing data” to be generally considered a good, progressive thing. But which data will it be publishing, and at what cost? Tim Berners-Lee recently mentioned going after “low-hanging fruit” by just getting the data out there, saying that it’s our data because our taxes paid for it. Sir Tim, as covered earlier, has been appointed to an advisory role to the UK Parliament, so I assume this is similar to the advice he’s passed on to Parliament.