Local News Engine is a prototype project to help journalists/reporters scrutinise local accountability information such as planning and licensing applications. A critical part of this is actually getting hold of planning and licensing applications as data. Local Authorities are obliged by law to maintain public registers of most aspects of planning and licensing work.
Data emerging from Local News Engine starkly illustrates the difference between presenting data in a data store and hiding it behind a query search of a database. Camden planning data is published in a Socrata data store, where it can be downloaded with a few clicks; it takes 2 minutes. The other data we seek to obtain from the London Boroughs of Camden and Islington has to be scraped from the search query, which takes roughly 500 to 1,200 times longer (code on GitHub). For a direct comparison one would have to obtain back data for a similar length of time, but the order-of-magnitude differences would still be stark. I don’t intend this as a criticism, but to bolster the case for publishing data in a data store rather than the normal way of doing business. To realise the potential of open data, an important first step is to publish it where it can easily be downloaded. This is not rocket science. Even sticking it in Google Docs or Dropbox as a spreadsheet would be a leap forward from only being able to access it through a consumer-facing web query.
Here’s a report from developers Open Data Services Co-operative on acquiring this public data:
‘on the time that the scrapers take to run, and the range of data that’s included in them. In order to speed up the scrapers and to ensure that the data was comparable, we spun up some VMs on Google Compute Engine to run the scrapers.
Camden License: 38.4h runtime, data back to 2005
Camden Planning: 2 min runtime, data back to 2010
Islington License: 39.5h runtime, data back to 2006
Islington Planning: 16.2h runtime, data back to 2006
We don’t expect to be able to speed up any of those scrapers during the life of the prototype, as they are largely limited by the speed of the website, and the number of queries that have to be done in order to obtain the data. A future project may be able to improve speed using techniques such as parallelisation, however this is complex to implement – so this is the 80/20 rule result.’
ODS also commented that running the scrapers in this way takes the skills required to access these long runs of public registers into ‘developer space’: it is not something that a member of the public would be able to do.
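For readers who want to check the scale of the difference, here is a minimal sketch that works out the ratios from the runtimes reported above. The dictionary labels and the `slowdown` helper are my own names for illustration, and the calculation ignores the fact that the registers go back to different years, so treat the results as rough orders of magnitude rather than a like-for-like benchmark.

```python
# Runtimes as reported by Open Data Services, converted to minutes.
# The labels are illustrative names, not identifiers from the project code.
runtimes_min = {
    "Camden Licensing": 38.4 * 60,
    "Camden Planning": 2,          # downloaded from the data store
    "Islington Licensing": 39.5 * 60,
    "Islington Planning": 16.2 * 60,
}

# The data store download is the baseline everything else is compared to.
baseline = runtimes_min["Camden Planning"]


def slowdown(minutes, baseline_minutes=baseline):
    """Return how many times longer a run took than the baseline."""
    return minutes / baseline_minutes


# Ratio for each scraped register, excluding the baseline itself.
ratios = {
    name: slowdown(minutes)
    for name, minutes in runtimes_min.items()
    if name != "Camden Planning"
}

# Print the scraped registers from quickest to slowest.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name}: about {ratio:.0f} times the data store download")
```

The quickest scraper (Islington Planning) comes out at a little under 500 times the data store download, and the two licensing scrapers at well over 1,000 times.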
Partly as a result of the LNE project, Camden officers are seeking to bring licensing into the data store. Islington officers have recognised that one part of their online licensing presentation (HMO licensing) is broken; I have raised this with a local councillor.