Talk About Local

Hyperlocal in the UK

Local authority data store increases availability of data by two orders of magnitude

21st November 2016 by William Perrin

Local News Engine is a prototype project to help journalists and reporters scrutinise local accountability information such as planning and licensing applications. A critical part of this is actually getting hold of planning and licensing applications as data. Local authorities are obliged by law to maintain public registers of most aspects of planning and licensing work.

Data emerging from Local News Engine starkly illustrates the difference between presenting data in a data store and hiding it behind a search form on top of a database. Camden planning data is published in a Socrata data store, where it can be downloaded with a few clicks: it takes 2 minutes. The other data we seek to obtain from the London Boroughs of Camden and Islington has to be scraped from search queries, which takes roughly 500 to 1,200 times longer (code on GitHub). A strictly like-for-like comparison would need back data covering a similar period from each source, but the difference would still be orders of magnitude.

I don’t intend this as a criticism, but to bolster the case of people seeking to publish data in a data store rather than in the normal way of doing business. To realise the potential of open data, an important first step is to publish it where it can easily be downloaded. This is not rocket science. Even sticking it in Google Docs or Dropbox as a spreadsheet would be a leap forward from only being able to access it through a consumer-facing web query.
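To give a sense of why a data store download is so quick, here is a minimal Python sketch of fetching a dataset as CSV from a Socrata-style portal. The portal domain and dataset identifier are placeholders, not Camden's real ones, and the `/resource/<id>.csv` pattern with `$limit` and `$offset` is the general Socrata (SODA) convention rather than anything taken from the Local News Engine code.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumptions for illustration only: neither value comes from the post.
PORTAL_DOMAIN = "opendata.example.gov.uk"  # hypothetical portal domain
DATASET_ID = "abcd-1234"                   # hypothetical dataset identifier


def csv_url(domain: str, dataset_id: str, limit: int = 50000, offset: int = 0) -> str:
    """Build a Socrata-style CSV URL for one page of rows."""
    # safe="$" keeps the "$" in "$limit" and "$offset" readable in the URL.
    query = urlencode({"$limit": limit, "$offset": offset}, safe="$")
    return f"https://{domain}/resource/{dataset_id}.csv?{query}"


def download_csv(url: str, path: str) -> None:
    """Fetch the URL and write the response body to a local file."""
    with urlopen(url) as response, open(path, "wb") as out:
        out.write(response.read())


if __name__ == "__main__":
    # Only print the URL here, because the domain and id are placeholders.
    print(csv_url(PORTAL_DOMAIN, DATASET_ID))
```

With a real domain and dataset identifier, one call to `download_csv` replaces the thousands of individual search queries a scraper has to make.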

Here’s a report from developers Open Data Services Co-operative on acquiring this public data:

‘on the time that the scrapers take to run, and the range of data that’s included in them. In order to speed up the scrapers and to ensure that the data was comparable, we spun up some VMs on Google Compute Engine to run the scrapers.

Camden License: 38.4h runtime, data back to 2005
Camden Planning: 2 min runtime, data back to 2010
Islington License: 39.5h runtime, data back to 2006
Islington Planning: 16.2h runtime, data back to 2006

We don’t expect to be able to speed up any of those scrapers during the life of the prototype, as they are largely limited by the speed of the website, and the number of queries that have to be done in order to obtain the data. A future project may be able to improve speed using techniques such as parallelisation, however this is complex to implement – so this is the 80/20 rule result.’
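For readers wondering what the parallelisation mentioned in the report might look like, here is a minimal Python sketch. It is not the Open Data Services implementation: `fetch_page` is a hypothetical stand-in that only sleeps to simulate a slow search query, and the worker count is an arbitrary assumption. A real scraper would also need to keep concurrency low enough not to overload the council website.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_page(page_number: int) -> str:
    """Hypothetical stand-in for one slow search query against a register."""
    time.sleep(0.05)  # simulated network and server latency
    return f"results for page {page_number}"


def scrape_sequential(pages):
    """Fetch pages one after another, waiting for each to finish."""
    return [fetch_page(p) for p in pages]


def scrape_parallel(pages, max_workers: int = 4):
    """Fetch several pages at once using a small pool of threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the input pages.
        return list(pool.map(fetch_page, pages))


if __name__ == "__main__":
    pages = list(range(1, 21))
    for scrape in (scrape_sequential, scrape_parallel):
        start = time.perf_counter()
        scrape(pages)
        print(f"{scrape.__name__}: {time.perf_counter() - start:.2f}s")
```

Threads suit this kind of work because the time is spent waiting on the website rather than computing, but the gain is capped by how many simultaneous requests the site will tolerate.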

ODS also commented that running the scrapers in this way takes the skills required to access these long runs of public registers into ‘developer space’, not something that a member of the public would be able to do.
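Taking the runtimes in the developers' report at face value, the gap between scraping and the data store download can be worked out directly. The short Python sketch below does that arithmetic; the variable and function names are illustrative, and the comparison is not like-for-like because the date ranges differ.

```python
# Scraper runtimes in hours, as listed in the developers' report.
SCRAPER_HOURS = {
    "Camden License": 38.4,
    "Islington License": 39.5,
    "Islington Planning": 16.2,
}

# Camden Planning comes from the data store and takes about 2 minutes.
DATA_STORE_MINUTES = 2


def slowdown(scraper_hours: float, baseline_minutes: float = DATA_STORE_MINUTES) -> float:
    """Return how many times longer a scraper runs than the baseline download."""
    return scraper_hours * 60 / baseline_minutes


if __name__ == "__main__":
    for name, hours in SCRAPER_HOURS.items():
        print(f"{name}: about {slowdown(hours):.0f} times the data store download")
```

On these figures the scrapers run somewhere between about 490 and 1,190 times longer than the data store download, which is the two to three orders of magnitude the post describes.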

Partly as a result of the LNE project, Camden officers are seeking to bring licensing into the data store. Islington officers have recognised that one part of their online licensing presentation (HMO licensing) is broken; I have raised this with a local councillor.

William Perrin
Founder of Talk About Local, Trustee of the Indigo Trust, Tinder Foundation, 360Giving, co-founder Connect8, former member of UK Government transparency panels, former Policy Advisor to UK Prime Minister, former Cabinet Office senior civil servant. Open data do-er, Kings Cross London blogger. Loves countryside. Two small children.
Filed Under: Blog Tagged With: #dnifund, #lne, #localnewsengine, #opendata, camden, islington
