Local News Engine is testing the theory that story leads can be found in local data where a newsworthy person or place is engaged in a newsworthy activity, and that a computer can find these potential leads more quickly than someone manually reading hundreds of pages of lists. A journalist can then follow the leads up with a view to publication.
It’s important to stress that Local News Engine won’t be publishing anything itself (except the odd demo screengrab as above). LNE is a sorting tool to reduce the reading task for the journalist/reporter. The output of LNE will be used as a basis for a journalist to follow up. You can see in the screengrab of the very first proof of concept that this output still requires someone to ‘do journalism’ on it – it helps them get to the raw material quicker. The Data Protection Act has an exemption for ‘journalistic’ data processing with a view to publication, which is what we are doing here.
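The sorting idea can be illustrated with a minimal sketch. This is not the LNE code; it assumes each record is a flat dictionary of text fields and that the journalist supplies a watchlist of names and places. The field names, the `Lead` class and the `find_leads` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Lead:
    """A record that mentions a watchlist term (hypothetical structure)."""
    source: str
    record: dict
    matched_term: str


def normalise(text: str) -> str:
    # Lower-case and collapse whitespace so matching ignores case and spacing.
    return " ".join(text.lower().split())


def find_leads(records, watchlist, source):
    """Return one Lead for each (record, watchlist term) pair that matches.

    Matching is plain substring search over all field values; a real tool
    would need something more careful to cope with name variants.
    """
    terms = [normalise(term) for term in watchlist]
    leads = []
    for record in records:
        haystack = normalise(" ".join(str(value) for value in record.values()))
        for term in terms:
            if term in haystack:
                leads.append(Lead(source, record, term))
    return leads


# Made-up records purely for illustration, not real planning data.
records = [
    {"applicant": "Example Pub Ltd", "address": "1 Example Street",
     "proposal": "Extend opening hours"},
    {"applicant": "A N Other", "address": "2 Sample Road",
     "proposal": "Replace windows"},
]
leads = find_leads(records, ["example street"], source="planning")
for lead in leads:
    print(lead.source, lead.matched_term, lead.record["proposal"])
```

The output is only a shortlist of records to read; deciding whether any of them is a story remains the journalist’s job.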
The haystack in which we are looking for these needles is a set of local data lists pertinent to my website KingsCrossEnvironment.com, the Camden New Journal and the Islington Tribune. I have written stories based on all these data over the years.
London Borough of Camden planning data – updated daily in the Camden Data Store run by Socrata. The data can be downloaded as JSON and is clearly licensed.
Camden Licensing data – pubs, bars, clubs, adult industry, betting etc that require a licence from the council and often change or edit their hours and activities. This data has to be scraped from the Camden website. I have met with the very good Camden data team and they are seeking to publish licensing as open data in the data store by the end of 2016.
London Borough of Islington licensing and planning data – Islington is the UK’s smallest borough and, lacking the resources of Camden, its online presence is less sophisticated. Both these data sets have to be scraped from the website. The licences search is particularly slow and Islington officers have explained that ‘an on-going problem with our back office software meaning the search function is limited’. The scraped data does not have a clear licence but a Councillor and officers have not raised an objection.
Magistrates Court listings, Highbury Corner Magistrates Court – I have received this data for several years as part of my work on KingsCrossEnvironment.com. It is emailed most Fridays as a PDF to about 30 local recipients. ODS have developed a parser to turn this into structured data (JSON) which is held securely as part of the project, not published online. I worked with MOJ for several years on improving courts data on their Crime and Justice Sector Transparency Panel. I met with MOJ officials on 22 September and raised this parsing work with them. I subsequently wrote to them about the work and they have not objected. The use of listings data is covered by the Society of Editors code of practice, with which we comply. The complex impact of data about legal proceedings and the DPA is discussed by the then Lord Chief Justice and senior lawyers in the Criminal Procedure Rule Committee paper and amended rules.
Draft scrapers and parsers have been published on GitHub.
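For the one source that is already open data, the Camden planning JSON, a download step might look roughly like the sketch below. It is not taken from the published scrapers. It assumes the usual Socrata pattern of a `/resource/<dataset id>.json` endpoint with `$limit` and `$offset` paging parameters; the domain and dataset id shown are placeholders, not the real Camden values.

```python
import json
from urllib.parse import urlencode


def soda_url(domain: str, dataset_id: str, limit: int = 1000, offset: int = 0) -> str:
    """Build a Socrata-style JSON endpoint URL for one page of records."""
    # safe="$" keeps the leading "$" of the paging parameters readable.
    query = urlencode({"$limit": limit, "$offset": offset}, safe="$")
    return f"https://{domain}/resource/{dataset_id}.json?{query}"


def parse_records(text: str) -> list:
    """Parse a JSON response body, expecting a list of record objects."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of records")
    return data


# Placeholder domain and dataset id, for illustration only.
url = soda_url("data.example.org", "abcd-1234", limit=10)
print(url)

# A made-up response body standing in for a real download.
sample = '[{"reference": "X/0001", "address": "1 Example Street"}]'
print(parse_records(sample))
```

Fetching the URL itself (for example with `urllib.request.urlopen`) is left out so that the sketch runs without a network connection.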
All data is held in the UK under tight security and encrypted. We have a data processing agreement in place between Talk About Local and Open Data Services Co-operative and agreement to follow the NMA/SOE code of practice on courts lists. We’ll delete the data at the end of the LNE prototype.