Local News Engine is testing the theory that newsworthy names and places cropping up in lists of newsworthy things create potential leads for a journalist to follow up. We have discussed newsworthy data, so what are newsworthy names? These will be different for every publication and journalist, even on the same geographic patch. They vary with style (e.g. heavyweight or tabloid), with the issues a publication covers, and with context. The more I discussed this with people, the more answers I got.
But literally everyone who has appeared in, say, a local newspaper is by definition newsworthy in that context. When I naively asked several old hands whether local publications kept CRM-like lists of the people who had cropped up in their pages, they coughed a bit, and one laughed. I certainly don't keep such a list for my blog.
So we took a computational approach to producing a list: with the CNJ's knowledge, we performed an entity extraction on the Camden New Journal website. Academics and data processing companies have for some time used software to extract, find or index things in huge bodies of text. Known as 'entity extraction', the technique looks for proper nouns (names, places). There are both paid-for and open-source services. Paul Bradshaw ran an entity extraction on the Chilcot Inquiry. ODS built their own entity extraction code using NLTK and ran it on the Camden New Journal website as a proof of concept. This produced a list of over 1,000 proper nouns (places, companies, people, things), with links to where they occurred. There's a very crude, unsorted list of nouns here.
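To make the idea concrete, here is a minimal sketch of the kind of output described above: each candidate proper noun mapped to the pages it occurs on. It is not ODS's code. Instead of NLTK's named-entity tools (such as `nltk.ne_chunk`, which also label entities as people, organisations or places), it swaps in a much cruder heuristic, runs of capitalised words, so that it needs only the standard library. The `build_index` and `extract_candidates` names, the stoplist and the example URLs are illustrative assumptions.

```python
import re
from collections import defaultdict

# Crude stand-in for named-entity recognition (an assumption, not the ODS method):
# a run of capitalised words such as "Sarah Hayward" or "Kings Cross".
CAPITALISED_RUN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")

# Hypothetical stoplist of capitalised words that usually start a sentence
# rather than a name.
STOPWORDS = {"The", "This", "That", "He", "She", "It", "They", "We",
             "But", "And", "In", "On", "At", "When", "If"}


def extract_candidates(text):
    """Return candidate proper nouns found in one page of text."""
    candidates = []
    for match in CAPITALISED_RUN.finditer(text):
        words = match.group().split()
        # Drop leading stopwords, e.g. "The Camden Council" -> "Camden Council".
        while words and words[0] in STOPWORDS:
            words = words[1:]
        if words:
            candidates.append(" ".join(words))
    return candidates


def build_index(pages):
    """Map each candidate name to the sorted list of URLs where it occurs.

    pages: dict of page URL -> page text (assumed already scraped).
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for name in extract_candidates(text):
            index[name].add(url)
    return {name: sorted(urls) for name, urls in index.items()}
```

A heuristic like this also picks up sentence-initial words and treats "Cllr Sarah Hayward" and "Sarah Hayward" as different names, which is one reason a real NER library, plus some manual tidying, is the better choice for production use.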
The Camden Council leader, Cllr Sarah Hayward, is the most featured local name, with 213 mentions when the counts for the two forms of her name are added together; Boris Johnson (former Mayor of London and local resident) is next. The most mentioned non-politicians are Arsene Wenger (local football manager, 97 mentions) and the late Amy Winehouse (former resident and singer; 78 mentions, covering both her and her memorial foundation). Local councillors are mentioned a lot, as one would expect. Some oddities crop up quite a bit: Karl Marx (buried locally, 13 mentions), Virginia Woolf (former resident, 19 mentions) and Margaret Thatcher (presumably from the letters page, 55 mentions).
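Hayward's figure of 213 depends on adding together two forms of her name, so variant spellings have to be merged before names are ranked. A minimal sketch of that step, assuming a hand-maintained table that maps each variant to a canonical name; the `merge_aliases` helper and the names and counts in the demo are invented for illustration and are not the CNJ data.

```python
from collections import Counter


def merge_aliases(counts, aliases):
    """Fold variant forms of a name into one canonical name, summing mentions.

    counts: mapping of name as extracted -> number of mentions
    aliases: hypothetical mapping of variant -> canonical name
    """
    merged = Counter()
    for name, n in counts.items():
        merged[aliases.get(name, name)] += n
    return merged


if __name__ == "__main__":
    # Invented example data, not real mention counts.
    raw = {"Cllr Jane Doe": 40, "Jane Doe": 25, "John Smith": 50}
    merged = merge_aliases(raw, {"Cllr Jane Doe": "Jane Doe"})
    print(merged.most_common())  # [('Jane Doe', 65), ('John Smith', 50)]
```

Keeping the alias table by hand is deliberate: deciding that two strings refer to the same person is an editorial judgement that automatic matching can easily get wrong.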
If anyone can point me to this having been done elsewhere, I should be grateful.
In user research we found that as many people were interested in the potential for sorting by location as by name. This reflects the fact that many councils don't provide decent customised alerts (though Camden, for instance, does provide such a service). All the data we are using contains specific addresses, included for statutory or process reasons, which can be used for a geographic query. I am interested in the bit of London known as Kings Cross. Kings Cross has no precise definition, which in part makes the local reporter's job harder, and it crosses the boundaries of two London boroughs and several wards. I sketched out a rough map for the developers (pictured), which they turned into a set of wards. It should be possible to use any administrative geography, such as Super Output Areas (SOAs).
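Two ways of running such a geographic query can be sketched briefly, assuming each record has already been geocoded. The ward approach keeps records whose ward code is in a chosen set; the alternative tests each address directly against a sketched boundary polygon using the standard ray-casting point-in-polygon method. The record fields `lon`, `lat` and `ward`, and the function names, are hypothetical. Longitude and latitude are treated as flat x/y coordinates, which is a reasonable approximation over an area the size of a neighbourhood.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: True if (x, y) lies inside the polygon.

    polygon: list of (x, y) vertices in order; the last vertex joins the first.
    Points exactly on an edge may be classed either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Count crossings of a ray running from the point towards +x.
            if x < x_cross:
                inside = not inside
    return inside


def filter_by_area(records, polygon):
    """Keep records whose (lon, lat) falls inside the sketched boundary."""
    return [r for r in records if point_in_polygon(r["lon"], r["lat"], polygon)]


def filter_by_wards(records, ward_codes):
    """Keep records whose ward code is in the chosen set of wards."""
    return [r for r in records if r["ward"] in ward_codes]
```

The ward filter is simpler and matches official boundaries such as wards or SOAs; the polygon test suits a patch like Kings Cross that follows no administrative lines.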
By William Perrin