I’m part of Data Labs at GOV.UK.
My current projects at work include extracting eligibility requirements from GOV.UK content and building a knowledge graph of government entities and processes.
Most of my work in Data Labs centres around solving problems with NLP. GOV.UK has ~500k pages (known as content items), so there is an abundance of text but little structured knowledge about what that text contains or describes.
A big part of what I’m working on now is extracting and structuring information from that content to make it available and useful.
For example, many services have eligibility requirements that determine who can get them (e.g. you must be over 18). There are so many of these services that it would be very hard for someone to work out what they are eligible for without spending a lot of time on it. I’m working on automatically extracting those eligibility requirements so that users can enter their personal details and be given a list of services that they may be eligible for.
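To make the idea concrete, here is a toy sketch in Python of the matching step once requirements have been extracted and structured. It is only an illustration under my own assumptions, not how the real system works: the service names, the user fields (`age`, `uk_resident`) and the rules are all made up.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical shape for a user's details; real inputs would differ.
User = Dict[str, Any]


@dataclass
class Service:
    name: str
    # Each requirement is a predicate over the user's details.
    requirements: List[Callable[[User], bool]]


def eligible_services(user: User, services: List[Service]) -> List[str]:
    """Return the names of services whose requirements the user all meets."""
    return [
        service.name
        for service in services
        if all(requirement(user) for requirement in service.requirements)
    ]


# Made-up services and rules, for illustration only.
SERVICES = [
    Service("Example service A", [lambda u: u["age"] >= 18]),
    Service(
        "Example service B",
        [lambda u: u["age"] >= 66, lambda u: u["uk_resident"]],
    ),
]

if __name__ == "__main__":
    print(eligible_services({"age": 30, "uk_resident": True}, SERVICES))
```

The hard part is the extraction itself, which this sketch skips entirely: it assumes each requirement has already been turned into a simple check.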
I’ve also worked on classifying users from their behaviour, automatically clustering site feedback and recommending which topic a page might be part of in the GOV.UK taxonomy.
In my free time, I’m experimenting with extracting knowledge from GOV.UK content to augment our knowledge graph. I’m hoping this will let us implement Explicit Semantic Ranking to improve internal search, add rich snippets to internal search, and improve personalisation, link recommendation and navigation.