Using Sandcastle to cope with a tidal wave of stories
SUMMA prototype for internal media monitoring
The BBC World Service is an international multimedia broadcaster, delivering a wide range of services in 42 languages on radio, TV, online and via handheld devices. The different language teams deliver world class, original journalism to an audience of over 376 million people every week.
But with each team working in a different language, sharing stories between World Service teams to better serve audiences all around the world is challenging. And this is where the BBC’s DigiHub comes in…
They are a small team based mainly in the BBC World Service whose aim is to increase the reach of stories within World Service teams and to widen the news agenda on all language services. They have a special focus on serving young and female audiences.
The team focuses on creating original output or sharing stories that have performed well for one or more language services, including English, and that they think could also perform well more generally across other language services.
The DigiHub team uses a variety of tools to help identify content that could be shared more widely, but no one tool functions as a one-stop shop to find suitable stories. These tools also all rely on the user speaking other languages to read the headlines and establish whether a story has been published in different languages.
The ideal tool would use automation and machine learning to let the DigiHub team respond more rapidly to breaking or trending news. Using machine learning to predict which stories are about to trend would allow the team to identify, translate and publish a story before it begins to trend, so that audiences find the stories they want to read in their own language, when they want to read them.
How could SUMMA tech help?
A core use case of the SUMMA project has been that of “internal media monitoring”. That is, allowing a large news organisation which publishes content in many languages to look at its own multi-lingual output and help news teams look across news stories which may be in languages they do not understand.
This internal media monitoring use case has primarily been driven by our SUMMA partners Deutsche Welle, but it became clear that internal media monitoring could be a huge benefit to the BBC World Service too.
The purpose of this prototype – which we named Sandcastle – was to investigate whether a combination of the SUMMA technologies could help the DigiHub. We decided to use the following modules in the prototype:
- The translation modules to translate headlines from Arabic, Russian, Farsi and Spanish into English, as the DigiHub team all share English as a common language.
- The clustering module to determine whether the same story had been published in English, Arabic, Russian, Farsi or Spanish. We don’t want to mark a story for publication in a language if it has already been published in that language.
- The summarisation module to present condensed bullet points, in English, to the user so they could quickly understand what the story was about and decide whether it is worthy of publication across languages or if the alert is false.
In addition to the SUMMA technologies, the prototype also used:
- A custom algorithm, the Trending Predictor, which was trained to predict which stories would trend, based on patterns established from historic data. This would allow the team to access and share stories in a more timely manner, while the stories are still current and likely to perform well.
- A Slack integration, built to send the alerts to a specified channel on the BBC’s internal Slack.
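The rule behind the clustering step (do not suggest a story for a language in which it has already been published) can be illustrated with a short sketch. This is not the SUMMA clustering module itself: the `Story` class, the `missing_languages` function and the two-letter language codes are hypothetical names chosen for illustration, and the sketch assumes the cluster of matching stories has already been produced.

```python
from dataclasses import dataclass

# The five languages named in the prototype description.
# The two-letter codes are an assumption made for this sketch.
PROTOTYPE_LANGUAGES = ("en", "ar", "ru", "fa", "es")


@dataclass(frozen=True)
class Story:
    """Hypothetical record for one published story."""
    story_id: str
    language: str
    headline: str


def missing_languages(cluster, languages=PROTOTYPE_LANGUAGES):
    """Return the languages in which no story of this cluster exists yet.

    `cluster` is a list of Story objects that a clustering step has
    already judged to be the same story in different languages.
    """
    published = {story.language for story in cluster}
    return [lang for lang in languages if lang not in published]


# Example: the same story has appeared in Arabic and Russian only.
cluster = [
    Story("a1", "ar", "Example headline (Arabic)"),
    Story("r7", "ru", "Example headline (Russian)"),
]
print(missing_languages(cluster))  # ['en', 'fa', 'es']
```

Only the languages returned by such a check would be candidates for translation and republication.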
The diagram below shows how the various SUMMA technologies were re-assembled to power the Sandcastle alerter, and how these were integrated with the BBC’s systems and the BBC News Labs trending story predictor module.
What did we do?
The first version of Sandcastle relied solely on an integration with a web analytics tool, sending alerts to Slack based on concurrent-view data. This allowed us to get some early feedback from the DigiHub team on what information they needed in an alert, and whether the right kind of story was being returned.
The refined Sandcastle system integrated the SUMMA translation, summarisation and clustering technologies, along with a Trending Story Predictor API (developed by BBC News Labs), and then sent the results to Slack as alerts. Slack was identified as the ideal interface for Sandcastle as the team used it already, and hence adding an “alerts” channel would not be a burden.
The inclusion of the Trending Story Predictor ensures the DigiHub team are alerted only to stories that are predicted to trend.
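As a rough sketch of this final filtering and alerting step, the example below keeps only candidates whose predicted trend score clears a threshold and formats each one as a Slack message. The score scale, the `TREND_THRESHOLD` value, the field names and both functions are assumptions made for illustration, not details of the Trending Story Predictor API. The `{"text": ...}` dictionary is the minimal payload shape accepted by Slack incoming webhooks; the sketch builds the payload but does not send it.

```python
TREND_THRESHOLD = 0.8  # hypothetical cut-off on an assumed 0-1 score


def should_alert(trend_score, threshold=TREND_THRESHOLD):
    """Alert only when the predicted trend score reaches the threshold."""
    return trend_score >= threshold


def build_alert(headline_en, summary_points, source_language, trend_score):
    """Format one story as a minimal Slack message payload."""
    bullets = "\n".join(f"• {point}" for point in summary_points)
    text = (
        f"*{headline_en}* "
        f"(source: {source_language}, score: {trend_score:.2f})\n"
        f"{bullets}"
    )
    return {"text": text}


# Hypothetical candidates, already translated and summarised into English.
candidates = [
    {"headline_en": "Example story A", "summary": ["Point one", "Point two"],
     "language": "ru", "trend_score": 0.91},
    {"headline_en": "Example story B", "summary": ["Point one"],
     "language": "es", "trend_score": 0.42},
]

alerts = [
    build_alert(c["headline_en"], c["summary"], c["language"], c["trend_score"])
    for c in candidates
    if should_alert(c["trend_score"])
]

for alert in alerts:
    print(alert["text"])
```

Raising the threshold in a sketch like this is the simplest way to trade the number of alerts against their quality.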
What did we learn?
Following a period of trialling the Sandcastle tool, we asked the DigiHub team for their feedback.
Had the Sandcastle tool been used? What, if anything, had been useful? What were barriers to greater use? What might make the tool more useful? What need is there that this tool is not meeting?
The DigiHub team were excited about the Sandcastle tool and offered some useful guidance as to how it can be further developed. Feedback included:
- A higher threshold for predicting trending stories, with concentrated effort on the very best stories
- More coherent summaries
- Inclusion of English language stories
- An awareness of newsworthiness to filter out very time-sensitive stories
- Engagement metrics for stories and figures on how much a story had been shared
However, the team’s editor stressed that they would highly value a tool that could flag up must-not-miss ‘gems’ from across the World Service and that Sandcastle was a step in this direction. We hope to further develop Sandcastle during 2019 and ultimately would like to see it become an essential tool powering the BBC’s World Service output.