Internal monitoring means keeping track of what is produced within one’s own organisation. For Deutsche Welle, which focused on internal monitoring in SUMMA, this comes down to tracking what content is published across the languages it covers (DW produces in 30 languages in total). SUMMA can currently process 9 languages, of which DW covers all but Latvian, and the system can handle more languages if the corresponding engines are available. This allows us to make all DW departments aware of what the rest of the organisation is producing in other languages, thus increasing the internal distribution of content.

The internal media monitoring demonstrator was made available to both user partners within the consortium from June 2017. Since then it has gone through a long process of feedback and redevelopment. Deutsche Welle is the primary target user within the project for this use case. The demonstrator shows the full pipeline for internal monitoring, i.e. tracking what content was published across various languages within one organisation, in this case primarily Deutsche Welle. The content ingested into the platform consists primarily of text articles and on-demand videos from our online content offer on the DW website (corresponding in broad terms to our CMS), but also includes some live streams.

The platform offers a range of features. The main ones are described below.

Full automation

The platform offers a fully automated workflow: it ingests content through the feeds (via API) that have been selected for inclusion, and provides user management, queries and various settings. Once the content and other settings have been decided upon, no human intervention is needed for the process. All content is automatically ingested, transcribed, translated, clustered and tagged without any action required. The user only needs to act when viewing data within the platform, for example by applying filters.

Translation of DW articles

DW text articles that have been ingested are automatically translated into the common target language, i.e., into English.

Transcription and translation of DW videos

DW videos that have been ingested from designated feeds are automatically transcribed and a transcript in the original language is created. Subsequently, this text is automatically translated into English.

Clustering of DW content into stories

Content is automatically analysed and grouped into related stories. Sorting by newest, which serves as the default setting, shows related news stories with the latest DW videos and text articles.

The user can filter the results further by language, media type or timeframe. The navigation guides the user through other options: named entities, bookmarks and the trending bar view. The query function allows entity and source searches to be saved according to user preferences.

Summarisation

A summary in English is automatically generated by the system, both for stories (clustered content) and for individual items (articles, videos). The system currently uses extractive summarisation: the summary consists of the DW teaser followed by automatically selected representative sentences from the English article text or transcript.
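The idea of "teaser first, then representative sentences" can be illustrated with a minimal sketch. It is not the SUMMA summarisation engine: the word-frequency scoring, the tiny stop-word list and all function names below are assumptions chosen purely for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "are",
              "for", "with", "that", "it", "its", "as", "at", "by"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Lower-cased words of a sentence, without stop words."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def extractive_summary(teaser, body, max_sentences=2):
    """Return the teaser followed by the highest-scoring body sentences.

    A sentence is scored by the average document frequency of its content
    words (an assumed, simple notion of "representative"). The selected
    sentences are emitted in their original order.
    """
    sentences = split_sentences(body)
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    picked = [sentences[i] for i in chosen if sentences[i] != teaser.strip()]
    return " ".join([teaser.strip()] + picked)
```

Because the sentences are copied verbatim from the translated text, the quality of such a summary depends directly on the quality of the English translation or transcript it draws from.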

Keywording, topics, named entities

The system then uses the English article or transcript text to automatically generate and extract thematic topics and keywords (with relevance ranking). It identifies and highlights named entities (with colour coding), adds information about them and identifies relations between them. This greatly enriches the metadata and makes the content much more searchable.

Queries and filtering

Throughout the project, demonstrations of the various versions of the platform were given to users within Deutsche Welle and the consortium, as well as to interested outside parties.

To make the DW content offer within the tool transparent, we created feed groups and permanent/standard queries, thus providing instant access to all DW articles and videos available in the system. For this instance, we excluded live streams by default, as well as any other feeds not originating from the DW API. This makes it immediately clear what content is being monitored from DW’s side.

Subgroups of feeds were then compiled, to cater for the different language user groups (e.g. Russian feed, Arabic feed, etc.), again to enable a smooth and straightforward workflow.
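As a rough sketch of how such feed groups could be derived, the following example drops live streams and feeds that do not originate from the DW API, then groups the rest by language. The `Feed` fields, the `"dw_api"` and `"external"` labels and the function names are hypothetical and do not reflect the platform's actual configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feed:
    name: str
    language: str          # e.g. "ru", "ar"
    source: str            # hypothetical labels: "dw_api" or "external"
    is_live: bool = False  # True for live streams


def standard_dw_feeds(feeds):
    """Keep on-demand feeds from the DW API; drop live streams and other sources."""
    return [f for f in feeds if f.source == "dw_api" and not f.is_live]


def group_by_language(feeds):
    """Map each language code to the names of its feeds (one subgroup per language)."""
    groups = {}
    for feed in feeds:
        groups.setdefault(feed.language, []).append(feed.name)
    return groups
```

Calling `group_by_language(standard_dw_feeds(all_feeds))` on a list of feeds would then yield one subgroup per language, mirroring the Russian and Arabic feed groups mentioned above.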

The simplified access view for DW content contains four major steps:

  • Start with Trending Topics
  • Filter on DW articles and videos
  • View Stories
  • Sort by Newest

This workflow gives the users immediate access to the clustered stories with the latest DW videos and text articles. This is considered the default view.

Further filtering is then optional, e.g. by language, media type or timeframe. Other options can be explored from the left navigation, including named entities, bookmarks and the trending bar view.
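The default view and the optional filters can be sketched as a small function over clustered stories. This is an illustration under stated assumptions rather than the platform's implementation: the `Item` and `Story` classes, their fields and the `story_view` function are hypothetical names, and "timeframe" is simplified to a single `since` cut-off.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Item:
    title: str
    language: str      # e.g. "de", "ru", "ar"
    media_type: str    # "article" or "video"
    published: datetime


@dataclass
class Story:
    title: str
    items: list        # the Item objects clustered into this story


def matches(item, language=None, media_type=None, since=None):
    """True if the item passes every filter that is set (None means no filter)."""
    if language is not None and item.language != language:
        return False
    if media_type is not None and item.media_type != media_type:
        return False
    if since is not None and item.published < since:
        return False
    return True


def story_view(stories, **filters):
    """Stories with at least one matching item, sorted by newest item first."""
    view = []
    for story in stories:
        kept = [item for item in story.items if matches(item, **filters)]
        if kept:
            view.append(Story(story.title, kept))
    view.sort(key=lambda s: max(i.published for i in s.items), reverse=True)
    return view
```

Calling `story_view(stories)` without filters corresponds to the default view (all stories, newest first); passing, for example, `language="ru"` or `media_type="video"` narrows it in the way the optional filters do.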

At Deutsche Welle, the platform has been demonstrated and tested in various innovation and editorial departments and meetings, with a continuous feedback loop of user input into the development process.

In the second half of the project in particular, several instances of the SUMMA system were deployed, and Deutsche Welle worked with all of them to assess which features are best suited to which purpose. Read our blog post on different SUMMA user interfaces.

And the project lives on… An internal HLT (Human Language Technology) implementation project at Deutsche Welle explores further use, integration and customisation of the platform.