Timeline extraction in SUMMA
What were the events involving Apple Inc. reported in the news last year? What were the main incidents related to Airbus A380? While it is possible to search for news stories about these entities on a news website, readers find it useful to have timelines reporting the key events for the entities that interest them.
Timeline extraction is the task of automatically constructing a timeline of events for an entity such as Apple Inc. or Airbus A380 from a set of documents. The example below shows a timeline for the entity Steve Jobs that was created using four documents as input:
Timeline extraction is difficult because it combines several subtasks. First, we need to recognize the entity of interest in the documents, even when it is referred to by anaphoric expressions such as pronouns (e.g. his) or by alternative descriptions (e.g. Apple CEO), a task known as coreference resolution. Second, we need to identify temporal expressions (e.g. yesterday and less than two weeks ago) and resolve them to absolute timestamps. Third, we need to determine which events mentioned in the documents involve the entity of interest and when they happened, a task we refer to as event anchoring.
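The second challenge above, resolving a relative temporal expression to an absolute date, can be illustrated with a minimal Python sketch. It assumes the document's publication date is known and uses a small hand-written lookup table; the table, the function name and the example date are hypothetical and are not part of the system described in this post.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical rule table: each relative expression maps to an offset in days
# from the document's publication date. A real system would need far more
# coverage than this.
RELATIVE_OFFSETS = {
    "today": 0,
    "yesterday": -1,
    "tomorrow": 1,
    "two weeks ago": -14,
}


def resolve_expression(expression: str, publication_date: date) -> Optional[date]:
    """Return the absolute date for a relative expression, or None if unknown."""
    key = expression.strip().lower()
    if key not in RELATIVE_OFFSETS:
        # Vague expressions such as "less than two weeks ago" have no single
        # absolute date, so this sketch leaves them unresolved.
        return None
    return publication_date + timedelta(days=RELATIVE_OFFSETS[key])


if __name__ == "__main__":
    published = date(2016, 11, 3)  # illustrative publication date
    print(resolve_expression("yesterday", published))                # 2016-11-02
    print(resolve_expression("less than two weeks ago", published))  # None
```

The unresolved case shows why this step is harder than a table lookup: expressions like "less than two weeks ago" only constrain the date to a range.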
Previous work on timeline extraction relied on rule-based approaches combined with pre-trained temporal linking systems, due to the lack of annotated training data for the task. As a result, existing approaches cannot be ported easily to different domains and languages, yet this portability is crucial for multilingual media monitoring in SUMMA.
In our recent paper presented at the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP), we (Savelie Cornegruta and Andreas Vlachos) showed how the lack of annotated data can be overcome by generating noisy training data automatically, a technique known as distant supervision. In experiments using the data and evaluation proposed in a recent shared task, we showed that the system developed using distant supervision is competitive with the state of the art, which uses pre-trained temporal linking components and rules to extract timelines. We further improved the accuracy of the extracted timelines by employing joint inference for the event anchoring stage, which yielded the best results reported in the shared task evaluation.
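As a rough illustration of the general idea behind distant supervision, the Python sketch below labels candidate (event, date) pairs automatically by checking them against an already known timeline, instead of relying on manual annotation. This is a simplified assumption about how noisy labels can be generated, not the exact procedure from the paper; the Candidate class, the label_candidates function and the example data are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Candidate:
    """A candidate pairing of an event mention with a date found near it."""
    event: str  # event mention, e.g. a verb lemma
    date: str   # ISO-formatted date resolved from the text


def label_candidates(
    candidates: List[Candidate],
    known_timeline: Set[Tuple[str, str]],
) -> List[Tuple[Candidate, int]]:
    """Assign label 1 if the pair appears in the known timeline, else 0.

    The labels are noisy: a correct pair that is missing from the known
    timeline is still labelled 0.
    """
    return [
        (c, 1 if (c.event, c.date) in known_timeline else 0)
        for c in candidates
    ]


if __name__ == "__main__":
    # Hypothetical data, for illustration only.
    known = {("launch", "2016-09-07")}
    pairs = [
        Candidate("launch", "2016-09-07"),
        Candidate("launch", "2016-09-09"),
    ]
    for candidate, label in label_candidates(pairs, known):
        print(candidate.event, candidate.date, label)
```

The labelled pairs could then serve as training examples for a classifier, with the caveat that some labels will be wrong.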
In future work in SUMMA, we will extend this work in two directions. One direction will be to consider the task of identifying the event mentions (these were assumed to be part of the input to our approach) and develop approaches for event mention coreference. The other direction will be the task of temporal forecasting: given a timeline about past events, can we predict future events, e.g. that a party in the government today is likely to be in the opposition in the next elections?