Summarization

Automatic Multi-Document Summarization based on Latent Semantic Analysis

We present the Embra system, a first-time entry to DUC 2005, which performed at or above the median in the manual assessment of responsiveness and on 4 out of 5 linguistic quality questions. The system takes a novel approach to relevance and redundancy, modeling sentence similarity using a latent semantic space constructed over a very large corpus. We present a simple approach to modeling specificity based on named entities, which shows a small improvement over the baseline. Finally, we discuss coherence and present a sentence reordering algorithm whose component-level evaluation demonstrates a positive effect.

A key task in an extraction system for query-oriented multi-document summarization, necessary for computing relevance and redundancy, is modeling text semantics. In the Embra system, we use a representation derived from the singular value decomposition of a term co-occurrence matrix. We present methods for establishing the reliability of performance improvements, and we find that Embra performs better with dimensionality reduction. (With Ben Hachey and Gabriel Murray.)
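The abstract does not spell out Embra's exact weighting or selection rule, so the following is only a minimal sketch of the general idea under stated assumptions: co-occurrence counts are taken over sentence-sized contexts, each term is represented by its row of U_k Σ_k, a sentence is the sum of its term vectors, and sentences are picked by a greedy MMR-style trade-off between query relevance and redundancy (a swapped-in technique, not necessarily Embra's). All function names (`build_cooccurrence`, `latent_term_vectors`, `select`, etc.) and the `weight` parameter are hypothetical. Pure-Python power iteration with deflation stands in for a library SVD; because C = A Aᵀ is symmetric positive semi-definite, its eigenpairs coincide with its singular triplets.

```python
import math
import random


def build_cooccurrence(contexts):
    """Count, for every pair of terms, how many contexts contain both.

    This equals C = A A^T for a binary term-by-context matrix A, so C is
    symmetric and positive semi-definite.
    """
    vocab = sorted({t for ctx in contexts for t in ctx})
    index = {t: i for i, t in enumerate(vocab)}
    c = [[0.0] * len(vocab) for _ in vocab]
    for ctx in contexts:
        ids = [index[t] for t in set(ctx)]
        for i in ids:
            for j in ids:
                c[i][j] += 1.0
    return vocab, c


def top_eigenpairs(m, k, iters=500, seed=0):
    """Top-k eigenpairs of a symmetric PSD matrix via power iteration with
    deflation; for such a matrix they coincide with its singular triplets,
    so this stands in for a truncated SVD."""
    rng = random.Random(seed)
    n = len(m)
    m = [row[:] for row in m]  # copy: deflation modifies the matrix
    pairs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]
        lam = 0.0
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v, lam = [x / norm for x in w], norm
        pairs.append((lam, v))
        for i in range(n):  # deflation: subtract lam * v v^T
            for j in range(n):
                m[i][j] -= lam * v[i] * v[j]
    return pairs


def latent_term_vectors(contexts, k):
    """Map each term to its row of U_k Sigma_k (an assumed scaling choice)."""
    vocab, c = build_cooccurrence(contexts)
    pairs = top_eigenpairs(c, k)
    return {t: [lam * v[i] for lam, v in pairs] for i, t in enumerate(vocab)}


def sentence_vector(tokens, term_vecs, k):
    """Sum the latent vectors of known terms; unknown terms are skipped."""
    vec = [0.0] * k
    for t in tokens:
        if t in term_vecs:
            vec = [a + b for a, b in zip(vec, term_vecs[t])]
    return vec


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def select(sentences, query, term_vecs, k, n_select=2, weight=0.7):
    """Greedy MMR-style selection: reward similarity to the query and
    penalize similarity to sentences already chosen."""
    q = sentence_vector(query, term_vecs, k)
    vecs = [sentence_vector(s, term_vecs, k) for s in sentences]
    chosen = []
    while len(chosen) < min(n_select, len(sentences)):
        best, best_score = None, -math.inf
        for i, v in enumerate(vecs):
            if i in chosen:
                continue
            redundancy = max((cosine(v, vecs[j]) for j in chosen), default=0.0)
            score = weight * cosine(v, q) - (1 - weight) * redundancy
            if score > best_score:
                best, best_score = i, score
        chosen.append(best)
    return chosen
```

Lowering `weight` shifts the balance from relevance toward avoiding redundancy, so a near-duplicate of an already chosen sentence can lose to a less similar one. A realistic setup would use a far larger corpus and k, and a sparse linear-algebra routine rather than pure-Python power iteration.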

 
Ben Hachey, Gabriel Murray, and David Reitter.
The Embra system at DUC 2005: Query-oriented multi-document summarization with a very large latent semantic space.
In Document Understanding Conference 2005, Vancouver, Canada, 2005.
 
Ben Hachey, Gabriel Murray, and David Reitter.
Dimensionality reduction aids term co-occurrence based multi-document summarization.
In Proc. COLING-ACL Workshop Task-Focused Summarization and Question Answering 2006, Sydney, Australia, 2006.