Rhetorical Analysis

Rhetorical Analysis with Support Vector Machines


Figure: Example of a rhetorical relation

Most texts display an internal coherence structure, which can be analyzed as a tree of relations that hold between short segments of text. We present a machine-learning approach to such an analysis in the framework of Rhetorical Structure Theory. Our rhetorical analyzer observes a variety of textual properties, such as cue phrases, part-of-speech information, rhetorical context, and lexical chaining. A two-stage parsing algorithm uses local and global optimization to find an analysis. Decisions during parsing are driven by an ensemble of support vector classifiers. This training method allows for a non-linear separation of samples with many relevant features. We define a chain of annotation tools that benefits from a new underspecified representation of rhetorical structure. Classifiers are trained on a newly introduced German-language corpus as well as on a large English one. We present evaluation data for the recognition of rhetorical relations.
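To make the idea of a relation tree over text segments more concrete, here is a minimal Python sketch. It is not the parser described above: the two-stage local and global optimization and the ensemble of support vector classifiers are replaced by a single greedy pass and a hypothetical cue-phrase lookup. The names `Span`, `CUE_PHRASES`, `score_pair`, and `greedy_parse`, as well as the relation labels and scores, are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Span:
    """A node in a rhetorical tree: a leaf segment or a relation over two children."""
    text: str
    relation: Optional[str] = None
    left: Optional["Span"] = None
    right: Optional["Span"] = None


# Hypothetical cue-phrase table (illustrative values, not taken from the thesis).
CUE_PHRASES = {
    "because": ("CAUSE", 0.9),
    "but": ("CONTRAST", 0.8),
    "for example": ("ELABORATION", 0.7),
}
DEFAULT = ("JOINT", 0.1)


def score_pair(left: Span, right: Span) -> Tuple[str, float]:
    """Stand-in for a trained classifier: guess a relation and a confidence
    from the cue phrase that opens the right-hand span."""
    opening = right.text.lower()
    for cue, (relation, score) in CUE_PHRASES.items():
        if opening.startswith(cue):
            return relation, score
    return DEFAULT


def greedy_parse(
    segments: List[str],
    scorer: Callable[[Span, Span], Tuple[str, float]] = score_pair,
) -> Span:
    """Repeatedly merge the best-scoring pair of adjacent spans until one tree remains."""
    if not segments:
        raise ValueError("at least one segment is required")
    spans = [Span(text=s) for s in segments]
    while len(spans) > 1:
        # Score every adjacent pair and pick the most confident one (local decision).
        scored = [(scorer(spans[i], spans[i + 1]), i) for i in range(len(spans) - 1)]
        (relation, _), i = max(scored, key=lambda item: item[0][1])
        merged = Span(
            text=spans[i].text + " " + spans[i + 1].text,
            relation=relation,
            left=spans[i],
            right=spans[i + 1],
        )
        spans[i:i + 2] = [merged]
    return spans[0]


tree = greedy_parse(
    ["The match was cancelled", "because it rained", "but the fans stayed"]
)
```

In this toy run, the first two segments are joined first because the cue phrase "because" receives the highest score, and the resulting span is then related to the third segment. A real system would replace `score_pair` with classifiers trained on many features, such as the part-of-speech and lexical-chaining information mentioned above.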

David Reitter: Simple Signals for Complex Rhetorics: On Rhetorical Analysis with Rich-Feature Support Vector Models. In: Uta Seewald-Heeg (Ed.), Sprachtechnologie für die multilinguale Kommunikation. Sankt Augustin: Gardez!.

David Reitter: Rhetorical Analysis with Rich-Feature Support Vector Models. Diplomarbeit (Master's Thesis), University of Potsdam, Germany, 2003. [Best thesis award of the GLDV] PDF available from the publications page.

David Reitter and Manfred Stede. Step by step: underspecified markup in incremental rhetorical analysis. In Proceedings of the 4th International Workshop on Linguistically Interpreted Corpora (LINC-03) (at EACL 2003), Budapest, 2003.

The document type definitions (DTDs) and several tools to convert (LDC corpus, O'Donnell's RS3) and access URML data are available here.

More about the Potsdam Commentary Corpus can be found here.