Prof. Dr. Christina Niklaus

Assistant Professor in Computer Science with focus on “Databases and Data Engineering”

University of St. Gallen

Text Simplification and Open Information Extraction

Summary

Sentences that present a complex linguistic structure act as a major stumbling block for semantic applications whose predictive quality deteriorates with sentence length and complexity. The task of Text Simplification (TS) aims to modify sentences in order to make them easier to process, using a set of rewriting operations, such as reordering, deletion or splitting. These transformations are executed with the objective of converting the input into a simplified output, while preserving its main idea and keeping it grammatically sound.

State-of-the-art syntactic TS approaches suffer from two major drawbacks: first, they follow a very conservative approach in that they tend to retain the input rather than transforming it, and second, they ignore the cohesive nature of texts, where context spread across clauses or sentences is needed to infer the true meaning of a statement. We address the first problem by generating a fine-grained output with a simple and regular structure. For this purpose, we decompose a source sentence into a set of self-contained propositions, each of which presents a minimal semantic unit. We address the second problem by not only splitting the input into isolated sentences, but also incorporating the semantic context in the form of semantic relationships between the split propositions, which maximizes the expressiveness of the simplified sentences.

To address these challenges, we present a discourse-aware TS framework that is able to split and rephrase complex English sentences within the semantic context in which they occur. Our framework differs from previous systems by using a linguistically grounded transformation stage that first transforms syntactically complex sentences into smaller units with a simpler structure using clausal and phrasal disembedding mechanisms. By using a recursive top-down approach, our framework is able to generate a hierarchical representation over those units, capturing both their semantic context and their relations to other units in the form of rhetorical relations. In that way, we generate a semantic hierarchy of minimal propositions that benefits downstream Open Information Extraction (Open IE) tasks.
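The recursive top-down idea can be illustrated with a toy sketch. It is not the DisSim implementation: the three regular-expression rules, the relation names attached to them, the `core`/`context` role labels, and the names `Leaf`, `Node`, `build` and `leaves` are all assumptions made for illustration, whereas a linguistically grounded system would operate on parse trees with a much larger rule set. The sketch only shows how repeatedly splitting a sentence yields a tree whose inner nodes carry a rhetorical relation and whose leaves are the split propositions.

```python
import re
from dataclasses import dataclass


@dataclass
class Leaf:
    text: str   # a proposition that no rule can split further
    role: str   # "core" (main statement) or "context" (supporting information)


@dataclass
class Node:
    relation: str   # rhetorical relation holding between the children
    children: list  # two sub-trees (Leaf or Node)


# Hypothetical hand-written rules: (pattern, relation, roles of the two parts).
# They match surface strings only, so e.g. the "and" rule would wrongly split
# coordinated noun phrases; a real system would inspect syntactic structure.
RULES = [
    (re.compile(r"^although (?P<a>.+?), (?P<b>.+)$", re.I), "Contrast", ("context", "core")),
    (re.compile(r"^(?P<a>.+?),? because (?P<b>.+)$", re.I), "Cause", ("core", "context")),
    (re.compile(r"^(?P<a>.+?),? and (?P<b>.+)$", re.I), "List", ("core", "core")),
]


def build(sentence, role="core"):
    """Split top-down: apply the first matching rule, then recurse on both parts."""
    text = sentence.strip().rstrip(".")
    for pattern, relation, (role_a, role_b) in RULES:
        match = pattern.match(text)
        if match:
            return Node(relation, [build(match.group("a"), role_a),
                                   build(match.group("b"), role_b)])
    return Leaf(text, role)  # no rule applies: treat the text as a minimal unit


def leaves(tree):
    """Collect the split propositions in left-to-right order."""
    if isinstance(tree, Leaf):
        return [tree]
    return [leaf for child in tree.children for leaf in leaves(child)]


tree = build("Although the sky was grey, we went out because the dog needed a walk.")
for leaf in leaves(tree):
    print(leaf.role, "|", leaf.text)
```

In this example the root node carries the relation `Contrast`, its second child is a `Cause` node, and the three leaves are the split propositions. Keeping the relations on the inner nodes rather than on the leaves is what allows the context of each proposition to be recovered after splitting.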

In a comparative analysis, we demonstrate that our baseline implementation DisSim outperforms the state of the art in structural TS in both an automatic evaluation and a manual analysis. It obtains the highest SAMSA scores (0.67, 0.57, 0.54) on three simplification datasets from two different domains. SAMSA is a recently proposed metric for automatically measuring the syntactic complexity of sentences, and it correlates highly with human judgments on structural simplicity and grammaticality. Furthermore, a comparison with the annotations contained in the RST Discourse Treebank reveals that we are able to capture the contextual hierarchy between the split sentences with a precision of almost 90% and reach an average precision of approximately 70% for the classification of the rhetorical relations that hold between them. Finally, an extrinsic evaluation shows that, when our framework is applied as a preprocessing step, the performance of state-of-the-art Open IE systems can be improved by up to 346% in precision and 52% in recall.

Publications

DisSim: A Discourse-Aware Syntactic Text Simplification Framework for English and German

Christina Niklaus, Matthias Cetto, André Freitas, Siegfried Handschuh

Proceedings of the 12th International Conference on Natural Language Generation, Association for Computational Linguistics, Tokyo, Japan, 2019 Oct, pp. 504–507

MinWikiSplit: A Sentence Splitting Corpus with Minimal Propositions

Christina Niklaus, André Freitas, Siegfried Handschuh

Proceedings of the 12th International Conference on Natural Language Generation, Association for Computational Linguistics, Tokyo, Japan, 2019 Oct, pp. 118–123

Transforming Complex Sentences into a Semantic Hierarchy

Christina Niklaus, Matthias Cetto, André Freitas, Siegfried Handschuh

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Florence, Italy, 2019 Jul, pp. 3415–3427

Graphene: a Context-Preserving Open Information Extraction System

Matthias Cetto, Christina Niklaus, André Freitas, Siegfried Handschuh

Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations, Association for Computational Linguistics, Santa Fe, New Mexico, USA, 2018 Aug, pp. 94–98

Graphene: Semantically-Linked Propositions in Open Information Extraction

Matthias Cetto, Christina Niklaus, André Freitas, Siegfried Handschuh

Proceedings of the 27th International Conference on Computational Linguistics, Association for Computational Linguistics, Santa Fe, New Mexico, USA, 2018 Aug, pp. 2300–2311

A Survey on Open Information Extraction

Christina Niklaus, Matthias Cetto, André Freitas, Siegfried Handschuh

Proceedings of the 27th International Conference on Computational Linguistics, Association for Computational Linguistics, Santa Fe, New Mexico, USA, 2018 Aug, pp. 3866–3878

A Sentence Simplification System for Improving Relation Extraction

Christina Niklaus, Bernhard Bermeitinger, Siegfried Handschuh, André Freitas

Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations, The COLING 2016 Organizing Committee, Osaka, Japan, 2016 Dec, pp. 170–174