Conference paper
Proceedings of the 12th International Conference on Natural Language Generation, Association for Computational Linguistics, Tokyo, Japan, 2019 Oct, pp. 118--123
APA
Niklaus, C., Freitas, A., & Handschuh, S. (2019). MinWikiSplit: A Sentence Splitting Corpus with Minimal Propositions. In Proceedings of the 12th International Conference on Natural Language Generation (pp. 118–123). Tokyo, Japan: Association for Computational Linguistics.
Chicago/Turabian
Niklaus, Christina, André Freitas, and Siegfried Handschuh. “MinWikiSplit: A Sentence Splitting Corpus with Minimal Propositions.” In Proceedings of the 12th International Conference on Natural Language Generation, 118–123. Tokyo, Japan: Association for Computational Linguistics, 2019.
MLA
Niklaus, Christina, et al. “MinWikiSplit: A Sentence Splitting Corpus with Minimal Propositions.” Proceedings of the 12th International Conference on Natural Language Generation, Association for Computational Linguistics, 2019, pp. 118–23.
BibTeX
@inproceedings{niklaus2019a,
title = {MinWikiSplit: A Sentence Splitting Corpus with Minimal Propositions},
year = {2019},
month = oct,
address = {Tokyo, Japan},
pages = {118--123},
publisher = {Association for Computational Linguistics},
author = {Niklaus, Christina and Freitas, André and Handschuh, Siegfried},
booktitle = {Proceedings of the 12th International Conference on Natural Language Generation},
month_numeric = {10}
}
We compiled a new sentence splitting corpus composed of 203K pairs of aligned complex source and simplified target sentences. In contrast to previously proposed text simplification corpora, which contain only a small number of split examples, we present a dataset in which each input sentence is broken down into a set of minimal propositions, i.e. a sequence of sound, self-contained utterances, each presenting a minimal semantic unit that cannot be further decomposed into meaningful propositions. The corpus is useful for developing sentence splitting approaches that learn to transform sentences with a complex linguistic structure into a fine-grained representation of short sentences with a simple and more regular structure. Such sentences are easier for downstream applications to process, which facilitates and improves their performance.
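To make the notion of an aligned pair concrete, the following minimal sketch shows one way such an example could be represented and parsed. It is an illustration only: the tab-separated layout, the `<SEP>` delimiter, the names `SplitPair` and `parse_pair`, and the example sentence are all assumptions made here, not the corpus's actual file format or content.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed layout (hypothetical, not the published format):
# "<complex source>\t<proposition 1> <SEP> <proposition 2> ..."
FIELD_SEP = "\t"
PROP_SEP = " <SEP> "


@dataclass(frozen=True)
class SplitPair:
    """One aligned example: a complex source sentence and its minimal propositions."""
    source: str
    propositions: Tuple[str, ...]


def parse_pair(line: str) -> SplitPair:
    """Parse one line of the assumed layout into a SplitPair."""
    source, target = line.rstrip("\n").split(FIELD_SEP, 1)
    # Drop empty fragments so stray delimiters do not yield blank propositions.
    props = tuple(p.strip() for p in target.split(PROP_SEP) if p.strip())
    return SplitPair(source=source.strip(), propositions=props)


# Invented example sentence, used only to illustrate the idea.
example_line = (
    "Although it rained, the match, which had been postponed twice, went ahead."
    "\tIt rained. <SEP> The match went ahead. <SEP> The match had been postponed twice."
)

pair = parse_pair(example_line)
print(pair.source)
for proposition in pair.propositions:
    print(" -", proposition)
```

In this invented pair, each target sentence can be read on its own and cannot be split further without losing meaning, which is the property the abstract calls a minimal proposition.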