A fast, accurate sentence segmenter
Scientific text is full of noise, such as symbols, non-terminal periods, and subscripts or superscripts, which makes it hard for a machine to find the logical end of a sentence. Highly accurate segmentation ensures your corpus contains more complete sentences and fewer fragments.
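To make the problem concrete, here is a minimal sketch of a naive baseline, not the segmenter described on this page. The function name `naive_split` and the sample text are hypothetical and chosen only for illustration. It splits after any period, exclamation mark, or question mark followed by whitespace, which is roughly what a simple rule would do.

```python
import re


def naive_split(text):
    """Illustrative baseline only: split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


# Hypothetical sample: two real sentences containing non-terminal periods.
sample = (
    "The yield was 5.2 mg (Fig. 2) as reported by Smith et al. in 2010. "
    "A second run gave similar results."
)

for piece in naive_split(sample):
    print(repr(piece))
```

On this sample the naive rule returns four pieces instead of two sentences, because the periods in "Fig." and "et al." are treated as sentence ends. The decimal point in "5.2" survives only because no whitespace follows it.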
Accuracy of machine learning tasks
Errors in pre-processing steps such as sentence segmentation propagate to higher-level tasks and undermine product quality. Deriving clean data from text corpora reduces this risk.
Rich information processing
Retrieving information from semi-structured and unstructured data is demanding. Accurate, machine-readable information goes a long way toward getting the best possible output from natural language learning tasks such as information and knowledge extraction, sentence-level versioning, and question answering.