Sentence Segmentation: Handling multiple punctuation characters

Previously, I showed you how to segment words and sentences whilst also taking into account full stops (periods) and abbreviations. The problem with this implementation is that it is easily confused by contiguous punctuation characters. For example “).” is not recognized as the end of a sentence. This article shows you how to correct this.

Segmenting Words and Sentences 1

Even simple NLP tasks such as tokenizing words and segmenting sentences can have their complexities. Punctuation characters could be used to segment sentences, but this requires the punctuation marks to be treated as separate tokens. This would result in abbreviations being split into separate words and sentences. This post uses a classification approach to create ...

The NLP Stack

Processing natural language is a complicated business. Not that long ago it seemed to be an intractable problem to many people. Although full understanding is still a cutting edge research problem, large areas of natural language processing have become practical on most computing platforms. This is especially true when NLP techniques are applied to restricted ...