Python Data Validation

Python is a good scripting language for data analysis and processing, but are you sure your imported data is valid? Beyond import errors, the data itself may contain errors such as values in the wrong field, inconsistent values or fields, and unexpected edge cases. Immediately after reading the data, you must validate it, and ...
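As a rough illustration of validating data right after reading it, here is a minimal sketch using only the standard library `csv` module. The schema, the field names (`name`, `age`, `email`), the `parse_age`/`parse_email`/`validate_rows` helpers and the sample rows are all hypothetical and are not taken from the post. Each row is checked for the wrong number of fields and for values that fail a per-field check. Bad rows are collected with their line numbers rather than aborting the whole import.

```python
import csv
import io

# Hypothetical per-field checks: each converts a raw string or raises ValueError.
def parse_age(value):
    age = int(value)  # non-numeric text raises ValueError here
    if not 0 <= age <= 130:
        raise ValueError(f"age out of range: {age}")
    return age

def parse_email(value):
    if "@" not in value:
        raise ValueError(f"not an email address: {value!r}")
    return value

# Hypothetical schema mapping each expected column to its check.
SCHEMA = {"name": str.strip, "age": parse_age, "email": parse_email}

def validate_rows(reader):
    """Return (valid records, [(line number, error message), ...])."""
    missing = set(SCHEMA) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    valid, errors = [], []
    # Count from 2 because line 1 is the header (assumes no multi-line fields).
    for line_no, row in enumerate(reader, start=2):
        # DictReader stores surplus values under the key None and fills
        # absent trailing values with None, so either means a bad field count.
        if None in row or None in row.values():
            errors.append((line_no, "wrong number of fields"))
            continue
        try:
            valid.append({field: check(row[field]) for field, check in SCHEMA.items()})
        except ValueError as exc:
            errors.append((line_no, str(exc)))
    return valid, errors

SAMPLE = """name,age,email
Alice,34,alice@example.com
Bob,abc,bob@example.com
Carol,29
Dave,200,dave@example.com
Eve,41,eve.example.com
"""

valid, errors = validate_rows(csv.DictReader(io.StringIO(SAMPLE)))
for line_no, message in errors:
    print(f"line {line_no}: {message}")
```

Only Alice's row passes in this sample. Bob's age is not numeric, Carol's row is short a field, Dave's age is outside the assumed range, and Eve's email lacks an "@". Keeping errors in a list instead of raising on the first one lets you report every problem in a file at once.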

Segmenting Words and Sentences

Even simple NLP tasks such as tokenizing words and segmenting sentences have their complexities. Punctuation characters could be used to segment sentences, but this requires treating punctuation marks as separate tokens. That splits abbreviations such as "Dr." into separate words and, wrongly, into separate sentences. This post uses a classification approach to create ...
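To make the abbreviation problem concrete, here is a minimal sketch using only Python's `re` module. It first splits a sentence naively at every `.`, `?` or `!` followed by whitespace. It then shows how a classification approach might frame the task: each punctuation token becomes a candidate boundary, described by a small feature dictionary. The feature set (`next_word_capitalized`, `prev_word`, `prev_word_is_one_char`, `punct`) and all function names are assumptions for illustration, in the style of the NLTK book's sentence-segmentation example. They are not necessarily what the post uses.

```python
import re

TEXT = "Dr. Smith arrived at 10 a.m. on Monday. He left early."

def naive_sentences(text):
    # Treat every '.', '?' or '!' followed by whitespace as a sentence boundary.
    return [s for s in re.split(r"(?<=[.?!])\s+", text) if s]

def tokenize(text):
    # Words and individual punctuation marks become separate tokens,
    # so "a.m." turns into "a", ".", "m", ".".
    return re.findall(r"\w+|[^\w\s]", text)

def punct_features(tokens, i):
    """Hypothetical features for the punctuation token at index i."""
    return {
        "next_word_capitalized": i + 1 < len(tokens) and tokens[i + 1][:1].isupper(),
        "prev_word": tokens[i - 1].lower() if i > 0 else "",
        "prev_word_is_one_char": i > 0 and len(tokens[i - 1]) == 1,
        "punct": tokens[i],
    }

def candidate_features(tokens):
    # Every '.', '?' or '!' is a candidate boundary for a classifier to label.
    return [punct_features(tokens, i) for i, tok in enumerate(tokens) if tok in {".", "?", "!"}]

print(naive_sentences(TEXT))
for feats in candidate_features(tokenize(TEXT)):
    print(feats)
```

The naive split yields four "sentences" where there are only two, breaking after "Dr." and "a.m.". In the feature view, the period after "Dr" looks like a real boundary on `next_word_capitalized` alone. Features such as `prev_word` therefore give a trained classifier (for example NLTK's `NaiveBayesClassifier`, fed `(features, label)` pairs from annotated text) the extra signal needed to tell abbreviations from sentence ends.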