NLTK on the Raspberry Pi

If you haven’t heard of it yet, the Raspberry Pi is a $25/$35 bare-bones computer intended to get kids excited about programming and hardware projects. It is very much modeled on the British experience of home computing in the early 1980s and even has a “Model A” and a “Model B” in homage to the BBC ...

Calculating Word Frequency Tables

Now that we can segment words and sentences, it is possible to produce word and tuple frequency tables. Here I show you how to create a word frequency table for a large collection of text files.
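As a taste of what the post covers, here is a minimal sketch of the idea, assuming a recent NLTK where FreqDist behaves like a collections.Counter; the corpus/*.txt path is just a placeholder for your own collection of files:

    import glob
    import nltk

    # Count word occurrences across every text file in a (placeholder) directory.
    freq = nltk.FreqDist()
    for path in glob.glob("corpus/*.txt"):
        with open(path, encoding="utf-8") as f:
            for sentence in nltk.sent_tokenize(f.read()):
                for word in nltk.word_tokenize(sentence):
                    freq[word.lower()] += 1

    # Print the 20 most frequent words and their counts.
    for word, count in freq.most_common(20):
        print(word, count)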

Segmenting Words and Sentences

Even simple NLP tasks such as tokenizing words and segmenting sentences can have their complexities. Punctuation characters could be used to segment sentences, but this requires the punctuation marks to be treated as separate tokens, and a naive split on full stops would break abbreviations into separate words and sentences. This post uses a classification approach to create ...
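For a flavour of the problem, here is how the off-the-shelf NLTK tokenizers handle a sentence containing abbreviations (a small illustration only; the post builds its own classifier rather than relying on these):

    import nltk

    text = "Dr. Smith arrived at 10 a.m. and left before lunch. He seemed hurried."

    # sent_tokenize uses the pre-trained Punkt model, which recognises
    # abbreviations such as "Dr." so their full stops are not treated as
    # sentence boundaries.
    for sentence in nltk.sent_tokenize(text):
        print(nltk.word_tokenize(sentence))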

Extracting Noun Phrases from Parsed Trees

Following on from my previous post about NLTK Trees, here is a short Python function to extract phrases from an NLTK Tree structure.
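A function along these lines does the job, using Tree.subtrees with a label filter (a sketch under the NLTK 3 API, not necessarily the exact code from the post):

    from nltk import Tree

    def extract_phrases(tree, label="NP"):
        """Return the leaf words of every subtree carrying the given label."""
        return [" ".join(subtree.leaves())
                for subtree in tree.subtrees(lambda t: t.label() == label)]

    # A small hand-built chunk tree for illustration.
    tree = Tree.fromstring(
        "(S (NP (DT the) (JJ quick) (NN fox)) (VBD jumped) "
        "(PP (IN over) (NP (DT the) (NN dog))))")
    print(extract_phrases(tree))   # ['the quick fox', 'the dog']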

NLTK Trees

A number of NLTK functions work with Tree objects. For example, part-of-speech tagging and chunking classifiers naturally return trees. Sentence manipulation functions also work with trees. Although Natural Language Processing with Python (Bird et al.) includes a couple of pages about NLTK’s Tree module, coverage is generally sparse. The online documentation actually contains ...
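To give a feel for the Tree API, here is a tiny hand-rolled example (assuming NLTK 3, where the older node attribute became the label() method):

    from nltk import Tree

    # Build a small tree directly from its bracketed string form.
    tree = Tree.fromstring("(S (NP I) (VP (V saw) (NP him)))")

    print(tree.label())      # 'S'
    print(tree.leaves())     # ['I', 'saw', 'him']
    print(tree[1])           # the VP subtree
    tree.pretty_print()      # ASCII drawing of the tree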

Book Review: Python Text Processing with NLTK 2.0 Cookbook

“Python Text Processing with NLTK 2.0 Cookbook” by Jacob Perkins is a useful complement to “Natural Language Processing with Python”. Rather than trying to introduce Python, NLP, and NLTK in one book, it focuses on practical worked examples.

Support for SciPy in NLTK’s Maximum Entropy methods

Recently I have been working with the Maximum Entropy classifiers in NLTK. Maximum entropy models are similar to the well-known Naive Bayes models, but they do not assume independence between the features – i.e. they are not “naive”. SciPy has had some problems with its Maximum Entropy code, and v0.8 must be used. v0.9 crashes ...
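For context, training a maximum entropy model in NLTK looks roughly like this; the feature dictionaries below are invented for illustration, and the default pure-Python training algorithm is used rather than the SciPy-backed optimisers discussed in the post:

    import nltk

    # Toy training data: (feature dict, label) pairs.
    train = [
        ({"last_letter": "a"}, "female"),
        ({"last_letter": "k"}, "male"),
        ({"last_letter": "e"}, "female"),
        ({"last_letter": "o"}, "male"),
    ]

    # trace=0 silences the per-iteration training log.
    classifier = nltk.classify.MaxentClassifier.train(train, trace=0)
    print(classifier.classify({"last_letter": "a"}))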

Book Review: Natural Language Processing with Python

“Natural Language Processing with Python” by Steven Bird, Ewan Klein & Edward Loper is often described as ‘The’ Natural Language Toolkit (NLTK) book. Written by three main members of the NLTK team, it is intended as an introduction to NLTK; it is published in print by O’Reilly and available online under a Creative Commons ...

Why Python and NLTK?

Most modern natural language processing (NLP) depends heavily on statistics and complex statistical models. So why use Python, a relatively slow scripting language, for NLP? Python’s strengths are in its text, list, and structure support. Structures are dynamically typed, but supported by a powerful set of language constructs in the form of list comprehensions and ...
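To make that concrete, a filtering task that needs an explicit loop and several temporaries in many languages is a single expression in Python (a trivial illustration):

    words = "the quick brown fox jumps over the lazy dog".split()

    # All words longer than three characters, lowercased, in one expression.
    long_words = [w.lower() for w in words if len(w) > 3]
    print(long_words)   # ['quick', 'brown', 'jumps', 'over', 'lazy']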