Book Review: Natural Language Processing with Python

“Natural Language Processing with Python” by Steven Bird, Ewan Klein & Edward Loper is often described as ‘The’ Natural Language Toolkit (NLTK) book. Written by three core members of the NLTK team, it is intended as an introduction to NLTK. It is published in print by O’Reilly and is also available online under a Creative Commons license.

NLTK is intended as a teaching library, so it makes sense that this is a teaching book that sets out to introduce some basic natural language processing (NLP) techniques. It also targets readers who have never programmed in Python before. The result is a book that tries to introduce Python, NLP, and NLTK all at once. To be fair, the authors have done a good job of splitting the coverage up, so it is easy enough to skip sections. However, the end result is still a book that tries to cover too much. Coverage of both Python and NLP is very basic, but the most damaging weakness is the limited coverage of NLTK itself. There are plenty of good Python guides in print and online. The authors would have been better off skipping this material and concentrating on NLTK and the related areas of NLP.

That said, the book does a good job of demonstrating the kinds of things NLTK can do, and the strengths of the Python+NLTK combination. The chapters are:

  1. Language Processing and Python
  2. Accessing Text Corpora and Lexical Resources
  3. Processing Raw Text
  4. Writing Structured Programs
  5. Categorizing and Tagging Words
  6. Learning to Classify Text
  7. Extracting Information from Text
  8. Analyzing Sentence Structure
  9. Building Feature-Based Grammars
  10. Analyzing the Meaning of Sentences
  11. Managing Linguistic Data

This covers quite a range of NLP techniques, and each chapter includes plenty of sample code, all of which uses data supplied with NLTK. The weaknesses in the coverage appear when you try to adapt the code to your own applications. For example, all the classifier examples use Naive Bayes (nice and fast, good for quick demos). Maximum Entropy models are mentioned, but no samples or practical examples are provided. Similarly, the Named Entity Recognition (NER) section (part of the Extracting Information from Text chapter) only uses the built-in NER chunker, and does not discuss creating your own NER chunker or classifier.
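To give a flavour of the Maximum Entropy example the book leaves out, here is a minimal sketch, assuming NLTK 3.x with NumPy installed. It trains NLTK's `NaiveBayesClassifier` and `MaxentClassifier` on the same data, so swapping one for the other is mostly a matter of changing the class name and a few training options. The tiny list of names and the `features` extractor are hypothetical, made up for illustration; they are not taken from the book.

```python
# Sketch: training two NLTK classifiers on the same (hypothetical) toy data.
# Assumes NLTK 3.x with NumPy installed; no corpus downloads are needed.
from nltk.classify import MaxentClassifier, NaiveBayesClassifier


def features(name):
    # Hypothetical feature extractor: the last letter of a first name.
    return {"last_letter": name[-1].lower()}


# Tiny made-up training set, for illustration only.
labeled_names = [
    ("Alice", "female"), ("Emma", "female"), ("Sophia", "female"),
    ("Olivia", "female"), ("John", "male"), ("Mark", "male"),
    ("David", "male"), ("Peter", "male"),
]
train_set = [(features(name), label) for name, label in labeled_names]

# Naive Bayes: the classifier used throughout the book's examples.
nb = NaiveBayesClassifier.train(train_set)

# Maximum Entropy: takes the same (featureset, label) pairs.
# algorithm="iis" selects NLTK's built-in Improved Iterative Scaling trainer,
# trace=0 silences per-iteration progress output, max_iter caps training.
maxent = MaxentClassifier.train(train_set, algorithm="iis", trace=0, max_iter=10)

for clf in (nb, maxent):
    print(type(clf).__name__, clf.classify(features("Anna")))
```

Both trained objects expose the same `classify` method, so code written around one should work with the other. Maximum Entropy training is noticeably slower on real data, which may be part of why quick demos favour Naive Bayes.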

So yes, this book can make a good first introduction to NLTK, and I would recommend it as such. However, you are better off learning Python from a more in-depth book. For practical applications, I would recommend Python Text Processing with NLTK 2.0 Cookbook by Jacob Perkins as a very useful complement. The NLTK source code is generally very well documented, so do not be afraid to read it if something is unclear. Finally, the book has good references for further reading, which you will find invaluable if you want to go deeper, particularly on NLP and machine learning topics.
