NLTK on the Raspberry Pi

If you haven’t heard of it yet, the Raspberry Pi is a $25/$35 barebones computer intended to excite kids with programming and hardware projects. It is very much modeled on the British experience of home computing in the early 1980s and even has a “Model A” and a “Model B” in homage to the BBC Micro. It is about the size of an Altoids tin, uses an ARM CPU, boots from an SD flash card, and runs Linux. Further information can be found on the official Raspberry Pi website. And yes, it can also run NLTK!

The ‘Pi’ in Raspberry Pi actually refers to Python, which is intended as an integral part of the ‘standard’ teaching toolkit. Because the Pi runs Linux, you can install almost any cut-down distribution you like, but the Raspberry Pi Foundation’s standard Raspbian ‘Wheezy’ SD card image already includes Scratch and Python.

Thekeywordgeek posted NLTK installation instructions on the official forums. These are pretty much the standard NLTK installation instructions, although you will probably want to use a larger SD card. I used an 8GB card, but the Wheezy image is only 2GB, so by default the rest of the card goes unused. The solution is to expand the standard root partition to take advantage of the extra space. This is done with the raspi-config utility, which is provided in the standard install. At the command line, type:

sudo raspi-config

This is a useful text-based utility which also lets you set the keyboard layout, character set, etc.
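If you want to confirm that the root partition really did grow, one option is to check the filesystem size from Python itself. The following is only a minimal sketch using the standard library; the helper name disk_report is my own invention, and it assumes Python 3.

```python
import shutil

def disk_report(path="/"):
    """Return (total, used, free) in GiB for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    gib = 1024 ** 3
    return tuple(round(n / gib, 2) for n in (usage.total, usage.used, usage.free))

# Report on the root filesystem, which is the one raspi-config expands.
total, used, free = disk_report("/")
print(f"root filesystem: {total} GiB total, {used} GiB used, {free} GiB free")
```

On an 8GB card with an expanded partition, the total should come out much closer to the card size than to the 2GB image size.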

The only other issue I have found so far is that nltk.download() was unable to download all of the data files in one go. This appears to be related to file size and timing. However, I was able to download the book example files, as well as individual corpus files, data models, etc.
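Fetching the data piece by piece can be scripted rather than clicked through. Here is a rough sketch of that approach: it requests one package at a time and collects any that fail so they can be retried later. The helper name download_packages and its download parameter are hypothetical (mine, not part of NLTK); the only NLTK call it relies on is nltk.download(), which takes a package or collection identifier and returns True on success.

```python
def download_packages(package_ids, download=None):
    """Download NLTK data packages one at a time; return the ids that failed."""
    if download is None:
        # Imported lazily so the helper can be tried out without NLTK installed.
        import nltk
        download = nltk.download
    failed = []
    for package_id in package_ids:
        try:
            ok = download(package_id)
        except Exception:
            # Treat any error (e.g. a network timeout) as a failed download.
            ok = False
        if not ok:
            failed.append(package_id)
    return failed

# Example call (needs NLTK and a network connection, so it is left commented out):
# failed = download_packages(["book", "brown", "punkt"])
# print("still missing:", failed)
```

Here "book" is the collection of data used by the NLTK book examples, while "brown" and "punkt" are an individual corpus and a tokenizer model. Anything returned in the failed list can simply be passed back in for another attempt.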

The Raspberry Pi is underpowered for serious ‘heavy’ natural language processing, but it should prove to be a useful educational platform for introducing basic language processing to young and budget-limited programmers.