Book Review: PostGIS Cookbook

The PostGIS Cookbook by Paolo Conti, Thomas J Kraft, Stephen Vincent Mather, and Bborie Park (buy from Packt) is a collection of how-to recipes for the PostGIS extension for Postgres. It is recommended for intermediate and even advanced PostGIS users. Novice Postgres/PostGIS users may also find it useful, but they will need to use it ...

Book Review: Google Maps JavaScript API Cookbook

Google Maps JavaScript API Cookbook by Alper Dincer & Balkan Uraz (buy from Packt) is a good introduction to the Google Maps JavaScript API in its current incarnation. Although it glosses over some of the potential weaknesses, it provides good working examples of everything from simple map features through to more sophisticated topics such as ...

Running the Charniak-Johnson Parser from Python 2

Although the Python NLTK library contains a number of parsers and grammars, these only support words which are defined in the grammar. This also applies to the supported Probabilistic Context Free Grammars (PCFGs). Therefore, in order to work with a more general parser that can handle unseen words, you have to use a Python wrapper ...

Python Geospatial Development: Second Edition published

The second edition of Erik Westra’s Python Geospatial Development has just been published. Full Disclosure: I served as a technical editor for the new edition.

NLTK (alpha) for Python 3 Released

The first alpha release of NLTK 3.0 (i.e. NLTK for Python 3) has just been released. Downloads and further information can be found here: http://nltk.org/nltk3-alpha/ Although not quite ready for prime time, this is a major step towards full Python 3 support in the NLTK library.

Extracting Body content from a Web Page using .NET

Boilerpipe is a useful library for extracting body content from web pages and discarding the ‘boilerplate’ (menus, footers, advertising, etc). It is a Java library, so it requires a bridge (e.g. JPype for Python) if you wish to use it in a non-Java environment. Luckily for C# users, Arif Ogan has ported Boilerpipe to C#/Mono. ...

Extracting Body Content from a Web Page

I recently encountered the problem of having to extract the main body content from a series of web pages, and to discard all of the ‘boilerplate’ (i.e. header, menus, footer, and advertising). The application was performing statistical comparisons between web pages, and although it was producing the correct answers for my test data, ...

Book Review: OpenLayers Cookbook

The OpenLayers Cookbook by Antonio Santiago Perez is a good description of the more sophisticated functionality supported by the OpenLayers open source library.

NLTK on the Raspberry Pi

If you haven’t heard of it yet, the Raspberry Pi is a $25/$35 barebones computer intended to excite kids with programming and hardware projects. It is very much modeled on the British experience of home computing in the early 1980s and even has a “Model A” and a “Model B” in homage to the BBC ...

Sentence Segmentation: Handling multiple punctuation characters

Previously, I showed you how to segment words and sentences whilst also taking into account full stops (periods) and abbreviations. The problem with this implementation is that it is easily confused by contiguous punctuation characters. For example “).” is not recognized as the end of a sentence. This article shows you how to correct this.
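
To make the problem concrete, here is a minimal sketch of one way to treat a run of contiguous punctuation as a single sentence boundary. It is not the implementation from the article: the `split_sentences` function, the `CLOSERS` string, and the small `ABBREVIATIONS` set are all hypothetical names chosen for illustration, and it uses a plain regular expression rather than NLTK.

```python
import re

# Hypothetical abbreviation list, for illustration only.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "e.g.", "i.e.", "etc."}

# Closing characters that may directly follow a sentence terminator.
CLOSERS = ")]\"'”’"

# One or more terminators, any closing brackets or quotes, then whitespace.
BOUNDARY = re.compile(r"[.!?]+[" + re.escape(CLOSERS) + r"]*\s+")


def split_sentences(text):
    """Split text into sentences, keeping runs such as ')."' or '.)' together."""
    sentences = []
    start = 0
    for match in BOUNDARY.finditer(text):
        candidate = text[start:match.end()].strip()
        # Ignore trailing closers when checking for an abbreviation.
        last_word = candidate.split()[-1].rstrip(CLOSERS).lower()
        if last_word in ABBREVIATIONS:
            continue  # Not a real boundary; keep scanning.
        sentences.append(candidate)
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    for sentence in split_sentences("He left (at last). (See above.) Then e.g. this works."):
        print(sentence)
```

Because the boundary is found by scanning the raw text rather than individual tokens, a closing bracket before or after the full stop does not hide the end of the sentence. The abbreviation check is deliberately crude and would need a fuller list in practice.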