Mapping Earthquakes

Maptitude can also be used to plot earthquakes, examine patterns in them, and even look for correlations with other factors such as oil industry activity. Earthquake catalogs from recent years can be downloaded from the US Geological Survey at . For the following maps, the data is downloaded as a CSV (comma-separated values) file, ...

A Simple GISDK Demo

This is a simple demonstration of the GISDK macro language that is supplied with Caliper Maptitude. It lets the user select a layer from the current map and then deletes all of the layer’s selection sets (except the default ‘Selection’). It demonstrates the use of GISDK command structures, API, and a custom dialog box. The ...

Mapping the St Albans Sinkhole

On 1st October, a large sinkhole opened up in St Albans, UK, cutting off an entire cul-de-sac of houses. New sinkholes are very common, but this one quickly became international news due to its photogenic proximity to houses. We think of sinkholes as appearing in places like Florida or the Yorkshire Dales. Why did one ...

NLTK on the Raspberry Pi

If you haven’t heard of it yet, the Raspberry Pi is a $25/$35 barebones computer intended to excite kids with programming and hardware projects. It is very much modeled on the British experience of home computing in the early 1980s and even has a “Model A” and a “Model B” in homage to the BBC ...

Sentence Segmentation: Handling multiple punctuation characters

Previously, I showed you how to segment words and sentences whilst also taking into account full stops (periods) and abbreviations. The problem with this implementation is that it is easily confused by contiguous punctuation characters. For example “).” is not recognized as the end of a sentence. This article shows you how to correct this.
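As a rough illustration of the idea (not the article's own implementation), the sketch below treats the whole run of trailing punctuation on a token as a unit, so that a sequence such as “).” still counts as a sentence end. The abbreviation list and the function name `split_sentences` are hypothetical, chosen only for this example.

```python
# Hypothetical, deliberately tiny abbreviation list (an assumption for this sketch).
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "e.g.", "i.e.", "etc."}

TERMINATORS = ".!?"      # characters that can end a sentence
CLOSERS = ")]}\"'"       # closing characters that may surround a terminator


def split_sentences(text):
    """Split whitespace-separated text into sentences.

    A token ends a sentence when its trailing run of punctuation
    (terminators and closers, in any order) contains a terminator,
    unless the token is a known abbreviation.
    """
    sentences = []
    current = []
    for token in text.split():
        current.append(token)
        # Ignore surrounding brackets and quotes when checking abbreviations.
        bare = token.strip("()[]{}\"'").lower()
        if bare in ABBREVIATIONS:
            continue
        # Isolate the contiguous punctuation at the end of the token.
        stem = token.rstrip(TERMINATORS + CLOSERS)
        tail = token[len(stem):]
        if any(ch in TERMINATORS for ch in tail):
            sentences.append(" ".join(current))
            current = []
    if current:
        # Keep any trailing text that lacks a terminator.
        sentences.append(" ".join(current))
    return sentences
```

Because the check looks at the whole trailing run rather than only the last character, tokens ending in “).”, “.)” or “?!"” are all handled by the same rule.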

Using BerkeleyDB to Create a Large N-gram Table

Previously, I showed you how to create N-gram frequency tables from large text datasets. Unfortunately, when used on very large datasets such as the English language Wikipedia and Gutenberg corpora, memory constraints restricted these scripts to unigrams. Here, I show you how to use the BerkeleyDB database to create N-gram tables of these large datasets.

Calculating N-Gram Frequency Tables

The Word Frequency Table scripts can be easily expanded to calculate N-Gram frequency tables. This post explains how.
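A minimal sketch of the general technique, assuming the text has already been split into a list of word tokens: slide a window of length n over the tokens and count each tuple. The function name `ngram_counts` is hypothetical and is not taken from the original scripts.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens, keyed by tuple."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # range() is empty when there are fewer than n tokens, giving an empty table.
    return Counter(
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


if __name__ == "__main__":
    words = "the cat sat on the cat".split()
    for ngram, count in ngram_counts(words, 2).most_common():
        print(" ".join(ngram), count)
```

With n set to 1 this reduces to an ordinary word frequency table, which is why the same script structure can serve both purposes.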

Calculating Word and N-Gram Statistics from a Wikipedia Corpus

As well as using the Gutenberg Corpus, it is possible to create a word frequency table for the English text of the Wikipedia encyclopedia.

Calculating Word Statistics from the Gutenberg Corpus

Following on from the previous article about scanning text files for word statistics, I shall extend this to real, large corpora. First, I shall use this script to create statistics for the entire Gutenberg English language corpus. Next, I shall do the same with the entire English language Wikipedia.

Calculating Word Frequency Tables

Now that we can segment words and sentences, it is possible to produce word and tuple frequency tables. Here I show you how to create a word frequency table for a large collection of text files.
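One way this could look, as a sketch under stated assumptions rather than the script from the article: the files are assumed to be UTF-8 `.txt` files in a single directory, and words are extracted with a simple regular expression instead of a full segmenter. The names `WORD_RE` and `word_frequencies` are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

# Simplistic word pattern (an assumption): letters with an optional internal apostrophe.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def word_frequencies(directory, pattern="*.txt"):
    """Build one word frequency table across all matching files in a directory."""
    counts = Counter()
    for path in sorted(Path(directory).glob(pattern)):
        # Read line by line so that large files are never held in memory whole.
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line in handle:
                counts.update(WORD_RE.findall(line.lower()))
    return counts
```

Calling `word_frequencies("some_directory").most_common(20)` would then list the twenty most frequent words; the directory name here is only a placeholder.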