Winwaed Blog ← Code and Commentary ← Page 3

Mapping Earthquakes

Jun 13, 2016

Maptitude can also be used to plot earthquakes, examine patterns in earthquakes, and even look for correlations with other factors such as oil industry activity. Earthquake catalogs from recent years can be downloaded from the US Geological Survey at http://earthquake.usgs.gov/earthquakes/search/ . For the following maps, the data is downloaded as a CSV (comma-separated values) file, ...

Why is it important to practice defensive driving, and how can it benefit

Feb 18, 2016

Defensive driving is the technique of driving cautiously and smoothly, minimising the chances of being involved in an accident and damaging your car and those of third parties. Car accidents mainly occur due to reckless and rash driving. A defensive driver ...

Mapping the St Albans Sinkhole

Oct 7, 2015

On 1st October, a large sinkhole opened up in St Albans, UK, cutting off an entire cul-de-sac of houses. New sinkholes are very common, but this one quickly became international news due to its photogenic proximity to houses. We think of sinkholes as appearing in places like Florida or the Yorkshire Dales. Why did one ...

NLTK on the Raspberry PI

Aug 24, 2012

If you haven’t heard of it yet, the Raspberry Pi is a $25/$35 barebones computer intended to excite kids with programming and hardware projects. It is very much modeled on the British experience of home computing in the early 1980s and even has a “Model A” and a “Model B” in homage to the BBC ...

Sentence Segmentation: Handling multiple punctuation characters

Jun 13, 2012

Previously, I showed you how to segment words and sentences whilst also taking into account full stops (periods) and abbreviations. The problem with this implementation is that it is easily confused by contiguous punctuation characters. For example “).” is not recognized as the end of a sentence. This article shows you how to correct this.
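As a rough illustration of the problem, here is a minimal sketch of a splitter that treats a whole run of contiguous punctuation such as “).” as a single sentence boundary. It is not the script from the article: the regular expression, the function name `split_sentences`, and the choice of closing characters are all assumptions made for this example, and abbreviation handling is left out entirely.

```python
import re

# Hypothetical boundary pattern: optional closing brackets/quotes, one or more
# terminators (. ! ?), optional closing brackets/quotes again, and then either
# whitespace or the end of the text.
_SENTENCE_END = re.compile(r"""[)\]"']*[.!?]+[)\]"']*(?=\s|$)""")


def split_sentences(text):
    """Split text into sentences, keeping trailing punctuation runs attached."""
    sentences = []
    start = 0
    for match in _SENTENCE_END.finditer(text):
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    # Keep any trailing text that has no terminator.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


example = split_sentences("He left (quietly). She stayed.")
```

Because the terminator must be followed by whitespace or the end of the text, a full stop inside a number such as 3.14 is not treated as a boundary in this sketch.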

Using BerkeleyDB to Create a Large N-gram Table

May 17, 2012

Previously, I showed you how to create N-Gram frequency tables from large text datasets. Unfortunately, when used on very large datasets such as the English language Wikipedia and Gutenberg corpora, memory constraints restricted these scripts to unigrams. Here, I show you how to use the BerkeleyDB database to create N-gram tables of these large datasets.
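The general idea of keeping the counts on disk rather than in memory can be sketched as follows. Note that this example swaps in Python's standard-library `dbm` module as a stand-in key-value store; it does not use BerkeleyDB, and the function name `count_ngrams_on_disk` and the space-joined key format are assumptions for illustration only.

```python
import dbm


def count_ngrams_on_disk(tokens, n, path):
    """Increment an on-disk counter for every n-gram in a token list.

    Uses the standard-library dbm module as a stand-in for BerkeleyDB.
    Keys are the n-gram tokens joined by spaces; values are decimal counts.
    """
    with dbm.open(path, "c") as db:
        for i in range(len(tokens) - n + 1):
            key = " ".join(tokens[i:i + n]).encode("utf-8")
            current = int(db.get(key, b"0"))
            db[key] = str(current + 1).encode("ascii")
```

Only one count is held in memory at a time, so the size of the table is bounded by disk space rather than RAM, at the cost of a database read and write per n-gram.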

Calculating N-Gram Frequency Tables

Apr 17, 2012

The Word Frequency Table scripts can be easily expanded to calculate N-Gram frequency tables. This post explains how.
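For readers unfamiliar with the idea, a minimal in-memory sketch of an N-gram frequency table is shown below. It is not the script from the post; the function name `ngram_frequencies` and the use of whitespace-split tokens are assumptions for this example.

```python
from collections import Counter


def ngram_frequencies(tokens, n):
    """Count every run of n consecutive tokens in a token list."""
    # zip over n staggered copies of the list yields each n-gram as a tuple.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(grams)


tokens = "to be or not to be".split()
bigrams = ngram_frequencies(tokens, 2)
```

With `n=1` the same function produces an ordinary word frequency table, keyed by one-element tuples.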

Calculating Word and N-Gram Statistics from a Wikipedia Corpora

Apr 16, 2012

As well as using the Gutenberg Corpus, it is possible to create a word frequency table for the English text of the Wikipedia encyclopedia.

Calculating Word Statistics from the Gutenberg Corpus

Apr 9, 2012

Following on from the previous article about scanning text files for word statistics, I shall extend this to use real large corpora. First we shall use this script to create statistics for the entire Gutenberg English language corpus. Next I shall do the same with the entire English language Wikipedia.

Calculating Word Frequency Tables

Mar 26, 2012

Now that we can segment words and sentences, it is possible to produce word and tuple frequency tables. Here I show you how to create a word frequency table for a large collection of text files.
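A minimal sketch of a word frequency table might look like the following. It is not the script from the post: it assumes the text has already been read into strings, and the simple regular-expression tokenizer and the function name `word_frequencies` are stand-ins for the word segmentation described in the earlier articles.

```python
import re
from collections import Counter


def word_frequencies(texts):
    """Count lowercase word tokens across an iterable of text strings."""
    counts = Counter()
    for text in texts:
        # Assumed tokenizer: runs of ASCII letters and apostrophes.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


freq = word_frequencies(["The cat sat.", "The dog sat on the cat."])
```

For a collection of files, each file's contents could be passed in one at a time, so that only the running counts and the current file need to be held in memory.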