Calculating Word Statistics from the Gutenberg Corpus

Apr 2012

Following on from the previous article about scanning text files for word statistics, I shall now extend the approach to large, real-world corpora. First, I shall use that script to compute statistics for the entire English-language Project Gutenberg corpus. Next, I shall do the same for the entire English-language Wikipedia.
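As a reminder of the basic idea, here is a minimal sketch of counting word frequencies across a directory tree of plain text files. It is not the script from the previous article: the function names, the simple regular-expression tokenizer, and the assumption that the corpus is stored as UTF-8 .txt files are all illustrative choices of mine.

```python
import re
from collections import Counter
from pathlib import Path

# Assumption: a "word" is a run of ASCII letters and apostrophes.
# A real corpus would probably need a more careful tokenizer.
WORD_RE = re.compile(r"[a-z']+")


def count_words(text):
    """Return a Counter of lower-cased word tokens found in text."""
    return Counter(WORD_RE.findall(text.lower()))


def corpus_statistics(root):
    """Sum the word counts of every .txt file below the directory root."""
    totals = Counter()
    for path in sorted(Path(root).rglob("*.txt")):
        # Undecodable bytes are replaced rather than aborting the whole scan.
        text = path.read_text(encoding="utf-8", errors="replace")
        totals.update(count_words(text))
    return totals
```

Calling corpus_statistics("gutenberg") on a (hypothetical) local directory and then .most_common(20) on the result would list the twenty most frequent words. Reading one file at a time keeps memory use proportional to the vocabulary rather than to the size of the corpus, which matters once the input grows to the scale of Gutenberg or Wikipedia.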