New Book: “Artificial Intelligence with Python” by Prateek Joshi

Packt Publishing have just published “Artificial Intelligence with Python” by Prateek Joshi. I was the technical editor. The book serves as a good introduction to a wide range of AI techniques and Python libraries. Due to this breadth of coverage, there isn’t the space for really in-depth discussion of individual techniques, but the book should ...

Effective Python Penetration Testing published

I was a technical reviewer/editor for Effective Python Penetration Testing by Rejah Rehim, which has just been published by Packt Publishing. This book covers a range of security and analysis techniques and tools, including the automation of penetration testing, all using Python.

New Book Published: Mastering Python Forensics

I was a technical editor for Packt Publishing’s latest book, Mastering Python Forensics by Dr Michael Spreitzenbarth and Dr Johann Uhrmann, which was published last week. Mastering Python Forensics covers Windows, Linux, virtualization, mobile (Android and Apple iOS), and memory forensics using Python.

Running the Charniak-Johnson Parser from Python 2

Although the Python NLTK library contains a number of parsers and grammars, these only support words that are defined in the grammar. This also applies to the supported Probabilistic Context-Free Grammars (PCFGs). Therefore, in order to work with a more general parser that can handle unseen words, you have to use a Python wrapper ...
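To make the limitation concrete, here is a minimal sketch using NLTK's `CFG.fromstring` and `ChartParser` with a tiny toy grammar. The grammar and the `try_parse` helper are hypothetical, written only for illustration. A sentence containing a word that is absent from the grammar's lexicon is rejected with a `ValueError` before any parsing happens.

```python
import nltk

# Hypothetical toy grammar: only the words listed here are "known".
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the' | 'a'
N -> 'dog' | 'cat'
V -> 'chased' | 'saw'
""")
parser = nltk.ChartParser(grammar)


def try_parse(sentence):
    """Return a list of parse trees, or None if the grammar lacks a word."""
    tokens = sentence.split()
    try:
        return list(parser.parse(tokens))
    except ValueError as err:
        # NLTK checks lexical coverage first and raises ValueError
        # naming the words the grammar does not cover.
        print(err)
        return None


if __name__ == "__main__":
    for tree in try_parse("the dog chased a cat"):
        print(tree)
    try_parse("the dog chased a mouse")  # 'mouse' is not in the grammar
```

This is exactly the gap that motivates wrapping an external statistical parser: the grammar-based parsers have no way to assign structure to a word they have never been told about.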

Python Geospatial Development: Second Edition published

The second edition of Erik Westra’s Python Geospatial Development has just been published. Full Disclosure: I served as a technical editor for the new edition.

NLTK (alpha) for Python 3 Released

The first alpha release of NLTK 3.0 (i.e. NLTK for Python 3) has just been released. Downloads and further information can be found here: http://nltk.org/nltk3-alpha/ Although not quite ready for prime time, this is a major step towards full Python 3 support in the NLTK library.

Extracting Body Content from a Web Page

I recently encountered the problem of having to extract the main body content from a series of web pages, and to discard all of the ‘boilerplate’: the header, menus, footer, and advertising. The application was performing statistical comparisons between web pages, and although it was producing the correct answers for my test data, ...
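The excerpt does not show the method used in the post, so the following is only one simple heuristic, sketched with Python's standard `html.parser`. It assumes that text inside tags such as `nav`, `header`, `footer` and `aside` is boilerplate (the `SKIP_TAGS` and `BLOCK_TAGS` sets are assumptions of this sketch), and that among the remaining text blocks, menus and ads tend to be short fragments, so only blocks with at least `min_words` words are kept.

```python
from html.parser import HTMLParser

# Assumption: contents of these tags are treated as boilerplate.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside", "form"}
# Assumption: these tags delimit separate blocks of text.
BLOCK_TAGS = {"p", "div", "li", "td", "h1", "h2", "h3", "article", "section"}


class BodyTextExtractor(HTMLParser):
    """Collect whitespace-normalized text blocks outside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # >0 while inside a boilerplate tag
        self.blocks = []     # finished text blocks
        self.current = []    # text fragments of the block being built

    def flush_block(self):
        text = " ".join("".join(self.current).split())
        if text:
            self.blocks.append(text)
        self.current = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.flush_block()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.flush_block()

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.current.append(data)


def extract_body_text(html, min_words=8):
    """Return text blocks that look like body content (hypothetical helper)."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    parser.flush_block()
    # Short fragments (menu items, captions, ad labels) are dropped.
    return [b for b in parser.blocks if len(b.split()) >= min_words]
```

Tag-based filtering like this is cheap but brittle, since many sites lay out menus with plain `div`s; word-count or text-density thresholds are the usual fallback, and `min_words` would need tuning per site.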

Using BerkeleyDB to Create a Large N-gram Table

Previously, I showed you how to create N-gram frequency tables from large text datasets. Unfortunately, when used on very large datasets such as the English-language Wikipedia and Gutenberg corpora, memory limitations restricted these scripts to unigrams. Here, I show you how to use the BerkeleyDB database to create N-gram tables of these large datasets.
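The core idea is to keep the counts in an on-disk key/value store rather than an in-memory dictionary. The sketch below swaps BerkeleyDB for Python's standard-library `dbm` module, which exposes a similar bytes-to-bytes mapping stored on disk; it is an illustration of the idea, not the post's BerkeleyDB code. The function names and the file name `bigrams` are hypothetical.

```python
import dbm
import os
import tempfile


def count_ngrams_on_disk(lines, n, db_path):
    """Accumulate n-gram counts in an on-disk dbm file instead of a dict.

    Keys are space-joined n-grams and values are decimal counts, both
    stored as bytes because dbm databases hold bytes.
    """
    with dbm.open(db_path, "c") as db:  # "c": create the file if needed
        for line in lines:
            tokens = line.lower().split()
            # Slide a window of length n over the tokens.
            for gram in zip(*(tokens[i:] for i in range(n))):
                key = " ".join(gram).encode("utf-8")
                db[key] = str(int(db.get(key, b"0")) + 1).encode("ascii")


def lookup(db_path, ngram):
    """Return the stored count for an n-gram string, or 0 if absent."""
    with dbm.open(db_path, "r") as db:
        return int(db.get(ngram.encode("utf-8"), b"0"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "bigrams")
        count_ngrams_on_disk(["the cat sat on the mat", "the cat ran"], 2, path)
        print(lookup(path, "the cat"))
```

Updating the database once per n-gram keeps memory use flat but costs a disk read and write per token; a practical variant would accumulate counts in a bounded in-memory `Counter` and merge them into the database in batches.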

Calculating N-Gram Frequency Tables

The Word Frequency Table scripts can be easily expanded to calculate N-Gram frequency tables. This post explains how.
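One way to extend a word-frequency count to N-grams is to slide a window of length `n` over each line's tokens and count the resulting tuples. A minimal sketch, assuming whitespace tokenization and lowercasing (not necessarily what the original scripts do); the function names are hypothetical:

```python
from collections import Counter


def ngrams(tokens, n):
    """Yield successive n-grams (as tuples) from a list of tokens."""
    # zip stops at the shortest slice, so lines shorter than n yield nothing.
    return zip(*(tokens[i:] for i in range(n)))


def ngram_frequency_table(lines, n):
    """Count n-grams across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        tokens = line.lower().split()
        counts.update(ngrams(tokens, n))
    return counts


if __name__ == "__main__":
    table = ngram_frequency_table(["the cat sat on the mat", "the cat ran"], 2)
    for gram, freq in table.most_common(3):
        print(" ".join(gram), freq)
```

With `n=1` the same code produces a unigram (word) table, with each word wrapped in a 1-tuple.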

Calculating Word and N-Gram Statistics from a Wikipedia Corpus

As well as using the Gutenberg Corpus, it is possible to create a word frequency table for the English text of the Wikipedia encyclopedia.
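A minimal word-frequency sketch, assuming the Wikipedia dump has already been converted to plain text (stripping the wiki markup is a separate step not shown here). The regular-expression tokenizer is an illustrative choice, and the function names and the `wiki_text.txt` file name are hypothetical.

```python
import re
import sys
from collections import Counter

# Illustrative tokenizer: runs of lowercase letters, optionally with one
# internal apostrophe (e.g. "don't"). Digits and punctuation are ignored.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def word_frequency_table(lines):
    """Count word occurrences across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(WORD_RE.findall(line.lower()))
    return counts


def write_table(counts, out=sys.stdout):
    """Write 'word<TAB>count' rows, most frequent first."""
    for word, freq in counts.most_common():
        out.write(f"{word}\t{freq}\n")


if __name__ == "__main__":
    # Hypothetical input file of extracted plain text, read line by line
    # so the whole corpus never has to fit in memory at once.
    with open("wiki_text.txt", encoding="utf-8") as f:
        write_table(word_frequency_table(f))
```

Only the counts are held in memory; for a corpus the size of Wikipedia that is manageable for single words, which is why larger N-gram tables need a disk-backed store.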