How to easily improve the quality of OpenStreetMap

There are many ways that you can help to improve the quality of OpenStreetMap data. After familiarizing yourself with OpenStreetMap features and tags, you can use a tool such as JOSM or iD to verify and, if necessary, correct the features in an area that you know intimately. For example, you might initially ...

Extracting OpenStreetMap Data for Ultra Mileage

Although we supply a number of PBF extract files from the OpenStreetMap global ‘planet’ file, it is possible to create your own extracts for use with the Ultra Pre-Processor to create custom road packs. Our own extracts are created using the Osmosis command line tool. There are other feature extraction tools such as the Overpass API, ...

Accessing a COM Server from R

Microsoft have recently added support for R to their Visual Studio 2015 development environment. R is a programming language for statistical and graphical programming that is widely used by statisticians and data miners. To use R with Visual Studio, you need to download and install R Tools for Visual Studio and Microsoft R Open. These ...

New Maptitude 3D Surfaces Section

I have just added a new section to the Maptitude ‘Howto’ pages over at mapping-tools.com, discussing Maptitude’s 3D surface and landscape options. The examples include an image of Snowdonia, created using Ordnance Survey elevation data combined with Google Maps Satellite imagery. Other examples include the Guadalupe Mountains (Texas), and geological overlays of both the Caprock Escarpment ...

Converting old Ordnance Survey Height Data

I’m currently putting together some material for the Maptitude How-To pages covering Maptitude’s support for digital terrain models and elevation models. Left over from my undergraduate geology days, I happened to have the Ordnance Survey 1:50,000 height data files for the 20km SH64 square (my dissertation covered an area straddling the SH75SW and SH75SE 5km ...

Running the Charniak-Johnson Parser from Python 2

Although the Python NLTK library contains a number of parsers and grammars, these only support words which are defined in the grammar. This also applies to the supported Probabilistic Context Free Grammars (PCFGs). Therefore, in order to work with a more general parser that can handle unseen words, you have to use a Python wrapper ...

Extracting Body content from a Web Page using .NET

Boilerpipe is a useful library for extracting body content from web pages and discarding the ‘boilerplate’ (menus, footers, advertising, etc). It is a Java library, so it requires a bridge (e.g. JPype for Python) if you wish to use it in a non-Java environment. Luckily for C# users, Arif Ogan has ported Boilerpipe to C#/Mono. ...

Extracting Body Content from a Web Page

I recently encountered the problem of having to extract the main body content from a series of web pages, and to discard all of the ‘boilerplate’, i.e. the header, menus, footer, and advertising. The application was performing statistical comparisons between web pages, and although it was producing the correct answers for my test data, ...

NLTK on the Raspberry Pi

If you haven’t heard of it yet, the Raspberry Pi is a $25/$35 barebones computer intended to excite kids with programming and hardware projects. It is very much modeled on the British experience of home computing in the early 1980s and even has a “Model A” and a “Model B” in homage to the BBC ...

Sentence Segmentation: Handling multiple punctuation characters

Previously, I showed you how to segment words and sentences whilst also taking into account full stops (periods) and abbreviations. The problem with this implementation is that it is easily confused by contiguous punctuation characters. For example “).” is not recognized as the end of a sentence. This article shows you how to correct this.
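
As a rough illustration of the problem, here is a minimal Python sketch of one way to treat a run of contiguous punctuation such as “).” as a single sentence boundary. It is not the implementation from the article: the function names, the abbreviation list, and the sets of closing and terminating characters are all assumptions made for this example.

```python
# Hypothetical abbreviation list, for illustration only.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "e.g.", "i.e.", "etc."}

# Characters that may surround a sentence terminator (assumed set).
CLOSERS = ")]}\"'”’"
TERMINATORS = ".!?"


def ends_sentence(token):
    """Return True if the token's trailing punctuation run contains a terminator."""
    if token.lower() in ABBREVIATIONS:
        return False
    # Strip the whole trailing run of closers and terminators, e.g. ")." or ".)"
    stripped = token.rstrip(CLOSERS + TERMINATORS)
    tail = token[len(stripped):]
    return any(ch in TERMINATORS for ch in tail)


def segment_sentences(text):
    """Split whitespace-separated text into sentences."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if ends_sentence(token):
            sentences.append(" ".join(current))
            current = []
    if current:
        # Keep any trailing text that has no terminator
        sentences.append(" ".join(current))
    return sentences


if __name__ == "__main__":
    for sentence in segment_sentences("He left (see Dr. Smith). She stayed!"):
        print(sentence)
```

The idea is to examine the entire trailing run of punctuation on each token rather than only its last character, so a closing bracket or quote on either side of the full stop does not hide the boundary. The abbreviation check is a plain lookup and will miss cases such as an abbreviation preceded by an opening bracket.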