Geospatial Data Verification

Previously we looked at visual data verification using Python and Pandas. Here we shall extend this to geospatial data verification of the earlier Oklahoma Injection Well Dataset. Geospatial data can be managed and plotted using GeoPandas – a geospatial extension to Pandas. This comes with some basic basemap data, but you will probably ...

Visual Data Verification

Previously we looked at importing data into Python and its initial verification. Next we shall look at the visual verification of data. We shall use Pandas with Matplotlib to plot a series of graphs to check for erroneous data. We will use NumPy, Matplotlib, and Pandas, all of which can be installed with ...

Handling Unicode Input in Python

We have looked at reading data into Python, but have ignored the issue of character encoding. In English-speaking countries we often assume a text file or string is simple ASCII. More often than not, the file is actually a Unicode file. With Python 2, ignoring this issue would not usually result in any problems ...

Python Data Validation: Date & Time

Regardless of language, handling dates and times is trickier than handling simple numbers and strings. This is because, even within the Gregorian system, there is a wide range of different formats in addition to multiple time zones and daylight saving / summer time corrections. Just to complicate things, the corrections vary according to date, and these ...

Python Data Validation

Python is a good scripting language for data analysis and processing, but are you sure your imported data is valid? As well as import errors, it is possible the data itself contains errors such as values in the wrong field, inconsistent values/fields, and unexpected situations. Immediately after reading the data, you must validate it, and ...

Solving the Six Degrees of Kevin Bacon Problem

This article shows you how to solve the “Six Degrees of Kevin Bacon” game using a mixture of SPARQL and Python. SPARQL is a query language for triple stores that was born out of the ...

Importing Data into Python

Python is a popular tool for data manipulation and processing. In this first post about Python data manipulation and input, we look at a number of different ways to get your data files loaded into Python. Structured non-tabular data typically consists of data records with fields which are not always present, in ...

Using BerkeleyDB to Create a Large N-gram Table

Previously, I showed you how to create N-Gram frequency tables from large text datasets. Unfortunately, when used on very large datasets such as the English language Wikipedia and Gutenberg corpora, memory constraints restricted these scripts to unigrams. Here, I show you how to use the BerkeleyDB database to create N-gram tables of these large datasets.

Calculating N-Gram Frequency Tables

The Word Frequency Table scripts can be easily expanded to calculate N-Gram frequency tables. This post explains how.

Calculating Word and N-Gram Statistics from a Wikipedia Corpus

As well as using the Gutenberg Corpus, it is possible to create a word frequency table for the English text of the Wikipedia encyclopedia.