Using Python to Filter and Correct Data

This example uses a Python script to create a new layer within Maptitude from an input file of Oklahoma well injection data. The input data has a number of erroneous coordinates, so the script filters the bad coordinates out. A few of these coordinates have incorrect signs (e.g. negative latitude or positive longitude), so these are automatically corrected. Finally, each well typically has multiple entries (rows) and these are collated.

The data is the UIC (Underground Injection Control) Injection Volumes report for 2014, available from the Oil and Gas Division of the Oklahoma Corporation Commission. This data is reported by individual oil companies, and lists the volumes of water that have been disposed off in wells. This water is typically “produced water” (i.e. separated from oil that has been pumped from an oil field), and contrary to popular perception, not related to hydraulic fracturing (‘fracking’) activities.

The data file is an Excel workbook, but we export this to a CSV (comma separated value) file that can be easily read with Python’s csv module.

Maptitude includes a Python interface module. At the moment (June 2016), this module only supports Python 2. The following script was written with Python 2.7. You will need a Python COM interface, and ActivePython 2.5 (or later) is recommended. I also recommend the use of a Python development environment such as PythonWin or IDLE.

The Python module can be found in GISDK\Samples\Python sub-directory in the main Maptitude program folder. The installer will have a name such as ‘Caliper-5.0.win32.exe’. Choose the correct installer for your version of Maptitude.

Within Python, the module is named ‘caliper’ (no quotes). Maptitude is started with a call to caliper.Gisdk().

Here is the full script:

# Reads the Oklahoma Well Data File and plots it in Maptitude as a new layer.
# Data is aggregated for the entire year, and a sized-symbol theme is applied according
# to the total injected volume.
# Many coordinates are bad. If possible these are corrected, otherwise they are discarded.

import caliper
import csv
import sys

# Parameters - these could be passed in the command line if you wished

# Input file
csvFileName = 'c:\\Users\\richard\\Desktop\\earthquakes\\UIC Injection Report Year 2014.csv'

# Input Maptitude map- the layer will be added to this
mapFileName = 'c:\\Users\\richard\\Desktop\\earthquakes\\maps\\'

# Layer and Theme name for Maptitude
layer_name = 'Well Injections'
theme_name = 'WellInjections'

# Files to store the layer and joined data table
geogFileName = 'c:\\Users\\richard\\Desktop\\earthquakes\\maps\\well_injections.dbd'
dbFileName = 'c:\\Users\\richard\\Desktop\\earthquakes\\maps\\well_injections.ffa'

# Class to store the data for a well
class WellData(object):
    def __init__(self, nm, op, fm, lng, lat, vol):
        self.Name = nm
        self.Operator = op
        self.Formation = fm
        self.Longitude = lng
        self.Latitude = lat
        self.Volume = vol

    # Add additional injection volume to a well object
    def AddVolume(self, vol):
        self.Volume = self.Volume + vol

    # Create a Maptitude coord for this well.
    # dk references the Maptitude app object
    def Coordinate(self, dk):
        return dk.Coord( long(self.Longitude*1000000), long(self.Latitude*1000000) )

# isNumeric only works with unicode - which isn't the default for Python2
# So here is our own implementation
def is_number(s):
        return True
    except ValueError:
        import unicodedata
        return True
    except (TypeError, ValueError):
    return False

# The main code starts here

# Connect to Maptitude and open the existing OK state map
dk = caliper.Gisdk("Maptitude")
dk.OpenMap( mapFileName, { "Auto Project" : "True" } )

# Delete any existing well layer (if we have one)
layers = dk.GetLayerNames()
if layer_name in layers:
    dk.DropLayer(None, layer_name)

# Read the well data in from a CSV file
print("Reading data...")

nrow = 0
wells = {}          # wells that have been successfully read
bad_wells = set()   # list of wells that have bad coords
# counts of wells that are in the wrong hemisphere but correctable
wrong_south = 0
wrong_east = 0

# Use the csv module to open the well data file and read it
with open(csvFileName) as wellfile:
    readCSV = csv.DictReader(wellfile, delimiter=',')
    for row in readCSV:
        # Read the next row of the file
        # Many rows are empty at the end - detect these by checking the name
        name = row["WellName"].strip()
        if  len(name) > 0 and (not name in bad_wells):
            # row has data: keep a simple count
            nrow += 1
            vol = float( row["Volume"] )   # volume injected for this well&month

            # Do we have this well already?
            if name in wells:
                # Yes we have this well, so just add the volume to it
                # New well, so fetch all the well data from this row and create a new WellData obj
                op = row["OperatorName"]
                fm = row["FormationName"]
                slng = row["LongX"]
                slat = row["LatY"]
                if is_number(slng) and is_number(slat):
                    lng = float(slng)
                    lat = float(slat)
                    # Correct coords with incorrect signs (ie. wrong hemispheres)
                    if lng>0.0:
                        lng = -lng
                        wrong_east = wrong_east + 1
                    if lat<0.0:
                        lat = -lat
                        wrong_south = wrong_south+1

                    # Check it is within an approximate Oklahoma box
                    if lng < -94.4 and lng > -103:
                        if lat > 33.60 and lat < 37.0:
                            wells[name] = WellData(name,op,fm, lng,lat, vol)
# Report some summary data regarding erroneous rows
print("Total rows processed: {0}" . format(nrow))
print("No of valid wells: {0}" . format( len(wells) ) )
print("No of bad wells: {0}" . format ( len(bad_wells) ))
print("Bad Hemisphere:  Eastern: {0};  Southern: {1}" . format(wrong_east,wrong_south))

# Create the Maptitude layer
# We create a point layer and a data table, then join them together
print("\nCreating new layer...")

dk.CreateDatabase(geogFileName, "Point", { "Layer Name" : layer_name } )
lyr_name = dk.AddLayer(None, layer_name, geogFileName, layer_name, { "Shared" : "True" } )

field_defs = [ ["ID", "Integer", 8, None, "Yes"],
               ["Name", "String", 40, None, "No","Well Name"],
               ["Operator", "String", 40, None, "No","Name of the Operator of the well"],
               ["Formation", "String", 40, None, "No","Name of Formation where the water is injected"],
               ["Volume", "Real", 12, 2, "No", "Volume of injected water (barrels)" ]

view_name = dk.CreateTable(layer_name + "_VIEW", dbFileName, "FFA", field_defs)
dk.JoinTableToLayer(geogFileName, lyr_name, "FFA", dbFileName, None, "ID", { "Symmetric" : "True"} )
dk.apply("G30 new layer default settings", "gis_ui",lyr_name)

# write each well out to the new layer/table
id = 0  # Maptitude identifer
for well_name, this_well in wells.iteritems():
    new_id = dk.AddPoint( this_well.Coordinate(dk), id)
    dk.SetRecordsValues( None,
                         [ [ "Name", "Operator", "Formation", "Volume"], [ str(new_id) ] ],
                         [ this_well.Name, this_well.Operator, this_well.Formation, this_well.Volume ],
                         None )

# Set the theme for this new layer:  Sized blue circle-with-cross symbol
dk.SetIcon(lyr_name +"|", "Font Character","Caliper Cartographic|16",  52)
dk.SetIconColor(lyr_name +"|", dk.ColorRGB(0,0,65535))
theme = dk.CreateContinuousTheme(theme_name, [lyr_name + ".Volume"],
                            { "Title" : "Well Injections", "Minimum size": 4, "Maximum size": 20 } )
dk.ShowTheme(None, theme)

# Finished!
dk.SaveMap( None, mapFileName )
dk.CloseMap( None)
dk = None

This loads the CSV file into a series of WellData objects. Each well is represented by multiple rows (one per month). These are aggregated by summing the volume injected for each well for the entire year. Coordinates are also filtered so that only those within a rectangular box around Oklahoma are accepted. Some have erroneous signs that place them in the wrong hemisphere, and these are corrected.

The data is plotted as a point layer with a JOINed table. The table stores the non-geospatial information for each well. A blue sized-symbol theme is applied. This is sized according to the volume of water injected (in barrels).

Here are the results (blue) on a map that also shows earthquakes for 2014-5 (red):

Imported well data (blue symbols) on a map that also shows earthquakes (red dots).

The input data had a number of fields that we ignored. One of the ones that was imported is the FormationName. This refers to the geological formation that the water was injected into. By creating a Maptitude selection set, it is possible to plot a single formation. Here we plot just the wells that injected water into the Arbuckle Formation:

Restricting the displayed wells (blue) to only those that injected water into the Arbuckle Formation. Notice the correlation with the earthquakes (red).

Here we see that there is a pretty good correlation between earthquakes and volumes injected into the Arbuckle Formation. These maps are discussed in more detail in the Mapping Earthquakes Case Study; and this relationship is given a much more rigorous treatment in the Science paper Oklahoma’s recent earthquakes and saltwater disposal“, Walsh, F., Zoback, M. 2015 (Sci. Adv. 2015;1:e1500195).