Working with vectors using GDAL and Python

15.04.2010 09:55 ·  GIS  ·  gdal, python, howto

GDAL is a free library for working with raster and vector data, OGR is a part of the GDAL and is used to work with vector data. The command line utilities included in the library are widely used to perform a variety of tasks. Thanks to the developed API, you can work with OGR functions from many programming languages. This article is dedicated to using the OGR API in Python and is based on the GDAL Vector API Tutorial.

Preparation

I assume that all software is installed using the OSGeo4W installer. To use GDAL from Python, the appropriate language bindings are required. In the OSGeo4W the required package is called gdal-python (for GDAL 1.5.x) or gdal16-python (for GDAL 1.6.x). I recommend using the 1.6.x version.

The GDAL Python API consists of five core modules and five optional modules (exist for compatibility with older versions):

They can be imported using the following code:

# main modules
from osgeo import gdal
from osgeo import ogr
from osgeo import osr
from osgeo import gdal_array
from osgeo import gdalconst

# compatibility modules. They will be removed in GDAL 2.0
import gdal
import ogr
import osr
import gdalnumeric
import gdalconst

When using GDAL version 1.5 and above, it is recommended to import “main” modules. If the code needs to be compatible with older GDAL versions, modules can be imported using the following approach:

try:
    from osgeo import ogr
except ImportError:
    import ogr

In most cases, it is enough to import only the ogr module.

Opening a file

The ogr.Open() function is used to open a vector dataset. OGR supports various datasource types; it can be a file, a database, a directory with files, or even a remote web service, depending on the driver. The function takes the full path to the dataset as the first argument and an optional constant describing the opening mode as the second argument. If the constant is omitted, read-only mode is assumed. On success, the function returns an OGRDataSource object; otherwise, it returns None.

import sys
from osgeo import ogr
ogrData = ogr.Open("/home/alex/test/points.shp", False)
# or without specifying open mode
# ogrData = ogr.Open("/home/alex/test/points.shp")

# checking that everything is fine
if ogrData is None:
    print "ERROR: open failed"
    sys.exit(1)

Working with layers

Since a dataset can contain multiple layers, it is necessary to specify a layer to process. The number of layers in a dataset can be retrieved using the GetLayerCount() method of the OGRDataSource object. The layer can be accessed either by its index (from the range 0…GetLayerCount() - 1) or by its name:

print "Number of layers", ogrData.GetLayerCount()

# get layer by its index
layer = ogrData.GetLayer(0)
# alternatively
#layer = ogrData[0]
# or by its name
layer = ogrData.GetLayerByName("points")
# alternatively
#layer = ogrData["points"]

The example above is very simple. For instance, there is no check if the layer variable is a valid object, but this does not mean that there should not be such a check. In case of an error (e.g. if the index value is out of range), the variable layer will be set to None.

After retrieving a layer, it is worth calling the ResetReading() function to make sure that we are at the beginning of the layer. But if the layer has just been opened, as in our example, you can omit this call

if layer is None:
    print "ERROR: can't access layer"
    sys.exit(1)

layer.ResetReading()

Getting information and working with data

The number of features in the layer can be obtained using GetFeatureCount(). The GetSpatialRef() function returns the coordinate reference system of the layer. The extent of a layer can be obtained by calling GetExtent(), and the number of fields by calling GetFieldCount(). Be careful when getting the extent! Depending on the driver, the extent value may or may not depend on the spatial filter, so it is recommended to deactivate the spatial filter before calling GetExtent().

print "Feature count", layer.GetFeatureCount()
print "Layer SRC", layer.GetSpatialRef()
print "Layer extent", layer.GetExtent()
print "Field count", layer.GetFieldCount()

After getting a layer object, you can start reading its features. Usually, a loop and the function GetNextFeature() are used to access features. This function returns the next feature of a layer, or None if there are no features left. Before retrieving features, you can set a spatial or attribute filter using SetSpatialFilter()/SetSpatialFilterRect() or SetAttributeFilter() respectively.

feat = layer.GetNextFeature()
while feat is not None:
    # process feature

The following snippet can be used to get the fields of the feature and their values:

feat = layer.GetNextFeature()
featDef = layer.GetLayerDefn() # layer's schema (attribute table)
while feat is not None:
    for i in range(featDef.GetFieldCount()): # loop over all fields
        fieldDef = featDef.GetFieldDefn(i) # get field with index i
        print "Field name", fieldDef.GetNameRef() # and output information
        print "Field type", fieldDef.GetType()
        print "Field value", feat.GetFieldAsString(i)
    feat = layer.GetNextFeature() # get next feature

Fields can have different data types, for each data type, there is a corresponding getter function. For example, for the OFTInteger data type — GetFieldAsInteger(), for OFTDateTimeGetFieldAsDateTime() and so on. A readable representation of the value of any field can also be obtained using the GetFieldAsString() function.

The GetGeometryRef() function returns the geometry of a feature:

geom = feat.GetGeometryRef()
if geom is None:
    print "Invalid geometry"

if geom.GetGeometryType() == ogr.wkbPoint:
    print "%.3f, %.3f" % ( geom.GetX(), geom.GetY() )
else:
    print "Non point geometry"

The obtained geometry can be exported to various formats using one of the functions ExportToGML(), ExportToJson(), ExportToKML(), ExportToWkb(), ExportToWkt(). We can also perform various geoprocessing operations on the geometry: ConvexHull(), Buffer(), Difference()

When you have finished processing the data, it is necessary to free the resources and close the dataset

ogrData.Destroy()

Writing files

Let’s look at how to write data using the example of saving to a Shapefile. First of all, we need to specify which driver to use:

driverName = "ESRI Shapefile"
drv = ogr.GetDriverByName(driverName)
if drv is None:
    print "%s driver not available." % driverName

Once the driver has been successfully initialised, we can create a dataset. The Shapefile driver allows us to create either a single Shapefile or a directory containing one or more Shapefiles. If you need to create a single file, be sure to specify the file name and extension. If necessary, to fine-tune the driver, we can pass creation options as a list of key-value pairs as a second parameter.

ogrData = drv.CreateDataSource("/home/alex/point_out.shp")
if ogrData is None:
    print "Failed to create output file."
    sys.exit(1)

Now we can create a new layer in a dataset. For each layer, we should specify its name and geometry type, while the coordinate reference system can be omitted

layer = ogrData.CreateLayer("point_out", None, ogr.wkbPoint)
if layer is None:
    print "Failed to create layer."
    sys.exit(1)

OK, the layer is created, let’s add attributes to describe the features of the layer. It is necessary to add fields (attributes) before adding features to the layer. For each field, we need to create an OGRField object and set the necessary parameters (name, type, length, etc.)

fieldDef = ogr.FieldDefn("FileName", ogr.OFTString) # field name and data type
fieldDef.SetWidth(32) # field length

if layer.CreateField(fieldDef) != 0:
    print "Failed to create field."
    sys.exit(1)

# for fields, containing floating point values, we can also set precision
fieldDef = ogr.FieldDefn("Area", ogr.OFTReal) # field name and data type
fieldDef.SetWidth(18) # field length
fieldDef.SetPrecision(6) # precision (number of decimal places)

if layer.CreateField(fieldDef) != 0:
    print "Failed to create field."
    sys.exit(1)

To write a feature to the layer, we need to create an OGRFeature object, set the attributes and geometry. The OGRFeature object must be associated with a layer:

name = "/home/alex/test.tif" # attribute value
feat = ogr.Feature(layer.GetLayerDefn()) # instantiate OGRFeature
feat.SetField("FileName", name) # set attribute

pt = ogr.Geometry(ogr.wkbPoint) # create a point geometry
x = 1.0
y = 2.0
pt.SetPoint_2D(0, x, y)
feat.SetGeometry(pt) # assign geometry to a feature

And finally, write data to a file

if layer.CreateFeature(feat) != 0:
    print "Failed to create feature in shapefile."
    sys.exit(1)

# free memory and close file
feat.Destroy()
ogrData.Destroy()

Example. Select by attribute and calculate area

Using the information from the post, let’s try to write a simple Python script to select features from a shapefile by attribute value, save matching features to another file, and calculate the area of the feature’s geometry. The script will take four parameters: the source shapefile, the path to the output shapefile, the name and value of the attribute to filter on.

# -*- coding: utf-8 -*-

#!/usr/bin/env python

import os
import sys
from osgeo import ogr

def usage():
    print "Usage: query_shape.py input_shapefile output_shapefile query_field match_value"
    sys.exit(0)

if __name__ == "__main__":
    args = sys.argv[1:]

    if len(args) != 4:
        usage()

    inPath = os.path.normpath(args[0])
    outPath = os.path.normpath(args[1])
    fieldName = args[2]
    fieldValue = args[3]

    # open source dataset
    inputData = ogr.Open(inPath, False)
    if inputData is None:
        print "ERROR: open failed"
        sys.exit(1)

    # get first layer (shapefile always contains only one layer)
    # and reset reading to start from the first feature
    inputLayer = inputData.GetLayer(0)
    inputLayer.ResetReading()

    # appply attribute filter (using equal contidion as an example)
    query = fieldName + '=' + fieldValue
    inputLayer.SetAttributeFilter(query)

    # prepare destination dataset for writing features
    # first, we need get driver
    driverName = "ESRI Shapefile"
    drv = ogr.GetDriverByName(driverName)
    if drv is None:
        print "%s driver not available." % driverName
        sys.exit(1)

    # creating destination dataset
    outputData = drv.CreateDataSource(outPath)
    if outputData is None:
        print "Creation of output file failed."
        sys.exit(1)

    # and destination layer in the destination dataset
    layerName = os.path.basename(outPath).strip(".shp")
    outputLayer = outputData.CreateLayer(layerName, None, ogr.wkbPolygon)
    if outputLayer is None:
        print "Layer creation failed."
        sys.exit(1)

    # start iterating over features in input layer
    inFeat = inputLayer.GetNextFeature()

    # and build feature definition (attribute table) for output layer
    inFeatDef = inputLayer.GetLayerDefn()
    # copy all existing fields
    for i in range(inFeatDef.GetFieldCount()):
        fieldDef = inFeatDef.GetFieldDefn(i)
        if outputLayer.CreateField (fieldDef) != 0:
            print "Can't create field %s" % fieldDef.GetNameRef()
            sys.exit(1)
    # and add a new field to store area of the geometry
    fieldDef = ogr.FieldDefn("area", ogr.OFTReal)
    fieldDef.SetWidth(12)
    fieldDef.SetPrecision(4)
    if outputLayer.CreateField (fieldDef) != 0:
        print "Can't create field %s" % fieldDef.GetNameRef()
        sys.exit(1)

    # actual loop over input features which match our attribute filter
    inFeatDef = inputLayer.GetLayerDefn()
    while inFeat is not None:
        outFeat = ogr.Feature(outputLayer.GetLayerDefn())
        # copy attributes and geometry of the input feature
        res = outFeat.SetFrom(inFeat)
        if res != 0:
            print "Can't copy feature"
            sys.exit(1)
        # calculate area of the geometry
        geom = inFeat.GetGeometryRef()
        area = geom.GetArea()
        # and save this value to a field
        outFeat.SetField("area", area)

        # write output feature to the destination layer
        if outputLayer.CreateFeature(outFeat) != 0:
            print "Failed to create feature in shapefile.\n"
            sys.exit(1)

        # cleanup temporary objects and go to the next feature
        outFeat.Destroy()
        inFeat = inputLayer.GetNextFeature()

    # closing input and output datasets
    inputData.Destroy()
    outputData.Destroy()

If we take the grassland.shp file from the QGIS sample dataset as the source layer and try to get a new file with objects whose F_CODE field equals EB010 using this script, then the corresponding command will look like this (don’t forget to run gdal16 before running the script):

python query_shape.py shape/veg.shp out.shp SUBTYPE_L 2

For those who want to learn more about raster processing with GDAL and Python, I suggest the following links as a starting point:

⮜ Prev
Next ⮞