Working with vectors using GDAL and Python
15.04.2010 09:55 · GIS · gdal, python, howto
GDAL is a free library for working with raster and vector data, OGR is a part of the GDAL and is used to work with vector data. The command line utilities included in the library are widely used to perform a variety of tasks. Thanks to the developed API, you can work with OGR functions from many programming languages. This article is dedicated to using the OGR API in Python and is based on the GDAL Vector API Tutorial.
Preparation
I assume that all software is installed using the OSGeo4W installer. To use GDAL from Python, the appropriate language bindings are required. In the OSGeo4W the required package is called gdal-python
(for GDAL 1.5.x) or gdal16-python
(for GDAL 1.6.x). I recommend using the 1.6.x version.
The GDAL Python API consists of five core modules and five optional modules (exist for compatibility with older versions):
gdal
— bingings for GDAL library (raster data support)ogr
— bingings for OGR library (vector data support)osr
— projections supportgdal_array
— helper functionsgdalconst
— various constants used in other modules
They can be imported using the following code:
# main modules
from osgeo import gdal
from osgeo import ogr
from osgeo import osr
from osgeo import gdal_array
from osgeo import gdalconst
# compatibility modules. They will be removed in GDAL 2.0
import gdal
import ogr
import osr
import gdalnumeric
import gdalconst
When using GDAL version 1.5 and above, it is recommended to import “main” modules. If the code needs to be compatible with older GDAL versions, modules can be imported using the following approach:
try:
from osgeo import ogr
except ImportError:
import ogr
In most cases, it is enough to import only the ogr
module.
Opening a file
The ogr.Open()
function is used to open a vector dataset. OGR supports various datasource types; it can be a file, a database, a directory with files, or even a remote web service, depending on the driver. The function takes the full path to the dataset as the first argument and an optional constant describing the opening mode as the second argument. If the constant is omitted, read-only mode is assumed. On success, the function returns an OGRDataSource
object; otherwise, it returns None
.
import sys
from osgeo import ogr
ogrData = ogr.Open("/home/alex/test/points.shp", False)
# or without specifying open mode
# ogrData = ogr.Open("/home/alex/test/points.shp")
# checking that everything is fine
if ogrData is None:
print "ERROR: open failed"
sys.exit(1)
Working with layers
Since a dataset can contain multiple layers, it is necessary to specify a layer to process. The number of layers in a dataset can be retrieved using the GetLayerCount()
method of the OGRDataSource
object. The layer can be accessed either by its index (from the range 0…GetLayerCount() - 1
) or by its name:
print "Number of layers", ogrData.GetLayerCount()
# get layer by its index
layer = ogrData.GetLayer(0)
# alternatively
#layer = ogrData[0]
# or by its name
layer = ogrData.GetLayerByName("points")
# alternatively
#layer = ogrData["points"]
The example above is very simple. For instance, there is no check if the layer
variable is a valid object, but this does not mean that there should not be such a check. In case of an error (e.g. if the index value is out of range), the variable layer
will be set to None
.
After retrieving a layer, it is worth calling the ResetReading()
function to make sure that we are at the beginning of the layer. But if the layer has just been opened, as in our example, you can omit this call
if layer is None:
print "ERROR: can't access layer"
sys.exit(1)
layer.ResetReading()
Getting information and working with data
The number of features in the layer can be obtained using GetFeatureCount()
. The GetSpatialRef()
function returns the coordinate reference system of the layer. The extent of a layer can be obtained by calling GetExtent()
, and the number of fields by calling GetFieldCount()
. Be careful when getting the extent! Depending on the driver, the extent value may or may not depend on the spatial filter, so it is recommended to deactivate the spatial filter before calling GetExtent()
.
print "Feature count", layer.GetFeatureCount()
print "Layer SRC", layer.GetSpatialRef()
print "Layer extent", layer.GetExtent()
print "Field count", layer.GetFieldCount()
After getting a layer object, you can start reading its features. Usually, a loop and the function GetNextFeature()
are used to access features. This function returns the next feature of a layer, or None
if there are no features left. Before retrieving features, you can set a spatial or attribute filter using SetSpatialFilter()
/SetSpatialFilterRect()
or SetAttributeFilter()
respectively.
feat = layer.GetNextFeature()
while feat is not None:
# process feature
The following snippet can be used to get the fields of the feature and their values:
feat = layer.GetNextFeature()
featDef = layer.GetLayerDefn() # layer's schema (attribute table)
while feat is not None:
for i in range(featDef.GetFieldCount()): # loop over all fields
fieldDef = featDef.GetFieldDefn(i) # get field with index i
print "Field name", fieldDef.GetNameRef() # and output information
print "Field type", fieldDef.GetType()
print "Field value", feat.GetFieldAsString(i)
feat = layer.GetNextFeature() # get next feature
Fields can have different data types, for each data type, there is a corresponding getter function. For example, for the OFTInteger
data type — GetFieldAsInteger()
, for OFTDateTime
— GetFieldAsDateTime()
and so on. A readable representation of the value of any field can also be obtained using the GetFieldAsString()
function.
The GetGeometryRef()
function returns the geometry of a feature:
geom = feat.GetGeometryRef()
if geom is None:
print "Invalid geometry"
if geom.GetGeometryType() == ogr.wkbPoint:
print "%.3f, %.3f" % ( geom.GetX(), geom.GetY() )
else:
print "Non point geometry"
The obtained geometry can be exported to various formats using one of the functions ExportToGML()
, ExportToJson()
, ExportToKML()
, ExportToWkb()
, ExportToWkt()
. We can also perform various geoprocessing operations on the geometry: ConvexHull()
, Buffer()
, Difference()
…
When you have finished processing the data, it is necessary to free the resources and close the dataset
ogrData.Destroy()
Writing files
Let’s look at how to write data using the example of saving to a Shapefile. First of all, we need to specify which driver to use:
driverName = "ESRI Shapefile"
drv = ogr.GetDriverByName(driverName)
if drv is None:
print "%s driver not available." % driverName
Once the driver has been successfully initialised, we can create a dataset. The Shapefile driver allows us to create either a single Shapefile or a directory containing one or more Shapefiles. If you need to create a single file, be sure to specify the file name and extension. If necessary, to fine-tune the driver, we can pass creation options as a list of key-value pairs as a second parameter.
ogrData = drv.CreateDataSource("/home/alex/point_out.shp")
if ogrData is None:
print "Failed to create output file."
sys.exit(1)
Now we can create a new layer in a dataset. For each layer, we should specify its name and geometry type, while the coordinate reference system can be omitted
layer = ogrData.CreateLayer("point_out", None, ogr.wkbPoint)
if layer is None:
print "Failed to create layer."
sys.exit(1)
OK, the layer is created, let’s add attributes to describe the features of the layer. It is necessary to add fields (attributes) before adding features to the layer. For each field, we need to create an OGRField
object and set the necessary parameters (name, type, length, etc.)
fieldDef = ogr.FieldDefn("FileName", ogr.OFTString) # field name and data type
fieldDef.SetWidth(32) # field length
if layer.CreateField(fieldDef) != 0:
print "Failed to create field."
sys.exit(1)
# for fields, containing floating point values, we can also set precision
fieldDef = ogr.FieldDefn("Area", ogr.OFTReal) # field name and data type
fieldDef.SetWidth(18) # field length
fieldDef.SetPrecision(6) # precision (number of decimal places)
if layer.CreateField(fieldDef) != 0:
print "Failed to create field."
sys.exit(1)
To write a feature to the layer, we need to create an OGRFeature
object, set the attributes and geometry. The OGRFeature
object must be associated with a layer:
name = "/home/alex/test.tif" # attribute value
feat = ogr.Feature(layer.GetLayerDefn()) # instantiate OGRFeature
feat.SetField("FileName", name) # set attribute
pt = ogr.Geometry(ogr.wkbPoint) # create a point geometry
x = 1.0
y = 2.0
pt.SetPoint_2D(0, x, y)
feat.SetGeometry(pt) # assign geometry to a feature
And finally, write data to a file
if layer.CreateFeature(feat) != 0:
print "Failed to create feature in shapefile."
sys.exit(1)
# free memory and close file
feat.Destroy()
ogrData.Destroy()
Example. Select by attribute and calculate area
Using the information from the post, let’s try to write a simple Python script to select features from a shapefile by attribute value, save matching features to another file, and calculate the area of the feature’s geometry. The script will take four parameters: the source shapefile, the path to the output shapefile, the name and value of the attribute to filter on.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import sys
from osgeo import ogr
def usage():
print "Usage: query_shape.py input_shapefile output_shapefile query_field match_value"
sys.exit(0)
if __name__ == "__main__":
args = sys.argv[1:]
if len(args) != 4:
usage()
inPath = os.path.normpath(args[0])
outPath = os.path.normpath(args[1])
fieldName = args[2]
fieldValue = args[3]
# open source dataset
inputData = ogr.Open(inPath, False)
if inputData is None:
print "ERROR: open failed"
sys.exit(1)
# get first layer (shapefile always contains only one layer)
# and reset reading to start from the first feature
inputLayer = inputData.GetLayer(0)
inputLayer.ResetReading()
# appply attribute filter (using equal contidion as an example)
query = fieldName + '=' + fieldValue
inputLayer.SetAttributeFilter(query)
# prepare destination dataset for writing features
# first, we need get driver
driverName = "ESRI Shapefile"
drv = ogr.GetDriverByName(driverName)
if drv is None:
print "%s driver not available." % driverName
sys.exit(1)
# creating destination dataset
outputData = drv.CreateDataSource(outPath)
if outputData is None:
print "Creation of output file failed."
sys.exit(1)
# and destination layer in the destination dataset
layerName = os.path.basename(outPath).strip(".shp")
outputLayer = outputData.CreateLayer(layerName, None, ogr.wkbPolygon)
if outputLayer is None:
print "Layer creation failed."
sys.exit(1)
# start iterating over features in input layer
inFeat = inputLayer.GetNextFeature()
# and build feature definition (attribute table) for output layer
inFeatDef = inputLayer.GetLayerDefn()
# copy all existing fields
for i in range(inFeatDef.GetFieldCount()):
fieldDef = inFeatDef.GetFieldDefn(i)
if outputLayer.CreateField (fieldDef) != 0:
print "Can't create field %s" % fieldDef.GetNameRef()
sys.exit(1)
# and add a new field to store area of the geometry
fieldDef = ogr.FieldDefn("area", ogr.OFTReal)
fieldDef.SetWidth(12)
fieldDef.SetPrecision(4)
if outputLayer.CreateField (fieldDef) != 0:
print "Can't create field %s" % fieldDef.GetNameRef()
sys.exit(1)
# actual loop over input features which match our attribute filter
inFeatDef = inputLayer.GetLayerDefn()
while inFeat is not None:
outFeat = ogr.Feature(outputLayer.GetLayerDefn())
# copy attributes and geometry of the input feature
res = outFeat.SetFrom(inFeat)
if res != 0:
print "Can't copy feature"
sys.exit(1)
# calculate area of the geometry
geom = inFeat.GetGeometryRef()
area = geom.GetArea()
# and save this value to a field
outFeat.SetField("area", area)
# write output feature to the destination layer
if outputLayer.CreateFeature(outFeat) != 0:
print "Failed to create feature in shapefile.\n"
sys.exit(1)
# cleanup temporary objects and go to the next feature
outFeat.Destroy()
inFeat = inputLayer.GetNextFeature()
# closing input and output datasets
inputData.Destroy()
outputData.Destroy()
If we take the grassland.shp
file from the QGIS sample dataset as the source layer and try to get a new file with objects whose F_CODE
field equals EB010
using this script, then the corresponding command will look like this (don’t forget to run gdal16
before running the script):
python query_shape.py shapefiles/grassland.shp out.shp F_CODE EB010
Useful links
For those who want to learn more about raster processing with GDAL and Python, I suggest the following links as a starting point: