Blog

Here I post my thoughts, QGIS tips and tricks, updates on my QGIS-related work, etc.

Again about non-ASCII characters in matplotlib

04.08.2009 08:07 ·  Notes  ·  matplotlib, python, tips

A bit more about Cyrillic and other non-ASCII characters in matplotlib. I decided to create a more comprehensive example to show how to output Cyrillic (or any other non-ASCII) characters in different parts of the plot. Actually, there is nothing complex here; you just need to find the time and read the manual, which is quite good, by the way. But as not everyone likes to read manuals… Anyway, here is the code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from pylab import *
import matplotlib.pyplot as plt
import matplotlib.font_manager as fm

plt.rcParams["text.usetex"] = False
# a font that contains the required Cyrillic glyphs
fp = fm.FontProperties(fname="/home/alex/devel/mpl_cyr/CharisSILR.ttf")
# arbitrary text drawn directly on the plot with that font
plt.text(0.5, 0.5, u"довільний текст", fontproperties=fp)

# create some data to use for the plot
x = arange(0.0, 3.0, 0.01)
y1 = sin(2 * pi * x)
y2 = cos(2 * pi * x)

# the main axes is subplot(111) by default
plot(x, y1, label=u"Синусоїда")
plot(x, y2, label=u"Косинусоїда")
xlabel(u"Підпис вісі X", fontproperties=fp)
ylabel(u"Підпис вісі Y", fontproperties=fp)
title(u"Назва графіка", fontproperties=fp)
legend(prop=fp)
show()

Cyrillic and other non-ASCII characters in matplotlib

23.07.2009 17:49 ·  Notes  ·  matplotlib, python, tips

There is a wonderful Python library - matplotlib. It is a plotting library that supports a wide range of plot types and is designed to emulate MATLAB commands and behaviour. The library is easy to learn; to draw a simple plot, you literally need two commands. I have used this library in my Statist plugin for QGIS and in another GIS project. To make installation of matplotlib more convenient for inexperienced users, I recently packaged it for OSGeo4W.

Sometimes, I needed to display non-ASCII (namely Cyrillic) characters on matplotlib plots. And there was a problem: such text was drawn as empty squares. Reading manuals, googling, and asking on the mailing list led to two solutions that I would like to share with you.

Method 1: the almighty TeX

matplotlib can use LaTeX to display both plain text and mathematical symbols. Moreover, a limited subset of TeX and the corresponding parser, fonts, and renderer are built into the library, so for this subset, you do not even need to have a full TeX installation. Unfortunately, this TeX subset only contains mathematical characters and letters of the Greek alphabet. In all other cases, an external LaTeX installation is required. To use LaTeX for text rendering, we should set the option

text.usetex: True

in the rc-file. This can be done either globally, by editing the rc-file once, or as needed at runtime. Below is an example of run-time initialisation:

# -*- coding: utf-8 -*-
from matplotlib import rc

rc("font", {"family": "serif"})
rc("text", usetex=True)
rc("text.latex", unicode=True)
rc("text.latex", preamble="usepackage[utf8]{inputenc}")
rc("text.latex", preamble="usepackage[russian]{babel}")

Now we can output Cyrillic (or any other non-ASCII) characters:

xlabel(u"Вісь Х: довжина, см")

The big disadvantage of this method is that the user needs to have LaTeX installed.

Method 2: unicode + fonts

matplotlib uses its own font rendering engine with full Unicode support. Therefore, we can explicitly specify a font that contains the required character sets and render the text using that font. Here is a small example:

# -*- coding: utf-8 -*-

import matplotlib.pyplot as plt
import matplotlib.font_manager as fm

plt.rcParams["text.usetex"] = False
# any font that contains the required glyphs will do
fp = fm.FontProperties(fname="/home/alex/.fonts/academy.ttf")
plt.text(0.5, 0.5, u"кириличний текст", fontproperties=fp)
plt.show()

There is also a disadvantage: the required font may not be available on the target system or may live in a different directory, so the font has to be shipped in the same directory as your program. But in my opinion, this is much better than having LaTeX as a dependency.
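
If you ship the font next to the script, a minimal sketch like the one below keeps the path portable (the font file name here is just an example):

# -*- coding: utf-8 -*-
import os
import matplotlib.pyplot as plt
import matplotlib.font_manager as fm

# build the font path relative to the script itself, so it works
# no matter where the program is installed (font name is illustrative)
font_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "CharisSILR.ttf")
fp = fm.FontProperties(fname=font_path)

plt.text(0.5, 0.5, u"кириличний текст", fontproperties=fp)
plt.show()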

Personally, I chose the latter option.

Vacations and GIS

08.07.2009 15:32 ·  Notes  ·  spatialite

It’s been a week since I went on “last year’s” holiday. Yes, I managed to get those unused 19 days. But that doesn’t stop my bosses from pestering me with phone calls and even dragging me into the office a few times. But all in all, it is very good.

I have more free time, which I spend not only on holidays but also on interesting and necessary things like plugins for QGIS. I’ve already received some feedback on Statist, and I’m improving it. Also, I have an idea for another plugin, but I don’t know if it will be implemented yet.

I am also involved in a GIS project, or rather, its continuation. I came across a nasty bug in the SpatiaLite provider in QGIS. At least, it seems to me that it is a bug. This has caused some difficulties, and I have to look for workarounds…

By the way, a few days ago, Alessandro Furieri (the SpatiaLite author) announced the release of SpatiaLite 2.3.1 along with two new projects.

In my opinion, SQLite-based spatial databases are a real alternative to shapefiles. Although shapefiles are the de-facto standard in GIS today, they are already outdated and do not meet the ever-increasing requirements.

Statist plugin for QGIS

02.07.2009 15:20 ·  GIS  ·  qgis, plugins, statist

I have released my plugin for QGIS — Statist.

It is used to obtain statistical information on the specified field of the vector layer attribute table. Both numeric (integer, real, and date) and text (string) fields are supported. The plugin can work on the whole attribute table as well as on selected features. In addition to displaying basic statistical values, Statist also displays a frequency distribution histogram of the field values.

Statist plugin dialog

To use Statist, it is necessary to have matplotlib installed (it can be installed via OSGeo4W or downloaded from the project page), as it is used to display the frequency distribution histogram.
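
For those curious what the plugin reports, here is a rough illustration (not the plugin code) of the kind of statistics and histogram it produces from a field's values, using plain Python and matplotlib with made-up data:

# -*- coding: utf-8 -*-
# rough illustration only: made-up field values, not the Statist source code
import matplotlib.pyplot as plt

values = [12.4, 15.1, 9.8, 11.2, 14.7, 10.5, 13.3, 12.9, 16.0, 11.8]

count = len(values)
mean = sum(values) / count
variance = sum((v - mean) ** 2 for v in values) / count
print("count: %d" % count)
print("min:   %.2f" % min(values))
print("max:   %.2f" % max(values))
print("mean:  %.2f" % mean)
print("stdev: %.2f" % (variance ** 0.5))

# frequency distribution histogram, similar to what Statist displays
plt.hist(values, bins=5)
plt.title(u"Frequency distribution")
plt.show()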

The plugin is available from my QGIS plugins repository. Comments, feature requests, and bug reports are welcome. It is best to post them in the bugtracker, but email is fine too.

If someone does not need the frequency distribution histogram and wants to avoid the extra dependency, they can use the “Basic Statistics” tool from fTools (now included in core). After my patch, it has the same functionality as Statist except for the frequency distribution histogram.

PostGIS vs ArcSDE: raster load speed test (summary)

30.06.2009 10:14 ·  GIS  ·  postgis, wktraster, arcsde

Let’s summarise what was said in the posts about PostGIS and ArcSDE. The image loading speed tests showed that PostGIS was much slower than ArcSDE.

However, do not forget that WKTRaster is still at an early stage of development (version 0.1.6 at the time of testing), whereas ArcSDE has been around for years. Also, rasters are usually loaded into the database once, so the load speed test is of little practical value. It would be much more interesting to compare the performance of these products when processing rasters. Unfortunately, this is not possible for a number of reasons.

PostGIS vs ArcSDE: raster load speed test (part 3)

30.06.2009 10:14 ·  GIS  ·  postgis, wktraster, arcsde

ArcSDE was tested on the same machine using the same dataset (see PostGIS test). The discs were formatted before the test, and the system was restored from the snapshot. As I could not get ArcSDE to work with my self-compiled PostgreSQL 8.3.7, I tested it on its bundled PostgreSQL 8.3.0 + ArcGIS 9.3 SP1 (build 1850) + ArcSDE 9.3.

As in the PostGIS test, the database cluster was located on a separate 80 GB disc. For the purity of the experiment, the original image in MrSID format was converted to ERDAS IMAGINE using ArcGIS tools. Loading and all other operations were done using ArcGIS Python scripts, not ArcCatalog.

The conversion to the ERDAS IMAGINE format has been carried out with the following script:

import arcgisscripting
gp = arcgisscripting.create()
gp.toolbox = "management"
# convert the source MrSID image to ERDAS IMAGINE
gp.CopyRaster_management("N-38-45.sid", "N-38-45.img", "#", "0", "#", "NONE", "NONE", "#")

It took 1202 s (~20 min), and the resulting file has the same size as the one produced by gdal_translate, i.e. ~4.7 GB. Now let's build the pyramids:

import arcgisscripting
gp = arcgisscripting.create()
gp.toolbox = "management"
# build pyramids (overviews) for the converted image
gp.BuildPyramids_management("N-38-45.img")

Building the pyramids took 451 s (~7 min). Before loading the raster into the database, we need to create a RasterDataset for it:

import arcgisscripting
gp = arcgisscripting.create()
gp.toolbox = "management"
# create an empty raster dataset in the geodatabase
gp.CreateRasterDataset_management("Database Connections/raster.sde", "N_38_45", "14.25", "8_BIT_UNSIGNED", "#", 3, "#", "PYRAMIDS -1 CUBIC", "128 128", "LZ77", "#")

This operation took exactly 2 seconds :-). The time is so small that it can be neglected. Now we can load the raster into the created dataset:

import arcgisscripting
gp = arcgisscripting.create()
gp.toolbox = "management"
gp.workspace = r"d:\raster"
# mosaic the image into the raster dataset created in the previous step
gp.Mosaic_management("N-38-45.img", "Database Connections/raster.sde/N_38_45", "LAST", "FIRST", "0", "#", "NONE", "0", "NONE")

The raster loading process was quite fast, taking only 1337 s (~22 min). After loading the image, the database cluster grew from 42,495,444 bytes (~40.5 MB) to 6,934,776,988 bytes (~6.45 GB).

PostGIS vs ArcSDE: raster load speed test (part 2)

30.06.2009 09:25 ·  GIS  ·  postgis, wktraster, arcsde

Let's continue the testing saga.

Before describing the test and its results, a few words about the test platform.

The test itself was carried out according to Mateusz’s instructions. When the SQL representation of the raster was created, the resulting SQL file was written to the second disc, and before loading it into the database, the file was moved to the data partition of the first disc. After each stage of testing, all disks were defragmented using OS tools, and the machine was rebooted.

This Landsat scene was used as the test image. Quite detailed information about the image can be found in the file N-38-45-45.met located next to it; an abridged gdalinfo output is given below:

Driver: MrSID/Multi-resolution Seamless Image Database (MrSID)
Files: d:\wktraster_test\N-38-45.sid
Size is 42962, 39235
Origin = (193892.625000000000000,5543955.125000000000000)
Pixel Size = (14.250000000000000,-14.250000000000000)
IMAGE__INPUT_FILE_SIZE=5086292652.000000
IMAGE__TARGET_COMPRESSION_RATIO=29.999998
IMAGE__BITS_PER_SAMPLE=8
IMAGE__COMPRESSION_WEIGHT=2.000000
IMAGE__COMPRESSION_GAMMA=1.000000
IMAGE__COMPRESSION_BLOCK_SIZE=4096
Band 1 Block=1024x128 Type=Byte, ColorInterp=Red
Minimum=0.000, Maximum=242.000, Mean=101.223, StdDev=35.192
Overviews: 21481x19618, 10741x9809, 5371x4905, 2686x2453, 1343x1227, 672x614, 336x307, 168x154
Band 2 Block=1024x128 Type=Byte, ColorInterp=Green
Minimum=0.000, Maximum=242.000, Mean=126.427, StdDev=33.576
Overviews: 21481x19618, 10741x9809, 5371x4905, 2686x2453, 1343x1227, 672x614, 336x307, 168x154
Band 3 Block=1024x128 Type=Byte, ColorInterp=Blue
Minimum=0.000, Maximum=252.000, Mean=105.329, StdDev=29.924
Overviews: 21481x19618, 10741x9809, 5371x4905, 2686x2453, 1343x1227, 672x614, 336x307, 168x154
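
The same basic metadata can also be read from Python through the GDAL bindings; here is a minimal sketch, assuming the osgeo package and the MrSID driver are available:

# -*- coding: utf-8 -*-
# minimal sketch: read basic metadata of the test image via the GDAL Python bindings
# (assumes the osgeo bindings and the MrSID driver are installed)
from osgeo import gdal

ds = gdal.Open(r"d:\wktraster_test\N-38-45.sid")
print("Driver: %s" % ds.GetDriver().LongName)
print("Size is %d, %d" % (ds.RasterXSize, ds.RasterYSize))

geotransform = ds.GetGeoTransform()
print("Origin = (%.3f, %.3f)" % (geotransform[0], geotransform[3]))
print("Pixel Size = (%.3f, %.3f)" % (geotransform[1], geotransform[5]))

for i in range(ds.RasterCount):
    band = ds.GetRasterBand(i + 1)
    print("Band %d overviews: %d" % (i + 1, band.GetOverviewCount()))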

Before starting the test, a snapshot of the system was taken, and all disks were defragmented using OS tools. The source data (i.e., the raster) is located on the second partition of the 320 GB disc.

Unfortunately, neither GDAL nor WKTRaster fully supports the MrSID format yet, so the raster was converted to the ERDAS IMAGINE format (*.img) using the following command:

gdal_translate.exe -of HFA N-38-45.sid N-38-45.img

The conversion took 1160 s (~19 min), and the resulting file occupied ~4.7 GB of disk space. This file was used in all subsequent operations. Now that the image is in a supported format, we need to build pyramids (overviews), which are not present in our file:

gdaladdo -r average N-38-45.img 2 4 8 16 32 64 128

This command took 1057 s (~17 min) to execute. Using gdalinfo, we can make sure that the overviews have been created successfully:

Band 1 Block=64x64 Type=Byte, ColorInterp=Undefined
Description = Layer_1
Overviews: 21481x19618, 10741x9809, 5371x4905, 2686x2453, 1343x1227, 672x614, 336x307
Metadata:
LAYER_TYPE=athematic
Band 2 Block=64x64 Type=Byte, ColorInterp=Undefined
Description = Layer_2
Overviews: 21481x19618, 10741x9809, 5371x4905, 2686x2453, 1343x1227, 672x614, 336x307
Metadata:
LAYER_TYPE=athematic
Band 3 Block=64x64 Type=Byte, ColorInterp=Undefined
Description = Layer_3
Overviews: 21481x19618, 10741x9809, 5371x4905, 2686x2453, 1343x1227, 672x614, 336x307
Metadata:
LAYER_TYPE=athematic

Now we can prepare the image for loading into the database. The preparation consists of using the Python script gdal2wktraster.py (shipped with WKTRaster) to generate an SQL file with the image dump.

gdal2wktraster.py -r N-38-45.img -t N_38_45_img_rb_128 -o N_38_45_img_rb_128.sql --index --srid 32638 -k -m 128x128 -O -M -v

This is quite a long process, so make sure you have some tea/coffee. On my test platform, it took 2904 s (~48 min). At the end, the script generates a report of the work done, showing how many tables will be created in the database after the script is loaded and how many tiles (blocks) will be in each table:

------------------------------------------------------------
Summary of GDAL to WKT Raster processing:
------------------------------------------------------------
Number of processed raster files: 1
List of generated tables (number of tiles):
1 N_38_45_img (103152)
2 o_2_N_38_45_img (25872)
3 o_4_N_38_45_img (6468)
4 o_8_N_38_45_img (1638)
5 o_16_N_38_45_img (420)
6 o_32_N_38_45_img (110)
7 o_64_N_38_45_img (30)
8 o_128_N_38_45_img (9)

The gdal2wktraster.py script produced the file N_38_45_img_rb_128.sql, which took up — hold on tight :-) — 13,564,596,564 bytes (~12.6 GB). There you go.

Since all the necessary preparations were made at the PostGIS configuration stage, the only remaining step is to load this monstrous script into the database:

psql -f N_38_45_img_rb_128.sql -U postgres -d postgis

Loading the SQL script took 1952 s (32 min), and the database cluster grew from 38,748,476 bytes (~36.9 MB) to 7,283,127,972 bytes (~6.78 GB).

PostGIS testing is over; let's move on to ArcSDE.

PostGIS vs ArcSDE: raster load speed test (part 1)

22.06.2009 09:02 ·  GIS  ·  postgis, wktraster, arcsde

Before moving on to the actual testing, I will describe the process of configuring PostgreSQL + PostGIS + WKTRaster. Since PostgreSQL was built from source, all the work normally performed by the installer (creating users, initialising the database cluster, etc.) has to be done manually.

The compilation process was described in the previous post. Here I will assume that you are using Windows XP Pro; PostgreSQL, GEOS, Proj, PostGIS and WKTRaster are already compiled, and everything is in the c:\postgres directory.

Let’s go!


matplotlib for OSGeo4W

15.06.2009 08:24 ·  GIS  ·  osgeo4w, matplotlib

There is a Python library called matplotlib. It is very convenient and functional, allowing you to easily create and display various graphs and charts. It generates high-quality images and supports adding captions, including various special characters. I used this library when I needed to output a histogram in my plugin.

That’s why I decided to create a package for the OSGeo4W installer. I read the instructions, experimented on a virtual machine, and here is the result: matplotlib is now available via the OSGeo4W network installer.

PostGIS vs ArcSDE: raster load speed test (preparation)

09.06.2009 12:03 ·  GIS  ·  postgis, wktraster, arcsde

Recently, PostGIS received support for raster data and the ability to load images directly into the database through the WKTRaster extension. This was one of the areas where PostGIS previously fell short of ArcSDE.

As soon as raster support became available, it was natural to want to compare PostGIS and ArcSDE. When I saw a forum topic about it, I immediately volunteered to help.

Today I spent most of the day preparing: I downloaded source archives, read installation instructions, and compiled all the necessary components. There were a few pitfalls: first, PostgreSQL 8.3.7 refused to compile, saying that utf8_and_shift_jis_2004.o could not be built. After some investigation, I found that the following files were missing:

../src/backend/utils/mb/conversion_procs/utf8_and_shift_jis_2004/utf8_and_shift_jis_2004.c
../src/backend/utils/mb/conversion_procs/euc_jis_2004_and_shift_jis_2004/euc_jis_2004_and_shift_jis_2004.c

More precisely, they do exist, just not in the src directory where the compiler looks for them, but in a completely different one. After moving these files to the correct directory, the compilation completed successfully. I described the compilation process in detail in the previous post.
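
For reference, here is a minimal sketch of that workaround in Python; the source location is hypothetical, so adjust it to wherever the files actually ended up in your tree:

# -*- coding: utf-8 -*-
# minimal sketch of the workaround: copy the stray conversion sources to the
# directories where the build expects them (source paths are hypothetical)
import shutil

files = [
    ("src/misplaced/utf8_and_shift_jis_2004.c",
     "src/backend/utils/mb/conversion_procs/utf8_and_shift_jis_2004/utf8_and_shift_jis_2004.c"),
    ("src/misplaced/euc_jis_2004_and_shift_jis_2004.c",
     "src/backend/utils/mb/conversion_procs/euc_jis_2004_and_shift_jis_2004/euc_jis_2004_and_shift_jis_2004.c"),
]

for src, dst in files:
    shutil.copy(src, dst)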

All other components compiled without any issues; the only trouble was that the archive with the SVN version of PostGIS turned out to be “broken”, so I had to re-download it.

The test data set has been downloaded, and all the components have been built. Now I am waiting for the test instructions.

To be continued…