New Landmark's in IT

An event marking an important stage of development or a turning point in history.

Thursday, May 18, 2006

GIS and Spatial Extensions with MySQL

MySQL 4.1 introduces spatial functionality in MySQL. This article describes some of the uses of spatial extensions in a relational database, how it can be implemented in a relational database, what features are present in MySQL and some simple examples.

Often the spatial functions are called geographic information system (GIS) functions, because GIS applications are the most obvious use-case for the spatial functionality. The spatial functions can be used both to provide a means to organize GIS data together with more traditional types of data and to represent non-GIS data with a spatial attribute.

GIS applications then and now
A GIS (geographic information system) stores and looks up objects which have one or more spatial attributes, such as size and position, and is used to process such objects. A simple example would be a system that stores addresses in a town using geographic coordinates. If this rather static data was then combined with other information, such as the location of a taxi-cab, then this data could be used to find the closest cab to a certain location. Many GIS applications are very specialized, in areas such as the mapping of real-world objects, map creation and meteorology.

Among the many attributes of a general GIS is that the objects may have multiple dimensions, and that complex shapes are supported. Typically two-dimensional objects with nearly unlimited complexity need to be supported.

Another attribute of a general GIS is that it supports combining objects and looking for different types of overlaps, such as looking for points that are contained within a given geometrical object. Lastly, a GIS provides a means of organizing objects in layers, i.e. REGIONs inside DISTRICTs.
A good example of an application where GIS is of importance and which also has real-world implications is in the area of meteorology. When it comes to producing weather maps we see every day on the TV screen, meteorology is a complex science. Very advanced and high-powered IT systems that are concerned with the actual number-crunching are combined with large databases. The issue is not that data is not available, the issue is what data to use and how, and how the different types of data are matched. The data from the number-crunching systems is supplied in the form of large files with multi-dimensional arrays of prognosticated data. The issue now is, how can a specific data object be matched with another, and provide a relevant prognosis of the weather in the coming five days.

In other words, we have data covering at least three dimensions (two-dimensional geography plus a height of a measurement for example, maybe combined with a time dimension).
What is obvious from the above discussion is that a database that only works on simple data types, such as INTEGER or DECIMAL, will just not be good enough for this type of application. The geographical shapes might be any type of polygon that then needs to be matched with some other object, i.e. what is the total area of two objects (which is not necessarily the sum of the areas of the two objects, as they might overlap).

The traditional solution to all this has been to use specialized database systems that use proprietary spatial indexing, and proprietary interfaces. These systems are complex though, and not only that, any GIS stores some non-GIS data (for example, a meteorological database might store a temperature measurement at a given geographical location, height and time as a DECIMAL). And for non-GIS data, SQL has become the standard. As applications for GIS have expanded beyond what goes on in labs and high-powered scientific institutions and are found in more traditional applications, such a taxicab scheduling systems, the need for a more standardized solution has exploded. Another factor here is the availability of inexpensive and easy to use GPS systems which may be combined with mobile communication systems, so that getting a real-time position from, a taxi-cab, a ship on the sea or a police car is no longer complex or expensive.
OpenGIS - Linking SQL with GIS functionality
The desire to combine SQL with GIS was initially driven by the developers of GIS applications desiring to expand into other areas, more than by SQL database vendors getting into the GIS market.

The driving organization behind opening up GIS to a broader market and making GIS technologies available everywhere the Open Geospatial Consortium (OGC). OGC has been around for about 10 years and is a non-profit organization that works on many areas of GIS and provides specifications of interoperability with many other standards. One such standard that OGC has provided is a specification for interoperability SQL databases.
This specification, in short, defines extensions to a SQL based relational database to allow for GIS objects and operations. There are four important areas here:
  • Data types. There needs to be data types to store the GIS information. This is best illustrated with an example, a POINT in a 2-dimensional system.
  • Operations. There must be additional operators to support the management of multi-dimensional objects, again, this is best illustrated with an example, a function that computes the AREA of a polygon of any shape.
  • The ability to input and output GIS data. To make systems interoperable, OGC has specified how contents of GIS objects are represented in binary and text format.
  • Indexing of spatial data. To use the different operators, some means of indexing of GIS data is needed, or in technical terms, spatial indexing.

In addition to the above, there is also a need for GIS metadata, and in some cases for using different coordinate systems. MySQL currently supports a planar coordinate system. The other major coordinate system in use is the geocentric one, i.e. a coordinate system on the Earth's surface, which is not yet supported by MySQL.

Non-GIS use for MySQL spatial extensions
There are uses for spatial extensions outside of the pure GIS world. Spatial data is not limited to maps or to the Earth's geography. The definition of spatial data and operations is wide, and even though MySQL follows the OpenGIS specification to a large extent, this specification does not limit the use of spatial data to GIS applications.

Any type of data that has more than one dimension can be treated as a spatial entity. The specification does not limit the axis on the coordinate system to any particular unit, such as inches or centimeters, and each axis does not necessarily have to represent the same unit: they are just numbers. Of course, certain functionality assumes that the X and Y axis have similar properties, such as the Area() function.

Fact is, a lot of real-world data has two dimensions if we look at it this way, in particular if we assume that one dimension is a date. For example we could store the maximum and minimum price for a stock as a POINT, where the X axis is the date and the Y axis is the price. We can then use the spatial functions to check for overlaps and intersections with other stocks.

The OpenGISĀ® Simple Features specifications for SQL
This specification is what is used as the ground for almost all implementations of GIS-functions within an SQL-based relational database. This standard defines the data types, operations, input and output format, functions and much more. This is the standard that is followed by almost all SQL databases with spatial extensions, including MySQL.

MySQL GIS Datatypes (abstract types in gray)
The data types start from the most generic at the top of the hierarchy, GEOMETRY, to a number of specific types, such as POINT and LINESTRING. Some of the data types are "abstract" in the sense that you cannot create objects of this type, such as the GEOMETRY type. This does not mean that you cannot have a column of the type GEOMETRY, just that you cannot have any value in that column of the type GEOMETRY, but you can have values of any other spatial type in that column. Among the abstract object types, only GEOMETRY can be used as a column type.

Among the functions that can be performed on spatial objects are functions that evaluate the proximity of objects, such as Overlap(), functions that combine two spatial objects to create another spatial object, such as Envelope() and functions that perform conversion to and from text and binary formats, such as AsText() and GeomFromText(). As any spatial object can be treated as a GEOMETRY object, any function that operate on a GEOMETRY can operate on any other type of spatial object, such as a POINT, which doesn't necessarily mean that this operation always is meaningful.

The specification includes the definition of two formats for external representation of spatial data, WKB (Well Known Binary) and WKT (Well Known Text). This allows data to be exported and imported in binary and text formats. To complement these formats, there are functions that convert between the WKB and WKT formats and all of the spatial data types.
Spatial Indexing
Spatial data may be indexed, just like other data in MySQL. To make this effective, a special type of indexing is used for spatial data called R-tree ("R" stands for Region) indexing. The involves organizing the minimum bounding rectangle (MBR) of the spatial objects in a tree structure that is then used by the different spatial functions. There are a few variations of R-tree indexing, MySQL uses R-trees with quadratic splitting, which is one of the standard methods of building an R-tree index.

An R-tree index is similar to a B+-tree in many ways, and organizes the indexed nodes in a hierarchy where the nodes in the index represent the MBR of the objects in the node. The leaf nodes in the index contain references to the row that contain the data, just like a B+-tree index.

Current Spatial features in MySQL
In the current implementation of spatial extensions in MySQL, parts of the OpenGIS Simple Features have been left out. Among these are:
  • Functions - Some functions are missing, among them some that deal with creating new spatial data values from existing ones, such as Union() (not to be confused with the SQL UNION operator) and Intersection() (not to be confused with the Intersect() spatial function).
  • GIS Metadata - The OpenGIS Simple Feature specification defines certain metadata tables for GIS data.
  • SRID - A Spatial Reference Identifier (SRID) defines the properties of the coordinate system. MySQL stores the SRID of any spatial objects and it can be extracted, but it is not used and the coordinate system is always supposed to be planar.

Except the above, all of the features of the OpenGIS Simple Features specification are included in MySQL 4.1 Spatial Extensions, support for the missing features will be added in future MySQL versions.