A proper tutorial on the theory behind geography, cartography, and geodesy, and on how to use GIS for fictional maps, has been on my to-do list for a while, but I thought I'd make a bit of a stopgap with this thread. It moves kind of fast, leaves out a lot, and simplifies things in places, so it's not necessarily true, but it's close enough to get by with.


Geographic Information Systems (GIS) are software for storing, managing, processing, analyzing, and presenting data that incorporates a geographic component. Or put another way, a GIS deals with things and where they are. It contrasts with CAD (Computer-Aided Design), which deals more with the design of individual things or sets of things, but doesn't pay much attention to where they are. In very rough terms, you would use a GIS to deal with a country or a city, but a CAD to deal with a house or a refrigerator. Both share properties with graphics software, particularly vector graphics software, but differ in a number of key ways. GIS is also quite diverse in the forms it takes: there are desktop GIS applications, GIS servers of various forms, and various specialized tools.

The big name in desktop GIS is ESRI ArcGIS (http://www.esri.com/software/arcgis/index.html). This is to GIS what Adobe Photoshop is to raster graphics. It comes in a number of license levels of varying cost, and there is a wide assortment of supporting extensions available. The most basic license level without any extensions costs 1,500 USD. I doubt there's anyone here willing to pay for ArcGIS just for fantasy mapping, and anyone who already has access to it doesn't need this thread, so I'll move on from ArcGIS.

Thankfully, there's a healthy Free/Open Source GIS community which has produced a great selection of GIS software. The Open Source Geospatial Foundation (OSGeo, http://www.osgeo.org/) is a major hub and supports many of these projects. They also provide an easy installer for an assortment of FOSS GIS software under Windows called OSGeo4W (http://trac.osgeo.org/osgeo4w/). This not only installs software, but also manages it and can help keep it up to date (its installer is derived from the Cygwin setup program, and works much like a package manager such as APT from Debian GNU/Linux).

Probably the most advanced GIS available as open source is GRASS. It's also a bit old and has an interface that most Windows users will probably find incomprehensible. MacOS users will probably die of shock. If you don't like The GIMP, you really won't like GRASS. This really isn't for newbies, and you probably won't need the extra capabilities it offers.

Next up is QuantumGIS (QGIS, http://www.qgis.org/). This looks a lot more like what MacOS/Windows users are used to and resembles ArcGIS in a number of ways. This is my primary desktop GIS. It has an extensive library of plugins written in C++ and Python, and is quite capable of doing the majority of things a GIS needs to do.

There are a number of other FOSS GISes, though I am not that familiar with them: uDig, gvSIG, and OpenJump. Google Earth also has some very rudimentary GIS capabilities, and there are a number of Web-based GISes, often fairly specialized.

Now, to store GIS data, there need to be special GIS file formats. These tend to be related to and in some cases are derived from graphics formats. Like graphics formats, there are both raster and vector GIS formats.

Vector data is usually handled using the model of a relational database. Each class of things to be handled is a "feature set", equivalent to a relational database "table". If you aren't familiar with relational databases, a table/feature set is much like a simple spreadsheet, with each row being a feature (a 'thing') with a number of fields (the columns). One of the fields can be thought of as holding a shape. Almost all GIS feature sets use a single unique feature id ('fid') attribute as a key. The geometry of the shapes is far more strict than in vector graphics. Most GIS formats only support straight segments, and restrict self-intersection, particularly of polygons. Generally a feature set will only allow one kind of shape: points, line-strings (several segments joined together in a sequence), or polygons. There are also 'multipoints', 'multilines', and 'multipolygons', which some software treats as equivalent to their singleton counterparts and other software treats as separate types.
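To make the table analogy a bit more concrete, here's a rough sketch in plain Python of a feature set as a keyed collection of rows. The class names (Feature, FeatureSet), the "rivers" layer, and its attributes are all made up for illustration; this isn't how any particular GIS stores things internally, just the general idea.

```python
from dataclasses import dataclass, field

@dataclass
class Feature:
    fid: int                  # unique feature id, the key for this row
    geometry: list            # list of (x, y) vertices joined by straight segments
    attributes: dict = field(default_factory=dict)   # the non-spatial columns

@dataclass
class FeatureSet:
    name: str
    geometry_type: str        # one kind of shape per set: "Point", "LineString", "Polygon"
    features: dict = field(default_factory=dict)     # fid -> Feature

    def add(self, geometry, **attributes):
        # Simplistic fid allocation; fine as long as nothing is ever deleted.
        fid = len(self.features) + 1
        self.features[fid] = Feature(fid, geometry, attributes)
        return fid

# A hypothetical line-string layer with one river in it.
rivers = FeatureSet("rivers", "LineString")
fid = rivers.add([(0.0, 0.0), (1.0, 2.0), (3.0, 2.5)], river_name="Silverrun", navigable=True)
print(fid, rivers.features[fid].attributes)
```

Each call to add() is one new row in the "spreadsheet": the geometry is the shape column, and the keyword arguments are the other columns.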

The dominant vector format is the somewhat confusingly named ESRI Shapefile, which is actually several files. An individual "shapefile" has at least three files, and possibly more. The first is the shape data file (.shp), the second is an index file (.shx), and the third is a dBase IV data file (.dbf) containing the non-spatial attributes. Most will also have at least a projection file (.prj) describing the coordinate system of the shapes. All these files need to be in the same directory and have the same base name, differing only in their extension.
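Since a "shapefile" is really a bundle of files, a quick check that the bundle is complete can save some head-scratching. Here's a small sketch using only the Python standard library; the function names and the example file name are my own invention, and it assumes lowercase extensions, which matters on case-sensitive file systems.

```python
from pathlib import Path

REQUIRED = (".shp", ".shx", ".dbf")   # shapes, index, attributes
OPTIONAL = (".prj",)                  # coordinate system description

def shapefile_parts(shp_path):
    """Map each expected extension to whether that sidecar file exists."""
    shp = Path(shp_path)
    # with_suffix swaps only the final extension, so the base name is kept.
    return {ext: shp.with_suffix(ext).exists() for ext in REQUIRED + OPTIONAL}

def is_complete(shp_path):
    """True when the three mandatory files are all present."""
    parts = shapefile_parts(shp_path)
    return all(parts[ext] for ext in REQUIRED)

if __name__ == "__main__":
    # "coastline.shp" is a placeholder name, not a real data set.
    print(shapefile_parts("coastline.shp"))
```

If .prj comes back False, the shapes will usually still load, but the software has to guess (or ask you) what coordinate system they are in.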

The other big vector format is the XML-based open standard GML. This was designed to be stupendously flexible and to integrate into existing XML data. As a result, it's almost impossible to implement in a truly general way, and so it is usually restricted to capabilities and a data model similar to that of a collection of shapefiles (one GML file can hold multiple feature sets). In theory, an XML schema file is needed to explain how a particular GML file works, but in practice software often just guesses by looking at the file itself.

Other formats include GeoRSS which is RSS with geographic data added, GeoJSON which is encoded using the JSON interchange format, KML used by Google Maps/Earth, and plain spreadsheet type files containing coordinates.
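Of these, GeoJSON is probably the easiest to write by hand or from a script. As a rough sketch, here's how a couple of point features could be built with Python's standard json module. The town names, populations, and the point_feature helper are invented for the example; the structure (a FeatureCollection of Features, with coordinates in longitude, latitude order) follows the GeoJSON format.

```python
import json

def point_feature(fid, lon, lat, **properties):
    """Build one GeoJSON Feature holding a Point. Note the lon, lat order."""
    return {
        "type": "Feature",
        "id": fid,
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,      # the non-spatial attributes
    }

# Two made-up towns for a fictional map.
collection = {
    "type": "FeatureCollection",
    "features": [
        point_feature(1, 12.5, 41.9, name="Port Varn", population=12000),
        point_feature(2, -3.7, 40.4, name="Highmoor", population=800),
    ],
}

text = json.dumps(collection, indent=2)
print(text)
```

Saving that text to a file with a .geojson extension should give something a desktop GIS can load as a point layer.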

Raster data is more like familiar raster graphics formats. In fact, the most common format is GeoTIFF, which can sometimes be opened in normal graphics software as an ordinary TIFF. The differences are that GIS raster data isn't restricted to small integers, the data doesn't necessarily represent colour, there is extra metadata indicating the coordinate system of the file, and the formats are often designed to handle very large files efficiently. In the particular case that the data stored is elevation, the file is called a Digital Elevation Model (DEM), which is similar to the greyscale "heightfields" used in graphics. Another common data set is satellite images, which often use a different set of "colour" bands: Landsat 7, for instance, records a "panchromatic" (greyscale) image over the visible bands, plus red, green, and blue, as well as near, mid, and thermal infrared; SPOT5, on the other hand, has panchromatic, green, red, and near infrared (no blue, though the panchromatic band includes blue).
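The "extra metadata" is what ties each cell of the grid to a place. As a minimal sketch, assuming a north-up grid with square cells, the link between row/column and map coordinates looks something like this. The elevation values, origin, and cell size are all invented, and real formats store this information in their own way.

```python
# A tiny made-up DEM: floating point elevations, including one below zero,
# which an ordinary 8-bit greyscale image could not hold.
dem = [
    [1520.5, 1490.0, 1432.2],
    [1388.1, 1301.7, 1250.0],
    [ 990.4,  870.3,  -12.5],
]

# Hypothetical georeferencing: map coordinates of the top-left corner of the
# top-left cell, and the width/height of one cell in map units.
origin_x, origin_y = 10.0, 50.0
cell_size = 0.5

def cell_center(row, col):
    """Map coordinates of the middle of a cell (y decreases as rows go down)."""
    x = origin_x + (col + 0.5) * cell_size
    y = origin_y - (row + 0.5) * cell_size
    return x, y

def elevation_at(x, y):
    """Value of the cell containing map point (x, y), or None if outside."""
    col = int((x - origin_x) // cell_size)
    row = int((origin_y - y) // cell_size)
    if 0 <= row < len(dem) and 0 <= col < len(dem[0]):
        return dem[row][col]
    return None

print(cell_center(0, 0))
print(elevation_at(11.3, 48.6))
```

Without the origin and cell size, the grid is just a picture; with them, every cell has a location that other layers can line up against.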

Now, I've mentioned projections and coordinate systems, and I need to explain those briefly. The Earth is not flat; we've known this since at least 240 BC thanks to Eratosthenes and his well experiment (Christopher Columbus just thought it was a lot smaller than everyone else did, and he was wrong). Maps, on the other hand, are flat. If you've ever tried to flatten out an orange peel or a burst balloon, you know that you need to do some tearing or stretching to make it work. Projections are a way to do that stretching (and sometimes tearing) to make a flat map. If you look at it in terms of numbers, rather than shapes, then it's often called a coordinate system instead (sometimes a "projected coordinate system", to distinguish it from the "geographic coordinate systems" which I'll explain next). The stretching means that no matter what projection you use, there is going to be distortion, but you can control what kind of distortion and where it is by choosing the right projection.
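To put some numbers on the stretching, here's a small sketch of two well-known projections on a perfect sphere: equirectangular (plate carrée) and spherical Mercator. The radius is an approximate mean Earth radius, and the function names are just my own choices for the example.

```python
import math

R = 6371.0   # sphere radius in km (approximate mean Earth radius)

def equirectangular(lon_deg, lat_deg):
    """Plate carree: longitude and latitude used directly as x and y."""
    return R * math.radians(lon_deg), R * math.radians(lat_deg)

def mercator(lon_deg, lat_deg):
    """Spherical Mercator: stretches north-south to match the east-west stretch."""
    lat = math.radians(lat_deg)
    x = R * math.radians(lon_deg)
    y = R * math.log(math.tan(math.pi / 4 + lat / 2))
    return x, y

def east_west_stretch(lat_deg):
    """How much wider things are drawn than they really are at this latitude.

    Both projections draw every parallel as long as the equator, while the
    real parallel shrinks with cos(latitude).
    """
    return 1.0 / math.cos(math.radians(lat_deg))

for lat in (0, 30, 60, 80):
    print(lat, round(east_west_stretch(lat), 2),
          round(equirectangular(0, lat)[1]), round(mercator(0, lat)[1]))
```

Both projections stretch east-west by the same amount. Mercator also stretches north-south to match, which keeps local shapes right at the cost of blowing up areas near the poles. Equirectangular doesn't, so shapes get squashed instead. That's the "what kind of distortion and where" trade-off in miniature.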

Underneath projections, you have to worry about the actual shape of the Earth, which is formally called the "geoid". This is best described as a fat, slightly lumpy pear shape. It is close to being a sphere, and very close to being a flattened sphere (an oblate spheroid), but it bulges a bit "north heavy" and has lumps. This lumpy shape is not very easy to work with, so the oblate spheroid is usually used instead, but our measurements of it vary both with our ability to measure it and as the Earth itself changes. Also, by shifting and distorting that spheroid, we can match it up to the geoid more closely for certain areas. This leads to many, many different spheroids we can use, and these are called datums (not "data", even though that's normally the correct plural for "datum"). Sometimes they are also called "geographic coordinate systems", as opposed to "projected coordinate systems". In the case of another planet (unless it's exactly the same size as Earth), you'll have to define your own datum, which is easier than it sounds.
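As a rough illustration of why defining your own datum is easier than it sounds: for a simple case, a spheroid boils down to two numbers, the equatorial radius and the polar radius. The sketch below computes the flattening and builds a PROJ-style definition string using the +proj=longlat, +a, and +b parameters. The WGS84 figures are the standard Earth values; the fictional planet's radii are made up, and whether your software wants the definition in exactly this form is something to check against its own documentation.

```python
def flattening(a, b):
    """Flattening of a spheroid with equatorial radius a and polar radius b."""
    return (a - b) / a

def proj_longlat(a, b):
    """PROJ-style geographic coordinate system on a custom spheroid (metres)."""
    return f"+proj=longlat +a={a:.1f} +b={b:.1f} +no_defs"

# Earth, using the WGS84 spheroid, for comparison.
a_earth, b_earth = 6378137.0, 6356752.314245
print(1 / flattening(a_earth, b_earth))      # inverse flattening, about 298.257

# A hypothetical smaller world; these radii are invented for the example.
a_world, b_world = 5200000.0, 5185000.0
print(proj_longlat(a_world, b_world))
```

For a perfectly spherical world, just use the same value for both radii.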

So, now you should either be completely lost, or you should have just enough understanding to be able to actually do something in a GIS, and have some idea of what that something is. In my next post I'll try to explain how to do some simple things using QuantumGIS, including loading a shapefile, creating and editing a shapefile, adjusting projections, and defining new projections for a custom world.