Open Street Map XML data

Posted on 4th May 2017


I want to process large amounts of XML data from Open Street Map (OSM), i.e. the extracts you can obtain from Geofabrik or Planet OSM. For smaller snapshots, do look at OSMnx.

My pure-Python project to read and process OSM data, currently a work in progress, can be found on GitHub, as "OSMDigest".

The XML format is documented on the OSM Wiki. There is no formal schema, but the data you can download seems to follow quite a constrained structure:

  • Start with an <osm> element giving the "version", "generator" and "timestamp".
  • Then a <bounds> element giving the rectangle in latitude/longitude coordinates which encloses the data.
  • Following this come elements of three types. (They seem to appear in the order given here, though I guess this is unimportant.) Each of these elements carries some common attributes: "id", giving the OSM id (which is unique within each type); the optional "user" and "uid", giving the user who last modified the object; the "timestamp" of the last modification; the edit "version" (which increases on each edit); and the "changeset" number. There is also a "visible" attribute, but in the downloaded data I have seen it is always either missing or "true".
      • <node> specifies a point on the planet, and has attributes "lon", "lat" for coordinates. May contain 0 or more <tag> sub-elements.
      • <way> specifies a path. Contains, in order, <nd> sub-elements referencing nodes, and 0 or more <tag>s.
      • <relation> specifies some logical relationship between other objects (e.g. the route of a bus, the area enclosing woodland, traffic instructions such as "no left turn here"). Contains <member> sub-elements referencing the other objects which make up the relationship, and 0 or more <tag>s.
  • Then we have three sub-elements which never contain further elements themselves:
      • <tag> which is a key/value pair, stored as attributes "k" and "v".
      • <nd> which references a node and contains just the attribute "ref".
      • <member> which contains attributes "ref", "type" and a (maybe empty, but always present) "role" describing what role the member has in the relationship.
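
As a rough illustration of the structure described above, here is a minimal sketch of streaming through such a file with the standard library's xml.etree.ElementTree.iterparse, so that the whole document never has to sit in memory. The function name parse_osm, the layout of the dictionaries it yields and the tiny sample document are all assumptions made for illustration, and not necessarily how OSMDigest does it.

```python
import io
import xml.etree.ElementTree as ET

# A tiny made-up document following the layout described above.
SAMPLE = b"""<osm version="0.6" generator="example" timestamp="2017-05-04T00:00:00Z">
  <bounds minlat="53.0" minlon="-1.6" maxlat="53.9" maxlon="-1.5"/>
  <node id="1" lat="53.80" lon="-1.55" version="1" changeset="10"/>
  <node id="2" lat="53.81" lon="-1.54" version="2" changeset="11">
    <tag k="amenity" v="cafe"/>
  </node>
  <way id="100" version="1" changeset="12">
    <nd ref="1"/>
    <nd ref="2"/>
    <tag k="highway" v="footway"/>
  </way>
  <relation id="1000" version="1" changeset="13">
    <member type="way" ref="100" role=""/>
    <tag k="type" v="route"/>
  </relation>
</osm>
"""

TOP_LEVEL = {"node", "way", "relation"}


def parse_osm(source):
    """Yield one dict per <node>, <way> or <relation> found in `source`.

    `source` is a filename or a binary file object.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of <osm>
    for event, elem in context:
        if event != "end" or elem.tag not in TOP_LEVEL:
            continue
        item = {
            "type": elem.tag,
            "id": int(elem.get("id")),
            "tags": {t.get("k"): t.get("v") for t in elem.findall("tag")},
        }
        if elem.tag == "node":
            item["lon"] = float(elem.get("lon"))
            item["lat"] = float(elem.get("lat"))
        elif elem.tag == "way":
            # <nd> order matters: it is the order of points along the path.
            item["nodes"] = [int(nd.get("ref")) for nd in elem.findall("nd")]
        else:
            item["members"] = [
                (m.get("type"), int(m.get("ref")), m.get("role"))
                for m in elem.findall("member")
            ]
        yield item
        # Drop the elements processed so far so memory use stays bounded.
        root.clear()


if __name__ == "__main__":
    for item in parse_osm(io.BytesIO(SAMPLE)):
        print(item["type"], item["id"], item["tags"])
```

Clearing the root after each top-level element is what keeps this usable on large extracts; without it, ElementTree would quietly keep the whole tree in memory.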

The meaning of ways and relations is defined by the tags present. For more details see:

  • Way article. Things rapidly get complicated. A way which starts and ends at the same node is a "closed" way, and closed ways are often, but not always, treated as Areas. For example, a closed way tagged "highway=footway" is assumed to be a circular pathway, unless we also have the tag "area=yes", in which case it is a pedestrian plaza. But "landuse=forest" is always an "area", even without the "area=yes" tag.
  • Relation article and types of relation.
  • Possible keys and values can be found here: Key descriptions by group and Map features.
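
To make the closed-way rule concrete, here is a small sketch of how one might guess whether a way is an area. The function names and the AREA_BY_DEFAULT set are hypothetical, and that set only contains the one example mentioned above; the real rules on the OSM Wiki cover many more tags.

```python
# Tags which imply an area even without "area=yes".
# Illustrative only: just the single example from the text above.
AREA_BY_DEFAULT = {("landuse", "forest")}


def is_closed(node_refs):
    """A way is closed if it starts and ends at the same node."""
    return len(node_refs) > 1 and node_refs[0] == node_refs[-1]


def looks_like_area(node_refs, tags):
    """Guess whether a way (list of node ids, dict of tags) is an area."""
    if not is_closed(node_refs):
        return False
    if tags.get("area") == "yes":
        return True
    return any(pair in AREA_BY_DEFAULT for pair in tags.items())
```

For instance, looks_like_area([1, 2, 3, 1], {"highway": "footway"}) treats the way as a circular path rather than an area, while adding "area": "yes" to the tags turns it into a plaza.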

