Chapter 6 Topology and Geocoding

Written by Paul Pickell

Frequently, we need spatial data to behave and relate in specific and predictable ways. Many types of analyses may expect spatial data to be represented and interact in a standard form. In this chapter, we will extend our knowledge of data models using topology, which unlocks many advanced spatial analyses. We will look at a specific example of an analysis that requires topology, geocoding, which will be a convenient segue into network analysis discussed in the following chapter.

Learning Objectives

  1. Understand the role of topology in governing data behaviour and data organization
  2. Recognize some examples and uses of 2D and 3D topologies
  3. Understand the role of bounding a set of points from triangulation and convex hulls
  4. Synthesize the process of geocoding
  5. Practice geocoding addresses and reverse geocoding addresses to other coordinate systems

Key Terms

Vertex, Node, Pseudonode, Dangle, Planar Topology, Non-Planar Topology, Geocoding, Adjacency, Overlap, Connect, Inside, Reverse Geocoding, Singlepart, Multipart, Holes, Delaunay Triangulation, Thiessen Polygons, Voronoi Diagram, Centroid, Convex Hull, Convex Alpha Hull, Multipatch

6.1 Topology

Topology describes the relationships of spatial data. This is a very broad definition that encompasses the wide range of possible arrangements of spatial data in practice. If we drill down into this concept, topology is really what allows us undertake specific types of analysis that requires or expects spatial data to behave in a certain way. If you think about the feature geometries that we have at our disposal, then there are no fewer than nine combinations of how these geometries can interact as illustrated in Figure 6.1 below.

Grid showing all the combinations of how point, line and polygon geometries can interact. Pickell, CC-BY-SA-4.0.

Figure 6.1: Grid showing all the combinations of how point, line and polygon geometries can interact. Pickell, CC-BY-SA-4.0.

It is important to recognize that there may be cases where we “expect” that a given combination of features will conform to a specific interaction. For example, the provinces and territories of Canada are typically represented as polygons that share adjacent boundaries. That is, the adjacent boundaries shared between any two provinces or territories cannot logically overlap as this representation (model) would contravene the legal definitions of the provinces and territories. In another case, a human technician may erroneously digitize a road that crosses another road without indicating that the two roads share an intersection, which could have consequences for how traffic flow can be modeled between the two roads (i.e., intersection with traffic light versus overpass). These are both examples of situations where topology is needed. Topology applies logic to define how features are expected to relate to other features in order to conform to knowledge systems like legal definitions of land and traffic flow. In short, topology ensures data integrity for other types of analysis.

6.2 Planar vs. Non-Planar Topology

In the context of topology, planar refers to the concept that all vertices of feature vector geometry are mapped onto the same plane. So in a planar world-view, all lines and polygons share coincident vertices. For example, if two polygons overlap, then the overlapping area forms a new polygon with a boundary of vertices defined by the union of the two other polygons. Also, if two lines overlap, then the two lines are divided into four new segments and a new vertex is formed at the intersection. In other words, planar topology does not allow polygons or lines to lay “underneath” or “on top” of another line or polygon and feature geometries must always be distinct.

On the other hand, non-planar topology is the concept that vertices of feature vector geometry can be mapped to different planes. It is important to emphasize here that when we are talking about planes that we are not referring to projected coordinate systems. It is generally assumed that any two spatial data layers containing feature geometries are interacting within the same projected coordinate system. Non-planar topology allows for other knowledge systems to be represented in spatial data. The case where a pipeline runs underneath a river or a territory that was traditionally used by several Indigenous peoples (Figure 6.2) are examples of valid non-planar topology.

Figure 6.2: Non-planar topology of 36 indigenous territories overlapping Vancouver Island, British Columbia. Data from Native Land (n.d.), CC0.

6.3 Implementing Planar Topology

Implementing planar topology involves defining specific rules for how features should relate to one another given some analytical context. This process also requires that the spatial data are housed a relational database or data model that supports topology. In other words, topology is enforced only by data models that support topological rules. When a topological rule is violated, the relational database identifies the contravening features and displays them on the map and in the attribute table. Then, it is up to an analyst to decide how the error should be corrected. For example, some errors like intersecting lines can automatically be split at the intersection while overlapping polygons might need to be manually edited to reflect the correct adjacency. Thus, the process of applying topology is first to work within a data model that supports topology, then choose the topological rules that reinforce a particular knowledge system, and finally to inspect and decide how to deal with any contraventions. Since planar topology is only supported by certain data models, and some data models are proprietary to certain software, the exact topological rules that can be implemented in a GIS are mostly dependent on the software that you are using. Instead of examining a specific GIS software package, we will discuss the “fundamental” planar topological relationships that are common across nearly all implementations of topology. (If you want to know more about how topology is implemented within specific data models, skip ahead to the “Data models supporting planar topology” section.)

So far, we have seen that there are six ways to combine feature geometries (points, lines, and polygons). We can extend this understanding to include at least six different ways that they can relate to one another: adjacent; overlap; intersect; connect; cover; and inside. Some of these relationships can be modeled between two different spatial layers (e.g., two point layers) or within a single spatial layer. In the following sections, we will look at different planar topological rules that apply both between and within feature geometries.

6.4 Adjacency and Overlap

There are times when we need to ensure that two polygons are adjacent to one another by sharing a common edge. If two polygons are not adjacent to one another, then a gap, known as a sliver, exists between them or they must overlap. Consider the case where we are mapping land covers. If we have a formal scheme that describes all possible land covers, then we expect that a map of land covers will have perfect adjacency between all polygons so that there are no areas that are not mapped (i.e., slivers) and that no area has multiple, overlapping land covers. Since lines are also 2-dimensional, lines can overlap other lines. Depending on the context, a topological rule may be needed to promote or prevent this relationship. For example, if you are modeling bus routes, then one road might support several different routes.

Figure 6.3: Simplified boundary of Canada (red) and the United States (blue), centred at the Rainy River boundary between Minnesota and Ontario. Without preserving topology, the output results in illogical overlaps in some places and slivers in other places. Pickell, CC-BY-SA-4.0. Data from Natural Resources Canada and licensed under the Open Government Licence - Canada.

Some examples of adjacency and overlap topological rules:

  • Polygons within the same layer must not have gaps
  • Polygons within the same layer must not overlap
  • Polygons must not overlap other polygons
  • Lines must not overlap other lines
  • Lines must not self-overlap

6.5 Intersect and Connect

As we have seen from Chapter 3, lines are often used to represent phenomena that flow, so intersection and connection are important concepts for these representations. Important to understanding how connection and intersection work in planar topology, we need to understand that lines are comprised of a set of vertices and nodes. A node is simply the terminating vertex in a set of vertices for a line. For example, suppose the line segment \(A\) has a set of vertices, \([[1,0],[1,3],[1,5]]\). Then the nodes for \(A\) are \([1,0]\) and \([1,5]\) (Figure 6.4). Since nodes define the end points of a line segment, they are key to enforcing connection rules. We will look at network analysis in more detail in the next chapter. For now, let us consider two different networks that can help us conceptualize some fundamental line topology using nodes.

Lines are always comprised of two nodes. Line A shown here has nodes at [1,0] and [1,5]. Pickell, CC-BY-SA-4.0.

Figure 6.4: Lines are always comprised of two nodes. Line A shown here has nodes at [1,0] and [1,5]. Pickell, CC-BY-SA-4.0.

A network of streams and rivers is based on the hydrological knowledge system that explains how water moves over a terrain surface. In both theory and practice, we know that water flows from higher elevations to lower elevations with limited exceptions. Thus, we expect that streams will connect with other streams and continue to flow towards some outlet such as an ocean. Connection refers to the fact that the endpoint node of one stream will fall somewhere on another stream segment. Where two line segments come together, it is possible for one segment \(A\) to “undershoot” the other segment \(B\), resulting in the end node of segment \(A\) appropriately named a dangle (Figure 6.5) and a loss of connection.

A dangle forms when a line (B) does not connect to another line (A). Pickell, CC-BY-SA-4.0.

Figure 6.5: A dangle forms when a line (B) does not connect to another line (A). Pickell, CC-BY-SA-4.0.

Dangles are the opposite case to intersections, which occur when two line segments cross each other. With planar topology, intersections must be modeled with a shared node representing the intersection location. For example, suppose line segment \(B\) has a set of vertices, \([[0,1],[2,1],[4,1]]\). If line segments \(A\) (defined above) and \(B\) are mapped together with non-planar topology, then they will intersect at \([1,2]\), which is not a vertex represented in either segment (Figure 6.6).

Line A mapped with Line B in non-planar topology. Pickell, CC-BY-SA-4.0.

Figure 6.6: Line A mapped with Line B in non-planar topology. Pickell, CC-BY-SA-4.0.

Thus, the intersection of \(A\) and \(B\) with planar topology would yield four new segments: \(C=[[0,2],[1,2]]\), \(D=[[1,2],[1,3],[1,5]]\), \(E=[[1,2],[2,2],[4,2]]\), and \(F=[[1,0],[1,2]]\). Figure 6.7 illustrates how all four of these new segments share the same node \([1,2]\) at the intersection of \(A\) and \(B\).

Line A mapped with Line B in planar topology yields segments C, D, E, and F. All segments share (1,2) as a node. Pickell, CC-BY-SA-4.0.

Figure 6.7: Line A mapped with Line B in planar topology yields segments C, D, E, and F. All segments share (1,2) as a node. Pickell, CC-BY-SA-4.0.

As well, pseudonodes can occur when a node does not actually terminate a line segment at a junction, for example, between two streams or roads. In other words, a pseudonode is a node that is shared by two lines. Figure 6.8 illustrates a pseudonode occurring at \([3,5]\).

Lines A and B share a pseudonode at [3,5], indicated in red. Pickell, CC-BY-SA-4.0.

Figure 6.8: Lines A and B share a pseudonode at [3,5], indicated in red. Pickell, CC-BY-SA-4.0.

Some examples of intersection and connection topological rules:

  • Lines must not intersect other lines
  • Lines must intersect other lines
  • Lines must not self-intersect
  • Lines within a same layer must not self-intersect
  • Lines must not have dangles

6.6 Coincident and Disjoint

Point features can be either coincident or disjoint with other point features. Point features that need to be disjoint may be representing trees, mountain peaks, or any similar type of feature that would be expected to be discrete in geographic space. There are also instances where we might need one set of point features to be coincident with another such as field plots that are centered using a tree or other spatially-discrete feature on the landscape.

Some examples of coincident and disjoint topological rules:

  • Points must be disjoint with other points
  • Points must by coincident with other points

6.7 Cover

Cover refers to planar topology where a feature lays on or within another feature. For example, dams represented as point features must be covered by a line representing a river (Figure 6.9). Similarly, lines representing rivers must be covered by polygons representing watersheds. As well, property parcel polygons must be covered by the municipal or regional tax authority polygon.

Topological relationship between dam (point) covered by a river (line), which is covered by a watershed (polygon). Pickell, CC-BY-SA-4.0.

Figure 6.9: Topological relationship between dam (point) covered by a river (line), which is covered by a watershed (polygon). Pickell, CC-BY-SA-4.0.

Some examples of cover topological rules:

  • Point must be covered by a line
  • Point must be covered by a polygon
  • Line must be covered by a polygon
  • Polygon must be covered by a polygon

6.7.1 Multipart geometry

Sometimes we need to represent several points, lines or polygons as a collection, which is known as multipart geometry. Multipart geometry allow us to represent several disjoint and non-adjacent geometries as a single feature. In this way, we can assign attribute values to the collection of features rather than each geometry individually. The territorial boundary of Canada is a good example of an instance where a multipart geometry can be useful because all of the contiguous land and non-contiguous land (i.e., islands) can be represented and associated with a single feature in the attribute table. However, if the distinction of features is important, such as identifying the names of islands in the Haida Gwaii archipelago, then a singlepart geometry should be used (Figure 6.10).