Microsoft SQL Server Spatial made my dog fat!

Well, not really. But the data certainly would suggest it has. The “data” in this case, is a spatial object with a unique characteristic. Yes, the spatial object happens to look like a dog. (As a matter of fact, I do plan to keep my day job!) But the spatial object also happens to have a hidden flaw.

I created the spatial object using MapInfo Pro, a popular geographic information system product. The “dog” object actually is a spatial element on row in a table in MapInfo Pro. From MapInfo Pro, I exported the data up to a table in SQL Server 2012.

My Dog

After the export, some characteristics of the spatial object had changed. The chart in Figure 2 compares the characteristics of the dog, before and after the export; i.e., in MapInfo Pro before the export, and in SQL Server 2012 after the export. As you can see, the dog grew in size in SQL Server 2012 vs MapInfo Pro. It’s spherical area increased 125 percent. Yet, the perimeter length stayed nearly constant.

It also added a few extra nodes. Plus the structure changed, from consisting of 1 polygon to now consisting of 2. Finally, the centroid shifted over 4 kilometers. How’s that?

Dog Statistics

If you are familiar with geographic information systems, then you are familiar with the concept of “valid” geometries. In particular, a polygon object cannot consist of components that loop back over each other-a ”self-intersection” as the problem is commonly called. My dog has three self-intersections around its neck, as originally drawn.

In MapInfo Pro, running the “Check Regions” command fingers the self intersections. In Figure 3, the red push pins denote the self intersections.

Dog with self-intersecting neck

After transferring the table to SQL Server 2012, the self intersections were gone. This explains the change in characteristics. How did that happen?

When you transfer spatial data from MapInfo Pro to SQL Server 2012, the ODBC driver used by MapInfo Pro applies the MakeValid() spatial method to the spatial data. Make Valid added the extra nodes; hence the extra nodes seen in the SQL Server table. It then used the extra nodes to reconfigure the line segments so that they formed two non-overlapping polygons, one for the head and one for the body; hence the change from 1 polygon t o 2. (The extra nodes are difficult to see, because the original and the new nodes overlap perfectly on each other.)

So that explains the extra nodes and polygon. How is it that the area measurement is so different, especially if the perimeter length is virtually unchanged. The answer is simple. With the original self intersecting polygon, the area calculation was mathematically wrong. It’s a side affect of self intersections, and one of the reasons why SQL Server throws an error if you apply the STArea() method to an invalid record.

With most commercial (spatial) data products, you’ll find them to be clean of self intersections and other problems. Same is true with public domain data sources from governmental bodies such as the United States Geological Survey. Whether or not this is true with internally generated spatial data depends a lot on your organizations practices and procedures. Rather than assume, be safe and set up a mechanism that allows you to compare the spatial data before and after the load.

Daniel Brasuk