To protect privacy, we restrict precise map coordinates in our data mart. What’s a map analyst to do?

To protect customer privacy, organizations sometimes do not allow in their data marts elements that can be used to identify specific customers, elements such as addresses. This is particularly true if their customer base includes consumers (as opposed to businesses). I guess some employees are tempted to search through address records looking for names of famous people.

Recently some people have suggested that even map coordinates should be barred from data marts. With reverse geocoders now being publicly available, its become very easy to derive an address from a map coordinate. Therefore, the thinking goes, providing map coordinates to the organization is akin to providing protected addresses. Yet if you omit map coordinates from your data mart, you deprive analysts of valuable data. Without map coordinates, heat maps no longer are possible. Proximity analyses are not feasible. Even trading area analyses can become difficult.

As an alternative, geographical keys such as ZIP+4s and Census Block IDs are acceptable in data marts, because they relate to more than one household. Therefore, these keys provide a level of privacy protection.

A safe, convenient, and cheap alternative to address specific map coordinates (and ZIP+4 coordinates) are the centroids of Census Blocks (at least in the USA). Per the Census Bureau, Census Blocks are defined as:

“…statistical areas bounded by visible features, such as streets, roads, streams, and railroad tracks, and by nonvisible boundaries, such as selected property lines and city, township, school district, and county limits and short line-of-sight extensions of streets and roads. Generally, census blocks are small in area; for example, a block in a city bounded on all sides by streets. [emphasis added] Census blocks in suburban and rural areas may be large, irregular, and bounded by a variety of features, such as roads, streams, and transmission lines. In remote areas, census blocks may encompass hundreds of square miles. Census blocks cover the entire territory of the United States, Puerto Rico, and the Island Areas. Census blocks nest within all other tabulated census geographic entities and are the basis for all tabulated data.”

Census Blocks have properties favorable to heat mapping and proximity studies. While the boundaries of Census Blocks themselves are not drawn with regard to population, the parent geography, the Block Group, typically has between 600 and 3000 people. Therefore, Census Blocks can be quite small. Typically, the higher the population density in a region, the smaller the Census Block area. Of course, smaller Census Block sizes make for tighter centroid spacing. This makes it less likely that the use of Census Block centroids instead of address coordinates could substantially distort heat maps (especially if you weight the centroids by say customer counts).

The examples below show the Census Blocks from a couple of residential areas in the Austin, Texas region. Each image depicts the same size area, roughly one mile across. One is near the city center (shown on the right), and one is in a more suburban-like setting to the west of the city center (on the left). The green dots represent the centroids; the green lines the Census Block boundaries.

Low Density High Density

Here’s a wider area image for the Austin region, this one 20 miles across. In this image, the Census Blocks are color coded by population density – the lighter the shading (towards yellow), the higher the density. Downtown and the University of Texas lie in the lower right quadrant.

Population Density

To get the Census Block centroids into your database, the simplest means is to compute them from the Census Block boundaries. The boundaries are available as a public domain product from the Census Bureau. Since they are available in “shape file” format, it’s a straightforward process to load the boundaries into your Microsoft SQL Server database. (Read up on some methods of importing the data from this post.)

If you include a computed column for the centroid of the Census Block polygon in your Census Block table, the centroid will be readily available with no extra computation. The SQL code for the table might look like so:

CREATE TABLE [FederalData].[Census2010BlockGeog]
[CensusCode] [varchar](15) NOT NULL,

[CentroidLongitude]  AS (CONVERT([decimal](11,6),round([GEOGRAPHY]::STGeomFromText([GEOMETRY]::STGeomFromText([GeogBoundary].[STAsText](),(4326)).STCentroid().STAsText(),(4326)).Long,(6)))),

[CentroidLatitude]  AS (CONVERT([decimal](11,6),round([GEOGRAPHY]::STGeomFromText([GEOMETRY]::STGeomFromText([GeogBoundary].[STAsText](),(4326)).STCentroid().STAsText(),(4326)).Lat,(6)))),

[GeogBoundary] [geography] NOT NULL,

[GeogCentroid]  AS ([GEOGRAPHY]::STGeomFromText([GEOMETRY]::STGeomFromText([GeogBoundary].[STAsText](),(4326)).STCentroid().STAsText(),(4326))) PERSISTED

Chances are, if you already geocode your customer addresses, then you’ve also already added to them Census Block IDs (which is the column labeled CensusCode in my script). Commercial geocoders such as those from Pitney Bowes and Melissa Data append Census Block IDs to geocoded addresses.

Most likely, some (hopefully minor) fraction of your addresses in your database geocode to something worse than the street address level. For example, if a specific address is not recognized by the geocoder, then the map coordinates assigned to it likely would be at the ZIP+4 or even ZIP Code level. In such cases, the coordinates already are sufficiently vague to prevent reverse-geocoding. Therefore, you can safely allow these less accurate map coordinates into your data mart.

Furthermore, these lower quality geocodes typically will not be assigned to a Census Block. This fact makes it simple to automate the choice of map coordinates to include in your data mart. In the following code snippet, a customer record would be assigned to a Census Block centroid if it’s geocoded to the street address level, and to the inaccurate geocode itself if that geocode is worse than the street address level (and therefore, no Census Block was assigned).

COALESCE(b.CentroidLongitude, c.AddressLongitude, NULL) as Longitude,
COALESCE(b.CentroidLatitude, c.AddressLatitude, NULL) as Latitude,
FROM [MyCustomers] c
LEFT OUTER JOIN [CENSUS BLOCKS] b on c.[CensusCode] = b.[CensusCode]

If your privacy rules forbid map coordinates from being included in your data mart, all if is not lost. Centroids of Census Block provide a meaningful, readily available, and inexpensive alternative to street address level geocodes, at least in the USA.

Daniel Brasuk
Topics Covered: