Why Addresses Need “Map Location IDs” in a database or data warehouses.
Let’s say we need unique identifiers for map locations. Off hand, you might guess that a normal address makes for a perfect identifier. After all, with geocoding applications, we can easily translate an address to a place on the map. So it would seem like a perfect identifier. However, as I’m sure you have guessed by now, there are lots of use cases for which an address does not make a great identifier of a map location, at least from a database or data mart perspective, or even from an analytical perspective.
The operative phrase here is Map Location ID. Fundamentally, an address is a Postal unit, a place where mail is delivered. It’s a string of characters, created by humans, and therefore subject to lots of different ways to spell and present it. Commonly, the address is synonymous with a single structure, like a house or a small commercial building. On the other hand – and where Map Location IDs become important – things like office and apartment buildings have lots of different addresses, one for each unit of mail delivery (but not necessarily one for each suite). Further, for lots of use cases, the address also represents a collection of structures, all with the same address or each with its own, and that have something in common. For example, big hospital conglomerates might consist of dozens of buildings on the same parcel of land. Yet they all share a single mail address since the Post Office delivers the mail to the hospital’s mail room.
For lots of businesses and operations, the physical location and structures of its customers are fundamental to the way it does business. There’s even a term for this kind of operation: a facilities-based operation. Some examples: Certain kinds of insurance companies; utilities; real estate developers and managers; telecommunications providers; and commercial pest control providers. You might say of these kinds of businesses, “they sell to a business but provide the service to a building.” These kinds of operations can benefit from attaching Map Location IDs to addresses and other kinds of locations that it serves, for the purpose of better organizing them, either in a database application and/or data mart.
Some use cases:
- A large collection of warehouses, each with its own address. As an analyst I would want to review pest infestations and treatments for the complex as a whole, so I would want to query all of the structures in the complex. With a common ID associated with all the addresses and structures in the facility, I would not need to know in advance each individual address.
- Hospital complexes frequently occupy many buildings. As a telecommunications provider or a utility, I would need to know the location of each building on the property. A geocode of its address would only give me the location of the centroid of the land parcel and would tell me nothing about the individual structures there. With the right ID structure, I could avoid this confusion.
- Office buildings and shopping centers might contain hundreds of tenants. Moreover, every tenant has its own mailing address. If I’m laying fiber to serve the building, that fiber likely serves the entire building. Some tenants would buy the provider’s products; some would not. In any case, as tenants come and go, the fiber and its capabilities remain in place. A proper ID structure would let me maintain the connection between the fiber availability in the building and incoming tenants.
- Entertainment and shopping districts increasingly are relocating from sixties and seventies era shopping malls to sections of cities. Often, these areas are equivalent to neighborhoods, but not always. Sometimes new “neighborhoods” pop up out of thin air or out of a PR’s rep’s imagination, and hence aren’t really mapped. (“NoMa, Washington, DC” anyone!) In any case, the district boils down to a collection of distinct addresses and buildings in what would appear from the air to be a somewhat arbitrarily defined area of town. With a properly structured identifier, I could easily locate all the properties, businesses and/or people in the area without necessarily having to have use mapping technology.
Components of the Map Location ID.
There’s a few ways to do this. Of course the best way depends on your needs. I’ll stick with a method that we like. It involves a four part identifier.
Administrative Unit: This is the top level of the identifier. Usually, its something political or administrative. The assumption is there’s something significant about this organizational unit and you always or at least very frequently need to filter on it. Plus, it has to be a level for which you always know the value. For example, we could use the State or Province at the top level because, lets face it, if you don’t know the State or Province, then you are truly lost. On the other hand, the top level could be a region unique to your organization, such as “the west region.”
Collection ID: This component is used to group together related addresses. My use case of the entertainment district is one example. Another could be a simple strip shopping center in which all of the store fronts have been assigned different address numbers. If you were to geocode the addresses, they all would be assigned different map coordinates. Yet they are all in the same structure. As still an another example, consider a medical or other kind of office park, in which each building has its own address, and therefore map coordinates. In both cases, all the addresses effectively represent a single location.
We default the value to zero, to mean the location is not part of a collection. Most run-of-the-mill commercial building and single family detached houses would have a collection ID of zero. Of course, for residential addresses in the suburbs, you could identify a subdivision of tract homes by a collection number, but that’s only if sub- divisions have a particular significance (like those with home owners associations, perhaps). For most flexibility, you really want collection numbers to be unique across all administrative units, to allow collections to be moved around.
￼￼Address ID: This component is the core of the map location ID. It corresponds to a geocode-able address and/or physical location. It should always be unique across all collections. This way, addresses can be rearranged among collections without conflict. Same goes for the being unique across administrative units.
The simplest way to administer the ID values is to maintain a key database that hands out new values as needed. Think of a simple table of a single column of type integer, with an auto-incrementing number like you get from the identity function in Microsoft SQL Server.
Structure ID: Structures are individual buildings on the same parcel. There are lots of examples in which a mailing address corresponds to multiple structures. For example, universities and modern hospital complexes often have multiple buildings, yet all are identified by a single address. By using a structure number, you have the means of distinguishing individual buildings in the complex, even assigning distinct map coordinates to each. We always default the structure value to 0. This entry corresponds to the address of the complex, which probably is its mailing address. Its this address that you would pass to a geocoder to obtain map coordinates. Structures numbered past zero correspond to individual buildings at this address.
Internally in a database, we store each component in its own column. That of course allows us to index and select at any combination of levels. For presentation of the identifier, we like the dot structure:
Administrative Unit . Collection ID . Address ID . Structure ID
Here’s an example implementation, in which the Collection ID shown represents an entertainment and commercial district. Addresses outside of the district, and indeed not part of any district, have a collection ID of zero. A couple of the addresses have two structures, the main one and a secondary one (which in one case is a parking deck).
You might wonder why can we not use the map coordinates themselves (i.e, the longitudes and latitudes) as the unique map location identifier, or at least as the Address ID? Since they are numeric, you might think they make the perfect key.
￼￼￼￼￼￼￼￼However, they fail as a identifier for many reasons. First, the actual values are somewhat arbitrary. Recall that a map coordinate represents a single point in or adjacent to a parcel of land. Ask five different geocoders for the map coordinates of an address and you will get five different coordinates for the same parcel, since there is no one true way to assign it. Further, the map data behind geocoders continuously improves. Ask the same geocoder for the coordinates of an address over five years, and you will get five different coordinates. Also, some unique addresses share the same map coordinates (like when different floors of the same building happen to have been assigned different street numbers). Finally, there’s the reality that in any geocoding operation significant quantities of addresses will not get assigned to accurate geocodes (because of new construction, poor data, or because its an unoccupied, non-mailable structure, for example).
Added benefits of a common Map Location ID.
Map-like query capabilities without the need for GIS tools. Geographic information system (GIS) technology lets you easily query clusters of locations off a map. Most people do not find the technology easy to use, plus it can be expensive to distribute widely. When clusters of locations are important to you, properly structured map location keys let a business intelligence analyst, for example, mimic map queries without having the need for sophisticated map query tools. Plus, with the use of keys, you always get the same selection. Plus, if you do employ mapping technology, the IDs give you a means of visualizing related addresses and locations.
Geocode the same address only once each refresh cycle. Over the years, we have run into some large organizations in which different operations each maintain their own database of locations. Often, there’s a huge amount of overlap between the locations in each database. Worse, each operation independently validates, standardizes, and geocodes the addresses. Since there’s no guarantee that they all use the same products, schedules, and geocoding rules, there’s a strong likelihood that each operation produces different map data. By implementing and distributing common map location identifiers, you can avoid the duplication of effort, plus maintain consistency across the organization.
The hard part: how to initialize the Map Location IDs on your addresses and facilities.
It’s actually not as difficult as you might assume. We’ve done it enough times to know. I’ll have to cover the technique in another post, but I’ll give you a hint. To start, you will need a sophisticated, yet off the shelf, address standardization and geocoding product. This will help you automate a large part of the initial effort (as well as long term maintenance). You will also need management guidance on prioritization. Because building out the structure IDs, and to some extent the collections, requires some human effort, you will want to rank order the collections and addresses by importance. Finally, another hint is that you need to budget for data maintenance, because stuff does in fact move.
We love maps and how they expose spatial relationships in data. Sometimes, though, the map itself gets in the way. That’s especially true if you are a facilities based operation, or if you repeatedly work with clusters of locations that have something in common besides the ZIP Code. The use of a well defined and maintained Map Location ID can expose important spatial relationships in a database even if you don’t have access to map.