I'm looking for direction on a design quandry that's challenging our database designers. I have searched the archives and Google and yielded nothing conclusive, though I suspect this is a simple/novice type of issue.
We have a table that stores event locations. It is the called the Location table. The table requires that the LocationCode be unique. The table also requires that the LocationName be unique. However, there are times in the real world where two locations will have the same name though they exist in different places (e.g. two schools with the same name that exist in different counties in the same state). One designer insists the names should be forced to be different by appending a value to the name like the county name (e.g. Union High School - Francis County). Her rationale for this is about ensuring "controlled vocabulary". Another designer insists the names should be allowed to be the same because this reflects reality and LocationCodes should be used to enforce/track uniqueness.
I'm leaning in the direction of the second designer -- the names should be allowed to be different. A comparable example that I can think of is the fact that people often share the same name (e.g. Jane Doe). In a table that stores contact information about people, it would seem the names wouldn't be forced to be different and a social security code would do the job of tracking uniqueness.
So are there general guidelines or standards for this issue? A link to helpful documentation would be fantastic. Thanks in advance.
I'm leaning in the direction of the second designer -- the names should be allowed to be different.
I think you meant to say that you think the names should be allowed to be the same, that your table should allow duplicate names.
In a table that stores contact information about people, it would seem the names wouldn't be forced to be different and a social security code would do the job of tracking uniqueness.
While that's true in a simple case, in real-world applications you find that
Unambiguously identifying people is both a hard problem and an application-dependent problem. The local Bureau of Motor Vehicles has its ways of identifying people; the IRS has its ways; employers have their own ways; schools and hospitals have their own ways (privacy laws have a big impact here). You will probably need to find your own way.
Here's what you simply can't avoid: at some point, somebody at your company has to be able to tell which of many database rows that say "John Smith" match up with the angry customer in front of your desk.
I look at location names like this. The full name of "San Francisco" is "San Francisco, Calif., USA". The full name makes it easy to distinguish "Clinton, Miss." from "Clinton, Iowa".
I have found cases where two distinct places have the same informal name. For example, there might be a town named "Natural Bridge, Tenn.", and another place in Tennessee that is a natural bridge also named "Natural Bridge, Tenn." The difference is that one of those places is invariably a city, and the other invariably isn't. (In my experience, at least. I wouldn't be surprised to discover exceptions.)
Whether any of those real-world issues matter to you, though, is application-dependent. I don't have to deal with place names that aren't cities, so storing the "full name" is a simple and direct way to insure good data.
In your case, for event locations, I can't see allowing duplicate names being practical in a business sense. If the names are duplicated, you essentially require people to memorize which location code refers to each real-world location, given its name as a hint. With a database of any significant size, people can't do that reliably without writing down the differences anyway.
This is not a question about correct design. It's a question about information requirements. It's the analysis phase that should discover which is correct, not the design phase. It's important to distiguish between analysis and design, even when the same team is doing both
If names are allowed to be duplicates, then users are going to require some other piece of informaton when they need to disambiguate. Providing the distinct IDs may not be sufficient for the users, because the iDs may be meaningless to them.
Get back to the people who ordered the project. What do they want? If they want whatever the users are going to want, interview the users, or somebody that can speak for them. If nobody can speak for the users, and if the owners of the project refuse to speak for themselves, then you are reduced to taking your best guess.
I don't disagree with Catcall (+1), but I think there is a third way which might allow you to have your cake and eat it too.
One of the possible reasons for disliking the unique naming scheme is that it might be bothersome to some people that some locations include a qualifier while others don't.
For many situations, using the indistinct name may be good enough, while for others, you need certainty about exactly which name you mean. Why not store the basic location name and the qualifier in separate columns. If the qualifier is something like the county containing the location, then you could store this for every location anyway. This would give you the option of working with either the simple (but non-unique) names when that is good enough and working with fully qualified names when that is necessary.
One way of attempting this is along with the location having extra column's that record longitude/latitude of the place. Always and every time the name of the location, followed by longitude and latitude is unique.
Essentially the table structure would be something like,
Location(location_name, location_longitude, location_latitude)
Trying to defend the choice I made is best is relative to what is available to me as a programmer as well as the limitations I have to solve the problem. So it really doesn't matter until it solves the problem efficiently.
Note: The latitude/longitude can be obtained by writing a web-service to any of the geo-location providers.
There are standard codes for almost everything within USA.
State County FIPS Codes give a mail code to almost every inhabitable building the US Census has found.
For example Mellissa: http://www.melissadata.com/reference-data/zipdata.htm
Check out ZIP CODE lists that can be added to a table that will find almost everything a postage stamp can find.
International addreses or locations are a challenge even in better than 3rd World Countries - especially if it has a large agrarian population. Until recently, in Costa Rica for example, not only were there no house numbers, there were/are few streets that are formally named and recorded. You address the letter and the postman uses a bit of creativity to find the recipient. An address like "Jesus Ochoa, 200 meters from OLD Coke Cola Factory, Street with Biggest Oak Tree, Third yellow house with green shutters" were common 20 years ago. If they paint the house the address is changed!! And the OLD Coke factory has been gone for years.
If a country has a standard location service similar to FIPS, there is a good chance they started numbering from one as well will be using he same numbers. Don't expect unique numbering systems.
To make it unique preface the list with a Standard Country Code. Careful the ISO 3166-1 (Universal standard) has over lapping code structures.