What is the best street address normalization?

Today I have a table containing:

 Table A
 --------
 name
 description
 street1
 street2
 zipcode
 city
 fk_countryID

We are discussing the best way to normalize this for fast searches, for instance finding all rows filtered by city or zip code. The proposed new structure is as follows:

 Table A
 --------
 name
 description
 fk_streetID
 streetNumber
 zipcode
 fk_countryID

 Table Street
 --------
 id
 street1
 street2
 fk_cityID

 Table City
 ----------
 id
 name

 Table Country
 -------------
 id
 name

The question is whether to have only one field for the street name instead of two.
My argument is that having two fields is the norm for supporting international addresses.

The argument for a single field is that two fields would cost search performance and invite duplication.

I am wondering what is the best way to go here.

UPDATE

I aim to have 15,000 brands associated with 50,000 stores, where 1,000 users will perform multiple searches every day via the Internet and iPhone. In addition, I will have 3 parties receiving data from the database for their sites.

The site has not launched yet, so we have no idea about the workload. When we start, we will have about 1,000 brands associated with about 4,000 stores.

+4
4 answers

I think the safest route is your first example, possibly with a third free-form field:

 name description street1 street2 street3 zipcode city fk_countryID 

The only things you can halfway normalize for international addresses are the zip code (although it should still be a free-form field) and the city. Street addresses vary too much.

+1

My standard advice here (from years of data warehouse / BI experience) is:
always store the lowest level of broken-out detail, that is, the multiple-field option.

In addition, depending on your needs, you can add indexes or even a compound field that concatenates two other fields. Be sure to maintain it with a trigger rather than by hand, or you will have data synchronization and quality problems.
Part of the right answer for you will always depend on your actual use. Do you ever expect to use an address in the standard (2-line) format for mailing, or to share it with other parties? Or is this really a purely read-only database that is set up only for queries and not used for more standard address needs such as mailing lists?
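
As a rough illustration of the trigger-maintained compound field described above, here is a minimal sketch using Python's built-in sqlite3 module. The table name store, the column full_address, and the sample data are made up for the example, and the trigger syntax is SQLite's; other databases spell this differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store (
    id           INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    street1      TEXT NOT NULL,
    street2      TEXT NOT NULL DEFAULT '',
    zipcode      TEXT NOT NULL,
    city         TEXT NOT NULL,
    full_address TEXT  -- hypothetical compound field, written by triggers only
);

-- Indexes for the "filter by city or zip code" searches.
CREATE INDEX idx_store_city    ON store (city);
CREATE INDEX idx_store_zipcode ON store (zipcode);

-- Fill the compound field when a row is inserted.
CREATE TRIGGER store_address_insert AFTER INSERT ON store
BEGIN
    UPDATE store
       SET full_address = trim(NEW.street1 || ' ' || NEW.street2)
                          || ', ' || NEW.zipcode || ' ' || NEW.city
     WHERE id = NEW.id;
END;

-- Refresh it whenever one of the underlying parts changes.
CREATE TRIGGER store_address_update
AFTER UPDATE OF street1, street2, zipcode, city ON store
BEGIN
    UPDATE store
       SET full_address = trim(NEW.street1 || ' ' || NEW.street2)
                          || ', ' || NEW.zipcode || ' ' || NEW.city
     WHERE id = NEW.id;
END;
""")

query = "SELECT full_address FROM store WHERE id = 1"

conn.execute(
    "INSERT INTO store (name, street1, zipcode, city) VALUES (?, ?, ?, ?)",
    ("Shoe Shop", "Storgata 1", "0155", "Oslo"),
)
after_insert = conn.execute(query).fetchone()[0]

conn.execute("UPDATE store SET street2 = ? WHERE id = 1", ("2nd floor",))
after_update = conn.execute(query).fetchone()[0]

print(after_insert)  # Storgata 1, 0155 Oslo
print(after_update)  # Storgata 1 2nd floor, 0155 Oslo
```

The application never writes full_address itself, so the compound field cannot drift from the separate parts, which stay available for editing and for the indexed city and zip code filters.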

At the end of the day, if you run into query performance problems, you can add additional structures such as compound fields, indexes, and even other tables holding the same data in a different form. There are also server-level caching options if performance is slow. If you are building a complex or high-traffic site, you will most likely end up using a product that helps with this anyway; in the Ruby world, for example, people use Thinking Sphinx. If query performance is still a problem and your data keeps growing, you may end up having to consider non-SQL solutions such as MongoDB.

One final principle that I also adhere to: think about how people will update the data, if that happens in this system. When people enter data and later edit it, they expect to see it exactly as they entered it, so any internal manipulation that changes the form or content of user input becomes a major headache when you try to let them do simple edits. I have seen insanely complex algorithms for encoding and decoding data this way, and they often have problems.

+2

Note that heavy normalization means more joins, so it will not speed up searches in every case.
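
To make the join cost concrete, here is a small sketch (Python's sqlite3, with invented table names store_flat and store_norm and invented sample rows) that runs the same "stores in a city" search against the flat layout and against the normalized layout proposed in the question. It only shows the shape of the two queries; it does not measure which one is faster for a given workload.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Flat layout: city and zipcode sit on the row being searched.
CREATE TABLE store_flat (
    id INTEGER PRIMARY KEY, name TEXT,
    street1 TEXT, street2 TEXT, zipcode TEXT, city TEXT
);
CREATE INDEX idx_flat_city ON store_flat (city);

-- Normalized layout from the question: store -> street -> city.
CREATE TABLE city   (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE street (id INTEGER PRIMARY KEY, street1 TEXT, street2 TEXT,
                     fk_cityID INTEGER REFERENCES city(id));
CREATE TABLE store_norm (
    id INTEGER PRIMARY KEY, name TEXT,
    fk_streetID INTEGER REFERENCES street(id),
    streetNumber TEXT, zipcode TEXT
);

INSERT INTO store_flat VALUES
    (1, 'Shoe Shop', 'Storgata 1', '', '0155', 'Oslo'),
    (2, 'Fish Shop', 'Bryggen 7',  '', '5003', 'Bergen');

INSERT INTO city   VALUES (1, 'Oslo'), (2, 'Bergen');
INSERT INTO street VALUES (1, 'Storgata', '', 1), (2, 'Bryggen', '', 2);
INSERT INTO store_norm VALUES
    (1, 'Shoe Shop', 1, '1', '0155'),
    (2, 'Fish Shop', 2, '7', '5003');
""")

# Flat: one table, one indexed column.
flat_sql = "SELECT name FROM store_flat WHERE city = ?"

# Normalized: two joins before the city name can be compared.
norm_sql = """
SELECT s.name
  FROM store_norm AS s
  JOIN street AS st ON st.id = s.fk_streetID
  JOIN city   AS c  ON c.id  = st.fk_cityID
 WHERE c.name = ?
"""

flat_result = [row[0] for row in conn.execute(flat_sql, ("Oslo",))]
norm_result = [row[0] for row in conn.execute(norm_sql, ("Oslo",))]
print(flat_result, norm_result)
```

Both queries return the same stores; the normalized one simply has to walk two extra tables to get there.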

+1

As others noted, address normalization (or “standardization”) is most effective when the data is kept in one table with the individual parts in separate columns (as in your first example). I work in the address verification field (at SmartyStreets), and you will find that standardizing addresses is a very difficult task. There is more documentation on this task here: https://www.smartystreets.com/Features/Standardization/

With the volume of requests that you will process, I highly recommend that you make sure the addresses are correct before deploying. That means processing the address list to remove duplicates, standardize formats, and so on. A CASS-certified vendor (such as SmartyStreets, although there are others) will provide this service.
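
For a first pass over your own data before involving a vendor, a crude duplicate check might look like the sketch below. The functions address_key and find_duplicates and the sample rows are invented for illustration. This is not what CASS processing does: it only folds case, punctuation, and whitespace, and it will not catch abbreviations, misspellings, or reordered parts.

```python
import re
from collections import defaultdict


def address_key(street, zipcode, city):
    """Crude comparison key: lower-case, drop punctuation, collapse whitespace."""
    def clean(value):
        value = re.sub(r"[^\w\s]", "", value.lower())
        return " ".join(value.split())
    return (clean(street), clean(zipcode), clean(city))


def find_duplicates(rows):
    """Group row ids whose addresses share the same crude key."""
    groups = defaultdict(list)
    for row_id, street, zipcode, city in rows:
        groups[address_key(street, zipcode, city)].append(row_id)
    return [ids for ids in groups.values() if len(ids) > 1]


rows = [
    (1, "Storgata 1", "0155", "Oslo"),
    (2, "  storgata   1. ", "0155", "OSLO"),
    (3, "Bryggen 7", "5003", "Bergen"),
]
duplicates = find_duplicates(rows)
print(duplicates)  # [[1, 2]]
```

The user-entered values stay untouched in the table; the key is only used for comparison, which fits the earlier advice not to rewrite what people typed.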

0

Source: https://habr.com/ru/post/1381683/

