How do I match differently formatted mailing addresses?

I have a requirement to match US mailing addresses during an import process. The problem is that the address line can be entered in several ways. Example:

123 Main Street

123 Main St.

123 Main St

How can I standardize the address so that I can perform the matching? We import 10,000 addresses at a time, so I don’t want to use a service like Google, Yahoo or USPS. Is there an open-source or commercial library for address standardization that is not a COM component? I don’t care whether the address is real or not; all I care about is matching.

1 answer

This type of thing is very complex. Entire companies are built around providing this feature.

I would not recommend building this yourself; libraries and services already exist for it:

https://www.usps.com/business/address-management-products.htm

http://smartystreets.com/products/liveaddress-api

If these are not options, and the linked question (http://stackoverflow.com/questions/824588/address-match-key-algorithm) does not help you, you will basically have to boil everything down to some common denominator. For example, split the address line into its component parts (street number, street number suffix, unit/apartment number, street name, street type and street direction). Then convert every possible abbreviation of each part (where applicable) to that common denominator. For the street type “St.”, you might choose “Street” as the common denominator, in which case you convert “St.” or “St” to “Street” and then do the match, provided all the data already in your database also uses “Street” for that street type.
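To make the idea concrete, here is a minimal sketch in Python of the "common denominator" approach. The function name `match_key` and the two lookup tables are hypothetical and deliberately tiny; a real abbreviation list would be much longer. The sketch also does not split the line into separate components, it only rewrites known tokens, so ambiguous cases such as "St" meaning "Saint" are not handled.

```python
import re

# Hypothetical, intentionally incomplete lookup tables (assumptions for illustration)
STREET_TYPES = {
    "st": "street", "street": "street",
    "ave": "avenue", "avenue": "avenue",
    "rd": "road", "road": "road",
    "blvd": "boulevard", "boulevard": "boulevard",
}
DIRECTIONS = {
    "n": "north", "s": "south", "e": "east", "w": "west",
    "ne": "northeast", "nw": "northwest",
    "se": "southeast", "sw": "southwest",
}


def match_key(address: str) -> str:
    """Reduce an address line to a canonical string used only for matching."""
    # Lowercase and drop punctuation so "St." and "St" become the same token
    cleaned = re.sub(r"[^\w\s]", "", address.lower())
    normalized = []
    for i, token in enumerate(cleaned.split()):
        if token in DIRECTIONS:
            token = DIRECTIONS[token]
        elif i > 0 and token in STREET_TYPES:
            # Skip the first token, which is normally the street number
            token = STREET_TYPES[token]
        normalized.append(token)
    return " ".join(normalized)


if __name__ == "__main__":
    for line in ("123 Main Street", "123 Main St.", "123 Main St"):
        print(match_key(line))
```

With these tables, the three example lines from the question all reduce to the same key, so they can be compared with a plain string equality check. The same function has to be applied to the addresses already stored in the database, otherwise the keys will not line up.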


Source: https://habr.com/ru/post/1432787/

