Fuzzy street search using MySQL FullText (or Sphinx?)

I have a database table full of addresses obtained by geocoding with Google Maps. Google abbreviates all directions (West → W, East → E, etc.).

Therefore, if I enter an address such as “100 West Pender Street,” the formatted address returned by Google Maps will be “100 W Pender St,” and that is what I insert into my table.

Now, if a user comes along and searches for this address, all of the following should match:

100 west pender street
100 pender
100 w pender
100 west pender

And they do, more or less. However, the “W” in the table is ignored because it falls below the minimum word length. As a result, addresses on East Pender get equal weighting in the search results (the “E” is ignored as well).

What is the best way to handle this?

I suspect a minimum word length of 1 is a "bad thing."

I could search for known abbreviations (N, E, S, W, St, Ave, Dr, etc.) in the Google addresses and replace them with their expanded forms, but there are some street names where this is not valid (some cities have single-letter street names: J Street, etc.).
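A minimal sketch of that search-and-replace idea, in Python, with one guard for the single-letter street case. The lookup tables and the `expand_address` helper are hypothetical, and the rule "leave a letter alone when a street type follows it directly" is an assumed heuristic, not something Google Maps or MySQL provides.

```python
# Hypothetical lookup tables; extend them to match your own data.
DIRECTIONS = {"n": "north", "e": "east", "s": "south", "w": "west"}
STREET_TYPES = {"st": "street", "ave": "avenue", "dr": "drive"}


def expand_address(address: str) -> str:
    """Expand known abbreviations, leaving single-letter street names alone."""
    tokens = address.lower().replace(",", " ").split()
    expanded = []
    for i, token in enumerate(tokens):
        next_token = tokens[i + 1] if i + 1 < len(tokens) else None
        if token in DIRECTIONS:
            # Assumed heuristic: a letter directly followed by a street type
            # ("E St") is treated as the street name itself, not a direction.
            if next_token in STREET_TYPES or next_token in STREET_TYPES.values():
                expanded.append(token)
            else:
                expanded.append(DIRECTIONS[token])
        elif token in STREET_TYPES:
            expanded.append(STREET_TYPES[token])
        else:
            expanded.append(token)
    return " ".join(expanded)
```

With this sketch, `expand_address("100 W Pender St")` gives `100 west pender street`, while `expand_address("500 E St")` keeps the letter and gives `500 e street`. The heuristic does not cover every case (for example, "St" meaning "Saint"), and a letter that is kept would still fall below the minimum word length.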

Also, addresses like “123 160 St” are not searchable at all, because the street number (123) and the street name (160) both fall below the minimum word length.
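To illustrate why these queries fail, the Python sketch below mimics only the length filter of a full-text index. It assumes a minimum word length of 4 (the default for `ft_min_word_len` on MyISAM tables); the `indexed_tokens` helper is hypothetical and ignores stopwords and MySQL's real tokenizer rules.

```python
import re

MIN_WORD_LEN = 4  # assumed; matches MySQL's default ft_min_word_len for MyISAM


def indexed_tokens(text: str, min_len: int = MIN_WORD_LEN) -> list[str]:
    """Return the lowercase word tokens that survive the length filter."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if len(w) >= min_len]
```

Under that assumption, “100 W Pender St” and “100 E Pender St” both reduce to the single token `pender`, which is why they rank equally, and “123 160 St” leaves nothing to index at all.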

Is MySQL FullText the right tool for this? Does Sphinx offer something better?

Or is there another solution that I have not yet considered? Keep in mind that the user's search query will be matched not only with the property address, but also with other text columns, such as the name and description of the property.

1 answer

This is really an incredibly difficult problem if you are on your own. I work in the address verification industry at a company called SmartyStreets, where our products perform the task you described. It is a complex sequence of operations that matches search queries to valid endpoints. The accreditation for performing address searches accurately, correctly, and completely is called CASS Certification.

The difference between Google results and CASS-Certified results is that Google’s algorithms are “best guess.” That is what Google is good at; unfortunately, the same approach carries over to addresses, where it is not very effective. (See: http://answers.smartystreets.com/questions/269/why-did-the-address-fail-validation-it-looks-good-to-me )

Fuzzy searches with MySQL will produce results, and your own code may add algorithms to help, but there is no guarantee of accuracy or reliability, or, in this case, even of value.

I don’t think you want your users to get the wrong addresses in response to their queries. It makes your service look subpar, and users don’t get the value they expect (right?). I suggest you find a CASS-Certified software provider. For example, you can search Google for “address verification”; the best web solution I can recommend is the LiveAddress API.


Source: https://habr.com/ru/post/900463/

