What is the best approach for interpreting text input for geocoding purposes?

Consider the following site:

http://maps.google.com

It has a basic text input where the user can type in businesses, countries, provinces, cities, addresses, and postal codes. I wonder what the best way to implement such a search is. I suspect that Google Maps uses full-text search over all kinds of data in one table, combined with a parser that classifies the input (for example, distinguishing numeric input, such as zip codes and coordinates, from text, such as business names and addresses).

When the data is spread across many tables and systems, a parser is required. The parser can be built from regular expressions, or with AI techniques such as artificial neural networks and genetic algorithms.
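For illustration, here is a minimal sketch of the regex flavor of such a parser. The patterns (US-style zip codes and decimal lat/long pairs) and the class name are made up for the example:

```java
// Sketch of a regex-based input classifier. The patterns below are
// illustrative assumptions (US-style zip codes, decimal lat/long),
// not a production ruleset.
import java.util.regex.Pattern;

public class QueryClassifier {

    // "49.2827, -123.1207" style latitude/longitude pairs
    private static final Pattern COORDINATES =
            Pattern.compile("^-?\\d{1,3}(\\.\\d+)?\\s*,\\s*-?\\d{1,3}(\\.\\d+)?$");

    // Five-digit US zip code, optionally zip+4
    private static final Pattern ZIP_CODE =
            Pattern.compile("^\\d{5}(-\\d{4})?$");

    public static String classify(String input) {
        String q = input.trim();
        if (COORDINATES.matcher(q).matches()) return "coordinates";
        if (ZIP_CODE.matcher(q).matches())    return "postal_code";
        // Everything else falls through to free-text search over
        // businesses, addresses, cities, provinces, countries.
        return "text";
    }

    public static void main(String[] args) {
        System.out.println(classify("90210"));               // postal_code
        System.out.println(classify("49.2827, -123.1207"));  // coordinates
        System.out.println(classify("Joe's Diner, Boston")); // text
    }
}
```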

Which approach would you recommend?

+3
3 answers

Your best bet is to aggregate the data from all your tables into a search index. Lucene is a free search engine that works much like the Google search engine does (an inverted index), and it should allow you to search on any of these values, or any combination of them, with relative ease.

http://lucene.apache.org/java/docs/
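Here is a minimal sketch of that idea against Lucene's Java API (written for the 8.x/9.x line; the field names and the sample document are invented for illustration):

```java
// Sketch: aggregate rows from all source tables into one Lucene index,
// then search every field with a single free-form query string.
// Field names (name, city, postalCode) are invented for this example.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class GeoSearchSketch {
    public static void main(String[] args) throws Exception {
        StandardAnalyzer analyzer = new StandardAnalyzer();
        Directory dir = new ByteBuffersDirectory(); // in-memory, for the demo

        // Index one document per row, aggregated from all source tables.
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
            Document doc = new Document();
            doc.add(new TextField("name", "Joe's Diner", Field.Store.YES));
            doc.add(new TextField("city", "Vancouver", Field.Store.YES));
            doc.add(new TextField("postalCode", "V6B 1A1", Field.Store.YES));
            writer.addDocument(doc);
        }

        // One free-form query string, matched against every field at once.
        MultiFieldQueryParser parser = new MultiFieldQueryParser(
                new String[] {"name", "city", "postalCode"}, analyzer);
        Query query = parser.parse("diner vancouver");

        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            for (ScoreDoc hit : searcher.search(query, 10).scoreDocs) {
                Document found = searcher.doc(hit.doc);
                System.out.println(found.get("name") + ", " + found.get("city"));
            }
        }
    }
}
```

MultiFieldQueryParser is what provides the "any of these values or any combination of them" behavior: the single query string is parsed once and matched against every listed field.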



+3



You can also try some naive approximation approaches: for example, six consecutive digits are likely to be an area code. Look for common words such as "road," "restaurant," and "street," which will appear in many queries, and then use some approximation to figure out what the user is looking for. Hope this helps.
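A minimal sketch of that heuristic; the word lists and the six-digit rule are illustrative assumptions, not rules taken from the answer:

```java
// Sketch of the naive keyword heuristic. The word lists and the
// six-digit rule are illustrative assumptions, not a real ruleset.
// Note: plain substring matching is deliberately naive here; real
// input would need word-boundary handling ("broad" contains "road").
import java.util.List;
import java.util.Locale;

public class NaiveGuesser {

    private static final List<String> STREET_WORDS =
            List.of("road", "street", "avenue", "boulevard", "lane");
    private static final List<String> BUSINESS_WORDS =
            List.of("restaurant", "cafe", "hotel", "bar", "shop");

    public static String guess(String input) {
        String q = input.toLowerCase(Locale.ROOT);
        // A run of six digits suggests a numeric code, not a name.
        if (q.matches(".*\\d{6}.*")) return "area code";
        for (String w : STREET_WORDS) {
            if (q.contains(w)) return "street address";
        }
        for (String w : BUSINESS_WORDS) {
            if (q.contains(w)) return "business";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(guess("221B Baker Street"));  // street address
        System.out.println(guess("Luigi's Restaurant")); // business
        System.out.println(guess("560001"));             // area code
    }
}
```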

0

Source: https://habr.com/ru/post/1708688/

