Cleaning up bad geocodes (or general guidelines for data cleanup)

I have a fairly large database of addresses (500k+) from around the world, though many of them are duplicates or near duplicates. Whenever a new address is entered, I check whether it is already in the database; if so, I take the existing lat/long and apply it to the new record. The reason I don't point to a separate table is that the addresses are not searched as a group, and they often differ enough that I want to keep those variations. If I get a full match on the address, I apply that lat/long. If not, I fall back to the city level and apply that, and if I can't get a match there either, I have a separate process that runs. A sketch of this cascade follows.
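For concreteness, here is a minimal sketch of that matching cascade, assuming a SQLite table named `addresses` with `full_address`, `city`, `country`, `lat`, and `lon` columns (all of these names are hypothetical; adapt them to your schema):

```python
import sqlite3

def coords_for(conn: sqlite3.Connection, full_address: str, city: str, country: str):
    """Return (lat, lon, match_level) for a new address, or None if unmatched."""
    cur = conn.cursor()
    # 1. Exact match on the full address string: inherit its coordinates.
    cur.execute(
        "SELECT lat, lon FROM addresses WHERE full_address = ? LIMIT 1",
        (full_address,),
    )
    row = cur.fetchone()
    if row:
        return row[0], row[1], "address"
    # 2. Fall back to any existing record in the same city and country.
    cur.execute(
        "SELECT lat, lon FROM addresses WHERE city = ? AND country = ? LIMIT 1",
        (city, country),
    )
    row = cur.fetchone()
    if row:
        return row[0], row[1], "city"
    # 3. No match at all: hand off to the separate geocoding process.
    return None
```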

Now that you have the background, the problem: sometimes I end up with a lat/long that is far outside any acceptable error range. Oddly enough, it is usually only one or two of these lat/longs that are out of range, while the rest of the records for the same city exist in the database with correct coordinates.
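That "one or two bad points among many good ones" pattern suggests a simple per-city outlier check: compare every record against the median coordinates of the other records sharing its city name. A minimal sketch, with an illustrative `records` structure and distance threshold that are not from the original post:

```python
import math
from collections import defaultdict
from statistics import median

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    r = 6371.0  # mean Earth radius in km
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def flag_outliers(records, threshold_km=50.0):
    """records: iterable of (record_id, city, lat, lon). Yields suspect ids."""
    by_city = defaultdict(list)
    for rec in records:
        by_city[rec[1]].append(rec)
    for city, recs in by_city.items():
        if len(recs) < 3:
            continue  # too few points for the median to mean anything
        med_lat = median(r[2] for r in recs)
        med_lon = median(r[3] for r in recs)
        for rec_id, _, lat, lon in recs:
            if haversine_km(lat, lon, med_lat, med_lon) > threshold_km:
                yield rec_id
```

The median is used rather than the mean so that the one or two bad points don't drag the reference location toward themselves.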

How would you recommend cleaning the data? I have a geonames database, so in theory I have the correct data. What I'm struggling with is how you would actually go about this.
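One way to put the geonames data to work is to cross-check every stored coordinate against the canonical coordinates of its city. A hedged sketch, reusing `haversine_km` from the snippet above; the `geonames` table and its column names here are assumptions, not the actual geonames schema:

```python
import sqlite3

def flag_against_geonames(conn: sqlite3.Connection, threshold_km=50.0):
    """Print records whose coordinates sit implausibly far from their city."""
    cur = conn.cursor()
    cur.execute(
        """
        SELECT a.id, a.lat, a.lon, g.lat, g.lon
        FROM addresses a
        JOIN geonames g
          ON g.name = a.city AND g.country_code = a.country
        """
    )
    for rec_id, lat, lon, g_lat, g_lon in cur:
        dist = haversine_km(lat, lon, g_lat, g_lon)
        if dist > threshold_km:
            print(f"record {rec_id}: {dist:.0f} km from its city's "
                  "geonames coordinates")
```

Anything flagged this way can be re-geocoded, or simply reassigned the geonames coordinates if city-level precision is acceptable for those records.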

If someone can point me in a general direction for this kind of data cleaning, that would be great.

1 answer

This is an old question, but true principles never die, right?

Full disclosure: I'm with SmartyStreets. What you're describing is a job for address validation and standardization using CASS-Certified software (which, note, applies to US addresses only).

USPS CASS certification holds address software to strict standards for correcting and standardizing addresses (for example: fixing misspelled street and city names). A CASS-Certified service such as SmartyStreets LiveAddress verifies each address, corrects it, and standardizes it against official USPS data. You can process your addresses in bulk as a list, or verify them one at a time through the API.

Suggestion: export your addresses as JSON (or convert them to JSON, which is easy), run the list through a verification service such as SmartyStreets, and compare the standardized results against what is in your database. Records that come back corrected, or that can't be verified at all, are the ones whose lat/long values deserve scrutiny.
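Illustrative only: a generic pattern for sending one address as JSON to a verification endpoint and reading back the result. The URL, payload shape, and response fields below are placeholders, not the actual SmartyStreets LiveAddress contract; consult the vendor's documentation for the real API.

```python
import json
import urllib.request

def verify_address(address: dict) -> dict:
    """POST a single address as JSON and return the parsed JSON response."""
    url = "https://api.example.com/verify"  # placeholder endpoint, not real
    req = urllib.request.Request(
        url,
        data=json.dumps([address]).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example call (hypothetical field names):
# result = verify_address({"street": "1 Infinite Loop",
#                          "city": "Cupertino", "state": "CA"})
```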


Source: https://habr.com/ru/post/1704618/

