Algorithm for getting US ZIP codes from GIS x, y coordinates

I have a database of many tens of thousands of events that occurred at specific geographic locations in the United States. The data includes x, y coordinates for each event, encoded in the NAD83 reference system. I want to write or use an algorithm that reliably gets the US ZIP code associated with each NAD83 x, y coordinate.

I do not yet have ZIP code definitions in the NAD83 reference system, and I have not done this kind of programming before. But it seems intuitively simple to find out whether a given x, y coordinate falls within the geometric shape of a US ZIP code area defined in the same NAD83 reference system.

Can someone help me with the following:
1.) Where can I get reliable US ZIP code definitions in the NAD83 reference system?
2.) Where can I find sample code for an algorithm that finds the ZIP code for a given x, y coordinate?

Any links to articles / tutorials, sample code, and NAD83 ZIP code definitions would be really helpful. I am doing Google searches, but I figured the people on this site could give me more expert guidance.

I write Java code every day. But if the code you provide is not written in Java, I could take code written in another language and adapt it to Java for my purposes. I do not have database software installed on my computer because I just use CSV or tab-delimited text files as inputs to my Java applications. If you suggest a database, I would need links to instructions on how to get the data into a format that I can import into a programming language such as Java.

Finally, the street addresses in my dataset do not include ZIP codes, and the addresses are written haphazardly, so it would be very difficult to clean up the address data in order to get ZIP codes from the addresses. I can narrow the data down to several neighboring cities, perhaps a few hundred ZIP codes, but I think the NAD83 x, y coordinates are my best shot at getting the ZIP code where each event in my dataset happened. I want to use the resulting ZIP code to join my analysis with other data I get about each ZIP code from sources such as the US Census.

Thanks in advance to everyone who wants to help.

+6
3 answers

I don't know where to get the ZIP code definitions, but I think you can look them up, state by state.

As for question (2), you will first need the geographic information, i.e. the boundary of each state. Then you simply go through all the (x, y) points and determine which polygon each one is in.

Here is some sample code. It was written for SGU124, which tests a point against a closed polygon whose edges are parallel to the axes.

    #include <map>
    #include <cstdio>
    #include <cstring>
    #include <algorithm>
    #define MAXN 10005
    using namespace std;

    struct pnt { int x, y; };
    struct seg { pnt a, b; } s[MAXN];  // axis-parallel edges, stored with a <= b
    int n;
    pnt p;                             // query point
    int h[MAXN << 1];                  // h[x + MAXN] = 1 if the ray hits an edge at x
    int k[MAXN << 1];                  // bit flags for edge endpoints hit at x

    void work() {
        int i, x, y, c = 0;
        memset(h, 0, sizeof(h));
        memset(k, 0, sizeof(k));
        for (i = 0; i < n; i++) {
            // The point lies on this edge
            if (s[i].a.x <= p.x && p.x <= s[i].b.x && s[i].a.y <= p.y && p.y <= s[i].b.y) {
                printf("BORDER\n");
                return;
            }
            if (s[i].a.x == s[i].b.x) {
                // Vertical edge: intersect it with the diagonal ray through p
                x = s[i].a.x;
                y = p.y - p.x + x;
                if (x <= p.x && s[i].a.y <= y && y <= s[i].b.y) {
                    h[x + MAXN] = 1;
                    if (y == s[i].a.y) k[x + MAXN] |= 1;
                    else if (y == s[i].b.y) k[x + MAXN] |= 2;
                }
            } else {
                // Horizontal edge
                y = s[i].a.y;
                x = p.x - p.y + y;
                if (x <= p.x && s[i].a.x <= x && x <= s[i].b.x) {
                    h[x + MAXN] = 1;
                    if (x == s[i].a.x) k[x + MAXN] |= 4;
                    else if (x == s[i].b.x) k[x + MAXN] |= 8;
                }
            }
        }
        // Count crossings along the ray; flag values 9 and 6 mark corners
        // that the ray only touches, so they are skipped
        for (i = p.x; i >= -10000; i--) {
            if (k[i + MAXN] != 9 && k[i + MAXN] != 6) c += h[i + MAXN];
        }
        if (c % 2) printf("INSIDE\n");
        else printf("OUTSIDE\n");
    }

    int main() {
        freopen("sgu124.in", "r", stdin);
        int i;
        while (~scanf("%d", &n)) {
            for (i = 0; i < n; i++) {
                scanf("%d%d", &s[i].a.x, &s[i].a.y);
                scanf("%d%d", &s[i].b.x, &s[i].b.y);
                // Normalize so that a is the lower-left endpoint
                if (s[i].a.x > s[i].b.x || s[i].a.y > s[i].b.y) swap(s[i].a, s[i].b);
            }
            scanf("%d%d", &p.x, &p.y);
            work();
        }
        return 0;
    }
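Since the question asks for Java, and real ZIP code boundaries are not limited to axis-parallel edges, the "determine which polygon it is in" step is commonly done with an even-odd (ray casting) test. The following is only a minimal sketch of that technique, not a port of the SGU124 solution: the class and method names are made up for illustration, the polygon is assumed to be a single ring without holes, and points exactly on the boundary may be classified either way.

```java
// Minimal sketch: even-odd (ray casting) point-in-polygon test.
// PointInPolygon and contains are illustrative names, not from any library.
final class PointInPolygon {

    /**
     * Returns true if (px, py) lies inside the polygon whose vertices are
     * (xs[i], ys[i]) in order; the last vertex connects back to the first.
     */
    public static boolean contains(double[] xs, double[] ys, double px, double py) {
        boolean inside = false;
        int n = xs.length;
        for (int i = 0, j = n - 1; i < n; j = i++) {
            // Does the edge from vertex j to vertex i straddle the line y = py?
            boolean straddles = (ys[i] > py) != (ys[j] > py);
            if (straddles) {
                // x coordinate at which the edge crosses that horizontal line
                double xCross = xs[j] + (py - ys[j]) * (xs[i] - xs[j]) / (ys[i] - ys[j]);
                if (px < xCross) {
                    inside = !inside; // the ray toward +x crosses this edge
                }
            }
        }
        return inside;
    }

    public static void main(String[] args) {
        // Made-up square area with corners (0, 0) and (10, 10)
        double[] xs = {0, 10, 10, 0};
        double[] ys = {0, 0, 10, 10};
        System.out.println(contains(xs, ys, 5, 5));   // true
        System.out.println(contains(xs, ys, 15, 5));  // false
    }
}
```

With one such test per ZIP code polygon, each event's (x, y) can be checked against the candidate polygons until one reports true.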
+1

You can use GeoTools in Java. Below is an example of finding the feature in a shapefile that contains a given point.

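    // Requires the GeoTools shapefile and CQL modules on the classpath.
    // Shapefile projection/datum: SR-ORG:7169 (GCS NAD83)
    File shapeFile = new File("zt08_d00.shp");
    FileDataStore store = FileDataStoreFinder.getDataStore(shapeFile);
    SimpleFeatureSource featureSource = store.getFeatureSource();

    // Boulder, CO (longitude, then latitude)
    Filter filter = CQL.toFilter("CONTAINS(the_geom, POINT(-105.292778 40.019444))");
    SimpleFeatureCollection features = featureSource.getFeatures(filter);

    // In current GeoTools releases the collection is not Iterable,
    // so walk it with its iterator and close the iterator afterwards
    try (SimpleFeatureIterator it = features.features()) {
        while (it.hasNext()) {
            SimpleFeature f = it.next();
            System.out.println(f.getAttribute("NAME"));
        }
    }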

I grabbed a shapefile from the US Census Bureau's collection of 5-digit ZIP Code Tabulation Areas (ZCTAs) from the 2000 census. I used just one file, for the state of Colorado; you will need to combine the files you need into one FeatureSource. Running this prints 80302 for Boulder, CO.
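If you would rather not pull in a GIS library, the same "which area contains this point" lookup over a combined set of polygons can be sketched with the JDK's own java.awt.geom.Path2D. This is only an illustration under stated assumptions: ZipLookup, addArea and find are hypothetical names, the two squares are made-up geometry rather than real ZIP code boundaries, each area is a single ring without holes, and the vertices are assumed to be already loaded into memory (for example, parsed from a text file) in the same coordinate system as the event points.

```java
import java.awt.geom.Path2D;
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch: find which labeled polygon contains a point, using only the JDK.
// ZipLookup, addArea and find are hypothetical names; the geometry below is made up.
final class ZipLookup {
    private final Map<String, Path2D.Double> areas = new LinkedHashMap<>();

    /** Registers a polygon, given as {x, y} vertex pairs, under a ZIP code label. */
    public void addArea(String zip, double[][] ring) {
        Path2D.Double path = new Path2D.Double();
        path.moveTo(ring[0][0], ring[0][1]);
        for (int i = 1; i < ring.length; i++) {
            path.lineTo(ring[i][0], ring[i][1]);
        }
        path.closePath();
        areas.put(zip, path);
    }

    /** Returns the label of the first area containing (x, y), or null if none does. */
    public String find(double x, double y) {
        for (Map.Entry<String, Path2D.Double> e : areas.entrySet()) {
            Path2D.Double path = e.getValue();
            // Cheap bounding-box rejection before the exact containment test
            if (path.getBounds2D().contains(x, y) && path.contains(x, y)) {
                return e.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ZipLookup lookup = new ZipLookup();
        // Two adjacent made-up squares standing in for ZIP code areas
        lookup.addArea("80302", new double[][] {{0, 0}, {10, 0}, {10, 10}, {0, 10}});
        lookup.addArea("80303", new double[][] {{10, 0}, {20, 0}, {20, 10}, {10, 10}});
        System.out.println(lookup.find(5, 5));   // 80302
        System.out.println(lookup.find(15, 5));  // 80303
        System.out.println(lookup.find(25, 5));  // null
    }
}
```

A linear scan like this is probably adequate for a few hundred areas and tens of thousands of points; a spatial index would only matter at a much larger scale.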

GeoTools also lets you convert between projections, if necessary. Fortunately, these shapefiles are already in NAD83.

+4

You mentioned that you have addresses you could use. In that case, an address verification service will let you programmatically look up ZIP codes from the street address and city / state. Even if they are poorly formatted, the addresses may get you 90 or 95% of the way to your goal, leaving the rest to be cleaned up by hand or resolved from the coordinates.

SmartyStreets will take an uploaded CSV file with your data, perform address verification (correct and standardize each address), and then check the addresses against USPS data. One unique feature of SmartyStreets is that they charge nothing for bad addresses. This lets you submit several permutations of each address (to account for the inconsistent formatting) and pay only when a positive match is found.

In the interest of full disclosure, I am the founder of SmartyStreets. We provide address verification.

0

Source: https://habr.com/ru/post/905375/

