Python pandas "apply" returns a Series; cannot convert to DataFrame

Ok, I'm halfway there. I am geocoding data with geopy. I wrote a simple function that takes a country name and returns its latitude and longitude. I use apply to run the function, and it returns a pandas Series object. I cannot convert it to a DataFrame. I'm sure I am missing something obvious, but I'm new to Python and still RTFMing. BTW, the geocoding function works great.

    # Import libraries
    import os
    import pandas as pd
    import numpy as np
    from geopy.geocoders import Nominatim

    def locate(x):
        geolocator = Nominatim()
        # print(x)  # debug
        try:
            # Get geocode
            location = geolocator.geocode(x, timeout=8, exactly_one=True)
            lat = location.latitude
            lon = location.longitude
        except:
            # didn't work for some reason that I really don't care about
            lat = np.nan
            lon = np.nan
        # print(lat, lon)  # debug
        return lat, lon  # Note: also tried return { 'LAT': lat, 'LON': lon }

    df_geo_in = df_addr.drop_duplicates(['COUNTRY']).reset_index()  # works perfectly
    df_geo_in['LAT'], df_geo_in['LON'] = df_geo_in.applymap(locate)
    # error: returns more than 2 values - default index + column with results

I also tried

 df_geo_in['LAT','LON'] = df_geo_in.applymap(locate) 

I get a single dataframe without an index and one column with a series in it.

I tried a number of other methods, including 'applymap':

    source_cols = ['LAT','LON']
    new_cols = [str(x) for x in source_cols]
    df_geo_in = df_addr.drop_duplicates(['COUNTRY']).set_index(['COUNTRY'])
    df_geo_in[new_cols] = df_geo_in.applymap(locate)

which, after running for a long time, raised this error:

ValueError: Columns must be the same length as the key

I also tried manually converting the series to a DataFrame using the df.from_dict(df_geo_in) method, without success.

The goal is to geocode 166 unique countries and then attach the results to the 188K addresses in df_addr. I try to be pandas-y in my code and not write loops if possible. But I have not found the magic for converting a series to a dataframe, and this is the first time I have tried to use apply.

Thanks in advance - ancient C programmer

+6
2 answers

I assume df_geo_in is a single-column df, so I believe the following should work:

Change:

 return lat, lon 

to

 return pd.Series([lat, lon]) 

then you should be able to assign like this:

    # apply on the COUNTRY column so locate receives one country name at a time
    df_geo_in[['LAT', 'LON']] = df_geo_in['COUNTRY'].apply(locate)
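To make the difference concrete, here is a minimal sketch of the same pattern that runs without network access. `fake_locate` and `FAKE_COORDS` are hypothetical stand-ins for the real geocoder (they are not part of geopy), and the coordinates are made-up round numbers. The point is only that when the applied function returns a `pd.Series`, calling `apply` on a column gives back a DataFrame with one column per returned value.

```python
import numpy as np
import pandas as pd

# Hypothetical offline lookup standing in for the real geocoder (illustrative values only).
FAKE_COORDS = {"France": (46.0, 2.0), "Japan": (36.0, 138.0)}

def fake_locate(country):
    # Unknown countries fall back to NaN, mirroring the except branch in locate().
    lat, lon = FAKE_COORDS.get(country, (np.nan, np.nan))
    # Returning a Series (not a tuple) is what makes apply expand into columns.
    return pd.Series([lat, lon], index=["LAT", "LON"])

df_geo_in = pd.DataFrame({"COUNTRY": ["France", "Japan", "Atlantis"]})

# Series.apply with a Series-returning function yields a DataFrame.
coords = df_geo_in["COUNTRY"].apply(fake_locate)
df_geo_in[["LAT", "LON"]] = coords

print(df_geo_in)
```

With a tuple return, the same call would instead produce a single Series whose values are tuples, which is why the direct assignment to two columns did not work.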

What you tried to do was assign the result of applymap to two new columns, which is incorrect here: applymap is designed to work on every element of a df, so its result has the same shape as the df it was called on. If the lhs does not have that shape, the assignment won't give the desired result.
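A small sketch of the shape issue, assuming a toy two-column frame and a hypothetical `fake_locate` that always returns the same tuple. Newer pandas versions expose the element-wise operation as `DataFrame.map` and deprecate `applymap`, so the snippet picks whichever is available.

```python
import pandas as pd

df = pd.DataFrame({"COUNTRY": ["France", "Japan"], "CITY": ["Paris", "Tokyo"]})

def fake_locate(value):
    # Hypothetical stand-in: a fixed (lat, lon) tuple for any input.
    return (0.0, 0.0)

# Element-wise application: DataFrame.map in newer pandas, applymap in older versions.
elementwise = df.map if hasattr(df, "map") else df.applymap
cells = elementwise(fake_locate)

# The function ran on every cell, so the result keeps the frame's shape and columns.
print(cells.shape)
print(list(cells.columns))
```

Every cell, including the ones in columns that are not country names, now holds a tuple, so there is nothing that lines up with two new LAT and LON columns.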

Your last method is also incorrect because you drop the duplicate countries and then expect this to assign every country's geolocation back, but the shape is different.

For large dfs it would probably be quicker to build the geolocation lookup from the de-duplicated df and then merge it back with your larger df, like so:

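    # one row per country, so each country is geocoded only once
    geo_lookup = df_addr[['COUNTRY']].drop_duplicates().copy()
    geo_lookup[['LAT', 'LON']] = geo_lookup['COUNTRY'].apply(locate)
    # left merge the coordinates back onto the full address table
    df_addr = df_addr.merge(geo_lookup, on='COUNTRY', how='left')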

This creates a df of non-duplicated countries with their geolocation, and then performs a left merge back to the master df.
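The lookup-then-merge idea can be tried end to end on toy data. This is only a sketch: `fake_locate`, `FAKE_COORDS`, and the three sample addresses are hypothetical, chosen so it runs offline. It assumes the geocoding function returns a `pd.Series`, as suggested above.

```python
import numpy as np
import pandas as pd

# Hypothetical offline lookup standing in for the real geocoder (illustrative values only).
FAKE_COORDS = {"France": (46.0, 2.0), "Japan": (36.0, 138.0)}

def fake_locate(country):
    lat, lon = FAKE_COORDS.get(country, (np.nan, np.nan))
    return pd.Series([lat, lon], index=["LAT", "LON"])

# Toy stand-in for the 188K-row address table.
df_addr = pd.DataFrame({
    "ADDRESS": ["1 Rue A", "2 Rue B", "3 Chome C"],
    "COUNTRY": ["France", "France", "Japan"],
})

# Geocode each distinct country once.
geo_lookup = df_addr[["COUNTRY"]].drop_duplicates().reset_index(drop=True)
geo_lookup[["LAT", "LON"]] = geo_lookup["COUNTRY"].apply(fake_locate)

# Left merge keeps every address row and attaches its country's coordinates.
merged = df_addr.merge(geo_lookup, on="COUNTRY", how="left")
print(merged)
```

Only the COUNTRY column is kept in the lookup so that the merge does not duplicate the other address columns with `_x`/`_y` suffixes.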

+7

It is always easier to test with some sample data, but try the following zip approach to see if it works.

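    # apply on the COUNTRY column gives a Series of (lat, lon) tuples
    df_geo_in['LAT_LON'] = df_geo_in['COUNTRY'].apply(locate)
    # unzip the tuples into two separate columns
    df_geo_in['LAT'], df_geo_in['LON'] = zip(*df_geo_in.LAT_LON)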
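Here is a small offline sketch of the zip approach, assuming the function keeps returning a plain `(lat, lon)` tuple. `fake_locate` and `FAKE_COORDS` are hypothetical stand-ins for the real geocoder.

```python
import pandas as pd

# Hypothetical offline lookup standing in for the real geocoder (illustrative values only).
FAKE_COORDS = {"France": (46.0, 2.0), "Japan": (36.0, 138.0)}

def fake_locate(country):
    # Plain tuple return, as in the original locate().
    return FAKE_COORDS.get(country, (float("nan"), float("nan")))

df_geo_in = pd.DataFrame({"COUNTRY": ["France", "Japan"]})

# A tuple-returning function gives a Series whose values are tuples.
lat_lon = df_geo_in["COUNTRY"].apply(fake_locate)

# zip(*...) transposes the tuples into one sequence of lats and one of lons.
df_geo_in["LAT"], df_geo_in["LON"] = zip(*lat_lon)
print(df_geo_in)
```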
0

Source: https://habr.com/ru/post/984471/
