I have a pandas DataFrame with two columns (snippet below). I am trying to use the City column to infer Borough (you will notice some Unspecified values that need to be replaced). To do this, I want to find the most frequent borough for each city and store the result in a dictionary where the key is a city and the value is the most frequent borough for that city.
    City       Borough
    Brooklyn   Brooklyn
    Astoria    Queens
    Astoria    Unspecified
    Ridgewood  Unspecified
    Ridgewood  Queens
So, if Ridgewood is paired with Queens 100 times, Brooklyn 4 times and Manhattan 1 time, the resulting pair will be Ridgewood: Queens.
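The mapping described above can be sketched as follows. This is only a minimal illustration on toy data copied from the snippet, not a tested solution for the full data set; the name `city_to_borough` is made up for the example, and ties between boroughs are resolved by whatever `value_counts()` lists first.

```python
import pandas as pd

# Toy data mirroring the snippet above; the real frame is called `data` in the post.
data = pd.DataFrame({
    "City": ["Brooklyn", "Astoria", "Astoria", "Ridgewood", "Ridgewood"],
    "Borough": ["Brooklyn", "Queens", "Unspecified", "Unspecified", "Queens"],
})

# Keep only rows whose borough is known.
specified = data[data["Borough"] != "Unspecified"]

# For each city, count its boroughs and keep the most frequent one.
# `city_to_borough` is a hypothetical name used only in this sketch.
city_to_borough = (
    specified.groupby("City")["Borough"]
    .agg(lambda s: s.value_counts().idxmax())
    .to_dict()
)

# Replace 'Unspecified' with the mapped borough; cities without a
# known borough (or missing cities) stay 'Unspecified'.
unspecified = data["Borough"] == "Unspecified"
data.loc[unspecified, "Borough"] = (
    data.loc[unspecified, "City"].map(city_to_borough).fillna("Unspecified")
)
```

Building the dictionary from the filtered rows only keeps 'Unspecified' from ever winning the count for a city.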
So far I have tried this code:
    specified = data[['Borough','City']][data['Borough']!= 'Unspecified']
    paired = specified.Borough.groupby(specified.City).max()
At first glance, the output looked right, but on closer inspection it was incorrect. Any ideas?
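A likely explanation is that `max()` on a column of strings returns the greatest string in sort order, not the most common one. The small sketch below uses made-up counts (a variation of the Ridgewood example with one Staten Island row) to show the two results diverging.

```python
import pandas as pd

# Hypothetical counts for a single city: Queens x100, Brooklyn x4, Staten Island x1.
boroughs = pd.Series(["Queens"] * 100 + ["Brooklyn"] * 4 + ["Staten Island"] * 1)

# max() compares strings alphabetically, so one stray row can win.
alphabetical_max = boroughs.max()

# mode() returns the most frequent value(s); take the first one.
most_frequent = boroughs.mode().iloc[0]

print(alphabetical_max)  # Staten Island
print(most_frequent)     # Queens
```

When the most frequent borough also happens to sort last, the two agree, which may be why the result looked right at first glance.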
EDIT:
I tried the following statement:
    paired = specified.groupby("City").agg(lambda x: stats.mode(x['Borough'])[0])
I noticed that some of the Boroughs came out truncated, as shown below:
    paired.Borough.value_counts()
    #[Out]# QUEENS           58
    #[Out]# MANHATTAN         7
    #[Out]# STATEN ISLAND     4
    #[Out]# BRONX             4
    #[Out]# BROOKLYN          3
    #[Out]# MANHATTA          2
    #[Out]# STATE             1
    #[Out]# QUEEN             1
    #[Out]# MANHA             1
    #[Out]# BROOK             1
Of course, I could simply replace the truncated words manually, but I am curious to find out what causes the truncation.
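One possible cause, which is an assumption and not something the output confirms, is that the results pass through a fixed-width NumPy string array somewhere along the way. The truncated values are 5 and 8 characters long, the same lengths as BRONX and BROOKLYN, which would be consistent with that. The sketch below only demonstrates the NumPy behaviour itself, not the pandas or `stats.mode` internals.

```python
import numpy as np

# A NumPy string array takes its width from its initial contents: dtype '<U5' here.
results = np.array(["BRONX", "BRONX"])

# Assigning a longer string silently truncates it to the array's width.
results[1] = "BROOKLYN"
truncated = str(results[1])

# An object-dtype array stores full Python strings and does not truncate.
safe = np.array(["BRONX", "BRONX"], dtype=object)
safe[1] = "BROOKLYN"

print(truncated)  # BROOK
print(safe[1])    # BROOKLYN
```

If this is indeed the mechanism, computing the mode with plain pandas operations on object columns (for example `value_counts()`) would avoid the fixed-width array entirely.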
PS: For reference, here is the output for the DataFrame:
    specified
    #[Out]# <class 'pandas.core.frame.DataFrame'>
    #[Out]# Int64Index: 719644 entries, 1 to 396225
    #[Out]# Data columns:
    #[Out]# Borough    719644  non-null values
    #[Out]# City       651617  non-null values
    #[Out]# dtypes: object(2)

    specified.Borough.value_counts()
    #[Out]# QUEENS           215382
    #[Out]# BROOKLYN         208565
    #[Out]# MANHATTAN        150016
    #[Out]# BRONX             94648
    #[Out]# STATEN ISLAND     51033