Pandas nested for loop: insert multiple data into different created data frames

I am new to data science and am currently improving my skills. I used a dataset from Kaggle, and while planning how to present the data I ran into a problem.

What I was trying to achieve was to insert data into different data frames using a for loop. Following an example I had seen, I used a dictionary to store the data frames, but the data in them gets overwritten.

I have a list of data frames:

continents_list = [african_countries, asian_countries, european_countries, north_american_countries,
          south_american_countries, oceanian_countries]

This is an example of my data frame from one of the continents:

    Continent   Country Name   Country Code    2010    2011    2012    2013    2014
7    Oceania      Australia         AUS        11.4    11.4    11.7    12.2    13.1
63   Oceania         Fiji           FJI        20.1    20.1    20.2    19.6    18.6
149  Oceania     New Zealand        NZL        17.0    17.2    17.7    15.8    14.6
157  Oceania   Papua New Guinea     PNG         5.4     5.3     5.4     5.5     5.4
174  Oceania   Solomon Islands      SLB         9.1     8.9     9.3     9.4     9.5

First, I selected the entire row for the country that has the highest rate for the year:

def select_highest_rate(continent, year):
    highest_rate_idx = continent[year].idxmax()
    return continent.loc[highest_rate_idx]
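To check `select_highest_rate` on its own, here is a minimal runnable sketch that rebuilds the Oceania sample shown above. It assumes the year columns are strings ('2010' to '2014'), matching `years_list` in the loop, and keeps the original index labels.

```python
import pandas as pd

# Rebuild the Oceania sample from the question, keeping its index labels
oceanian_countries = pd.DataFrame(
    {
        "Continent": ["Oceania"] * 5,
        "Country Name": ["Australia", "Fiji", "New Zealand",
                         "Papua New Guinea", "Solomon Islands"],
        "Country Code": ["AUS", "FJI", "NZL", "PNG", "SLB"],
        "2010": [11.4, 20.1, 17.0, 5.4, 9.1],
        "2011": [11.4, 20.1, 17.2, 5.3, 8.9],
        "2012": [11.7, 20.2, 17.7, 5.4, 9.3],
        "2013": [12.2, 19.6, 15.8, 5.5, 9.4],
        "2014": [13.1, 18.6, 14.6, 5.4, 9.5],
    },
    index=[7, 63, 149, 157, 174],
)


def select_highest_rate(continent, year):
    # idxmax returns the index label of the largest value in the year column
    highest_rate_idx = continent[year].idxmax()
    # .loc looks the row up by that label and returns it as a Series
    return continent.loc[highest_rate_idx]


row = select_highest_rate(oceanian_countries, "2010")
print(row.name, row["Country Name"], row["2010"])  # 63 Fiji 20.1
```

Because `idxmax` returns an index label rather than a position, `.loc` (not `.iloc`) is the matching accessor; with the labels kept here it returns row 63.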

Then I used a nested for loop to collect these rows for every continent and every year:

def show_highest_countries(continents_list):
    df_highest_countries = {}
    years_list = ['2010','2011','2012','2013','2014']
    for continent in continents_list:
        for year in years_list:
            highest_country = select_highest_rate(continent, year)
            highest_countries = highest_country[['Continent','Country Name',year]]
            df_highest_countries[year] = pd.DataFrame(highest_countries)
    return df_highest_countries

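Result: the data is overwritten, so only the last continent's rows remain.

How can I insert the data into separate data frames (one per year) without overwriting it? Is there a better approach than a for loop?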


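Your dictionary key is only the year, so each continent overwrites the entries of the previous one, and you end up with only the last continent's results for 2010-2014. The culprit is this line: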

df_highest_countries[year] = pd.DataFrame(highest_countries)
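A minimal sketch reproducing the overwrite with two tiny continents. The `europe` frame, its country names and its values are hypothetical placeholders, and only one year is used to keep it short.

```python
import pandas as pd


def select_highest_rate(continent, year):
    return continent.loc[continent[year].idxmax()]


# Two tiny continents; the Europe rows are hypothetical placeholders
oceania = pd.DataFrame({"Continent": ["Oceania", "Oceania"],
                        "Country Name": ["Australia", "Fiji"],
                        "2010": [11.4, 20.1]})
europe = pd.DataFrame({"Continent": ["Europe", "Europe"],
                       "Country Name": ["Country A", "Country B"],
                       "2010": [3.0, 5.0]})

df_highest_countries = {}
for continent in [oceania, europe]:
    for year in ["2010"]:
        highest_country = select_highest_rate(continent, year)
        highest_countries = highest_country[["Continent", "Country Name", year]]
        # Same key for every continent: Europe replaces Oceania's entry
        df_highest_countries[year] = pd.DataFrame(highest_countries)

print(list(df_highest_countries))               # ['2010']
print(df_highest_countries["2010"].iloc[0, 0])  # Europe
```

After the loop there is a single '2010' entry, and it holds Europe's row; Oceania's result was silently replaced.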

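Instead, build the key from the continent and the year so that every entry is unique, then concatenate all the data frames into one: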

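# continent is a DataFrame, so use the continent name from the row (plus the year) as the key;
# to_frame().T keeps the selected row as a one-row DataFrame
df_highest_countries[(highest_country['Continent'], year)] = highest_countries.to_frame().T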

finaldf = pd.concat(df_highest_countries, join='outer').reset_index(drop=True)
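Putting the fix together, here is a self-contained sketch of `show_highest_countries` keyed on `(continent name, year)`. It takes `years_list` as a parameter instead of hard-coding it, and its input is two years of the Oceania sample plus a hypothetical `europe` frame with placeholder names and values.

```python
import pandas as pd


def select_highest_rate(continent, year):
    return continent.loc[continent[year].idxmax()]


def show_highest_countries(continents_list, years_list):
    df_highest_countries = {}
    for continent in continents_list:
        for year in years_list:
            highest_country = select_highest_rate(continent, year)
            highest_countries = highest_country[["Continent", "Country Name", year]]
            # (continent name, year) is unique per iteration, so nothing is overwritten
            key = (highest_country["Continent"], year)
            # to_frame().T turns the row Series back into a one-row DataFrame
            df_highest_countries[key] = highest_countries.to_frame().T
    # join='outer' keeps every year column; cells for the other years become NaN.
    # The transpose left all columns as object dtype, so infer_objects restores floats.
    return (pd.concat(df_highest_countries, join="outer")
              .reset_index(drop=True)
              .infer_objects())


oceania = pd.DataFrame({"Continent": ["Oceania"] * 3,
                        "Country Name": ["Australia", "Fiji", "New Zealand"],
                        "2010": [11.4, 20.1, 17.0],
                        "2011": [11.4, 20.1, 17.2]})
# Hypothetical placeholder continent
europe = pd.DataFrame({"Continent": ["Europe", "Europe"],
                       "Country Name": ["Country A", "Country B"],
                       "2010": [3.0, 5.0],
                       "2011": [4.0, 2.0]})

finaldf = show_highest_countries([oceania, europe], ["2010", "2011"])
print(finaldf)
```

The result has one row per (continent, year) pair in insertion order; each row carries a value only in the year it was selected for.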

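Alternatively, avoid the nested for loop altogether and reshape the data with melt and groupby. Then, if you need the years back as columns, use pivot_table.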

df = pd.concat(continents_list)

# MELT FOR YEAR VALUES IN COLUMN
df = pd.melt(df, id_vars=['Continent', 'Country Name', 'Country Code'], var_name='Year')

# AGGREGATE HIGHEST VALUE AND MERGE BACK TO ORIGINAL SET
df = df.groupby(['Continent', 'Year'])['value'].max().reset_index().\
        merge(df, on=['Continent', 'Year', 'value'])

# PIVOT BACK TO YEAR COLUMNS
pvt = df.pivot_table(index=['Continent', 'Country Name', 'Country Code'],
                     columns='Year', values='value').reset_index()
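For a runnable version of the melt/groupby/pivot_table approach, the sketch below feeds it a small sample input (two Oceania years plus a hypothetical `europe` frame with placeholder names, codes and values). The pipeline follows the code above; only the sample data and the parenthesised line continuation are new.

```python
import pandas as pd

oceania = pd.DataFrame({"Continent": ["Oceania"] * 3,
                        "Country Name": ["Australia", "Fiji", "New Zealand"],
                        "Country Code": ["AUS", "FJI", "NZL"],
                        "2010": [11.4, 20.1, 17.0],
                        "2011": [11.4, 20.1, 17.2]})
# Hypothetical placeholder continent
europe = pd.DataFrame({"Continent": ["Europe", "Europe"],
                       "Country Name": ["Country A", "Country B"],
                       "Country Code": ["AAA", "BBB"],
                       "2010": [3.0, 5.0],
                       "2011": [4.0, 2.0]})
continents_list = [oceania, europe]

df = pd.concat(continents_list)

# Melt the year columns into (Year, value) pairs
df = pd.melt(df, id_vars=["Continent", "Country Name", "Country Code"], var_name="Year")

# Highest value per continent and year, merged back to recover the country
df = (df.groupby(["Continent", "Year"])["value"].max().reset_index()
        .merge(df, on=["Continent", "Year", "value"]))

# Pivot back so each year is a column again
pvt = df.pivot_table(index=["Continent", "Country Name", "Country Code"],
                     columns="Year", values="value").reset_index()
print(pvt)
```

Because the group maximum is merged back on `value`, a tie keeps every country that shares the maximum, so a continent can list more than one country for a year. If exactly one row per continent and year is wanted, `df.loc[df.groupby(['Continent', 'Year'])['value'].idxmax()]` on the melted frame is a common alternative that keeps the first maximum.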

Source: https://habr.com/ru/post/1674090/

