How to flatten / combine multiple columns with similar information based on one index column in pandas?

Question

I have a question about flattening or collapsing a dataframe that holds several groups of columns in one row per key into several rows that share the same key column, each with its corresponding data. Suppose a dataframe looks something like this:

import pandas as pd

df = pd.DataFrame({'CODE': ['AA', 'BB', 'CC'],
                   'START_1': ['1990-01-01', '2000-01-01', '2005-01-01'],
                   'END_1': ['1990-02-14', '2000-03-01', '2005-12-31'],
                   'MEANING_1': ['SOMETHING', 'OR', 'OTHER'],
                   'START_2': ['1990-02-15', None, '2006-01-01'],
                   'END_2': ['1990-06-14', None, '2006-12-31'],
                   'MEANING_2': ['ELSE', None, 'ANOTHER']})
  CODE     START_1       END_1  MEANING_1     START_2       END_2 MEANING_2
0   AA  1990-01-01  1990-02-14  SOMETHING  1990-02-15  1990-06-14      ELSE
1   BB  2000-01-01  2000-03-01         OR        None        None      None
2   CC  2005-01-01  2005-12-31      OTHER  2006-01-01  2006-12-31   ANOTHER

and I need to get it in a form like this:

  CODE       START         END    MEANING
0   AA  1990-01-01  1990-02-14  SOMETHING
1   AA  1990-02-15  1990-06-14       ELSE
2   BB  2000-01-01  2000-03-01         OR
3   CC  2005-01-01  2005-12-31      OTHER
4   CC  2006-01-01  2006-12-31    ANOTHER

I have a solution as follows:

df_a = df[['CODE', 'START_1', 'END_1', 'MEANING_1']]
df_b = df[['CODE', 'START_2', 'END_2', 'MEANING_2']]
df_a = df_a.rename(index=str, columns={'CODE': 'CODE',
                                'START_1': 'START',
                                'END_1': 'END',
                                'MEANING_1': 'MEANING'})
df_b = df_b.rename(index=str, columns={'CODE': 'CODE',
                                'START_2': 'START',
                                'END_2': 'END',
                                'MEANING_2': 'MEANING'})
df = pd.concat([df_a, df_b], ignore_index=True)
df = df.dropna(axis=0, how='any')

This gives the desired result. However, it does not seem very Pythonic, and it clearly does not scale when more than two groups of columns need to be collapsed (my real code has 6). I looked at groupby(), melt() and stack(), but have not found them very useful so far. Any suggestions would be appreciated.

+4

python pandas

DrPiranoid Feb 04 '18 at 0:35

5

You can also use melt:

df1=df.melt('CODE')

df1[['New','New2']]=df1.variable.str.split('_',expand=True)
df1.set_index(['CODE','New2','New']).value.unstack()
Out[492]: 
New               END    MEANING       START
CODE New2                                   
AA   1     1990-02-14  SOMETHING  1990-01-01
     2     1990-06-14       ELSE  1990-02-15
BB   1     2000-03-01         OR  2000-01-01
     2           None       None        None
CC   1     2005-12-31      OTHER  2005-01-01
     2     2006-12-31    ANOTHER  2006-01-01
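The unstacked frame above still carries the empty BB / 2 row and a two-level index. A minimal sketch of how it might be finished off to reach the layout asked for in the question, assuming CODE is unique per input row; the names long_df, FIELD, GROUP and result are illustrative and not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({'CODE': ['AA', 'BB', 'CC'],
                   'START_1': ['1990-01-01', '2000-01-01', '2005-01-01'],
                   'END_1': ['1990-02-14', '2000-03-01', '2005-12-31'],
                   'MEANING_1': ['SOMETHING', 'OR', 'OTHER'],
                   'START_2': ['1990-02-15', None, '2006-01-01'],
                   'END_2': ['1990-06-14', None, '2006-12-31'],
                   'MEANING_2': ['ELSE', None, 'ANOTHER']})

# Long form: one row per (CODE, original column name).
long_df = df.melt(id_vars='CODE')

# Split e.g. 'START_1' into the field name ('START') and the group number ('1').
long_df[['FIELD', 'GROUP']] = long_df['variable'].str.split('_', expand=True)

# Pivot the field names back into columns: one row per (CODE, GROUP).
wide = long_df.set_index(['CODE', 'GROUP', 'FIELD'])['value'].unstack('FIELD')

# Drop groups that are entirely empty, flatten the index, restore column order.
result = wide.dropna(how='all').reset_index()
result = result[['CODE', 'START', 'END', 'MEANING']]
result.columns.name = None
print(result)
```

Because unstack sorts the index, the rows come out grouped by CODE and then by group number, so no extra sort is needed here.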

+3

Wen Feb 04 '18 at 3:38

Here is one way. You need to specify common_cols, var_cols and data_count up front.

common_cols = ['CODE']
var_cols = ['START', 'END', 'MEANING']
data_count = 2

dfs = {i: df[common_cols + [k+'_'+str(int(i)) for k in var_cols]].\
          rename(columns=lambda x: x.split('_')[0]) for i in range(1, data_count+1)}

pd.concat(list(dfs.values()), ignore_index=True)

#   CODE       START         END    MEANING
# 0   AA  1990-01-01  1990-02-14  SOMETHING
# 1   BB  2000-01-01  2000-03-01         OR
# 2   CC  2005-01-01  2005-12-31      OTHER
# 3   AA  1990-02-15  1990-06-14       ELSE
# 4   BB        None        None       None
# 5   CC  2006-01-01  2006-12-31    ANOTHER
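The concatenated frame still contains the empty BB row and is ordered by group rather than by CODE. A rough sketch of one way to tidy it into the form the question asks for, written as an explicit loop; the names parts, group_cols and result are illustrative, and the sort assumes the START values are ISO-formatted date strings so that string order matches date order:

```python
import pandas as pd

df = pd.DataFrame({'CODE': ['AA', 'BB', 'CC'],
                   'START_1': ['1990-01-01', '2000-01-01', '2005-01-01'],
                   'END_1': ['1990-02-14', '2000-03-01', '2005-12-31'],
                   'MEANING_1': ['SOMETHING', 'OR', 'OTHER'],
                   'START_2': ['1990-02-15', None, '2006-01-01'],
                   'END_2': ['1990-06-14', None, '2006-12-31'],
                   'MEANING_2': ['ELSE', None, 'ANOTHER']})

common_cols = ['CODE']
var_cols = ['START', 'END', 'MEANING']
data_count = 2  # number of column groups (6 in the asker's real data)

parts = []
for i in range(1, data_count + 1):
    # Columns belonging to group i, e.g. START_1, END_1, MEANING_1.
    group_cols = [f'{name}_{i}' for name in var_cols]
    part = df[common_cols + group_cols]
    # Rename START_1 -> START etc. so that every part has the same columns.
    parts.append(part.rename(columns=dict(zip(group_cols, var_cols))))

result = (pd.concat(parts, ignore_index=True)
          .dropna(subset=var_cols, how='all')    # remove groups with no data
          .sort_values(common_cols + ['START'])  # group rows by CODE, then date
          .reset_index(drop=True))
print(result)
```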

0

jpp Feb 04 '18 at 0:48

Another option, using set_index and iloc:

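df = df.set_index("CODE")
# Sort the columns so that each field's groups are adjacent: END_1, END_2, MEANING_1, ...
df = df.sort_index(axis=1)
# Strip the trailing _1 / _2 suffix from every column name
df.columns = [str(col)[:-2] for col in df.columns]
# Every 2nd column starting at offset i belongs to one group; stack the groups
pd.concat([df.iloc[:, range(len(df.columns))[i::2]] for i in range(2)])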

For the OP's real case with 6 groups, change 2 to 6:

pd.concat([df.iloc[:, range(len(df.columns))[i::6]] for i in range(6)])

0

Tai Feb 04 '18 at 1:15

Another way:

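# Strip the _1 / _2 suffixes, which leaves duplicate column names
df.columns = [i[0] for i in df.columns.str.split('_')]
df = df.T
# Mark the second occurrence of each field name
cond = df.index.duplicated()
concat_df = pd.concat([df[~cond], df[cond]], axis=1).T
# Sort by START (missing values go last) and drop the final, empty row
sort_df = concat_df.sort_values('START').iloc[:-1]
# Fill the missing CODE values from the row above (relies on the START
# ordering keeping each CODE's rows together)
sort_df['CODE'] = sort_df['CODE'].ffill()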

0

thomas.mac Feb 04 '18 at 1:15

Scott Boston · Accepted Answer · 2018-02-04T01:28:05+0000

Use pd.wide_to_long:

# suffix is a regular expression; r'\d+' (the default) matches the numeric group suffix
pd.wide_to_long(df, stubnames=['END', 'MEANING', 'START'],
                i='CODE', j='Number', sep='_', suffix=r'\d+')

Output:

                    END    MEANING       START
CODE Number                                   
AA   1       1990-02-14  SOMETHING  1990-01-01
BB   1       2000-03-01         OR  2000-01-01
CC   1       2005-12-31      OTHER  2005-01-01
AA   2       1990-06-14       ELSE  1990-02-15
BB   2             None       None        None
CC   2       2006-12-31    ANOTHER  2006-01-01

If you wish, you can remove the empty rows with dropna and drop the Number column, e.g. df.reset_index().drop(columns='Number').
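Putting those pieces together, a minimal end-to-end sketch of the wide_to_long approach might look like the following. It assumes CODE uniquely identifies each input row (wide_to_long requires this of its i column) and relies on the default numeric suffix pattern; the names long_df and result are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'CODE': ['AA', 'BB', 'CC'],
                   'START_1': ['1990-01-01', '2000-01-01', '2005-01-01'],
                   'END_1': ['1990-02-14', '2000-03-01', '2005-12-31'],
                   'MEANING_1': ['SOMETHING', 'OR', 'OTHER'],
                   'START_2': ['1990-02-15', None, '2006-01-01'],
                   'END_2': ['1990-06-14', None, '2006-12-31'],
                   'MEANING_2': ['ELSE', None, 'ANOTHER']})

# Reshape: columns named <stub>_<number> become rows indexed by (CODE, Number).
long_df = pd.wide_to_long(df, stubnames=['START', 'END', 'MEANING'],
                          i='CODE', j='Number', sep='_')

result = (long_df.dropna(how='all')   # remove groups with no data (BB / 2)
          .sort_index()               # order by CODE, then by group number
          .reset_index()
          .drop(columns='Number'))
# Make the column order explicit instead of relying on the stubnames order.
result = result[['CODE', 'START', 'END', 'MEANING']]
print(result)
```

With six groups of columns, as in the asker's real data, the same call should work unchanged, since the group numbers are discovered from the column names rather than listed by hand.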
