I have a question about flattening (collapsing) a DataFrame: going from one row per key with several groups of columns to several rows that share the same key column, each holding one group of data. Suppose a DataFrame looks something like this:
import pandas as pd

df = pd.DataFrame({'CODE': ['AA', 'BB', 'CC'],
                   'START_1': ['1990-01-01', '2000-01-01', '2005-01-01'],
                   'END_1': ['1990-02-14', '2000-03-01', '2005-12-31'],
                   'MEANING_1': ['SOMETHING', 'OR', 'OTHER'],
                   'START_2': ['1990-02-15', None, '2006-01-01'],
                   'END_2': ['1990-06-14', None, '2006-12-31'],
                   'MEANING_2': ['ELSE', None, 'ANOTHER']})
  CODE     START_1       END_1  MEANING_1     START_2       END_2  MEANING_2
0   AA  1990-01-01  1990-02-14  SOMETHING  1990-02-15  1990-06-14       ELSE
1   BB  2000-01-01  2000-03-01         OR        None        None       None
2   CC  2005-01-01  2005-12-31      OTHER  2006-01-01  2006-12-31    ANOTHER
and I need to get it in a form like this:
  CODE       START         END    MEANING
0   AA  1990-01-01  1990-02-14  SOMETHING
1   AA  1990-02-15  1990-06-14       ELSE
2   BB  2000-01-01  2000-03-01         OR
3   CC  2005-01-01  2005-12-31      OTHER
4   CC  2006-01-01  2006-12-31    ANOTHER
I have a solution as follows:
df_a = df[['CODE', 'START_1', 'END_1', 'MEANING_1']]
df_b = df[['CODE', 'START_2', 'END_2', 'MEANING_2']]
df_a = df_a.rename(index=str, columns={'CODE': 'CODE',
                                       'START_1': 'START',
                                       'END_1': 'END',
                                       'MEANING_1': 'MEANING'})
df_b = df_b.rename(index=str, columns={'CODE': 'CODE',
                                       'START_2': 'START',
                                       'END_2': 'END',
                                       'MEANING_2': 'MEANING'})
df = pd.concat([df_a, df_b], ignore_index=True)
df = df.dropna(axis=0, how='any')
This gives the desired result. However, it does not seem very Pythonic, and it clearly does not scale well when there are more than two groups of columns to collapse (my real data has 6). I have looked at groupby(), melt(), and stack(), but have not yet found a way to make them work for this. Any suggestions would be appreciated.
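For reference, one direction that might fit is pandas' `pd.wide_to_long`, which is meant for columns that share a stub name plus a numeric suffix. The following is only a sketch under some assumptions: every group follows the `NAME_<number>` pattern, `CODE` is unique per row in the wide frame, and the helper column name `GROUP` is made up here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({'CODE': ['AA', 'BB', 'CC'],
                   'START_1': ['1990-01-01', '2000-01-01', '2005-01-01'],
                   'END_1': ['1990-02-14', '2000-03-01', '2005-12-31'],
                   'MEANING_1': ['SOMETHING', 'OR', 'OTHER'],
                   'START_2': ['1990-02-15', None, '2006-01-01'],
                   'END_2': ['1990-06-14', None, '2006-12-31'],
                   'MEANING_2': ['ELSE', None, 'ANOTHER']})

stubs = ['START', 'END', 'MEANING']

long_df = (
    # Split columns such as START_1 into the stub 'START' and the suffix 1;
    # the suffix is stored in the hypothetical helper index level 'GROUP'.
    pd.wide_to_long(df, stubnames=stubs, i='CODE', j='GROUP', sep='_')
    # Drop groups that were entirely empty in the wide frame (BB, group 2).
    .dropna(subset=stubs, how='all')
    .reset_index()
    # Order rows by key, then by group number, to match the desired layout.
    .sort_values(['CODE', 'GROUP'])
    .drop(columns='GROUP')
    .reset_index(drop=True)
    [['CODE'] + stubs]
)

print(long_df)
```

If this holds up, it should not need any changes when there are 6 groups instead of 2, since the suffixes are discovered from the column names rather than listed by hand.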