I am new to pandas. I have exam data by sponsor, sponsor class, and year:
import pandas as pd
df = pd.DataFrame({
'sponsor': ['A71991', 'A71991', 'A71991', 'A81001', 'A81001'],
'sponsor_class': ['Industry', 'Industry', 'Industry', 'NIH', 'NIH'],
'year': [2012, 2013, 2013, 2012, 2013],
'passed': [True, False, True, True, True],
})
Now I want to write a CSV file with one row per sponsor and its class, plus columns for the total count and the passed count for each year:
sponsor,sponsor_class,2012_total,2012_passed,2013_total,2013_passed
A71991,Industry,1,1,2,1
A81001,NIH,1,1,1,1
How do I get from df to this restructured frame? I think I need to group by sponsor and sponsor_class, then pivot the total count and the count of rows where passed is True by year, and then flatten those columns. (I know I'll finish with mydf.to_csv().)
I tried to start with this:
df_g = df.groupby(['sponsor', 'sponsor_class', 'year', 'passed'])
But it gives me an empty data frame.
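For what it's worth, a groupby on its own does not compute anything: it returns a DataFrameGroupBy object, and rows only appear once an aggregation is applied to it. A minimal sketch on the sample frame above, using size() to count the rows in each group (the name counts is just illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'sponsor': ['A71991', 'A71991', 'A71991', 'A81001', 'A81001'],
    'sponsor_class': ['Industry', 'Industry', 'Industry', 'NIH', 'NIH'],
    'year': [2012, 2013, 2013, 2012, 2013],
    'passed': [True, False, True, True, True],
})

# groupby alone only describes the grouping; size() counts the rows per group
counts = df.groupby(['sponsor', 'sponsor_class', 'year', 'passed']).size()
print(counts)
```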
I think I need a pivot table somewhere to spread out the year and the passed status ... but I do not know where to start.
UPDATE: Getting somewhere:
df_g = df_completed.pivot_table(index=['lead_sponsor', 'lead_sponsor_class'],
columns='year',
aggfunc=len, fill_value=0)
df_g[['passed']]
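Now I need to work out (1) how to get the total count as well as the passed count, and (2) how to flatten the columns for the CSV.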
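One possible way to cover both open points, sketched against the small sample frame from the top of the question rather than df_completed: aggregate passed per sponsor, class, and year, move year into the columns with unstack, then join the two column levels into single names. The intermediate names agg, wide, and result are illustrative, and the column order is an assumption based on the desired layout.

```python
import pandas as pd

df = pd.DataFrame({
    'sponsor': ['A71991', 'A71991', 'A71991', 'A81001', 'A81001'],
    'sponsor_class': ['Industry', 'Industry', 'Industry', 'NIH', 'NIH'],
    'year': [2012, 2013, 2013, 2012, 2013],
    'passed': [True, False, True, True, True],
})

# Per sponsor/class/year: total = number of rows, passed = number of True values
agg = (df.groupby(['sponsor', 'sponsor_class', 'year'])['passed']
         .agg(total='count', passed='sum'))

# Move year into the columns; the columns become a (measure, year) MultiIndex
wide = agg.unstack('year', fill_value=0)

# Flatten (measure, year) into "year_measure" names, then order them by year
wide.columns = [f'{year}_{measure}' for measure, year in wide.columns]
years = sorted(df['year'].unique())
wide = wide[[f'{y}_{m}' for y in years for m in ('total', 'passed')]]

# Turn sponsor and sponsor_class back into ordinary columns and emit CSV text
result = wide.reset_index()
print(result.to_csv(index=False))
```

Summing a boolean column counts its True values, which is why no separate filter on passed is needed. On the sample data this should give one row per sponsor in the layout shown earlier.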