Expanding a panel data set in Python (pandas)

I am trying to expand the following data into a balanced panel. I am a Stata user, and in Stata this is solved by the "fillin" command. I am now trying to reproduce that command in Python and could not find an equivalent.

For example, I want to convert this data frame (my real data frame is bigger; this example just illustrates what I want to do):

id  year    X       Y
1   2008    10      20
1   2010    15      25
2   2011    2       4
2   2012    3       6

into this:

id  year    X       Y
1   2008    10      20
1   2009    .       .
1   2010    15      25
1   2011    .       .
1   2012    .       .
2   2008    .       .
2   2009    .       .
2   2010    .       .
2   2011    2       4
2   2012    3       6

Thank you, and sorry for my English.

2 answers

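This can be done by reindexing on every (id, year) combination. Older pandas versions also allowed df.loc[idx] here, but in recent versions .loc[] raises a KeyError when any of the requested labels is missing from the index, so reindex() is the safe choice.

from itertools import product
import pandas as pd

df = pd.DataFrame([[1, 2008, 10, 20], [1, 2010, 15, 25], [2, 2011, 2, 4], [2, 2012, 3, 6]],
                  columns=['id', 'year', 'X', 'Y'])
df = df.set_index(['id', 'year'])

# All (id, year) combinations. Using df.index.levels[1] for the years would
# only give the years that occur in the data (no 2009), so the range is explicit.
idx = pd.MultiIndex.from_tuples(list(product(range(1, 3), range(2008, 2013))),
                                names=['id', 'year'])

# reindex() adds a NaN row for every combination that is missing from df
df.reindex(idx)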
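
Stata's fillin also adds a `_fillin` variable that is 1 for the rows it created and 0 for the original ones. The "all combinations" idea above can be expressed as a left merge against a grid of every (id, year) pair, which makes that marker easy to recreate. This is only a sketch: the helper name `fillin` and its arguments are made up for illustration, and it assumes integer years and a full min-to-max year range.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2008, 10, 20], [1, 2010, 15, 25], [2, 2011, 2, 4], [2, 2012, 3, 6]],
    columns=["id", "year", "X", "Y"],
)


def fillin(frame, id_col, time_col):
    """Hypothetical helper: balance a panel and flag the rows that were added."""
    ids = sorted(frame[id_col].unique())
    # Assumption: every integer year between the min and max should appear
    times = range(int(frame[time_col].min()), int(frame[time_col].max()) + 1)
    # Grid with one row per (id, time) pair
    grid = pd.MultiIndex.from_product(
        [ids, times], names=[id_col, time_col]
    ).to_frame(index=False)
    # The left merge keeps every grid row; _merge records whether frame had a match
    out = grid.merge(frame, on=[id_col, time_col], how="left", indicator=True)
    out["_fillin"] = (out["_merge"] == "left_only").astype(int)
    return out.drop(columns="_merge")


filled = fillin(df, "id", "year")
print(filled)
```

Rows with `_fillin == 1` are the ones that did not exist in the original data, so they can be filtered or filled separately later.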

Create a new MultiIndex from the DataFrame and then reindex:

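import numpy as np
import pandas as pd

# df is the original frame, with id and year as ordinary columns
n_ids = df.id.nunique()
n_years = df.year.max() - df.year.min() + 1

# Repeat the full year range once per id, and each id once per year
years = np.tile(np.arange(df.year.min(), df.year.max() + 1), n_ids)
ids = np.repeat(df.id.unique(), n_years)
new_idx = pd.MultiIndex.from_arrays([ids, years], names=['id', 'year'])

df = df.set_index(['id', 'year'])

df.reindex(new_idx).reset_index()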

    id  year    X       Y
0   1   2008    10.0    20.0
1   1   2009    NaN     NaN
2   1   2010    15.0    25.0
3   1   2011    NaN     NaN
4   1   2012    NaN     NaN
5   2   2008    NaN     NaN
6   2   2009    NaN     NaN
7   2   2010    NaN     NaN
8   2   2011    2.0     4.0
9   2   2012    3.0     6.0
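
The index construction above can likely be written more compactly with `pd.MultiIndex.from_product`, which builds the Cartesian product of ids and years directly. The following is a minimal sketch under the same assumptions as the answer (integer years, every year between the minimum and maximum wanted); the variable name `balanced` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2008, 10, 20], [1, 2010, 15, 25], [2, 2011, 2, 4], [2, 2012, 3, 6]],
    columns=["id", "year", "X", "Y"],
)

# Full year range, including years that never occur in the data (e.g. 2009)
years = range(int(df["year"].min()), int(df["year"].max()) + 1)

# One index entry per (id, year) pair
new_idx = pd.MultiIndex.from_product(
    [df["id"].unique(), years], names=["id", "year"]
)

# Missing pairs become rows of NaN
balanced = df.set_index(["id", "year"]).reindex(new_idx).reset_index()
print(balanced)
```

Because X and Y now contain NaN, pandas stores them as floats, which is why the output shows 10.0 rather than 10.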

Source: https://habr.com/ru/post/1688548/

