I have a pandas DataFrame with a column that looks like this:
0        M
1        E
2        L
3      M.1
4      M.2
5      M.3
6      E.1
7      E.2
8      E.3
9      E.4
10     L.1
11     L.2
12   M.1.a
13   M.1.b
14   M.1.c
15   M.2.a
16   M.3.a
17   E.1.a
18   E.1.b
19   E.1.c
20   E.2.a
21   E.3.a
22   E.3.b
23   E.4.a
I need to group the values by their first element (E, M, or L), then within each group create a subgroup for each index (1, 2, 3, ...), which in turn contains an entry for each lowercase letter (a, b, c, ...). Ideally, the solution should work for any number of dot-separated levels; in this case there are 3 (for example: A.1.a). The desired result is:
0  1  2
E  1  a
      b
      c
   2  a
   3  a
      b
   4  a
L  1
   2
M  1  a
      b
      c
   2  a
   3  a
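For reference, here is a minimal sketch of how a frame like the one used below could be built. It assumes the original column is named `code` (a hypothetical name, not from my real data) and that columns 0, 1, and 2 come from splitting each value on ".":

```python
import pandas as pd

# Hypothetical column name "code"; the values are the ones listed above.
values = ["M", "E", "L", "M.1", "M.2", "M.3", "E.1", "E.2", "E.3", "E.4",
          "L.1", "L.2", "M.1.a", "M.1.b", "M.1.c", "M.2.a", "M.3.a",
          "E.1.a", "E.1.b", "E.1.c", "E.2.a", "E.3.a", "E.3.b", "E.4.a"]
raw = pd.DataFrame({"code": values})

# Split on "." into one column per level (columns are labelled 0, 1, 2).
# Shorter values are padded with missing values, so the same call should
# handle any number of levels.
df = raw["code"].str.split(".", expand=True)

print(df.shape)
print(df.tail())
```

With this layout, rows such as `L.1` have a missing value in column 2, which matters for the groupby attempt that follows.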
I tried:
df.groupby([0,1,2]).count()
But the result has no L level, because the L rows have no entry at the last sublevel (by default, groupby drops rows whose key is missing).
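To make the cause concrete, here is a small sketch on a cut-down frame (illustrative data, not my full column). It uses `size()` instead of `count()` so the result is a plain Series. As far as I understand, `groupby` accepts `dropna=False` from pandas 1.1 onward, which keeps missing keys as their own group; whether that covers every case in my real data is an assumption I have not verified:

```python
import pandas as pd

# Cut-down illustrative frame: the L rows have no third level.
df = pd.DataFrame({0: ["E", "E", "L", "L"],
                   1: ["1", "1", "1", "2"],
                   2: ["a", "b", None, None]})

# Default behaviour: rows with a missing key are dropped, so L disappears.
default = df.groupby([0, 1, 2]).size()

# dropna=False (pandas >= 1.1) keeps the missing key as its own group.
kept = df.groupby([0, 1, 2], dropna=False).size()

print(default)
print(kept)
```

In the second result the L rows survive with a missing value at the last level, without any dummy entry.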
My current workaround is to add a dummy value and delete it afterwards, for example:
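# Mark L rows that have a second level but no third level with a dummy 'x'
# (.loc avoids chained assignment)
df.loc[(df[0] == 'L') & df[2].isnull() & df[1].notnull(), 2] = 'x'
df = df.replace(np.nan, ' ', regex=True)
df.sort_values(0, ascending=False, inplace=True)
newdf = df.groupby([0, 1, 2]).count()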
which gives:
0  1  2
E  1  a
      b
      c
   2  a
   3  a
      b
   4  a
L  1  x
   2  x
M  1  a
      b
      c
   2  a
   3  a
Then I process the dummy x entry later in my code.
How can I avoid this workaround and get the result from groupby directly?