How to reshape a Python 3 pandas.DataFrame of grouped counts into wide tabular form?

I have a pandas DataFrame that I group by specific columns and count.

import pandas as pd
df = pd.DataFrame({'x':list('aaabbbbbccccc'),'y':list('2225555577777'), 'z':list('1312223224432')})
# count the rows for each (x, y, z) combination
df.groupby(['x','y','z'])['z'].count()
# or
df.groupby(['x','y','z'])['z'].agg(['count'])
# or
df.groupby(['x','y','z'])['z'].count().reset_index(name='counts')

Results:

   x  y  z  counts
0  a  2  1       2
1  a  2  3       1
2  b  5  2       4
3  b  5  3       1
4  c  7  2       2
5  c  7  3       1
6  c  7  4       2

How can I convert the result to the following form?

   x  y  1  2  3  4
0  a  2  2  0  1  0
1  b  5  0  4  1  0
2  c  7  0  2  1  2
3 answers

You will need to use unstack + reset_index:

(df.groupby(['x','y','z'])['z']
   .count()
   .unstack(-1, fill_value=0)
   .reset_index()
   .rename_axis(None, axis=1)
)

   x  y  1  2  3  4
0  a  2  2  0  1  0
1  b  5  0  4  1  0
2  c  7  0  2  1  2

Note that you can replace df.groupby(['x','y','z'])['z'].count() with df.groupby(['x','y','z']).size() for compactness, but be aware that size also counts NaNs.
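To illustrate the difference between count and size, here is a minimal sketch. The frame toy and its columns k and v are hypothetical and not part of the question. The missing value sits in a value column, not in a grouping key.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: one value in column 'v' is missing.
toy = pd.DataFrame({'k': ['a', 'a', 'b'], 'v': [1.0, np.nan, 3.0]})

g = toy.groupby('k')
counts = g['v'].count()  # non-NaN values per group: 'a' -> 1, 'b' -> 1
sizes = g.size()         # all rows per group:       'a' -> 2, 'b' -> 1

print(counts)
print(sizes)
```

If the counted column never contains NaN, the two calls give the same numbers.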


This is a kind of crosstab:

pd.crosstab([df.x,df.y],df.z).reset_index()
Out[81]: 
z  x  y  1  2  3  4
0  a  2  2  0  1  0
1  b  5  0  4  1  0
2  c  7  0  2  1  2
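The crosstab result keeps z as the name of the column axis, which is why a z appears in the header row. As a sketch, assuming you want exactly the header from the question, the same rename_axis call used with unstack should remove it:

```python
import pandas as pd

df = pd.DataFrame({'x': list('aaabbbbbccccc'),
                   'y': list('2225555577777'),
                   'z': list('1312223224432')})

wide = (pd.crosstab([df.x, df.y], df.z)
          .reset_index()
          .rename_axis(None, axis=1))  # drop the leftover 'z' axis label

print(wide)
```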

PROJECT/KILL <-- (read: overkill)


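This uses Pandas factorize. pd.factorize returns an integer code for each value, along with the array of unique values.

With integer codes for the rows and for the columns, the counting can be done with Numpy bincount.

np.bincount counts how many times each non-negative integer occurs and treats each integer as a "bin", starting from 0. Combining a row code i and a column code j as i * m + j, where m is the number of columns, gives every (row, column) pair its own bin. Passing minlength=n * m keeps the bins that are empty, so the result can be reshaped into an n-by-m table.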

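import numpy as np

# integer codes for each (x, y) pair and for each z value
tups = list(zip(df.x, df.y))
i, r = pd.factorize(tups)
j, c = pd.factorize(df.z)
n, m = len(r), len(c)
# one bin per (row, column) pair, reshaped into an n-by-m count table
b = np.bincount(i * m + j, minlength=n * m).reshape(n, m)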

pd.DataFrame(
    np.column_stack([r.tolist(), b]),
    columns=['x', 'y'] + c.tolist()
)

   x  y  1  3  2  4
0  a  2  2  1  0  0
1  b  5  0  1  4  0
2  c  7  0  1  2  2
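The bin arithmetic i * m + j is easier to see on a tiny example. This is only a sketch. The codes below are made up for illustration and are not taken from df.

```python
import numpy as np

# Hypothetical codes for five observations: i is the row group, j the column.
i = np.array([0, 0, 1, 1, 1])
j = np.array([0, 1, 1, 1, 0])
n, m = 2, 2  # number of distinct row groups and of distinct column values

flat = i * m + j  # one bin per (row, column) pair: [0, 1, 3, 3, 2]
table = np.bincount(flat, minlength=n * m).reshape(n, m)

print(table)
# [[1 1]
#  [1 2]]
```

Row 1, column 1 was seen twice, so bin 3 holds a 2. Every other pair was seen once.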

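Notice that the z columns are not sorted

This is because Pandas factorize returns the unique values in order of first appearance. Numpy unique can produce similar integer codes (with return_inverse=True), but it sorts the values, which takes O(n * log(n)) time. pd.factorize skips the sort, which is why it was used here instead of Numpy.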
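A small sketch of the ordering difference between pd.factorize and np.unique. The array z below is a hypothetical stand-in for the column, chosen so that '3' appears before '2'.

```python
import numpy as np
import pandas as pd

# Hypothetical values; '3' is seen before '2'.
z = np.array(['1', '3', '1', '2'], dtype=object)

codes_f, uniq_f = pd.factorize(z)                    # order of first appearance
uniq_u, codes_u = np.unique(z, return_inverse=True)  # sorted unique values
codes_s, uniq_s = pd.factorize(z, sort=True)         # sorted, like np.unique

print(list(uniq_f), codes_f.tolist())  # ['1', '3', '2'] [0, 1, 0, 2]
print(list(uniq_u), codes_u.tolist())  # ['1', '2', '3'] [0, 2, 0, 1]
print(list(uniq_s), codes_s.tolist())  # ['1', '2', '3'] [0, 2, 0, 1]
```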

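To get the z columns in sorted order, as in the OP's expected output, the unique values have to be sorted after all. Instead of switching to Numpy, pass sort=True to pd.factorize: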

tups = list(zip(df.x, df.y))
i, r = pd.factorize(tups)
j, c = pd.factorize(df.z, sort=True)
n, m = len(r), len(c)
b = np.bincount(i * m + j, minlength=n * m).reshape(n, m)

pd.DataFrame(
    np.column_stack([r.tolist(), b]),
    columns=['x', 'y'] + c.tolist()
)

   x  y  1  2  3  4
0  a  2  2  0  1  0
1  b  5  0  4  1  0
2  c  7  0  2  1  2

Source: https://habr.com/ru/post/1695219/

