Flatten a Series in pandas, i.e. a Series whose elements are lists

I have a Series of the form:

import pandas as pd
s = pd.Series([['a','a','b'],['b','b','c','d'],[],['a','b','e']])

which looks like

0       [a, a, b]
1    [b, b, c, d]
2              []
3       [a, b, e]
dtype: object

I would like to count how many elements I have in total. My naive attempts, such as

s.values.hist()

or

s.values.flatten()

do not work. What am I doing wrong?

2 answers
s.map(len).sum()

does the trick. s.map(len) applies len() to each element and returns a Series of all the lengths; you can then just call sum on that Series.
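As a minimal sketch of why the attempts in the question fail and what the one-liner returns: s.values is a one-dimensional object array whose four items are Python lists, so flatten() has nothing to unnest. The variable names below are illustrative only.

```python
import pandas as pd

s = pd.Series([['a', 'a', 'b'], ['b', 'b', 'c', 'd'], [], ['a', 'b', 'e']])

# s.values is already 1-D (four list objects), so flatten() just
# returns a copy holding the same four lists.
flat_attempt = s.values.flatten()

# Length of each list, then the grand total.
lengths = s.map(len)
total = int(lengths.sum())
print(lengths.tolist(), total)
```

Here lengths holds 3, 4, 0 and 3, so the total is 10.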


Personally, I like to keep arrays in DataFrames, with one column per element. That gives you much more functionality. So here is my alternative approach:

>>> import pandas as pd
>>> raw = [['a', 'a', 'b'], ['b', 'b', 'c', 'd'], [], ['a', 'b', 'e']]
>>> df = pd.DataFrame(raw)
>>> df
Out[217]: 
      0     1     2     3
0     a     a     b  None
1     b     b     c     d
2  None  None  None  None
3     a     b     e  None

Now count the non-missing values in each row:

>>> df.count(axis=1)
Out[226]: 
0    3
1    4
2    0
3    3

Calling sum() on this result gives the total number of elements.
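The row counts above can be summed in one step. This is a small sketch that assumes the lists themselves contain no None or NaN entries, since count() would skip those as well.

```python
import pandas as pd

raw = [['a', 'a', 'b'], ['b', 'b', 'c', 'd'], [], ['a', 'b', 'e']]
df = pd.DataFrame(raw)

# count(axis=1) counts the non-missing cells in each row, which
# matches the length of each original list (padding is skipped).
per_row = df.count(axis=1)
total = int(per_row.sum())
print(per_row.tolist(), total)
```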

If you also want the distribution of the values, one way is to count the values in each column:

>>> # DataFrame.iteritems() was removed in pandas 2.0; items() is the equivalent
>>> foo = [col.value_counts() for _, col in df.items()]
>>> foo
Out[246]: 
[a    2
 b    1
 dtype: int64, b    2
 a    1
 dtype: int64, b    1
 c    1
 e    1
 dtype: int64, d    1
 dtype: int64]

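foo now holds the value counts for each column. Each column still means "xth value", so column 0 holds the counts of the "first values" of the lists.

Next, "sum them up".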

>>> df2 = pd.DataFrame(foo)
>>> df2
Out[266]: 
    a   b   c   d   e
0   2   1 NaN NaN NaN
1   1   2 NaN NaN NaN
2 NaN   1   1 NaN   1
3 NaN NaN NaN   1 NaN
>>> df2.sum(axis=0)
Out[264]: 
a    3
b    4
c    1
d    1
e    1
dtype: float64

This approach uses only pandas.
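For comparison, here is a shorter sketch that is not the approach from this answer: it assumes pandas 0.25 or later, where Series.explode is available, and flattens the Series directly before counting.

```python
import pandas as pd

s = pd.Series([['a', 'a', 'b'], ['b', 'b', 'c', 'd'], [], ['a', 'b', 'e']])

# explode() gives every list element its own row; an empty list
# becomes a single NaN row.
flat = s.explode()

# value_counts() drops NaN by default, so the empty list adds nothing.
counts = flat.value_counts().sort_index()
total = int(counts.sum())
print(counts.to_dict(), total)
```

The counts match the column sums shown above (a 3, b 4, c 1, d 1, e 1), and their total is the overall number of elements.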


Source: https://habr.com/ru/post/1543150/