Pandas Dataframe: how to parse integers into a string of 0s and 1s?

Question

I have the following pandas DataFrame.

import pandas as pd
df = pd.read_csv('filename.csv')

print(df)

      sample      column_A         
0     sample1        6/6    
1     sample2        0/4
2     sample3        2/6    
3     sample4       12/14   
4     sample5       15/21   
5     sample6       12/12   
..    ....

The values in column_A are not fractions. I need to convert each value into a string of 0s and 1s (this is not a conversion of the integers to their binary representations).

The "numerator" above gives the number of 1s, and the "denominator" gives the total number of 0s and 1s together.

So, the table should be in the following format:

      sample      column_A         
0     sample1     111111    
1     sample2     0000
2     sample3     110000    
3     sample4     11111111111100    
4     sample5     111111111111111000000 
5     sample6     111111111111  
..    ....

I have never parsed integers into strings of 0s and 1s like this. How can I do it? Is there a pandas method to use with lambda expressions? Pythonic string parsing, or regex?

+4

python pandas regex parsing

ShanZhengYang Jul 25 '16 at 15:07

3

df2 = df['column_A'].str.split('/', expand=True).astype(int)\
                    .assign(ones='1').assign(zeros='0')

df2
Out: 
    0   1 ones zeros
0   6   6    1     0
1   0   4    1     0
2   2   6    1     0
3  12  14    1     0
4  15  21    1     0
5  12  12    1     0

(df2[0] * df2['ones']).str.cat((df2[1]-df2[0])*df2['zeros'])
Out: 
0                   111111
1                     0000
2                   110000
3           11111111111100
4    111111111111111000000
5             111111111111
dtype: object
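
The multiplication in the snippet above relies on Python's string repetition: an integer times a string repeats the string, and pandas applies this elementwise when the string column holds plain Python strings. A minimal sketch with made-up values (the names `counts` and `ones` are illustrative only, and `dtype=object` is set explicitly as an assumption about how the column is stored):

```python
import pandas as pd

# Plain Python: an int times a str repeats the str
print(3 * '1')  # 111

# Illustrative Series; object dtype keeps the elements as Python str
counts = pd.Series([6, 0, 2])
ones = pd.Series(['1', '1', '1'], dtype=object)

# Elementwise int * str, the same kind of operation as df2[0] * df2['ones']
repeated = counts * ones
print(repeated.tolist())
```

A count of 0 yields an empty string, which is why a row such as 0/4 ends up with zeros only.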

+4

ayhan 25 . '16 15:35

Here are some alternative solutions using .str.extract() and .str.repeat():

In [187]: x = df.column_A.str.extract(r'(?P<ones>\d+)/(?P<len>\d+)', expand=True).astype(int).assign(o='1', z='0')

In [188]: x
Out[188]:
   ones  len  o  z
0     6    6  1  0
1     0    4  1  0
2     2    6  1  0
3    12   14  1  0
4    15   21  1  0
5    12   12  1  0

In [189]: x.o.str.repeat(x.ones) + x.z.str.repeat(x.len-x.ones)
Out[189]:
0                   111111
1                     0000
2                   110000
3           11111111111100
4    111111111111111000000
5             111111111111
dtype: object

Or a slower one-liner (it uses two apply() calls):

In [190]: %paste
(df.column_A.str.extract(r'(?P<one>\d+)/(?P<len>\d+)', expand=True)
   .astype(int)
   .apply(lambda x: ['1'] * x.one + ['0'] * (x.len-x.one), axis=1)
   .apply(''.join)
)
## -- End pasted text --
Out[190]:
0                   111111
1                     0000
2                   110000
3           11111111111100
4    111111111111111000000
5             111111111111
dtype: object
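
To see what the named groups in the pattern above capture, it may help to run the same regex on a single string with Python's re module. This is only an illustration on one hard-coded value ('12/14', taken from the question's table); the variable names are made up for the sketch:

```python
import re

# Same pattern as in the str.extract() call: two named groups around '/'
pattern = re.compile(r'(?P<ones>\d+)/(?P<len>\d+)')

match = pattern.match('12/14')
# groupdict() maps each group name to the captured text; convert to int
groups = {name: int(value) for name, value in match.groupdict().items()}
print(groups)  # {'ones': 12, 'len': 14}

# Build the string: 'ones' ones followed by (len - ones) zeros
bits = '1' * groups['ones'] + '0' * (groups['len'] - groups['ones'])
print(bits)  # 11111111111100
```

With expand=True, str.extract() returns these groups as DataFrame columns named ones and len, one row per input string.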

+1

Maxu Jul 25 '16 at 18:20

Ami Tavory · Accepted Answer · 2016-07-25T15:16:12+0000

First, write a function that converts a single value:

def to_binary(s):
    n_d = s.split('/')
    n, d = int(n_d[0]), int(n_d[1])
    return '1' * n + '0' * (d - n)

For example,

>>> to_binary('4/5')
'11110'

Then use pandas.Series.apply:

 df.column_A.apply(to_binary)
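
Putting the pieces together, here is a self-contained sketch that also writes the result back into the DataFrame. The inline DataFrame is an assumption standing in for filename.csv, using the six rows shown in the question:

```python
import pandas as pd

# Hypothetical sample data standing in for filename.csv
df = pd.DataFrame({
    'sample': ['sample1', 'sample2', 'sample3', 'sample4', 'sample5', 'sample6'],
    'column_A': ['6/6', '0/4', '2/6', '12/14', '15/21', '12/12'],
})

def to_binary(s):
    # 'n/d' -> n ones followed by (d - n) zeros
    n, d = (int(part) for part in s.split('/'))
    return '1' * n + '0' * (d - n)

# Overwrite the column with the converted strings
df['column_A'] = df['column_A'].apply(to_binary)
print(df)
```

Assigning back to df['column_A'] replaces the original 'n/d' strings; assign to a new column name instead if the original values are still needed.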
