Pandas DataFrame: how to parse integers into a string of 0s and 1s?

I have the following pandas DataFrame.

import pandas as pd
df = pd.read_csv('filename.csv')

print(df)

      sample      column_A         
0     sample1        6/6    
1     sample2        0/4
2     sample3        2/6    
3     sample4       12/14   
4     sample5       15/21   
5     sample6       12/12   
..    ....

The values in column_A are not fractions. I need to process this data so that each value is converted into a string of 0s and 1s (not convert the integers into their binary representations).

The "numerator" above gives the total number of 1s, and the "denominator" gives the total number of 0s and 1s together.

So, the table should be in the following format:

      sample      column_A         
0     sample1     111111    
1     sample2     0000
2     sample3     110000    
3     sample4     11111111111100    
4     sample5     111111111111111000000 
5     sample6     111111111111  
..    ....

I have never parsed integers into strings of 0s and 1s like this. How can I do it? Is there a pandas method I can use with lambda expressions? Or should I use Pythonic string parsing or a regex?


Write a function that converts a single value:

def to_binary(s):
    # "n/d" -> n ones followed by (d - n) zeros
    n, d = map(int, s.split('/'))
    return '1' * n + '0' * (d - n)

For example:

>>> to_binary('4/5')
'11110'
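If the column might contain malformed entries, a stricter variant can validate the format with a regex before building the string. This is a sketch, not part of the original answer: the name `to_binary_strict` and the accepted format (non-negative integers around a '/', optional surrounding whitespace) are assumptions.

```python
import re

# Hypothetical stricter variant of to_binary: accepts only "n/d" made of
# non-negative integers (surrounding whitespace allowed) and rejects n > d.
_FRACTION = re.compile(r'\s*(\d+)\s*/\s*(\d+)\s*')

def to_binary_strict(s):
    m = _FRACTION.fullmatch(s)
    if m is None:
        raise ValueError(f'expected "n/d", got {s!r}')
    n, d = int(m.group(1)), int(m.group(2))
    if n > d:
        raise ValueError(f'numerator exceeds denominator in {s!r}')
    # n ones followed by d - n zeros, so the total length is d
    return '1' * n + '0' * (d - n)

print(to_binary_strict('2/6'))      # 110000
print(to_binary_strict(' 12/14 '))  # 11111111111100
```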

Then apply to_binary to the whole column with pandas.Series.apply:

 df.column_A.apply(to_binary)
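Put together, a minimal runnable version of this answer might look like the following. It assumes the sample rows are typed in directly rather than read from filename.csv, and that column_A should be overwritten with the 0/1 strings, as in the desired output.

```python
import pandas as pd

def to_binary(s):
    # "n/d" -> n ones followed by (d - n) zeros
    n, d = map(int, s.split('/'))
    return '1' * n + '0' * (d - n)

# Sample rows typed in by hand instead of read from filename.csv (assumption)
df = pd.DataFrame({
    'sample': ['sample1', 'sample2', 'sample3', 'sample4', 'sample5', 'sample6'],
    'column_A': ['6/6', '0/4', '2/6', '12/14', '15/21', '12/12'],
})

# Overwrite column_A with the strings of 0s and 1s
df['column_A'] = df['column_A'].apply(to_binary)
print(df)
```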

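Alternatively, a vectorized approach: split the column into integer numerator and denominator columns, and add helper columns holding the characters '1' and '0':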

df2 = df['column_A'].str.split('/', expand=True).astype(int)\
                    .assign(ones='1').assign(zeros='0')

df2
Out: 
    0   1 ones zeros
0   6   6    1     0
1   0   4    1     0
2   2   6    1     0
3  12  14    1     0
4  15  21    1     0
5  12  12    1     0

(df2[0] * df2['ones']).str.cat((df2[1]-df2[0])*df2['zeros'])
Out: 
0                   111111
1                     0000
2                   110000
3           11111111111100
4    111111111111111000000
5             111111111111
dtype: object
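One caveat that applies to this and the other answers: in Python, `'0' * k` returns an empty string for negative `k`, so a value whose numerator exceeds its denominator (say '7/4') would silently produce a string of n characters instead of d. A minimal guard, assuming such rows should be rejected rather than repaired; `check_fractions` is a hypothetical helper name:

```python
import pandas as pd

def check_fractions(parts):
    # Hypothetical helper. `parts` is the integer frame produced by
    # str.split('/', expand=True).astype(int): numerators in column 0,
    # denominators in column 1.
    bad = parts[0] > parts[1]
    if bad.any():
        raise ValueError(f'numerator > denominator in rows {bad[bad].index.tolist()}')
    return parts

good = pd.Series(['6/6', '0/4', '2/6']).str.split('/', expand=True).astype(int)
check_fractions(good)  # no error

broken = pd.Series(['6/6', '7/4']).str.split('/', expand=True).astype(int)
# check_fractions(broken) raises ValueError: numerator > denominator in rows [1]
```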



Here are some alternative solutions using .str.extract() and .str.repeat():

In [187]: x = df.column_A.str.extract(r'(?P<ones>\d+)/(?P<len>\d+)', expand=True).astype(int).assign(o='1', z='0')

In [188]: x
Out[188]:
   ones  len  o  z
0     6    6  1  0
1     0    4  1  0
2     2    6  1  0
3    12   14  1  0
4    15   21  1  0
5    12   12  1  0

In [189]: x.o.str.repeat(x.ones) + x.z.str.repeat(x.len-x.ones)
Out[189]:
0                   111111
1                     0000
2                   110000
3           11111111111100
4    111111111111111000000
5             111111111111
dtype: object
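Because each output string contains exactly n ones and has length d, the encoding can be reversed with .str.count() and .str.len(), which gives a cheap sanity check on any of these results. A sketch, assuming the encoded strings are in a Series called `bits` (built here with a plain loop so the block runs on its own):

```python
import pandas as pd

original = pd.Series(['6/6', '0/4', '2/6', '12/14', '15/21', '12/12'])
parts = original.str.split('/', expand=True).astype(int)

# Encoded strings, built with a plain loop so this block stands alone
bits = pd.Series(['1' * int(n) + '0' * (int(d) - int(n))
                  for n, d in (s.split('/') for s in original)])

# Round-trip check: count of '1' characters == numerator, length == denominator
assert (bits.str.count('1') == parts[0]).all()
assert (bits.str.len() == parts[1]).all()
```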

Or a slower single-line version (it calls apply() twice):

In [190]: %paste
(df.column_A.str.extract(r'(?P<one>\d+)/(?P<len>\d+)', expand=True)
   .astype(int)
   .apply(lambda x: ['1'] * x.one + ['0'] * (x.len-x.one), axis=1)
   .apply(''.join)
)
## -- End pasted text --
Out[190]:
0                   111111
1                     0000
2                   110000
3           11111111111100
4    111111111111111000000
5             111111111111
dtype: object
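A plain list comprehension is another option that avoids apply(axis=1), which hands each row to the lambda as a separate Series. This sketch is not one of the posted answers, its speed relative to them has not been measured here, and it assumes every value is a well-formed "n/d" string.

```python
import pandas as pd

df = pd.DataFrame({
    'sample': ['sample1', 'sample2', 'sample3', 'sample4', 'sample5', 'sample6'],
    'column_A': ['6/6', '0/4', '2/6', '12/14', '15/21', '12/12'],
})

# Split each "n/d" once, then build n ones followed by d - n zeros
# in a plain Python loop; no row-wise DataFrame.apply is involved.
pairs = (map(int, s.split('/')) for s in df['column_A'])
df['column_A'] = ['1' * n + '0' * (d - n) for n, d in pairs]
print(df)
```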

Source: https://habr.com/ru/post/1649048/

