NumPy: convert string representations of booleans to a boolean array

Question


Is there a numpy-native way to convert an array of string representations of booleans, for example:

['True','False','True','False']

into a real boolean array that I can use for masking / indexing? I could write a for loop and rebuild the array, but for large arrays that is slow.

+6

python numpy

Newmu Jun 05 '13 at 16:10


3 answers

I found a method that is even faster than DSM's, inspired by Eric's, although the improvement shows best with smaller lists of values; for very large lists, the cost of the Python-level iteration starts to outweigh the advantage of doing the truth check while the numpy array is being created rather than afterwards. I tested with both is and == (to cover the case where the strings are interned and the case where they may not be, since is will not work with non-interned strings; 'True' is likely to be a literal in the script, so it should be interned). My version with == was slower than the one with is, but it was still much faster than DSM's version.

Test setup:

    >>> import timeit
    >>> def timer(statement, count):
    ...     setup = ("from random import choice; import numpy as np; "
    ...              "x = [choice(['True', 'False']) for i in range(%i)]" % count)
    ...     return timeit.repeat(statement, setup)
    ...
    >>> stateIs = "y = np.fromiter((e is 'True' for e in x), bool)"
    >>> stateEq = "y = np.fromiter((e == 'True' for e in x), bool)"
    >>> stateDSM = "y = np.array(x) == 'True'"

With 1000 items, the faster statements take about 66% of the DSM time:

    >>> timer(stateIs, 1000)
    [101.77722641656146, 100.74985342340369, 101.47228618107965]
    >>> timer(stateEq, 1000)
    [112.26464996250706, 112.50754567379681, 112.76057346127709]
    >>> timer(stateDSM, 1000)
    [155.67689949529995, 155.96820504501557, 158.32394669279802]

For smaller lists of strings (hundreds of items rather than thousands), the elapsed time is less than 50% of DSM's:

    >>> timer(stateIs, 100)
    [11.947757485669172, 11.927990253608186, 12.057855628259858]
    >>> timer(stateEq, 100)
    [13.064947253943501, 13.161545451986967, 13.30599035623618]
    >>> timer(stateDSM, 100)
    [31.270060799078237, 30.941749748808434, 31.253922641324607]

With 50 items in the list, it is a little over 25% of the DSM time:

    >>> timer(stateIs, 50)
    [6.856538342483873, 6.741083326021908, 6.708402786859551]
    >>> timer(stateEq, 50)
    [7.346079345032194, 7.312723444475523, 7.309259899921017]
    >>> timer(stateDSM, 50)
    [24.154247576229864, 24.173593700599667, 23.946403452288905]

For 5 items, it is about 11% of the DSM time:

    >>> timer(stateIs, 5)
    [1.8826215278058953, 1.850232652068371, 1.8559381315990322]
    >>> timer(stateEq, 5)
    [1.9252821868467436, 1.894011299061276, 1.894306935199893]
    >>> timer(stateDSM, 5)
    [18.060974208809057, 17.916322392367874, 17.8379771602049]
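For reference, the == variant from the timings can be written out as a small standalone sketch. The helper name strings_to_bool is made up for illustration, and passing count to np.fromiter is an optional addition that was not part of the timed statements; it only lets numpy preallocate the output when the length is known.

```python
import numpy as np

def strings_to_bool(strings):
    """Hypothetical helper: True where the string equals 'True'."""
    # == compares by value, so the result does not depend on string interning
    return np.fromiter((s == 'True' for s in strings), dtype=bool,
                       count=len(strings))

x = ['True', 'False', 'True', 'False']
y = strings_to_bool(x)

# A string built at runtime (possibly not interned) still compares equal
z = strings_to_bool([''.join(['Tr', 'ue'])])
```

I would stick with == unless the strings are known to be interned, since the is variant only checks object identity.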

+2

Jab Jun 05 '13 at 16:17


Is this enough?

    my_list = ['True', 'False', 'True', 'False']
    np.array([x == 'True' for x in my_list])

This is not numpy-native, but if you start with a non-numpy list anyway, it really doesn't matter.
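One thing to watch with this approach: np.array expects a sequence, so the comparison should be materialized as a list (or fed through np.fromiter) rather than passed as a bare generator. A minimal sketch of the difference, as far as I know the behavior:

```python
import numpy as np

my_list = ['True', 'False', 'True', 'False']

# A list comprehension gives np.array a real sequence to convert
mask = np.array([x == 'True' for x in my_list])

# A bare generator is not iterated; np.array wraps it as a single object
wrapped = np.array(x == 'True' for x in my_list)
```

Here mask is a 4-element boolean array usable for indexing, while wrapped is a 0-dimensional object array holding the generator itself.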

0

Eric Jun 05 '13 at 16:13


DSM · Accepted Answer · 2013-06-05T16:16:46+0000

You should be able to do a boolean comparison, IIUC, whether the dtype is string or object:

    >>> a = np.array(['True', 'False', 'True', 'False'])
    >>> a
    array(['True', 'False', 'True', 'False'], dtype='|S5')
    >>> a == "True"
    array([ True, False, True, False], dtype=bool)

or

    >>> a = np.array(['True', 'False', 'True', 'False'], dtype=object)
    >>> a
    array(['True', 'False', 'True', 'False'], dtype=object)
    >>> a == "True"
    array([ True, False, True, False], dtype=bool)
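To connect this back to the masking use case in the question, here is a rough sketch. The names labels and data and the values in data are made up for illustration. The np.isin check is an optional extra, added on the assumption that the input should contain only 'True' and 'False'; without it, any other string (say 'true' or 'TRUE') would silently compare as False.

```python
import numpy as np

labels = np.array(['True', 'False', 'True', 'False'])  # string flags, as in the question
data = np.array([10, 20, 30, 40])                       # made-up values to be masked

# Optional sanity check (an assumption about the input, not part of the answer)
if not np.isin(labels, ['True', 'False']).all():
    raise ValueError("labels contains something other than 'True'/'False'")

mask = labels == 'True'   # elementwise comparison gives a boolean array
selected = data[mask]     # boolean indexing keeps entries where mask is True
```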
