I found a method that is even faster than DSM's, inspired by Eric's. The improvement is most visible with smaller lists. With very large lists, the cost of the Python-level iteration itself begins to outweigh the advantage of performing the truth check while the NumPy array is being created rather than afterwards.

I tested both `is` and `==` to cover two cases: one where the strings are interned and one where they may not be. `is` compares object identity, so it only works when every `'True'` in the list is the very same object as the literal. Since `'True'` is likely to be a literal in the script, it should be interned, although this is not guaranteed for strings built at runtime. (On Python 3.8+, `is` with a string literal also triggers a `SyntaxWarning`.) The tests showed that my version with `==` was slower than with `is`, but it was still much faster than the DSM version.
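A minimal sketch of the two approaches being compared, assuming NumPy is installed. `np.fromiter` consumes a generator and performs the comparison while the boolean array is being filled. The DSM approach first builds a string array with `np.array` and then compares it element-wise. The last part illustrates why `==` is the safer test. It uses a hypothetical runtime-built string (via `''.join`), which is equal to `'True'` but is a separate object.

```python
import numpy as np

x = ['True', 'False', 'True', 'True', 'False']

# fromiter: the comparison runs inside the generator, so the boolean
# array is filled directly without an intermediate string array.
via_fromiter = np.fromiter((e == 'True' for e in x), dtype=bool, count=len(x))

# DSM-style: build a NumPy string array first, then compare element-wise.
via_array = np.array(x) == 'True'

print(via_fromiter.tolist())              # [True, False, True, True, False]
print((via_fromiter == via_array).all())  # True

# A string assembled at runtime equals 'True' but is a separate object,
# so an identity check is unreliable; == is the safe comparison.
TRUE = 'True'
built = [''.join(['Tr', 'ue']), 'False']
built_eq = np.fromiter((e == TRUE for e in built), dtype=bool, count=len(built))
print(built_eq.tolist())  # [True, False]
print(built[0] is TRUE)   # not guaranteed; CPython typically prints False here
```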
Test setup:
```python
import timeit

def timer(statement, count):
    # Setup: a list of `count` random 'True'/'False' strings.
    # repeat=3 matches the three results per run below (the default before Python 3.7).
    setup = ("from random import choice;import numpy as np;"
             "x = [choice(['True', 'False']) for i in range(%i)]" % count)
    return timeit.repeat(statement, setup, repeat=3)

stateIs = "y = np.fromiter((e is 'True' for e in x), bool)"
stateEq = "y = np.fromiter((e == 'True' for e in x), bool)"
stateDSM = "y = np.array(x) == 'True'"
```
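Each number returned by `timeit.repeat` is a total over `number` executions (1,000,000 by default), so the results below read as seconds per million runs. The following is a small variant that reports the best time for a single call in microseconds, which can make list sizes easier to compare. The helper name `per_call_us`, the `SETUP` constant, and the smaller default `number=1000` are my own choices for illustration, not part of the original test. It only times `==` and DSM, which avoids the `is`-with-literal warning. Absolute numbers will depend on your machine, Python, and NumPy version.

```python
import timeit

# Same setup as the original test: `count` random 'True'/'False' strings.
SETUP = ("from random import choice;import numpy as np;"
         "x = [choice(['True', 'False']) for i in range(%i)]")

def per_call_us(statement, count, number=1000, repeat=3):
    """Best-of-`repeat` time for one execution of `statement`, in microseconds."""
    totals = timeit.repeat(statement, SETUP % count, repeat=repeat, number=number)
    # Each total covers `number` executions; the minimum is the least noisy estimate.
    return min(totals) / number * 1e6

if __name__ == "__main__":
    stmt_eq = "y = np.fromiter((e == 'True' for e in x), bool)"
    stmt_dsm = "y = np.array(x) == 'True'"
    for n in (5, 50, 100, 1000):
        eq, dsm = per_call_us(stmt_eq, n), per_call_us(stmt_dsm, n)
        print(f"{n} items: == {eq:.2f} us, DSM {dsm:.2f} us, ratio {eq / dsm:.0%}")
```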
With 1000 items, the faster versions take about 65% (`is`) to 72% (`==`) of DSM's time:
```
>>> timer(stateIs, 1000)
[101.77722641656146, 100.74985342340369, 101.47228618107965]
>>> timer(stateEq, 1000)
[112.26464996250706, 112.50754567379681, 112.76057346127709]
>>> timer(stateDSM, 1000)
[155.67689949529995, 155.96820504501557, 158.32394669279802]
```
For smaller arrays (hundreds of items rather than thousands), the elapsed time is well under 50% of DSM's (about 38% for `is` and 42% for `==`):
```
>>> timer(stateIs, 100)
[11.947757485669172, 11.927990253608186, 12.057855628259858]
>>> timer(stateEq, 100)
[13.064947253943501, 13.161545451986967, 13.30599035623618]
>>> timer(stateDSM, 100)
[31.270060799078237, 30.941749748808434, 31.253922641324607]
```
With 50 items in the list, it takes a little over 25% of DSM's time (about 28% for `is` and 30% for `==`):
```
>>> timer(stateIs, 50)
[6.856538342483873, 6.741083326021908, 6.708402786859551]
>>> timer(stateEq, 50)
[7.346079345032194, 7.312723444475523, 7.309259899921017]
>>> timer(stateDSM, 50)
[24.154247576229864, 24.173593700599667, 23.946403452288905]
```
For 5 items, it takes about 10 to 11% of DSM's time:
```
>>> timer(stateIs, 5)
[1.8826215278058953, 1.850232652068371, 1.8559381315990322]
>>> timer(stateEq, 5)
[1.9252821868467436, 1.894011299061276, 1.894306935199893]
>>> timer(stateDSM, 5)
[18.060974208809057, 17.916322392367874, 17.8379771602049]
```
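The percentages can be checked by computing the ratios directly from the posted timings. This sketch simply averages the three repetitions for each statement and divides by the DSM average. Taking the minimum of each run instead gives nearly the same figures.

```python
from statistics import mean

# Timings copied from the runs above: seconds per 1,000,000 executions, 3 repetitions each.
results = {
    1000: {"is": [101.77722641656146, 100.74985342340369, 101.47228618107965],
           "==": [112.26464996250706, 112.50754567379681, 112.76057346127709],
           "DSM": [155.67689949529995, 155.96820504501557, 158.32394669279802]},
    100: {"is": [11.947757485669172, 11.927990253608186, 12.057855628259858],
          "==": [13.064947253943501, 13.161545451986967, 13.30599035623618],
          "DSM": [31.270060799078237, 30.941749748808434, 31.253922641324607]},
    50: {"is": [6.856538342483873, 6.741083326021908, 6.708402786859551],
         "==": [7.346079345032194, 7.312723444475523, 7.309259899921017],
         "DSM": [24.154247576229864, 24.173593700599667, 23.946403452288905]},
    5: {"is": [1.8826215278058953, 1.850232652068371, 1.8559381315990322],
        "==": [1.9252821868467436, 1.894011299061276, 1.894306935199893],
        "DSM": [18.060974208809057, 17.916322392367874, 17.8379771602049]},
}

def ratios(timings):
    """Mean time of each fromiter variant as a fraction of the mean DSM time."""
    dsm = mean(timings["DSM"])
    return {op: mean(timings[op]) / dsm for op in ("is", "==")}

for count, timings in results.items():
    r = ratios(timings)
    print(f"{count} items: is {r['is']:.0%} of DSM, == {r['==']:.0%} of DSM")
```

This prints:

```
1000 items: is 65% of DSM, == 72% of DSM
100 items: is 38% of DSM, == 42% of DSM
50 items: is 28% of DSM, == 30% of DSM
5 items: is 10% of DSM, == 11% of DSM
```

The relative advantage shrinks as the list grows because DSM's fixed overhead of building an array is amortized over more elements, while the generator's per-element cost stays roughly constant.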