BACKGROUND
I have a lot of numeric message codes in a NumPy array, and I need to convert them to strings quickly. I ran into performance problems and would like to understand why they occur and how to do the conversion quickly.
SOME BENCHMARKS
I - The trivial approach
import numpy as np
lookupdict = {
1: "val1",
2: "val2",
27: "val3",
35: "val4",
59: "val5" }
arr = np.random.choice(list(lookupdict.keys()), 1000000)
res = [ lookupdict[k] for k in arr ]
The dictionary lookups take the better part of my coffee break: 758 ms. (I also tried res = map(lookupdict.get, arr), but that is even worse.)
II - Without NumPy
import random
lookupdict = {
1: "val1",
2: "val2",
27: "val3",
35: "val4",
59: "val5" }
arr = [ random.choice(list(lookupdict.keys())) for _ in range(1000000) ]
res = [ lookupdict[k] for k in arr ]
The timing changes considerably, down to 76 ms!
Note that I am only interested in the timing of the lookups. The random generation just creates some test data, and it does not matter how long it takes. All benchmark results given here cover only the one million lookups.
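To make "timing only the lookups" concrete, here is a minimal sketch of how such a measurement could be set up. It uses Python 3 and the standard timeit module; the helper name lookup and the use of np.random.default_rng are my assumptions for illustration, not the setup that produced the numbers in this post.

```python
import timeit
import numpy as np

lookupdict = {1: "val1", 2: "val2", 27: "val3", 35: "val4", 59: "val5"}

# Test data is generated once, outside the timed region.
rng = np.random.default_rng(0)
arr = rng.choice(list(lookupdict), size=1000000)

def lookup(values):
    """Hypothetical helper: one dict lookup per element of `values`."""
    return [lookupdict[k] for k in values]

# Best of three runs, each run timing a single pass over the data.
best = min(timeit.repeat(lambda: lookup(arr), number=1, repeat=3))
print("lookups only: %.0f ms" % (best * 1e3))
```

Absolute numbers depend on the machine, the Python version and the NumPy version, so they should only be compared against each other on the same setup.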
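III - Converting the NumPy array to a list
My first guess was that this is a list vs. array issue, so I changed the NumPy version to iterate over a list: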
res = [ lookupdict[k] for k in list(arr) ]
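This gave me 778 ms, of which roughly 110 ms is spent converting to a list and 570 ms doing the lookups. So no luck, the time is essentially the same.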
IV - Converting np.int32 to int
Since the only other obvious difference is the data type (np.int32 vs. int), I tried converting the type on the fly:
res = [ lookupdict[int(k)] for k in arr ]
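This does something interesting: the time drops to 266 ms. It looks as if almost-but-not-quite-identical key types make the dictionary lookups expensive.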
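The type difference behind variant IV can be checked directly. This is only a small illustration (Python 3, tiny hand-made array); it shows what kind of object the loop variable is, not why the lookup is slower.

```python
import numpy as np

lookupdict = {1: "val1", 2: "val2", 27: "val3", 35: "val4", 59: "val5"}
arr = np.array([1, 27, 59], dtype=np.int32)

k = arr[0]                       # indexing or iterating yields a NumPy scalar
print(type(k).__name__)          # int32
print(type(int(k)).__name__)     # int

# The NumPy scalar hashes and compares like the matching Python int,
# which is why the lookup with a mismatched key type succeeds at all.
same_hash = hash(k) == hash(1)
same_value = bool(k == 1)
found = lookupdict[k]
print(same_hash, same_value, found)
```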
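V - Converting the dictionary keys to np.int32
To test this, I changed the NumPy version so that the dict keys and the lookup values have exactly the same data type: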
import numpy as np
lookupdict = {
np.int32(1): "val1",
np.int32(2): "val2",
np.int32(27): "val3",
np.int32(35): "val4",
np.int32(59): "val5" }
arr = np.random.choice(list(lookupdict.keys()), 1000000)
res = [ lookupdict[k] for k in arr ]
This improved the time to 177 ms. A noticeable improvement, but still a long way from 76 ms.
VI - Converting the array to int objects
import random
import numpy as np
lookupdict = {
1: "val1",
2: "val2",
27: "val3",
35: "val4",
59: "val5" }
arr = np.array([ random.choice(list(lookupdict.keys())) for _ in range(1000000) ],
dtype='object')
res = [ lookupdict[k] for k in arr ]
This gives 86 ms, which is very close to the pure-Python 76 ms.
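Variant VI builds the object array from a Python list. When the data already exists as an np.int32 array, ndarray.tolist() is one way to obtain plain int objects without a Python-level conversion loop. This is a sketch that I have not timed against the variants above.

```python
import numpy as np

lookupdict = {1: "val1", 2: "val2", 27: "val3", 35: "val4", 59: "val5"}
arr = np.array([1, 2, 27, 35, 59, 27], dtype=np.int32)

# tolist() converts the elements to the nearest Python type,
# so the list holds int objects rather than np.int32 scalars.
as_ints = arr.tolist()
res = [lookupdict[k] for k in as_ints]
print(res)
```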
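RESULT SUMMARY
- dict keys int, lookup with int (pure Python): 76 ms
- dict keys int, lookup with int objects (NumPy object array): 86 ms
- dict keys np.int32, lookup with np.int32: 177 ms
- dict keys int, lookup with np.int32: 758 ms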
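QUESTION(S)
Why? And what can I do to make the dictionary lookups as fast as possible? My input data is a NumPy array (and has to be), so the best I have so far is to convert the dict keys to np.int32. (Unfortunately, the dict keys may be spread over a large range of numbers, so direct array indexing is not a viable alternative. It is fast, though: 10 ms.)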
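For reference, one direction that stays inside NumPy and does not depend on how far apart the codes are is a sorted key table combined with np.searchsorted. This is only a sketch, not something I have benchmarked, and it assumes that every code in the array is a key of the dict (the check below raises otherwise).

```python
import numpy as np

lookupdict = {1: "val1", 2: "val2", 27: "val3", 35: "val4", 59: "val5"}

# Sorted keys and a string array aligned with them.
keys = np.array(sorted(lookupdict), dtype=np.int32)
values = np.array([lookupdict[k] for k in sorted(lookupdict)])

arr = np.array([59, 1, 27, 27, 35, 2], dtype=np.int32)

# Binary search gives the position of each code in the sorted key table.
idx = np.searchsorted(keys, arr)
idx = np.clip(idx, 0, len(keys) - 1)   # keep out-of-range codes indexable

# Assumption check: every code must actually be one of the keys.
if not np.array_equal(keys[idx], arr):
    raise KeyError("array contains codes that are not in lookupdict")

res = values[idx]                      # NumPy array of strings
print(res.tolist())
```

The result is a NumPy string array rather than a list; res.tolist() converts it if a list is needed.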