Numpy String Indexing Array

Question


I have an array of strings

 >>> lines
 array(['RL5\\Stark_223', 'RL5\\Stark_223', 'RL5\\Stark_223', ...,
        'RL5\\Stark_238', 'RL5\\Stark_238', 'RL5\\Stark_238'],
       dtype='|S27')

Why can I slice the characters of the first element of the array

 >>> lines[0][0:3]
 'RL5'

but not apply the same slice to every element of the array?

 >>> lines[:][0:3]
 array(['RL5\\Stark_223', 'RL5\\Stark_223', 'RL5\\Stark_223'],
       dtype='|S27')

Can anyone suggest a method to get the following result:

 array(['RL5', 'RL5', 'RL5', ..., 'RL5', 'RL5'])


python string arrays numpy indexing

geophys 12 sept '13 at 17:58


5 answers

Jaime · Answer 1 · 2013-09-12T20:49:23+0000

To extract the first n characters of each string, you can abuse .astype:

 >>> s = np.array(['RL5\\Stark_223', 'RL5\\Stark_223', 'RL5\\Stark_223'])
 >>> s
 array(['RL5\\Stark_223', 'RL5\\Stark_223', 'RL5\\Stark_223'],
       dtype='|S13')
 >>> s.astype('|S3')
 array(['RL5', 'RL5', 'RL5'],
       dtype='|S3')
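For readers on Python 3, here is a minimal sketch of the same idea. It assumes the array holds bytes (dtype 'S27', as in the question); the sample values are shortened from the question and the variable names are only illustrative. It also shows why lines[:][0:3] does not do what the question hoped: lines[:] is still the whole array, so the second index selects array elements, not characters.

```python
import numpy as np

# Sample data modelled on the question (bytes literals, fixed width 27).
lines = np.array(
    [b'RL5\\Stark_223', b'RL5\\Stark_223', b'RL5\\Stark_238', b'RL5\\Stark_238'],
    dtype='S27',
)

# lines[:] is a view of the entire array, so [0:3] picks the first
# three ELEMENTS rather than the first three characters of each one.
first_three = lines[:][0:3]

# Casting to a narrower fixed-width type truncates every element.
prefix = lines.astype('S3')
print(prefix)
```

This only reaches a prefix of each string; a slice that starts in the middle needs a different approach.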

Daniel · Answer 2 · 2013-09-12T18:14:32+0000

Do not forget chararrays!

 lines.view(np.chararray).ljust(3)
 chararray(['RL5', 'RL5', 'RL5', 'RL5', 'RL5', 'RL5'],
       dtype='|S3')

Although it is strangely slower:

 # Extend lines to 600000 elements
 %timeit lines.view(np.chararray).ljust(3)
 1 loops, best of 3: 542 ms per loop
 %timeit np.vectorize(lambda x: x[:3])(lines)
 1 loops, best of 3: 239 ms per loop
 %timeit map(lambda s: s[0:3], lines)
 1 loops, best of 3: 243 ms per loop
 %timeit arr.astype('|S3')
 100 loops, best of 3: 4.72 ms per loop

Maybe that is because it duplicates the data. The advantage of this approach is that the dtype of the output array is minimized: S3 vs S64.

Donald anderson · Answer 3 · 2013-09-12T18:06:34+0000

Try this:

 map(lambda s:s[0:3],lines)
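The one-liner above comes from a Python 2 session, where map returns a list. In Python 3, map returns a lazy iterator, so a list comprehension wrapped in np.array is one way to get an array back. A small sketch, assuming a bytes array like the one in the question (the sample data is illustrative):

```python
import numpy as np

lines = np.array([b'RL5\\Stark_223', b'RL5\\Stark_238'], dtype='S27')

# Slice each element in plain Python, then rebuild an array.
# NumPy sizes the result to the longest slice, here 3 bytes.
prefixes = np.array([s[:3] for s in lines])
print(prefixes.dtype)
```

This loops in Python, so it is expected to be much slower on large arrays than the astype approach, in line with the timings reported in the other answers.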

Andy hayden · Answer 4 · 2013-09-12T18:06:47+0000

You can use np.vectorize:

 In [11]: np.vectorize(lambda x: x[:3])(lines)
 Out[11]:
 array(['RL5', 'RL5', 'RL5', 'RL5', 'RL5', 'RL5'],
       dtype='|S64')

user2561747 · Answer 5 · 2016-11-03T21:00:06+0000

If you are looking for a quick and (somewhat more) flexible way, try:

 lines.view('|S1').reshape(-1, lines.dtype.itemsize)[:, :3].reshape(-1).view('|S3')

This can be adapted for more arbitrary slicing of each string.
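As a rough sketch of how the view trick generalizes, the hypothetical helper below (slice_fixed_width is not a NumPy function) takes an arbitrary character range from each element. It assumes a 1-D bytes array with a fixed-width 'S' dtype and a non-empty slice.

```python
import numpy as np

def slice_fixed_width(arr, start, stop):
    """Hypothetical helper: bytes [start:stop) of each element of a 1-D 'S' array."""
    width = arr.dtype.itemsize
    # View every element as `width` single bytes: shape (n, width).
    chars = np.ascontiguousarray(arr).view('S1').reshape(-1, width)
    piece = chars[:, start:stop]
    n = piece.shape[1]
    # reshape(-1) copies the non-contiguous slice into one flat buffer,
    # which can then be reinterpreted as n-byte strings.
    return piece.reshape(-1).view('S%d' % n)

lines = np.array([b'RL5\\Stark_223', b'RL5\\Stark_238'], dtype='S27')
middle = slice_fixed_width(lines, 4, 9)    # the bytes after 'RL5\'
suffix = slice_fixed_width(lines, 10, 13)  # the trailing digits
print(middle, suffix)
```

Because it works on raw bytes, this sketch does not apply as written to unicode ('U') arrays, where each character occupies four bytes.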

Timing information:

 import numpy as np
 lines = np.array(['RL5\\Stark_223', 'RL5\\Stark_223', 'RL5\\Stark_223',
                   'RL5\\Stark_238', 'RL5\\Stark_238', 'RL5\\Stark_238'],
                  dtype='|S27').repeat(100000)
 %timeit lines.view(np.chararray).ljust(3)
 1 loop, best of 3: 231 ms per loop
 %timeit np.vectorize(lambda x: x[:3])(lines)
 1 loop, best of 3: 226 ms per loop
 %timeit map(lambda s: s[0:3], lines)
 1 loop, best of 3: 171 ms per loop
 %timeit lines.astype('|S3')
 100 loops, best of 3: 3.58 ms per loop
 %timeit lines.view('|S1').reshape(-1, lines.dtype.itemsize)[:, :3].reshape(-1).view('|S3')
 100 loops, best of 3: 5.16 ms per loop
