Efficiently create a multidimensional array from a list of strings requiring .split (',')

Question


I am trying to move a simple calculation that currently runs in a `for` loop onto a numpy array. The calculation works on a list of strings of the form:

strings = ['12,34', '56,78'...]

I need to:

1. Split each string on the comma and turn it into two integer entries, for example,
   strings = [[12, 34], [56, 78]...]
2. Filter this nested list to keep only the members that meet some arbitrary criteria, for example, both numbers in the sublist falling into a certain range.
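A minimal sketch of these two steps with numpy, assuming the ranges used in the benchmark code (0 to 50 for the first number, 50 to 100 for the second) as the "arbitrary criteria". The third sample string '40,90' is hypothetical, added so that one row passes the filter:

```python
import numpy as np

strings = ['12,34', '56,78', '40,90']  # '40,90' is a hypothetical extra sample

# Step 1: split each string on ',' and convert both parts to int -> shape (3, 2)
pairs = np.array([[int(v) for v in s.split(',')] for s in strings])

# Step 2: boolean mask keeping rows with 0 <= first <= 50 and 50 <= second <= 100
mask = (pairs[:, 0] >= 0) & (pairs[:, 0] <= 50) & (pairs[:, 1] >= 50) & (pairs[:, 1] <= 100)
filtered = pairs[mask]

print(filtered.tolist())  # [[40, 90]]
```

Step 2 is fast because the comparisons run over whole columns at once. Step 1 is still a Python-level loop, which is where the overhead described below comes from.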

I am trying to get to know numpy, but I could not take advantage of its speed without adding overhead when processing the original list. For example, my instinct was to use split() and int() in Python to build the array, but that turned out to be more costly than a simple for loop.

In addition, I cannot work out how to combine the numpy operations needed for this on an array created from the original list. Is there any reasonable way to do this, or is numpy a lost cause for such tasks when the array is used only once?

Below is the code I used to compare the pure Python approach with numpy:

import random
import datetime as dt
import numpy as np

# 100,000 test strings of the form "lon,lat", both integers in [1, 100]
raw_locs = [str(random.randint(1, 100)) + ',' + str(random.randint(1, 100))
            for x in range(100000)]

if __name__ == '__main__':

    # Python approach: split, convert and filter in a single loop
    start1 = dt.datetime.now()
    results = []
    for point in raw_locs:
        lon, lat = point.split(",")
        lat = int(lat)
        lon = int(lon)
        if 0 <= lon <= 50 and 50 <= lat <= 100:
            results.append(point)
    end1 = dt.datetime.now()

    # Python list comprehension prior to numpy array
    start2 = dt.datetime.now()
    converted_list = [list(map(int, item.split(','))) for item in raw_locs]
    end2 = dt.datetime.now()

    # List comprehension + numpy array creation
    start3 = dt.datetime.now()
    arr = np.array([list(map(int, item.split(','))) for item in raw_locs])
    end3 = dt.datetime.now()

    # Vectorised filtering with a boolean mask on the (N, 2) array
    start4 = dt.datetime.now()
    results2 = arr[((0 <= arr[:, 0]) & (arr[:, 0] <= 50)
                    & (50 <= arr[:, 1]) & (arr[:, 1] <= 100))]
    end4 = dt.datetime.now()

    # Print results
    print("Pure python for whole solution took:                {}".format(end1 - start1))
    print("Just python list comprehension prior to array took: {}".format(end2 - start2))
    print("Comprehension + array creation took:                {}".format(end3 - start3))
    print("Numpy actual calculation took:                      {}".format(end4 - start4))
    print("Total numpy time:                                   {}".format(end4 - start3))


python arrays numpy

roganjosh · Dec 1 '16 18:24


Andras Deak · Accepted Answer · 2016-12-01T18:39:22+0000

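First of all, for benchmarking you should use timeit rather than differences of datetime.now() calls, although here the differences are large enough to be meaningful. As your timings show, the expensive part of the numpy approach is not the calculation itself but building the input for np.array() from the list of strings.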
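A minimal sketch of timing with timeit instead of datetime differences, assuming the same kind of random test data as the question. The function names pure_python and list_then_array are hypothetical:

```python
import random
import timeit
import numpy as np

# Same kind of test data as the question: "x,y" with x, y in [1, 100]
raw_locs = ['{},{}'.format(random.randint(1, 100), random.randint(1, 100))
            for _ in range(100000)]

def pure_python():
    # Split, convert and filter in one loop (the question's first approach)
    results = []
    for point in raw_locs:
        lon, lat = point.split(',')
        if 0 <= int(lon) <= 50 and 50 <= int(lat) <= 100:
            results.append(point)
    return results

def list_then_array():
    # Build a nested list in Python, then convert it with np.array()
    return np.array([[int(v) for v in s.split(',')] for s in raw_locs])

# timeit.repeat calls func `number` times per run and returns one total per run;
# the minimum run divided by `number` is the least noisy per-call estimate
for name, func in [('pure python', pure_python), ('list + np.array', list_then_array)]:
    best = min(timeit.repeat(func, number=3, repeat=3)) / 3
    print('{:<20} {:.4f} s per call'.format(name, best))
```

Taking the minimum of several runs rather than a single measurement reduces the influence of other processes on the result.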

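My suggestion: since the input is already text, join the strings into a single comma-separated string, parse it with numpy.fromstring and reshape the result: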

arr = np.fromstring(','.join(raw_locs),sep=',').reshape(-1,2)

With this, the timings are:

Pure python for whole solution took:                0:00:00.128965
Just python list comprehension prior to array took: 0:00:00.156092
Comprehension + array creation took:                0:00:00.186023
Join + fromstring took:                             0:00:00.035040
Numpy actual calculation took:                      0:00:00.001355
Total numpy time:                                   0:00:00.222454
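An end-to-end sketch of the join + fromstring approach followed by the same boolean-mask filter as in the question, run on a small hypothetical sample rather than the 100,000-item benchmark list:

```python
import numpy as np

raw_locs = ['12,34', '56,78', '40,90']  # hypothetical sample data

# Join into one "12,34,56,78,40,90" string so numpy parses it in a single call;
# reshape(-1, 2) turns the flat result back into (lon, lat) rows
arr = np.fromstring(','.join(raw_locs), sep=',').reshape(-1, 2)

# Vectorised filter: 0 <= lon <= 50 and 50 <= lat <= 100
lon, lat = arr[:, 0], arr[:, 1]
results = arr[(0 <= lon) & (lon <= 50) & (50 <= lat) & (lat <= 100)]

print(results.tolist())  # [[40.0, 90.0]]
```

The only Python-level work left is the single ','.join call; splitting and number conversion happen inside numpy.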

Note that the resulting array has dtype numpy.float64 by default. If you need integers, pass dtype=np.int64 to fromstring.
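A short sketch of the dtype difference, using hypothetical sample strings:

```python
import numpy as np

raw_locs = ['12,34', '56,78']  # hypothetical sample data
joined = ','.join(raw_locs)

# Default: the text is parsed as float64
as_float = np.fromstring(joined, sep=',').reshape(-1, 2)
# With dtype=np.int64 the text is parsed directly as 64-bit integers
as_int = np.fromstring(joined, dtype=np.int64, sep=',').reshape(-1, 2)

print(as_float.dtype, as_int.dtype)  # float64 int64
print(as_int.tolist())               # [[12, 34], [56, 78]]
```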

