How to efficiently expand arrays in Python?

My question is how to efficiently expand an array by copying its rows many times. I am trying to expand my poll samples into a complete dataset by copying each sample N times, where N is the influence factor assigned to that sample. I wrote two nested loops to accomplish this (script below). It works, but it is slow. My sample size is 20,000, and I am trying to expand it to 3 million rows in full. Is there a function I can try? Thanks for the help!

---- My script ----

lines = np.asarray(person.read().split('\n'))
df_array = np.asarray(lines[0].split(' '))
for j in range(1,len(lines)-1):
    subarray = np.asarray(lines[j].split(' '))
    factor = int(round(float(subarray[-1]),0))
    for i in range(1,factor):
        df_array = np.vstack((df_array, subarray))
print len(df_array)
+4
3 answers

First, you can try loading the data directly with numpy.loadtxt.
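
As a minimal sketch of that loading step, assuming the file is whitespace-separated and purely numeric with the factor in the last column (the sample text below is made up for illustration, and io.StringIO stands in for the real file):

```python
import io
import numpy as np

# Hypothetical sample: two numeric fields followed by the influence factor.
sample_text = "1.0 2.0 3.2\n4.0 5.0 1.7\n"

# loadtxt accepts a filename or a file-like object and returns a float array.
data = np.loadtxt(io.StringIO(sample_text))

print(data.shape)  # (2, 3)
```

If the real file contains non-numeric columns, loadtxt would need a dtype or converters argument, which this sketch does not cover.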

Then, to repeat the rows, you can use numpy.repeat:

>>> data = np.array([[1, 2, 3],
...                  [4, 5, 6]])
>>> np.repeat(data, data[:,-1], axis=0)
array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [4, 5, 6],
       [4, 5, 6],
       [4, 5, 6],
       [4, 5, 6],
       [4, 5, 6],
       [4, 5, 6]])

Finally, if you need to round the factors, replace data[:,-1] with np.round(data[:,-1]).astype(int).
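
Putting the rounding and the repeat together might look like the following sketch. The data values and the names counts and expanded are made up for illustration:

```python
import numpy as np

# Hypothetical data: the last column holds a non-integer influence factor.
data = np.array([[1.0, 2.0, 2.6],
                 [4.0, 5.0, 1.2]])

# np.repeat needs integer repeat counts, so round the factors first.
counts = np.round(data[:, -1]).astype(int)  # 2.6 -> 3, 1.2 -> 1

# Repeat row i counts[i] times along axis 0.
expanded = np.repeat(data, counts, axis=0)

print(expanded.shape)  # (4, 3)
```

Note that np.round rounds exact halves to the nearest even integer (2.5 becomes 2), so factors ending in .5 may need separate handling if a different rule is wanted.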

+2

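Growing a numpy array piece by piece is slow. Every call to vstack allocates a new array and copies all the existing rows into it.

Instead, you can generate the repeated rows first, for example with a generator, and build the array in a single step: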

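def upsample(stream):
    # Yield each record as many times as its factor (the last field) says.
    for line in stream:
        rec = line.strip().split()
        factor = int(round(float(rec[-1]), 0))
        for i in range(factor):  # range works in both Python 2 and 3
            yield rec

# Build the array once instead of calling vstack for every copy.
df_array = np.array(list(upsample(person)))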
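
For a self-contained run of the same idea in Python 3 syntax, here is a sketch. The name expand_rows and the sample text are made up for illustration, io.StringIO stands in for the open file, and the blank-line check is an added assumption about the input:

```python
import io
import numpy as np

def expand_rows(stream):
    """Yield each parsed record as many times as its last field says."""
    for line in stream:
        rec = line.split()
        if not rec:  # skip blank lines (assumed possible at end of file)
            continue
        factor = int(round(float(rec[-1])))
        for _ in range(factor):
            yield rec

# Hypothetical two-line file: the last field is the factor.
sample = io.StringIO("a b 2\nc d 3\n")

# All rows are collected first, then converted to an array in one step.
rows = np.array(list(expand_rows(sample)))

print(rows.shape)  # (5, 3)
```

The resulting array holds strings, because the fields are not converted; a numeric dtype would need an explicit conversion of each field.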
+1

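As another option, you can use numpy broadcasting, which lets you fill an n-dimensional array from an (n-1)-dimensional one.

That way, you do not need to call np.vstack() at all. The copy is done in a single assignment.

For example, given a 1-D array of length n,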

>>> n = 5
>>> df_array = np.arange(n)
>>> df_array
array([0, 1, 2, 3, 4])

you can expand it to a 10 x n array:

>>> bigger_array = np.empty([10,n])
>>> bigger_array[:] = df_array
>>> bigger_array
array([[0., 1., 2., 3., 4.],
       [0., 1., 2., 3., 4.],
       [0., 1., 2., 3., 4.],
       [0., 1., 2., 3., 4.],
       [0., 1., 2., 3., 4.],
       [0., 1., 2., 3., 4.],
       [0., 1., 2., 3., 4.],
       [0., 1., 2., 3., 4.],
       [0., 1., 2., 3., 4.],
       [0., 1., 2., 3., 4.]])

So, with a single line of code, you can fill it with the contents of a smaller array.

bigger_array[:] = df_array
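
As a runnable sketch of this assignment, the snippet below also shows np.tile, which is not part of the answer above but produces the same repeated layout in one call. The name tiled is made up for illustration:

```python
import numpy as np

n = 5
df_array = np.arange(n)

# Preallocate once, then let broadcasting copy the 1-D array into every row.
bigger_array = np.empty((10, n))
bigger_array[:] = df_array

# np.tile repeats the array 10 times along a new first axis
# and keeps the original integer dtype.
tiled = np.tile(df_array, (10, 1))

print(bigger_array.shape, tiled.shape)  # (10, 5) (10, 5)
```

Both forms copy one row a fixed number of times. They do not by themselves handle a different factor for each row.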

NB: Avoid using Python lists for large numeric data. They are much slower than NumPy ndarrays.

+1

Source: https://habr.com/ru/post/1620830/

