How to efficiently expand arrays in python?

Question

My question is how to efficiently expand an array by copying its rows many times. I am trying to expand my poll samples into a complete dataset by copying each sample N times, where N is the influence factor assigned to that sample. I wrote two nested loops to do this (script below). It works, but it is slow. My sample size is 20,000, and I am trying to expand it to 3 million rows in total. Is there a function I can use? Thanks for the help!

---- My script ----

lines = np.asarray(person.read().split('\n'))
df_array = np.asarray(lines[0].split(' '))
for j in range(1,len(lines)-1):
    subarray = np.asarray(lines[j].split(' '))
    factor = int(round(float(subarray[-1]),0))
    for i in range(1,factor):
        df_array = np.vstack((df_array, subarray))
print(len(df_array))

+4

python arrays numpy

Angela y Dec 18 '15 at 23:52

3 answers

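Don't grow a numpy array row by row. Every call to vstack allocates a new array and copies everything accumulated so far, so the loop gets slower as the result grows.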

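Instead, if you want to keep the per-line logic, you can collect the rows with a generator and convert them to an array once at the end, like this: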

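import numpy as np

def upsample(stream):
    # Yield each record as many times as its last column (the factor) says.
    for line in stream:
        rec = line.strip().split()
        if not rec:
            continue  # skip blank lines
        factor = int(round(float(rec[-1])))
        for _ in range(factor):
            yield rec

# Build the array once, after all rows have been generated.
df_array = np.array(list(upsample(person)))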

+1

fivetentaylor Dec 19 '15 at 0:08

In numpy, what you want is called broadcasting. It lets you assign an n-1 dimensional array into an n dimensional one.

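Also, np.vstack() inside a loop is slow because it copies the array on every call. Preallocate the full array instead.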

For example, start with a 1D array of length n:

>>> n = 5
>>> df_array = np.arange(n)
>>> df_array
array([0, 1, 2, 3, 4])

Expand it to a 10 x n array:

>>> bigger_array = np.empty([10,n])
>>> bigger_array[:] = df_array
>>> bigger_array
array([[0., 1., 2., 3., 4.],
       [0., 1., 2., 3., 4.],
       [0., 1., 2., 3., 4.],
       [0., 1., 2., 3., 4.],
       [0., 1., 2., 3., 4.],
       [0., 1., 2., 3., 4.],
       [0., 1., 2., 3., 4.],
       [0., 1., 2., 3., 4.],
       [0., 1., 2., 3., 4.],
       [0., 1., 2., 3., 4.]])

So, with a single line of code, you can fill the larger array with the contents of the smaller one:

bigger_array[:] = df_array

NB: Avoid using Python lists for this. They are much slower than NumPy ndarrays.
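The same preallocate-and-fill idea can be applied when each row has its own factor. The following is only a minimal sketch, assuming the factor is already an integer stored in the last column; the names `rows`, `factors`, and `out` are made up for illustration.

```python
import numpy as np

# Hypothetical input: the last column holds the number of copies per row.
rows = np.array([[1, 2, 3],
                 [4, 5, 2]])
factors = rows[:, -1]

# Allocate the final array once, then fill it slice by slice.
out = np.empty((factors.sum(), rows.shape[1]), dtype=rows.dtype)
start = 0
for row, k in zip(rows, factors):
    out[start:start + k] = row  # broadcasting copies the row into k rows
    start += k

print(out.shape)
```

This still loops once per sample in Python, but it never reallocates the output, so the cost no longer grows with the size of the array built so far.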

+1

timbo Dec 19 '15 at 0:30

eph · Accepted Answer · 2015-12-19T01:08:49+0000

First, you can try loading the data with numpy.loadtxt.

Then, to repeat each row according to its last column, you can use numpy.repeat:

>>> data = np.array([[1, 2, 3],
...                  [4, 5, 6]])
>>> np.repeat(data, data[:,-1], axis=0)
array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [4, 5, 6],
       [4, 5, 6],
       [4, 5, 6],
       [4, 5, 6],
       [4, 5, 6],
       [4, 5, 6]])

If the last column is not an integer, replace data[:,-1] with np.round(data[:,-1]).astype(int).
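Putting the two steps together might look like the sketch below. It assumes the file is whitespace-separated, purely numeric, and has no header line; the in-memory `sample` text stands in for the real file and is invented for illustration.

```python
import io
import numpy as np

# Hypothetical stand-in for the poll file: the factor is the last column.
sample = io.StringIO("1.0 2.0 2.4\n3.0 4.0 0.6\n5.0 6.0 3.0\n")

data = np.loadtxt(sample)                    # 2D float array, one row per sample
counts = np.round(data[:, -1]).astype(int)   # integer number of copies per row
expanded = np.repeat(data, counts, axis=0)   # repeat each row counts[i] times

print(expanded.shape)
```

Note that np.round rounds halves to the nearest even integer (2.5 becomes 2), so factors ending in exactly .5 may not round the way the original script did under Python 2.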
