Cut the data in an array in numpy according to other arrays in an efficient way

I would like to ask for your help with reducing data in arrays in Python. I am new to Python, but I have some programming experience.

The problem is this: I have an array S of n elements that comes from a sensor's measurements, along with four other arrays that indicate the year, month, day and hour of each measurement (y_lna, m_lna, d_lna and h_lna). I also have another array T of m elements, again accompanied by four arrays (y, m, d, h). I want to create a vector of the same size as S whose values are taken from T wherever the hour, day, month and year of T match those of S.

The data are ordered chronologically, from the first year to the last, as follows:

Data   h d m  y
d1    00 1 1 2003
d2    03 1 1 2003
...
dn    10 5 8 2009

I wrote a function that does this, but I am not sure it is correct, and it takes a long time because of the number of iterations it performs. Is there a way to do this more efficiently? I also don't know how to deal with NaN values.

import numpy as np

def reduce_data(h, d, m, y, h_lna, d_lna, m_lna, y_lna, data):
    # pick the values of `data` whose (y, m, d, h) timestamp also appears
    # in (y_lna, m_lna, d_lna, h_lna)
    year = np.arange(2003, 2017)   # 2003 .. 2016
    month = np.arange(1, 13)       # 1 .. 12
    new_data = []
    for a in year:
        # indices belonging to the current year in both timestamp sets
        ind1 = [i for i in range(len(y)) if y[i] == a]
        ind1_l = [i for i in range(len(y_lna)) if y_lna[i] == a]
        for b in month:
            # restrict to the current month
            ind2 = [i for i in ind1 if m[i] == b]
            ind2_l = [i for i in ind1_l if m_lna[i] == b]
            for c in range(1, 32):  # days of the month
                ind3 = [i for i in ind2 if d[i] == c]
                ind3_l = [i for i in ind2_l if d_lna[i] == c]
                # keep the value whenever the hour matches as well
                for dd in range(len(ind3)):
                    for e in range(len(ind3_l)):
                        if h[ind3[dd]] == h_lna[ind3_l[e]]:
                            new_data.append(data[ind3[dd]])
    return new_data

I appreciate your cooperation

EDIT: I am adding the data I am working with. The sensor values are not real (I replaced them with random numbers), but the time values are real (only one year is shown). data1 contains the sensor data S together with the time variables that serve as the reference for the reduction, data2 contains the sensor data T with its time variables, and finally result contains the expected output.


DATA 1

        S       h_lna   d_lna   m_lna   y_lna
    0   0        8       6        2     2003
    1   2        9       6        2     2003
    2   4       10       6        2     2003
    3   6       11       6        2     2003
    4   8       12       6        2     2003
    5   10      13       6        2     2003
    6   12      14       6        2     2003
    7   14      15       6        2     2003
    8   16      16       6        2     2003
    9   18      17       6        2     2003
   10   20      18       6        2     2003

DATA 2

    T   h   d   m   y
0   864 0   6   2   2003
1   865 1   6   2   2003
2   866 2   6   2   2003
3   867 3   6   2   2003
4   868 4   6   2   2003
5   869 5   6   2   2003
6   870 6   6   2   2003
7   871 7   6   2   2003
8   872 8   6   2   2003
9   873 9   6   2   2003
10  874 10  6   2   2003
11  875 11  6   2   2003
12  876 12  6   2   2003
13  877 13  6   2   2003
14  878 14  6   2   2003
15  879 15  6   2   2003
16  880 16  6   2   2003
17  881 17  6   2   2003
18  882 18  6   2   2003
19  883 19  6   2   2003
20  884 20  6   2   2003
21  885 21  6   2   2003
22  886 22  6   2   2003
23  887 23  6   2   2003
24  888 0   7   2   2003
25  889 1   7   2   2003
26  890 2   7   2   2003
27  891 3   7   2   2003
28  892 4   7   2   2003
29  893 5   7   2   2003
30  894 6   7   2   2003
31  895 7   7   2   2003
32  896 8   7   2   2003
33  897 9   7   2   2003
34  898 10  7   2   2003

RESULT

    result  h_lna   d_lna   m_lna   y_lna
0   872        8      6      2      2003
1   873        9      6      2      2003
2   874       10      6      2      2003
3   875       11      6      2      2003
4   876       12      6      2      2003
5   877       13      6      2      2003
6   878       14      6      2      2003
7   879       15      6      2      2003
8   880       16      6      2      2003
9   881       17      6      2      2003
10  882       18      6      2      2003

"join". Data 2 :

d2i = d2.set_index(['y', 'm', 'd', 'h'])

d2i is now a DataFrame with a MultiIndex (y, m, d, h) and a single column (T).

Then join Data 1 against it with join():

d1.join(d2i, ['y_lna', 'm_lna', 'd_lna', 'h_lna'])
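
A minimal end-to-end sketch, assuming d1 and d2 are pandas DataFrames built from the tables above (the column names and the small slices of values below are just the ones shown there):

import pandas as pd

# a hypothetical slice of the example data, only to make the sketch runnable
d1 = pd.DataFrame({'S': [0, 2, 4],
                   'h_lna': [8, 9, 10], 'd_lna': 6, 'm_lna': 2, 'y_lna': 2003})
d2 = pd.DataFrame({'T': [872, 873, 874],
                   'h': [8, 9, 10], 'd': 6, 'm': 2, 'y': 2003})

d2i = d2.set_index(['y', 'm', 'd', 'h'])                       # lookup table keyed by timestamp
out = d1.join(d2i, on=['y_lna', 'm_lna', 'd_lna', 'h_lna'])    # adds a T column to d1
# rows of d1 whose timestamp does not occur in d2 get T = NaN

out['T'] is then the vector aligned with S that the question asks for.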

Another option is a DatetimeIndex built from the separate year/month/day/hour columns. You can construct it with datetime64/timedelta64 arithmetic and pd.to_datetime():

year = (d2.y - 1970).values.astype('datetime64[Y]')    # Unix epoch = 1970-01-01
month = (d2.m - 1).values.astype('timedelta64[M]')     # January adds 0
day = (d2.d - 1).values.astype('timedelta64[D]')
hour = d2.h.values.astype('timedelta64[h]')
index = pd.to_datetime(year + month + day + hour)
d2s = pd.Series(d2['T'].values, index)                  # T values keyed by timestamp

That gives a Series of T keyed by timestamp. If you keep both datasets as DataFrames/Series indexed this way, the lookup reduces to join, merge, plain index-based selection, or asof.
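
As a sketch under the same assumptions, once d2s carries a DatetimeIndex you can look up the T values for S's timestamps with reindex(); hours that are missing from d2 simply come back as NaN, which is also one reasonable way to handle the NaN question (the *_l names below are only illustrative):

# build the same kind of timestamp index for d1's time columns
year_l = (d1.y_lna - 1970).values.astype('datetime64[Y]')
month_l = (d1.m_lna - 1).values.astype('timedelta64[M]')
day_l = (d1.d_lna - 1).values.astype('timedelta64[D]')
hour_l = d1.h_lna.values.astype('timedelta64[h]')
index_l = pd.to_datetime(year_l + month_l + day_l + hour_l)

result = d2s.reindex(index_l).to_numpy()   # T values at S's timestamps, NaN where no match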
