Batch-major vs time-major LSTM

Do RNNs learn different dependency patterns when the input is batch major as opposed to time major?

+5
2 answers

(Edit: sorry, my original argument was about why this makes sense, but I realized it doesn't, so this is a bit OT.)

I have not found the TF team's reasoning behind this, but it does not make much computational sense, since the ops are written in C++ anyway.

Intuitively, we want to mash together (multiply/add, etc.) different features from the same sequence at the same timestep. Different timesteps cannot be run in parallel, while batches/sequences can, so the ordering (from the most contiguous axis outward) should be feature > batch/sequence > timestep.

By default, Numpy and C++ use row-major (C-style) memory layout, so

 [[ 0.  1.  2.]
  [ 3.  4.  5.]
  [ 6.  7.  8.]]

It is laid out as [0,1,2,3,4,5,6,7,8] in memory. This means that if we have

 x = np.zeros([time,batch,feature]) 

(time_major=True in TensorFlow)

then in row-major memory we get a layout like x[0,0,0], x[0,0,1], x[0,0,2], …, x[0,1,0], ..., so e.g. the dot product of the weights with a vector from the same sequence and timestep (w*x[t,b,:]) is the most contiguous operation, followed by the next sequence, w*x[t,b+1,:], and so on. This is what we want during training.

With time_major=False, which is the default, the layout is [batch, time, feature], so features from the same sequence but different timesteps are the more contiguous ones, i.e. w*x[batch,t,:] followed by w*x[batch,t+1,:], etc. This might be faster for prediction on a single sequence if the RNN is unrolled, but that is speculation.
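A small numpy sketch of that contiguity argument (my addition, not from the original answer; the shapes are made up for illustration). The strides show how many bytes separate neighbouring elements along each axis:

    import numpy as np

    time, batch, feature = 5, 32, 128

    x_tm = np.zeros([time, batch, feature])   # time_major=True layout
    x_bm = np.zeros([batch, time, feature])   # time_major=False (default) layout

    # strides = bytes to step along each axis (float64 -> 8 bytes per element)
    print(x_tm.strides)  # (32768, 1024, 8): all of timestep t, i.e. x_tm[t], is one contiguous block
    print(x_bm.strides)  # (5120, 1024, 8):  all of sequence b, i.e. x_bm[b], is one contiguous block

In the time-major layout, stepping through consecutive sequences b at a fixed timestep t walks memory sequentially; in the batch-major layout, it is consecutive timesteps t of the same sequence that sit next to each other.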

If you came to this question for the same reason I did: I learned to be careful with Numpy's slightly unintuitive indexing, which is meant to be pythonic, not necessarily row major. Take a look. As expected:

 x = np.zeros([3,3])
 x[0:9].flat = np.arange(10)
 print x
 > [[ 0.  1.  2.]
 >  [ 3.  4.  5.]
 >  [ 6.  7.  8.]]

We might also expect x[1] == x[0,1], but

 print x[1]
 > [ 3.  4.  5.]
 print x[np.arange(10)<=4]
 > IndexError: index 3 is out of bounds for axis 0 with size 3
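If you actually want row-major flat access, here is a minimal sketch (my addition, not part of the original answer) using .flat and ravel():

    import numpy as np

    x = np.arange(9, dtype=float).reshape(3, 3)

    print(x[1])          # pythonic indexing: the second row -> [3. 4. 5.]
    print(x[0, 1])       # a single element -> 1.0
    print(x.flat[1])     # row-major flat index 1 -> 1.0, the same element as x[0, 1]
    print(x.ravel()[1])  # ravel() gives the row-major flattened view -> 1.0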
+4

It makes no difference to what the model learns.

At timestep t, RNNs need the results from t-1, so we need to compute things time-major. If time_major=False, TensorFlow transposes the batch of sequences from (batch_size, max_sequence_length) to (max_sequence_length, batch_size)*. It then processes this transposed batch one row at a time: at t=0, the first element of each sequence is processed and the hidden states and outputs are computed; at the last step, the final element of each sequence is processed.
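Roughly what that internal transpose and the per-timestep scan look like, sketched in numpy (my illustration; the shapes are assumed, not from the answer):

    import numpy as np

    batch_size, max_sequence_length, embedding_size = 4, 7, 16
    inputs = np.zeros([batch_size, max_sequence_length, embedding_size])  # batch-major, the default

    # swap axes 0 and 1 so that time becomes the leading axis
    inputs_tm = np.transpose(inputs, (1, 0, 2))
    print(inputs_tm.shape)  # (7, 4, 16) = (max_sequence_length, batch_size, embedding_size)

    # the recurrence then consumes one timestep (one slice of the leading axis) at a time
    for t in range(max_sequence_length):
        step = inputs_tm[t]  # shape (batch_size, embedding_size): element t of every sequence
        # ... compute h_t from h_{t-1} and step ...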

So if your data is already time-major, use time_major=True, which avoids the transpose. But there is not much point in manually transposing your data before feeding it to TensorFlow.
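For reference, here is how the flag appears in a TF1-style tf.nn.dynamic_rnn call (a sketch of my own, not code from the answer; the shapes and cell size are arbitrary):

    import tensorflow as tf  # TF1-style API

    # batch-major input (the default): [batch_size, max_sequence_length, embedding_size]
    inputs = tf.placeholder(tf.float32, [None, 7, 16])
    cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=64)
    outputs, state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)  # time_major=False

    # time-major input: [max_sequence_length, batch_size, embedding_size]; no internal transpose
    inputs_tm = tf.placeholder(tf.float32, [7, None, 16])
    cell_tm = tf.nn.rnn_cell.BasicLSTMCell(num_units=64)
    outputs_tm, state_tm = tf.nn.dynamic_rnn(
        cell_tm, inputs_tm, dtype=tf.float32, time_major=True, scope="rnn_tm")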

* If you have multidimensional inputs (e.g. sequences of word embeddings: (batch_size, max_sequence_length, embedding_size)), axes 0 and 1 are transposed, giving (max_sequence_length, batch_size, embedding_size).

+2

Source: https://habr.com/ru/post/1264005/

