(Edit: sorry, my original argument was a reason why this makes sense, but I realized it does not hold, so this is a bit off-topic.)
I have not found the TF team's reasoning behind this, but it does not make computational sense, since the ops are written in C++.
Intuitively, we want to combine (multiply/add, etc.) different features from the same sequence at the same timestep. Different timesteps cannot be run in parallel, while batches/sequences can, so the preferred adjacency order is feature > batch/sequence > timestep.
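To make that concrete, here is a minimal sketch (all names and sizes are made up for illustration) of an RNN-style loop: the time dimension must be walked sequentially because each step depends on the previous hidden state, while within a timestep the whole batch is processed as one matrix product:

```python
import numpy as np

time, batch, feature, hidden = 5, 4, 3, 8    # arbitrary illustrative sizes
rng = np.random.default_rng(0)

x = rng.normal(size=(time, batch, feature))  # time-major input
W = rng.normal(size=(feature, hidden))       # input-to-hidden weights
U = rng.normal(size=(hidden, hidden))        # hidden-to-hidden weights
h = np.zeros((batch, hidden))

# The time loop is inherently sequential: step t needs h from step t-1.
# Within a step, x[t] holds the features of *all* sequences at that
# timestep, so the matmul over the batch is the parallelizable part.
for t in range(time):
    h = np.tanh(x[t] @ W + h @ U)

print(h.shape)  # (batch, hidden)
```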
By default, NumPy and C++ use row-major (C-like) memory layout, so

```
[[ 0.  1.  2.]
 [ 3.  4.  5.]
 [ 6.  7.  8.]]
```

is laid out as `[0,1,2,3,4,5,6,7,8]` in memory. This means that if we have
```
x = np.zeros([time, batch, feature])
```

(`time_major=True` in TensorFlow),
then in row-major memory we get a layout like `x[0,0,0], x[0,0,1], x[0,0,2], …, x[0,1,0], ...`, so e.g. the dot product of the weights with a vector from the same sequence and timestep (`w*x[t,b,:]`) is the most contiguous operation, followed by the next sequence `w*x[t,b+1,:]`, etc. This is what we want during training.
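A small check of this claim (sizes are arbitrary): filling a `[time, batch, feature]` array with a running counter shows that one timestep's feature vectors for consecutive sequences sit next to each other in memory:

```python
import numpy as np

time, batch, feature = 2, 2, 3
x = np.arange(time * batch * feature).reshape(time, batch, feature)

print(x.ravel(order='C'))   # memory order: [ 0  1  2  3  4  5  6  7  8  9 10 11]
# x[0,0,:] -> 0,1,2   then x[0,1,:] -> 3,4,5   (same timestep, next sequence)
# x[1,0,:] -> 6,7,8   then x[1,1,:] -> 9,10,11 (next timestep)

# x[t] is one contiguous (batch, feature) block, so sweeping w over
# x[t,0,:], x[t,1,:], ... walks memory sequentially:
print(x[0].flags['C_CONTIGUOUS'])  # True
```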
With `time_major=False`, which is the default and gives `[batch, time, feature]`, features from the same sequence but different timesteps are more adjacent instead, i.e. `w*x[batch,t,:]` is followed by `w*x[batch,t+1,:]`, etc. This might be faster for predicting a single sequence if the RNN is unrolled, but that is speculation.
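The same comparison via strides, as a sketch with made-up sizes (strides are the byte steps NumPy takes per axis, so they show what is adjacent in each layout):

```python
import numpy as np

time, batch, feature = 10, 32, 128        # arbitrary sizes
x_tm = np.zeros([time, batch, feature])   # time_major=True  layout
x_bm = np.zeros([batch, time, feature])   # time_major=False layout

# float64 is 8 bytes; the smallest stride is always the feature axis.
print(x_tm.strides)  # (32768, 1024, 8): after x[t,b,:] comes x[t,b+1,:]
print(x_bm.strides)  # (10240, 1024, 8): after x[b,t,:] comes x[b,t+1,:]
```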
If you came to this question for the same reason I did: I learned to be careful with NumPy's slightly unintuitive indexing, which is meant to be pythonic, not necessarily row-major. Look at this. As expected:
```
x = np.zeros([3,3])
x[0:9].flat = np.arange(10)
print(x)
> [[ 0.  1.  2.]
>  [ 3.  4.  5.]
>  [ 6.  7.  8.]]
```
We would also expect `x[1] == x[0,1]`, but
```
print(x[1])
> [ 3.  4.  5.]

print(x[np.arange(10)<=4])
> IndexError: index 3 is out of bounds for axis 0 with size 3
```
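If flat, row-major semantics are actually what you want, a sketch of the explicit way to get them is `flat`/`ravel` (a hypothetical continuation of the example above):

```python
import numpy as np

x = np.zeros([3, 3])
x.flat = np.arange(9)           # fill in row-major (C) order

print(x.flat[1])                # 1.0, the second element in memory, == x[0,1]

flat = x.ravel()                # row-major flattened view of x
print(flat[np.arange(9) <= 4])  # [0. 1. 2. 3. 4.]: mask length matches now
```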