Suppose I have N sequences x[i], each with length seqLength[i] for 0 <= i < N. As far as I understand from the cuDNN docs, they must be ordered by sequence length, longest first, so assume that seqLength[i] >= seqLength[i + 1]. Say they have feature dimension D, so each x[i] is a 2D tensor of shape (seqLength[i], D). As far as I understand, I should prepare a tensor x where all x[i] are contiguous, one after the other, i.e. it would have shape (sum(seqLength), D).
According to the cuDNN docs, cudnnRNNForwardInference / cudnnRNNForwardTraining take the arguments int seqLength and cudnnTensorDescriptor_t* xDesc, where:
seqLength: Number of iterations to unroll over.
xDesc: An array of tensor descriptors. Each must have the same second dimension. The first dimension may decrease from element n to element n + 1, but may not increase.
I'm not quite sure I understand this correctly. Is seqLength my max(seqLength)?
And xDesc is an array. Of what length? max(seqLength)? If so, I assume that it describes one batch of features for each frame, but some of the later frames will contain fewer sequences. It seems that the number of sequences per frame is described by the first dimension. So:
xDesc[t].shape[0] = len([i for i in range(N) if t < seqLength[i]])
for all 0 <= t < max(seqLength). I.e. 0 <= xDesc[t].shape[0] <= N.
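To make that formula concrete, here is a minimal Python sketch of how I understand the per-frame batch sizes. The function name `batch_sizes_per_step` is my own and not part of cuDNN, and the sketch assumes the sequences are already sorted longest first.

```python
def batch_sizes_per_step(seq_lengths):
    """Return, for each time step t, how many sequences are still active.

    Hypothetical helper (not a cuDNN API). Assumes seq_lengths is sorted
    in non-increasing order, which is how I read the cuDNN docs.
    """
    if any(a < b for a, b in zip(seq_lengths, seq_lengths[1:])):
        raise ValueError("seq_lengths must be sorted longest first")
    max_len = seq_lengths[0] if seq_lengths else 0
    # Same expression as in the text: count the sequences with t < length.
    return [sum(1 for n in seq_lengths if t < n) for t in range(max_len)]


sizes = batch_sizes_per_step([4, 2, 2, 1])
print(sizes)  # [4, 3, 1, 1]
```

Under this reading, the resulting list has max(seqLength) entries and never increases, which would match the constraint quoted for xDesc.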
How many dimensions does each xDesc[t] describe, i.e. what is len(xDesc[t].shape)? I would assume that it is 2, and that the second dimension is the feature dimension, i.e. D, i.e.:
xDesc[t].shape = (len(...), D)
The strides should be set accordingly, although this is also not entirely clear. If x is stored in row-major order, then
xDesc[0].strides[0] = D * xDesc[0].shape[0]
xDesc[0].strides[1] = 1
But how does cuDNN compute the offset for frame t? I assume that it keeps track of this itself and thus computes sum([xDesc[t2].strides[0] for t2 in range(t)]).
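Spelled out, that assumption would amount to a running sum over the frame sizes. The sketch below is only my guess at the layout, not documented cuDNN behaviour: `frame_offsets` is a made-up helper, and it assumes the frames are stored back to back, with frame t2 occupying xDesc[t2].shape[0] * D elements.

```python
from itertools import accumulate


def frame_offsets(batch_sizes, feature_dim):
    """Element offset of each frame in a packed buffer.

    Hypothetical helper illustrating the assumption in the text:
    frame t starts where frames 0..t-1 end, and frame t2 occupies
    batch_sizes[t2] * feature_dim elements.
    """
    frame_sizes = [n * feature_dim for n in batch_sizes]
    # Exclusive prefix sum: offsets[t] == sum(frame_sizes[:t]).
    return list(accumulate(frame_sizes, initial=0))[:-1]


offsets = frame_offsets([4, 3, 1, 1], feature_dim=5)
print(offsets)  # [0, 20, 35, 40]
```

If this is right, the total buffer size is sum(seqLength) * D elements, which agrees with the shape (sum(seqLength), D) I assumed for x.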
Most of the code examples I've seen assume that all sequences have the same length. They also all describe 3 dimensions per xDesc[t], not 2. Why is that? The third dimension is always 1, as is the stride of the second and third dimensions, and the stride of the first dimension is N. So this assumes that the tensor x is stored in row-major order and has shape (max(seqLength), N, D). The code is actually a bit strange. For example, from TensorFlow:
int dims[] = {batch_size, data_size, 1};
int strides[] = {dims[1] * dims[2], dims[2], 1};
cudnnSetTensorNdDescriptor(
    ..., sizeof(dims) / sizeof(dims[0]), dims, strides);
The code looks very similar in all the examples I found. Search for cudnnSetTensorNdDescriptor or cudnnRNNForwardTraining. For instance:
I found one example that can handle sequences of different lengths. Again, search for cudnnSetTensorNdDescriptor:
This claims that there must be 3 dimensions for each xDesc[t]. It has the comment:
these dimensions are what CUDNN expects: (the mini-batch dimension, the data dimension, and the number 1 (because each descriptor describes one frame of data)
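If I read that comment correctly, each xDesc[t] would get the same dims/strides computation as in the TensorFlow snippet, just with a batch size that shrinks over time. The following is a Python sketch of that computation only, not actual cuDNN calls; `descriptor_dims_and_strides` is a hypothetical name, and the example batch sizes and data size are made up.

```python
def descriptor_dims_and_strides(batch_size, data_size):
    """Mirror the dims/strides computation from the TensorFlow snippet.

    Hypothetical helper: returns the 3D dims and the fully packed
    row-major strides for one frame that holds batch_size vectors of
    size data_size.
    """
    dims = [batch_size, data_size, 1]
    strides = [dims[1] * dims[2], dims[2], 1]
    return dims, strides


# One (dims, strides) pair per frame; the batch size may only decrease.
per_frame = [descriptor_dims_and_strides(n, data_size=5) for n in [4, 3, 1, 1]]
for t, (dims, strides) in enumerate(per_frame):
    print(t, dims, strides)
```

Note that with this computation only dims[0] changes from frame to frame; the strides depend on data_size alone.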
Edit: Support for this was added to PyTorch in late 2018, in this commit.
Am I missing something in the cuDNN documentation? I really haven't found this information in it.
My question is basically whether my conclusions about how to correctly set the arguments x, seqLength and xDesc for cudnnRNNForwardInference / cudnnRNNForwardTraining, as well as my implicit assumptions, are correct, and if not, how I should use them, what the memory layout looks like, etc.