In general, when you shuffle training data (a set of sequences), you shuffle the order in which the sequences are fed to the RNN; you do not shuffle the ordering within the individual sequences. This is the normal thing to do when your network is stateless:
Stateless case:
The network's memory only persists for the duration of a single sequence. Training on sequence B before sequence A does not matter, because the network's memory state does not carry over between sequences.
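
Here is a minimal sketch of the stateless case. The library is my assumption (the answer names none), so I use Keras with synthetic data:

```python
import numpy as np
from tensorflow import keras

# Synthetic stand-in data: 100 independent sequences of 20 time steps.
num_sequences, timesteps, features = 100, 20, 8
X = np.random.rand(num_sequences, timesteps, features).astype("float32")
y = np.random.rand(num_sequences, 1).astype("float32")

model = keras.Sequential([
    keras.Input(shape=(timesteps, features)),
    keras.layers.LSTM(32),   # stateless by default: state is reset for every sequence
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# shuffle=True reorders whole sequences between epochs; the time steps
# inside each sequence stay intact, so nothing is lost by shuffling.
model.fit(X, y, batch_size=10, epochs=2, shuffle=True)
```
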
On the other hand:
Stateful case:
The network's memory carries over between sequences. Here you cannot blindly shuffle your data and expect optimal results. Sequence A must be fed to the network before sequence B, because A comes before B chronologically, and we want the network to evaluate sequence B with the memory of what was in sequence A.
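
And a matching sketch for the stateful case (again assuming Keras; `stateful=True` and the fixed batch size are what make the hidden state carry over between batches):

```python
import numpy as np
from tensorflow import keras

# Synthetic stand-in data: consecutive chunks of one long series,
# already in chronological order (chunk i precedes chunk i+1).
batch_size, timesteps, features = 1, 20, 8
num_sequences = 50
X = np.random.rand(num_sequences, timesteps, features).astype("float32")
y = np.random.rand(num_sequences, 1).astype("float32")

model = keras.Sequential([
    keras.Input(batch_shape=(batch_size, timesteps, features)),  # stateful needs a fixed batch size
    keras.layers.LSTM(32, stateful=True),  # hidden state is kept between batches
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# shuffle=False preserves chronology, so the network sees sequence B
# with the memory of sequence A; the state is cleared once per epoch.
for epoch in range(2):
    model.fit(X, y, batch_size=batch_size, epochs=1, shuffle=False)
    for layer in model.layers:
        if hasattr(layer, "reset_states"):    # Keras 2 naming
            layer.reset_states()
        elif hasattr(layer, "reset_state"):   # Keras 3 naming
            layer.reset_state()
```

Disabling shuffling and resetting the state manually once per epoch is what preserves the A-then-B memory chain during training.
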