You might think of a Bidirectional layer as follows:
    forward = Recurrent(..)(input)
    backward = Recurrent(..., reverse_input=True)(input)
    output = merge([forward, backward], ...)
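So - as you can see - you lose the temporal orientation: the input is analyzed from both the start and the end. In this case, setting stateful=True simply makes each branch take its initial state from the corresponding sample in the previous batch, according to that branch's direction (forward takes from forward, backward takes from backward).
This breaks the usual interpretation of a stateful model, namely that samples from consecutive batches can be read as one continuous sequence split across batches. The forward branch still fits that reading, but the backward branch finishes each chunk at its first timestep and then hands that state to the last timestep of the next chunk, which is not where a single reversed pass over the whole sequence would be.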
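To make the point concrete, here is a minimal sketch in plain Python. It does not use Keras; the toy cell h_t = 0.5 * h_{t-1} + x_t and the helper names run_rnn and run_stateful are made up for illustration, and the state carry-over only mimics what stateful=True does for one sample.

```python
def run_rnn(xs, h0=0.0, reverse=False):
    """Toy recurrent cell (hypothetical): h_t = 0.5 * h_{t-1} + x_t.

    Returns the final hidden state after consuming xs,
    optionally in reversed order (the "backward" branch).
    """
    h = h0
    for x in (reversed(xs) if reverse else xs):
        h = 0.5 * h + x
    return h


def run_stateful(chunks, reverse=False):
    """Process chunks in order, carrying each chunk's final state
    into the next chunk as its initial state."""
    h = 0.0
    for chunk in chunks:
        h = run_rnn(chunk, h0=h, reverse=reverse)
    return h


sequence = [1.0, 2.0, 3.0, 4.0]
chunks = [sequence[:2], sequence[2:]]  # one long sequence split into two batches

forward_full = run_rnn(sequence)
forward_chunked = run_stateful(chunks)

backward_full = run_rnn(sequence, reverse=True)
backward_chunked = run_stateful(chunks, reverse=True)

print(forward_full == forward_chunked)    # True: chunking is transparent
print(backward_full == backward_chunked)  # False: carried state is misplaced
```

In this toy setup the forward branch gives the same result whether or not the sequence is split, while the backward branch does not, which is the loss of interpretation described above.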