An Embedding layer is faster because it is essentially the equivalent of a Dense layer that makes simplifying assumptions.
Imagine a word-embedding layer with these weights:
w = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8], [0.9, 0.0, 0.1, 0.2]]
A Dense layer treats these as actual weights with which to perform matrix multiplication. An Embedding layer simply treats these weights as a list of vectors, each representing a single word: the 0th word in the vocabulary is w[0], the 1st word is w[1], and so on.
For example, take the weights above and this sentence:
[0, 2, 1, 2]
A naive Dense-based network would need to convert this sentence to one-hot encoding:
[[1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 1]]
and then perform the matrix multiplication:
[[1 * 0.1 + 0 * 0.5 + 0 * 0.9, 1 * 0.2 + 0 * 0.6 + 0 * 0.0, 1 * 0.3 + 0 * 0.7 + 0 * 0.1, 1 * 0.4 + 0 * 0.8 + 0 * 0.2],
 [0 * 0.1 + 0 * 0.5 + 1 * 0.9, 0 * 0.2 + 0 * 0.6 + 1 * 0.0, 0 * 0.3 + 0 * 0.7 + 1 * 0.1, 0 * 0.4 + 0 * 0.8 + 1 * 0.2],
 [0 * 0.1 + 1 * 0.5 + 0 * 0.9, 0 * 0.2 + 1 * 0.6 + 0 * 0.0, 0 * 0.3 + 1 * 0.7 + 0 * 0.1, 0 * 0.4 + 1 * 0.8 + 0 * 0.2],
 [0 * 0.1 + 0 * 0.5 + 1 * 0.9, 0 * 0.2 + 0 * 0.6 + 1 * 0.0, 0 * 0.3 + 0 * 0.7 + 1 * 0.1, 0 * 0.4 + 0 * 0.8 + 1 * 0.2]]
=
[[0.1, 0.2, 0.3, 0.4], [0.9, 0.0, 0.1, 0.2], [0.5, 0.6, 0.7, 0.8], [0.9, 0.0, 0.1, 0.2]]
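To make the Dense path concrete, here is a minimal NumPy sketch (the array names are just illustrative) that builds the one-hot matrix and performs the same multiplication:

```python
import numpy as np

# Weight matrix: one row per word, one column per embedding dimension
w = np.array([[0.1, 0.2, 0.3, 0.4],
              [0.5, 0.6, 0.7, 0.8],
              [0.9, 0.0, 0.1, 0.2]])

sentence = np.array([0, 2, 1, 2])       # integer-encoded words
one_hot = np.eye(len(w))[sentence]      # shape (4, 3): one-hot row per word
dense_result = one_hot @ w              # matrix multiplication, shape (4, 4)
print(dense_result)
# [[0.1 0.2 0.3 0.4]
#  [0.9 0.  0.1 0.2]
#  [0.5 0.6 0.7 0.8]
#  [0.9 0.  0.1 0.2]]
```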
However, the Embedding layer just looks at [0, 2, 1, 2] and takes the layer's weights at indices zero, two, one and two, immediately obtaining
[w[0], w[2], w[1], w[2]]
=
[[0.1, 0.2, 0.3, 0.4], [0.9, 0.0, 0.1, 0.2], [0.5, 0.6, 0.7, 0.8], [0.9, 0.0, 0.1, 0.2]]
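The lookup itself is just row indexing, as in this small sketch (again with illustrative names):

```python
import numpy as np

w = np.array([[0.1, 0.2, 0.3, 0.4],
              [0.5, 0.6, 0.7, 0.8],
              [0.9, 0.0, 0.1, 0.2]])

sentence = [0, 2, 1, 2]
embedded = w[sentence]      # fancy indexing: pick rows 0, 2, 1, 2
print(embedded)
# [[0.1 0.2 0.3 0.4]
#  [0.9 0.  0.1 0.2]
#  [0.5 0.6 0.7 0.8]
#  [0.9 0.  0.1 0.2]]
```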
So it's the same result, just obtained (hopefully) faster.
The Embedding layer does have limitations:
- The input must be integers in [0, vocab_length).
- There is no bias.
- No activation.
However, none of these restrictions should matter if you just want to convert an integer-encoded word into an embedding.
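Assuming you are working in Keras, a minimal sketch of the two equivalent setups might look like this (the sizes come from the toy example above; it is illustrative, not a drop-in model):

```python
import numpy as np
from tensorflow import keras

vocab_length, embed_dim = 3, 4
sentence = np.array([[0, 2, 1, 2]])              # batch of one sentence

# Embedding path: feed the integer indices directly
embedding = keras.layers.Embedding(input_dim=vocab_length, output_dim=embed_dim)
emb_out = embedding(sentence)                    # shape (1, 4, 4)

# Dense path: one-hot encode first, then a bias-free, linear Dense layer
one_hot = np.eye(vocab_length)[sentence]         # shape (1, 4, 3)
dense = keras.layers.Dense(embed_dim, use_bias=False, activation=None)
dense_out = dense(one_hot)                       # shape (1, 4, 4)

# With identical weights the two outputs would match; the Embedding version
# simply skips the one-hot encoding and the matrix multiplication.
```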