Why is a word embedding actually a vector?

Forgive my naivety, but I do not understand why the word embeddings produced by the neural-network training process (word2vec) are actually vectors.

Embedding is a dimensionality-reduction process: during training, the NN maps the 1/0 (one-hot) arrays of words to smaller arrays, and the process does nothing that uses vector arithmetic.

So what we end up with are just arrays, not vectors. Why should I treat these arrays as vectors?

Even if we do get vectors, why does everyone draw them as arrows starting at the origin (0,0)?

Again, I'm sorry if my question looks stupid.

4 answers

"the process does nothing that uses vector arithmetic"

The training process indeed has nothing to do with vector arithmetic, but once the arrays have been produced, it turns out that they have rather nice properties, so that it makes sense to think of a "linear word space".

For example, which words have embeddings closest to a given word in this space?

[image: nearest neighbours of a query word in the embedding space]
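You can try such a nearest-neighbour query yourself; a minimal sketch with gensim, assuming a pre-trained word2vec file is available (the file name below is only an example):

# Minimal sketch: nearest neighbours in an embedding space with gensim.
# Assumes a pre-trained model file in word2vec format, e.g. the GoogleNews vectors.
from gensim.models import KeyedVectors

wv = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)

# Nearest neighbours = words whose embeddings have the highest cosine similarity.
print(wv.most_similar('frog', topn=5))
# actual output depends on the model; typically other animal words come out on top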

Put another way, words with similar meanings form clusters ("clouds"). Here is a two-dimensional t-SNE representation:

[image: t-SNE projection of word embeddings]

Another example: the difference between the vectors for "man" and "woman" is very close to the difference between "uncle" and "aunt":

[image: man-woman and uncle-aunt vector offsets]

As a result, you have pretty reasonable arithmetic:

W("woman") βˆ’ W("man") ≃ W("aunt") βˆ’ W("uncle")
W("woman") βˆ’ W("man") ≃ W("queen") βˆ’ W("king")

None of these relations were programmed in; they simply emerge from training. That is why it is natural to treat the learned arrays as vectors.


What is a word embedding?

In natural language processing (NLP), word embedding is the collective name for a set of language modelling and feature learning techniques where words or phrases from the vocabulary are mapped to vectors of real numbers.

Conceptually it involves a mathematical embedding from a space with one dimension per word to a continuous vector space with a much lower dimension.

(Source: https://en.wikipedia.org/wiki/Word_embedding)
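To make the "one dimension per word" vs. "much lower dimension" contrast concrete, here is a toy illustration with made-up numbers:

import numpy as np

vocab = ['car', 'vehicle', 'apple', 'orange', 'fruit']

# One-hot: one dimension per vocabulary word.
one_hot_car = np.zeros(len(vocab))
one_hot_car[vocab.index('car')] = 1.0      # [1, 0, 0, 0, 0]

# Embedding: a short dense vector of real numbers.
embedding_car = np.array([0.21, -1.30])    # 2 dimensions instead of len(vocab)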

What is Word2Vec?

Word2vec is a group of related models that are used to produce word embeddings. These models are shallow, two-layer neural networks that are trained to reconstruct linguistic contexts of words.

Word2vec takes as its input a large corpus of text and produces a vector space, typically of several hundred dimensions, with each unique word in the corpus being assigned a corresponding vector in the space.

Word vectors are positioned in the vector space such that words that share common contexts in the corpus are located in close proximity to one another in the space.

(Source: https://en.wikipedia.org/wiki/Word2vec)
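If you want to see that pipeline end to end, a minimal training sketch with gensim (4.x parameter names; the tiny corpus is just an example) looks like this:

from gensim.models import Word2Vec

sentences = [['the', 'king', 'rules', 'the', 'kingdom'],
             ['the', 'queen', 'rules', 'the', 'kingdom']]

# Train a tiny model: every unique word gets a 50-dimensional vector.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=50)

print(model.wv['king'])                      # a numpy array of 50 real numbers
print(model.wv.similarity('king', 'queen'))  # cosine similarity of the two vectors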

What is a vector?

A vector is an element of a vector space; in the familiar geometric picture (an arrow in space), it is something that has a direction and a magnitude.

In coordinates, a vector is simply an ordered list of numbers.

In code, such an ordered list of numbers is what you store as an array.

What is a vector space?

A vector space (also called a linear space) is a collection of objects called vectors, which may be added together and multiplied ("scaled") by numbers, called scalars.

Scalars are often taken to be real numbers, but there are also vector spaces with scalar multiplication by complex numbers, rational numbers, or generally any field.

The operations of vector addition and scalar multiplication must satisfy certain requirements, called axioms.

(Source: https://en.wikipedia.org/wiki/Vector_space)
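In other words, what makes an array of real numbers a vector is that exactly these operations are available for it, which numpy shows directly (the numbers below are made up):

import numpy as np

u = np.array([232.0, 1010.0])   # a word vector, e.g. "apple"
v = np.array([300.0, 250.0])    # another word vector, e.g. "orange"

print(u + v)      # vector addition
print(0.5 * u)    # scalar multiplication
print(u - v)      # difference: the "offset" from one word to the other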

So what is the difference between an array and a vector?

First of all, an array is a data structure (in programming terms: a particular way of laying out a list of numbers in memory).

A vector, on the other hand, IS such a list of numbers (its coordinates) interpreted as an element of a vector space, i.e. something you are allowed to add and scale.

Likewise, a 2-dimensional array can be read as a matrix and an n-dimensional array as a tensor. The array is only the representation; the vector is the mathematical object.


Going back to the OP's questions:

Why is a word embedding actually a vector?

Because it is an array of real numbers, and an array of d real numbers is a point/vector in a d-dimensional vector space (see the definitions above).

Why do we represent words as vectors of numbers at all?

First of all, to compute anything with words, a machine needs them encoded as numbers in some way.

For example, suppose we simply assigned each word a single "semantic" number:

>>> semnum = semantic_numbers = {'car': 5, 'vehicle': 2, 'apple': 232, 'orange': 300, 'fruit': 211, 'samsung': 1080, 'iphone': 1200}
>>> abs(semnum['fruit'] - semnum['apple'])
21
>>> abs(semnum['samsung'] - semnum['apple'])
848

Here the distance between fruit and apple is small while the distance between samsung and apple is large, which looks sensible, but a single "semantic" number per word can only capture one aspect of meaning, and it is hard to assign such numbers consistently.

Instead, we can give each word several numbers, i.e. a vector (here, two numbers per word):

>>> import numpy as np
>>> semnum = semantic_numbers = {'car': [5, -20], 'vehicle': [2, -18], 'apple': [232, 1010], 'orange': [300, 250], 'fruit': [211, 250], 'samsung': [1080, 1002], 'iphone': [1200, 1100]}

To compare words, we can subtract their vectors:

>>> np.array(semnum['apple']) - np.array(semnum['orange'])
array([-68, 760])

>>> np.array(semnum['apple']) - np.array(semnum['samsung'])
array([-848,    8])

The difference vectors are informative, but it is more convenient to boil each comparison down to a single number, e.g. the Euclidean norm of the difference (the "distance" between the two vectors):

>>> import numpy as np
>>> orange = np.array(semnum['orange'])
>>> apple = np.array(semnum['apple'])
>>> samsung = np.array(semnum['samsung'])

>>> np.linalg.norm(apple-orange)
763.03604108849277

>>> np.linalg.norm(apple-samsung)
848.03773500947466

>>> np.linalg.norm(orange-samsung)
1083.4685043876448

"", apple samsung, orange - samsung. , apple samsung, orange.

, " ?" . , Word2Vec/embedding ( Bengio 2003).


But do we have to use a neural network / Word2Vec to get such word vectors (i.e. dense embeddings)?

Not necessarily: word vectors can be built in other ways too, for instance from co-occurrence counts, and the one-hot arrays the OP mentions are themselves (very high-dimensional, sparse) vectors.

What the neural network adds is a way of learning dense, low-dimensional vectors automatically from a corpus, and the vectors it learns happen to have a geometry that reflects word meaning, which is why everyone then reasons about them with vector arithmetic.

Those "nice properties" (similar words clustering together, analogies working out) are an empirical outcome of the training objective; nothing in the training code does vector arithmetic explicitly.

Besides Word2Vec there are many other models that produce word vectors with similar properties.

A curated list of word-vector resources: https://github.com/keon/awesome-nlp#word-vectors


Word2Vec maps each word to a point in d-dimensional space (d is typically something like 300 to 600, though it does not have to be), and a point in d-dimensional space is exactly the same thing as a d-dimensional vector.

The conventional way to draw such a vector is as an arrow from the origin to that point, which is why the 2-D illustrations show the word vectors starting at (0,0).
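A tiny numpy sketch of that equivalence, with made-up numbers:

import numpy as np

point = np.array([0.3, -1.2, 0.7, 2.0, -0.5])   # a word's coordinates in 5-D space
origin = np.zeros(5)

print(point - origin)           # identical to `point`: the arrow from the origin
print(np.linalg.norm(point))    # its length, i.e. the distance from the origin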


Word2Vec is trained with one of two architectures: CBOW and Skip-Gram.

In CBOW the input is the one-hot encoding of the context words (vectors of length N, where N is the vocabulary size), and between the input and the hidden layer sits a weight matrix of size M x N (where M is the embedding dimension).

The network is trained to predict the target word from its context; once training is finished, the output layer is thrown away and the learned weight matrix is what you keep.

The output itself is a probability distribution P over the vocabulary, i.e. a softmax over all N words (Skip-Gram does the reverse and predicts the context words from the target word).

So every word ends up associated with a column of M real numbers, i.e. a point/vector in M-dimensional space; for visualisation these vectors are usually projected down to 2 or 3 dimensions.

[image: diagram of the CBOW and Skip-Gram architectures]

Source: https://arxiv.org/pdf/1301.3781.pdf
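A small sketch, consistent with the description above, of why the learned weight matrix holds the word vectors: multiplying the M x N matrix by a one-hot input simply selects one of its columns, i.e. that word's embedding.

import numpy as np

N, M = 5, 3                           # N = vocabulary size, M = embedding dimension
W = np.random.randn(M, N)             # input-to-hidden weights learned during training

one_hot = np.zeros(N)
one_hot[2] = 1.0                      # one-hot code of word number 2

hidden = W @ one_hot                  # equals W[:, 2], the embedding of word number 2
print(np.allclose(hidden, W[:, 2]))   # True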


Source: https://habr.com/ru/post/1682252/

