Why is a word embedding actually a vector?

Forgive my naivety, but I do not understand why the word embeddings produced by the neural-network training process (word2vec) are actually vectors.

Embedding is a dimensionality-reduction process: during training, the NN maps the 1/0 (one-hot) arrays of words to smaller arrays, and the process does nothing that uses vector arithmetic.

So what we end up with are just arrays, not vectors. Why should I treat these arrays as vectors?

Even if we do get vectors, why does everyone draw them as arrows starting at the origin (0,0)?

Again, I'm sorry if my question looks stupid.

4 answers

"the process does nothing that uses vector arithmetic"

The training process indeed has nothing to do with vector arithmetic, but once the arrays have been produced, it turns out that they have rather nice properties, so that it makes sense to think of a "linear word space".

For example, which words have embeddings closest to a given word in this space?

[image: nearest neighbours of a query word in the embedding space]
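You can try such a nearest-neighbour query yourself; a minimal sketch with gensim, assuming a pre-trained word2vec file is available (the file name below is only an example):

# Minimal sketch: nearest neighbours in an embedding space with gensim.
# Assumes a pre-trained model file in word2vec format, e.g. the GoogleNews vectors.
from gensim.models import KeyedVectors

wv = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)

# Nearest neighbours = words whose embeddings have the highest cosine similarity.
print(wv.most_similar('frog', topn=5))
# actual output depends on the model; typically other animal words come out on top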

Put another way, words with similar meanings form clusters ("clouds"). Here is a two-dimensional t-SNE representation:

[image: t-SNE projection of word embeddings]

Another example: the difference between the vectors for "man" and "woman" is very close to the difference between "uncle" and "aunt":

[image: man-woman and uncle-aunt vector offsets]

As a result, you have pretty reasonable arithmetic:

W("woman") βˆ’ W("man") ≃ W("aunt") βˆ’ W("uncle")
W("woman") βˆ’ W("man") ≃ W("queen") βˆ’ W("king")

None of these relations were programmed in; they simply emerge from training. That is why it is natural to treat the learned arrays as vectors.


What is a word embedding?

In natural language processing (NLP), word embedding is the collective name for a set of language modelling and feature learning techniques where words or phrases from the vocabulary are mapped to vectors of real numbers.

Conceptually it involves a mathematical embedding from a space with one dimension per word to a continuous vector space with a much lower dimension.

(Source: https://en.wikipedia.org/wiki/Word_embedding)
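To make the "one dimension per word" vs. "much lower dimension" contrast concrete, here is a toy illustration with made-up numbers:

import numpy as np

vocab = ['car', 'vehicle', 'apple', 'orange', 'fruit']

# One-hot: one dimension per vocabulary word.
one_hot_car = np.zeros(len(vocab))
one_hot_car[vocab.index('car')] = 1.0      # [1, 0, 0, 0, 0]

# Embedding: a short dense vector of real numbers.
embedding_car = np.array([0.21, -1.30])    # 2 dimensions instead of len(vocab)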

What is Word2Vec?

Word2vec is a group of related models that are used to produce word embeddings. These models are shallow, two-layer neural networks that are trained to reconstruct linguistic contexts of words.

Word2vec takes as its input a large corpus of text and produces a vector space, typically of several hundred dimensions, with each unique word in the corpus being assigned a corresponding vector in the space.

Word vectors are positioned in the vector space such that words that share common contexts in the corpus are located in close proximity to one another in the space.

(Source: https://en.wikipedia.org/wiki/Word2vec)
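If you want to see that pipeline end to end, a minimal training sketch with gensim (4.x parameter names; the tiny corpus is just an example) looks like this:

from gensim.models import Word2Vec

sentences = [['the', 'king', 'rules', 'the', 'kingdom'],
             ['the', 'queen', 'rules', 'the', 'kingdom']]

# Train a tiny model: every unique word gets a 50-dimensional vector.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=50)

print(model.wv['king'])                      # a numpy array of 50 real numbers
print(model.wv.similarity('king', 'queen'))  # cosine similarity of the two vectors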

What is a vector?

A vector is an element of a vector space; in the familiar geometric picture (an arrow in space), it is something that has a direction and a magnitude.

In coordinates, a vector is simply an ordered list of numbers.

In code, such an ordered list of numbers is what you store as an array.

What is a vector space?

A vector space (also called a linear space) is a collection of objects called vectors, which may be added together and multiplied ("scaled") by numbers, called scalars.

Scalars are often taken to be real numbers, but there are also vector spaces with scalar multiplication by complex numbers, rational numbers, or generally any field.

The operations of vector addition and scalar multiplication must satisfy certain requirements, called axioms.

(Source: https://en.wikipedia.org/wiki/Vector_space)
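In other words, what makes an array of real numbers a vector is that exactly these operations are available for it, which numpy shows directly (the numbers below are made up):

import numpy as np

u = np.array([232.0, 1010.0])   # a word vector, e.g. "apple"
v = np.array([300.0, 250.0])    # another word vector, e.g. "orange"

print(u + v)      # vector addition
print(0.5 * u)    # scalar multiplication
print(u - v)      # difference: the "offset" from one word to the other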

So what is the difference between an array and a vector?

First of all, an array is a data structure (in programming terms: a particular way of laying out a list of numbers in memory).

A vector, on the other hand, IS such a list of numbers (its coordinates) interpreted as an element of a vector space, i.e. something you are allowed to add and scale.

Likewise, a 2-dimensional array can be read as a matrix and an n-dimensional array as a tensor. The array is only the representation; the vector is the mathematical object.


Going back to the OP's questions:

Why is a word embedding actually a vector?

Because it is an array of real numbers, and an array of d real numbers is a point/vector in a d-dimensional vector space (see the definitions above).

Why do we represent words as vectors of numbers at all?

First of all, to compute anything with words, a machine needs them encoded as numbers in some way.

For example, suppose we simply assigned each word a single "semantic" number:

>>> semnum = semantic_numbers = {'car': 5, 'vehicle': 2, 'apple': 232, 'orange': 300, 'fruit': 211, 'samsung': 1080, 'iphone': 1200}
>>> abs(semnum['fruit'] - semnum['apple'])
21
>>> abs(semnum['samsung'] - semnum['apple'])
848

Here the distance between fruit and apple is small while the distance between samsung and apple is large, which looks sensible, but a single "semantic" number per word can only capture one aspect of meaning, and it is hard to assign such numbers consistently.

Instead, we can give each word several numbers, i.e. a vector (here, two numbers per word):

>>> import numpy as np
>>> semnum = semantic_numbers = {'car': [5, -20], 'vehicle': [2, -18], 'apple': [232, 1010], 'orange': [300, 250], 'fruit': [211, 250], 'samsung': [1080, 1002], 'iphone': [1200, 1100]}

To compare words, we can subtract their vectors:

>>> np.array(semnum['apple']) - np.array(semnum['orange'])
array([-68, 760])

>>> np.array(semnum['apple']) - np.array(semnum['samsung'])
array([-848,    8])

The difference vectors are informative, but it is more convenient to boil each comparison down to a single number, e.g. the Euclidean norm of the difference (the "distance" between the two vectors):

>>> import numpy as np
>>> orange = np.array(semnum['orange'])
>>> apple = np.array(semnum['apple'])
>>> samsung = np.array(semnum['samsung'])

>>> np.linalg.norm(apple-orange)
763.03604108849277

>>> np.linalg.norm(apple-samsung)
848.03773500947466

>>> np.linalg.norm(orange-samsung)
1083.4685043876448

"", apple samsung, orange - samsung. , apple samsung, orange.

, " ?" . , Word2Vec/embedding ( Bengio 2003).


But do we have to use a neural network / Word2Vec to get such word vectors (i.e. dense embeddings)?

Not necessarily: word vectors can be built in other ways too, for instance from co-occurrence counts, and the one-hot arrays the OP mentions are themselves (very high-dimensional, sparse) vectors.

What the neural network adds is a way of learning dense, low-dimensional vectors automatically from a corpus, and the vectors it learns happen to have a geometry that reflects word meaning, which is why everyone then reasons about them with vector arithmetic.

Those "nice properties" (similar words clustering together, analogies working out) are an empirical outcome of the training objective; nothing in the training code does vector arithmetic explicitly.

Besides Word2Vec there are many other models that produce word vectors with similar properties.

A curated list of word-vector resources: https://github.com/keon/awesome-nlp#word-vectors


Word2Vec maps each word to a point in d-dimensional space (d is typically something like 300 to 600, though it does not have to be), and a point in d-dimensional space is exactly the same thing as a d-dimensional vector.

The conventional way to draw such a vector is as an arrow from the origin to that point, which is why the 2-D illustrations show the word vectors starting at (0,0).
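A tiny numpy sketch of that equivalence, with made-up numbers:

import numpy as np

point = np.array([0.3, -1.2, 0.7, 2.0, -0.5])   # a word's coordinates in 5-D space
origin = np.zeros(5)

print(point - origin)           # identical to `point`: the arrow from the origin
print(np.linalg.norm(point))    # its length, i.e. the distance from the origin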


Word2Vec is trained with one of two architectures: CBOW and Skip-Gram.

In CBOW the input is the one-hot encoding of the context words (vectors of length N, where N is the vocabulary size), and between the input and the hidden layer sits a weight matrix of size M x N (where M is the embedding dimension).

The network is trained to predict the target word from its context; once training is finished, the output layer is thrown away and the learned weight matrix is what you keep.

The output itself is a probability distribution P over the vocabulary, i.e. a softmax over all N words (Skip-Gram does the reverse and predicts the context words from the target word).

So every word ends up associated with a column of M real numbers, i.e. a point/vector in M-dimensional space; for visualisation these vectors are usually projected down to 2 or 3 dimensions.

[image: diagram of the CBOW and Skip-Gram architectures]

Source: https://arxiv.org/pdf/1301.3781.pdf
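A small sketch, consistent with the description above, of why the learned weight matrix holds the word vectors: multiplying the M x N matrix by a one-hot input simply selects one of its columns, i.e. that word's embedding.

import numpy as np

N, M = 5, 3                           # N = vocabulary size, M = embedding dimension
W = np.random.randn(M, N)             # input-to-hidden weights learned during training

one_hot = np.zeros(N)
one_hot[2] = 1.0                      # one-hot code of word number 2

hidden = W @ one_hot                  # equals W[:, 2], the embedding of word number 2
print(np.allclose(hidden, W[:, 2]))   # True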


Source: https://habr.com/ru/post/1682252/

