Error extracting phrases using Gensim

I am trying to get bigrams in sentences using Phrases in Gensim as follows.

from gensim.models import Phrases
from gensim.models.phrases import Phraser
documents = ["the mayor of new york was there", "machine learning can be useful sometimes","new york mayor was present"]

sentence_stream = [doc.split(" ") for doc in documents]
#print(sentence_stream)
# Note: gensim 3.x takes a bytes delimiter; gensim 4.x expects a str, i.e. delimiter=' '
bigram = Phrases(sentence_stream, min_count=1, threshold=2, delimiter=b' ')
bigram_phraser = Phraser(bigram)

for sent in sentence_stream:
    tokens_ = bigram_phraser[sent]
    print(tokens_)

Even though it catches "new", "york" as "new york", it does not catch "machine", "learning" as "machine learning".

However, in the example shown on the Gensim website, they were able to catch the words "machine", "learning" as "machine learning".

Please let me know how to get "machine learning" as a bigram in the example above.

2 answers

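The technique used by gensim Phrases is based purely on co-occurrence statistics: how often words appear together versus alone, in a formula that is also affected by min_count and compared against the threshold value.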

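It is only because "new" and "york" occur next to each other twice in your training set, while other words (like "machine" and "learning") occur next to each other only once, that "new_york" becomes a bigram and the other pairings do not. What's more, even if you found a combination of min_count and threshold that promoted "machine_learning" to a bigram, it would also join every other bigram-that-appears-once, which is probably not what you want.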
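
To make the arithmetic concrete, here is a rough sketch of the default scoring as I understand it from the gensim source (the formula from the Mikolov et al. phrase-detection paper). N is the vocabulary size, which I am assuming counts distinct words plus distinct adjacent pairs (14 + 14 = 28 for the three sentences in the question); treat that number as an assumption rather than something gensim is guaranteed to report.

```latex
\[
\mathrm{score}(a, b) =
  \frac{\bigl(\mathrm{count}(a, b) - \mathrm{min\_count}\bigr)\, N}
       {\mathrm{count}(a)\,\mathrm{count}(b)}
\]
\[
\mathrm{score}(\text{new}, \text{york}) = \frac{(2 - 1)\cdot 28}{2 \cdot 2} = 7 > 2,
\qquad
\mathrm{score}(\text{machine}, \text{learning}) = \frac{(1 - 1)\cdot 28}{1 \cdot 1} = 0 \le 2
\]
```

With min_count=1, any pair seen only once scores 0 under this formula, so it cannot exceed a positive threshold.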

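Really, to get good results from these statistical techniques, you need lots of varied, realistic data. (Toy-sized examples may superficially succeed, or fail, for superficial toy-sized reasons.)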

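Even then, they will tend to miss combinations a person would consider reasonable, and make combinations a person would not. Why? Because our minds have much more sophisticated ways (including grammar and real-world knowledge) of deciding when a clump of words represents a single concept.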

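So even with more and better data, be prepared for nonsensical n-grams. Tune or judge the model on whether it improves your overall goal, not on any single point or ad-hoc check against your own sensibility.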

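(Regarding the referenced gensim documentation comment, I am fairly sure that if you try Phrases on just the two sentences listed there, it will not find either of the desired phrases, neither "new_york" nor "machine_learning". As a figurative example, the ellipsis ... implies that the training set is larger, and the results indicate that the extra, unshown texts matter. It is only because of the third sentence you added to your code that "new_york" is detected. If you added similar examples to make "machine_learning" look more like a statistically outlying pairing, your code could promote "machine_learning" too.)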
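
The effect of adding data can be sketched in plain Python. This is not a call into gensim: bigram_scores is a hypothetical helper that re-implements the default scoring formula as I understand it, the vocabulary size (distinct words plus distinct adjacent pairs) is an assumption, and the extra sentence is made up for illustration.

```python
from collections import Counter


def bigram_scores(sentences, min_count=1):
    """Score each adjacent word pair as
    (pair_count - min_count) / count_a / count_b * vocab_size."""
    unigrams = Counter(word for sent in sentences for word in sent)
    # Pairs are only formed inside a sentence, never across sentence boundaries.
    bigrams = Counter(pair for sent in sentences for pair in zip(sent, sent[1:]))
    # Assumption: the vocabulary holds both single words and adjacent pairs.
    vocab_size = len(unigrams) + len(bigrams)
    return {
        (a, b): (n - min_count) / unigrams[a] / unigrams[b] * vocab_size
        for (a, b), n in bigrams.items()
    }


documents = [
    "the mayor of new york was there",
    "machine learning can be useful sometimes",
    "new york mayor was present",
]
extra = ["he studies machine learning"]  # hypothetical extra sentence

threshold = 2
before = bigram_scores([doc.split() for doc in documents])
after = bigram_scores([doc.split() for doc in documents + extra])

for label, scores in (("before", before), ("after", after)):
    promoted = sorted("_".join(pair) for pair, s in scores.items() if s > threshold)
    print(label, promoted)
# before ['new_york']
# after ['machine_learning', 'new_york']
```

One more sentence containing "machine learning" is enough here only because the corpus is tiny; on realistic data the pair has to stand out against many more competing counts.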


Is it probably below your threshold?

Try using more data.


Source: https://habr.com/ru/post/1685420/

