I am currently using spaCy to determine the semantic similarity between two strings. It works well and requires only a couple of lines of code, with all the work being done behind the scenes:
>>> import spacy
>>> nlp = spacy.load('en')
>>> nlp('string').similarity(nlp('another string'))
0.796
However, this requires ~600 MB of model data. Since I deploy on Heroku, this far exceeds the allowed slug size. I am considering other hosting options, but is there another approach I could use? I don't need "industrial strength", but the other approaches I've tried don't seem to work as well / aren't as quick to implement.