The short answer is: "No, this cannot be done in a principled way that works even remotely well." This is an unsolved problem in natural language processing research, and it also happens to be the subject of my doctoral work. I will very briefly summarize where we are and point you to a few publications:
The meaning of words
The most important assumption here is that it is possible to obtain a vector that represents each word in the sentence in question. This vector is usually chosen to capture the contexts in which the word can appear. For example, if we only consider the three contexts "eat", "red" and "fluffy", the word "cat" might be represented as [98, 1, 87], because if you were to read a very long piece of text (a few billion words is not uncommon by today's standards), the word "cat" would appear very often in the context of "fluffy" and "eat", but not that often in the context of "red". In the same way, "dog" might be represented as [87, 2, 34] and "umbrella" might be [1, 13, 0]. Picturing these vectors as points in three-dimensional space, "cat" is clearly closer to "dog" than it is to "umbrella", and therefore "cat" also means something more similar to "dog" than to "umbrella".
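As a minimal sketch of the idea (the numbers are just the made-up counts from the example above, not real corpus statistics), cosine similarity over these context-count vectors already ranks "dog" much closer to "cat" than "umbrella":

```python
import numpy as np

# Toy context-count vectors over the contexts ("eat", "red", "fluffy")
# from the example above; in practice these come from corpus counts.
vectors = {
    "cat":      np.array([98.0, 1.0, 87.0]),
    "dog":      np.array([87.0, 2.0, 34.0]),
    "umbrella": np.array([1.0, 13.0, 0.0]),
}

def cosine(a, b):
    """Cosine similarity: close to 1.0 means similar contexts, close to 0.0 means unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(vectors["cat"], vectors["dog"]))       # high, ~0.94
print(cosine(vectors["cat"], vectors["umbrella"]))  # low,  ~0.06
```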
This line of work has been investigated since the early 90s (for example, this work by Grefenstette) and has yielded some surprisingly good results. For example, here are a few random entries from a thesaurus I built recently by having my computer read Wikipedia:
theory -> analysis, concept, approach, idea, method
voice -> vocal, tone, sound, melody, singing
james -> william, john, thomas, robert, george, charles
These lists of similar words were obtained entirely without human intervention - you feed in the text and come back a few hours later.
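For the curious, here is a heavily simplified sketch of how such a thesaurus can be built: count co-occurrences in a small window around each word, then rank neighbours by cosine similarity. Real systems add lemmatisation, association weighting (e.g. PPMI), syntactic contexts and so on; the toy corpus at the bottom is of course invented.

```python
from collections import Counter, defaultdict
import math

def build_thesaurus(tokens, window=2, top_n=5):
    """Very simplified distributional thesaurus: count co-occurrences
    in a +/- window, then rank neighbours by cosine similarity."""
    contexts = defaultdict(Counter)
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if i != j:
                contexts[word][tokens[j]] += 1

    def cosine(c1, c2):
        dot = sum(c1[k] * c2[k] for k in set(c1) & set(c2))
        norm = math.sqrt(sum(v * v for v in c1.values())) * \
               math.sqrt(sum(v * v for v in c2.values()))
        return dot / norm if norm else 0.0

    return {
        w: sorted((v for v in contexts if v != w),
                  key=lambda v: cosine(contexts[w], contexts[v]),
                  reverse=True)[:top_n]
        for w in contexts
    }

# In practice tokens would be a few billion words, not a toy sentence.
tokens = "the fluffy cat eats fish the fluffy dog eats meat".split()
print(build_thesaurus(tokens)["cat"][:3])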
Phrase problem
You may ask why we do not do the same thing for longer phrases such as "foxes love fruit". The reason is that we do not have enough text. In order to reliably establish what X is similar to, we need to see many examples of X being used in context. When X is a single word such as "voice", this is not too hard. However, as X gets longer, the chances of finding natural occurrences of X become exponentially lower. For comparison, Google has about 1B pages containing the word "fox", and not a single page containing the full phrase "foxes love fruit", despite the fact that it is a perfectly valid English sentence and we all understand what it means.
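If you want to see this sparsity for yourself, a two-line experiment on any large text file makes the point (the corpus file name below is hypothetical):

```python
# Toy illustration of data sparsity: as the phrase gets longer,
# the number of verbatim occurrences in a corpus drops off sharply.
corpus = open("wikipedia_dump.txt", encoding="utf-8").read().lower()  # hypothetical file

for phrase in ["fox", "red fox", "foxes love fruit"]:
    print(phrase, "->", corpus.count(phrase))
```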
Composition
To get around the sparse-data problem, we want to perform composition, i.e. take the vectors for individual words, which are easy to obtain from real text, and combine them in a way that captures their meaning. The bad news is that so far nobody has been able to do this well.
The simplest and most obvious approach is to add or multiply the individual word vectors together. This has the undesirable side effect that "cats chase dogs" and "dogs chase cats" mean the same thing to your system. Also, if you multiply, you have to be extra careful or every sentence will end up represented as [0, 0, 0, ..., 0], which defeats the point.
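To make this concrete, here is a tiny sketch (the word vectors are invented numbers) of additive and pointwise-multiplicative composition. It shows that both operators are insensitive to word order, and that with multiplication a single zero dimension wipes out that dimension for the whole sentence:

```python
import numpy as np

# Made-up word vectors, only to illustrate the two naive composition operators.
w = {
    "cats":  np.array([0.9, 0.1, 0.0]),
    "chase": np.array([0.2, 0.8, 0.3]),
    "dogs":  np.array([0.7, 0.2, 0.1]),
}

def add_compose(words):
    """Sum of word vectors."""
    return sum(w[t] for t in words)

def mult_compose(words):
    """Pointwise product of word vectors; any zero dimension zeroes the result."""
    vec = np.ones(3)
    for t in words:
        vec *= w[t]
    return vec

s1 = "cats chase dogs".split()
s2 = "dogs chase cats".split()

# Both operators ignore word order: the two sentences get identical vectors.
print(np.allclose(add_compose(s1), add_compose(s2)))    # True
print(np.allclose(mult_compose(s1), mult_compose(s2)))  # True
print(mult_compose(s1))  # last dimension is already 0 because of "cats"
```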
Further reading
I will not go into the more sophisticated composition methods that have been proposed so far. I suggest you read Katrin Erk's "Vector space models of word meaning and phrase meaning: a survey". It is a very good high-level survey to get you started. Unfortunately, it is not freely available on the publisher's website; email the author directly to get a copy. In that paper you will find references to many more concrete methods. The more approachable ones are by Mitchell and Lapata (2008) and Baroni and Zamparelli (2010).
Edit in response to @vpekar's comment: the bottom line of this answer is to stress the fact that while naive methods do exist (e.g. addition, multiplication, surface similarity, etc.), they are fundamentally flawed and in general you should not expect great performance from them.