This is an interesting question, but I suspect you are asking it for the wrong reasons. Are you thinking of this “lexical Unicode” as something that would let you break sentences down into language-neutral, atomic elements of meaning and then reconstruct them in some other concrete language? As a means of achieving a universal translator, perhaps?
Even if you could encode and store, say, an English sentence using this "lexical Unicode", you could not expect to read it back and magically render it in, say, Chinese with the meaning intact.
Your Unicode analogy, however, is very useful.
Keep in mind that Unicode, although a “universal” encoding, does not embody the pronunciation, meaning, or usage of the character in question. Each code point refers to a specific glyph in a specific language (or, more precisely, in a script used by a group of languages). It is elemental at the level of the visual representation of a glyph (within the bounds of style, formatting, and fonts). The Unicode code point for the Latin letter "A" is just that: the Latin letter "A". It cannot be automatically rendered as, say, the Arabic letter Alif (ا) or the Devanagari letter "A" (अ).
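To make that concrete, here is a small Python check (just an illustration, not part of the original question) showing that the three letters mentioned above live at entirely separate code points:

```python
# Unicode gives each of these letters its own code point; nothing in the
# standard links them as "the same letter" across scripts.
for ch in ["A", "ا", "अ"]:  # Latin A, Arabic Alef, Devanagari A
    print(f"{ch!r} -> U+{ord(ch):04X}")

# Output:
# 'A' -> U+0041
# 'ا' -> U+0627
# 'अ' -> U+0905
```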
Keeping to the Unicode analogy, your lexical Unicode would have code points for each word (word form) in every language. Unicode has ranges of code points for a specific script; your lexical Unicode would have a range of codes for each language. Different words in different languages, even if they have the same meaning (synonyms), would need different code points. The same word with different meanings or different pronunciations (homonyms) would need different code points.
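A minimal sketch of what that allocation might look like, with completely made-up ranges and sense labels (nothing below comes from the question or from Unicode itself):

```python
# Hypothetical "lexical Unicode" allocation: each language gets its own
# block of code points, and each (lemma, sense) pair gets its own code.
# All ranges and offsets here are invented for illustration.
LANGUAGE_BLOCKS = {
    "en": 0x000000,   # English block
    "fr": 0x100000,   # French block
    "de": 0x200000,   # German block
}

lexical_codes = {
    # Synonyms across languages still get distinct codes:
    ("en", "dog", "noun: domesticated canine"):   LANGUAGE_BLOCKS["en"] + 0x2A41,
    ("fr", "chien", "noun: domesticated canine"): LANGUAGE_BLOCKS["fr"] + 0x0137,
    # Homonyms within one language also get distinct codes:
    ("en", "bank", "noun: financial institution"): LANGUAGE_BLOCKS["en"] + 0x0B01,
    ("en", "bank", "noun: edge of a river"):       LANGUAGE_BLOCKS["en"] + 0x0B02,
}

for key, code in lexical_codes.items():
    print(f"{key} -> 0x{code:06X}")
```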
In Unicode, for some languages (though not for all), the same letter gets a different code point when it takes a different shape depending on its position in the word - in Hebrew and Arabic, for example, the glyph changes shape at the end of a word. Similarly, in your lexical Unicode, if a word takes a different form depending on its position in a sentence, it may warrant its own code point.
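For the Hebrew case this is easy to verify: the regular and word-final forms of the letter Mem really are two separate code points (a quick Python check, for illustration only):

```python
import unicodedata

# Hebrew Mem has distinct code points for the regular and the
# word-final form, exactly as described above.
for ch in ["מ", "ם"]:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

# Output:
# U+05DE  HEBREW LETTER MEM
# U+05DD  HEBREW LETTER FINAL MEM
```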
Perhaps the easiest way to come up with code points for English would be to base your system on, say, a particular edition of the Oxford English Dictionary and assign a unique code to each word sequentially. You would have to use a different code for each distinct meaning of the same word, and a different code for different forms - for example, if the same word can be used both as a noun and as a verb, you would need two codes.
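A rough sketch of that sequential assignment, with invented placeholder entries rather than actual OED data, might look like this:

```python
# Toy illustration: assign sequential codes to (headword, part of speech,
# sense) triples taken from some authoritative dictionary. The entries
# below are placeholders, not real dictionary content.
dictionary_entries = [
    ("run", "verb", "move at a speed faster than a walk"),
    ("run", "noun", "an act or spell of running"),
    ("set", "verb", "put or bring into a specified state"),
    ("set", "noun", "a group of similar things"),
]

lexical_code = {entry: code for code, entry in enumerate(dictionary_entries)}

for entry, code in lexical_code.items():
    print(f"{code:#06x}: {entry}")
```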
Then you would have to do the same for every other language you want to include, using the most authoritative dictionary for that language.
Most likely, this exercise would be more effort than it is worth. If you decide to include all the world's living languages, plus some historical dead ones and a few fictional ones - as Unicode does - you end up with a code space so large that your codes would have to be extremely wide to accommodate it. You will gain nothing in terms of compression: a sentence stored as a string in the original language is likely to take up less space than the same sentence represented as codes.
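A back-of-the-envelope comparison, in which every number is an assumption rather than a measurement, illustrates both points: how wide the codes would have to be, and why they are unlikely to beat plain text for size:

```python
# Rough, assumption-laden sketch; the counts below are invented, not measured.
import math

entries_per_language = 1_000_000   # word forms x senses (assumed)
languages = 5_000                  # living + dead + fictional (assumed)
code_space = entries_per_language * languages

bits_per_code = math.ceil(math.log2(code_space))   # width of one code
bytes_per_code = math.ceil(bits_per_code / 8)

sentence = "I saw the cat on the mat"
utf8_bytes = len(sentence.encode("utf-8"))
coded_bytes = len(sentence.split()) * bytes_per_code

print(f"{code_space:,} codes need {bits_per_code} bits "
      f"({bytes_per_code} bytes) per word")
print(f"plain UTF-8: {utf8_bytes} bytes; fixed-width codes: {coded_bytes} bytes")
# With these assumptions: 5,000,000,000 codes -> 33 bits (5 bytes) per word,
# so the 24-byte UTF-8 sentence becomes 35 bytes of codes.
```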
PS For those who say this is an impossible task because the meanings of words change: I don't see that as a problem. To use the Unicode analogy, the usage of letters has changed too (admittedly not as rapidly as the meanings of words), but it is of no concern to Unicode that "th" used to be written as "y" in the Middle Ages. Unicode has a code point for 't', for 'h', and for 'y', and each serves its own purpose.
PPS Actually, it is of some relevance to Unicode that "oe" can also be written "œ" in French, or that "ss" can be written "ß" in German.
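Both of those do have their own code points, and case folding is where Unicode actually encodes the ß/ss relationship (a quick Python check, for illustration):

```python
import unicodedata

for ch in ["œ", "ß"]:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
# U+0153  LATIN SMALL LIGATURE OE
# U+00DF  LATIN SMALL LETTER SHARP S

# Unicode's case-folding and uppercasing rules map ß to "ss"/"SS":
print("ß".casefold())    # -> 'ss'
print("straße".upper())  # -> 'STRASSE'
```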