In general, you cannot recover a specific word from its stem, because many different words can share the same stem. Depending on your application, one option is to build a database of stems in which each stem maps to an array of the words that reduce to it. You would then need to predict which of those words is appropriate when converting back, based on the context of the original text.
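A minimal sketch of such a stem database in Java, assuming you already obtain (stem, word) pairs from whatever stemmer you use. The class and method names (`StemIndex`, `add`, `candidates`) are hypothetical, and choosing among the returned candidates is left to your application:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: a reverse index from a stem to the words that produced it.
public class StemIndex {
    // Each stem maps to the distinct words seen with it, in insertion order
    private final Map<String, List<String>> wordsByStem = new LinkedHashMap<>();

    // Record that `word` reduces to `stem` (the stem comes from your own stemmer; assumed here)
    public void add(String stem, String word) {
        List<String> words = wordsByStem.computeIfAbsent(stem, k -> new ArrayList<>());
        if (!words.contains(word)) {
            words.add(word);
        }
    }

    // Read-only view of all candidate words for a stem; empty if the stem was never seen
    public List<String> candidates(String stem) {
        return Collections.unmodifiableList(wordsByStem.getOrDefault(stem, new ArrayList<>()));
    }

    public static void main(String[] args) {
        StemIndex index = new StemIndex();
        index.add("run", "running");
        index.add("run", "runs");
        index.add("run", "running"); // duplicate, ignored
        System.out.println(index.candidates("run"));
    }
}
```

The index only narrows the choice to a list of candidates; it says nothing about which one was originally meant, which is exactly the ambiguity described above.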
As a very naive solution to this problem, if you know the part-of-speech tags of the words, you could store each word together with its tag in the database:
```
run:
    NN:  runner
    VBG: running
    VBZ: runs
```
Then, given the stem "run" and the tag "NN", you can determine that "runner" is the most likely word in this context. Of course, this solution is far from perfect. Note that you will also need to handle the fact that the same word form can receive different tags in different contexts. Remember that any attempt to solve this problem will be, at best, an approximation.
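The tagged lookup above could be sketched in Java as a nested map from stem to (tag, word). This is a minimal illustration under the assumption that you store one word per stem/tag pair, as in the table above; `TaggedStemIndex`, `add` and `lookup` are hypothetical names, and a later `add` with the same stem and tag simply overwrites the earlier word:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: stem -> (POS tag -> word form),
// e.g. "run" -> {NN=runner, VBG=running, VBZ=runs}
public class TaggedStemIndex {
    private final Map<String, Map<String, String>> index = new HashMap<>();

    // Naive assumption: one word per (stem, tag); re-adding the same pair overwrites it
    public void add(String stem, String tag, String word) {
        index.computeIfAbsent(stem, k -> new HashMap<>()).put(tag, word);
    }

    // The stored word for this stem and tag, or empty if there is none
    public Optional<String> lookup(String stem, String tag) {
        Map<String, String> byTag = index.getOrDefault(stem, Map.of());
        return Optional.ofNullable(byTag.get(tag));
    }

    public static void main(String[] args) {
        TaggedStemIndex idx = new TaggedStemIndex();
        idx.add("run", "NN", "runner");
        idx.add("run", "VBG", "running");
        idx.add("run", "VBZ", "runs");
        System.out.println(idx.lookup("run", "NN").orElse("?"));
        System.out.println(idx.lookup("run", "JJ").orElse("?"));
    }
}
```

Returning `Optional` makes the "no stored word for this tag" case explicit, which you will hit whenever a tag appears in a context your database has not seen.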
Edit: judging from the comments below, you probably want lemmatization rather than stemming. Here is how to get word lemmas using the Stanford CoreNLP tools:
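```java
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations.LemmaAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

public class LemmaExample {
    public static void main(String[] args) {
        // The lemma annotator needs tokenization, sentence splitting and POS tags first
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props, false);

        String text = "Hello, world!";
        Annotation document = pipeline.process(text);

        for (CoreMap sentence : document.get(SentencesAnnotation.class)) {
            for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
                String word = token.get(TextAnnotation.class);   // surface form
                String lemma = token.get(LemmaAnnotation.class); // dictionary form
                System.out.println(word + " -> " + lemma);
            }
        }
    }
}
```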