Looking for a database or text file of English words with their different forms

I am working on a project and I need to get the root of a given word (stemming). As you know, stemming algorithms that do not use a dictionary are not accurate. I also tried WordNet, but it is not a good fit for my project. I found the phpmorphy project, but it does not include a Java API.

I am currently looking for a database or text file of English words with their different forms, for example:

run running ran ...
include ...

Thanks for any help or advice.

1 answer

You can download LanguageTool (disclaimer: I am the maintainer), which comes with the binary file english.dict. The LanguageTool wiki describes how to export this file as a text file:

 java -jar morfologik-tools-1.6.0-standalone.jar fsa_dump -x -d english.dict 

For run, the resulting file will contain the following:

 ran run VBD
 run run NN
 run run VB
 run run VBN
 run run VBP
 running run VBG
 runs run NNS
 runs run VBZ

The first column is the inflected form, the second is the base form, and the third is the part-of-speech tag according to the (slightly extended) Penn Treebank tag set.
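Since the question asks about Java, here is a minimal sketch of how the exported text file could be turned into a lookup table from inflected form to base form. It assumes one entry per line with the three columns separated by whitespace and a UTF-8 encoded file; the actual separator and encoding of the dump may differ, so adjust the split pattern and charset as needed. The class name LemmaLookup and the method parse are made up for this illustration and are not part of LanguageTool or Morfologik.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: not part of LanguageTool or Morfologik.
class LemmaLookup {

    // Builds a map from each inflected form to the set of base forms listed for it.
    // Assumption: each line is "<inflected> <base> <POS tag>", split on whitespace.
    static Map<String, Set<String>> parse(Iterable<String> lines) {
        Map<String, Set<String>> lemmas = new LinkedHashMap<>();
        for (String line : lines) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length < 3) {
                continue; // skip blank or malformed lines
            }
            lemmas.computeIfAbsent(cols[0], k -> new LinkedHashSet<>()).add(cols[1]);
        }
        return lemmas;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java LemmaLookup <dump.txt> <word>");
            return;
        }
        // Assumption: the dump was saved as UTF-8 text.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        Map<String, Set<String>> lemmas = parse(lines);
        Set<String> bases = lemmas.getOrDefault(args[1], Collections.<String>emptySet());
        System.out.println(args[1] + " -> " + bases);
    }
}
```

A set of base forms is kept per word because the same inflected form can appear on several lines with different tags (runs is listed as both NNS and VBZ above), and in general a form may map to more than one base form.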


Source: https://habr.com/ru/post/1498260/
