I am working on a project that requires scanning through paragraphs of natural English text and determining what type of word each word is. The application works with AJAX, PHP and MySQL.
The application does not have to be 100% accurate; it just tries to find the best content that matches the text input. To do this, I am using the SQL version of the WordNet database, which lets me look up words and their types through the view dict.
SELECT lemma, pos FROM dict WHERE lemma = 'fool' ORDER BY lemma;
The above is an example of what the database sees, but my PHP actually builds the query parameters dynamically from text sent by AJAX calls, so the real query will contain a lot of keywords.
This returns an array of entries, one for each word searched, along with its type.
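As a rough illustration of the multi-keyword case described above, the lookup might look like the sketch below. The keyword list is made up for the example; in the application the values would come from the AJAX text and be passed in as parameters rather than written into the query.

```sql
-- Sketch only: the same lookup as above, for several keywords at once.
-- 'fool', 'run' and 'quick' are placeholder keywords for illustration.
SELECT lemma, pos
FROM dict
WHERE lemma IN ('fool', 'run', 'quick')
ORDER BY lemma, pos;
```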
However, my problem is that most words can have multiple types. For example, 'fool' returns three entries as a noun and four as a verb. I do not need to distinguish between those individual entries, but I would like to know whether the word is a noun or a verb as it is used in the text.
This problem persists for most words, which means I cannot pin down a word's type, because the word could be used as any of them.
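To make the ambiguity easier to see, the entries can be collapsed to one row per word and type. This is only a sketch, and it assumes that dict returns one row per entry, which is what the counts above suggest.

```sql
-- Sketch only: count how many entries each type has for a word.
-- Assumes the dict view yields one row per entry for a lemma.
SELECT lemma, pos, COUNT(*) AS entries
FROM dict
WHERE lemma = 'fool'
GROUP BY lemma, pos
ORDER BY lemma, pos;
```

This shows which types a word can take, but it still says nothing about which one is meant in a given sentence.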
I am wondering if anyone can point me toward an algorithm, or suggest what I can do, to make at least a best guess at what type a word is.
The most important types to get right are adjectives and nouns.