WordNet - word synonyms and related word constructs - Java or Python

I am looking to use WordNet to find a collection of similar terms from a basic set of terms.

For example, for the word 'discouraged', potential synonyms could be: daunted, glum, deterred, pessimistic.

I also want to identify potential bigrams, such as: beat down, put off, caved in, etc.

How can I extract this information using Java or Python? Are there any hosted WordNet databases or web interfaces that would allow such queries?

Thanks!

3 answers

The easiest way to understand the WordNet data is to look at the Prolog files. They are described here:

http://wordnet.princeton.edu/wordnet/man/prologdb.5WN.html

WordNet terms are grouped into synsets. A synset is a maximal set of synonyms. Synsets have a primary key so that they can be used in semantic relations.

So, answering your first question, you can list the different senses of a word and the corresponding synonyms as follows:

    Input X: Term
    Output Y: Sense
    Output L: Synonyms in this Sense

    s_helper(X,Y) :- s(X,_,Y,_,_,_).

    ?- setof(H,(s_helper(Y,X),s_helper(Y,H)),L).

Example:

    ?- setof(H,(s_helper(Y,'discouraged'),s_helper(Y,H)),L).
    Y = 301664880,
    L = [demoralised, demoralized, discouraged, disheartened] ;
    Y = 301992418,
    L = [discouraged] ;
    No
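Since the question asks about Python, here is a rough sketch of the same lookup as a plain linear scan over `s/6` facts. The synset ids and words are taken from the example output above; the other tuple fields and the function name `synonyms_by_sense` are placeholders I made up for illustration, and a real program would first parse the facts out of the Prolog file.

```python
# Sample s/6 facts: (synset_id, w_num, word, ss_type, sense_number, tag_count).
# Only the ids and words come from the example output above; the remaining
# fields are placeholder values for illustration.
FACTS = [
    (301664880, 1, "demoralized", "s", 1, 0),
    (301664880, 2, "demoralised", "s", 1, 0),
    (301664880, 3, "discouraged", "s", 1, 0),
    (301664880, 4, "disheartened", "s", 1, 0),
    (301992418, 1, "discouraged", "s", 2, 0),
]


def synonyms_by_sense(term, facts):
    """Return {synset_id: sorted words in that synset} for every sense of term."""
    # First pass: collect every synset that contains the term.
    senses = {synset_id for synset_id, _, word, *_ in facts if word == term}
    # Second pass per sense: collect all words sharing that synset.
    return {
        sense: sorted({word for synset_id, _, word, *_ in facts if synset_id == sense})
        for sense in sorted(senses)
    }


print(synonyms_by_sense("discouraged", FACTS))
```

Like the Prolog query, this rescans all facts for every sense, so it only illustrates the logic and is not meant to be fast.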

As for the second part of your question: WordNet terms are sequences of words. So you can search the WordNet terms for a given word as follows:

    Input X: Word
    Output Y: Term

    s_helper(X) :- s(_,_,X,_,_,_).

    word_in_term(X,Y) :- atom_concat(X,' ',H), sub_atom(Y,0,_,_,H).
    word_in_term(X,Y) :- atom_concat(' ',X,H), atom_concat(H,' ',J), sub_atom(Y,_,_,_,J).
    word_in_term(X,Y) :- atom_concat(' ',X,H), sub_atom(Y,_,_,0,H).

    ?- s_helper(Y), word_in_term(X,Y).

Example:

    ?- s_helper(X), word_in_term('beat',X).
    X = 'beat generation' ;
    X = 'beat in' ;
    X = 'beat about' ;
    X = 'beat around the bush' ;
    X = 'beat out' ;
    X = 'beat up' ;
    X = 'beat up' ;
    X = 'beat back' ;
    X = 'beat out' ;
    X = 'beat down' ;
    X = 'beat a retreat' ;
    X = 'beat down' ;
    X = 'beat down' ;
    No
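The three `word_in_term` clauses (word at the start, in the middle, or at the end of a term) can be approximated in Python by splitting the term on spaces. This is only a sketch: it assumes single spaces between words, and the sample list mixes terms from the output above with a few made-up ones to show what gets rejected.

```python
def word_in_term(word, term):
    """True if word is one whole word of a multi-word term."""
    parts = term.split(" ")
    # Require at least two words, so the bare word itself does not match,
    # just as the Prolog clauses always require an adjacent space.
    return len(parts) > 1 and word in parts


# 'beat down' and 'beat around the bush' appear in the output above;
# the other entries are invented to exercise the non-matching cases.
terms = ["beat", "beat down", "upbeat mood", "dead beat", "beat around the bush"]
print([t for t in terms if word_in_term("beat", t)])
```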

This will give you potential n-grams, but not much morphological variation. WordNet also exposes some lexical relations that may be helpful.

But neither of the Prolog queries I gave is very efficient. The problem is the lack of an index on words. A Java implementation could, of course, do something better. Imagine something like:

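    import java.util.Hashtable;
    import java.util.Vector;

    class Synset {
        // synset id -> synset
        static Hashtable<Integer,Synset> synset_access;
        // term -> all synsets that contain it
        static Hashtable<String,Vector<Synset>> term_access;
    }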

Some Prolog systems can do the same via a directive that instructs the system to index a predicate on several arguments.

Setting up a web service should not be that difficult, either in Java or in Prolog. Many Prolog systems make it easy to embed Prolog in Java web servers and servlets.

A list of Prolog systems that support web servers can be found here:
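To make the indexing idea concrete, here is a minimal Python sketch with two hash tables, one keyed by synset id and one keyed by term. The names `build_indexes` and `synonyms` are my own, and the sample facts reuse the ids and words from the 'discouraged' example with placeholder values in the remaining fields.

```python
from collections import defaultdict

# (synset_id, w_num, word, ss_type, sense_number, tag_count); only the ids
# and words come from the earlier example, the rest are placeholders.
FACTS = [
    (301664880, 1, "demoralized", "s", 1, 0),
    (301664880, 2, "discouraged", "s", 1, 0),
    (301664880, 3, "disheartened", "s", 1, 0),
    (301992418, 1, "discouraged", "s", 2, 0),
]


def build_indexes(facts):
    """Build synset_id -> words and word -> synset_ids tables in one pass."""
    synset_access = defaultdict(list)
    term_access = defaultdict(list)
    for synset_id, _w_num, word, *_rest in facts:
        synset_access[synset_id].append(word)
        term_access[word].append(synset_id)
    return synset_access, term_access


def synonyms(term, synset_access, term_access):
    """Look up synonyms per sense with hash lookups instead of a full scan."""
    return {sid: synset_access[sid] for sid in term_access.get(term, [])}


synset_access, term_access = build_indexes(FACTS)
print(synonyms("discouraged", synset_access, term_access))
```

The indexes are built once, after which each lookup costs one hash access per sense rather than a pass over every fact.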

http://en.wikipedia.org/wiki/Comparison_of_Prolog_implementations#Operating_system_and_Web-related_features

Best wishes


These are two different problems.

1) WordNet and Python. Use NLTK; it has a nice interface to WordNet. You could write something on your own, but honestly, why make life difficult? LingPipe probably also has something built in, but NLTK is much easier to use. I think NLTK just loads its own copy of the WordNet database, but I'm sure there are APIs to talk to WordNet as well.
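A minimal sketch of what the NLTK route might look like, assuming NLTK 3.x with the WordNet corpus already downloaded (`nltk.download("wordnet")`). The helper `lemmas_by_synset` and `demo` are names I made up; `synsets`, `name` and `lemma_names` are the NLTK calls. The helper takes the WordNet reader as a parameter so it can also be tried without NLTK installed.

```python
def lemmas_by_synset(word, wn):
    """Map each synset name to its lemma names for every sense of word.

    wn is expected to behave like nltk.corpus.wordnet.
    """
    return {
        # NLTK joins multi-word lemmas with underscores, e.g. 'beat_down'.
        synset.name(): [lemma.replace("_", " ") for lemma in synset.lemma_names()]
        for synset in wn.synsets(word)
    }


def demo():
    # Needs `pip install nltk` plus a one-time nltk.download("wordnet").
    from nltk.corpus import wordnet

    for name, lemmas in lemmas_by_synset("discouraged", wordnet).items():
        print(name, lemmas)
```

Calling `demo()` prints one line per sense. Multi-word lemmas come back as well, which also helps with the bigram part of the question.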

2) To get bigrams in NLTK, follow this guide. In general, you tokenize the text and then simply iterate over each sentence, collecting the n-grams around each word by looking ahead and back.
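The tokenize-then-pair idea can be sketched with the standard library alone. The regex tokenizer here is a deliberately naive assumption; a real tokenizer (NLTK's, for instance) handles punctuation and contractions far better.

```python
import re


def bigrams(text):
    """Return adjacent word pairs from text, using a naive regex tokenizer."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Pair each token with its successor.
    return list(zip(tokens, tokens[1:]))


print(bigrams("He felt beat down and put off."))
```

Candidate pairs such as ('beat', 'down') or ('put', 'off') could then be checked against WordNet's multi-word terms.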


As an alternative to NLTK, you can use one of the available WordNet SPARQL endpoints to retrieve such information. Example query:

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX wordnet: <http://www.w3.org/2006/03/wn/wn20/schema/>

    SELECT DISTINCT ?label {
      ?input_word a wordnet:WordSense;
                  rdfs:label ?input_label.
      FILTER (?input_label = 'run')
      ?synset wordnet:containsWordSense ?input_word.
      ?synset wordnet:containsWordSense ?synonym.
      ?synonym rdfs:label ?label.
    } LIMIT 100
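If you want to send this query from Python, a sketch along the following lines might do. It only builds the GET URL with the query in the `query` parameter, as the SPARQL protocol expects. The endpoint `http://example.org/sparql` is a placeholder, not a real WordNet service, and both function names are my own.

```python
from urllib.parse import urlencode

# Same query as above, with the word left as a format placeholder.
QUERY_TEMPLATE = """\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wordnet: <http://www.w3.org/2006/03/wn/wn20/schema/>
SELECT DISTINCT ?label {{
  ?input_word a wordnet:WordSense;
              rdfs:label ?input_label.
  FILTER (?input_label = '{word}')
  ?synset wordnet:containsWordSense ?input_word.
  ?synset wordnet:containsWordSense ?synonym.
  ?synonym rdfs:label ?label.
}} LIMIT 100
"""


def build_query(word):
    """Fill the word into the query, escaping backslashes and single quotes."""
    escaped = word.replace("\\", "\\\\").replace("'", "\\'")
    return QUERY_TEMPLATE.format(word=escaped)


def build_request_url(endpoint, word):
    """Return a GET URL carrying the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": build_query(word)})


# Placeholder endpoint; substitute a real WordNet SPARQL service.
print(build_request_url("http://example.org/sparql", "run"))
```

The resulting URL can be fetched with `urllib.request.urlopen` or any HTTP client; the result format depends on what the endpoint offers.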

In the Java universe, the Jena and Sesame frameworks can be used.


Source: https://habr.com/ru/post/894552/

