Hierarchy of meaning

I am looking for a method for building a hierarchy of words.

Background: I am an "amateur" natural language processing enthusiast, and one of the problems that currently interests me is determining the semantic hierarchy among a group of words.

For example, suppose I have a set in which one word is a "super" representation of the others, i.e.

[cat, dog, monkey, animal, bird, ... ] 

I would like to use any technique that would let me extract the word "animal", which is the most meaningful and accurate representation of the other words in this set.

Note: the words do NOT have the same meaning. cat != dog != monkey != animal, BUT a cat is a subset of animal, and a dog is a subset of animal.

I know that many of you will tell me to use WordNet. Well, I have tried, but I am actually working in a very specific domain where WordNet does not apply, because: 1) most of the words are not found in WordNet, and 2) all the words are in another language; translation is possible, but only to a limited extent.

Another example:

 [ noise reduction, focal length, flash, functionality, .. ] 

Here, "functionality" covers everything in this set.

I also tried crawling Wikipedia pages and applying methods such as tf-idf, but the Wikipedia pages did not really help much.

Can someone tell me which direction my research should go? (I could use anything.)

2 answers

It sounds like you want to use something like the hypernym/hyponym relations in WordNet, but without WordNet itself because of language and domain-specific coverage issues? That is, if you had hypernym relations specific to your domain, you could get the "super" representation by just looking for the nearest parent that includes all the words in the list, or the nearest node that is equal to one of the list words and includes all the others.
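As a rough illustration of that lookup, here is a minimal Python sketch. It assumes you already have domain hypernym pairs as (child, parent) tuples; the pairs and the function names below are made up for the example and do not come from WordNet or any library.

```python
from collections import deque

# Hypothetical (child, parent) hypernym pairs, invented for illustration.
HYPERNYM_PAIRS = [
    ("cat", "mammal"),
    ("dog", "mammal"),
    ("monkey", "mammal"),
    ("mammal", "animal"),
    ("bird", "animal"),
    ("animal", "organism"),
]


def build_parents(pairs):
    """Map each word to the set of its direct hypernyms."""
    parents = {}
    for child, parent in pairs:
        parents.setdefault(child, set()).add(parent)
    return parents


def ancestor_distances(word, parents):
    """Breadth-first walk upward: distance from `word` to itself (0) and to each ancestor."""
    dist = {word: 0}
    queue = deque([word])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, ()):
            if parent not in dist:
                dist[parent] = dist[node] + 1
                queue.append(parent)
    return dist


def nearest_common_hypernym(words, parents):
    """Return the node that covers every word with the smallest total distance, or None."""
    dists = [ancestor_distances(w, parents) for w in words]
    common = set(dists[0]).intersection(*dists[1:])
    if not common:
        return None
    # Ties are broken alphabetically so the result is deterministic.
    return min(common, key=lambda c: (sum(d[c] for d in dists), c))


parents = build_parents(HYPERNYM_PAIRS)
result = nearest_common_hypernym(["cat", "dog", "monkey", "animal", "bird"], parents)
print(result)
```

Because each word counts as its own ancestor at distance 0, a list word such as "animal" can be returned when it covers all the others, which matches the second case described above.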

To begin with, I would point out that WordNets are in fact available for many of the world's major languages; see the list at Global WordNet.

To get domain-specific hypernym relations, you could use the technique presented in Snow et al., "Learning syntactic patterns for automatic hypernym discovery". That is, you start with a small seed list of hypernym pairs, and then use them to train a classifier for detecting hypernym pairs in a corpus. You then run this classifier over data from your domain to build a list of domain-specific hypernym pairs.
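The sketch below is NOT the Snow et al. method, which learns dependency-path patterns with a trained classifier. It is a much simpler stand-in that uses one fixed lexical pattern ("X such as A, B and C") to show what harvesting candidate hypernym pairs from text can look like. The function name and the sample text are assumptions made for the example, and it only handles single-word, lowercase-normalizable terms with no singularization.

```python
import re

# One word per term; real domain terms are often multiword, which this ignores.
WORD = r"[a-z][a-z-]*"
# Separators allowed inside the enumerated list, longest alternatives first.
SEP = r"(?:, and |, or | and | or |, )"
PATTERN = re.compile(rf"\b({WORD}) such as ((?:{WORD}{SEP})*{WORD})")


def extract_pairs(text):
    """Return a set of candidate (hyponym, hypernym) pairs found by the fixed pattern."""
    pairs = set()
    for match in PATTERN.finditer(text.lower()):
        hypernym = match.group(1)
        for hyponym in re.split(SEP, match.group(2)):
            pairs.add((hyponym, hypernym))
    return pairs


# Made-up sample text for illustration.
sample = (
    "Animals such as cats, dogs and monkeys are common. "
    "Camera features such as flash or zoom matter."
)
pairs = extract_pairs(sample)
print(sorted(pairs))
```

A fixed pattern like this is noisy and has low recall, so its output would at best serve as seed pairs or as a baseline for the classifier-based approach described above.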


People working on opinion mining and sentiment analysis may do related things, in terms of determining which words represent product features without knowing anything about the products.

A quick sketch of an idea for how you might do this, which I completely made up on the spot: Parse a bunch of sentences from the relevant domain; find the nouns and adjectives. Work out which noun phrases are associated with which adjectives. Cluster the noun phrases together based on the adjectives used to describe them. Animals will cluster together because they will be described by adjectives such as "fluffy" or "cute", etc. (In particular, hierarchical clustering is likely to be the most appropriate.)
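A minimal Python sketch of the clustering step, assuming the parsing has already been done and has produced (noun, adjective) pairs. The pairs below are invented, and the choice of Jaccard similarity with average linkage is just one reasonable assumption, not something prescribed above.

```python
from itertools import combinations

# Hypothetical (noun, adjective) co-occurrences, as a parser might produce them.
OBSERVATIONS = [
    ("cat", "fluffy"), ("cat", "cute"), ("cat", "small"),
    ("dog", "fluffy"), ("dog", "cute"), ("dog", "loyal"),
    ("lens", "sharp"), ("lens", "expensive"),
    ("flash", "bright"), ("flash", "expensive"),
]


def adjective_profiles(observations):
    """Map each noun to the set of adjectives seen describing it."""
    profiles = {}
    for noun, adjective in observations:
        profiles.setdefault(noun, set()).add(adjective)
    return profiles


def jaccard(a, b):
    """Overlap of two adjective sets, between 0.0 and 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_similarity(c1, c2, profiles):
    """Average linkage: mean Jaccard similarity over all cross-cluster noun pairs."""
    scores = [jaccard(profiles[x], profiles[y]) for x in c1 for y in c2]
    return sum(scores) / len(scores)


def agglomerate(profiles):
    """Repeatedly merge the two most similar clusters; return the merge history."""
    clusters = [frozenset([noun]) for noun in sorted(profiles)]
    merges = []
    while len(clusters) > 1:
        a, b = max(
            combinations(clusters, 2),
            key=lambda pair: cluster_similarity(pair[0], pair[1], profiles),
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((set(a), set(b)))
    return merges


merges = agglomerate(adjective_profiles(OBSERVATIONS))
for left, right in merges:
    print(sorted(left), "+", sorted(right))
```

The merge history is the hierarchy: early merges group nouns described in similar ways, and later merges join those groups. Note that this yields unlabeled clusters, so picking a name such as "animal" for a cluster would still need a separate step.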

If you try this and it works, let me know. :)


Source: https://habr.com/ru/post/1305019/

