I am looking for a method for building a hierarchy of words.
Background: I am an amateur natural language processing enthusiast, and one of the problems that currently interests me is determining the semantic hierarchy within a group of words.
For example, suppose I have a set in which one word is a "super" representation of the others, e.g.
[cat, dog, monkey, animal, bird, ... ]
I would like to use any technique that lets me extract the word "animal", which is the most meaningful and accurate representation of the other words in this set.
Note: the words do NOT match in meaning. cat != dog != monkey != animal, BUT cats are a subset of animals, and dogs are a subset of animals.
I know that many of you will tell me to use WordNet. Well, I will try it, but I am actually working in a very specific domain where WordNet does not apply, because: 1) most of the words are not found in WordNet; 2) all of the words are in another language; translation is possible, but only to limited effect.
Another example:
[ noise reduction, focal length, flash, functionality, .. ]
Here, "functionality" covers everything else in this set.
I also tried crawling Wikipedia pages and applying some tf-idf-based methods, etc., but the Wikipedia pages did not really help.
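To make the Wikipedia attempt concrete, here is a minimal sketch of that kind of experiment, not my actual crawl. The toy "pages", the function names `tf_idf` and `coverage`, and the coverage heuristic itself are all made up for illustration. It also shows one thing that seems to work against me: a word that appears on every page gets an idf of zero, so plain tf-idf tends to suppress exactly the "super" word I am looking for.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: dict mapping a term to the list of tokens on its page."""
    n_docs = len(docs)
    # Document frequency: on how many pages does each token occur?
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    weights = {}
    for term, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        # tf = relative frequency on the page, idf = log(N / df)
        weights[term] = {
            tok: (cnt / total) * math.log(n_docs / df[tok])
            for tok, cnt in counts.items()
        }
    return weights

def coverage(docs, candidates):
    """For each candidate, count how many OTHER pages mention it."""
    return {
        c: sum(1 for term, tokens in docs.items() if term != c and c in tokens)
        for c in candidates
    }

# Hand-written stand-ins for crawled pages (illustrative, not real data).
pages = {
    "cat": "the cat is a small animal kept as a pet".split(),
    "dog": "the dog is a loyal animal and a pet".split(),
    "monkey": "a monkey is an animal that lives in trees".split(),
    "animal": "an animal is a living organism".split(),
}

weights = tf_idf(pages)
scores = coverage(pages, list(pages))
best = max(scores, key=scores.get)

print(best)                      # candidate mentioned on the most other pages
print(weights["cat"]["animal"])  # tf-idf weight of "animal" on the "cat" page
```

On this toy data the simple coverage count picks out "animal", while its tf-idf weight on the "cat" page is zero because it occurs on every page. I do not know how well either signal holds up on real, noisy pages in my domain.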
Can someone tell me which direction my research should go? (Any pointers would help.)