How can I create my own corpus in the Python Natural Language Toolkit (NLTK)?

I recently expanded the nltk names corpus and would like to know how I can turn the two files that I have (male.txt, female.txt) into a corpus so that I can access them using the existing nltk.corpus methods. Does anyone have any suggestions?

Thanks a lot, James.

+4
3 answers

As the README says, the names corpus is not in the public domain, so you should email any changes you make to the author of the corpus (the address is in that file). Apart from these matters of law and courtesy, you can simply replace one or both of these files with your own. They are in a very simple format: one name per line, and comments are allowed (they begin with '#' and are ignored).
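To make the file format concrete, here is a minimal sketch in plain Python of how such a file can be read. The helper name `read_name_list` and the sample text are hypothetical, and this is only an illustration of the format described above, not NLTK's own parsing code.

```python
def read_name_list(text):
    """Return the names in `text`: one per line, skipping blank lines
    and comment lines that begin with '#'."""
    names = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # ignore blank lines and comments
        names.append(line)
    return names


# Hypothetical file contents in the one-name-per-line format.
sample = "# extra male names\nJames\n\nAlex\n"
print(read_name_list(sample))
```

Checking your own male.txt and female.txt with a helper like this before swapping them in is a cheap way to catch stray blank lines or malformed entries.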

To install a completely new corpus instead of just customizing an existing one, you can start with the documentation here.

+4

It helps to understand how corpus reading works by looking at the source code in nltk.corpus and then looking at the corpora themselves (located in /home/[user]/nltk_data/corpora/names; on Windows this will probably be under "My Documents" on XP and somewhere under the user folder on Windows 7).
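If you are not sure where the data lives on your machine, a short sketch like the following can help. It assumes NLTK is installed; `nltk.data.path` is the list of directories NLTK searches, and `nltk.data.find` raises `LookupError` when the resource is not installed.

```python
import nltk

# Directories that NLTK searches for the nltk_data folder, in order.
for directory in nltk.data.path:
    print(directory)

# Try to locate the names corpus; this fails if it was never downloaded.
try:
    print(nltk.data.find("corpora/names"))
except LookupError:
    print("names corpus not found; nltk.download('names') should fetch it")
```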

The structure of a corpus and its associated reader will give you a good idea of how to use the various corpora available in NLTK.

In my case, I looked at the names variable in the nltk.corpus source code and was interested in the WordListCorpusReader class, since the names corpus is just a list of words.

+1

Alex is right: start with the documentation and find out which corpus reader will work for your corpus. Then it is as simple as instantiating that reader, given the path to your corpus file(s). As you will see in the documentation, the built-in corpora are simply instances of particular corpus reader classes. Looking through the code in the nltk.corpus package should also be helpful.
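For a plain word list such as the names files, instantiating a reader might look like the sketch below. It assumes NLTK is installed; the temporary directory and the sample names are made up so the example runs on its own, and in practice `corpus_root` would be the folder that already holds your male.txt and female.txt.

```python
import os
import tempfile

from nltk.corpus.reader import WordListCorpusReader

# Hypothetical corpus directory with made-up contents, created only so
# that this sketch is self-contained.
corpus_root = tempfile.mkdtemp()
sample = {"male.txt": ["James", "Alex"], "female.txt": ["Anna", "Maria"]}
for filename, entries in sample.items():
    path = os.path.join(corpus_root, filename)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(entries) + "\n")

# Instantiate the reader with the corpus root and the list of file ids.
my_names = WordListCorpusReader(corpus_root, ["male.txt", "female.txt"])

print(my_names.fileids())          # the file ids passed in above
print(my_names.words("male.txt"))  # the entries of one file, one per line
```

The resulting object offers the same `fileids()` and `words()` methods as the built-in names corpus, so code written against `nltk.corpus.names` should need little change.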

0

Source: https://habr.com/ru/post/1299833/

