How can I install NLTK corpora or models without the GUI downloader?

My project uses NLTK. How can I list the project's corpus and model requirements so that they can be installed automatically? I don't want to click through the nltk.download() GUI, installing packages one by one.

Also, is there any way to freeze that list of requirements (like pip freeze does)?

+42
install packages nltk requirements corpus
4 answers

The NLTK website lists the command line interface for downloading packages and collections at the bottom of this page:

http://www.nltk.org/data

How you use the command line depends on which version of Python you are running, but on my Python 2.6 installation I noticed I was missing the spanish_grammars package, and the following worked fine:

 python -m nltk.downloader spanish_grammars 

You mentioned listing the project's corpus and model requirements, and while I'm not sure whether there's a way to do that automatically, I figured I'd at least share this.
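
For what it's worth, the downloader module appears to accept several identifiers in one invocation, so a project could keep its data requirements as a single documented command (a sketch; the package names here are just examples, not from the question):

 python -m nltk.downloader punkt wordnet stopwords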

+38
May 18 '11 at 23:26

In addition to the command-line option already mentioned, you can install NLTK data programmatically in your Python script by passing an argument to the download() function.

See the help(nltk.download) text, in particular:

 Individual packages can be downloaded by calling the ``download()``
 function with a single argument, giving the package identifier for the
 package that should be downloaded:

     >>> download('treebank') # doctest: +SKIP
     [nltk_data] Downloading package 'treebank'...
     [nltk_data] Unzipping corpora/treebank.zip.

I can confirm that this works for downloading one package at a time, or when passing a list or tuple.

 >>> import nltk
 >>> nltk.download('wordnet')
 [nltk_data] Downloading package 'wordnet' to
 [nltk_data]     C:\Users\_my-username_\AppData\Roaming\nltk_data...
 [nltk_data]   Unzipping corpora\wordnet.zip.
 True
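
The list form looks like this (a quick sketch; the package names are only illustrative, and the [nltk_data] log lines are omitted):

 >>> nltk.download(['punkt', 'stopwords'])   # returns True if every package succeeds
 True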

You can also re-download an already-installed package without any problems:

 >>> nltk.download('wordnet')
 [nltk_data] Downloading package 'wordnet' to
 [nltk_data]     C:\Users\_my-username_\AppData\Roaming\nltk_data...
 [nltk_data]   Package wordnet is already up-to-date!
 True

The function also returns a boolean value that you can use to check whether the download succeeded:

 >>> nltk.download('not-a-real-name')
 [nltk_data] Error loading not-a-real-name: Package 'not-a-real-name'
 [nltk_data]     not found in index
 False
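
Putting those pieces together, here is a minimal sketch of an install script for a project's NLTK data requirements, checking the boolean return value so a bad identifier fails loudly (the script name and package list are my own assumptions, not from the question):

 # install_nltk_data.py -- hypothetical one-shot setup script
 import nltk

 # The project's NLTK data "requirements" (example identifiers).
 REQUIRED_PACKAGES = ['punkt', 'wordnet', 'stopwords']

 for package in REQUIRED_PACKAGES:
     # download() returns False on an unknown identifier or a failed
     # download, so we can stop immediately instead of failing later.
     if not nltk.download(package):
         raise SystemExit('Failed to download NLTK package: %s' % package)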
+16
Feb 15 '13 at 0:37

To install the collection of the most popular NLTK corpora and models:

 python -m nltk.downloader popular 

Alternatively, on Linux you can use:

 sudo python -m nltk.downloader -d /usr/local/share/nltk_data popular 
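
Once the data is in a shared location like /usr/local/share/nltk_data (which is on NLTK's default search path on Linux), you can verify that NLTK finds it; a small sketch, using wordnet purely as an example of a package in the popular collection:

 import nltk

 # data.find() raises LookupError if the resource cannot be located.
 print(nltk.data.find('corpora/wordnet'))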



You can also browse the available corpora and models through the command-line interface:

 mlee@server:/scratch/jjylee/tests$ sudo python -m nltk.downloader
 [sudo] password for jjylee:
 NLTK Downloader
 ---------------------------------------------------------------------------
     d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
 ---------------------------------------------------------------------------
 Downloader> d

 Download which package (l=list; x=cancel)?
   Identifier> l
 Packages:
   [ ] averaged_perceptron_tagger_ru Averaged Perceptron Tagger (Russian)
   [ ] basque_grammars..... Grammars for Basque
   [ ] bllip_wsj_no_aux.... BLLIP Parser: WSJ Model
   [ ] book_grammars....... Grammars from NLTK Book
   [ ] cess_esp............ CESS-ESP Treebank
   [ ] chat80.............. Chat-80 Data Files
   [ ] city_database....... City Database
   [ ] cmudict............. The Carnegie Mellon Pronouncing Dictionary (0.6)
   [ ] comparative_sentences Comparative Sentence Dataset
   [ ] comtrans............ ComTrans Corpus Sample
   [ ] conll2000........... CONLL 2000 Chunking Corpus
   [ ] conll2002........... CONLL 2002 Named Entity Recognition Corpus
   [ ] conll2007........... Dependency Treebanks from CoNLL 2007 (Catalan
                            and Basque Subset)
   [ ] crubadan............ Crubadan Corpus
   [ ] dependency_treebank. Dependency Parsed Treebank
   [ ] europarl_raw........ Sample European Parliament Proceedings Parallel
                            Corpus
   [ ] floresta............ Portuguese Treebank
   [ ] framenet_v15........ FrameNet 1.5
 Hit Enter to continue:
   [ ] framenet_v17........ FrameNet 1.7
   [ ] gazetteers.......... Gazeteer Lists
   [ ] genesis............. Genesis Corpus
   [ ] gutenberg........... Project Gutenberg Selections
   [ ] hmm_treebank_pos_tagger Treebank Part of Speech Tagger (HMM)
   [ ] ieer................ NIST IE-ER DATA SAMPLE
   [ ] inaugural........... C-Span Inaugural Address Corpus
   [ ] indian.............. Indian Language POS-Tagged Corpus
   [ ] jeita............... JEITA Public Morphologically Tagged Corpus (in
                            ChaSen format)
   [ ] kimmo............... PC-KIMMO Data Files
   [ ] knbc................ KNB Corpus (Annotated blog corpus)
   [ ] large_grammars...... Large context-free and feature-based grammars
                            for parser comparison
   [ ] lin_thesaurus....... Lin Dependency Thesaurus
   [ ] mac_morpho.......... MAC-MORPHO: Brazilian Portuguese news text with
                            part-of-speech tags
   [ ] machado............. Machado de Assis -- Obra Completa
   [ ] masc_tagged......... MASC Tagged Corpus
   [ ] maxent_ne_chunker... ACE Named Entity Chunker (Maximum entropy)
   [ ] moses_sample........ Moses Sample Models
 Hit Enter to continue: x

 Download which package (l=list; x=cancel)?
   Identifier> conll2002
     Downloading package conll2002 to /afs/mit.edu/u/m/mlee/nltk_data...
       Unzipping corpora/conll2002.zip.
 ---------------------------------------------------------------------------
     d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
 ---------------------------------------------------------------------------
 Downloader>
+16
Dec 09 '15 at 23:57

I managed to install the corpora and models inside a custom directory using the following code:

 import nltk
 nltk.download(info_or_id="popular", download_dir="/path/to/dir")
 nltk.data.path.append("/path/to/dir")

This will install the "popular" corpora/models inside /path/to/dir and tell NLTK where to look for them (data.path.append).

You can't "freeze" the data in a requirements file, but you can add this code to your __init__, along with code that checks whether the files are already there.
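
For example (a minimal sketch, assuming the same /path/to/dir as above and using wordnet only as an example package):

 import nltk

 DATA_DIR = '/path/to/dir'  # the same custom directory as above
 nltk.data.path.append(DATA_DIR)

 try:
     # data.find() raises LookupError when the resource is missing.
     nltk.data.find('corpora/wordnet')
 except LookupError:
     nltk.download('wordnet', download_dir=DATA_DIR)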

+2
Jan 16 '17 at 16:58


