This previous question dealt with how to import modules such as nltk for streaming data.
The steps described are:
zip -r nltkandyaml.zip nltk yaml
mv nltkandyaml.zip /path/to/where/your/mapper/will/be/nltkandyaml.mod
Now you can import the nltk module for use in your Python script:
import zipimport
importer = zipimport.zipimporter('nltkandyaml.mod')
yaml = importer.load_module('yaml')
nltk = importer.load_module('nltk')
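To check the idea locally before involving a cluster, here is a minimal sketch of importing a module from a zip archive with a non-.zip extension. It is not the code from the linked answer: it puts the archive on sys.path and lets Python's built-in zip import hook do the loading, rather than calling zipimporter.load_module directly (which newer Python versions mark as deprecated). The archive name demo.mod, the module demo_mod, and both helper functions are hypothetical stand-ins for nltkandyaml.mod and its packages.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def build_archive(directory):
    """Create a small zip archive holding one hypothetical module, demo_mod."""
    archive_path = os.path.join(directory, "demo.mod")
    with zipfile.ZipFile(archive_path, "w") as zf:
        zf.writestr(
            "demo_mod.py",
            "def greet(name):\n    return 'hello ' + name\n",
        )
    return archive_path


def import_from_archive(archive_path, module_name):
    """Put the archive on sys.path so the zip import hook can find the module."""
    if archive_path not in sys.path:
        sys.path.insert(0, archive_path)
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


# Build the stand-in archive in a temporary directory and import from it.
workdir = tempfile.mkdtemp()
archive = build_archive(workdir)
demo_mod = import_from_archive(archive, "demo_mod")
print(demo_mod.greet("mapper"))
```

In a streaming job the archive would have to sit in the mapper's working directory, so the path passed to import_from_archive would presumably be just the archive's file name; how the file gets there on EMR is exactly what the question below asks.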
I have a job that I want to run on Amazon EMR, and I'm not sure where to put the archived files. Do I need to create a loading script under formatting options, or should I put a tar.gz in S3 and then reference it in the additional arguments? I am new to this and would greatly appreciate an answer that walks me through the process.