Lambda does not support NLTK file size

Question


I am writing a Python script that analyzes a piece of text and returns the data in JSON format. I use NLTK to analyze the data. Basically, this is my flow:

Create an endpoint (API Gateway) -> it calls my Lambda function -> the function returns JSON with the required data.

I wrote my script and deployed it to Lambda, but I ran into this problem:

Resource punkt not found. Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt')
Searched in:
- '/home/sbx_user1058/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- '/var/lang/nltk_data'
- '/var/lang/lib/nltk_data'

Even after downloading "punkt", my script still gave me the same error. I tried the solutions here:

Python optimization script extract and process large data files

but the problem is that the nltk_data folder is huge and Lambda has a size limit.

How can I fix this problem? Or where else can I host my script and still integrate the API call?

I am using Serverless to deploy my Python scripts.


json python lambda amazon-web-services

noor Oct 20 '17 at 9:36


1 answer

0bserver07 · Accepted Answer · 2017-10-24T21:41:42+0000

There are two things you can do:

The error suggests that the NLTK data path is not defined properly. You can add the bundled directory to NLTK's search path in code:

import nltk
nltk.data.path.append('/var/task/nltk_data')

or set it through an environment variable instead:

After running nltk.download(), copy the downloaded data to the root folder of your AWS Lambda application. (Name the directory "nltk_data".)
In the Lambda function dashboard (in the AWS console), add NLTK_DATA=./nltk_data as a key-value environment variable.
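
To illustrate the path setup, here is a minimal sketch of a handler that points NLTK at a data folder shipped inside the deployment package. It assumes the folder is named "nltk_data" and sits at the package root, and that the event carries a "text" key; the helper name bundled_nltk_data_dir and the event shape are hypothetical, so adapt them to your own API Gateway mapping.

```python
import json
import os


def bundled_nltk_data_dir(dirname="nltk_data"):
    """Return the absolute path of the nltk_data folder shipped with the function code."""
    # LAMBDA_TASK_ROOT is set by the Lambda runtime (normally /var/task);
    # fall back to the current directory when running locally.
    task_root = os.environ.get("LAMBDA_TASK_ROOT", os.getcwd())
    return os.path.join(task_root, dirname)


def handler(event, context):
    # Imported lazily so the path helper above can be used without NLTK installed.
    import nltk

    data_dir = bundled_nltk_data_dir()
    if data_dir not in nltk.data.path:
        nltk.data.path.insert(0, data_dir)  # search the bundled copy first

    # Assumption: the caller sends {"text": "..."}; this is a hypothetical event shape.
    text = event.get("text", "")
    tokens = nltk.word_tokenize(text)  # needs the punkt tokenizer data
    return {"statusCode": 200, "body": json.dumps({"tokens": tokens})}
```

Building the path from LAMBDA_TASK_ROOT avoids depending on the working directory, which a relative value such as ./nltk_data does.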

Reduce the size of the NLTK download, since you do not need all of it.
- Delete all the zip files and keep only the sections you need, for example stopwords: keep nltk_data/corpora/stopwords and delete the rest.
- Or, if you need tokenizers, keep nltk_data/tokenizers/punkt. Most resources can be downloaded separately: python -m nltk.downloader punkt, then copy the files over.
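
The trimming step could be scripted roughly as follows. This is only a sketch: prune_nltk_data is a hypothetical helper, not part of NLTK, and it assumes every resource you keep is already unpacked as a directory, since it deletes the zip archives. Run it on a copy of the folder, because it removes files permanently.

```python
import os
import shutil


def prune_nltk_data(root, keep):
    """Delete every resource under root except those listed in keep.

    keep holds paths relative to root, e.g. "tokenizers/punkt".
    """
    wanted = {os.path.normpath(path) for path in keep}
    for category in os.listdir(root):  # e.g. "corpora", "tokenizers"
        category_dir = os.path.join(root, category)
        if not os.path.isdir(category_dir):
            continue  # leave stray top-level files alone
        for entry in os.listdir(category_dir):  # e.g. "punkt", "punkt.zip"
            entry_path = os.path.join(category_dir, entry)
            if os.path.isdir(entry_path):
                if os.path.normpath(os.path.join(category, entry)) not in wanted:
                    shutil.rmtree(entry_path)
            else:
                os.remove(entry_path)  # zip archives and other loose files
        if not os.listdir(category_dir):
            os.rmdir(category_dir)  # drop categories that ended up empty
```

For example, prune_nltk_data("nltk_data", ["tokenizers/punkt"]) would leave only the unpacked punkt tokenizer in place before the folder is packaged with the function.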

