I am writing a Python script that parses a piece of text and returns data in JSON format. I use NLTK for the analysis. Basically, this is my flow:
Create an endpoint (API Gateway) -> it calls my Lambda function -> the function returns the required data as JSON.
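To make the setup concrete, here is a minimal sketch of such a handler (simplified: `tokenize` is a plain-Python stand-in for the NLTK call that triggers the error below, and the event shape assumes API Gateway's Lambda proxy integration):

```python
import json

def tokenize(text):
    # Placeholder: the real script calls nltk.word_tokenize(text),
    # which is the call that looks up the missing "punkt" resource.
    return text.split()

def lambda_handler(event, context):
    # With Lambda proxy integration, the request body arrives as a JSON string.
    body = json.loads(event.get("body") or "{}")
    tokens = tokenize(body.get("text", ""))
    # API Gateway expects this response shape from the function.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"tokens": tokens}),
    }
```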
I wrote my script and deployed it to Lambda, but I ran into this problem:
Resource punkt not found. Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

Searched in:
  - '/home/sbx_user1058/nltk_data'
  - '/usr/share/nltk_data'
  - '/usr/local/share/nltk_data'
  - '/usr/lib/nltk_data'
  - '/usr/local/lib/nltk_data'
  - '/var/lang/nltk_data'
  - '/var/lang/lib/nltk_data'
Even after downloading "punkt", my script still gave me the same error. I tried the solutions here:
Python optimization script extract and process large data files
but the problem is that the nltk_data folder is huge, and Lambda has a deployment package size limit.
How can I fix this problem? Or where else could I run my script and still integrate it with the API call?
I am using the Serverless Framework to deploy my Python scripts.
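For context, this is roughly what I was hoping to do in the Serverless config (a sketch, not a working setup; the service, function, and path names are placeholders): ship only the punkt tokenizer with the package instead of the whole nltk_data tree, and point NLTK at it via the NLTK_DATA environment variable, which NLTK adds to its data search path.

```yaml
# serverless.yml (sketch; names are illustrative)
service: text-parser

provider:
  name: aws
  runtime: python3.9
  environment:
    # NLTK reads NLTK_DATA and prepends it to its search path.
    # /var/task is where Lambda unpacks the deployment package.
    NLTK_DATA: /var/task/nltk_data

functions:
  parse:
    handler: handler.lambda_handler
    events:
      - http:
          path: parse
          method: post

package:
  patterns:
    # Include only the punkt tokenizer, not all of nltk_data.
    - 'nltk_data/tokenizers/punkt/**'
```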