Python child process silently crashes when issuing an HTTP request

I am having a problem combining multiprocessing, requests (or urllib2) and nltk. Here is a very simple example:

    >>> from multiprocessing import Process
    >>> import requests
    >>> from pprint import pprint
    >>> Process(target=lambda: pprint(
    ...     requests.get('https://api.github.com'))).start()
    >>> <Response [200]>  # this is the response displayed by the call to `pprint`.

A bit more about what this piece of code does:

  • Import the required modules
  • Start a child process
  • Issue an HTTP GET request to api.github.com from the child process
  • Display the result

This works great. The problem occurs when importing nltk:

    >>> import nltk
    >>> Process(target=lambda: pprint(
    ...     requests.get('https://api.github.com'))).start()
    >>> # nothing happens!

After importing NLTK, the call to requests silently crashes the child process. If you use a named function instead of the lambda and add print statements before and after the call, you will see that execution stops right at the call to requests.get. Does anyone know what in NLTK could explain this behavior, and how to work around the problem?
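The named-function check described above can be sketched roughly as follows. This is only a sketch, not the exact code from the question: `fetch` and `describe_exit` are hypothetical names, `requests` is imported inside the child so the file loads even without it, and a timeout is added. Beyond the print statements, it reads `Process.exitcode` after `join()`. A negative value means the child was killed by a signal, which would distinguish a hard crash from an ordinary Python exception (exit status 1 plus a traceback).

```python
import sys
from multiprocessing import Process


def fetch():
    # Hypothetical named replacement for the lambda in the question.
    # requests is imported here so the module loads without it installed.
    import requests

    print("before requests.get")
    sys.stdout.flush()  # flush so the line is not lost if the process dies
    response = requests.get("https://api.github.com", timeout=10)
    print("after requests.get: %r" % response)
    sys.stdout.flush()


def describe_exit(exitcode):
    """Translate a Process.exitcode value into a readable message."""
    if exitcode is None:
        return "still running"
    if exitcode == 0:
        return "exited normally"
    if exitcode < 0:
        # multiprocessing reports death by signal N as -N
        return "killed by signal %d" % -exitcode
    return "exited with status %d" % exitcode


if __name__ == "__main__":
    # import nltk  # uncomment to reproduce the setup from the question
    child = Process(target=fetch)
    child.start()
    child.join()
    print("child %s" % describe_exit(child.exitcode))
```

If the "before" line is printed, the "after" line is not, and the parent reports a signal, the child died inside `requests.get` rather than raising an exception.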

Here are the versions I'm using:

    $> python --version
    Python 2.7.5
    $> pip freeze | grep nltk
    nltk==2.0.5
    $> pip freeze | grep requests
    requests==2.2.1

I am running Mac OS X v. 10.9.5.

Thanks!

3 answers

Updating Python and your Python libraries should fix the problem:

    alvas@ubi:~$ pip freeze | grep nltk
    nltk==3.0.3
    alvas@ubi:~$ pip freeze | grep requests
    requests==2.7.0
    alvas@ubi:~$ python --version
    Python 2.7.6
    alvas@ubi:~$ lsb_release -a
    No LSB modules are available.
    Distributor ID: Ubuntu
    Description:    Ubuntu 14.04.2 LTS
    Release:        14.04
    Codename:       trusty

From the code:

    from multiprocessing import Process
    import nltk
    import time

    def child_fn():
        print "Fetch URL"
        import urllib2
        print urllib2.urlopen("https://www.google.com").read()[:100]
        print "Done"

    while True:
        child_process = Process(target=child_fn)
        child_process.start()
        child_process.join()
        print "Child process returned"
        time.sleep(1)

[output]:

    Fetch URL
    <!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="de"><head><meta content
    Done
    Child process returned
    Fetch URL
    <!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="de"><head><meta content
    Done
    Child process returned
    Fetch URL
    <!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="de"><head><meta content
    Done
    Child process returned

From the code:

    alvas@ubi:~$ python
    Python 2.7.6 (default, Jun 22 2015, 17:58:13)
    [GCC 4.8.2] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from multiprocessing import Process
    >>> import requests
    >>> from pprint import pprint
    >>> Process(target=lambda: pprint(
    ...     requests.get('https://api.github.com'))).start()
    >>> <Response [200]>
    >>> import nltk
    >>> Process(target=lambda: pprint(
    ...     requests.get('https://api.github.com'))).start()
    >>> <Response [200]>

It should also work with Python 3:

    alvas@ubi:~$ python3
    Python 3.4.0 (default, Jun 19 2015, 14:20:21)
    [GCC 4.8.2] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from multiprocessing import Process
    >>> import requests
    >>> Process(target=lambda: print(requests.get('https://api.github.com'))).start()
    >>>
    >>> <Response [200]>
    >>> import nltk
    >>> Process(target=lambda: print(requests.get('https://api.github.com'))).start()
    >>> <Response [200]>

It seems that using NLTK and requests together in a child process is rare. Try using Thread instead of Process. I had the same problem with another library combined with requests, and replacing Process with Thread worked for me.
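A minimal sketch of that swap, assuming the work only needs to run concurrently and does not need a separate process. `run_in_thread` and `fetch_status` are hypothetical names introduced for illustration, and `requests` is imported inside the function so the file loads even if it is not installed.

```python
import threading


def run_in_thread(target, *args):
    """Run target(*args) in a thread, wait for it, and return its result."""
    result = {}

    def worker():
        result["value"] = target(*args)

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    # None if the target raised; the traceback is printed by the thread itself
    return result.get("value")


def fetch_status(url):
    # Hypothetical helper standing in for the request made in the question.
    import requests

    return requests.get(url, timeout=10).status_code


if __name__ == "__main__":
    # import nltk  # uncomment to mirror the setup from the question
    print(run_in_thread(fetch_status, "https://api.github.com"))
```

Keep in mind that threads share one interpreter and its GIL, so this helps with I/O such as HTTP requests but will not parallelize CPU-bound NLTK work the way separate processes would.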


This issue still seems to be unresolved: https://github.com/nltk/nltk/issues/947. I think this is a serious problem (unless you are only playing with NLTK, building POCs and testing models rather than real applications). I run NLP pipelines in RQ workers ( http://python-rq.org/ ) with these versions:

    nltk==3.2.1
    requests==2.9.1

Source: https://habr.com/ru/post/988870/

