Problems executing beautifulsoup4 in Apache / mod_python / Django

I am trying to generate an HTML page on the fly using BeautifulSoup version 4 in Django (using Apache2 with mod_python). However, as soon as I pass any HTML string to the BeautifulSoup constructor (see the code below), the browser just hangs while waiting for the web server. I tried the equivalent code from the CLI and it works like a charm. Therefore, I assume the problem is related to the environment BeautifulSoup runs in, in this case Django + Apache + mod_python.

    import bs4
    import django.shortcuts as shortcuts

    def test(request):
        s = bs4.BeautifulSoup('<b>asdf</b>')
        return shortcuts.render_to_response('test.html', {})

I installed BeautifulSoup 4 using pip (pip install beautifulsoup4). I also tried installing BeautifulSoup 3 from the standard Debian package (apt-get install python-beautifulsoup), and with it the following equivalent code works fine, both from the browser and from the CLI.

    from BeautifulSoup import BeautifulSoup
    import django.shortcuts as shortcuts

    def test(request):
        s = BeautifulSoup('<b>asdf</b>')
        return shortcuts.render_to_response('test.html', {})

I looked at the Apache access and error logs, and they show nothing about what is happening to the stalled request. I also checked /var/log/syslog and /var/log/messages, but found no additional information there either.

Here is the Apache configuration I used:

    <VirtualHost *:80>
        DocumentRoot /home/nandersson/src
        <Directory /home/nandersson/src>
            SetHandler python-program
            PythonHandler django.core.handlers.modpython
            SetEnv DJANGO_SETTINGS_MODULE app.settings
            PythonOption django.root /home/nandersson/src
            PythonDebug On
            PythonPath "['/home/nandersson/src'] + sys.path"
        </Directory>
        <Location "/media/">
            SetHandler None
        </Location>
        <Location "/app/poc/">
            SetHandler None
        </Location>
    </VirtualHost>

I'm not sure how to debug this further, or whether this is a bug at all. Does anyone have ideas on how to get to the bottom of this, or has anyone faced similar issues?

4 answers

I am using Apache2 with mod_python. I solved the freeze by explicitly passing "html.parser" as the parser when creating the soup.

    s = bs4.BeautifulSoup('<b>asdf</b>', 'html.parser')
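
Some context on why naming the parser can change anything: when no parser is given, Beautiful Soup 4 picks the best one that is installed, ranking lxml first, then html5lib, then Python's built-in html.parser. Below is a minimal sketch, assuming Python 3, that reports which of those would likely be chosen in the environment it runs in and pins the built-in parser. The helper names likely_default_parser and make_soup are made up for illustration and are not part of bs4.

```python
import importlib.util

# Order in which Beautiful Soup 4 ranks HTML parsers when none is named.
PARSER_PREFERENCE = ("lxml", "html5lib", "html.parser")


def likely_default_parser():
    """Return the first parser in the preference order that is importable here."""
    for name in PARSER_PREFERENCE:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


def make_soup(markup):
    """Parse markup with the pure-Python parser instead of the default choice."""
    # Imported here so the check above also works without beautifulsoup4.
    from bs4 import BeautifulSoup
    return BeautifulSoup(markup, "html.parser")


if __name__ == "__main__":
    print("parser bs4 would likely pick by default:", likely_default_parser())
```

If this reports lxml when run under Apache, that would be consistent with the freeze going away once "html.parser" is passed, although it does not by itself prove that lxml is the cause.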

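This could be the interaction between Cython and mod_wsgi that is described here, and that was investigated in the context of Beautiful Soup here. (lxml, which Beautiful Soup 4 prefers by default when it is installed, is built with Cython.) Here is an earlier question similar to yours.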
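
One way to check where the request is actually stuck, since the Apache logs stay silent, is to have Python dump its own stack when a call takes too long. The following is only a sketch, assuming a Python 3 deployment where faulthandler is in the standard library. The helper name run_with_hang_dump, the 5 second timeout, and the log path in the comment are all hypothetical.

```python
import faulthandler


def run_with_hang_dump(func, *args, log, timeout=5.0):
    """Call func(*args); if it has not returned after `timeout` seconds,
    write the traceback of every thread to `log` and keep waiting."""
    # `log` must be a real open file (it needs a file descriptor).
    faulthandler.dump_traceback_later(timeout, file=log)
    try:
        return func(*args)
    finally:
        # Disarm the watchdog once the call returns or raises.
        faulthandler.cancel_dump_traceback_later()


# Possible use inside the view (path chosen only for illustration):
#
#     with open("/tmp/bs4-hang.log", "w") as log:
#         s = run_with_hang_dump(bs4.BeautifulSoup, "<b>asdf</b>", log=log)
```

If the dump shows the stalled thread inside an import of lxml, that would point toward the interaction described in this answer. If it shows something else, the cause is probably different.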


Try

    doc = BeautifulSoup(html, 'html5lib')

In my case, "html.parser" often results in HTMLParseError: https://groups.google.com/forum/?fromgroups=#!topic/beautifulsoup/x_L9FpDdqkc


I ran into the same problem about a year ago. I have just tried a similar setup (Django + mod_wsgi + Apache2) with the newer BeautifulSoup 4.3.2, and the problem seems to be fixed.


Source: https://habr.com/ru/post/1436565/

