Python on Appengine using BeautifulSoup ImportError: no module named bs4

EDIT2: SOLVED! See the answer below regarding the proper import: use from lib.bs4 import BeautifulSoup instead of from bs4 import BeautifulSoup.

EDIT: Putting bs4 at the root of the project seems to solve the problem; however, this is not an ideal structure. So, I leave this question active in order to try to find a more reliable solution.

This question has been asked before, but the solutions there do not seem to work. I'm not sure whether this is due to changes in BeautifulSoup or in Appengine, to be honest.

See: "Python 2.7: How to use BeautifulSoup in the Google App Engine?", "How do I enable third-party Python libraries in the Google App Engine?" and "Which version of BeautifulSoup works with GAE (python 2.5)?"

The solution proposed by Lipis is to put the third-party library in a libs folder in the project root directory, and then add the following to the main application:

    import sys
    sys.path.insert(0, 'libs')

Currently my structure is as follows:

    ntj-test
    ├── lib
    │   └── bs4
    ├── templates
    ├── main.py
    ├── get_data.py
    └── app.yaml

Here is my app.yaml:

    application: ntj-test
    version: 1
    runtime: python27
    api_version: 1
    threadsafe: yes

    handlers:
    - url: /favicon\.ico
      static_files: favicon.ico
      upload: favicon\.ico

    - url: .*
      script: main.app

    libraries:
    - name: webapp2
      version: latest
    - name: jinja2
      version: latest

Here is my main.py:

    import webapp2
    import jinja2
    import get_data
    import sys
    sys.path.insert(0, 'lib')

    JINJA_ENVIRONMENT = jinja2.Environment(
        loader=jinja2.FileSystemLoader('templates'),
        extensions=['jinja2.ext.autoescape'],
        autoescape=True,
    )

    class MainHandler(webapp2.RequestHandler):
        def get(self):
            teamName = get_data.all_coach_data()[1]
            coachName = get_data.all_coach_data()[2]
            teamKey = get_data.all_coach_data()[0]
            values = {
                'coachName': coachName,
                'teamName': teamName,
                'teamKey': teamKey,
            }
            template = JINJA_ENVIRONMENT.get_template('index.html')
            self.response.write(template.render(values))

    app = webapp2.WSGIApplication([
        ('/', MainHandler)
    ], debug=True)

get_data.py returns the correct data to populate my variables; I checked the values in the debugger.

The problem occurs when running main.py in my development environment (I have not yet uploaded it to the cloud). No matter which of the tricks from the links above or from Google searches I try, the terminal always returns:

    ImportError: No module named bs4

In one of the SO links at the top, a commenter says: "GAE only supports pure Python modules. bs4 is not pure because some parts were written in C." I'm not sure if this is true or not, and I'm not sure how to check it. I do not have enough reputation to comment to find out. :(
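
One rough way to check this yourself is to look for compiled or C source files inside the vendored package. The sketch below is a heuristic, not an official App Engine check: find_native_files and the suffix list are assumptions made for illustration, and an empty result only suggests that the package is pure Python.

```python
import os

# File suffixes that usually indicate compiled extensions or C/Cython sources.
# This list is an assumption for illustration and is not exhaustive.
NATIVE_SUFFIXES = (".so", ".pyd", ".dll", ".dylib", ".c", ".pyx")


def find_native_files(package_dir):
    """Return sorted relative paths under package_dir that look non-pure-Python."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(package_dir):
        for filename in filenames:
            if filename.lower().endswith(NATIVE_SUFFIXES):
                full_path = os.path.join(dirpath, filename)
                hits.append(os.path.relpath(full_path, package_dir))
    return sorted(hits)
```

For the layout in this question it could be called as find_native_files('lib/bs4') from the project root.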

I went through the bs4 docs on the Crummy website, I read all the questions and answers related to it, and I tried to pick up the hints from the Appengine documentation. However, I could not find a solution that is not related to using an outdated version of BeautifulSoup, which does not have the functions I need.

I am new to programming and to StackOverflow, so if I left out some important information or did not follow good practice in asking the question, please let me know. I will edit and add additional information if necessary.

Thanks!

Edit: I was not sure whether the get_data code would be superfluous, but here it is:

    from bs4 import BeautifulSoup
    import urllib2, re

    teamKeys = {
        'ATL': 'Atlanta Falcons',
        'HOU': 'Houston Texans',
    }

    def get_all_coaches():
        for key in teamKeys:
            page = urllib2.urlopen("http://www.nfl.com/teams/coaches?coaType=head&team=" + key)
            soup = BeautifulSoup(page)
            return(head_coach(soup))

    def head_coach(soup):
        # The profile text has the form "<position>: <name>"
        head = soup.select('.coachprofiletext p')[0].text
        position, name = re.split(': ', head)
        return name

    def export_coach_data():
        # Build one [key, team name, head coach] row per team
        testList = []
        for key in teamKeys:
            page = urllib2.urlopen("http://www.nfl.com/teams/coaches?coaType=head&team=" + key)
            soup = BeautifulSoup(page)
            teamKey = key
            teamName = teamKeys[key]
            headCoach = head_coach(soup)
            t = [
                teamKey,
                teamName,
                str(headCoach),
            ]
            testList.append(t)
        return(testList)

    def all_coach_data():
        results = export_coach_data()
        ATL = results[0]
        HOU = results[1]
        return ATL

I realize this is probably riddled with poor practice (I have only been doing this seriously for a few months in my free time), but it returns the correct values to my variables in main.

Here is the application log:

    2014-11-05 15:36:53 Running command: "['C:\\Python27\\pythonw.exe', 'C:\\Program Files\\Google\\Cloud SDK\\google-cloud-sdk\\platform\\google_appengine\\dev_appserver.py', '--skip_sdk_update_check=yes', '--port=11080', '--admin_port=8003', u'G:\\projects\\coaches']"
    INFO 2014-11-05 15:37:00,119 devappserver2.py:725] Skipping SDK update check.
    WARNING 2014-11-05 15:37:00,157 api_server.py:383] Could not initialize images API; you are likely missing the Python "PIL" module.
    INFO 2014-11-05 15:37:00,190 api_server.py:171] Starting API server at: http://localhost:19713
    INFO 2014-11-05 15:37:00,210 dispatcher.py:183] Starting module "default" running at: http://localhost:11080
    INFO 2014-11-05 15:37:00,216 admin_server.py:117] Starting admin server at: http://localhost:8003
    ERROR 2014-11-05 20:37:48,726 wsgi.py:262]
    Traceback (most recent call last):
      File "C:\Program Files\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\google\appengine\runtime\wsgi.py", line 239, in Handle
        handler = _config_handle.add_wsgi_middleware(self._LoadHandler())
      File "C:\Program Files\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\google\appengine\runtime\wsgi.py", line 298, in _LoadHandler
        handler, path, err = LoadObject(self._handler)
      File "C:\Program Files\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\google\appengine\runtime\wsgi.py", line 84, in LoadObject
        obj = __import__(path[0])
      File "G:\projects\coaches\main.py", line 3, in <module>
        import get_data
      File "G:\projects\coaches\get_data.py", line 1, in <module>
        from bs4 import BeautifulSoup
    ImportError: No module named bs4
    INFO 2014-11-05 15:37:48,762 module.py:652] default: "GET / HTTP/1.1" 500 -
+5
4 answers

EDIT: It has been pointed out that this is a bit of a hack. If so, how can this solution be changed so that it does not require rewriting the imports inside bs4?

A couple of users at http://www.reddit.com/r/learnpython helped me solve this problem.

Expanding on the solution proposed by Lipis, we added the following to main.py:

    import os, sys
    rootdir = os.path.dirname(os.path.abspath(__file__))
    lib = os.path.join(rootdir, 'lib')
    sys.path.append(lib)

Then, and this is the part no one mentioned in any of the other SO answers, I changed "bs4" to "lib.bs4" in all my import statements, like so:

 from lib.bs4 import BeautifulSoup 

But that was not all: there were references to bs4 inside the bs4 library itself, so I did a search and replace to change all of them to lib.bs4.<something>.

Now, finally, my application starts and the structure is organized. All credit goes to /u/invalidusemame and /u/prohulaelk.

I hope this post helps someone else stuck in a similar situation. It might seem obvious that the import statement would need the extra lib. prefix, but this was not immediately clear from the other answers.
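
A possible reason the renaming was needed, judging from the traceback in the question: main.py runs import get_data on line 3, before sys.path is modified, so the lib folder is not yet searchable when get_data tries to import bs4. The sketch below illustrates that ordering effect with a throwaway package. The name fakepkg_demo and the temporary directory are made up for the demonstration and merely stand in for bs4 and the project root.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway project: <tmp>/lib/fakepkg_demo/__init__.py
# (fakepkg_demo is a hypothetical stand-in for bs4).
project_root = tempfile.mkdtemp()
lib_dir = os.path.join(project_root, "lib")
package_dir = os.path.join(lib_dir, "fakepkg_demo")
os.makedirs(package_dir)
with open(os.path.join(package_dir, "__init__.py"), "w") as handle:
    handle.write("NAME = 'fakepkg_demo'\n")


def can_import(name):
    """Return True if `name` is importable with the current sys.path."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


# Importing before lib is on sys.path fails, as in the traceback.
before = can_import("fakepkg_demo")

# After the path entry is added, the plain top-level name resolves.
sys.path.insert(0, lib_dir)
after = can_import("fakepkg_demo")

print(before, after)
```

If that is what happened here, moving the sys.path lines above import get_data in main.py should let from bs4 import BeautifulSoup work without editing bs4 itself.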

Thanks to everyone who helped troubleshoot!

+3

I believe your problem is a typo in main.py:

 sys.path.insert(0, 'lib') 

Your directory is libs, not lib.

+2

Alternatively, you can create a file called appengine_config.py to load third-party libraries. This file will load when starting a new instance.

    import sys
    import os.path

    # add `lib` subdirectory to `sys.path`, so our `main` module can load third-party libraries.
    sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'lib'))
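
The os.path.dirname(__file__) form matters because a bare 'lib' entry on sys.path is resolved against the current working directory, which may not be the project root. Here is a small sketch of the same idea as a reusable helper. Note that add_vendor_dir is a hypothetical name, not part of App Engine, and the guard against duplicate entries is an addition.

```python
import os
import sys


def add_vendor_dir(module_file, folder="lib"):
    """Put <directory of module_file>/<folder> at the front of sys.path.

    add_vendor_dir is a hypothetical helper, not an App Engine API.
    Returns the absolute path that was added (or was already present).
    """
    base = os.path.dirname(os.path.abspath(module_file))
    vendor_path = os.path.join(base, folder)
    if vendor_path not in sys.path:  # skip if the entry is already there
        sys.path.insert(0, vendor_path)
    return vendor_path
```

In appengine_config.py this would be called as add_vendor_dir(__file__).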
+1

OK, a couple more fixes. Add lxml to the libraries section of app.yaml:

    libraries:
    - name: lxml
      version: "2.3"  # do NOT use "latest"

Make sure you have an __init__.py file in lib. I added code to it so that it puts its own directory and the project root on the import path:

    import os
    import sys

    libs_directory = os.path.dirname(os.path.abspath(__file__))
    if libs_directory not in sys.path:
        sys.path.insert(0, libs_directory)

    root_directory = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
    if root_directory not in sys.path:
        sys.path.insert(0, root_directory)
0

Source: https://habr.com/ru/post/1206284/