EDIT2: SOLVED! See the answer below regarding the proper import: use from lib.bs4 import BeautifulSoup instead of from bs4 import BeautifulSoup.
EDIT: Putting bs4 at the root of the project seems to solve the problem; however, this is not an ideal structure. So, I leave this question active in order to try to find a more reliable solution.
Questions about this problem have been asked before, but the solutions there do not seem to work. I'm not sure if this is due to changes with BeautifulSoup or with Appengine, to be honest.
See: "Python 2.7: How to use BeautifulSoup in the Google App Engine?", "How do I enable third-party Python libraries in the Google App Engine?", and "Which version of BeautifulSoup works with GAE (python 2.5)?"
The solution proposed by Lipis is to add the third-party library to a libs folder in the project root directory, and then add the following to the main application:
    import sys
    sys.path.insert(0, 'libs')
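A minimal sketch of how that sys.path approach behaves, using a throwaway directory and a hypothetical module named vendored_demo that stands in for a real third-party package (both are assumptions for illustration, not part of the project). It suggests that the import only succeeds once the folder is on sys.path, so the insert has to run before the first import that needs the library, including imports made indirectly by other modules.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical stand-in for a vendored library folder such as 'libs'.
libs_dir = tempfile.mkdtemp()
with open(os.path.join(libs_dir, "vendored_demo.py"), "w") as f:
    f.write("VALUE = 42\n")


def try_import(name):
    """Return True if the named module can be imported right now."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


# Before the folder is on the search path, the import fails.
found_before = try_import("vendored_demo")

# Put the folder at the front of the search path, then try again.
sys.path.insert(0, libs_dir)
found_after = try_import("vendored_demo")

print(found_before, found_after)
```

In a real project the folder name passed to sys.path.insert would be the vendored directory itself rather than a temporary one.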
Currently my structure is as follows:
    ntj-test
    ├── lib
    │   └── bs4
    ├── templates
    ├── main.py
    ├── get_data.py
    └── app.yaml
Here is my app.yaml:
    application: ntj-test
    version: 1
    runtime: python27
    api_version: 1
    threadsafe: yes

    handlers:
    - url: /favicon\.ico
      static_files: favicon.ico
      upload: favicon\.ico

    - url: .*
      script: main.app

    libraries:
    - name: webapp2
      version: latest
    - name: jinja2
      version: latest
Here is my main.py:
    import webapp2
    import jinja2
    import get_data
    import sys
    sys.path.insert(0, 'lib')

    JINJA_ENVIRONMENT = jinja2.Environment(
        loader=jinja2.FileSystemLoader('templates'),
        extensions=['jinja2.ext.autoescape'],
        autoescape=True,
    )

    class MainHandler(webapp2.RequestHandler):
        def get(self):
            teamName = get_data.all_coach_data()[1]
            coachName = get_data.all_coach_data()[2]
            teamKey = get_data.all_coach_data()[0]

            values = {
                'coachName': coachName,
                'teamName': teamName,
                'teamKey': teamKey,
            }

            template = JINJA_ENVIRONMENT.get_template('index.html')
            self.response.write(template.render(values))

    app = webapp2.WSGIApplication([
        ('/', MainHandler)
    ], debug=True)
get_data.py returns the correct data to my variables to populate the values, which I verified in the debugger.
The problem occurs when running main.py in my development environment (so far I have not downloaded gcloud). No matter which of the suggestions from the links above or from my Google searches I try, the terminal always returns:
    ImportError: No module named bs4
In one of the SO links at the top, a commenter says: "GAE only supports pure Python modules. bs4 is not pure because some parts were written in C." I'm not sure whether this is true, and I'm not sure how to check it. I do not have enough reputation to comment and ask. :(
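One rough way to check a claim like that is to scan the package folder for files that usually indicate compiled code. The sketch below is only a heuristic: the function name find_compiled_files and the extension list are assumptions made for illustration, and an empty result suggests, but does not prove, that a package is pure Python. The demo runs on a throwaway folder with made-up file names; for the project above, the folder to pass would be lib/bs4.

```python
import os
import tempfile

# Extensions that usually indicate compiled (non-pure-Python) code.
# This list is an assumption and is not exhaustive.
COMPILED_EXTENSIONS = (".so", ".pyd", ".dll", ".dylib", ".c", ".pyx")


def find_compiled_files(package_dir):
    """Return sorted relative paths of files that look like compiled extensions."""
    hits = []
    for root, _dirs, files in os.walk(package_dir):
        for name in files:
            if name.lower().endswith(COMPILED_EXTENSIONS):
                full_path = os.path.join(root, name)
                hits.append(os.path.relpath(full_path, package_dir))
    return sorted(hits)


# Demo on a throwaway folder containing made-up file names.
demo_dir = tempfile.mkdtemp()
for filename in ("__init__.py", "parser.py", "_speedups.so"):
    open(os.path.join(demo_dir, filename), "w").close()

compiled = find_compiled_files(demo_dir)
print(compiled)
```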
I went through the bs4 docs on the Crummy website, read all the related questions and answers, and tried to pick up hints from the Appengine documentation. However, I could not find a solution that does not rely on an outdated version of BeautifulSoup, which lacks the functions I need.
I am new to programming and to StackOverflow, so if I left out some important information or did not follow good practice in asking this question, please let me know. I will edit and add additional information if necessary.
Thanks!
Edit: I was not sure whether the get_data code would be relevant, but here it is:
    from bs4 import BeautifulSoup
    import urllib2, re

    teamKeys = {
        'ATL': 'Atlanta Falcons',
        'HOU': 'Houston Texans',
    }

    def get_all_coaches():
        for key in teamKeys:
            page = urllib2.urlopen("http://www.nfl.com/teams/coaches?coaType=head&team=" + key)
            soup = BeautifulSoup(page)
            return(head_coach(soup))

    def head_coach(soup):
        head = soup.select('.coachprofiletext p')[0].text
        position, name = re.split(': ', head)
        return name

    def export_coach_data():
        testList = []
        for key in teamKeys:
            page = urllib2.urlopen("http://www.nfl.com/teams/coaches?coaType=head&team=" + key)
            soup = BeautifulSoup(page)
            teamKey = key
            teamName = teamKeys[key]
            headCoach = head_coach(soup)
            t = [
                teamKey,
                teamName,
                str(headCoach),
            ]
            testList.append(t)
        return(testList)

    def all_coach_data():
        results = data.export_coach_data()
        ATL = results[0]
        HOU = results[1]
        return ATL
I realize this is probably riddled with poor implementation (I have only been doing this seriously for a few months in my free time), but it does return the correct values to my variables in main.py.
Here is the application log:
    2014-11-05 15:36:53 Running command: "['C:\\Python27\\pythonw.exe', 'C:\\Program Files\\Google\\Cloud SDK\\google-cloud-sdk\\platform\\google_appengine\\dev_appserver.py', '--skip_sdk_update_check=yes', '--port=11080', '--admin_port=8003', u'G:\\projects\\coaches']"
    INFO     2014-11-05 15:37:00,119 devappserver2.py:725] Skipping SDK update check.
    WARNING  2014-11-05 15:37:00,157 api_server.py:383] Could not initialize images API; you are likely missing the Python "PIL" module.
    INFO     2014-11-05 15:37:00,190 api_server.py:171] Starting API server at: http://localhost:19713
    INFO     2014-11-05 15:37:00,210 dispatcher.py:183] Starting module "default" running at: http://localhost:11080
    INFO     2014-11-05 15:37:00,216 admin_server.py:117] Starting admin server at: http://localhost:8003
    ERROR    2014-11-05 20:37:48,726 wsgi.py:262]
    Traceback (most recent call last):
      File "C:\Program Files\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\google\appengine\runtime\wsgi.py", line 239, in Handle
        handler = _config_handle.add_wsgi_middleware(self._LoadHandler())
      File "C:\Program Files\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\google\appengine\runtime\wsgi.py", line 298, in _LoadHandler
        handler, path, err = LoadObject(self._handler)
      File "C:\Program Files\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\google\appengine\runtime\wsgi.py", line 84, in LoadObject
        obj = __import__(path[0])
      File "G:\projects\coaches\main.py", line 3, in <module>
        import get_data
      File "G:\projects\coaches\get_data.py", line 1, in <module>
        from bs4 import BeautifulSoup
    ImportError: No module named bs4
    INFO     2014-11-05 15:37:48,762 module.py:652] default: "GET / HTTP/1.1" 500 -