Python 2.7 on Google App Engine cannot use lxml.etree

I am trying to use html5lib with lxml on Python 2.7 in Google App Engine. When I run the following code, it fails with the error "NameError: global name 'etree' is not defined". Is it not possible to use lxml.etree in App Engine, or am I missing something?

app.yaml

    application: testsite
    version: 1
    runtime: python27
    api_version: 1
    threadsafe: false

    handlers:
    - url: /.*
      script: index.py

    libraries:
    - name: lxml
      version: "2.3"  # I thought this would allow me to use lxml.etree

index.py

    from google.appengine.ext import webapp  # required for webapp.WSGIApplication below

    from testhandler import TestHandler

    application = webapp.WSGIApplication([('/', TestHandler)], debug=True)

testhandler.py

    import urllib2

    import html5lib
    from html5lib import treebuilders

    try:
        from lxml import etree
        print("running with lxml.etree")
    except ImportError:
        try:
            # Python 2.5
            import xml.etree.cElementTree as etree
            print("running with cElementTree on Python 2.5+")
        except ImportError:
            try:
                # Python 2.5
                import xml.etree.ElementTree as etree
                print("running with ElementTree on Python 2.5+")
            except ImportError:
                try:
                    # normal cElementTree install
                    import cElementTree as etree
                    print("running with cElementTree")
                except ImportError:
                    try:
                        # normal ElementTree install
                        import elementtree.ElementTree as etree
                        print("running with ElementTree")
                    except ImportError:
                        print("Failed to import ElementTree from any known place")

    from google.appengine.ext import webapp


    class TestHandler(webapp.RequestHandler):
        def get(self):
            f = urllib2.urlopen("http://www.yahoo.com/").read()
            doc = html5lib.parse(f, treebuilder='lxml')
            elems = doc.xpath("//*[local-name() = 'a']")
            self.response.out.write(len(elems))

Error

    running with cElementTree on Python 2.5+
    Status: 500 Internal Server Error
    Content-Type: text/html; charset=utf-8
    Cache-Control: no-cache
    Expires: Fri, 01 Jan 1990 00:00:00 GMT
    Content-Length: 769

    <pre>Traceback (most recent call last):
      File "/usr/local/bin/google_appengine/google/appengine/ext/webapp/_webapp25.py", line 701, in __call__
        handler.get(*groups)
      File "/home/test/testhandler.py", line 38, in get
        parser = html5lib.HTMLParser(tree= treebuilders.getTreeBuilder('lxml'))
      File "/home/test/html5lib/html5parser.py", line 68, in __init__
        self.tree = tree(namespaceHTMLElements)
      File "/home/test/html5lib/treebuilders/etree_lxml.py", line 176, in __init__
        builder = etree_builders.getETreeModule(etree, fullTree=fullTree)
    NameError: global name 'etree' is not defined
    </pre>

Update

No, I tried several ways to create a doc object, but had no luck. For example, I tried from lxml.html import document_fromstring, which gives me this error:

    Traceback (most recent call last):
      File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 4143, in _HandleRequest
        self._Dispatch(dispatcher, self.rfile, outfile, env_dict)
      File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 4049, in _Dispatch
        base_env_dict=env_dict)
      File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 616, in Dispatch
        base_env_dict=base_env_dict)
      File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 3120, in Dispatch
        self._module_dict)
      File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 3024, in ExecuteCGI
        reset_modules = exec_script(handler_path, cgi_path, hook)
      File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 2887, in ExecuteOrImportScript
        exec module_code in script_module.__dict__
      File "/home/yoo/eclipse_workspace/website_checker/src/index.py", line 5, in <module>
        from handlers.updatecheck import UpdateCheckHandler
      File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 1538, in Decorate
        return func(self, *args, **kwargs)
      File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 2503, in load_module
        return self.FindAndLoadModule(submodule, fullname, search_path)
      File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 1538, in Decorate
        return func(self, *args, **kwargs)
      File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 2375, in FindAndLoadModule
        description)
      File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 1538, in Decorate
        return func(self, *args, **kwargs)
      File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 2318, in LoadModuleRestricted
        description)
      File "/home/test/updatecheck.py", line 4, in <module>
        from lxml.html import document_fromstring
      File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 1538, in Decorate
        return func(self, *args, **kwargs)
      File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 2503, in load_module
        return self.FindAndLoadModule(submodule, fullname, search_path)
      File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 1538, in Decorate
        return func(self, *args, **kwargs)
      File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 2375, in FindAndLoadModule
        description)
      File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 1538, in Decorate
        return func(self, *args, **kwargs)
      File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 2318, in LoadModuleRestricted
        description)
      File "/usr/lib/python2.7/dist-packages/lxml/html/__init__.py", line 12, in <module>
        from lxml import etree
    ImportError: cannot import name etree

According to the error, it seems that App Engine does not allow me to load the etree module for some reason. I wanted to use XPath with lxml, but I cannot spend much time working out what is going on here, and I do not know Python well enough. So I will try to get by with html5lib's default "simpletree" tree builder instead:

    f = urllib2.urlopen("http://www.yahoo.com/").read()
    p = html5lib.HTMLParser()
    doc = p.parse(f)
    # do something with doc.childNodes
    self.response.out.write(len(doc.childNodes))

It is not a very good approach, but at least it worked when I tested it on the live App Engine.
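If the goal is only to count elements by local name (as in the XPath //*[local-name() = 'a'] from the question), this can be sketched without lxml. The example below is a minimal sketch, not the approach used in the question: it runs the standard-library xml.etree.ElementTree over a small hard-coded XHTML string, and it assumes that element tags are namespaced as {http://www.w3.org/1999/xhtml}a, which is what one would expect from an ElementTree-based html5lib tree with namespaced elements. The helper count_by_local_name and the SAMPLE document are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a parsed page; a real tree would come from the parser.
SAMPLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<a href="/x">x</a><p><a href="/y">y</a></p>'
    '</body></html>'
)


def count_by_local_name(root, local_name):
    """Count elements whose tag, ignoring any {namespace} prefix, equals local_name."""
    count = 0
    for elem in root.iter():
        tag = elem.tag
        # Comments and processing instructions have a non-string tag; skip them.
        if not hasattr(tag, "rsplit"):
            continue
        if tag.rsplit("}", 1)[-1] == local_name:
            count += 1
    return count


root = ET.fromstring(SAMPLE)
link_count = count_by_local_name(root, "a")
```

Stripping the namespace by hand keeps the helper independent of whether the tree builder namespaces its elements or not.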

+6
4 answers

Is lxml installed locally? I had the same error: import failed. You can download lxml here: http://pypi.python.org/pypi/lxml/

lxml does work with GAE, which is great, but right now there is a real lack of documentation and examples for it.
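To check which of the candidate modules can actually be imported in a given environment, a small probe like the following may help. It is only a sketch: probe_module and CANDIDATES are names invented for this example, and the module list simply mirrors the fallback chain from the question.

```python
import importlib


def probe_module(name):
    """Return (True, location) if `name` imports, else (False, error message)."""
    try:
        module = importlib.import_module(name)
    except ImportError as exc:
        return False, str(exc)
    return True, getattr(module, "__file__", "<built-in>")


# Same order of preference as the try/except chain in the question.
CANDIDATES = ["lxml.etree", "xml.etree.cElementTree", "xml.etree.ElementTree"]

report = [(name, probe_module(name)) for name in CANDIDATES]
for name, (ok, info) in report:
    print("%-26s %-7s %s" % (name, "ok" if ok else "missing", info))
```

Running it both with the local interpreter and inside the development server shows whether the two environments see the same lxml installation.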

+1

On Windows I had this problem because the python27 distribution does not include lxml. You can use the easy_install script, but it has to compile lxml from source, which gave me problems.

I got it working using this post, which I found on Google Groups:

https://groups.google.com/forum/?fromgroups=#!topic/comp.lang.python/Q8YeOIbn5Ds

However, if you want to spare yourself the pain of trying to get it to build from source, just install a pre-compiled binary, such as the one available here: http://www.lfd.uci.edu/~gohlke/pythonlibs/#lxml

Just download the executable from the site above and run the .exe; it will install everything that is needed.

+1

Try

import lxml

at the top of your test handler.
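One reason an eager import can help is that it surfaces the real ImportError right away, instead of a later NameError. The sketch below illustrates that general pattern only; it assumes (I have not verified this for the html5lib version in the question) that the failing module guards its lxml import with try/except ImportError: pass. The module name lxml_is_not_installed_here is deliberately fictitious.

```python
import importlib

MISSING = "lxml_is_not_installed_here"  # hypothetical module that cannot be imported

# Source of a tiny module that swallows the import failure (assumed pattern).
GUARDED_MODULE_SOURCE = '''
try:
    import lxml_is_not_installed_here as etree
except ImportError:
    pass  # the failure is silently ignored here

def build():
    return etree  # fails only now, far from the real cause
'''

namespace = {}
exec(GUARDED_MODULE_SOURCE, namespace)

late_error = None
try:
    namespace["build"]()
except NameError as exc:
    late_error = exc  # same kind of error as in the question's traceback

early_error = None
try:
    importlib.import_module(MISSING)  # eager import: the real problem shows up
except ImportError as exc:
    early_error = exc

print("deferred failure:", type(late_error).__name__)
print("eager failure:   ", type(early_error).__name__)
```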

0

Install it using pip:

    pip install lxml

0

Source: https://habr.com/ru/post/901476/

