TypeError: 'NoneType' object is not callable in Python with BeautifulSoup XML
I have the following XML file:

    <user-login-permission>true</user-login-permission>
    <total-matched-record-number>15000</total-matched-record-number>
    <total-returned-record-number>15000</total-returned-record-number>
    <active-user-records>
      <active-user-record>
        <active-user-name>username</active-user-name>
        <authentication-realm>realm</authentication-realm>
        <user-roles>Role</user-roles>
        <user-sign-in-time>date</user-sign-in-time>
        <events>0</events>
        <agent-type>text</agent-type>
        <login-node>node</login-node>
      </active-user-record>

There are many such entries. I am trying to get the values from some of the tags and save them to another text file using the following code:

    soup = BeautifulSoup(open("path/to/xmlfile"), features="xml")
    with open('path/to/outputfile', 'a') as f:
        for i in range(len(soup.findall('active-user-name'))):
            f.write('%s\t%s\t%s\t%s\n' % (
                soup.findall('active-user-name')[i].text,
                soup.findall('authentication-realm')[i].text,
                soup.findall('user-roles')[i].text,
                soup.findall('login-node')[i].text))

I get "TypeError: 'NoneType' object is not callable" for the line:

    for i in range(len(soup.findall('active-user-name'))):

Any idea what could be causing this?
Thanks!
There are a number of problems to solve here. Firstly, the XML you provided is not well-formed: an XML document must have a single root element.
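The root-element requirement can be checked with a strict parser. This is a minimal sketch, assuming the standard library's xml.etree.ElementTree is used purely as a well-formedness check (it is not part of the original code, and the shortened fragment is illustrative):

```python
import xml.etree.ElementTree as ET

# Two top-level elements side by side: not a well-formed XML document.
fragment = "<user-login-permission>true</user-login-permission><events>0</events>"

try:
    ET.fromstring(fragment)
    fragment_is_well_formed = True
except ET.ParseError:
    # A strict parser rejects content after the first top-level element.
    fragment_is_well_formed = False

# Wrapping the same content in a single root element makes it parseable.
wrapped = ET.fromstring("<root>" + fragment + "</root>")
child_tags = [child.tag for child in wrapped]

print(fragment_is_well_formed)
print(child_tags)
```
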
Try something like this XML:

    <root>
      <user-login-permission>true</user-login-permission>
      <total-matched-record-number>15000</total-matched-record-number>
      <total-returned-record-number>15000</total-returned-record-number>
      <active-user-records>
        <active-user-record>
          <active-user-name>username</active-user-name>
          <authentication-realm>realm</authentication-realm>
          <user-roles>Role</user-roles>
          <user-sign-in-time>date</user-sign-in-time>
          <events>0</events>
          <agent-type>text</agent-type>
          <login-node>node</login-node>
        </active-user-record>
      </active-user-records>
    </root>

Now on to Python. Firstly, there is no findall method; it is either findAll or find_all. Because BeautifulSoup treats an unknown attribute as a tag lookup, soup.findall looks for a <findall> tag, finds none, and returns None, and calling None raises the TypeError. findAll and find_all are equivalent, as described in the Beautiful Soup documentation.
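The failure can be reproduced in a few lines. This is a minimal sketch, assuming bs4 is installed; it uses the built-in "html.parser" backend only so that it runs without lxml (the original code uses the "xml" feature), and the one-element document is illustrative:

```python
from bs4 import BeautifulSoup

xml = "<root><active-user-name>alice</active-user-name></root>"
# Assumption: "html.parser" stands in for the "xml" feature to avoid needing lxml.
soup = BeautifulSoup(xml, "html.parser")

# An unknown attribute is treated as a tag lookup, i.e. soup.find("findall").
missing = soup.findall
print(missing)

error_message = None
try:
    # Calling the None returned above is what raises the TypeError.
    soup.findall("active-user-name")
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# The real method name works as expected.
names = [tag.text for tag in soup.find_all("active-user-name")]
print(names)
```
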
Further, I would suggest restructuring the code so that you do not call find_all so often: looping over each record once and using find within it is more efficient, especially for large XML files. In addition, the code below is easier to read and debug:

    from bs4 import BeautifulSoup

    # Parse the XML file; the "xml" feature requires lxml to be installed
    with open('./path_to_file.xml', 'r') as xml_file:
        soup = BeautifulSoup(xml_file, "xml")

    with open('./path_to_output_f.txt', 'a') as f:
        # Visit each record once and look up its child tags with find
        for s in soup.find_all('active-user-record'):
            username = s.find('active-user-name').text
            auth = s.find('authentication-realm').text
            role = s.find('user-roles').text
            node = s.find('login-node').text
            f.write("{}\t{}\t{}\t{}\n".format(username, auth, role, node))

Hope this helps. Let me know if you need more help!
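One further benefit of looping per record is that fields stay paired with their own record even when a tag is missing, whereas indexing four separate lists would drift out of step. A possible sketch, assuming bs4 is installed: child_text, FIELDS and SAMPLE are hypothetical names introduced for illustration, and "html.parser" is used only so the example runs without lxml:

```python
from bs4 import BeautifulSoup

# Illustrative data: the first record has no <user-roles> element.
SAMPLE = """
<root>
  <active-user-records>
    <active-user-record>
      <active-user-name>alice</active-user-name>
      <authentication-realm>realm1</authentication-realm>
      <login-node>node1</login-node>
    </active-user-record>
    <active-user-record>
      <active-user-name>bob</active-user-name>
      <authentication-realm>realm2</authentication-realm>
      <user-roles>Admin</user-roles>
      <login-node>node2</login-node>
    </active-user-record>
  </active-user-records>
</root>
"""

FIELDS = ("active-user-name", "authentication-realm", "user-roles", "login-node")


def child_text(record, tag_name, default=""):
    """Return the text of the first matching child tag, or default if absent."""
    child = record.find(tag_name)
    return child.text if child is not None else default


# Assumption: "html.parser" stands in for the "xml" feature to avoid needing lxml.
soup = BeautifulSoup(SAMPLE, "html.parser")
rows = [
    tuple(child_text(record, name) for name in FIELDS)
    for record in soup.find_all("active-user-record")
]

for row in rows:
    print("\t".join(row))
```

Without the None check, a record lacking one of the tags would raise an AttributeError on .text, since find returns None when nothing matches.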
The fix for my version of this problem was to convert the BeautifulSoup instance to a string. See: https://groups.google.com/forum/#!topic/comp.lang.python/ymrea29fMFI
This uses the following Python built-in. From the Python manual:
str([object])
Return a string containing a nicely printable representation of an object. For strings, this returns the string itself. The difference with repr(object) is that str(object) does not always attempt to return a string that is acceptable to eval(); its goal is to return a printable string. If no argument is given, returns the empty string, ''.
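The difference between str and repr described in that excerpt can be seen with a plain string. A small sketch using only built-ins (the sample value is illustrative):

```python
text = "a\tb"

# str() on a string returns the string itself.
as_str = str(text)

# repr() returns a quoted, escaped form that eval() can turn back into the value.
as_repr = repr(text)
round_tripped = eval(as_repr)

# With no argument, str() returns the empty string.
empty = str()

print(as_str == text)
print(as_repr)
print(round_tripped == text)
```
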