How to get attribute value using BeautifulSoup and Python?

I can't get an attribute value using BeautifulSoup and Python. Here's how the XML is structured:

...
</total>
<tag>
    <stat fail="0" pass="1">TR=111111 Sandbox=3000613</stat>
    <stat fail="0" pass="1">TR=121212 Sandbox=3000618</stat>
    ...
    <stat fail="0" pass="1">TR=999999 Sandbox=3000617</stat>
</tag>
<suite>
...

What I'm trying to get is the value of the pass attribute, but for the life of me I just don't understand how to do it. I checked the BeautifulSoup documentation, and it seems that I should use something like stat['pass'], but this does not seem to work.

Here is my code:

with open('../results/output.xml') as raw_resuls:
    results = soup(raw_resuls, 'lxml')
    for stat in results.find_all('tag'):
        print(stat['pass'])

If I do results.stat['pass'], it returns the value from a different element, one further up in the XML.
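For context: attribute-style access such as results.stat is BeautifulSoup shorthand for results.find('stat'), which returns only the first stat element in the whole document, wherever it sits. A minimal sketch that shows this; the markup is made up (a stat inside total with an invented pass="5"), and it uses the built-in 'html.parser' instead of 'lxml' so it runs without extra packages:

```python
from bs4 import BeautifulSoup

# Made-up document: one <stat> inside <total>, two inside <tag>.
xml = """
<total><stat fail="1" pass="5">All tests</stat></total>
<tag>
    <stat fail="0" pass="1">TR=111111 Sandbox=3000613</stat>
    <stat fail="0" pass="1">TR=121212 Sandbox=3000618</stat>
</tag>
"""

soup = BeautifulSoup(xml, "html.parser")

# soup.stat is the same as soup.find("stat"): the FIRST <stat> anywhere.
first = soup.stat
print(first is soup.find("stat"))  # True
print(first["pass"])               # 5 (the <stat> inside <total>)
```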

If I print the variable stat, I get the following:

<stat fail="0" pass="1">TR=787878 Sandbox=3000614</stat>
...
<stat fail="0" pass="1">TR=888888 Sandbox=3000610</stat>

This seems to be normal.

I am sure that I missed something or did something wrong. Where should I look? Am I taking the wrong approach?

Answer

Iterate over the stat elements inside each tag element:

from bs4 import BeautifulSoup

with open('test.xml') as raw_resuls:
    results = BeautifulSoup(raw_resuls, 'lxml')

for element in results.find_all("tag"):
    for stat in element.find_all("stat"):
        print(stat['pass'])

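The pass attribute belongs to the stat elements, not to tag, so you first have to find the stat elements inside each tag and then read the attribute from them.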

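In your loop the variable is named stat, but it actually holds a whole tag element. That's why printing it shows the nested stat lines, while stat['pass'] does not work.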
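A small sketch of what the loop in the question actually iterates over, using a shortened copy of the question's XML and the built-in 'html.parser' instead of 'lxml' (an assumption, to avoid the extra dependency):

```python
from bs4 import BeautifulSoup

xml = """
<tag>
    <stat fail="0" pass="1">TR=111111 Sandbox=3000613</stat>
    <stat fail="0" pass="1">TR=121212 Sandbox=3000618</stat>
</tag>
"""
soup = BeautifulSoup(xml, "html.parser")

for stat in soup.find_all("tag"):
    # Despite its name, `stat` is the <tag> element here.
    print(stat.name)         # tag
    print(stat.attrs)        # {}  (<tag> itself has no attributes)
    print(stat.get("pass"))  # None, whereas stat["pass"] raises KeyError
    # The pass values live on the child <stat> elements:
    print([child["pass"] for child in stat.find_all("stat")])  # ['1', '1']
```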

Test XML (test.xml):

<tag>
    <stat fail="0" pass="1">TR=111111 Sandbox=3000613</stat>
    <stat fail="0" pass="1">TR=121212 Sandbox=3000618</stat>
    <stat fail="0" pass="1">TR=999999 Sandbox=3000617</stat>
</tag>

Output of the script:

1
1
1

A more complete BeautifulSoup version, which also splits the TR and Sandbox IDs out of each stat element's text:

from bs4 import BeautifulSoup

# Parses a string of the form 'TR=abc123 Sandbox=abc123' and returns a dictionary
# of the form {'TR': 'abc123', 'Sandbox': 'abc123'}.
def parseTestID(testid):
    parts = testid.split(" ")
    return {'TR': parts[0].split("=")[1], 'Sandbox': parts[1].split("=")[1]}

# Parses the XML content of 'rawdata' and stores the pass value, TR-ID and Sandbox-ID
# in a dictionary of the form {'Pass': passvalue, 'TR': TR-ID, 'Sandbox': Sandbox-ID}.
# Each dictionary is appended to a list, which is returned.
def getTestState(rawdata):
    # initialize the parser
    soup = BeautifulSoup(rawdata, 'lxml')
    parsedData = []

    # find every <tag> element
    for tag in soup.find_all("tag"):
        # find the <stat> elements inside this <tag>
        for stat in tag.find_all("stat"):
            # parse the element text once, then store everything in a dictionary
            ids = parseTestID(stat.string)
            entry = {'Pass': stat['pass'], 'TR': ids['TR'], 'Sandbox': ids['Sandbox']}
            # append the dictionary to the list
            parsedData.append(entry)

    # return the list
    return parsedData

Example usage (prints one line per test):

# open file
with open('test.xml') as raw_resuls:
    # get list of parsed data 
    data = getTestState(raw_resuls)

# print parsed data
for element in data:
    print("TR = {0}\tSandbox = {1}\tPass = {2}".format(element['TR'],element['Sandbox'],element['Pass']))

TR = 111111 Sandbox = 3000613   Pass = 1
TR = 121212 Sandbox = 3000618   Pass = 1
TR = 222222 Sandbox = 3000612   Pass = 1
TR = 232323 Sandbox = 3000618   Pass = 1
TR = 333333 Sandbox = 3000605   Pass = 1
TR = 343434 Sandbox = ZZZZZZ    Pass = 1
TR = 444444 Sandbox = 3000604   Pass = 1
TR = 454545 Sandbox = 3000608   Pass = 1
TR = 545454 Sandbox = XXXXXX    Pass = 1
TR = 555555 Sandbox = 3000617   Pass = 1
TR = 565656 Sandbox = 3000615   Pass = 1
TR = 626262 Sandbox = 3000602   Pass = 1
TR = 666666 Sandbox = 3000616   Pass = 1
TR = 676767 Sandbox = 3000599   Pass = 1
TR = 737373 Sandbox = 3000603   Pass = 1
TR = 777777 Sandbox = 3000611   Pass = 1
TR = 787878 Sandbox = 3000614   Pass = 1
TR = 828282 Sandbox = 3000600   Pass = 1
TR = 888888 Sandbox = 3000610   Pass = 1
TR = 999999 Sandbox = 3000617   Pass = 1

Notes:

If the XML contains only one tag element, you can use soup.find("tag") instead of soup.find_all("tag") and drop the outer loop.

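Keep in mind the difference between find() and find_all(): find() returns only the first matching element (or None if there is none), while find_all() returns a list of all matches.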
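A short sketch of the difference, on a trimmed copy of the test XML (parsed with 'html.parser' here rather than 'lxml'):

```python
from bs4 import BeautifulSoup

xml = """
<tag>
    <stat fail="0" pass="1">TR=111111 Sandbox=3000613</stat>
    <stat fail="0" pass="1">TR=121212 Sandbox=3000618</stat>
</tag>
"""
soup = BeautifulSoup(xml, "html.parser")

first = soup.find("stat")            # a single element: the first match
every = soup.find_all("stat")        # a list of every match
missing = soup.find("suite")         # no match: find() returns None ...
none_found = soup.find_all("suite")  # ... and find_all() returns an empty list

print(first.string)  # TR=111111 Sandbox=3000613
print(len(every))    # 2
print(missing)       # None
print(none_found)    # []
```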

To get the text inside an element, use its .string attribute. For example, if tag = <tag>I love Soup!</tag>, then tag.string == "I love Soup!".
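A sketch of where .string stops working: it only returns text when the element has a single child, so for a container such as tag it gives None, and get_text() is the usual fallback. The markup is illustrative:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<stat fail="0" pass="1">TR=111111 Sandbox=3000613</stat>'
    '<tag><stat pass="1">a</stat><stat pass="1">b</stat></tag>',
    "html.parser",
)

stat = soup.find("stat")
print(stat.string)        # TR=111111 Sandbox=3000613 (exactly one text child)

tag = soup.find("tag")
print(tag.string)         # None: <tag> has more than one child
print(tag.get_text("|"))  # a|b (all descendant text, joined with "|")
```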

Attributes are read like dictionary entries. For example, if tag = <tag color=red>I love Soup!</tag>, then tag['color'] == "red".
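Related attribute accessors on the same example (quotes added around red; 'html.parser' assumed):

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<tag color="red">I love Soup!</tag>', "html.parser")
tag = soup.find("tag")

print(tag["color"])          # red   (raises KeyError if the attribute is absent)
print(tag.get("size"))       # None  (get() returns None instead of raising)
print(tag.get("size", "?"))  # ?     (or a default of your choice)
print(tag.attrs)             # {'color': 'red'}
```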

Splitting a string like "TR=abc123 Sandbox=abc123" into its parts is plain Python string handling (str.split), not BeautifulSoup.
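As an alternative to parseTestID above, a sketch that does not depend on TR and Sandbox being the first and second fields. The name parse_test_id is hypothetical, and it assumes every whitespace-separated item contains an = sign:

```python
def parse_test_id(text):
    """Turn 'TR=111111 Sandbox=3000613' into {'TR': '111111', 'Sandbox': '3000613'}.

    Hypothetical helper: assumes whitespace-separated key=value pairs
    with no spaces inside keys or values.
    """
    # split("=", 1) keeps any further "=" characters inside the value
    pairs = (item.split("=", 1) for item in text.split())
    return {key: value for key, value in pairs}


print(parse_test_id("TR=111111 Sandbox=3000613"))
# {'TR': '111111', 'Sandbox': '3000613'}
```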

Answer

find_all('tag') returns the elements named tag, each still containing its stat children:

>>> results.find_all('tag')
[<tag>
<stat fail="0" pass="1">TR=111111 Sandbox=3000613</stat>
<stat fail="0" pass="1">TR=121212 Sandbox=3000618</stat>
<stat fail="0" pass="1">TR=999999 Sandbox=3000617</stat>
</tag>]

To get the stat elements themselves, use results.find_all('stat'):

>>> stat_blocks = results.find_all('stat')
>>> stat_blocks
[<stat fail="0" pass="1">TR=111111 Sandbox=3000613</stat>, <stat fail="0" pass="1">TR=121212 Sandbox=3000618</stat>, <stat fail="0" pass="1">TR=999999 Sandbox=3000617</stat>]

Then read the "pass" attribute of each one:

>>> passes = [s['pass'] for s in stat_blocks]
>>> passes
['1', '1', '1']

Or, in a loop:

>>> for s in stat_blocks:
...     print(s['pass'])
...
1
1
1

Tested in Python with a test file containing the XML fragment from the question.
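If the real file also has stat elements outside tag (the question mentions results.stat returning a value from further up), find_all('stat') will collect those too. A CSS selector can restrict the match to direct children of tag. A sketch with made-up markup; select() relies on the soupsieve package, which recent beautifulsoup4 releases install as a dependency:

```python
from bs4 import BeautifulSoup

# Made-up markup: one <stat> outside <tag>, two inside it.
xml = """
<total><stat fail="1" pass="5">All tests</stat></total>
<tag>
    <stat fail="0" pass="1">TR=111111 Sandbox=3000613</stat>
    <stat fail="0" pass="1">TR=121212 Sandbox=3000618</stat>
</tag>
"""
soup = BeautifulSoup(xml, "html.parser")

# find_all("stat") also picks up the <stat> inside <total> ...
print([s["pass"] for s in soup.find_all("stat")])      # ['5', '1', '1']
# ... while "tag > stat" only matches direct children of <tag>.
print([s["pass"] for s in soup.select("tag > stat")])  # ['1', '1']
```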

Answer

Your tag may have multiple stat entries. Do you have only one tag entry?

If so, first find the "tag", then go through the "stat" entries contained in it. Something like:

for stat in soup.find("tag").find_all("stat"):
    print(stat["pass"])
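If the file might not contain a tag element at all, soup.find("tag") returns None and the chained .find_all() fails with AttributeError. A guarded sketch; pass_values is a hypothetical helper name, and 'html.parser' stands in for 'lxml':

```python
from bs4 import BeautifulSoup


def pass_values(markup):
    """Return the pass attribute of every <stat> inside the first <tag>.

    Hypothetical helper: returns an empty list when there is no <tag>,
    instead of failing on None.find_all().
    """
    soup = BeautifulSoup(markup, "html.parser")
    tag = soup.find("tag")
    if tag is None:
        return []
    return [stat["pass"] for stat in tag.find_all("stat")]


print(pass_values('<tag><stat fail="0" pass="1">TR=1</stat></tag>'))  # ['1']
print(pass_values("<suite></suite>"))                                 # []
```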

Source: https://habr.com/ru/post/1673927/

