Python to CSV splits a string into two columns when I want one

I am scraping a page with BeautifulSoup, and part of the logic is that sometimes the contents of a <td> tag may contain a <br>.

So sometimes it looks like this:

<td class="xyz">
    text 1
    <br>
    text 2
</td>

and sometimes it looks like this:

<td class="xyz">
    text 1
</td>

I loop over these cells and append each value to an output_row list, which I eventually add to a list of lists. Whether I see the former format or the latter, I want the text to end up in one cell.

I found a way to tell whether I am looking at a <br> tag: td.string comes back as None, and I also know that text 2 always contains "ABC". So:

    elif td.string is None:
        # td.string is None when the cell has more than one child (text, <br>, text)
        if 'ABC' in td.contents[2]:
            new_string = td.contents[0] + ' ' + td.contents[2]
            output_row.append(new_string)
            print(new_string)
        else:
            pass  # this is for another situation and it works fine
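
For reference, here is a minimal sketch of what BeautifulSoup reports for the two cell shapes above. The `sample_html` string and the variable names are made up for illustration and are not part of the scraper; the sketch only assumes the `html.parser` backend.

```python
from bs4 import BeautifulSoup

# Illustrative markup only: one cell with a <br>, one without.
sample_html = """
<table><tr>
<td class="xyz">
    text 1
    <br>
    text 2
</td>
<td class="xyz">
    text 1
</td>
</tr></table>
"""

soup = BeautifulSoup(sample_html, "html.parser")
with_br, without_br = soup.find_all("td", class_="xyz")

# .string is only set when a tag has exactly one child.
print(with_br.string is None)     # three children: text, <br>, text
print(without_br.string is None)  # a single text child

# The text children keep the newlines and indentation from the markup,
# so concatenating them carries that whitespace into the result.
print(repr(with_br.contents[0]))
print(repr(with_br.contents[2]))
```

Note that the raw text nodes are not stripped, so a string built from td.contents[0] and td.contents[2] still contains line breaks.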

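When I print this in Jupyter Notebook, it shows up as "text 1 text 2" on one line. But when I open the CSV, it is in two different columns. So when td.string is empty (meaning there is a <br>), text 1 ends up in one column and whatever follows the <br> ends up in the next.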


This is how I write to the CSV:

import csv

# newline='' lets the csv module control line endings itself
with open('C:/location/file.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file, delimiter=',')
    #writer.writerow(headers)
    for row in output_rows:
        writer.writerow(row)

Use get_text() with the "strip" and "separator" arguments:

from bs4 import BeautifulSoup

dat="""
<table>
    <tr>
        <td class="xyz">
            text 1
            <br>
            text 2
        </td>

        <td class="xyz">
            text 1
        </td>
    </tr>
</table>
"""

soup = BeautifulSoup(dat, 'html.parser')
for td in soup.select("table > tr > td.xyz"):
    print(td.get_text(separator=" ", strip=True))

Prints:

text 1 text 2
text 1
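
To connect this back to the CSV output, here is a rough sketch that builds the rows with get_text() and writes them with csv.writer. It writes to an in-memory io.StringIO buffer instead of the file from the question so the result can be inspected directly; the `html`, `output_rows`, and `buffer` names are only for illustration.

```python
import csv
import io

from bs4 import BeautifulSoup

# Same sample table as above, repeated so the sketch runs on its own.
html = """
<table>
    <tr>
        <td class="xyz">
            text 1
            <br>
            text 2
        </td>
        <td class="xyz">
            text 1
        </td>
    </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# One list per <tr>, one cleaned string per <td>.
output_rows = []
for tr in soup.select("table > tr"):
    cells = [td.get_text(separator=" ", strip=True) for td in tr.select("td.xyz")]
    output_rows.append(cells)

# Write to an in-memory buffer; swap in open(path, 'w', newline='') for a real file.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer, delimiter=",")
writer.writerows(output_rows)

print(repr(buffer.getvalue()))
```

Because strip=True trims every text fragment before joining, the cell written to the CSV no longer carries the line breaks from the original markup.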

Source: https://habr.com/ru/post/1656973/

