I am creating a CSV file that collects several articles scraped from a website. The articles are obtained by extracting text from the URLs listed in another file. I would like the CSV file to work as a list in which each article corresponds to one list item.
The code I am using now:
    import csv
    import requests
    from bs4 import BeautifulSoup

    with open('Training_news.csv', newline='') as file:
        reader = csv.reader(file, delimiter=' ')
        for row in reader:
            for url in row:
                r = requests.get(url)
                r.encoding = "ISO-8859-1"
                soup = BeautifulSoup(r.content, 'lxml')
                text = soup.find_all(("p",{"class": "story-body-text story-content"}))
                with open('Training_News_5.csv', 'w', newline='') as csvfile:
                    spamwriter = csv.writer(csvfile, delimiter=' ')
                    spamwriter.writerow(text)
However, the generated CSV file gives me the following:
<p>Advertisement</p>, <p class="byline-dateline"><span class="byline" itemprop....... <p class="feedback-message">We're interested in your feedback on this page. <strong>Tell us what you think.</strong></p>, <p class="user-action"><a href="http://www.nytimes.com/">Go to Home Page »</a></p>
Only three of the 50 articles are saved, and the file does not let me select each article separately.
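To make the goal concrete, here is a minimal sketch of the file layout I am after: one article per row, so that each article can be read back as a separate list item. It is not working scraper code. It assumes the article texts have already been extracted into plain strings, and the names `write_articles` and `read_articles` as well as the sample strings are hypothetical placeholders.

```python
import csv
import os
import tempfile


def write_articles(articles, path):
    """Write each article as its own single-column CSV row."""
    # The file is opened once, outside any loop, so later articles
    # do not overwrite earlier ones.
    with open(path, "w", newline="", encoding="utf-8") as csvfile:
        writer = csv.writer(csvfile)
        for text in articles:
            # Wrapping the string in a list keeps the whole article in one cell.
            writer.writerow([text])


def read_articles(path):
    """Read the file back as a list with one string per article."""
    with open(path, newline="", encoding="utf-8") as csvfile:
        return [row[0] for row in csv.reader(csvfile) if row]


# Placeholder texts standing in for scraped articles. In the real script each
# string might come from something like:
#   " ".join(p.get_text() for p in
#            soup.find_all("p", class_="story-body-text story-content"))
sample = ["First article, with a comma.", "Second article\nspanning two lines."]

demo_path = os.path.join(tempfile.mkdtemp(), "articles_sketch.csv")
write_articles(sample, demo_path)
articles = read_articles(demo_path)
print(len(articles))  # number of articles read back
print(articles[1])    # select a single article by index
```

With this layout the `csv` module quotes commas and line breaks inside an article, so an article is not split across several rows or columns when the file is read back.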