Using Python to scrape nested divs and spans on Twitter?

Question

I'm trying to scrape the favorite (like) and retweet counts from Twitter search results.

After running the Python code below, I get an empty list []. I'm not using the Twitter API because it doesn't return hashtag tweets from that far back.

The code I use is:

from bs4 import BeautifulSoup
import requests

url = 'https://twitter.com/search?q=%23bangkokbombing%20since%3A2015-08-10%20until%3A2015-09-30&src=typd&lang=en'
r  = requests.get(url)
data = r.text
soup = BeautifulSoup(data, "lxml")
all_likes = soup.find_all('span', class_='ProfileTweet-actionCountForPresentation')
print(all_likes)

I can successfully save the HTML to a file using the code below. But when I search the saved text, a lot of information is missing, for example the class names I'm looking for.

So (part of) the problem apparently lies in accessing the page source accurately.

filename = 'newfile2.txt'
# data is already a str, so write it directly; UTF-8 avoids encoding errors
with open(filename, 'w', encoding='utf-8') as handle:
    handle.write(data)
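
The manual search through the saved file can also be done in code. The following is only a minimal sketch: `report_markers` is a hypothetical helper name, and `sample` is a made-up stand-in for `r.text` so that the snippet runs without a network connection.

```python
def report_markers(html_text, markers):
    """Map each marker string to whether it occurs in the HTML text."""
    return {marker: (marker in html_text) for marker in markers}

# Made-up stand-in for r.text: a page whose timeline container is empty.
sample = '<html><body><div id="timeline"></div></body></html>'

found = report_markers(
    sample,
    ['ProfileTweet-actionCountForPresentation', 'id="timeline"'],
)
print(found)
```

If the class name is reported as missing from the raw response, then `find_all` has nothing to match, and the empty list comes from the response itself rather than from the BeautifulSoup call.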

This screenshot shows the spans I'm trying to scrape.

I looked at this question, and others like it, but I didn't quite understand them: How to use BeautifulSoup to get deeply nested div values?

python html web-scraping twitter beautifulsoup

David Beales 20 . '16 23:49

David Moodie · Accepted Answer · 2016-01-21T00:26:22+0000

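It seems that your GET request returns valid HTML, but with no tweet elements inside the #timeline element. Adding a User-Agent to the request headers, as in the code below, appears to fix this.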

from bs4 import BeautifulSoup
import requests

url = 'https://twitter.com/search?q=%23bangkokbombing%20since%3A2015-08-10%20until%3A2015-09-30&src=typd&lang=en'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'}
r = requests.get(url, headers=headers)
data = r.text
soup = BeautifulSoup(data, "lxml")
all_likes = soup.find_all('span', class_='ProfileTweet-actionCountForPresentation')
print(all_likes)
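
The code above prints a list of Tag objects rather than numbers. As a possible follow-up, the counts could be pulled out of the matched spans roughly as sketched here. `extract_counts` is a hypothetical helper, the sample markup is invented for illustration (the real page may differ), and the built-in "html.parser" is used instead of "lxml" only so the sketch has one fewer dependency.

```python
from bs4 import BeautifulSoup

def extract_counts(html_text, css_class='ProfileTweet-actionCountForPresentation'):
    """Return the integer counts found in spans carrying the given class."""
    soup = BeautifulSoup(html_text, 'html.parser')
    counts = []
    for span in soup.find_all('span', class_=css_class):
        text = span.get_text(strip=True)
        # Skip empty spans and anything that is not a plain number.
        if text.isdigit():
            counts.append(int(text))
    return counts

# Invented sample markup; the structure of the real page is an assumption.
sample_html = '''
<div class="tweet">
  <span class="ProfileTweet-actionCountForPresentation">12</span>
  <span class="ProfileTweet-actionCountForPresentation"></span>
  <span class="ProfileTweet-actionCountForPresentation">3</span>
</div>
'''

print(extract_counts(sample_html))
```

With the real response, `extract_counts(r.text)` would be the equivalent call. Abbreviated counts such as "1.2K" would be skipped by the `isdigit()` check and would need their own handling.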
