Python: getting text from HTML using BeautifulSoup

Question


I am trying to extract the ranking text from a Kaggle user profile page (example link: kaggle user ranking no1).

I am using the following code:

import requests
from bs4 import BeautifulSoup


def get_single_item_data(item_url):
    sourceCode = requests.get(item_url)
    plainText = sourceCode.text
    soup = BeautifulSoup(plainText, "html.parser")
    # Print the text of every <h4> bound to rankingText
    for item_name in soup.findAll('h4', {'data-bind': "text: rankingText"}):
        print(item_name.string)


item_url = 'https://www.kaggle.com/titericz'
get_single_item_data(item_url)

The result is None. The problem is that soup.findAll('h4', {'data-bind': "text: rankingText"}) outputs:

[<h4 data-bind="text: rankingText"></h4>]

but when I inspect the page in the browser, the element looks like this:

<h4 data-bind="text: rankingText">1st</h4>

So the tag BeautifulSoup finds clearly has no text. How can I overcome this?
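The symptom can be reproduced without any network access. This is a minimal sketch, assuming the downloaded HTML contains the empty `<h4>` shown above (the `html` string is a hypothetical stand-in for the real page). BeautifulSoup only parses the markup the server sends and does not run the page's JavaScript, so a tag whose text is filled in later by a script has no child string:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for what requests.get(...).text returns:
# the <h4> is present, but its text is only inserted later by JavaScript.
html = '<div><h4 data-bind="text: rankingText"></h4></div>'

soup = BeautifulSoup(html, "html.parser")
tag = soup.find("h4", {"data-bind": "text: rankingText"})

print(tag)                    # the tag itself is found
print(tag.string)             # it has no child string, so this is None
print(repr(tag.get_text()))   # and its text content is empty: ''
```

The tag is found, but `.string` is `None` and `get_text()` is empty, which matches the behaviour described in the question.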

Edit: When I print the variable soup in the terminal, I can see that this value exists, so it must be accessible through soup somehow.

Edit 2: I've tried, unsuccessfully, to use the most upvoted answer to this question: qaru.site/questions/970563 / ... . The solution may be there.


python html html-parsing beautifulsoup kaggle

Mpizos Dimitris Dec 17 '15 at 13:40


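The ranking is filled in by javascript, which is why the element is "empty" in the HTML you download.

If you fetch the raw page, for example with wget, you can see rankingText inside a script: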

<script type="text/javascript">
profile: {
...
   "ranking": 96,
   "rankingText": "96th",
   "highestRanking": 3,
   "highestRankingText": "3rd",
...
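As a sketch of that idea, assuming the downloaded page contains a script like the one above (the `html` string below is a hypothetical excerpt modelled on that snippet, not the real Kaggle markup), BeautifulSoup can locate the script and a regular expression can pull the value out of it:

```python
import re

from bs4 import BeautifulSoup

# Hypothetical page excerpt modelled on the script snippet above.
html = """
<html><body>
<h4 data-bind="text: rankingText"></h4>
<script type="text/javascript">
var model = {
  profile: {
    "ranking": 96,
    "rankingText": "96th",
    "highestRanking": 3,
    "highestRankingText": "3rd"
  }
};
</script>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
# Pick the <script> whose source text mentions rankingText.
script = soup.find("script", string=re.compile("rankingText"))
# Capture the quoted value that follows the "rankingText" key.
match = re.search(r'"rankingText":\s*"([^"]+)"', script.string)
ranking_text = match.group(1) if match else None
print(ranking_text)  # 96th
```

The leading quote in the pattern keeps it from matching the `"highestRankingText"` key.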


steinar Dec 17 '15 at 13:56

Building on that, this works for me:

import re

import requests


def get_single_item_data(item_url):
    sourceCode = requests.get(item_url)
    plainText = sourceCode.text
    # The ranking is embedded in a <script> as "ranking": <number>, so search the raw text
    pattern = re.compile(r'ranking": [0-9]+')
    name = pattern.search(plainText)
    if name is None:  # pattern not present in the page
        return
    ranking = name.group().split()[1]
    print(ranking)


item_url = 'https://www.kaggle.com/titericz'
get_single_item_data(item_url)

This prints only the number. If you want rankingText instead, with its 'st', 'th', etc. suffix, adjust the pattern accordingly.
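To also get the rankingText value with its suffix, capture groups can replace the `split()` step. This is a minimal sketch, assuming the page text contains keys formatted like the script snippet shown earlier (the `plain_text` string is a hypothetical excerpt):

```python
import re

# Hypothetical raw page text containing the embedded profile data.
plain_text = '''profile: {
   "ranking": 96,
   "rankingText": "96th",
   "highestRanking": 3,
   "highestRankingText": "3rd",
}'''

# Capture groups return just the values, so no split() is needed.
# The leading quote keeps the "highestRanking..." keys from matching.
ranking = re.search(r'"ranking":\s*(\d+)', plain_text)
ranking_text = re.search(r'"rankingText":\s*"([^"]*)"', plain_text)

if ranking and ranking_text:
    print(int(ranking.group(1)), ranking_text.group(1))  # 96 96th
```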


Tales Pádua Dec 17 '15 at 18:37


The content is generated by javascript. The HTML that you download contains only an empty element:

<h4 data-bind="text: rankingText"></h4>

Take a look at Selenium WebDriver. With it you can load the full page and have its JavaScript run as it would in a browser.


Ali Nikneshan Dec 17 '15 at 13:47


alecxe · Accepted Answer · 2015-12-17T15:28:14+0000

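You don't need selenium, as @Ali suggested, to get the value that javascript renders: the data is already in the page source. Locate the script element that contains profile, extract the object with a regular expression and load it into a Python dictionary with json: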

import re
import json

from bs4 import BeautifulSoup
import requests


response = requests.get("https://www.kaggle.com/titericz")
soup = BeautifulSoup(response.content, "html.parser")

pattern = re.compile(r"profile: ({.*}),", re.MULTILINE | re.DOTALL)
script = soup.find("script", string=pattern)

profile_text = pattern.search(script.text).group(1)
profile = json.loads(profile_text)

print(profile["ranking"], profile["rankingText"])

1 1st
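A variant of the same approach, swapping the greedy `({.*})` capture for `json.JSONDecoder.raw_decode`, parses exactly one JSON object starting at the brace after `profile:` and ignores whatever follows it. This is a sketch under the assumption that the object after `profile:` is valid JSON, as in the accepted answer; `extract_profile` is a hypothetical helper and the `html` string is a made-up page (the live Kaggle page may no longer have this structure):

```python
import json
import re

from bs4 import BeautifulSoup

# Hypothetical page; the live Kaggle profile page may not look like this.
html = """<script type="text/javascript">
var model = { profile: {"ranking": 1, "rankingText": "1st"}, other: {"x": 2} };
</script>"""

PROFILE_START = re.compile(r"profile:\s*(\{)")


def extract_profile(page_html):
    """Return the JSON object that follows 'profile:' in a <script>, or None."""
    soup = BeautifulSoup(page_html, "html.parser")
    script = soup.find("script", string=PROFILE_START)
    if script is None:
        return None
    source = script.string
    start = PROFILE_START.search(source).start(1)  # index of the opening brace
    # raw_decode parses one complete JSON value beginning at `start` and
    # stops there, so trailing script code is never pulled into the match.
    profile, _end = json.JSONDecoder().raw_decode(source, start)
    return profile


profile = extract_profile(html)
print(profile["ranking"], profile["rankingText"])  # 1 1st
```

Returning `None` when no matching script exists avoids the `AttributeError` the original code would raise on `script.text` if the page layout changes.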
