Parsing a JSON variable inside a script tag

I'm currently trying to scrape the JSON output from 'https://sports.bovada.lv/soccer/premier-league'.

The page source contains the following:

<script type="text/javascript">var swc_market_lists = {"items":[{"description":"Game Lines","id":"23", ... </script>

I am trying to get the contents of the swc_market_lists variable.

The problem is that when I use the following code:

import requests
from lxml import html

url = 'https://sports.bovada.lv/soccer/premier-league'
r = requests.get(url)
tree = html.fromstring(r.content)
var = tree.xpath('//script')
print(var)

var comes back as an empty list.

I also tried saving r.text and looking at it, but I don't see any script tags.

What am I missing?

1 answer

You need to pass a User-Agent header for it to work (without a browser-like one, the site apparently serves a page that lacks those script tags):

r = requests.get(url, headers={"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.103 Safari/537.36"})
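If you would rather not depend on requests, the same header can be attached with the standard library's urllib.request. This is a minimal sketch under that assumption: build_request is a hypothetical helper, and the block only builds the request object, so nothing is sent over the network. The URL and User-Agent string are the ones from this thread.

```python
import urllib.request

# Same browser-like User-Agent string as in the requests call above.
USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/48.0.2564.103 Safari/537.36"
)

def build_request(url):
    """Hypothetical helper: return a Request carrying a browser-like User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})

req = build_request("https://sports.bovada.lv/soccer/premier-league")

# urllib normalizes header names with str.capitalize(), so the key is "User-agent".
print(req.get_header("User-agent"))

# To actually fetch the page you would then call:
#   page = urllib.request.urlopen(req).read().decode("utf-8")
```

Note the header-name normalization: looking it up as "User-Agent" on the Request object would return None even though the header is set.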

To get the script you want (after rebuilding tree from the new response with html.fromstring(r.content)), select the one whose text contains swc_market_lists:

script = tree.xpath('//script[contains(., "swc_market_lists")]/text()')[0]
print(script)
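For comparison, the same "pick the script that mentions the variable" step can be done with only the standard library's html.parser instead of lxml. This is a sketch, not the answer's method: ScriptCollector and find_script are hypothetical names, and the HTML below is a made-up sample modelled on the snippet in the question.

```python
from html.parser import HTMLParser

class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # A script body may be delivered in several chunks; concatenate them.
        if self._in_script:
            self.scripts[-1] += data

def find_script(html_text, marker):
    """Return the first script whose text contains `marker`, or None."""
    parser = ScriptCollector()
    parser.feed(html_text)
    parser.close()
    return next((s for s in parser.scripts if marker in s), None)

# Hypothetical page with two scripts; only the second defines swc_market_lists.
page = """<html><head>
<script type="text/javascript">var other = 1;</script>
<script type="text/javascript">var swc_market_lists = {"items":[{"description":"Game Lines","id":"23"}]};</script>
</head><body></body></html>"""

script = find_script(page, "swc_market_lists")
print(script)
```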

To extract the value of the swc_market_lists variable:

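import re

# re.MULTILINE lets "$" match at the end of any line, not only at the end of the script text
data = re.search(r"var swc_market_lists = (.*?);$", script, re.MULTILINE).group(1)
print(data)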
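The `;$` anchor in that pattern depends on how the script text is laid out. By default `$` only matches at the very end of the string (or just before a final newline), and `.` does not cross newlines, so if the assignment is followed by more lines the plain search finds nothing. A small worked example, using hypothetical script text:

```python
import re

# Hypothetical script text: the assignment is followed by another statement.
script = (
    'var swc_market_lists = {"items":[{"description":"Game Lines","id":"23"}]};\n'
    "var other_setting = true;\n"
)

pattern = r"var swc_market_lists = (.*?);$"

# Default mode: "$" cannot match after the first line, so there is no match.
print(re.search(pattern, script))  # → None

# MULTILINE mode: "$" also matches right before each newline.
match = re.search(pattern, script, re.MULTILINE)
data = match.group(1) if match else None
print(data)
```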

Then, once you have the JSON string, load it into a Python object with json.loads():

import json
data = json.loads(data)
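A possible alternative to the regex is to let the json module find the end of the value itself: json.JSONDecoder.raw_decode parses one JSON value starting at a given index and reports where it stopped, ignoring whatever follows. This is a sketch assuming the assigned value is strict JSON (double-quoted keys, no trailing commas); extract_js_object is a hypothetical helper and the script text is a made-up sample.

```python
import json

def extract_js_object(script_text, var_name):
    """Decode the JSON literal assigned with `var <var_name> = ...`, or return None."""
    prefix = "var " + var_name + " ="
    start = script_text.find(prefix)
    if start == -1:
        return None
    idx = start + len(prefix)
    # raw_decode does not skip leading whitespace, so step over it manually.
    while idx < len(script_text) and script_text[idx].isspace():
        idx += 1
    value, _end = json.JSONDecoder().raw_decode(script_text, idx)
    return value

# Hypothetical script text; the trailing ";" and later statement are ignored.
script = 'var swc_market_lists = {"items":[{"description":"Game Lines","id":"23"}]}; var x = 1;'

data = extract_js_object(script, "swc_market_lists")
print(data["items"][0]["description"])  # → Game Lines
```

Because parsing stops at the end of the first complete JSON value, this does not care whether the statement ends at a line break or how many statements follow it.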

Source: https://habr.com/ru/post/1628059/

