Why does cURL give the correct answer, but scrapy does not?

The site I want to scrape uses JavaScript to fill out a form, then POSTs it and checks it before displaying the content.

I replicated this JS in Python after extracting the parameters from the JavaScript returned by the original GET request. My value for "TS644333_75" matches the value the JS produces (I verified this by printing it with document.write(...) instead of letting the script submit the form as usual), and if I copy and paste the result into cURL, it works too. For example:

curl --http1.0 'http://www.betvictor.com/sports/en/football' \
 -H 'Connection: keep-alive' \
 -H 'Accept-Encoding: gzip,deflate' \
 -H 'Accept-Language: en' \
 -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' \
 -H 'Referer: http://www.betvictor.com/sports/en/football' \
 -H 'User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/25.0' \
 --data 'TS644333_id=3&TS644333_75=286800b2a80cd3334cd2895e42e67031%3Ayxxy%3A3N6QfX3q%3A1685704694&TS644333_md=1&TS644333_rf=0&TS644333_ct=0&TS644333_pd=0' \
 --compressed

For TS644333_75, I simply pasted the value my Python code computed by simulating the JS.
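The --data string in the cURL command is just the form fields URL-encoded in order. As a minimal sketch, assuming the field names and the TS644333_75 value shown above (the variable names `fields` and `body` are illustrative), Python's standard library rebuilds the same body:

```python
from urllib.parse import urlencode

# Form fields in the same order as the cURL --data string.
# The TS644333_75 value is the one computed by the Python port of the JS.
fields = [
    ("TS644333_id", "3"),
    ("TS644333_75", "286800b2a80cd3334cd2895e42e67031:yxxy:3N6QfX3q:1685704694"),
    ("TS644333_md", "1"),
    ("TS644333_rf", "0"),
    ("TS644333_ct", "0"),
    ("TS644333_pd", "0"),
]

# urlencode percent-encodes each ':' as '%3A', matching what was passed to cURL.
body = urlencode(fields)
print(body)
```

Using a list of pairs rather than a dict keeps the field order identical to the cURL request, which makes byte-for-byte comparison of captures easier.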

I also inspected this POST in Wireshark.

However, when I try the same thing in the scrapy shell:

1) scrapy shell "http://www.betvictor.com/sports/en/football"

Then, in the shell:

2) from scrapy.http import FormRequest

   # Same fields as the cURL --data string. Passing formdata makes this a
   # POST, and FormRequest URL-encodes the values, so ':' is left raw here.
   req = FormRequest(
       url='http://www.betvictor.com/sports/en/football',
       formdata={
           'TS644333_id': '3',
           'TS644333_75': '286800b2a80cd3334cd2895e42e67031:yxxy:3N6QfX3q:1685704694',
           'TS644333_md': '1',
           'TS644333_rf': '0',
           'TS644333_ct': '0',
           'TS644333_pd': '0',
       },
       headers={
           'Referer': 'http://www.betvictor.com/sports/en/football',
           'Connection': 'keep-alive',
       },
   )

3) fetch(req)

But the response I get back is still the JavaScript page, not the content.

In Wireshark, the request from scrapy (the POST) looks identical to the one from cURL.
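Beyond eyeballing the Wireshark capture, two form bodies can be compared field by field after decoding. A small sketch with a hypothetical helper `same_form_fields`, assuming both bodies are application/x-www-form-urlencoded strings copied from the captures (the sample bodies below are shortened for illustration):

```python
from urllib.parse import parse_qsl


def same_form_fields(body_a: str, body_b: str) -> bool:
    """Return True if two urlencoded bodies carry the same fields in the same order."""
    # parse_qsl decodes '%3A' back to ':' and keep_blank_values keeps empty fields.
    return parse_qsl(body_a, keep_blank_values=True) == parse_qsl(
        body_b, keep_blank_values=True
    )


curl_body = "TS644333_id=3&TS644333_75=286800b2a80cd3334cd2895e42e67031%3Ayxxy%3A3N6QfX3q%3A1685704694"
raw_body = "TS644333_id=3&TS644333_75=286800b2a80cd3334cd2895e42e67031:yxxy:3N6QfX3q:1685704694"
print(same_form_fields(curl_body, raw_body))  # True
```

If the decoded fields match, the difference between the two requests most likely lies elsewhere, for example in the URL or the headers.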

What am I missing?

":" , , POST, , wirehark, .

The problem seems to be the URL: the form has to be POSTed to the URL with a trailing slash. In scrapy, use this URL:

http://www.betvictor.com/sports/en/football/
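Since the fix comes down to the trailing slash, here is a small sketch of a hypothetical helper that normalizes the POST URL before building the request. It only touches the path and leaves any query string alone:

```python
from urllib.parse import urlsplit, urlunsplit


def with_trailing_slash(url: str) -> str:
    """Return url with '/' appended to its path if the path does not already end in one."""
    parts = urlsplit(url)
    path = parts.path if parts.path.endswith("/") else parts.path + "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(with_trailing_slash("http://www.betvictor.com/sports/en/football"))
# http://www.betvictor.com/sports/en/football/
```

Alternatively, in the scrapy shell, response.url from step 1 holds the URL of the response actually received, which may differ from the one typed if the site redirected; whether it carries the slash depends on the site.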
In Python, the POST data is URL-encoded (i.e. ':' --> '%3A'); Wireshark shows the encoded values.

In Scrapy:

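 # value: the TS644333_75 string computed by the Python port of the JS
 # (example value taken from the cURL command above)
 value = '286800b2a80cd3334cd2895e42e67031:yxxy:3N6QfX3q:1685704694'
 # (name, value) pairs in a fixed order, instead of a dict
 ot = (('TS644333_id', '3'),
       ('TS644333_75', value),
       ('TS644333_md', '1'),
       ('TS644333_rf', '0'),
       ('TS644333_ct', '0'),
       ('TS644333_pd', '0'))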

ot is passed to formdata= in place of the dict, and the request also gets the header {'Content-Type': 'application/x-www-form-urlencoded'}.
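To see the pieces of the fix together outside Scrapy, here is a sketch using only urllib.request that builds (but does not send) an equivalent POST: ordered field pairs, the trailing-slash URL, and the explicit Content-Type header. The Referer value is copied from the cURL command; whether the site needs it is an assumption. Scrapy's FormRequest sets this Content-Type itself for POST form data, so setting it explicitly there is harmless.

```python
from urllib.parse import urlencode
from urllib.request import Request

value = "286800b2a80cd3334cd2895e42e67031:yxxy:3N6QfX3q:1685704694"

# A tuple of pairs (not a dict) fixes the field order of the body.
ot = (
    ("TS644333_id", "3"),
    ("TS644333_75", value),
    ("TS644333_md", "1"),
    ("TS644333_rf", "0"),
    ("TS644333_ct", "0"),
    ("TS644333_pd", "0"),
)

req = Request(
    "http://www.betvictor.com/sports/en/football/",  # note the trailing slash
    data=urlencode(ot).encode("ascii"),
    headers={
        "Content-Type": "application/x-www-form-urlencoded",
        "Referer": "http://www.betvictor.com/sports/en/football",
    },
    method="POST",
)

# Nothing is sent here; urllib.request.urlopen(req) would perform the request.
print(req.get_method(), req.full_url)
print(req.data.decode("ascii"))
```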

anana, that was it: the URL needs a "/" at the end. That is the URL the original GET ends up on, and the one the JS POSTs to. It works!

Source: https://habr.com/ru/post/1532654/

