How do I parse a website whose form fields don't show up in the page source?

I'm not sure how to describe the problem correctly, but here it is: I want to use mechanize to capture the login form and get its input names. However, when I inspect the page with mechanize, it does not show the form name or the input names. If I look at the site manually, I have to use the browser's Inspect Element to find the input names, and even then they are dynamic: every time I inspect the element, it shows a different name. Any ideas? By the way, the site I'm trying to parse is https://www.ursa.ucla.edu/logon/logon.asp, if anyone is interested.

Here is what I tried:

```python
import mechanize

br = mechanize.Browser(factory=mechanize.RobustFactory())
br.open("https://www.ursa.ucla.edu/logon/logon.asp")
br.select_form(nr=0)  # select the first form on the page
print(br.response().read())
```
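Two things are worth checking here. First, mechanize does not run JavaScript, so if the inputs are created by a script they will never appear in `br.response().read()`, whichever form is selected. Second, when the names change on every load, the fields can be located by their `type` attribute instead of their name. The following is a minimal sketch using only the standard library's `html.parser`; it assumes the real login form contains one text input and one password input, and the helper names (`FormInspector`, `login_field_names`) and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class FormInspector(HTMLParser):
    """Record every <form> and the <input> tags nested inside it."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one dict per form: {"attrs": {...}, "inputs": [...]}
        self._current = None  # the form being parsed; inputs outside forms are ignored

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"attrs": attrs, "inputs": []}
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None:
            self._current["inputs"].append(attrs)

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def login_field_names(html):
    """Return (username_field, password_field) names from the first form that
    has a password input, matching fields by type because names are dynamic."""
    parser = FormInspector()
    parser.feed(html)
    parser.close()
    for form in parser.forms:
        inputs = form["inputs"]
        # A missing or empty type attribute means a plain text input in HTML.
        kinds = [(i.get("type") or "text").lower() for i in inputs]
        if "password" not in kinds:
            continue
        password = inputs[kinds.index("password")].get("name")
        user = next((i.get("name") for i, k in zip(inputs, kinds)
                     if k in ("text", "email")), None)
        return user, password
    return None, None


# Hypothetical markup: the real page's field names differ on every load.
sample = """
<form action="logon.asp" method="post">
  <input type="hidden" name="token" value="abc">
  <input type="text" name="u_8f3a">
  <input type="password" name="p_19c2">
</form>
"""
print(login_field_names(sample))  # ('u_8f3a', 'p_19c2')
```

To use it with the code above, pass the decoded response, e.g. `br.response().read().decode("utf-8", "replace")` (the page encoding is an assumption; `read()` returns bytes under Python 3). Inside mechanize the same idea is available once a form is selected, via `br.form.find_control(type="password")`.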

Thanks in advance, Richard.

1 answer

The webpage you are trying to parse is not directly accessible. When you visit https://www.ursa.ucla.edu/logon/logon.asp, it does the following:

I'm not sure how Python handles redirect headers, so you may need to inspect the response you actually get. In the best case, it will be the final page with the hidden variables: you will need to parse them and send a POST request to the same URL to get the real login page. In the worst case, you will need to follow the headers manually, starting from the first page.
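The best case described above (parse the hidden variables, then POST them back to the same URL) can be sketched with the standard library alone. This assumes the intermediate page holds a single form whose hidden `<input>` elements carry the state; the helper names and the sample page are hypothetical, and the real site may need more than one such round trip. Note that mechanize follows HTTP redirects on its own by default, but it does not execute JavaScript, so a page that submits itself via script has to be resubmitted by hand.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin
from urllib.request import Request


class HiddenFieldParser(HTMLParser):
    """Collect the action and hidden inputs of the first <form> on a page."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.hidden = {}
        self._in_form = False
        self._seen_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._seen_form:
            self._seen_form = True
            self._in_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self._in_form:
            is_hidden = (attrs.get("type") or "").lower() == "hidden"
            if is_hidden and attrs.get("name"):
                self.hidden[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def build_hidden_post(page_url, html):
    """Build (but do not send) a POST that resubmits the first form's hidden fields.

    A missing or empty action resolves to page_url itself, i.e. the same URL.
    """
    parser = HiddenFieldParser()
    parser.feed(html)
    parser.close()
    target = urljoin(page_url, parser.action or "")
    body = urlencode(parser.hidden).encode("ascii")
    return Request(
        target,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Hypothetical intermediate page; the real field names and values are unknown.
page = """
<form method="post" action="">
  <input type="hidden" name="state" value="xyz">
  <input type="hidden" name="step" value="2">
</form>
"""
req = build_hidden_post("https://www.ursa.ucla.edu/logon/logon.asp", page)
print(req.get_method(), req.full_url)  # POST https://www.ursa.ucla.edu/logon/logon.asp
print(req.data)                        # b'state=xyz&step=2'
```

To actually send it while keeping any session cookie the site sets (an assumption about this site), one option is `urllib.request.build_opener(urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar())).open(req)`, reusing the same opener for every step. For the worst case, mechanize's `br.set_handle_redirect(False)` stops it from following redirects automatically, so each hop can be examined one at a time.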

Source: https://habr.com/ru/post/1392286/

