Selenium page source differs from what right-click "View Page Source" shows in the browser

I have a problem parsing a web page because I get a different page source when I do this:

import codecs
from pyvirtualdisplay import Display
from selenium import webdriver

display = Display(visible=False, size=(800, 600), backend='xvfb')
display.start()
driver = webdriver.Firefox()
url = "http://www.aaa.com"
driver.get(url)
with codecs.open('page.html', 'w', 'utf-8') as f:
    f.write(driver.page_source)
driver.quit()
display.stop()

When I open the saved file to look at the text, it differs from what I see with right-click "View Page Source" in the browser.

For example, some hrefs become lowercase, and a tag that appears in the browser's page source as:

<table class="list" border="0" id="list_id">

turned into

<table border="0" id="list_id" class="list">

I am sure this is the same URL I am requesting...

1 answer

There are two main problems with getting the source of a web page the way you do.

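  • The first problem is that right-click "View Page Source" shows you the HTML exactly as the server sent it, whereas driver.page_source is the browser's current DOM serialized back into HTML. The browser parses the original HTML into a DOM, and serializing that DOM does not reproduce the original text character for character, even when the DOM itself is the same. For example, these two tags: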

    <table class="list" border="0" id="list_id">

    and

    <table border="0" id="list_id" class="list">
    

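    are equivalent, because the order of attributes is not significant in HTML. (The order of elements is significant, though: <a><b> is not the same as <b><a>.) Case is not significant either: tag and attribute names are case-insensitive, so <TABLE> and <table> are the same element, and the serializer is free to write them in lowercase. This applies to HTML (XHTML is case-sensitive).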

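    In addition, Selenium drives a real Firefox, so any JavaScript on the page runs and can modify the DOM before you read it. What you get back can therefore differ from the original source.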

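  • The second problem is that parts of the page may be loaded with Ajax. By default, driver.get() waits for the page itself to finish loading, but not for Ajax requests that the page's scripts make afterwards. driver.page_source returns the DOM as it is at the moment you call it, so if Firefox is still waiting for Ajax responses at that point, the content they add will be missing, and a later call to driver.page_source, made after the Ajax requests complete, can return something different.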
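To check whether two snippets differ only in ways HTML ignores (attribute order, case of tag and attribute names), you can compare them after parsing rather than as raw text. Below is a minimal sketch using only Python's standard html.parser; StartTagCollector and normalized_start_tags are hypothetical helper names, and the sketch deliberately compares only start tags and their attributes, ignoring text, comments and end tags.

```python
from html.parser import HTMLParser


class StartTagCollector(HTMLParser):
    """Hypothetical helper: records each start tag as (name, sorted attributes)."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # html.parser already lowercases tag and attribute names (not values).
        # Sorting the (name, value) pairs removes attribute-order differences;
        # the key treats valueless attributes (value None) as "".
        ordered = sorted(attrs, key=lambda a: (a[0], a[1] or ""))
        self.tags.append((tag, tuple(ordered)))


def normalized_start_tags(html):
    """Return the start tags of `html` in document order, normalized."""
    parser = StartTagCollector()
    parser.feed(html)
    parser.close()
    return parser.tags


received = '<TABLE class="list" border="0" id="list_id"></TABLE>'
serialized = '<table border="0" id="list_id" class="list"></table>'

# Same tags once case and attribute order are normalized.
print(normalized_start_tags(received) == normalized_start_tags(serialized))
# Element order is still compared, so nesting differences are detected.
print(normalized_start_tags("<a><b></b></a>") == normalized_start_tags("<b><a></a></b>"))
```

Comparing parsed structures instead of strings avoids false alarms from serialization details. It will not hide real differences caused by JavaScript or Ajax changing the DOM, which is the point of the second problem.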
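One rough way to avoid reading the page while Ajax content is still arriving is to poll the source until it stops changing. The sketch below assumes you pass it a zero-argument callable (for example lambda: driver.page_source); wait_for_stable_source and its parameters are hypothetical, and the "stable" test is only a heuristic: a page that keeps updating (clocks, tickers) will never settle and will simply hit the timeout.

```python
import time


def wait_for_stable_source(get_source, timeout=10.0, interval=0.5, stable_checks=2):
    """Hypothetical helper: poll get_source() until it returns the same value
    `stable_checks` times in a row, or until `timeout` seconds have passed.
    Returns the last value seen either way."""
    deadline = time.monotonic() + timeout
    last = get_source()
    unchanged = 0
    while unchanged < stable_checks and time.monotonic() < deadline:
        time.sleep(interval)
        current = get_source()
        if current == last:
            unchanged += 1  # another identical read in a row
        else:
            last = current  # source changed: start counting again
            unchanged = 0
    return last
```

With a driver you would call html = wait_for_stable_source(lambda: driver.page_source) before writing the file. When you know which element the Ajax call adds, Selenium's own WebDriverWait with an expected condition such as expected_conditions.presence_of_element_located((By.ID, "list_id")) is usually the more reliable choice, since it waits for exactly that element instead of guessing from changes in the source.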


Source: https://habr.com/ru/post/1532201/

