How to run a request from a static website?

Problem

I need to look up information about companies using the following link.

I need to search by entity name with the search type set to "start with". I would also like to select "All items" in the "Display number of items to view" field. For example, if I type "google" in the "Enter a name" text box, the script should return a list of companies whose names start with "google" (although this is only the starting point of what I want to do).

Question: How should I use Python for this? I found the following thread: Using Python to request a webpage to start a search

I tried the example in the first answer, the code below:

    from bs4 import BeautifulSoup as BS
    import requests

    protein = 'Q9D880'
    text = requests.get('http://www.uniprot.org/uniprot/' + protein).text
    soup = BS(text)
    MGI = soup.find(name='a', onclick="UniProt.analytics('DR-lines', 'click', 'DR-MGI');").text
    MGI = MGI[4:]
    print protein + ' - ' + MGI

This code works because the UniProt website includes analytics hooks that accept these parameters. However, the website I am using does not have this.

I also tried to do the same as the first answer in this thread: how to send a request to an .aspx page in python

However, the sample code provided in the first answer does not work on my machine (Ubuntu 12.04 with Python 2.7). I also do not understand what values should go there, since I am dealing with a different aspx site.

How can I use Python to start a search with certain criteria (I am not sure of the correct web terminology; maybe "submit a form"?)?

I come from a C++ background and have no experience with web development. I am also still learning Python. Any help is appreciated.

First EDIT:
With a lot of help from @Kabie, I compiled the following code (trying to understand how it works):

    import requests
    from lxml import etree

    URL = 'http://corp.sec.state.ma.us/CorpWeb/CorpSearch/CorpSearch.aspx'

    # With get_fields(), we fetch all <input>s from the form.
    def get_fields():
        res = requests.get(URL)
        if res.ok:
            page = etree.HTML(res.text)
            fields = page.xpath('//form[@id="Form1"]//input')
            return { e.attrib['name']: e.attrib.get('value', '') for e in fields }

    # Hard-code some selects from the form
    def query(data):
        formdata = get_fields()
        formdata.update({
            'ctl00$MainContent$ddRecordsPerPage':'25',
        }) # Hardcode some <select> value
        formdata.update(data)
        res = requests.post(URL, formdata)
        if res.ok:
            page = etree.HTML(res.text)
            return page.xpath('//table[@id="MainContent_SearchControl_grdSearchResultsEntity"]//tr')

    def search_by_entity_name(entity_name, entity_search_type='B'):
        return query({
            'ctl00$MainContent$CorpSearch':'rdoByEntityName',
            'ctl00$MainContent$txtEntityName': entity_name,
            'ctl00$MainContent$ddBeginsWithEntityName': entity_search_type,
        })

    result = search_by_entity_name('google')

The above code is placed in a script called query.py. I got the following error:

    Traceback (most recent call last):
      File "query.py", line 39, in <module>
        result = search_by_entity_name('google')
      File "query.py", line 36, in search_by_entity_name
        'ctl00$MainContent$ddBeginsWithEntityName': entity_search_type,
      File "query.py", line 21, in query
        formdata.update({
    AttributeError: 'NoneType' object has no attribute 'update'

It seems to me that the search was unsuccessful? Why?

1 answer

You can inspect the page to see which fields need to be posted. There is a good tutorial for Chrome DevTools. Other tools, such as Firebug in Firefox or Dragonfly in Opera, also do the job, though I recommend DevTools.

After sending the request, you can see the form data that was submitted in the Network panel. In this case:

    __EVENTTARGET:
    __EVENTARGUMENT:
    __LASTFOCUS:
    __VIEWSTATE:5UILUho/L3O0HOt9WrIfldHD4Ym6KBWkQYI1GgarbgHeAdzM9zyNbcH0PdP6xtKurlJKneju0/aAJxqKYjiIzo/7h7UhLrfsGul1Wq4T0+BroiT+Y4QVML66jsyaUNaM6KNOAK2CSzaphvSojEe1BV9JVGPYWIhvx0ddgfi7FXKIwdh682cgo4GHmilS7TWcbKxMoQvm9FgKY0NFp7HsggGvG/acqfGUJuw0KaYeWZy0pWKEy+Dntb4Y0TGwLqoJxFNQyOqvKVxnV1MJ0OZ4Nuxo5JHmkeknh4dpjJEwui01zK1WDuBHHsyOmE98t2YMQXXTcE7pnbbZaer2LSFNzCtrjzBmZT8xzCkKHYXI31BxPBEhALcSrbJ/QXeqA7Xrqn9UyCuTcN0Czy0ZRPd2wabNR3DgE+cCYF4KMGUjMUIP+No2nqCvsIAKmg8w6Il8OAEGJMAKA01MTMONKK4BH/OAzLMgH75AdGat2pvp1zHVG6wyA4SqumIH//TqJWFh5+MwNyZxN2zZQ5dBfs3b0hVhq0cL3tvumTfb4lr/xpL3rOvaRiatU+sQqgLUn0/RzeKNefjS3pCwUo8CTbTKaSW1IpWPgP/qmCsuIovXz82EkczLiwhEZsBp3SVdQMqtAVcYJzrcHs0x4jcTAWYZUejvtMXxolAnGLdl/0NJeMgz4WB9tTMeETMJAjKHp2YNhHtFS9/C1o+Hxyex32QxIRKHSBlJ37aisZLxYmxs69squmUlcsHheyI5YMfm0SnS0FwES5JqWGm2f5Bh+1G9fFWmGf2QeA6cX/hdiRTZ7VnuFGrdrJVdbteWwaYQuPdekms2YVapwuoNzkS/A+un14rix4bBULMdzij25BkXpDhm3atovNHzETdvz5FsXjKnPlno0gH7la/tkM8iOdQwqbeh7sG+/wKPqPmUk0Cl0kCHNvMCZhrcgQgpIOOgvI2Fp+PoB7mPdb80T2sTJLlV7Oe2ZqMWsYxphsHMXVlXXeju3kWfpY+Ed/D8VGWniE/eoBhhqyOC2+gaWA2tcOyiDPDCoovazwKGWz5B+FN1OTep5VgoHDqoAm2wk1C3o0zJ9a9IuYoATWI1yd2ffQvx6uvZQXcMvTIbhbVJL+ki4yNRLfVjVnPrpUMjafsnjIw2KLYnR0rio8DWIJhpSm13iDj/KSfAjfk4TMSA6HjhhEBXIDN/ShQAHyrKeFVsXhtH5TXSecY6dxU+Xwk7iNn2dhTILa6S/Gmm06bB4nx5Zw8XhYIEI/eucPOAN3HagCp7KaSdzZvrnjbshmP8hJPhnFhlXdJ+OSYDWuThFUypthTxb5NXH3yQk1+50SN872TtQsKwzhJvSIJExMbpucnVmd+V2c680TD4gIcqWVHLIP3+arrePtg0YQiVTa1TNzNXemDyZzTUBecPynkRnIs0dFLSrz8c6HbIGCrLleWyoB7xicUg39pW7KTsIqWh7P0yOiHgGeHqrN95cRAYcQTOhA==
    __SCROLLPOSITIONX:0
    __SCROLLPOSITIONY:106
    __VIEWSTATEENCRYPTED:
    __EVENTVALIDATION:g2V3UVCVCwSFKN2X8P+O2SsBNGyKX00cyeXvPVmP5dZSjIwZephKx8278dZoeJsa1CkMIloC0D51U0i4Ai0xD6TrYCpKluZSRSphPZQtAq17ivJrqP1QDoxPfOhFvrMiMQZZKOea7Gi/pLDHx42wy20UdyzLHJOAmV02MZ2fzami616O0NpOY8GQz1S5IhEKizo+NZPb87FgC5XSZdXCiqqoChoflvt1nfhtXFGmbOQgIP8ud9lQ94w3w2qwKJ3bqN5nRXVf5S53G7Lt+Du78nefwJfKK92BSgtJSCMJ/m39ykr7EuMDjauo2KHIp2N5IVzGPdSsiOZH86EBzmYbEw==
    ctl00$MainContent$hdnApplyMasterPageWitoutSidebar:0
    ctl00$MainContent$hdn1:0
    ctl00$MainContent$CorpSearch:rdoByEntityName
    ctl00$MainContent$txtEntityName:GO
    ctl00$MainContent$ddBeginsWithEntityName:M
    ctl00$MainContent$ddBeginsWithIndividual:B
    ctl00$MainContent$txtFirstName:
    ctl00$MainContent$txtMiddleName:
    ctl00$MainContent$txtLastName:
    ctl00$MainContent$txtIdentificationNumber:
    ctl00$MainContent$txtFilingNumber:
    ctl00$MainContent$ddRecordsPerPage:25
    ctl00$MainContent$btnSearch:Search Corporations
    ctl00$MainContent$hdnW:1920
    ctl00$MainContent$hdnH:1053
    ctl00$MainContent$SearchControl$hdnRecordsPerPage:
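As a rough illustration of what the browser puts on the wire, the name/value pairs listed above travel in the POST body as URL-encoded text. The sketch below uses only the standard library's urllib.parse on three of the fields shown (Python 3 import paths); requests does comparable encoding when it is given a dict as form data.

```python
from urllib.parse import urlencode, parse_qs

# Three of the fields from the captured form data above.
form = {
    'ctl00$MainContent$CorpSearch': 'rdoByEntityName',
    'ctl00$MainContent$txtEntityName': 'GO',
    'ctl00$MainContent$btnSearch': 'Search Corporations',
}

# Encode the way an application/x-www-form-urlencoded body is written:
# '$' becomes %24 and the space in the button label becomes '+'.
body = urlencode(form)
print(body)

# Decoding the body gives the original names back (values come as lists).
decoded = parse_qs(body)
print(decoded['ctl00$MainContent$txtEntityName'])
```

This is why the field names can be used literally, with the `$` characters, as dictionary keys in Python: the escaping happens at encoding time.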

What I searched for was Begin with 'GO'. This site is built with WebForms, which is why there are those long __VIEWSTATE and __EVENTVALIDATION fields. We need to send them as well.

Now we are ready to make the request. First we need to get an empty form. The following code is written for Python 3.3, but I think it should still work on 2.x.
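To make the role of these hidden fields a little more concrete, here is a minimal sketch that pulls hidden inputs out of a page using only the standard library's html.parser (Python 3). The SAMPLE markup, its short placeholder values, and the names HiddenInputCollector and hidden_fields are assumptions made up for illustration; the real page carries much longer values, like the ones shown above.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collects name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'input' and attrs.get('type') == 'hidden' and 'name' in attrs:
            # A missing value attribute is treated as an empty string.
            self.fields[attrs['name']] = attrs.get('value') or ''


# Hypothetical, heavily shortened stand-in for the real WebForms page.
SAMPLE = '''
<form id="Form1">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
  <input type="text" name="ctl00$MainContent$txtEntityName" />
</form>
'''


def hidden_fields(html):
    collector = HiddenInputCollector()
    collector.feed(html)
    return collector.fields


state = hidden_fields(SAMPLE)
print(sorted(state))
```

The values have to come from a freshly fetched page each time rather than being copied from an old capture, since the server generates them per response.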

    import requests
    from lxml import etree

    URL = 'http://corp.sec.state.ma.us/CorpWeb/CorpSearch/CorpSearch.aspx'

    def get_fields():
        res = requests.get(URL)
        if res.ok:
            page = etree.HTML(res.text)
            fields = page.xpath('//form[@id="Form1"]//input')
            return { e.attrib['name']: e.attrib.get('value', '') for e in fields }

With get_fields() we fetch all the <input>s from the form. Note that there are also <select>s; I will just hard-code them.
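If hard-coding feels fragile, the available <select> values can be listed first, which is also one way to find out which value stands for a choice such as "All items". The following is a sketch only, using the standard library's html.parser (Python 3); the SAMPLE markup is hypothetical (the field names and the values '25', 'B' and 'M' appear in the captured form data, while the value '100' and the option labels are invented), and SelectCollector is an illustrative name.

```python
from html.parser import HTMLParser


class SelectCollector(HTMLParser):
    """Records, per <select>, its option values and the default selection."""

    def __init__(self):
        super().__init__()
        self.options = {}    # select name -> list of option values
        self.selected = {}   # select name -> value submitted by default
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'select':
            self._current = attrs.get('name')
            if self._current is not None:
                self.options[self._current] = []
        elif tag == 'option' and self._current is not None:
            # Simplification: an option without a value attribute is stored as ''.
            value = attrs.get('value') or ''
            self.options[self._current].append(value)
            # The first option is the default unless one is marked selected.
            if 'selected' in attrs or self._current not in self.selected:
                self.selected[self._current] = value

    def handle_endtag(self, tag):
        if tag == 'select':
            self._current = None


# Hypothetical markup; labels and the value "100" are invented for the example.
SAMPLE = '''
<select name="ctl00$MainContent$ddRecordsPerPage">
  <option value="25" selected="selected">label 25</option>
  <option value="100">label 100</option>
</select>
<select name="ctl00$MainContent$ddBeginsWithEntityName">
  <option value="B">label B</option>
  <option value="M">label M</option>
</select>
'''

collector = SelectCollector()
collector.feed(SAMPLE)
print(collector.options)
print(collector.selected)
```

The `selected` dictionary could be merged into the form data in place of the hard-coded entry, and `options` shows which other values the server is likely to accept.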

    def query(data):
        formdata = get_fields()
        formdata.update({
            'ctl00$MainContent$ddRecordsPerPage':'25',
        }) # Hardcode some <select> value
        formdata.update(data)
        res = requests.post(URL, formdata)
        if res.ok:
            page = etree.HTML(res.text)
            return page.xpath('//table[@id="MainContent_SearchControl_grdSearchResultsEntity"]//tr')

Now that we have a general query function, we can write wrappers for specific kinds of search.
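query() returns a list of <tr> elements, which still have to be turned into readable data. A minimal sketch of that step follows; rows_to_text is a hypothetical helper, and the SAMPLE table (column headers, company names, ID numbers) is invented. It is parsed here with xml.etree.ElementTree so the example runs without lxml, but the helper only relies on iter() and itertext(), which lxml elements also provide.

```python
import xml.etree.ElementTree as ET


def rows_to_text(rows):
    """Turn <tr> elements into lists of cell strings, skipping rows without <td>."""
    table = []
    for row in rows:
        cells = [''.join(cell.itertext()).strip() for cell in row.iter('td')]
        if cells:  # header rows contain only <th>, so they are dropped
            table.append(cells)
    return table


# Invented, well-formed sample standing in for the real results table.
SAMPLE = """
<table id="MainContent_SearchControl_grdSearchResultsEntity">
  <tr><th>Entity Name</th><th>ID Number</th></tr>
  <tr><td><a href="#">EXAMPLE ONE, INC.</a></td><td>000000001</td></tr>
  <tr><td>EXAMPLE TWO LLC</td><td>000000002</td></tr>
</table>
"""

rows = ET.fromstring(SAMPLE.strip()).findall('.//tr')
result = rows_to_text(rows)
print(result)
```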

    def search_by_entity_name(entity_name, entity_search_type='B'):
        return query({
            'ctl00$MainContent$CorpSearch':'rdoByEntityName',
            'ctl00$MainContent$txtEntityName': entity_name,
            'ctl00$MainContent$ddBeginsWithEntityName': entity_search_type,
        })

This particular site uses a radio button group to determine which fields to use, so 'ctl00$MainContent$CorpSearch':'rdoByEntityName' is needed here. You can write the others, such as search_by_individual_name, yourself.

Sometimes a website needs additional information to verify a request. In that case, you can add some custom headers, such as Origin, Referer and User-Agent, to mimic a browser.
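A sketch of what the field set for such a wrapper might look like, under stated assumptions: the text-box and drop-down names are taken from the captured form data, but the radio value 'rdoByIndividual' is a guess that should be checked against the page source, and individual_name_fields is a hypothetical helper name.

```python
def individual_name_fields(first_name, last_name, search_type='B'):
    """Build the form fields for a search by individual name."""
    return {
        # ASSUMPTION: the real radio value must be read from the page source.
        'ctl00$MainContent$CorpSearch': 'rdoByIndividual',
        # These three names appear in the captured form data.
        'ctl00$MainContent$txtFirstName': first_name,
        'ctl00$MainContent$txtLastName': last_name,
        'ctl00$MainContent$ddBeginsWithIndividual': search_type,
    }


fields = individual_name_fields('John', 'Smith')
for name in sorted(fields):
    print(name, '=', fields[name])
```

A search_by_individual_name wrapper would then simply pass this dictionary to query(), the same way search_by_entity_name does.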
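One possible way to assemble such headers is sketched below. browser_like_headers is a hypothetical helper; it derives Origin from the scheme and host of the URL, and the User-Agent default is just an example browser string. Whether a given site checks any of these headers is something to find out by trial.

```python
from urllib.parse import urlsplit

URL = 'http://corp.sec.state.ma.us/CorpWeb/CorpSearch/CorpSearch.aspx'

# Example browser identification string; any current browser's value would do.
DEFAULT_UA = ('Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/29.0.1547.76 Safari/537.36')


def browser_like_headers(url, user_agent=DEFAULT_UA):
    """Return headers a browser would typically send when posting a form at url."""
    parts = urlsplit(url)
    return {
        'User-Agent': user_agent,
        'Referer': url,  # the page the form was loaded from
        'Origin': '{}://{}'.format(parts.scheme, parts.netloc),
    }


headers = browser_like_headers(URL)
for name, value in headers.items():
    print(name + ': ' + value)
```

The resulting dictionary can be passed as the headers argument of requests.get or requests.post.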

And if a website uses JavaScript to build its forms, you need more than requests. PhantomJS is a good browser scripting tool. If you want to do this in Python, you can use PyQt with qtwebkit.

Update: The site seems to have started blocking our Python script since yesterday. Therefore, we have to pretend to be a browser. As mentioned above, we can add custom headers. First, add the User-Agent field to the headers and see what happens.

    res = requests.get(URL, headers={
        'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.76 Safari/537.36',
    })

And now... res.ok returns True!

Therefore, we just need to add this header to both the requests.get(URL) call in get_fields() and the requests.post(URL, formdata) call in query(). Just in case, add 'Referer':URL to the headers of the latter:

    res = requests.post(URL, formdata, headers={
        'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.76 Safari/537.36',
        'Referer':URL,
    })
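The AttributeError from the question is consistent with this: when res.ok is false, get_fields() falls off the end of its if block and implicitly returns None, and query() then calls .update() on that None. The sketch below reproduces the effect without network access and shows a stricter variant that reports the failure directly. FakeResponse, fields_or_none and fields_or_fail are hypothetical names, and the 403 status is an invented example, not the status the site actually returned.

```python
class FakeResponse:
    """Hypothetical stand-in exposing the two attributes the code looks at."""

    def __init__(self, ok, status_code):
        self.ok = ok
        self.status_code = status_code


def fields_or_none(res):
    # Same shape as get_fields(): no else branch, so a failed
    # response makes the function return None implicitly.
    if res.ok:
        return {'__VIEWSTATE': 'placeholder'}


def fields_or_fail(res):
    # Stricter variant: surface the HTTP failure where it happens.
    if not res.ok:
        raise RuntimeError('request failed with HTTP status %d' % res.status_code)
    return {'__VIEWSTATE': 'placeholder'}


blocked = FakeResponse(ok=False, status_code=403)  # invented status code

print(fields_or_none(blocked))

try:
    fields_or_fail(blocked)
except RuntimeError as exc:
    print(exc)
```

With a real requests response, calling res.raise_for_status() is a ready-made way to get the same early failure.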

Source: https://habr.com/ru/post/1502526/