Perhaps you want this:
    from urllib import urlopen
    import re

    pgno = 2
    url = "http://www.eximguru.com/traderesources/pincode.aspx?&GridInfo=Pincode0%s" % str(pgno)
    print url + '\n'
    sock = urlopen(url)
    htmlcode = sock.read()
    sock.close()

    x = re.search('%;"><a href="javascript:__doPostBack', htmlcode).start()

    pat = ('\t\t\t\t<td style="width:\d+%;">(\d+)</td>'
           '<td style="width:\d+%;">(.+?)</td>'
           '<td style="width:\d+%;">(.+?)</td>'
           '<td style="width:30%;">(.+?)</td>\r\n')
    regx = re.compile(pat)

    print '\n'.join(map(repr, regx.findall(htmlcode, x)))
Result:

    http://www.eximguru.com/traderesources/pincode.aspx?&GridInfo=Pincode02

    ('110001', 'New Delhi', 'Delhi', 'Baroda House')
    ('110001', 'New Delhi', 'Delhi', 'Bengali Market')
    ('110001', 'New Delhi', 'Delhi', 'Bhagat Singh Market')
    ('110001', 'New Delhi', 'Delhi', 'Connaught Place')
    ('110001', 'New Delhi', 'Delhi', 'Constitution House')
    ('110001', 'New Delhi', 'Delhi', 'Election Commission')
    ('110001', 'New Delhi', 'Delhi', 'Janpath')
    ('110001', 'New Delhi', 'Delhi', 'Krishi Bhawan')
    ('110001', 'New Delhi', 'Delhi', 'Lady Harding Medical College')
    ('110001', 'New Delhi', 'Delhi', 'New Delhi Gpo')
    ('110001', 'New Delhi', 'Delhi', 'New Delhi Ho')
    ('110001', 'New Delhi', 'Delhi', 'North Avenue')
    ('110001', 'New Delhi', 'Delhi', 'Parliament House')
    ('110001', 'New Delhi', 'Delhi', 'Patiala House')
    ('110001', 'New Delhi', 'Delhi', 'Pragati Maidan')
    ('110001', 'New Delhi', 'Delhi', 'Rail Bhawan')
    ('110001', 'New Delhi', 'Delhi', 'Sansad Marg Hpo')
    ('110001', 'New Delhi', 'Delhi', 'Sansadiya Soudh')
    ('110001', 'New Delhi', 'Delhi', 'Secretariat North')
    ('110001', 'New Delhi', 'Delhi', 'Shastri Bhawan')
    ('110001', 'New Delhi', 'Delhi', 'Supreme Court')
    ('110002', 'New Delhi', 'Delhi', 'Rajghat Power House')
    ('110002', 'New Delhi', 'Delhi', 'Minto Road')
    ('110002', 'New Delhi', 'Delhi', 'Indraprastha Hpo')
    ('110002', 'New Delhi', 'Delhi', 'Darya Ganj')
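To see what the row pattern captures without hitting the site, here is a minimal sketch that runs the same pattern against a small hand-written fragment. The fragment `SAMPLE_HTML` and the name `ROW_PATTERN` are made up for illustration; only the shape of each row (four tabs, four `<td>` cells, a trailing `\r\n`) is taken from the pattern itself, and the column widths other than 30% are invented.

```python
import re

# Same pattern as in the answer, written with raw strings so that every
# backslash sequence (\t, \d, \r, \n) is interpreted by the regex engine.
ROW_PATTERN = re.compile(
    r'\t\t\t\t<td style="width:\d+%;">(\d+)</td>'
    r'<td style="width:\d+%;">(.+?)</td>'
    r'<td style="width:\d+%;">(.+?)</td>'
    r'<td style="width:30%;">(.+?)</td>\r\n'
)

# Hypothetical fragment imitating two table rows of the page.
SAMPLE_HTML = (
    '\t\t\t<tr>\r\n'
    '\t\t\t\t<td style="width:10%;">110001</td>'
    '<td style="width:30%;">New Delhi</td>'
    '<td style="width:30%;">Delhi</td>'
    '<td style="width:30%;">Baroda House</td>\r\n'
    '\t\t\t</tr><tr>\r\n'
    '\t\t\t\t<td style="width:10%;">110002</td>'
    '<td style="width:30%;">New Delhi</td>'
    '<td style="width:30%;">Delhi</td>'
    '<td style="width:30%;">Minto Road</td>\r\n'
    '\t\t\t</tr>\r\n'
)

# findall returns one tuple per row: (pincode, city, state, post office).
rows = ROW_PATTERN.findall(SAMPLE_HTML)
for row in rows:
    print(row)
```

Each of the four groups becomes one element of the tuple, which is why the real output above is a list of 4-tuples.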
I wrote the code above after studying the structure of the HTML source with the following script, which I think you will understand without further explanation:
    from urllib2 import Request,urlopen
    from collections import defaultdict
    import re

    pgno = 2
    url = "http://www.eximguru.com/traderesources/pincode.aspx?&GridInfo=Pincode0%s" % str(pgno)
    print url + '\n'
    sock = urlopen(url)
    htmlcode = sock.read()
    sock.close()

    # Show source lines 276 to 299 together with their indices
    li = htmlcode.splitlines(True)
    print '\n'.join(str(i) + ' ' + repr(line)+'\n' for i,line in enumerate(li) if 275<i<300)

    # Count how often each character occurs in the first 291 lines
    ch = ''.join(li[0:291])
    didi = defaultdict(int)
    for c in ch:
        didi[c] += 1

    # Print line 289 and those of its characters that occur fewer than 35 times
    print '\n\n'+repr(li[289])
    print '\n'.join('%r -> %s' % (c,didi[c]) for c in li[289] if didi[c]<35)
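The character-counting part of that script can be tried on its own. The following is a rough sketch of the same idea using `collections.Counter`: it lists the characters of one line that are rare in the text up to that line, which helps in picking a distinctive string to anchor a search on. The helper name `rare_characters`, the sample lines, and the threshold are all invented for illustration, and unlike the script above it reports each character only once.

```python
from collections import Counter

def rare_characters(lines, target_index, threshold):
    """Return (char, count) pairs for the characters of lines[target_index]
    that occur fewer than `threshold` times in lines[0..target_index]."""
    counts = Counter(''.join(lines[:target_index + 1]))
    seen = []
    for ch in lines[target_index]:
        if counts[ch] < threshold and ch not in seen:
            seen.append(ch)
    return [(ch, counts[ch]) for ch in seen]

# Hypothetical stand-in for htmlcode.splitlines(True)
sample_lines = [
    '<td>aaa</td>\n',
    '<td>aab</td>\n',
    '<a href="x">%;</a>\n',
]

# Characters of the last line that appear only once so far
result = rare_characters(sample_lines, 2, 2)
for ch, n in result:
    print('%r -> %s' % (ch, n))
```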
The remaining problem is that the same HTML is returned for every value of pgno. The site may be detecting that a program, rather than a browser, is connecting to retrieve data. The tools in urllib2 should be able to handle this, but I am not experienced with them.
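One thing that could be tried with urllib2 is sending a browser-like User-Agent header through a `Request` object. The following is only a sketch: the names `build_request` and `fetch_page` and the User-Agent string are assumptions made for illustration, and I have not verified that this makes the site return a different page for each pgno. The import is written so that it also works on Python 3, where `Request` and `urlopen` live in `urllib.request`.

```python
try:
    from urllib2 import Request, urlopen          # Python 2
except ImportError:
    from urllib.request import Request, urlopen   # Python 3

BASE_URL = "http://www.eximguru.com/traderesources/pincode.aspx?&GridInfo=Pincode0%s"

# Assumed, illustrative browser identification string
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64)"

def build_request(pgno):
    """Build a Request for the given page number with a browser-like User-Agent."""
    return Request(BASE_URL % pgno, headers={"User-Agent": USER_AGENT})

def fetch_page(pgno):
    """Fetch the raw HTML of one page (performs a network request)."""
    sock = urlopen(build_request(pgno))
    try:
        return sock.read()
    finally:
        sock.close()
```

With this in place, `htmlcode = fetch_page(2)` would replace the `urlopen(url)` / `read()` / `close()` lines in the scripts above. If the pages are still identical, the header is probably not the cause.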