Parse pcap files using dpkt (Python)

I am trying to parse a previously captured trace for HTTP headers using the dpkt module:

    import dpkt
    import sys

    f = file(sys.argv[1], "rb")
    pcap = dpkt.pcap.Reader(f)
    for ts, buf in pcap:
        eth = dpkt.ethernet.Ethernet(buf)
        ip = eth.data
        tcp = ip.data
        if tcp.dport == 80 and len(tcp.data) > 0:
            try:
                http = dpkt.http.Request(tcp.data)
                print http.uri
            except:
                print 'issue'
                continue
    f.close()

Although it parses most packets without trouble, I get a NeedData exception ("premature end of headers") on some of them. They look like valid packets in Wireshark, so I'm a little confused as to why the exceptions are thrown.

Some results:

    /ec/fd/ls/GlinkPing.aspx?IG=4a06eefebcc1495f8f4de7cb41f0ce5c&CID=2265e1228f3451ff8011dcbe5e0cdff7&ID=API.YAds%2C5037.1&1307036510547
    issue
    issue    #misses one packet here, two exceptions
    /?ld=4vyO5h1FkjCNjBpThUTGnzF50sB7QUGL0Ok8YefDTWNmO6RXghgDqHXtcp1OqeXATbCAHliIkglLj95-VEwG6ZJN3fblgd3Lh5NvTp4mZPcBGXUyKqXn9FViBAsmt1T96oumpCL5gm7gZ3qlZqSdLNUWjpML_9I8FvB2TLKPSYcJmb_VwwvJhiHpiUIvrjRdzqdVVnuQZVjQmZIIlfaMq0LOmgew_plopjt7hYvOSzBi3VJl4bqOBVk3zdhIvgZK0SfJp3kEWTXAr2_UU_q9KHBpSTnvuhY2W1xo3K2BOHKGk1VAlMiWtWC_nUaJdZmhzzWfb6yRAmY3M9YkUzFGs9z10-70OszkkNpVMSS3-p7xsNXQnC3Zpaxks

Any help is appreciated; a recommendation for an alternative library would also be welcome.

+6
3 answers

I ran into the same issue when dealing with HTTP requests and dpkt.

The problem is that dpkt's HTTP header parser uses flawed logic: it raises this exception whenever the HTTP data it is given does not end with \r\n\r\n. (And, as you say, there are plenty of valid packets that do not end with \r\n\r\n.)

Below is the bug report.
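
One likely reason a payload lacks the terminating \r\n\r\n is that the request headers were split across several TCP segments; that is an assumption about the capture, not something stated in the bug report. A minimal, stdlib-only sketch of buffering payloads per connection until the header terminator arrives is shown here. The class name HeaderBuffer and the flow key are hypothetical, and the sketch ignores retransmissions, out-of-order segments, and anything after the headers:

```python
from collections import defaultdict

HEADER_END = b"\r\n\r\n"


class HeaderBuffer:
    """Accumulate TCP payloads per flow until a full HTTP header block is seen.

    Hypothetical helper for illustration; it is not part of dpkt.
    """

    def __init__(self):
        # Maps a flow key to the bytes collected so far for that flow.
        self._pending = defaultdict(bytes)

    def feed(self, flow, payload):
        """Add one segment's payload.

        Returns the header block (terminator included) once it is complete,
        otherwise None. Bytes after the terminator are discarded.
        """
        data = self._pending[flow] + payload
        end = data.find(HEADER_END)
        if end == -1:
            # Terminator not seen yet: keep the bytes and wait for more.
            self._pending[flow] = data
            return None
        # Headers are complete: forget the flow state and return them.
        del self._pending[flow]
        return data[:end + len(HEADER_END)]
```

In the loop from the question, the flow key could be a tuple such as (ip.src, tcp.sport, ip.dst, tcp.dport), and a non-None result from feed() could then be passed to dpkt.http.Request instead of the raw tcp.data.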

+1

In your Python code, before assigning ip = eth.data, check whether the Ethernet frame actually carries an IP packet. If the EtherType is not IP, skip the packet. Then check whether the IP packet carries TCP.

  Checks to perform:
                1. Is it an IP packet?
                2. Is the protocol TCP?

Change this part of your code:

 
    ...
    eth = dpkt.ethernet.Ethernet(buf)
    ip = eth.data
    tcp = ip.data
    ...

to:

    ...
    eth = dpkt.ethernet.Ethernet(buf)
    if eth.type != 2048:  # For IPv4, dpkt.ethernet.Ethernet(buf).type == 2048 (0x0800)
        continue
    ip = eth.data
    if ip.p != 6:  # 6 is the IP protocol number for TCP
        continue
    tcp = ip.data
    ...

and see whether the error still occurs.
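
To make the two magic numbers concrete, the same two checks can be sketched on the raw frame bytes with only the standard library. This is a Python 3 illustration, not dpkt's implementation; the function name is_ipv4_tcp and the constant names are hypothetical, and it assumes untagged Ethernet II frames (no VLAN header):

```python
import struct

ETH_TYPE_IPV4 = 0x0800  # decimal 2048, the EtherType value tested above
IP_PROTO_TCP = 6        # IP protocol number for TCP

ETH_HEADER_LEN = 14     # dst MAC (6) + src MAC (6) + EtherType (2)
IPV4_MIN_HEADER_LEN = 20


def is_ipv4_tcp(frame):
    """Return True if a raw Ethernet frame carries IPv4 with TCP as its protocol."""
    if len(frame) < ETH_HEADER_LEN + IPV4_MIN_HEADER_LEN:
        return False
    # The EtherType is a big-endian 16-bit field at bytes 12..13.
    (eth_type,) = struct.unpack("!H", frame[12:14])
    if eth_type != ETH_TYPE_IPV4:
        return False
    # The protocol field is the byte at offset 9 of the IPv4 header.
    return frame[ETH_HEADER_LEN + 9] == IP_PROTO_TCP
```

With dpkt, the equivalent named constants are dpkt.ethernet.ETH_TYPE_IP and dpkt.ip.IP_PROTO_TCP, which read better than 2048 and 6.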

Regards,
Irengbach Tilokhan Singh

+1

I added a dpkt example that parses and displays HTTP headers. The documentation is here: http://dpkt.readthedocs.io/en/latest/print_http_requests.html, and the sample code is in dpkt/examples/print_http_requests.py

    # For each packet in the pcap process the contents
    for timestamp, buf in pcap:

        # Unpack the Ethernet frame (mac src/dst, ethertype)
        eth = dpkt.ethernet.Ethernet(buf)

        # Make sure the Ethernet data contains an IP packet
        if not isinstance(eth.data, dpkt.ip.IP):
            print 'Non IP Packet type not supported %s\n' % eth.data.__class__.__name__
            continue

        # Now grab the data within the Ethernet frame (the IP packet)
        ip = eth.data

        # Check for TCP in the transport layer
        if isinstance(ip.data, dpkt.tcp.TCP):

            # Set the TCP data
            tcp = ip.data

            # Now see if we can parse the contents as a HTTP request
            try:
                request = dpkt.http.Request(tcp.data)
            except (dpkt.dpkt.NeedData, dpkt.dpkt.UnpackError):
                continue

            # Pull out fragment information (flags and offset all packed into off field, so use bitmasks)
            do_not_fragment = bool(ip.off & dpkt.ip.IP_DF)
            more_fragments = bool(ip.off & dpkt.ip.IP_MF)
            fragment_offset = ip.off & dpkt.ip.IP_OFFMASK

            # Print out the info
            print 'Timestamp: ', str(datetime.datetime.utcfromtimestamp(timestamp))
            print 'Ethernet Frame: ', mac_addr(eth.src), mac_addr(eth.dst), eth.type
            print 'IP: %s -> %s (len=%d ttl=%d DF=%d MF=%d offset=%d)' % \
                  (inet_to_str(ip.src), inet_to_str(ip.dst), ip.len, ip.ttl, do_not_fragment, more_fragments, fragment_offset)
            print 'HTTP request: %s\n' % repr(request)

Result

    Timestamp: 2004-05-13 10:17:08.222534
    Ethernet Frame: 00:00:01:00:00:00 fe:ff:20:00:01:00 2048
    IP: 145.254.160.237 -> 65.208.228.223 (len=519 ttl=128 DF=1 MF=0 offset=0)
    HTTP request: Request(body='', uri='/download.html', headers={'accept-language': 'en-us,en;q=0.5', 'accept-encoding': 'gzip,deflate', 'connection': 'keep-alive', 'keep-alive': '300', 'accept': 'text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1', 'user-agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6) Gecko/20040113', 'accept-charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.7', 'host': 'www.ethereal.com', 'referer': 'http://www.ethereal.com/development.html'}, version='1.1', data='', method='GET')

    Timestamp: 2004-05-13 10:17:10.295515
    Ethernet Frame: 00:00:01:00:00:00 fe:ff:20:00:01:00 2048
    IP: 145.254.160.237 -> 216.239.59.99 (len=761 ttl=128 DF=1 MF=0 offset=0)
    HTTP request: Request(body='', uri='/pagead/ads?client=ca-pub-2309191948673629&random=1084443430285&lmt=1082467020&format=468x60_as&output=html&url=http%3A%2F%2Fwww.ethereal.com%2Fdownload.html&color_bg=FFFFFF&color_text=333333&color_link=000000&color_url=666633&color_border=666633', headers={'accept-language': 'en-us,en;q=0.5', 'accept-encoding': 'gzip,deflate', 'connection': 'keep-alive', 'keep-alive': '300', 'accept': 'text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1', 'user-agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6) Gecko/20040113', 'accept-charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.7', 'host': 'pagead2.googlesyndication.com', 'referer': 'http://www.ethereal.com/download.html'}, version='1.1', data='', method='GET')

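
The excerpt above calls mac_addr and inet_to_str without defining them; the full example file supplies them, along with the datetime and dpkt imports. For readers who only have the excerpt, here is a minimal sketch of what such helpers might look like. These are illustrative versions written with the standard library and are not necessarily identical to the ones shipped in the dpkt example:

```python
import socket


def mac_addr(address):
    """Format a 6-byte MAC address as colon-separated hex, e.g. 00:00:01:00:00:00."""
    # bytearray() yields integers on both Python 2 and Python 3.
    return ":".join("%02x" % b for b in bytearray(address))


def inet_to_str(inet):
    """Format a packed IPv4 or IPv6 address as a readable string."""
    try:
        # A 4-byte value is an IPv4 address.
        return socket.inet_ntop(socket.AF_INET, inet)
    except ValueError:
        # Wrong length for IPv4, so try it as a 16-byte IPv6 address.
        return socket.inet_ntop(socket.AF_INET6, inet)
```

For example, mac_addr(b'\x00\x00\x01\x00\x00\x00') gives '00:00:01:00:00:00' and inet_to_str(b'\x91\xfe\xa0\xed') gives '145.254.160.237', matching the first record in the output above.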
+1

Source: https://habr.com/ru/post/890452/

