I recently had a similar problem and wrote a library to help solve it: pdfquery.
PDFQuery builds an element tree from the PDF (using pdfminer, with some extra sugar) and lets you fetch elements from a page with jQuery-style or XPath selectors, based mainly on the text content or location of the elements. So, to parse a table, you would first find where it is in the document by searching for its label:
    import pdfquery

    pdf = pdfquery.PDFQuery("your_file.pdf")  # placeholder path
    pdf.load()
    label = pdf.pq(':contains("Name of your table")')
    left_corner = float(label.attr('x0'))
    bottom_corner = float(label.attr('y0'))
Then you would keep searching for rows below the label until the search returns no results:
    page = label.closest('LTPage')
    while True:
        # PDF coordinates grow upward, so rows below the label have smaller y values.
        row = pdf.extract([
            ('column_1', ':in_bbox("%s,%s,%s,%s")' % (left_corner+10, bottom_corner-40, left_corner+50, bottom_corner-20)),
            ('column_2', ':in_bbox("%s,%s,%s,%s")' % (left_corner+50, bottom_corner-40, left_corner+80, bottom_corner-20)),
        ], page)
        # Stop at the first row where neither column matched anything.
        if not (row['column_1'] or row['column_2']):
            break
        print("Got row:", row)
        bottom_corner -= 20
This assumes that your rows are 20 points high, that the first row starts 20 points below the label, that the first column spans from 10 to 50 points from the left edge of the label, and that the second column spans from 50 to 80 points from the left edge of the label.
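To make those layout assumptions explicit, the offsets can be pulled into a small helper. This is only a sketch: `row_selectors`, `ROW_HEIGHT` and `COLUMNS` are names made up for illustration, not part of pdfquery, and the numbers are just the ones assumed above. It relies on PDF coordinates growing upward, so rows below the label have smaller y values.

```python
ROW_HEIGHT = 20  # assumed row height in points
# Assumed column extents: (name, left offset, right offset) from the label's left edge.
COLUMNS = [("column_1", 10, 50), ("column_2", 50, 80)]


def row_selectors(left_corner, bottom_corner, row_index):
    """Build (name, selector) pairs for the row_index-th row below the label (0-based)."""
    # The first row is assumed to start one row height below the label's bottom edge.
    top = bottom_corner - ROW_HEIGHT * (row_index + 1)
    bottom = top - ROW_HEIGHT
    return [
        (name, ':in_bbox("%s,%s,%s,%s")' % (left_corner + x0, bottom, left_corner + x1, top))
        for name, x0, x1 in COLUMNS
    ]
```

The returned list should be usable as the first argument to `pdf.extract(..., page)` in place of the hand-written tuples, with `row_index` incremented on each pass instead of adjusting `bottom_corner`.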
If you have blank rows, or rows with varying heights, this is going to be more annoying. You may also need to use the merge_tags=None parameter to select individual characters rather than words, if entries in the table are close enough together that the parser reads them as a single line. But hopefully this gets you closer ...
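For the blank-row case, one possible approach is to keep scanning past a few empty rows before giving up. The sketch below is an assumption on my part rather than anything pdfquery provides: `collect_rows`, `fetch_row`, `max_blank` and `max_rows` are hypothetical names, and `fetch_row(index)` stands in for whatever call returns the dict for a given row. It still assumes fixed-height rows, so it does not help with varying heights.

```python
def collect_rows(fetch_row, max_blank=2, max_rows=1000):
    """Collect rows, stopping once more than max_blank consecutive rows are blank."""
    rows, blank_run = [], 0
    for index in range(max_rows):  # hard cap so a bad layout cannot loop forever
        row = fetch_row(index)  # expected: dict of column name -> matched content
        if any(row.values()):
            rows.append(row)
            blank_run = 0
        else:
            blank_run += 1
            if blank_run > max_blank:
                break
    return rows
```

With `max_blank=0` this stops at the first blank row, like a plain loop that breaks as soon as no column matches.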