You can extract data from a PDF using poppler. Depending on your exact requirements, this may be enough.
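require 'shellwords'

# Runs pdftotext on the given PDF; the path is escaped for the shell.
def extract_to_text(pdf_path)
  command = ['pdftotext', Shellwords.escape(pdf_path)].join(' ')
  `#{command}`
end

# Runs pdftohtml on the given PDF; the path is escaped for the shell.
def extract_to_html(pdf_path)
  command = ['pdftohtml', Shellwords.escape(pdf_path)].join(' ')
  `#{command}`
end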
These methods extract the PDF to a text file and an HTML file, respectively, stored in the same directory as the PDF.
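If you would rather get the text back as a Ruby string than as a file on disk, something like the following may work. It is only a sketch: the helper names `pdftotext_command` and `read_pdf_text` are made up for illustration, and it relies on pdftotext writing to standard output when the output file is given as `-`.

```ruby
require 'open3'

# Builds the argument list for pdftotext. The trailing '-' asks pdftotext
# to write the extracted text to standard output instead of a .txt file.
# (Hypothetical helper name, not part of poppler or Ruby.)
def pdftotext_command(pdf_path)
  ['pdftotext', pdf_path, '-']
end

# Returns the extracted text as a String (hypothetical helper name).
# The arguments are passed as a list, so no shell is involved and the
# path does not need Shellwords.escape.
def read_pdf_text(pdf_path)
  stdout, stderr, status = Open3.capture3(*pdftotext_command(pdf_path))
  raise "pdftotext failed: #{stderr.strip}" unless status.success?

  stdout
end
```

You would call it as `text = read_pdf_text('report.pdf')`. Checking the exit status means a missing or unreadable PDF raises an error rather than silently returning an empty string.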
You can install poppler on macOS with Homebrew:
brew install poppler