The problem is that PDF is a layout language, not a semantic language the way HTML is.
This means that when converting to HTML with any hope that the result will remain readable for the end user, you must force HTML to reproduce the layout by positioning individual words (and sometimes individual letters). The semantic structure is often distorted or lost along the way, hence the gibberish.
You can get a feel for the problem by opening the raw source of almost any PDF file that represents a text document and trying (by eye) to find the words or paragraphs in it.
Compare this with an HTML document, which is often readable directly from the source.
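
To make the point concrete, here is a minimal sketch in Python of what a converter has to do. It assumes the text has already been pulled out of the PDF as positioned fragments; the `fragments` data and the `rebuild_lines` helper are hypothetical and only illustrate the idea, they are not taken from any real converter. Lines and reading order have to be guessed from coordinates, because the file itself does not record them.

```python
# Hypothetical fragments as a PDF page might draw them: (x, y, text).
# In a content stream each one would look roughly like
#   BT /F1 12 Tf 72 700 Td (Hello) Tj ET
# and the drawing order says nothing about the reading order.
fragments = [
    (130.0, 700.0, "world"),
    (72.0, 686.0, "Second"),
    (72.0, 700.0, "Hello"),
    (118.0, 686.0, "line"),
]


def rebuild_lines(fragments, y_tolerance=2.0):
    """Guess text lines: group fragments by similar y, then sort each group by x."""
    lines = []  # each entry is (y of the line, [(x, text), ...])
    # PDF y coordinates grow upward, so descending y reads top to bottom.
    for x, y, text in sorted(fragments, key=lambda f: -f[1]):
        if lines and abs(lines[-1][0] - y) <= y_tolerance:
            lines[-1][1].append((x, text))
        else:
            lines.append((y, [(x, text)]))
    # Within a line, left-to-right order is recovered from x alone.
    return [
        " ".join(text for _, text in sorted(parts, key=lambda p: p[0]))
        for _, parts in lines
    ]


for line in rebuild_lines(fragments):
    print(line)
```

Even this toy version depends on a guessed tolerance, and it knows nothing about columns, paragraphs, or headings. In HTML that structure is written directly in the source.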