I convert PDFs to text using xpdf's pdftotext, and it works fine except for one thing: it converts paragraph signs (&para;) to the number 8. I need a way to find every occurrence that matches the pattern
preg_match_all('/\b8\d{1,2}-/', 'text');
but replace only the "8" within each match. I tried saving the matches in an array, but how can I insert them back into the text at the positions where they belong?
Ideally the paragraph sign would be converted correctly in the first place, but I tried several different encodings without success; I think some of the PDFs have embedded fonts.
Any ideas on how I can replace only the "8" in this pattern? I can't simply replace every 8, because the page or chapter number in an article reference may be 8; but there is no danger that the paragraph number itself will be 80 (that's why I check for the digits after the 8).
Thanks.
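One possible approach, offered as a sketch rather than a definitive fix: instead of collecting matches with preg_match_all and re-inserting them, preg_replace with a lookahead can rewrite only the leading "8" and leave the digits and hyphen after it untouched. The function name restoreParagraphSigns is hypothetical, and the code assumes the extracted text is UTF-8 and that every "8" followed by one or two digits and a hyphen really was a paragraph sign (a genuine number such as "85-" would also be rewritten).

```php
<?php
// Hypothetical helper, not part of xpdf or PHP itself.
// Assumption: $text is the UTF-8 output of the PDF-to-text conversion.
function restoreParagraphSigns(string $text): string
{
    // \b8          an 8 at a word boundary, so the 8 inside 18 or 280 is skipped
    // (?=\d{1,2}-) lookahead: one or two digits and a hyphen must follow,
    //              but they are not consumed, so only the 8 is replaced
    return preg_replace('/\b8(?=\d{1,2}-)/', '¶', $text);
}

echo restoreParagraphSigns("See 812-3 on page 8, chapter 8."), "\n";
```

The same result can be had without a lookahead by capturing the tail and re-emitting it, e.g. pattern '/\b8(\d{1,2}-)/' with replacement '¶$1'; the lookahead version just avoids the capture group.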