PHP Regular Expression to find a pattern, but only replace a single character

I convert PDF to text with xpdf's pdftotext, and it works fine except for one thing: it converts paragraph signs (&para;, i.e. ¶) to the digit 8. I need a way to match every occurrence with a pattern like

preg_match_all('/\b8\d{1,2}-/', $text, $matches);

and then replace only the “8” inside each match. I tried saving the matches in an array, but how do I put them back into the text where they belong?

Ideally, the paragraph sign would be converted correctly in the first place, but I tried several different encodings without success; I think some of the PDFs use embedded fonts.

Any ideas on how I can replace only the “8” in this pattern? I cannot simply replace every 8, because the page or chapter number in an article reference may be 8; but there is no danger that the paragraph number will be 80 (that's why I check for digits after the 8).

Thanks.

1 answer

Capture the rest of the pattern in a group and put it back in the replacement with a backreference:

 $str = preg_replace('/\b8(\d{1,2}-)/', 'replacement$1', $str); 
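A minimal, self-contained sketch of the approach above, assuming the lost character should be restored as ¶ and that a corrupted reference always looks like "8", one or two digits, then a hyphen (as described in the question). The function names are hypothetical. It also shows a lookahead variant, (?=...), which is a swapped-in alternative rather than part of the answer: the lookahead checks the digits and hyphen without consuming them, so only the "8" belongs to the match and no backreference is needed.

```php
<?php
// Hypothetical helpers: restore "¶" where the PDF-to-text conversion wrote "8".
// Assumption: a corrupted reference is "8" + 1-2 digits + "-", e.g. "812-" for "¶12-".

// Backreference approach: the digits and hyphen are captured in group 1
// and written back unchanged via $1; only the leading "8" is replaced.
function fixWithBackreference(string $text): string
{
    // preg_replace returns null on error (e.g. backtrack limit); keep the input then.
    return preg_replace('/\b8(\d{1,2}-)/', '¶$1', $text) ?? $text;
}

// Lookahead approach: (?=\d{1,2}-) asserts what follows without consuming it,
// so the match is just the "8" itself.
function fixWithLookahead(string $text): string
{
    return preg_replace('/\b8(?=\d{1,2}-)/', '¶', $text) ?? $text;
}

$sample = 'See 812- and 85-, page 8, chapter 8.';

// Both lines print: See ¶12- and ¶5-, page 8, chapter 8.
echo fixWithBackreference($sample), PHP_EOL;
echo fixWithLookahead($sample), PHP_EOL;
```

A bare "8" (page 8, chapter 8) is left alone because it is not followed by digits and a hyphen, and \b keeps an "8" inside a longer number such as "1812-" from matching.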

Source: https://habr.com/ru/post/1440878/

