Remove extra whitespace from extracted PDF text

I have extracted text from a PDF file and some text has extra spaces between words.

Your water and n wastewater sttmmnt

I wrote a function to remove extra spaces from the text above.

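    function removeExtraWhitespace($val) {
        $nval = "";
        for ($i = 0; $i < strlen($val); $i++) {
            if ($val[$i] != " ") {
                // Non-space characters are always kept.
                $nval .= $val[$i];
            } else if ((isset($val[$i-2]) && $val[$i-2] != " ") || (isset($val[$i+2]) && $val[$i+2] != " ")) {
                // A space is kept only if the character two positions
                // before or after it is not a space.
                $nval .= $val[$i];
            }
        }
        return $nval;
    }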

It will display:

Your expression on water and wastewater

I know that this function will not work in every case. It fails if the text contains a valid single-letter word such as "a", or if only part of a word has extra spaces.

I need to remove spaces from string

When I pass the text above to my function, it outputs:

Ineed to remove spaces from a string
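
To make the two failure cases concrete, here is a small script that reproduces them. The input strings are hypothetical, made up for illustration, since the exact spacing of the extracted PDF text is not shown here. They assume that the letters of a damaged word are separated by one space each.

```php
<?php
// Same logic as the function in the question, unchanged.
function removeExtraWhitespace($val) {
    $nval = "";
    for ($i = 0; $i < strlen($val); $i++) {
        if ($val[$i] != " ") {
            $nval .= $val[$i];
        } else if ((isset($val[$i-2]) && $val[$i-2] != " ") || (isset($val[$i+2]) && $val[$i+2] != " ")) {
            $nval .= $val[$i];
        }
    }
    return $nval;
}

// Hypothetical input 1: the valid single-letter word "a" is followed by a
// letter-spaced word. The space after "a" looks like any other extra space,
// so it is dropped and the two words are merged.
var_dump(removeExtraWhitespace("from a s t r i n g")); // string(12) "from astring"

// Hypothetical input 2: only part of a word is letter-spaced. The space
// after "wa" has a non-space character two positions before it, so it is kept.
var_dump(removeExtraWhitespace("wa s t e water")); // string(12) "wa ste water"
```

Both results follow from the same rule: the function only looks two characters to the left and right of each space, so it cannot tell a real word boundary from an extraction artifact when the two look identical locally.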

Is there a way to make a function that will work on all possible text?


Source: https://habr.com/ru/post/1272943/

