Support
- escaping quotes with \" and \'
- multi-line quotes
- unbalanced quotation marks, where an unclosed quotation is treated as ending at the end of the line
- additional optimizations for large files
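A quick way to check the escaped-quote and multi-line-quote items is to run the regex from this answer over a few sample inputs. The sketch below is a minimal Java harness; the class name `SupportDemo`, the method `collapse` and the sample strings are made up for illustration. Whitespace inside a quoted string survives, even when the quote contains an escaped quote or a line break, while whitespace runs outside quotes (tabs and newlines included) become a single space.

```java
import java.util.regex.Pattern;

public class SupportDemo {
    // The regex from this answer, written as a Java string literal.
    static final Pattern WS_OUTSIDE_QUOTES = Pattern.compile(
        "\\G((?:[^\\s\"']+| (?!\\s)|\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"|'[^'\\\\]*(?:\\\\.[^'\\\\]*)*')*+)(\\s+)");

    // Hypothetical helper: collapse each whitespace run outside quotes to one space.
    static String collapse(String s) {
        return WS_OUTSIDE_QUOTES.matcher(s).replaceAll("$1 ");
    }

    public static void main(String[] args) {
        // Escaped double quote inside a double-quoted string: the inner double space is kept.
        System.out.println(collapse("say  \"a \\\" b  c\"  end"));
        // Multi-line single-quoted string: the newline and spaces inside it are kept.
        System.out.println(collapse("'x\n  y'  z"));
        // Outside quotes, tabs and newlines are whitespace too and collapse to one space.
        System.out.println(collapse("a\t\tb\n\nc"));
    }
}
```

The first line prints `say "a \" b  c" end` and the last prints `a b c`.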
Optimization
Several optimizations reduce the number of steps the regex engine needs, measured on these inputs:
Example 1: the string Word1  Word2 (two spaces between the words)
Example 2: the string 'example'  another_word (two spaces between the words)
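One of these optimizations is the ` (?!\s)` alternative: a lone space that is not followed by more whitespace is absorbed into group 1, so it never triggers a replacement of " " with " ". Java does not report regex101's step count, so the sketch below uses the number of matches (that is, replacements) as a rough proxy. The class `LoneSpaceDemo`, the variant without ` (?!\s)` and the input `"a b c  d"` are hypothetical, for comparison only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LoneSpaceDemo {
    // Full regex, including the " (?!\s)" alternative that swallows a lone space.
    static final Pattern WITH_LONE_SPACE = Pattern.compile(
        "\\G((?:[^\\s\"']+| (?!\\s)|\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"|'[^'\\\\]*(?:\\\\.[^'\\\\]*)*')*+)(\\s+)");
    // Hypothetical variant without that alternative, for comparison only.
    static final Pattern WITHOUT_LONE_SPACE = Pattern.compile(
        "\\G((?:[^\\s\"']+|\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"|'[^'\\\\]*(?:\\\\.[^'\\\\]*)*')*+)(\\s+)");

    // Count matches, i.e. how many replacements "$1 " would perform.
    static int countMatches(Pattern p, String s) {
        Matcher m = p.matcher(s);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String s = "a b c  d";
        System.out.println(countMatches(WITH_LONE_SPACE, s));    // 1
        System.out.println(countMatches(WITHOUT_LONE_SPACE, s)); // 3
        // Both variants produce the same text; only the number of replacements differs.
        System.out.println(WITH_LONE_SPACE.matcher(s).replaceAll("$1 "));
        System.out.println(WITHOUT_LONE_SPACE.matcher(s).replaceAll("$1 "));
    }
}
```

For Example 1 (Word1  Word2) both variants match once, because there is no lone space to absorb; the difference only shows up when single spaces appear before a longer run.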
Regular expression
\G((?:[^\s"']+| (?!\s)|"[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*')*+)(\s+)
https://regex101.com/r/wT6tU2/4
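The `\G` at the start is what keeps quoted text intact: each match must begin exactly where the previous one ended, so the engine never restarts in the middle of a quoted string. The minimal Java sketch below compares the pattern with and without `\G`; the class name `AnchorDemo` and the input are illustrative, and the unanchored variant is hypothetical, shown only to expose the failure mode.

```java
import java.util.regex.Pattern;

public class AnchorDemo {
    // Body of the regex from this answer, without the leading \G.
    static final String BODY =
        "((?:[^\\s\"']+| (?!\\s)|\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"|'[^'\\\\]*(?:\\\\.[^'\\\\]*)*')*+)(\\s+)";
    static final Pattern ANCHORED = Pattern.compile("\\G" + BODY);
    // Hypothetical variant with \G removed, for comparison only.
    static final Pattern UNANCHORED = Pattern.compile(BODY);

    public static void main(String[] args) {
        String s = "x  \"a  b\"";
        // With \G the attempt at the opening quote fails and no later start is allowed,
        // so the quoted text is left alone.
        System.out.println(ANCHORED.matcher(s).replaceAll("$1 "));   // x "a  b"
        // Without \G the engine retries one character later, inside the quotes,
        // and collapses the spaces there.
        System.out.println(UNANCHORED.matcher(s).replaceAll("$1 ")); // x "a b"
    }
}
```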
Replacement
$1 (group 1 followed by a single space; yes, there is a space at the end)
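Because the replacement writes back group 1 and then exactly one space, a whitespace run at the very start or end of the input is shortened to one space rather than removed. A minimal Java sketch of that edge case (the class name `ReplacementDemo` and the input are illustrative):

```java
import java.util.regex.Pattern;

public class ReplacementDemo {
    static final Pattern WS_OUTSIDE_QUOTES = Pattern.compile(
        "\\G((?:[^\\s\"']+| (?!\\s)|\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"|'[^'\\\\]*(?:\\\\.[^'\\\\]*)*')*+)(\\s+)");

    public static void main(String[] args) {
        // "$1 " = the text before the whitespace run (group 1) plus one space,
        // so leading and trailing runs become a single space each.
        String out = WS_OUTSIDE_QUOTES.matcher("  a  b  ").replaceAll("$1 ");
        System.out.println("[" + out + "]"); // [ a b ]
    }
}
```

If leading and trailing spaces should disappear entirely, that needs a separate step; the regex alone does not trim.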
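Code (Java)
import java.util.regex.PatternSyntaxException;

try {
    // Collapse each whitespace run outside quotes into a single space
    String resultString = subjectString.replaceAll("\\G((?:[^\\s\"']+| (?!\\s)|\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"|'[^'\\\\]*(?:\\\\.[^'\\\\]*)*')*+)(\\s+)", "$1 ");
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}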
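If the replacement runs on many strings, it may be worth compiling the pattern once: the Javadoc describes `String.replaceAll(regex, replacement)` as equivalent to `Pattern.compile(regex).matcher(str).replaceAll(replacement)`, so every call recompiles the regex. A minimal sketch, assuming a hypothetical utility class `WhitespaceCollapser` with a `collapse` method; the inputs are the two examples from the Optimization section.

```java
import java.util.regex.Pattern;

public final class WhitespaceCollapser {
    // Compiled once and reused; String.replaceAll would recompile the regex on every call.
    private static final Pattern WS_OUTSIDE_QUOTES = Pattern.compile(
        "\\G((?:[^\\s\"']+| (?!\\s)|\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"|'[^'\\\\]*(?:\\\\.[^'\\\\]*)*')*+)(\\s+)");

    private WhitespaceCollapser() {}

    // Collapses every whitespace run outside '...' and "..." quotes to a single space.
    public static String collapse(String subject) {
        return WS_OUTSIDE_QUOTES.matcher(subject).replaceAll("$1 ");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Word1  Word2", "'example'  another_word"}) {
            System.out.println(collapse(s));
        }
    }
}
```

Since the pattern is a constant that is known to compile, a syntax error would surface once at class initialization instead of being caught on every call.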
Human-readable explanation
// \G((?:[^\s"']+| (?!\s)|"[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*')*+)(\s+)
//
// Options: Case sensitive; Exact spacing; Dot doesn't match line breaks; ^$ don't match at line breaks; Default line breaks; Regex syntax only
//
// Assert position at the end of the previous match (the start of the string for the first attempt) «\G»
// Match the regex below and capture its match into backreference number 1 «((?:[^\s"']+| (?!\s)|"[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*')*+)»
//    Match the regular expression below «(?:[^\s"']+| (?!\s)|"[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*')*+»
//       Between zero and unlimited times, as many times as possible, without giving back (possessive) «*+»
//       Match this alternative (attempting the next alternative only if this one fails) «[^\s"']+»
//          Match any single character NOT present in the list below «[^\s"']+»
//             Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
//             A "whitespace character" (ASCII space, tab, line feed, carriage return, vertical tab, form feed) «\s»
//             A single character from the list ""'" «"'»
//       Or match this alternative (attempting the next alternative only if this one fails) « (?!\s)»
//          Match the character " " literally « »
//          Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?!\s)»
//             Match a single character that is a "whitespace character" (ASCII space, tab, line feed, carriage return, vertical tab, form feed) «\s»
//       Or match this alternative (attempting the next alternative only if this one fails) «"[^"\\]*(?:\\.[^"\\]*)*"»
//          Match the character """ literally «"»
//          Match any single character NOT present in the list below «[^"\\]*»
//             Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
//             The literal character """ «"»
//             The backslash character «\\»
//          Match the regular expression below «(?:\\.[^"\\]*)*»
//             Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
//             Match the backslash character «\\»
//             Match any single character that is NOT a line break character (line feed, carriage return, next line, line separator, paragraph separator) «.»
//             Match any single character NOT present in the list below «[^"\\]*»
//                Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
//                The literal character """ «"»
//                The backslash character «\\»
//          Match the character """ literally «"»
//       Or match this alternative (the entire group fails if this one fails to match) «'[^'\\]*(?:\\.[^'\\]*)*'»
//          Match the character "'" literally «'»
//          Match any single character NOT present in the list below «[^'\\]*»
//             Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
//             The literal character "'" «'»
//             The backslash character «\\»
//          Match the regular expression below «(?:\\.[^'\\]*)*»
//             Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
//             Match the backslash character «\\»
//             Match any single character that is NOT a line break character (line feed, carriage return, next line, line separator, paragraph separator) «.»
//             Match any single character NOT present in the list below «[^'\\]*»
//                Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
//                The literal character "'" «'»
//                The backslash character «\\»
//          Match the character "'" literally «'»
// Match the regex below and capture its match into backreference number 2 «(\s+)»
//    Match a single character that is a "whitespace character" (ASCII space, tab, line feed, carriage return, vertical tab, form feed) «\s+»
//       Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
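The explanation above breaks the regex into four alternatives inside a possessive group, followed by the whitespace run. One way to keep that structure readable in Java is to build the pattern from named pieces. This is a sketch only; the class `RegexParts` and the constant names are hypothetical, and the concatenation is checked to reproduce the exact regex from this answer.

```java
import java.util.regex.Pattern;

public class RegexParts {
    // A run of characters that are neither whitespace nor quotes.
    static final String UNQUOTED = "[^\\s\"']+";
    // A single space not followed by more whitespace; it is kept as-is.
    static final String LONE_SPACE = " (?!\\s)";
    // A double-quoted string; the backslash-dot part lets an escaped character (such as \") appear inside.
    static final String DOUBLE_QUOTED = "\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"";
    // A single-quoted string, same shape with ' as the delimiter.
    static final String SINGLE_QUOTED = "'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'";

    // \G anchors each match to the end of the previous one; group 1 (the text to keep)
    // is matched possessively with *+; group 2 is the whitespace run to replace.
    static final String REGEX = "\\G((?:" + UNQUOTED + "|" + LONE_SPACE + "|" + DOUBLE_QUOTED
        + "|" + SINGLE_QUOTED + ")*+)(\\s+)";

    static final Pattern PATTERN = Pattern.compile(REGEX);

    public static void main(String[] args) {
        System.out.println(REGEX);
        System.out.println(PATTERN.matcher("a  'b  c'   d").replaceAll("$1 ")); // a 'b  c' d
    }
}
```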