Match a line that may contain "." or ":" but does not end with a period

I am trying to write a regex whose match can contain a period or a colon, but cannot end with a period. So I want it to match a string like "Lorem./: Ipsom dolor sit", but not "Lorem ipsum dolor sit."

This is my current regular expression, but it doesn't work, as it still matches when the line ends with a period or a colon:

/(\n{2,})([ \wåäöÅÄÖ,()%+\-:.]{2,75}[^.:])(\n{1,})/

I am looking for headers in a huge, poorly formatted text file. The general pattern in this file is that a header is always preceded by two or more newlines and always followed by one or more newlines. Also, a header sometimes ends in :, but never in ., although it sometimes contains . or :. In addition, headers are always 2 to 75 characters long and never precede another header.

Any help would be greatly appreciated.

Edit: I realized that my explanation was pretty bad and partially wrong, so I updated this post.

+3
3 answers

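To make sure that a line does not end with a period, you can use (?<!\.)$ .

This is a negative lookbehind.

Applied to your regex, it looks like this: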

/\n{2,}([ \wåäöÅÄÖ,()%+\-:.]{2,75}(?<!\.))\n+/

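This matches

  • two or more newlines (\n{2,}),
  • 2 to 75 of the allowed characters ([ \wåäöÅÄÖ,()%+\-:.]),
  • not ending in a . ((?<!\.) is a negative lookbehind),
  • followed by one or more newlines (\n+).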
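As a rough sketch of how the pattern above behaves, the following PHP snippet runs it against a small made-up sample. The contents of $text are an assumption for illustration only, not data from the question, and the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical sample text; the real file is not shown in the question.
$text = "Intro paragraph.\n\n\nFirst header:\nBody text here.\n\n\nNot a header.\nMore body text.\n";

// Pattern copied from the answer above: two or more newlines, 2 to 75 allowed
// characters that do not end in a dot, then one or more newlines.
$pattern = '/\n{2,}([ \wåäöÅÄÖ,()%+\-:.]{2,75}(?<!\.))\n+/';

preg_match_all($pattern, $text, $matches);

// $matches[1] holds the captured header candidates.
// "First header:" is captured; "Not a header." is rejected by the lookbehind.
print_r($matches[1]);
```

Because the character class cannot match a newline, the engine has no shorter alternative that still reaches \n+ when the line ends in a dot, so the whole attempt fails for that line.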

EDIT:

Taking the additional rules into account, this version might work better; try:

preg_match_all(
    '/(?<=\n\n)   # Assert that there are two newlines before the current position
    ^             # Assert that we\'re at the start of a line
    (?![\d -]+$)  # Assert that the line consists not solely of digits, spaces and -s
                  # Assert that the line doesn\'t consist of two Uppercase Words
    (?!\s*\p{Lu}\p{L}*\s+\p{Lu}\p{L}*\s*$)
                  # Match 2-75 of the allowed characters
    [ \wåäöÅÄÖ,()%+\-:.]{2,75}
    (?<!\.)       # Assert that the last one isn\'t a dot
    $             # Assert position at the end of a line
    (?=\n)        # Assert that one newline follows.
    /mxu', 
    $subject, $result, PREG_PATTERN_ORDER);
+3

Anchor the expression so that it has to match a whole line: ^EXPRESSION$.

0

Updated due to a modified question:

/(^|[\n\r]{3,}).{2,75}(?<!\.)[\n\r]+/

Example with possible text, etc.

(I'm looking for either \n or \r because the editor on this page seems to treat newlines as \rs)

Previous answer:

/^.+[^.:\n\r]$/m

An example.
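To see what the earlier pattern /^.+[^.:\n\r]$/m accepts, here is a minimal PHP sketch. The sample lines are made up for illustration (the first two come from the question, the third is an assumed header ending in a colon).

```php
<?php
// Hypothetical sample: one line per case we want to check.
$text = "Lorem./: Ipsom dolor sit\nLorem ipsum dolor sit.\nA header:\n";

// With the m modifier, ^ and $ match at the start and end of every line.
// The last character of the line must not be ".", ":", or a line break.
$pattern = '/^.+[^.:\n\r]$/m';

preg_match_all($pattern, $text, $matches);

// Only the first line matches: the second ends in "." and the third in ":".
print_r($matches[0]);
```

Note that this pattern also rejects lines ending in a colon, which conflicts with the updated question, where headers sometimes end in ":".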

0

Source: https://habr.com/ru/post/1791977/

