Perl 6: How to get all lines that are not indented to a given width?

I'm working on a very large text file whose lines have different amounts of indentation. The valid lines have an indentation width of 12 characters, made up of a combination of tabs and spaces. I want to get all lines that do not have a 12-character indentation width, i.e. lines whose indentation, again built from tabs and spaces, is 0 to 11 characters wide.

if $badLine !~~ m/ ^^ [\s ** 12 ||
                      \t \s ** 4 ||
                      \s \t \s ** 3 ] / { say $badLine; }

But the problem is that when you edit a text file in a word processor, pressing the Tab key can produce anywhere from 0 to 8 characters' worth of space to fill the gap up to the next tab stop. What would be a reasonable way to find all the lines that don't have a 12-character-wide indentation?

Thanks.

+4
2 answers

Width 12

For an indentation width of 12, assuming tab stops at positions 0, 8, 16, etc.:

for $input.lines {
    .say if not /
        ^                             # start of line
        [" " ** 8 || " " ** 0..7 \t]  # whitespace up to first tab stop
        [" " ** 4]                    # whitespace up to position 12
        [\S | $]                      # non-space character or end of line
    /;
}

Explanation:

  • To get from the start of the line (position 0) to the first tab stop (position 8), there are two possibilities we need to match:

    • 8 spaces.
    • 0 to 7 spaces, followed by 1 tab. (A tab jumps straight to the next tab stop, so it fills whatever width is left after the spaces.)
  • The only way to get from that tab stop (position 8) to the indentation target (position 12) is 4 spaces. (A tab would jump past the target to the next tab stop at position 16.)

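  • The match is anchored at the start of the line and must be followed by a non-space character or the end of the line, so a longer indentation cannot match on its first 12 columns alone.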
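As a quick sanity check of the reasoning above, the sketch below wraps the width-12 regex in a helper and runs it against a few made-up lines. The sub name `has-indent-of-twelve` and the sample strings are hypothetical, and tab stops every 8 columns are assumed. Under that assumption, the first three samples should be reported as fine and the last two as bad.

```raku
# The width-12 regex from above, wrapped in a helper (the sub name is hypothetical).
sub has-indent-of-twelve(Str $line --> Bool) {
    so $line ~~ /
        ^                             # start of line
        [" " ** 8 || " " ** 0..7 \t]  # whitespace up to first tab stop
        [" " ** 4]                    # whitespace up to position 12
        [\S | $]                      # non-space character or end of line
    /;
}

# Made-up sample lines; "\t" is a real tab character in double quotes.
my @samples =
    (" " x 12)          ~ "twelve spaces",
    "\t" ~ (" " x 4)    ~ "tab, then four spaces",
    "   \t" ~ (" " x 4) ~ "three spaces, tab, four spaces",
    (" " x 11)          ~ "eleven spaces",
    "\t\t"              ~ "two tabs";

for @samples -> $line {
    say has-indent-of-twelve($line) ?? "ok   " !! "bad  ", $line.trim;
}
```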

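Arbitrary width

The indentation match for an arbitrary width can be factored out into a parameterized named token: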

my token indent ($width) {
    [" " ** 8 || " " ** 0..7 \t] ** {$width div 8}
     " " ** {$width % 8}
}

.say if not /^ <indent(12)> [\S | $]/ for $input.lines;

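Explanation:

  • Same logic as above, except that the "spaces or tab up to the next tab stop" group is repeated once for every full 8 columns that fit into the width. ($width div 8, where div is the integer division operator.)

  • Then match the remaining spaces, if any. ($width % 8, where % is the modulo operator.)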
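To make the div/modulo split concrete, here is a small sketch that prints how a few widths decompose. The sub name `split-width` is hypothetical, and tab stops every 8 columns are assumed, as in the token above.

```raku
# Split an indentation width into whole tab-stop groups plus leftover spaces,
# assuming tab stops every 8 columns (the sub name is hypothetical).
sub split-width(Int $width) {
    return $width div 8, $width % 8;
}

for 4, 8, 12, 20 -> $width {
    my ($stops, $spaces) = split-width($width);
    say "width=$width stops=$stops spaces=$spaces";
}
```

For a width of 12 this gives one tab-stop group followed by 4 spaces, which is exactly the hand-written width-12 regex.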

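Arbitrary starting column

The examples above assume that indentation matching starts at column 0 (the start of the line). To make it work from an arbitrary starting column, for example inside a larger grammar, the token has to take the current position into account: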

my token indent ($width) {
    :my ($before-first-stop, $number-of-stops, $after-last-stop);
    {
        # columns left until the next tab stop, capped at the requested width
        $before-first-stop = min $width, 8 - $/.from % 8;
        # whole 8-column tab stops that still fit
        $number-of-stops   = ($width - $before-first-stop) div 8;
        # spaces left over after the last tab stop
        $after-last-stop   = ($width - $before-first-stop) % 8;
    }
    [" " ** {$before-first-stop} || " " ** {^$before-first-stop} \t]
    [" " ** 8 || " " ** 0..7 \t] ** {$number-of-stops}
     " " ** {$after-last-stop}
}

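Explanation:

  • Same principle as before, except that we first match the whitespace needed to get from the current column to the next tab stop, then the whole tab stops, then the remaining spaces.

  • The starting column is taken from $/.from; that is an offset into the whole string, so this assumes the string being matched is a single line.

  • The code block ({ ... }) computes the three counts before the rest of the token is matched, because the ** quantifiers depend on them.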
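The arithmetic inside the token can be tried out on its own. The sketch below recomputes the same three counts in a plain sub; the sub name `indent-parts` and the explicit `$start-column` parameter (standing in for `$/.from`) are hypothetical, and tab stops every 8 columns are assumed.

```raku
# Recompute the three counts from the token as a plain sub.
# $start-column stands in for $/.from (hypothetical parameter).
sub indent-parts(Int $width, Int $start-column) {
    my $before-first-stop = min($width, 8 - $start-column % 8);
    my $number-of-stops   = ($width - $before-first-stop) div 8;
    my $after-last-stop   = ($width - $before-first-stop) % 8;
    return $before-first-stop, $number-of-stops, $after-last-stop;
}

say indent-parts(12, 0);   # (8 0 4)
say indent-parts(12, 5);   # (3 1 1)
```

Starting at column 5, a width of 12 ends at column 17: 3 columns to reach the stop at 8, one whole stop to reach 16, then 1 more space.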

+6

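Alternatively, the indentation width can be computed in an ordinary sub and checked from inside the regex with a code assertion: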

# some test input
my \INPUT = qq:to/EOI/;
           11s
            12s
             13s
\t    1t 4s
 \t   1s 1t 3s
    4s
   \t    3s 1t 4s
        \t8s 1t
EOI

# compute indentation width
sub indent-width($_) {
    my $n = 0;

    # iterate over characters
    for .comb {
        # a tab only advances to the next tab stop (next multiple of 8)
        when "\t" { $n += 8 - $n % 8 }
        default { ++$n }
    }
    $n;
}

# generate output, see below
say ?/^ :r (\h+) <?{ indent-width(~$0) == 12 }> /, " {.trim}"
    for INPUT.lines;

/^ :r (\h+) <?{ indent-width(~$0) == 12 }> /

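The regex captures the leading horizontal whitespace, and the code assertion <?{...}> succeeds only if the indentation width of the captured $0 is 12.

Note the :r (ratchet) adverb, which keeps the regex from backtracking: without it, a longer indentation could still match on a shorter prefix whose width is 12.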
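The effect of :r can be seen by running the same regex with and without it on a line indented by 13 spaces. This is a minimal sketch: the sample line and the variable names `$ratcheted` and `$backtracking` are made up, and `indent-width` is repeated from the answer so the snippet runs on its own. The ratcheted version is expected to reject the line, while the backtracking version is expected to accept it by shrinking `\h+` to 12 spaces.

```raku
# Same helper as in the answer: width of a whitespace prefix,
# with tab stops every 8 columns.
sub indent-width($_) {
    my $n = 0;
    for .comb {
        when "\t" { $n += 8 - $n % 8 }
        default   { ++$n }
    }
    $n;
}

# Hypothetical sample: 13 spaces of indentation, one too many.
my $line = (" " x 13) ~ "thirteen spaces";

# With :r the \h+ keeps all 13 spaces, so the assertion fails for good.
my $ratcheted    = so $line ~~ /^ :r (\h+) <?{ indent-width(~$0) == 12 }> /;

# Without :r the regex may give back one space and retry the assertion.
my $backtracking = so $line ~~ /^    (\h+) <?{ indent-width(~$0) == 12 }> /;

say "with :r:    $ratcheted";
say "without :r: $backtracking";
```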

+4

Source: https://habr.com/ru/post/1668086/

