Understanding the code: hash + grep to remove duplicates (modified to check multiple elements)

The code:

    @all_matches = grep { ! ( $seensentence{ $_->[0] . '-' . $_->[1] . '-' . $_->[5] }++ ) } @all_matches;

Purpose: This code removes rows from the @all_matches array, which is an array of arrays (AoA), when certain of their elements duplicate those of an earlier row.

My attempt at a complete breakdown (using ?? .. around where I'm not sure):

grep returns the elements of @all_matches for which the block evaluates to true.

The %seensentence hash is keyed on ??three elements?? of each @all_matches row. Since a hash can only have unique keys, the first time a key is seen its value is incremented from undef (0) to 1. The next time, the value is already defined, and the ! means that grep returns the element only when the value was still undef (that is, the first time this combination of elements is seen).


My questions:

(1) How do I turn {$_->[0] .'-'. $_->[1] .'-'. $_->[5]}++ into a HoH?

I was told that this is another (idiomatic) way to achieve this. A stab in the dark would be:

 ( {$_->[0] => 0, $_->[1] => 0, $_->[5] => 0} )++ 

(1b) Because I do not understand how the original does what I want. I read that -bareword is equivalent to "-bareword", so I tried {"$_->[0]" . "$_->[1]" . "$_->[5]"} and it seemed to work exactly the same way. Still, I don't understand: is each element treated as a key (a) separately (for example, as an array of keys), or (b) all at once (since . concatenates them into a single string; this turned out to be the correct reading), or (c) is it not doing what I think it is?

(2) What does this mean: $_->[0] || $_->[1] || $_->[5] ? It does not do the same as the above.

I read that the short-circuit logical operators return the last value evaluated, so I thought it checked for a value in {$_->[0]}, and if there was one, the count would be incremented there; if not, it would check the next element, and so on until none were true, which is when grep would pass on the unique value.


Thanks for your time. I tried to be as thorough as possible (to a fault?), but let me know if anything is missing.

+6
2 answers

First, let's unroll the grep into a foreach loop so that we can take a closer look at it. I am going to expand some of the idioms into longer constructs for clarity.

    my @all_matches = ( ... );

    {
        my %seen;
        my @no_dupes;

        foreach my $match ( @all_matches ) {
            my $first_item  = $match->[0];
            my $second_item = $match->[1];
            my $third_item  = $match->[5];

            my $key = join '-', $first_item, $second_item, $third_item;

            if( not $seen{ $key }++ ) {
                push @no_dupes, $match;
            }
        }

        @all_matches = @no_dupes;
    }

In other words, the original coder builds a hash key from the array reference held in $match, using the elements at indices 0, 1 and 5 ($match->[0], $match->[1] and $match->[5]). Since hash keys are unique, duplicates are dropped by checking whether the key has already been seen before pushing the row onto @no_dupes.

The grep {} version is simply a more compact way (quicker to type, and no throwaway variables) to accomplish the same thing. If it works, why refactor it? What does it not do that you need it to do?
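To see the one-line form at work, here is a minimal runnable sketch. The sample rows are hypothetical (they are not from the question), and the hash is named %seen rather than %seensentence; only indices 0, 1 and 5 take part in the key.

```perl
use strict;
use warnings;

# Hypothetical sample data: each row is an array ref with six elements.
my @all_matches = (
    [ 'cat', 'sat', 'x', 'x', 'x', 'mat' ],
    [ 'cat', 'sat', 'y', 'y', 'y', 'mat' ],   # same 0/1/5 as the first row
    [ 'dog', 'sat', 'x', 'x', 'x', 'mat' ],
);

my %seen;

# Post-increment returns the OLD count: 0 the first time a key appears,
# so the negation is true exactly once per distinct key.
@all_matches = grep {
    !$seen{ $_->[0] . '-' . $_->[1] . '-' . $_->[5] }++
} @all_matches;

print scalar(@all_matches), " rows kept\n";      # 2 rows kept
print join( ',', @$_ ), "\n" for @all_matches;   # first 'cat' row, then the 'dog' row
```

The first row with a given key wins; later rows with the same key are dropped, so the original order of the survivors is preserved.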

To do the same with a HoH (hash of hashes), you could do this:

    my @all_matches = ( ... );

    {
        my %seen;
        my @no_dupes;

        foreach my $match ( @all_matches ) {
            my $first_item  = $match->[0];
            my $second_item = $match->[1];
            my $third_item  = $match->[5];

            if( not $seen{ $first_item }->{ $second_item }->{ $third_item }++ ) {
                push @no_dupes, $match;
            }
        }

        @all_matches = @no_dupes;
    }

Which can be translated back to grep as follows:

    my @all_matches = ( ... );

    {
        my %seen;
        @all_matches = grep { not $seen{ $_->[0] }->{ $_->[1] }->{ $_->[5] }++ } @all_matches;
    }

However, this is a case where I do not see a clear advantage to building the nested data structure, unless you are going to use %seen later for something else.

As for the || operator, that is a different animal. I cannot think of any useful way to use it in this context. The short-circuit logical operator in, say, "$a || $b || $c" tests the truth of $a. If it is true, it returns its value. If it is false, it tests $b in the same way. If that is false too, it returns $c. But if $a is true, $b is never evaluated, and if $b is true, $c is never evaluated.
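If %seen is going to be reused, the nested form does keep the three fields separate, which makes it easy to walk afterwards. The following is only a sketch with hypothetical sample rows; it relies on Perl autovivifying the intermediate hashes on first access.

```perl
use strict;
use warnings;

# Hypothetical sample data: only indices 0, 1 and 5 matter.
my @all_matches = (
    [ 'cat', 'sat', 'x', 'x', 'x', 'mat' ],
    [ 'cat', 'sat', 'y', 'y', 'y', 'mat' ],
    [ 'dog', 'sat', 'x', 'x', 'x', 'mat' ],
);

my %seen;

# The inner hashes are created automatically (autovivification).
my @unique = grep { !$seen{ $_->[0] }{ $_->[1] }{ $_->[5] }++ } @all_matches;

# Reuse %seen afterwards: report how often each combination occurred.
for my $first ( sort keys %seen ) {
    for my $second ( sort keys %{ $seen{$first} } ) {
        for my $third ( sort keys %{ $seen{$first}{$second} } ) {
            my $count = $seen{$first}{$second}{$third};
            print "$first / $second / $third seen $count time(s)\n";
        }
    }
}
```

With the sample rows above, the cat/sat/mat combination is counted twice and dog/sat/mat once, while @unique holds two rows.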

+5

The key into %seensentence is a simple string. The expression $_->[0] .'-'. $_->[1] .'-'. $_->[5] builds a string. Here is an equivalent expression: join '-', $_->[0], $_->[1], $_->[5] . It assumes that elements 0, 1, and 5 are enough to identify duplicates in @all_matches.
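A quick way to convince yourself that the key is one plain string is to build it outside the hash subscript. This is a small sketch with a hypothetical row; the array-slice form is an extra spelling not used in the question. The last part illustrates a caveat of any separator-joined key: if a field can itself contain the separator, two different rows can produce the same string.

```perl
use strict;
use warnings;

my $row = [ 'cat', 'sat', 'x', 'x', 'x', 'mat' ];   # hypothetical row

# Three spellings of the same key.
my $by_concat = $row->[0] . '-' . $row->[1] . '-' . $row->[5];
my $by_join   = join '-', $row->[0], $row->[1], $row->[5];
my $by_slice  = join '-', @{$row}[ 0, 1, 5 ];       # array slice of the reference

print "$by_concat\n";                               # cat-sat-mat
print "same key\n" if $by_concat eq $by_join and $by_join eq $by_slice;

# Caveat: fields containing the separator can collide.
my $key_a = join '-', 'a-b', 'c',   'd';
my $key_b = join '-', 'a',   'b-c', 'd';
print "collision: $key_a\n" if $key_a eq $key_b;    # both are a-b-c-d
```

If such collisions are possible in your data, choose a separator that cannot occur in the fields.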

Edit
Missed your last question.

$_->[0] || $_->[1] || $_->[5] returns

  • $_->[0] if $_->[0] is not false (0, empty string, undef),
  • $_->[1] if $_->[1] is not false,
  • $_->[5] otherwise.

Short-circuit operators stop as soon as it makes sense to stop. In the case of || this happens as soon as an operand is true. In the case of && this happens as soon as an operand is false.
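The behaviour described above can be checked with a short sketch. All data and the bump() helper are hypothetical; the point is that || yields a single operand, so a hash keyed this way is keyed on one field rather than on the combination of three.

```perl
use strict;
use warnings;

# Indices 0 and 1 are false here, so || falls through to index 5.
my $row        = [ 0, '', 'x', 'x', 'x', 'mat' ];
my $first_true = $row->[0] || $row->[1] || $row->[5];
print "$first_true\n";                      # mat

# As a hash key, || picks ONE field. These two rows differ in
# indices 1 and 5 but share a true index 0, so the second is dropped.
my @rows = (
    [ 'cat', 'sat', 0, 0, 0, 'mat' ],
    [ 'cat', 'ran', 0, 0, 0, 'off' ],
);
my %seen;
my @kept = grep { !$seen{ $_->[0] || $_->[1] || $_->[5] }++ } @rows;
print scalar(@kept), " row kept\n";         # 1 row kept

# Short-circuiting: the right-hand side is never evaluated.
my $calls = 0;
sub bump { $calls++; return 1 }             # hypothetical helper with a side effect

my ( $true, $false ) = ( 1, 0 );
my $or  = $true  || bump();                 # stops at $true
my $and = $false && bump();                 # stops at $false
print "bump called $calls time(s)\n";       # bump called 0 time(s)
```

This is why the || version does not deduplicate the same way as the concatenated key.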

+4

Source: https://habr.com/ru/post/892403/

