How to remove [sub] hash based on keys / values of another hash?

Question

Suppose I have two hashes. One of them contains a set of data that should keep only what also appears in the other hash.

For example:

    my %hash1 = (
        test1 => { inner1    => { more => "alpha",      evenmore      => "beta" } },
        test2 => { inner2    => { more => "charlie",    somethingelse => "delta" } },
        test3 => { inner9999 => { ohlookmore => "golf", somethingelse => "foxtrot" } }
    );

    my %hash2 = (
        major => { test2 => "inner2",
                   test3 => "inner3" }
    );

What I would like to do is delete an entire sub-hash in %hash1 if it does not exist as a key/value pair in $hash2{major}, preferably without modules. The information contained in "innerX" does not matter; it just needs to be left alone (unless the sub-hash is to be deleted, in which case it can go away).

In the above example, after performing this operation, hash1 will look like this:

    my %hash1 = (
        test2 => { inner2 => { more => "charlie", somethingelse => "delta" } },
    );

It removes $hash1{test1} and $hash1{test3} because they do not match anything in %hash2.

Here is what I have tried so far, but it doesn't work. It is probably not the safest approach either, since I'm looping over the hash while deleting from it. However, I'm deleting the element that each just returned, which should be okay?

This was my attempt, but Perl complains:

    Can't use string ("inner1") as a HASH ref while "strict refs" in use

    while ( my ($test, $inner) = each %hash1 ) {
        if ( exists $hash2{major}{$test}{$inner} ) {
            print "$test($inner) is in exists.\n";
        }
        else {
            print "Looks like $test($inner) does not exist, REMOVING.\n";
            # not too sure if $inner is needed to remove the whole entry
            delete( $hash1{$test}{$inner} );
        }
    }
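For comparison, here is a minimal sketch of the loop above with the lookup repaired, assuming the same %hash1 and %hash2. Because $hash2{major}{$test} is a plain string rather than a hash reference, it is compared against the inner keys of the sub-hash instead of being dereferenced. Deleting the key that each has just returned is the one modification perlfunc documents as safe during an each loop. The variable $wanted is an illustrative name, not part of the original question.

```perl
use strict;
use warnings;

my %hash1 = (
    test1 => { inner1    => { more => "alpha",      evenmore      => "beta" } },
    test2 => { inner2    => { more => "charlie",    somethingelse => "delta" } },
    test3 => { inner9999 => { ohlookmore => "golf", somethingelse => "foxtrot" } },
);
my %hash2 = ( major => { test2 => "inner2", test3 => "inner3" } );

while ( my ( $test, $inner ) = each %hash1 ) {
    # $inner is the sub-hash ref; the inner name to look for is a plain
    # string stored in %hash2, so it is used as a key, not dereferenced.
    my $wanted = $hash2{major}{$test};
    if ( defined $wanted and exists $inner->{$wanted} ) {
        print "$test($wanted) exists.\n";
    }
    else {
        print "$test does not match, REMOVING.\n";
        delete $hash1{$test};    # safe: this is the key each just returned
    }
}
```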


perl perl-data-structures hash

Zack Apr 2 '10 at 20:41


4 answers

You can do this in a single line, because delete accepts a hash slice (a whole list of keys). It wasn't as easy as I first thought, but now I've read the problem properly...

    delete @hash1{
        grep( !( exists( $hash2{major}->{$_} )
                 && exists( $hash1{$_}->{ $hash2{major}->{$_} } ) ),
              keys %hash1 )
    };
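The mechanism Penfold relies on is that delete applied to a hash slice removes every listed key and returns the deleted values. A tiny standalone sketch with made-up keys and values, purely to illustrate it:

```perl
use strict;
use warnings;

my %h = ( a => 1, b => 2, c => 3 );

# Delete every key whose value is odd, in one statement, via a hash slice.
# grep runs first and builds the complete key list, so %h is not being
# modified while keys %h is read.
my @removed = delete @h{ grep { $h{$_} % 2 } keys %h };

print join( ",", sort keys %h ), "\n";    # prints "b"
```

@removed holds the deleted values (1 and 3), in whatever order keys happened to return them.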


Penfold Apr 2 '10 at 21:29


Here's how I would do it (third time's the charm):

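    foreach ( map { [ $_ => $hash2{major}{$_} ] } keys %hash1 ) {
        my ( $key, $value ) = @$_;
        # Keep the sub-hash untouched if %hash2 names an inner key it really
        # has (assigning $hash1{$key}{$value} back to $hash1{$key} would drop
        # the inner level); otherwise drop the whole first-level entry.
        unless ( defined $value and exists $hash1{$key}{$value} ) {
            delete $hash1{$key};
        }
    }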


Axeman Apr 2 '10 at 21:05


    # This is the actual hash we want to iterate over.
    my $keepers = $hash2{major};

    %hash1 = map  { $_ => $hash1{$_} }                       # existing key and hash contents in %hash1
             grep { exists $keepers->{$_} and                # key there?
                    exists $hash1{$_}->{ $keepers->{$_} } }  # key in hash there?
             (keys %hash1);                                  # All the keys we might care about

This works because we build the lists of what we want (and don't want) in three independent steps:

The keys call gets all the keys in %hash1 in one step.
The grep builds (in one step) the list of keys matching our criteria.
The map builds (in one step) the set of keys and values we want to keep.
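The same three steps can be written out with a named variable for each intermediate list. This is only a sketch of the structure described above, assuming the question's sample data; @all_keys, @wanted and %pruned are illustrative names, not from the answer.

```perl
use strict;
use warnings;

my %hash1 = (
    test1 => { inner1    => { more => "alpha",      evenmore      => "beta" } },
    test2 => { inner2    => { more => "charlie",    somethingelse => "delta" } },
    test3 => { inner9999 => { ohlookmore => "golf", somethingelse => "foxtrot" } },
);
my %hash2   = ( major => { test2 => "inner2", test3 => "inner3" } );
my $keepers = $hash2{major};

# Step 1: keys builds the complete list of first-level keys.
my @all_keys = keys %hash1;

# Step 2: grep keeps keys named in %$keepers whose named inner key exists.
my @wanted = grep {
    exists $keepers->{$_} and exists $hash1{$_}{ $keepers->{$_} }
} @all_keys;

# Step 3: map turns the surviving keys back into key/value pairs.
my %pruned = map { $_ => $hash1{$_} } @wanted;

# Only now is %hash1 itself replaced.
%hash1 = %pruned;
```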

So we never modify the original hash until we are ready to. If %hash1 contains many keys, this uses a lot of memory. If you are worried about that, you could do something like this:

    # Initialization as before ...
    use File::Temp qw(tempfile);

    my ($fh, $file) = tempfile();
    my $keepers = $hash2{major};
    print $fh "$_\n" for (keys %hash1);
    close $fh;

    open $fh, "<", $file or die "can't reopen tempfile $file: $!\n";
    while ( defined ($_ = <$fh>) ) {
        chomp;
        delete $hash1{$_}
            unless exists $keepers->{$_}
               and exists $hash1{$_}->{ $keepers->{$_} };
    }

This works because we do not iterate over the hash, but over a saved copy of its keys.
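A plainer way to get the same "iterate over a saved copy of the keys" effect without a temporary file is to let foreach walk the list that keys returns. This is a swapped-in technique, not the answer's own: the key list is built once before the loop body runs, so deleting from %hash1 inside the loop is safe. It still holds every key in memory at once, so it does not address the memory concern that motivated the temp file. The sample data is the question's.

```perl
use strict;
use warnings;

my %hash1 = (
    test1 => { inner1    => { more => "alpha",      evenmore      => "beta" } },
    test2 => { inner2    => { more => "charlie",    somethingelse => "delta" } },
    test3 => { inner9999 => { ohlookmore => "golf", somethingelse => "foxtrot" } },
);
my %hash2   = ( major => { test2 => "inner2", test3 => "inner3" } );
my $keepers = $hash2{major};

# keys %hash1 is evaluated once, up front; foreach iterates over that list,
# not over the hash itself, so deleting entries here cannot confuse it.
for my $key ( keys %hash1 ) {
    delete $hash1{$key}
        unless exists $keepers->{$key}
           and exists $hash1{$key}{ $keepers->{$key} };
}
```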


Joe McMahon Apr 2 '10 at 21:53


Greg Bacon · Accepted Answer · 2010-04-02T21:03:26+0000

You were close. Remember that $hash2{major}{$test} is a scalar, not a hash reference.

    #! /usr/bin/perl

    use strict;
    use warnings;

    my %hash1 = (
        test1 => { inner1    => { more => "alpha",      evenmore      => "beta" } },
        test2 => { inner2    => { more => "charlie",    somethingelse => "delta" } },
        test3 => { inner9999 => { ohlookmore => "golf", somethingelse => "foxtrot" } }
    );

    my %hash2 = (
        major => { test2 => "inner2",
                   test3 => "inner3" }
    );

    foreach my $k (keys %hash1) {
        my $delete = 1;
        foreach my $inner (keys %{ $hash1{$k} }) {
            $delete = 0, last
                if exists $hash2{major}{$k} &&
                          $hash2{major}{$k} eq $inner;
        }
        delete $hash1{$k} if $delete;
    }

    use Data::Dumper;
    $Data::Dumper::Indent = 1;
    print Dumper \%hash1;

The line starting with $delete = 0, ... is a bit cutesy. It is equivalent to $delete = 0; last; inside another conditional, but the code was already nested twice. Not wanting to build a matryoshka doll, I used a statement modifier, but, as the name suggests, it modifies a single statement.

That's where Perl's comma operator comes in:

Binary "," is the comma operator. In scalar context it evaluates its left argument, throws that value away, then evaluates its right argument and returns that value. This is just like C's comma operator.

In this case, the left argument is the expression $delete = 0, and the right argument is last.
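A minimal standalone sketch of the same idiom, with made-up data: one statement modifier governs both expressions joined by the comma, so the assignment runs and then last exits the loop.

```perl
use strict;
use warnings;

my $found   = 0;
my $checked = 0;
for my $name (qw(alpha beta gamma delta)) {
    $checked++;
    # Assignment binds tighter than the comma, so this is
    # ($found = 1), last  -- both guarded by the single "if".
    $found = 1, last if $name eq 'gamma';
}
print "found=$found checked=$checked\n";    # prints "found=1 checked=3"
```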

The conditional may seem needlessly fussy, but

 ... if $hash2{major}{$k} eq $inner;

emits uninitialized-value warnings when checking tests not mentioned in %hash2 (for example, test1/inner1). Using

    ... if $hash2{major}{$k} && $hash2{major}{$k} eq $inner;

would wrongly delete a test mentioned in %hash2 if its inner name were a false value, such as the string "0". Yes, using exists here may be needlessly fussy, but without knowing your real hash keys, I chose the conservative route.

Output:

  $VAR1 = {
   'test2' => {
     'inner2' => {
       'somethingelse' => 'delta',
       'more' => 'charlie'
     }
   }
 };
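The false-value pitfall mentioned in the discussion of the conditional is easy to reproduce. This sketch uses hypothetical data in which the inner name is the string "0", and compares the truthiness guard with the exists guard inside the same inner-key loop the answer uses.

```perl
use strict;
use warnings;

# Hypothetical data: the inner key named in %hash2 is the string "0".
my %hash1 = ( test4 => { 0 => { more => "zulu" } } );
my %hash2 = ( major => { test4 => "0" } );

my $k = 'test4';
my ( $truthy_match, $exists_match ) = ( 0, 0 );

foreach my $inner ( keys %{ $hash1{$k} } ) {
    # "0" is false, so this guard never lets the eq test succeed.
    $truthy_match = 1 if $hash2{major}{$k} && $hash2{major}{$k} eq $inner;

    # exists only asks whether the key is present, so "0" is handled.
    $exists_match = 1 if exists $hash2{major}{$k} && $hash2{major}{$k} eq $inner;
}

print "truthy=$truthy_match exists=$exists_match\n";    # prints "truthy=0 exists=1"
```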

Although you aren't violating it, keep in mind the following caution from the perlfunc documentation on each:

If you add or delete a hash's elements while iterating over it, entries may be skipped or duplicated, so don't do that. Exception: it is always safe to delete the item most recently returned by each, so the following code works properly:

    while (($key, $value) = each %hash) {
        print $key, "\n";
        delete $hash{$key};   # This is safe
    }

Update: Searching hashes as though they were arrays (impress your CS buddies by saying "...linearly rather than logarithmically") is a red flag, and the code above does just that. A better approach, which turns out to be similar to Penfold's answer, is

    %hash1 = map +($_ => $hash1{$_}),
             grep exists $hash2{major}{$_} &&
                  exists $hash1{$_}{ $hash2{major}{$_} },
             keys %hash1;

In a nice declarative style, it describes the desired contents of %hash1, namely that

the first-level keys of %hash1 must be mentioned in $hash2{major}, and
the value in $hash2{major} corresponding to each first-level key must itself be a subkey of that key back in %hash1

(Wow, dizzying. We need multiple placeholder variables in English!)
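If the filter is needed in more than one place, the same declarative expression can be wrapped in a small subroutine. The name prune_hash and its two-argument interface are hypothetical, chosen for this sketch; the body is the expression above with %hash1 and $hash2{major} passed in by reference.

```perl
use strict;
use warnings;

# Hypothetical helper: keeps only the first-level keys of %$data that
# %$keepers names AND whose named inner key exists; modifies %$data in place.
sub prune_hash {
    my ( $data, $keepers ) = @_;
    %$data = map +( $_ => $data->{$_} ),
             grep exists $keepers->{$_} && exists $data->{$_}{ $keepers->{$_} },
             keys %$data;
    return;
}

my %hash1 = (
    test1 => { inner1    => { more => "alpha",      evenmore      => "beta" } },
    test2 => { inner2    => { more => "charlie",    somethingelse => "delta" } },
    test3 => { inner9999 => { ohlookmore => "golf", somethingelse => "foxtrot" } },
);
my %hash2 = ( major => { test2 => "inner2", test3 => "inner3" } );

prune_hash( \%hash1, $hash2{major} );
```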

The unary plus in +($_ => $hash1{$_}) disambiguates for the poor parser, so it knows we want the parenthesized expression treated as a "pair". Without it, Perl would take the parentheses as enclosing map's entire argument list. See the end of the perlfunc documentation on map for other cases where this may be necessary.
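A small sketch with made-up data showing two equivalent ways to write such a pair-producing map: the EXPR form with the unary plus, and the BLOCK form with the pair inside braces.

```perl
use strict;
use warnings;

my @names = qw(alpha beta);

# EXPR form: the unary plus makes the parenthesized pair an expression,
# so map sees "EXPR, LIST" rather than treating (...) as all its arguments.
my %by_expr = map +( $_ => length $_ ), @names;

# BLOCK form: equivalent, with the pair written inside a block.
my %by_block = map { ( $_ => length $_ ) } @names;
```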
