How do I subclass IO::Handle so that it correctly provides a low-level file handle without a backing file or memory?

I have an application that accesses a PostgreSQL database and needs to read some large binary data out of it, depending on the processing required. That can be hundreds of MB or even several GB of data. Please don't discuss using the file system instead; that is simply how things are right now.

This data is simply files of various types, e.g. a Zip container or some other kind of archive. Some of the required processing is listing the contents of the Zip, maybe even extracting some members for further processing, maybe hashing the stored data... In the end, the data is read multiple times, but written only once to store it.

All the Perl libraries I use can work with file handles: some with IO::Handle, others with IO::String or IO::Scalar, and some only with low-level file handles. So I created a subclass of IO::Handle and IO::Seekable which acts as a wrapper around the corresponding DBD::Pg methods. In the constructor I create a database connection, open the provided LOID for reading and store the handle returned by Postgres in the instance. My own handle object is then passed on to whatever code can work with such a file handle, so that it can read and seek directly within the blob provided by Postgres.

The problem is libraries that use low-level file handles or low-level file operations on an IO::Handle. Digest::MD5 seems to be one, Archive::Zip another. Digest::MD5 croaks and tells me that no handle was provided; Archive::Zip, on the other hand, tries to create a new handle of its own from mine by calling IO::Handle::fdopen, which fails in my case.

    sub fdopen {
        @_ == 3 or croak 'usage: $io->fdopen(FD, MODE)';
        my ($io, $fd, $mode) = @_;
        local(*GLOB);

        if (ref($fd) && "".$fd =~ /GLOB\(/o) {
            # It's a glob reference; Alias it as we cannot get name of anon GLOBs
            my $n = qualify(*GLOB);
            *GLOB = *{*$fd};
            $fd = $n;
        } elsif ($fd =~ m#^\d+$#) {
            # It's an FD number; prefix with "=".
            $fd = "=$fd";
        }

        open($io, _open_mode_string($mode) . '&' . $fd)
            ? $io : undef;
    }

I guess the problem is the low-level copy of the handle, which discards my own instance, so there is no longer any instance that holds my database connection and all that.
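The failure mode can be reproduced without any database. The following is only a minimal sketch, assuming that a freshly constructed, never-opened `IO::Handle` is a fair stand-in for an object like mine that has no underlying file; the variable names are made up for illustration.

```perl
use strict;
use warnings;
use IO::Handle;
use File::Temp qw(tempfile);

# A blessed glob that was never opened: it looks like a handle, but there
# is nothing underneath it that open() could duplicate.
my $fake = IO::Handle->new;
my $dup_of_fake = IO::Handle->new->fdopen($fake, 'r');

# For contrast, a handle backed by a real temporary file.
# tempfile() returns ($fh, $filename) in list context; only $fh is kept.
my ($real) = tempfile(UNLINK => 1);
my $dup_of_real = IO::Handle->new->fdopen($real, 'r');

print "dup of fake handle: ", (defined $dup_of_fake ? "ok" : "failed"), "\n";
print "dup of real handle: ", (defined $dup_of_real ? "ok" : "failed"), "\n";
```

Overriding methods such as `read` in the subclass does not change this, because `fdopen` goes through `open` and never calls those methods.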

So, is it even possible in my case to provide an IO::Handle that can be used successfully wherever a low-level file handle is expected?

What I mean is that I don't have a real file handle; I only have an object whose method calls are wrapped around the corresponding Postgres methods, which in turn need a database handle. All of that state has to be stored somewhere, the wrapping has to be done, and so on.

I tried to do what others do, like IO::String, which for example additionally uses tie. But in the end that use case is different, because Perl is able to create a real low-level file handle to some internal memory on its own. That is not supported at all in my case. I need to keep my instance around, because only it knows about the database handle etc.
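To make the in-memory case concrete, here is a small sketch (the variable names are illustrative). It assumes that checking `fileno` is a reasonable way to tell whether a handle has an operating-system descriptor: for an in-memory handle it is either undefined or negative, and yet Digest::MD5 can read from the handle.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

my $data = "hello world\n";

# Perl's "open a scalar as a file" feature: a genuine PerlIO handle,
# but with no operating-system file descriptor behind it.
open(my $mem, '<', \$data) or die "Cannot open in-memory handle: $!";
binmode($mem);

my $fd = fileno($mem);
my $has_os_fd = defined($fd) && $fd >= 0;

# Digest::MD5 reads from the handle anyway.
my $via_handle = Digest::MD5->new->addfile($mem)->hexdigest;
my $direct     = md5_hex($data);

print "has OS descriptor: ", ($has_os_fd ? "yes" : "no"), "\n";
print "digests match: ", ($via_handle eq $direct ? "yes" : "no"), "\n";
```

So what such libraries seem to need is a handle that Perl's own I/O layer can read from, not necessarily an operating-system descriptor.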

Using my handle like an IO::Handle, by calling its read method and so on, works as expected. But I would like to take it a bit further and be more compatible with code that doesn't expect to work with IO::Handle objects, in the same way that IO::String or File::Temp objects can be used as low-level file handles.

    package ReadingHandle;

    use strict;
    use warnings;
    use 5.10.1;

    use base 'IO::Handle', 'IO::Seekable';

    use Carp ();

    sub new {
        my $invocant = shift || Carp::croak('No invocant given.');
        my $db       = shift || Carp::croak('No database connection given.');
        my $loid     = shift // Carp::croak('No LOID given.');

        my $dbHandle = $db->_getHandle();
        my $self     = $invocant->SUPER::new();

        *$self->{'dbHandle'} = $dbHandle;
        *$self->{'loid'}     = $loid;

        my $loidFd = $dbHandle->pg_lo_open($loid, $dbHandle->{pg_INV_READ});
        *$self->{'loidFd'} = $loidFd;

        if (!defined($loidFd)) {
            Carp::croak("The provided LOID couldn't be opened.");
        }

        return $self;
    }

    sub DESTROY {
        my $self = shift || Carp::croak('The method needs to be called with an instance.');

        $self->close();
    }

    sub _getDbHandle {
        my $self = shift || Carp::croak('The method needs to be called with an instance.');

        return *$self->{'dbHandle'};
    }

    sub _getLoid {
        my $self = shift || Carp::croak('The method needs to be called with an instance.');

        return *$self->{'loid'};
    }

    sub _getLoidFd {
        my $self = shift || Carp::croak('The method needs to be called with an instance.');

        return *$self->{'loidFd'};
    }

    sub binmode {
        my $self = shift || Carp::croak('The method needs to be called with an instance.');

        return 1;
    }

    sub close {
        my $self = shift || Carp::croak('The method needs to be called with an instance.');

        my $dbHandle = $self->_getDbHandle();
        my $loidFd   = $self->_getLoidFd();

        return $dbHandle->pg_lo_close($loidFd);
    }

    sub opened {
        my $self = shift || Carp::croak('The method needs to be called with an instance.');

        my $loidFd = $self->_getLoidFd();

        return defined($loidFd) ? 1 : 0;
    }

    sub read {
        my $self   = shift || Carp::croak('The method needs to be called with an instance.');
        my $buffer =\shift // Carp::croak('No buffer given.');
        my $length = shift // Carp::croak('No amount of bytes to read given.');
        my $offset = shift || 0;

        if ($offset > 0) {
            Carp::croak('Using an offset is not supported.');
        }

        my $dbHandle = $self->_getDbHandle();
        my $loidFd   = $self->_getLoidFd();

        return $dbHandle->pg_lo_read($loidFd, $buffer, $length);
    }

    sub seek {
        my $self   = shift || Carp::croak('The method needs to be called with an instance.');
        my $offset = shift // Carp::croak('No offset given.');
        my $whence = shift // Carp::croak('No whence given.');

        if ($offset < 0) {
            Carp::croak('Using a negative offset is not supported.');
        }
        if ($whence != 0) {
            Carp::croak('Using a whence other than 0 is not supported.');
        }

        my $dbHandle = $self->_getDbHandle();
        my $loidFd   = $self->_getLoidFd();
        my $retVal   = $dbHandle->pg_lo_lseek($loidFd, $offset, $whence);
        $retVal      = defined($retVal) ? 1 : 0;

        return $retVal;
    }

    sub tell {
        my $self = shift || Carp::croak('The method needs to be called with an instance.');

        my $dbHandle = $self->_getDbHandle();
        my $loidFd   = $self->_getLoidFd();
        my $retVal   = $dbHandle->pg_lo_lseek($loidFd);
        $retVal      = defined($retVal) ? $retVal : -1;

        return $retVal;
    }

    1;
1 answer

There is a way around this, but it is a bit strange. Your requirements are basically threefold if I read your code and comments correctly:

  • Work like a regular file handle / IO::Handle object as much as possible, keeping the fact that it is not a real file invisible to the user.
  • Work with Archive::Zip , which is implemented mostly in plain Perl and calls the IO::Handle::fdopen code you posted, which fails to duplicate the handle since it is not a real handle.
  • Work with Digest::MD5 , which is implemented in XS using PerlIO . Since tricks based on tie and perl's in-memory "fake" file handles are not usable at that level, this is trickier than 2.

You can achieve all three of them using PerlIO layers with PerlIO::via . The code is similar to what you would write with tie (implement some required behavior methods). In addition, you can use the "open variable as file" functionality of open and the pre-rolled IO::Seekable + IO::Handle functionality of IO::File to make requirement 1 above easier to achieve (so that the object can be used in Perl code the same way regular IO::Handle objects are).

The following is an example package that does what you need. It has a few caveats:

  • It does not extend your code or interact with the database at all; it just uses the supplied lines arrayref as file data. If it looks like it fits your use case, you should adapt it to work with the database.
  • It implements the bare minimum required to run the demo usage below. You will need to implement many more methods to make it "well behaved" in most non-demo cases (for example, it knows nothing about SEEK , EOF , BINMODE , et al.). Keep in mind that the arguments / expected behavior of the functions you will implement are not the same as what you would write for tie or Tie::Handle ; the "interface" has the same names, but different contracts.
  • Methods that receive the invocant should not use it as a hashref / globref directly; they should track all custom state in the *$self->{args} glob field. This is because the blessed object is created twice (once blessed by PerlIO and once by SUPER::new ), so state has to be shared through a common reference. If you replace the args field or add / remove any other fields, they will be visible only to the set of methods that created them: either the PerlIO methods or the "normal" object methods. See the comment in the constructor for more details.
  • PerlIO in general is not very easy to introspect. If something fails underneath a low-level operation like sysread or <$fh> , a lot of code will freak out or do unexpected things, because it assumes those functions cannot die and are more or less atomic at the operation level. Similarly, when messing around with PerlIO it is easy for failure modes to leave the "die or return an error value" realm and end up in the "segfault or core dump" realm, especially if multiple processes ( fork() ) or threads are involved (these weird cases are, for example, why the module below is not implemented around IO::File->new; followed by $file->open(... "<:via($class)") ; that core dumps for me, I don't know why). TL;DR: debugging why stuff goes wrong at the PerlIO level can be annoying, you were warned :)
  • Any XS code that addresses a raw file descriptor, or otherwise does not go through the PerlIO perlapi functions, will not honor this. There are unfortunately plenty of those, but usually not in common, well-supported CPAN modules. Basically, Digest::MD5 doesn't work with tied handles because it operates at a level "below" tie's magic; PerlIO is one level lower than that, but there is yet another level below.
  • This code is a bit messy and could certainly be cleaned up. In particular, it would probably be quite a bit nicer to open() the layered object directly, skip all the weird pseudo-indirect-object stuff, and then wrap it in an IO::Handle some other way, e.g. via IO::Wrap .
  • PerlIO does not work, or works differently, on many much older Perls.

Package:

    package TiedThing;

    use strict;
    use warnings;
    use Symbol ();
    use parent "IO::File";

    our @pushargs;

    sub new {
        my ( $class, $args ) = @_;
        # Build a glob to be used by the PerlIO methods. This does two things:
        # 1. Gets us a place to stick a shared hashref so PerlIO methods and user-
        #    defined object methods can manipulate the same data. They must use the
        #    {args} glob field to do that; new fields written will not be shared.
        # 2. Unifies the ways of addressing that across custom functions and PerlIO
        #    functions. We could just pass a hashref { args => $args } into PUSHED, but
        #    then we'd have to remember "PerlIO functions receive a blessed hashref,
        #    custom functions receive a blessed glob" which is lame.
        my $glob = Symbol::gensym();
        *$glob->{args} = $args;
        local @pushargs = ($glob, $class);
        my $self = $class->SUPER::new(\my $unused, "<:via($class)");
        *$self->{args} = $args;
        return $self;
    }

    # Example of a user-defined method on the "normal" object.
    sub custom {
        my $self = shift;
        return *$self->{args}->{customvalue};
    }

    # PerlIO::via callback: called when the layer is pushed onto the handle.
    sub PUSHED {
        return bless($pushargs[0], $pushargs[1]);
    }

    # PerlIO::via callback: called when PerlIO needs more data for its buffer.
    sub FILL {
        return shift(@{*$_[0]->{args}->{lines}});
    }

    1;

Usage example:

    use feature 'say';
    use Digest::MD5;
    use Archive::Zip;

    my $object = TiedThing->new({
        lines       => [join("\n", 1..9, 1..9)],
        customvalue => "custom!",
    });

    say "can call custom method: " . $object->custom;
    say "raw read with <>: " . <$object>;

    my $buf;
    read($object, $buf, 10);
    say "raw read with read(): " . $buf;

    undef $buf;
    $object->read($buf, 10);
    say "OO read via IO::File::read (end): " . $buf;

    my $checksummer = Digest::MD5->new;
    $checksummer->addfile($object);
    say "Md5 read: " . $checksummer->hexdigest;

    my $dupto = IO::Handle->new;
    # Doesn't break/return undef; still not usable without implementing
    # more state sharing inside the object.
    say "Can dup handle: " . $dupto->fdopen($object, "r");

    my $archiver = Archive::Zip->new;
    # Dies, but long after the fdopen() call. Can be fixed by implementing more
    # PerlIO methods.
    $archiver->readFromFileHandle($object);
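As a rough idea of how the FILL approach might be adapted toward chunked reads, here is a stripped-down sketch. The package name `ChunkSource`, the `open_source` helper and the `$pending` variable are all hypothetical, and the callback is only a stand-in for something like a `pg_lo_read` call; nothing here touches a database, and SEEK, TELL and EOF are still not implemented.

```perl
use strict;
use warnings;

package ChunkSource;    # hypothetical name, for illustration only

# State handed from open_source() to PUSHED, because open() offers no other
# way to pass arguments to the layer when it is pushed.
our $pending;

# Called by PerlIO when the layer is pushed; the returned object becomes
# the invocant of the later layer callbacks.
sub PUSHED {
    my ($class, $mode, $fh) = @_;
    return bless { next_chunk => $pending }, $class;
}

# Called whenever PerlIO needs more data; returning undef signals EOF.
sub FILL {
    my ($self, $fh) = @_;
    my $chunk = $self->{next_chunk}->();
    return (defined $chunk && length $chunk) ? $chunk : undef;
}

# Hypothetical convenience constructor: pushes the layer on top of an
# empty in-memory handle, so all data comes from the callback.
sub open_source {
    my ($class, $next_chunk) = @_;
    local $pending = $next_chunk;
    my $backing = '';
    open(my $fh, "<:via($class)", \$backing)
        or die "Cannot push layer: $!";
    return $fh;
}

package main;

use Digest::MD5 qw(md5_hex);

# Stand-in for the database: a queue of chunks handed out one at a time.
my @chunks = ("alpha\n", "beta\n", "gamma\n");
my @queue  = @chunks;

my $fh = ChunkSource->open_source(sub { shift @queue });

my $from_layer = Digest::MD5->new->addfile($fh)->hexdigest;
my $expected   = md5_hex(join '', @chunks);

print "digests match: ", ($from_layer eq $expected ? "yes" : "no"), "\n";
```

Unlike the package above, this returns a plain file handle rather than an IO::File subclass, so it avoids the double-blessing issue but gives you no place for custom methods.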

Source: https://habr.com/ru/post/1259149/

