I have an application that accesses a PostgreSQL database and, depending on the processing required, needs to read large binary data from it. This can be hundreds of MB or even several GB of data. Please don't discuss using the file system or the like instead; things are the way they are for now.
The data is simply files of various types, e.g. a Zip container or some other kind of archive. Some of the necessary processing is listing the contents of the Zip, maybe even extracting some members for further processing, maybe hashing the stored data... In the end, the data is read multiple times but written only once, when it is stored.
All the Perl libraries I use are able to work with file handles, some with IO::Handle, others with IO::String or IO::Scalar, some others only with low-level file handles. So I created a subclass of IO::Handle and IO::Seekable which acts as a wrapper for the corresponding methods around DBD::Pg. In the constructor, I create a database connection, open the provided LOID for reading and store the handle provided by Postgres in the instance. My own handle object is then passed to whoever is able to work with such a file handle and can read and seek directly within the blob provided by Postgres.
The problem is libraries that use low-level file descriptors or low-level file operations on an IO::Handle. Digest::MD5 seems to be one, Archive::Zip another. Digest::MD5 croaks and tells me that no handle has been provided; Archive::Zip, on the other hand, tries to create a new handle of its own from mine by calling IO::Handle::fdopen, which fails in my case.
    sub fdopen {
        @_ == 3 or croak 'usage: $io->fdopen(FD, MODE)';
        my ($io, $fd, $mode) = @_;
        local(*GLOB);

        if (ref($fd) && "".$fd =~ /GLOB\(/o) {
I guess the problem is the low-level duplication of the handle, which leaves my own instance behind, so the new handle no longer has my database connection and all that.
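The failure can be reproduced without any database. The following is a minimal sketch, assuming that a never-opened IO::Handle object is a fair stand-in for my subclass (which also has no OS-level descriptor behind its glob); the variable names are only for illustration.

```perl
use strict;
use warnings;
use IO::Handle;

# Assumption for illustration: an IO::Handle that was never opened behaves
# like a handle object whose data lives somewhere other than an OS file.
my $virtual = IO::Handle->new;

# fileno() has nothing to report, so there is nothing the OS could duplicate.
my $fd = $virtual->fileno;
print defined $fd ? "descriptor: $fd\n" : "no descriptor\n";

# fdopen() boils down to open($io, '<&...'), i.e. duplicating the source
# handle at the OS level, which cannot work without a descriptor.
my $copy = IO::Handle->new->fdopen($virtual, 'r');
print defined $copy ? "duplicate created\n" : "fdopen failed: $!\n";
```

Even if the duplication succeeded, the result would be a fresh handle that knows nothing about the methods or the state of the original object.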
So, is it even possible in my case to provide an IO::Handle that can be used successfully wherever a low-level file descriptor was expected?
I mean, I don't have a real file descriptor; I only have an object whose method calls are wrapped around the corresponding Postgres methods, which in turn need a database handle. All of that state needs to be stored somewhere, the wrapping needs to be done, etc.
I tried to do what others do, like IO::String, which additionally uses tie, for example. But in the end that use case is different, because Perl is able to create a real low-level file handle to some internal memory on its own. That is something which is not supported at all in my case. I need to keep my instance around, because only it knows about the database handle, etc.
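For reference, this is roughly what the tie approach looks like. It is only a sketch: TiedBlob is a hypothetical class, and an in-memory string stands in for the Postgres large object so that the example runs on its own.

```perl
use strict;
use warnings;
use Symbol ();

# Hypothetical tie class: Perl's builtin I/O functions are routed to these
# methods. The string in {data} stands in for the database-backed blob.
package TiedBlob;

sub TIEHANDLE {
    my ($class, $data) = @_;
    return bless { data => $data, pos => 0 }, $class;
}

# Called for read($fh, $buf, $len); after the shift, $_[0] aliases the
# caller's buffer. A buffer offset is not supported in this sketch.
sub READ {
    my $self   = shift;
    my $length = $_[1];
    my $left   = length($self->{data}) - $self->{pos};
    $left   = 0     if $left < 0;
    $length = $left if $length > $left;
    $_[0] = $length ? substr($self->{data}, $self->{pos}, $length) : '';
    $self->{pos} += $length;
    return $length;
}

# Called for seek($fh, $offset, $whence).
sub SEEK {
    my ($self, $offset, $whence) = @_;
    my $base = $whence == 0 ? 0
             : $whence == 1 ? $self->{pos}
             :                length $self->{data};
    my $target = $base + $offset;
    return 0 if $target < 0;
    $self->{pos} = $target;
    return 1;
}

sub TELL   { return $_[0]{pos} }
sub CLOSE  { return 1 }

# There is still no OS-level descriptor to hand out.
sub FILENO { return undef }

package main;

my $fh = Symbol::gensym();
tie *$fh, 'TiedBlob', 'hello tied world';

seek($fh, 6, 0);
read($fh, my $buf, 4);
print "$buf\n";          # prints "tied"
print tell($fh), "\n";   # prints 10
```

This makes the builtins read, seek and tell work on the handle, but fileno still has nothing to return, so as far as I can tell any code that goes straight to the underlying descriptor bypasses the tie as well.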
Using my handle like an IO::Handle by calling the read method works as expected, but I would like to take it a bit further and be more compatible with code that doesn't expect to work on IO::Handle objects, in the same way that IO::String or File::Temp objects can be used as low-level file handles.
    package ReadingHandle;

    use strict;
    use warnings;
    use 5.10.1;

    use base 'IO::Handle', 'IO::Seekable';

    use Carp ();

    sub new {
        my $invocant = shift || Carp::croak('No invocant given.');
        my $db       = shift || Carp::croak('No database connection given.');
        my $loid     = shift // Carp::croak('No LOID given.');

        my $dbHandle = $db->_getHandle();
        my $self     = $invocant->SUPER::new();

        # The instance is a glob reference, so the state lives in the glob's hash slot.
        *$self->{'dbHandle'} = $dbHandle;
        *$self->{'loid'}     = $loid;

        my $loidFd = $dbHandle->pg_lo_open($loid, $dbHandle->{pg_INV_READ});
        *$self->{'loidFd'} = $loidFd;

        if (!defined($loidFd)) {
            Carp::croak("The provided LOID couldn't be opened.");
        }

        return $self;
    }

    sub DESTROY {
        my $self = shift || Carp::croak('The method needs to be called with an instance.');

        $self->close();
    }

    sub _getDbHandle {
        my $self = shift || Carp::croak('The method needs to be called with an instance.');

        return *$self->{'dbHandle'};
    }

    sub _getLoid {
        my $self = shift || Carp::croak('The method needs to be called with an instance.');

        return *$self->{'loid'};
    }

    sub _getLoidFd {
        my $self = shift || Carp::croak('The method needs to be called with an instance.');

        return *$self->{'loidFd'};
    }

    sub binmode {
        my $self = shift || Carp::croak('The method needs to be called with an instance.');

        return 1;
    }

    sub close {
        my $self = shift || Carp::croak('The method needs to be called with an instance.');

        my $dbHandle = $self->_getDbHandle();
        my $loidFd   = $self->_getLoidFd();

        return $dbHandle->pg_lo_close($loidFd);
    }

    sub opened {
        my $self = shift || Carp::croak('The method needs to be called with an instance.');

        my $loidFd = $self->_getLoidFd();

        return defined($loidFd) ? 1 : 0;
    }

    sub read {
        my $self   = shift || Carp::croak('The method needs to be called with an instance.');
        my $buffer = \shift // Carp::croak('No buffer given.');
        my $length = shift // Carp::croak('No amount of bytes to read given.');
        my $offset = shift || 0;

        if ($offset > 0) {
            Carp::croak('Using an offset is not supported.');
        }

        my $dbHandle = $self->_getDbHandle();
        my $loidFd   = $self->_getLoidFd();

        return $dbHandle->pg_lo_read($loidFd, $buffer, $length);
    }

    sub seek {
        my $self   = shift || Carp::croak('The method needs to be called with an instance.');
        my $offset = shift // Carp::croak('No offset given.');
        my $whence = shift // Carp::croak('No whence given.');

        if ($offset < 0) {
            Carp::croak('Using a negative offset is not supported.');
        }
        if ($whence != 0) {
            Carp::croak('Using a whence other than 0 is not supported.');
        }

        my $dbHandle = $self->_getDbHandle();
        my $loidFd   = $self->_getLoidFd();
        my $retVal   = $dbHandle->pg_lo_lseek($loidFd, $offset, $whence);
        $retVal      = defined($retVal) ? 1 : 0;

        return $retVal;
    }

    sub tell {
        my $self = shift || Carp::croak('The method needs to be called with an instance.');

        my $dbHandle = $self->_getDbHandle();
        my $loidFd   = $self->_getLoidFd();

        # pg_lo_tell reports the current position; pg_lo_lseek needs an offset and a whence.
        my $retVal = $dbHandle->pg_lo_tell($loidFd);
        $retVal    = defined($retVal) ? $retVal : -1;

        return $retVal;
    }

    1;
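Because the read method of the class above works, a consumer that only needs a stream of bytes can be fed by hand instead of being given the handle. The following is a sketch for the hashing case: md5_hex_via_read and InMemoryBlob are hypothetical names, and the in-memory class merely stands in for the database-backed handle so that the example runs without Postgres. It does not help with libraries that insist on a real descriptor.

```perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical stand-in for the database-backed handle: it offers only an
# object-level read($buffer, $length) and has no OS-level descriptor.
package InMemoryBlob;

sub new {
    my ($class, $data) = @_;
    return bless { data => $data, pos => 0 }, $class;
}

sub read {
    my $self   = shift;
    my $length = $_[1];
    my $left   = length($self->{data}) - $self->{pos};
    $length = $left if $length > $left;
    # $_[0] aliases the caller's buffer variable.
    $_[0] = $length ? substr($self->{data}, $self->{pos}, $length) : '';
    $self->{pos} += $length;
    return $length;
}

package main;

# Hash the content by pulling chunks through the object's read() method
# and passing them to Digest::MD5's add() instead of addfile().
sub md5_hex_via_read {
    my ($handle, $chunk_size) = @_;
    $chunk_size ||= 64 * 1024;

    my $md5 = Digest::MD5->new;
    while (1) {
        my $got = $handle->read(my $buffer, $chunk_size);
        die "read failed\n" unless defined $got;
        last if $got == 0;
        $md5->add($buffer);
    }
    return $md5->hexdigest;
}

my $blob = InMemoryBlob->new('some stored bytes' x 1000);
print md5_hex_via_read($blob, 4096), "\n";
```

Reading in fixed-size chunks keeps memory use bounded even for blobs of several GB.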