How can I extract data in a Word document using Perl?

How can I extract data from a Word .doc file using Perl?

+3
5 answers

If you are not on Windows, and so do not have access to Win32::OLE, I think the best way is to convert the document first. OpenOffice can do the conversion.

You can wrap the linked script in your Perl program. Although the linked article starts with PDF output, if you read on you will see that it can also convert to text. See also Fooobar.com/questions/565252/... .
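A minimal sketch of the conversion approach, assuming a LibreOffice-style `soffice` binary is on the PATH. The helper names `convert_command`, `converted_path` and `doc_to_text` are hypothetical, and the `--headless --convert-to txt:Text --outdir` options are the ones LibreOffice accepts; an older OpenOffice install may need a different route.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);
use File::Spec;

# Build the argument list for a headless conversion to plain text.
sub convert_command {
    my ($doc, $outdir) = @_;
    return ('soffice', '--headless', '--convert-to', 'txt:Text',
            '--outdir', $outdir, $doc);
}

# Path of the .txt file the converter is expected to write.
sub converted_path {
    my ($doc, $outdir) = @_;
    my ($name) = fileparse($doc, qr/\.[^.]*/);
    return File::Spec->catfile($outdir, "$name.txt");
}

# Convert the document and return its text (dies on failure).
sub doc_to_text {
    my ($doc, $outdir) = @_;
    system(convert_command($doc, $outdir)) == 0
        or die "soffice failed with status $?\n";
    my $txt = converted_path($doc, $outdir);
    open my $fh, '<', $txt or die "Cannot read $txt: $!\n";
    local $/;    # slurp the whole file
    return scalar <$fh>;
}

# Run only when a document is given on the command line.
print doc_to_text($ARGV[0], $ARGV[1] // '.') if @ARGV;
```

Passing the command as a list to `system` avoids shell quoting problems with file names that contain spaces.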

+2

On Windows, use Win32::OLE to script Word.

Otherwise, perhaps antiword?

+1
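use strict;
use warnings;
use Win32::OLE;
use Win32::OLE::Enum;

# Usage: perl script.pl output.txt input.doc
my ($out_file, $doc_file) = @ARGV;
die "Usage: $0 output.txt input.doc\n" unless defined $doc_file;

# Open the document through Word; an absolute path is the safest choice.
my $document = Win32::OLE->GetObject($doc_file)
    or die "Cannot open $doc_file: " . Win32::OLE->LastError() . "\n";
open(my $fh, '>', $out_file) or die "Cannot write $out_file: $!\n";

print "Extracting Text ...\n";

# Walk the paragraphs, writing a "+style" and an "=text" line for each one.
my $paragraphs = $document->Paragraphs();
my $enumerate  = Win32::OLE::Enum->new($paragraphs);
while (defined(my $paragraph = $enumerate->Next())) {
    my $style = $paragraph->{Style}->{NameLocal};
    print $fh "+$style\n";
    my $text = $paragraph->{Range}->{Text};
    $text =~ s/[\n\r]//g;    # drop the paragraph terminator
    $text =~ s/\x0b/\n/g;    # a vertical tab marks a manual line break
    print $fh "=$text\n";
}

close($fh) or die "Cannot close $out_file: $!\n";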

+1

On Windows, use COM to automate Word.

Elsewhere, try "catdoc" or libwv.
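As a rough sketch of the non-Windows route, the following assumes that `antiword` or `catdoc` is installed and that each prints the document text to standard output when given a file name. The helper names `run_converter` and `word_to_text` are hypothetical.

```perl
use strict;
use warnings;

# Run an external command and return everything it prints on stdout.
sub run_converter {
    my (@cmd) = @_;
    open my $pipe, '-|', @cmd or die "Cannot run $cmd[0]: $!\n";
    local $/;    # slurp all output at once
    my $text = <$pipe>;
    close $pipe or die "$cmd[0] exited with status $?\n";
    return $text // '';
}

# Try each tool in turn and return the first successful result.
sub word_to_text {
    my ($file, @tools) = @_;
    @tools = ('antiword', 'catdoc') unless @tools;
    for my $tool (@tools) {
        my $text = eval { run_converter($tool, $file) };
        return $text if defined $text;
    }
    die "No converter could read $file\n";
}

# Run only when a document is given on the command line.
print word_to_text($ARGV[0]) if @ARGV;
```

The list form of the piped `open` bypasses the shell, so odd characters in the file name are passed through untouched.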

0

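It depends on the Word format. A .docx file is really a .zip archive of XML files, so you can unpack it and parse the contents yourself. The older .doc format is a proprietary Microsoft binary format.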
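For the .docx case, here is a crude sketch that relies on the body text living in the `word/document.xml` member of the archive. The helper names `xml_to_text` and `docx_to_text` are hypothetical, and the regular expressions are only a stand-in for a real XML parser: they keep the `<w:t>` text runs of each `<w:p>` paragraph and ignore everything else.

```perl
use strict;
use warnings;
use IO::Uncompress::Unzip qw(unzip $UnzipError);

# The five predefined XML entities.
my %ENTITY = (lt => '<', gt => '>', quot => '"', apos => "'", amp => '&');

# Turn WordprocessingML into plain text: one line per <w:p> paragraph,
# built by joining the contents of its <w:t> text runs.
sub xml_to_text {
    my ($xml) = @_;
    my @lines;
    while ($xml =~ m{<w:p[\s>](.*?)</w:p>}gs) {
        my $para = $1;
        my $line = join '', $para =~ m{<w:t(?:\s[^>]*)?>(.*?)</w:t>}gs;
        $line =~ s/&(lt|gt|quot|apos|amp);/$ENTITY{$1}/g;
        push @lines, $line;
    }
    return join "\n", @lines;
}

# Pull word/document.xml out of the archive and convert it.
# The result is returned as raw bytes (normally UTF-8).
sub docx_to_text {
    my ($file) = @_;
    my $xml;
    unzip $file => \$xml, Name => 'word/document.xml'
        or die "Cannot read $file: $UnzipError\n";
    return xml_to_text($xml);
}

# Run only when a document is given on the command line.
print docx_to_text($ARGV[0]), "\n" if @ARGV;
```

This does not handle headers, footnotes, or tab and line-break elements, so treat it as a starting point only.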

0

Source: https://habr.com/ru/post/1712548/

