ASP.NET library for extracting plaintext from Open XML file formats

Question

Is there an existing library for extracting text from Open XML file formats (such as docx, pptx and xlsx)?

I need the extracted text to populate a Lucene.NET index.

I found this example, which extracts text from docx and seems to work fine. Before building my own solution on top of it, I would like to know whether something is already available that also covers the other file formats.
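For context, here is a minimal sketch of what a hand-rolled solution would involve. It is not the code from the example mentioned above. It assumes the standard Open XML package layout (a zip archive with text in `word/document.xml`, `ppt/slides/slideN.xml` and `xl/sharedStrings.xml`) and is written in Python only to keep it short; the same approach should carry over to .NET with `System.IO.Packaging` or the Open XML SDK. The names `extract_text` and `FORMATS` are made up for this illustration.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs used by the Open XML schemas.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
A = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
S = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"

# kind -> (pattern for parts holding text, block-level tag, text-run tag)
# Assumption: only the main text parts are read. Headers, footers, notes,
# comments and numeric or inline-string spreadsheet cells are ignored.
FORMATS = {
    "docx": (r"word/document\.xml$", W + "p", W + "t"),
    "pptx": (r"ppt/slides/slide\d+\.xml$", A + "p", A + "t"),
    "xlsx": (r"xl/sharedStrings\.xml$", S + "si", S + "t"),
}


def extract_text(source, kind):
    """Return the plain text of an Open XML package, one block per line.

    `source` is a path or a binary file-like object; `kind` is one of
    the keys of FORMATS.
    """
    part_pattern, block_tag, text_tag = FORMATS[kind]
    lines = []
    with zipfile.ZipFile(source) as package:
        # Part names are sorted lexicographically, so slide10 comes before
        # slide2. That does not matter for indexing, but it is not
        # presentation order.
        for name in sorted(package.namelist()):
            if not re.match(part_pattern, name):
                continue
            root = ET.fromstring(package.read(name))
            for block in root.iter(block_tag):
                # A paragraph or shared string may be split into several runs.
                text = "".join(t.text or "" for t in block.iter(text_tag))
                if text:
                    lines.append(text)
    return "\n".join(lines)
```

Calling `extract_text("report.docx", "docx")` (the file name is a placeholder) would return a single string that can be passed to the indexer. Each format needs its own part names and tags, which is why a ready-made library would be preferable.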

+4

asp.net lucene.net openxml

Myster May 6, '10 at 3:37

3 answers

See aspose.com; they have a good library that handles both ppt and pptx.

0

Yaroslav yakovlev Jun 27 '10 at 4:50

You can try Toxy, a text/data extraction framework for .NET. It currently supports xls, xlsx, doc and docx. Support for pptx is coming very soon, in version 1.5.

See here for more details.

0

Tony qu Mar 05 '15 at 23:14