Java library for reading Word documents

Is there an open source Java library for reading Word documents (both .docx and the earlier .doc format)?

Read-only access is sufficient; I do not need to modify Word documents from Java. However, I would like to have access to images and style information.

EDIT

I looked at Apache POI, but its Word component (HWPF) does not seem to be actively maintained. See http://poi.apache.org/hwpf/index.html:

At the moment, we, unfortunately, do not have someone who cares about the HWPF and contributes to its development.

+1
3 answers

Use Apache POI: HWPF for .doc files and XWPF for .docx files.
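Since the two formats need different POI components, it can help to check which container a file really uses before picking one. The following is a minimal sketch using only the JDK; `WordFormatSniffer` and its `Format` enum are hypothetical names introduced here, not part of Apache POI. It relies on the fact that binary .doc files are OLE2 compound files and .docx files are ZIP packages. Note that the signatures identify only the container, so other Office formats (for example .xls or .xlsx) would match as well.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper (not a POI class): guesses the container format
// of a file from its leading bytes.
class WordFormatSniffer {
    enum Format { DOC_OLE2, DOCX_ZIP, UNKNOWN }

    // OLE2 compound file signature, used by binary .doc files.
    private static final int[] OLE2_MAGIC = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    // ZIP local file header signature ("PK" 0x03 0x04), used by .docx packages.
    private static final int[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // Classifies a buffer holding the first bytes of a file.
    static Format detect(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) return Format.DOC_OLE2;
        if (startsWith(header, ZIP_MAGIC)) return Format.DOCX_ZIP;
        return Format.UNKNOWN;
    }

    // Reads up to 8 bytes from the file at the given path and classifies them.
    static Format detect(String path) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            byte[] header = new byte[8];
            int read = 0;
            while (read < header.length) {
                int n = in.read(header, read, header.length - read);
                if (n < 0) break; // end of file reached early
                read += n;
            }
            return detect(Arrays.copyOf(header, read));
        }
    }

    private static boolean startsWith(byte[] data, int[] magic) {
        if (data.length < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xFF) != magic[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java WordFormatSniffer <file>");
            return;
        }
        System.out.println(detect(args[0]));
    }
}
```

A `DOC_OLE2` result would point to HWPF and a `DOCX_ZIP` result to XWPF; an `UNKNOWN` result means the file is probably not a Word document in either format.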

+6

There is an Apache project that does this: http://poi.apache.org/

+5
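    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;

    import org.apache.poi.openxml4j.exceptions.OpenXML4JException;
    import org.apache.poi.openxml4j.opc.OPCPackage;
    import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
    import org.apache.xmlbeans.XmlException;

    public class XParseTest {
        public static void main(String[] args) throws XmlException, OpenXML4JException, IOException {
            File file = new File("e:\\testing\\new.docx");
            // Open the .docx package; both resources are closed automatically.
            try (FileInputStream fs = new FileInputStream(file);
                 OPCPackage d = OPCPackage.open(fs)) {
                // Extract the plain text of the document and print it.
                XWPFWordExtractor xw = new XWPFWordExtractor(d);
                System.out.println(xw.getText());
            }
        }
    }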

This parses the .docx file and prints its plain text.

0

Source: https://habr.com/ru/post/1389490/

