Ruby libraries for parsing .doc files?

Does anyone know of good Ruby libraries for parsing .doc files (and similar formats, such as .odt) to extract the text and, where possible, preserve formatting information for display on a website?

Being able to do the same for PDF files would be a bonus, but that is less important.

This is for a Rails project, if that helps at all.

Thanks in advance!

1 answer

Apache POI is a very popular way to access Word and Excel documents. There is a Ruby binding for POI that may be worth exploring, but it looks like you have to build it yourself. The API also does not feel very Ruby-like, since it is an almost direct port of the Java code, and it appears to have been tested only against Ruby 1.8.2.
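
To give a feel for what "a direct port from Java" means in practice, here is a minimal sketch that calls POI's Java classes from JRuby. It does not use the Ruby POI binding mentioned above. It assumes JRuby and the Apache POI jars (HWPF, the .doc support, ships in the poi-scratchpad artifact) are on the classpath, and the helper name `extract_doc_text` is made up for illustration. It only extracts plain text and does not preserve formatting.

```ruby
# Sketch only: assumes JRuby with the Apache POI jars on the classpath.
# extract_doc_text is a hypothetical helper name, not part of any library.
def extract_doc_text(path)
  unless RUBY_PLATFORM == 'java'
    raise NotImplementedError,
          'this sketch needs JRuby with Apache POI on the classpath'
  end

  require 'java'

  # Java classes are reached through JRuby's Java:: namespace.
  stream = Java::JavaIo::FileInputStream.new(path)
  begin
    # WordExtractor reads the legacy binary .doc format.
    extractor = Java::OrgApachePoiHwpfExtractor::WordExtractor.new(stream)
    extractor.getText # Java-style getter, returned as a Ruby String
  ensure
    stream.close
  end
end
```

Under JRuby, something like `puts extract_doc_text('report.doc')` would print the document text. The Java-style constructor and getter calls are a fair illustration of why such an API does not feel idiomatic in Ruby.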


Source: https://habr.com/ru/post/1310290/