Convert to PDF/A and check compliance on Linux

I am working on an online portal where researchers can upload their research papers. One of the requirements is that all PDF files are stored in PDF/A format. Since I cannot rely on users to create PDF/A documents, I need a tool to check standard PDF files and convert them to PDF/A.

What is the best tool you know of, considering:

  • Price
  • Quality
  • Speed
  • Available APIs

Open source tools are preferred, but my search did not turn up any. iText can create PDF/A, but converting with it is not easy, since you need to read each page and copy it to a new document, losing all the bookmarks and annotations in the process. (At least as far as I know; if you know of a simple solution, let me know.)

APIs must be available for PHP, Java, or the command line. Please do not suggest GUI-only or online-only solutions.

+17
java linux php pdf pdfa
Jan 21 '09 at 9:14
5 answers

I am not sure that all your goals can be met at the same time. The story around PDF/A is much more complicated than format conversions like TIFF to PNG.

  • The base format is PDF 1.4: what should happen with documents in a higher version that use features of those higher versions? Information may be lost.
  • In both PDF/A-1a and PDF/A-1b, XMP/RDF metadata is required. If the source document has no metadata, you need to get it from somewhere and add it. At least iText can do this.
  • There are many small details to get right, from embedding fonts to making sure that space characters are present, not just horizontal movement commands.
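To illustrate the version issue in the first bullet, here is a minimal sketch (Java 11+, standard library only) that reads the `%PDF-x.y` header and flags files declaring a version above 1.4. The class `PdfHeaderVersion` and its methods are hypothetical names made up for this example, not part of iText or any other library. It is only a coarse pre-check: a document can also declare its version in its catalog, and a header of 1.4 or lower says nothing about PDF/A conformance.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical pre-check helper; not part of any PDF library. */
final class PdfHeaderVersion {

    // The header is expected near the start of the file, e.g. "%PDF-1.6".
    private static final Pattern HEADER = Pattern.compile("%PDF-(\\d)\\.(\\d)");

    /** Returns {major, minor} parsed from the header text, or null if no header is found. */
    static int[] parse(String head) {
        Matcher m = HEADER.matcher(head);
        if (!m.find()) {
            return null;
        }
        return new int[] { Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)) };
    }

    /** True when the declared version is newer than PDF 1.4. */
    static boolean newerThanPdf14(int[] version) {
        return version[0] > 1 || (version[0] == 1 && version[1] > 4);
    }

    /** Reads up to the first 1024 bytes of the file and parses the header from them. */
    static int[] readVersion(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = in.readNBytes(1024);
            // ISO-8859-1 maps every byte to one char, so binary data cannot break decoding.
            return parse(new String(buf, StandardCharsets.ISO_8859_1));
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PdfHeaderVersion <file.pdf>");
            return;
        }
        int[] v = readVersion(Path.of(args[0]));
        if (v == null) {
            System.out.println("No PDF header found");
        } else if (newerThanPdf14(v)) {
            System.out.println("Declares PDF " + v[0] + "." + v[1] + ": features may be lost on conversion");
        } else {
            System.out.println("Declares PDF " + v[0] + "." + v[1]);
        }
    }
}
```

A portal could run a check like this at upload time to warn the user early, before attempting any real validation or conversion.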

To summarize: I believe you are better off putting some or all of the responsibility for conformance on the producers of the PDF files. Of course, this does not mean that you cannot help them: if you find out which tools most of them use to create their documents, you can provide documentation about PDF/A for those specific tools. (See this for an extreme example of such documentation.)

Good luck with your efforts.

+8
Jan 21 '09 at 22:19

I worked at the French National Library to create an archival system that did similar things. Like most of the top ten libraries in the world, we used JHOVE to recognize file formats.

JHOVE can tell whether files are PDF/A or not, and can even validate them. It also knows 7 other PDF profiles; see the details.

JHOVE is open source, supported by JSTOR and the Harvard University Library. It is quite simple to use.
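As a rough sketch of how this could be wired into a Java backend, the code below calls the JHOVE command line with its PDF module (`-m PDF-hul`) and scans the plain-text report. The class `JhoveCheck` and its methods are hypothetical names for this example. It assumes a `jhove` launcher is on the PATH and that the report contains `Status:` and `Profile:` lines; the exact wording may differ between JHOVE versions, so check it against the output of your installation.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical wrapper around the JHOVE command line; requires Java 9+. */
final class JhoveCheck {

    /** Builds the command that runs JHOVE's PDF module on one file. */
    static List<String> command(String jhoveExecutable, String pdfPath) {
        return List.of(jhoveExecutable, "-m", "PDF-hul", pdfPath);
    }

    /**
     * Returns true when the report shows a valid file with a PDF/A profile.
     * Assumption: the text report has lines such as
     * "Status: Well-Formed and valid" and "Profile: ISO PDF/A-1, Level B".
     */
    static boolean reportsPdfA(String report) {
        boolean valid = false;
        boolean pdfa = false;
        for (String line : report.split("\\R")) {
            String t = line.trim();
            if (t.startsWith("Status:") && t.contains("Well-Formed and valid")) {
                valid = true;
            }
            if (t.startsWith("Profile:") && t.contains("PDF/A")) {
                pdfa = true;
            }
        }
        return valid && pdfa;
    }

    /** Runs JHOVE and returns its combined standard output and error as text. */
    static String run(String jhoveExecutable, String pdfPath)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command(jhoveExecutable, pdfPath))
                .redirectErrorStream(true)
                .start();
        String out = new String(p.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        p.waitFor();
        return out;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java JhoveCheck <file.pdf>");
            return;
        }
        String report = run("jhove", args[0]);
        System.out.println(reportsPdfA(report)
                ? "JHOVE reports a PDF/A profile"
                : "JHOVE does not report a PDF/A profile");
    }
}
```

Keeping the command building and the report parsing in separate methods makes the parsing testable without JHOVE installed. Note that this only covers checking; it does not convert anything.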

+8
Jun 09 '10 at 9:30

For the identification part, you can try the DROID tool (Digital Record Object Identification), which provides access to the PRONOM technical registry (which includes PDF/A).

+3
Jan 21 '09 at 22:39

The OpenOffice API project may be what you are looking for. Starting with version 2.4, OpenOffice supports PDF/A documents. There is sample code on the website showing how to convert documents; the example is in Java.

+1
Jan 21 '09 at 16:48

I'm not sure about PDF/A docs, but have you looked at JODConverter? It can convert many different formats for you, and it is open source. We use it quite widely in our project.

http://www.artofsolving.com/opensource/jodconverter

0
Jan 21 '09 at 13:43


