How can I parse the content of a PDF page using Swift?

The documentation is not entirely clear to me. So far, I believe that I need to configure a CGPDFOperatorTable, and then call CGPDFContentStreamCreateWithPage and CGPDFScannerCreate for each page of the PDF.

The documentation talks about setting up callbacks, but it is not clear to me how to do this. How do I get the content out of the page?

This is my code so far.

    let pdfURL = NSBundle.mainBundle().URLForResource("titleofdocument", withExtension: "pdf")

    // Create pdf document
    let pdfDoc = CGPDFDocumentCreateWithURL(pdfURL)

    // Nr of pages in this PDF
    let numberOfPages = CGPDFDocumentGetNumberOfPages(pdfDoc) as Int

    if numberOfPages <= 0 {
        // The number of pages is zero
        return
    }

    let myTable = CGPDFOperatorTableCreate()

    // lets go through every page
    for pageNr in 1...numberOfPages {
        let thisPage = CGPDFDocumentGetPage(pdfDoc, pageNr)
        let myContentStream = CGPDFContentStreamCreateWithPage(thisPage)
        let myScanner = CGPDFScannerCreate(myContentStream, myTable, nil)
        CGPDFScannerScan(myScanner)

        // Search for Content here?
        // ??

        CGPDFScannerRelease(myScanner)
        CGPDFContentStreamRelease(myContentStream)
    }

    // Release Table
    CGPDFOperatorTableRelease(myTable)

There is a similar question, "PDF analysis using SWIFT", but it has no answer yet.

+5
2 answers

Here is an example of callbacks implemented in Swift:

    let operatorTableRef = CGPDFOperatorTableCreate()

    CGPDFOperatorTableSetCallback(operatorTableRef, "BT") { (scanner, info) in
        print("Begin text object")
    }
    CGPDFOperatorTableSetCallback(operatorTableRef, "ET") { (scanner, info) in
        print("End text object")
    }
    CGPDFOperatorTableSetCallback(operatorTableRef, "Tf") { (scanner, info) in
        print("Select font")
    }
    CGPDFOperatorTableSetCallback(operatorTableRef, "Tj") { (scanner, info) in
        print("Show text")
    }
    CGPDFOperatorTableSetCallback(operatorTableRef, "TJ") { (scanner, info) in
        print("Show text, allowing individual glyph positioning")
    }

    let numPages = CGPDFDocumentGetNumberOfPages(pdfDocument)
    for pageNum in 1...numPages {
        let page = CGPDFDocumentGetPage(pdfDocument, pageNum)
        let stream = CGPDFContentStreamCreateWithPage(page)
        let scanner = CGPDFScannerCreate(stream, operatorTableRef, nil)
        CGPDFScannerScan(scanner)
        CGPDFScannerRelease(scanner)
        CGPDFContentStreamRelease(stream)
    }
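The callbacks above only report that an operator was seen. To get at the content itself, a callback can pop the operator's operands off the scanner's stack. Here is a minimal sketch in current Swift syntax, assuming you already have a `CGPDFPage` from an opened document. `logTextOperators` is a made-up helper name, and the text returned for `Tj` is only likely to be readable when the font uses a standard encoding:

```swift
import Foundation
import CoreGraphics

// Hypothetical helper: scans one page and prints the operands of Tf and Tj.
func logTextOperators(on page: CGPDFPage) {
    guard let table = CGPDFOperatorTableCreate() else { return }
    defer { CGPDFOperatorTableRelease(table) }

    // Tf has two operands (font name, then size); they are popped in reverse order.
    CGPDFOperatorTableSetCallback(table, "Tf") { scanner, _ in
        var size: CGPDFReal = 0
        var name: UnsafePointer<CChar>?
        guard CGPDFScannerPopNumber(scanner, &size),
              CGPDFScannerPopName(scanner, &name),
              let fontName = name else { return }
        print("Select font \(String(cString: fontName)) at size \(size)")
    }

    // Tj has one operand: the string to show.
    CGPDFOperatorTableSetCallback(table, "Tj") { scanner, _ in
        var operand: CGPDFStringRef?
        guard CGPDFScannerPopString(scanner, &operand),
              let pdfString = operand,
              let text = CGPDFStringCopyTextString(pdfString) else { return }
        print("Show text: \(text as String)")
    }

    let stream = CGPDFContentStreamCreateWithPage(page)
    let scanner = CGPDFScannerCreate(stream, table, nil)
    CGPDFScannerScan(scanner)
    CGPDFScannerRelease(scanner)
    CGPDFContentStreamRelease(stream)
}
```

The `TJ` operator is not handled here; its single operand is an array mixing strings and spacing adjustments, so it would need `CGPDFScannerPopArray` and a loop over the elements.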
+4

You have already outlined how to do this; all that is left is to put the pieces together and experiment until it works.

First of all, you need to set up an operator table with callbacks, as you say yourself at the beginning of your question (all code here is Objective-C, NOT Swift):

    CGPDFOperatorTableRef operatorTable = CGPDFOperatorTableCreate();
    CGPDFOperatorTableSetCallback(operatorTable, "q", &op_q);
    CGPDFOperatorTableSetCallback(operatorTable, "Q", &op_Q);

This table lists the PDF operators you want to be notified about and associates a callback with each of them. These callbacks are just functions that you define elsewhere:

    static void op_q(CGPDFScannerRef s, void *info) {
        // Do whatever you have to do in here
        // info is whatever you passed to CGPDFScannerCreate
    }

    static void op_Q(CGPDFScannerRef s, void *info) {
        // Do whatever you have to do in here
        // info is whatever you passed to CGPDFScannerCreate
    }

Then you create a scanner and run it, passing it the operator table that you just defined:

    // Passing "self" is just an example, you can pass whatever you want and it will be provided to your callback whenever it is called by the scanner.
    CGPDFScannerRef contentStreamScanner = CGPDFScannerCreate(contentStream, operatorTable, self);

    CGPDFScannerScan(contentStreamScanner);
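In Swift the `info` argument is an `UnsafeMutableRawPointer?`, and a closure used as a callback cannot capture surrounding variables, so any state has to travel through that pointer. Here is a sketch of one way to do it, assuming a hypothetical `OperatorCounter` class that tallies the `q` and `Q` operators registered above (the class and function names are made up for illustration):

```swift
import Foundation
import CoreGraphics

// Hypothetical state object handed to the callbacks through `info`.
final class OperatorCounter {
    var saves = 0      // number of "q" operators seen
    var restores = 0   // number of "Q" operators seen
}

func countGraphicsStateOperators(on page: CGPDFPage) -> OperatorCounter {
    let counter = OperatorCounter()
    guard let table = CGPDFOperatorTableCreate() else { return counter }
    defer { CGPDFOperatorTableRelease(table) }

    CGPDFOperatorTableSetCallback(table, "q") { _, info in
        guard let info = info else { return }
        // Turn the raw pointer back into the object passed to CGPDFScannerCreate.
        Unmanaged<OperatorCounter>.fromOpaque(info).takeUnretainedValue().saves += 1
    }
    CGPDFOperatorTableSetCallback(table, "Q") { _, info in
        guard let info = info else { return }
        Unmanaged<OperatorCounter>.fromOpaque(info).takeUnretainedValue().restores += 1
    }

    // `counter` is returned below, so it outlives the scan and an
    // unretained pointer is enough here.
    let info = Unmanaged.passUnretained(counter).toOpaque()
    let stream = CGPDFContentStreamCreateWithPage(page)
    let scanner = CGPDFScannerCreate(stream, table, info)
    CGPDFScannerScan(scanner)
    CGPDFScannerRelease(scanner)
    CGPDFContentStreamRelease(stream)
    return counter
}
```

If the state object could be deallocated before the scan finishes, `passRetained` with a matching `release()` would be the safer choice.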

If you want to see a complete example with source code on how to find and process images, check out this site.

+1

Source: https://habr.com/ru/post/1245421/

