iTextSharp: reading text from a specific position

Question

I have a problem using iTextSharp to read data from a PDF file. I want to read only a certain part of a PDF page (only the address information, which is always in the same position). I have seen iTextSharp used to read the text of all pages, for example:

        StringBuilder text = new StringBuilder();

        if (File.Exists(fileName))
        {
            PdfReader pdfReader = new PdfReader(fileName);

            for (int page = 1; page <= pdfReader.NumberOfPages; page++)
            {
                ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
                string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);

                currentText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(currentText)));
                text.Append(currentText);
            }
            pdfReader.Close();
        }
        return text.ToString();

But how can I limit the extraction to one specific region of the page? I am open to any approach, even OCR, since some files may turn out to be images in the future (though not necessarily right now). This project is only for me, so there is no commercial use.

Thanks!

+4

c# ocr itextsharp

Robert J. Jun 12 '14 at 12:59

2

, , PDF Form, , .

0

d347hm4n Jun 12 '14 at 13:13

Bruno Lowagie · Accepted Answer · 2014-06-12T14:23:54+0000

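You are using SimpleTextExtractionStrategy instead of LocationTextExtractionStrategy. There is an example of this (Java/C#). If rect is a rectangle built from the coordinates of your address, you can restrict the extraction to it like this: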

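// rect: an iTextSharp.text.Rectangle covering the address area, in user units.
// reader: an open PdfReader for the file.
RenderFilter[] filter = { new RegionTextRenderFilter(rect) };
StringBuilder sb = new StringBuilder();
for (int i = 1; i <= reader.NumberOfPages; i++) {
    // Use a fresh strategy for each page so text does not carry over between pages.
    ITextExtractionStrategy strategy = new FilteredTextRenderListener(new LocationTextExtractionStrategy(), filter);
    sb.AppendLine(PdfTextExtractor.GetTextFromPage(reader, i, strategy));
}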

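This gives you all the text that intersects with rect (so part of the text may lie outside rect, because iText does not cut text into pieces).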

Note that you can get the MediaBox of a page like this:

Rectangle mediabox = reader.GetPageSize(pagenum);

The lower-left corner is at x = mediabox.Left, y = mediabox.Bottom; the upper-right corner is at x = mediabox.Right, y = mediabox.Top.

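The x values increase from left to right; the y values increase from bottom to top. The unit of measurement in PDF is called the "user unit". By default one user unit equals one point (this can be changed, but you will rarely find a PDF with a different UserUnit value). 72 points = 1 inch.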
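Since the y axis runs bottom-up and most people measure an address block from the top-left corner of the page, a small conversion helper can be useful. The following is only a sketch: the class PdfUnits and its method names are hypothetical (not part of iTextSharp), and it assumes the default of one user unit per point.

```csharp
using System;

// Hypothetical helper, not part of iTextSharp. Assumes 1 user unit = 1 point.
public static class PdfUnits
{
    public const float PointsPerInch = 72f;

    // Converts a length in inches to points.
    public static float InchesToPoints(float inches)
    {
        return inches * PointsPerInch;
    }

    // Converts a length in millimeters to points (25.4 mm = 1 inch).
    public static float MillimetersToPoints(float mm)
    {
        return mm / 25.4f * PointsPerInch;
    }

    // Converts a box measured from the top-left corner of the page
    // (x to the right, y downward, all values in points) into PDF
    // coordinates: lower-left (Llx, Lly) and upper-right (Urx, Ury).
    // pageLeft and pageTop are meant to come from mediabox.Left and mediabox.Top.
    public static (float Llx, float Lly, float Urx, float Ury) FromTopLeft(
        float pageLeft, float pageTop, float x, float y, float width, float height)
    {
        float llx = pageLeft + x;   // left edge of the box
        float ury = pageTop - y;    // top edge: y grows upward in PDF
        return (llx, ury - height, llx + width, ury);
    }
}
```

For example, on a page whose MediaBox has Left = 0 and Top = 842 (A4), a box that starts 1 inch from the left and 2 inches from the top, 3 inches wide and 1 inch tall, comes out as llx = 72, lly = 626, urx = 288, ury = 698. Those four values can then be passed to new iTextSharp.text.Rectangle(llx, lly, urx, ury) to build the rect used in the filter.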
