Get text paragraph from pdf using itextsharp

Question


Is there any way to get paragraph text from a PDF file using iTextSharp? I know that PDF only stores runs of text, so it is difficult to determine which lines belong to which paragraph, and I also know that there are no <p> or similar tags to mark a paragraph in a PDF. I tried to use the coordinates of the text runs to rebuild the paragraphs, but with no luck :(. My code is here:

    private StringBuilder result = new StringBuilder();
    private Vector lastBaseLine;

    // Text of each completed line
    public List<string> strings = new List<string>();

    // Baseline Y coordinate of each completed line
    public List<float> baselines = new List<float>();

    public void RenderText(iTextSharp.text.pdf.parser.TextRenderInfo renderInfo)
    {
        Vector curBaseline = renderInfo.GetBaseline().GetStartPoint();

        // A change in the baseline Y coordinate means a new line has started,
        // so store the line collected so far.
        if ((this.lastBaseLine != null) && (curBaseline[Vector.I2] != lastBaseLine[Vector.I2]))
        {
            if (!string.IsNullOrEmpty(this.result.ToString()))
            {
                this.baselines.Add(this.lastBaseLine[Vector.I2]);
                this.strings.Add(this.result.ToString());
            }
            result = new StringBuilder();
        }

        this.result.Append(renderInfo.GetText());
        this.lastBaseLine = curBaseline;
    }

Does anybody have an approach to this problem?


c# asp.net pdf-parsing itextsharp

Bibek gautam Jun 14 '13 at 5:39


1 answer

Vinoth Ezhilan M · Answer 1 · 2013-08-05T12:07:12+0000

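    // Required namespaces: System.IO, iTextSharp.text, iTextSharp.text.pdf
    using (MemoryStream ms = new MemoryStream())
    {
        // Build a one-paragraph A4 document in memory.
        Document document = new Document(PageSize.A4, 25, 25, 30, 30);
        PdfWriter writer = PdfWriter.GetInstance(document, ms);
        document.Open();
        document.Add(new Paragraph("Hello World"));
        document.Close();
        writer.Close();

        // ToArray() returns only the bytes written; GetBuffer() also
        // includes the stream's unused capacity.
        byte[] pdfBytes = ms.ToArray();

        // Send the PDF to the browser as a download.
        Response.ContentType = "application/pdf";
        Response.AddHeader("content-disposition", "attachment;filename=\"First PDF document.pdf\"");
        Response.OutputStream.Write(pdfBytes, 0, pdfBytes.Length);
    }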

Here is an example to help you with this.

This may not be exactly what you need, but it may help you.
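For the grouping step the question asks about, one possible heuristic is to start a new paragraph wherever the vertical gap between two consecutive baselines is noticeably larger than the typical line spacing. The following is only a minimal sketch under that assumption, not an iTextSharp feature: `ParagraphGrouper`, `Group` and `gapFactor` are hypothetical names, and the two inputs are assumed to be the `strings` and `baselines` lists filled by the `RenderText` method in the question. It uses no iTextSharp calls, so it can be tried on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not part of iTextSharp): merges extracted lines into
// paragraphs based on the vertical distance between their baselines.
public static class ParagraphGrouper
{
    // lines[i] is the text of line i and baselines[i] its baseline Y coordinate.
    // A new paragraph starts when the gap to the previous line exceeds
    // gapFactor times the median gap. gapFactor is an assumed tuning value.
    public static List<string> Group(IList<string> lines, IList<float> baselines, float gapFactor = 1.5f)
    {
        if (lines.Count != baselines.Count)
            throw new ArgumentException("lines and baselines must have the same length");

        var paragraphs = new List<string>();
        if (lines.Count == 0)
            return paragraphs;

        // Distance between consecutive baselines. Math.Abs keeps the gap
        // positive whether Y decreases down the page or jumps at a page break.
        var gaps = new List<float>();
        for (int i = 1; i < baselines.Count; i++)
            gaps.Add(Math.Abs(baselines[i - 1] - baselines[i]));

        // Use the median gap as the "typical" line spacing.
        float typicalGap = 0f;
        if (gaps.Count > 0)
        {
            var sorted = new List<float>(gaps);
            sorted.Sort();
            typicalGap = sorted[sorted.Count / 2];
        }

        var current = new StringBuilder(lines[0].Trim());
        for (int i = 1; i < lines.Count; i++)
        {
            if (gaps[i - 1] > typicalGap * gapFactor)
            {
                // Larger-than-usual gap: close the current paragraph.
                paragraphs.Add(current.ToString());
                current = new StringBuilder(lines[i].Trim());
            }
            else
            {
                // Normal line spacing: the line continues the paragraph.
                current.Append(' ').Append(lines[i].Trim());
            }
        }
        paragraphs.Add(current.ToString());
        return paragraphs;
    }
}
```

It could be called as `ParagraphGrouper.Group(listener.strings, listener.baselines)`, where `listener` stands for an instance of the class from the question. Note that, as far as the posted snippet shows, `RenderText` never stores the last line it collects, so that line would have to be added to both lists before grouping. The heuristic will not work for documents that separate paragraphs by indentation instead of extra spacing, or for multi-column layouts.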
