iTextSharp: Replace text in an existing PDF without losing formatting

Question

I have searched the Internet for two weeks and found some interesting approaches, but none of them solves my problem.

My goal is to do the following:

I want to find a piece of text in a static PDF file and replace it with other text, while keeping the layout and formatting of the content. Is it really that hard?

I found a way, but it loses all formatting information:

    using (PdfReader reader = new PdfReader(path))
    {
        StringBuilder text = new StringBuilder();
        for (int i = 1; i <= reader.NumberOfPages; i++)
        {
            // Extracts plain text only; fonts, positions and layout are discarded
            text.Append(PdfTextExtractor.GetTextFromPage(reader, i));
        }
        // Replace once, after the text of all pages has been collected
        text.Replace(txt_SuchenNach.Text, txt_ErsetzenMit.Text);
        return text.ToString();
    }

My second attempt was better, but it only works for form fields whose text I can change:

    string fileNameExisting = path;
    string fileNameNew = @"C:\TEST.pdf";

    using (FileStream existingFileStream = new FileStream(fileNameExisting, FileMode.Open))
    using (FileStream newFileStream = new FileStream(fileNameNew, FileMode.Create))
    {
        // Open the PDF
        PdfReader pdfReader = new PdfReader(existingFileStream);
        PdfStamper stamper = new PdfStamper(pdfReader, newFileStream);

        var form = stamper.AcroFields;
        var fieldKeys = form.Fields.Keys;
        foreach (string fieldKey in fieldKeys)
        {
            var value = pdfReader.AcroFields.GetField(fieldKey);
            form.SetField(fieldKey, value.Replace(txt_SuchenNach.Text, txt_ErsetzenMit.Text));
        }

        // Make the text fields non-editable (they then look like normal text)
        stamper.FormFlattening = true;

        stamper.Close();
        pdfReader.Close();
    }

This preserves the formatting of the rest of the document and changes only the text of the fields. I need a solution for text that is NOT in a form field.

Thanks for all your answers and your help.

c# pdf itextsharp

Kevin plaul Apr 13 '15 at 8:16

2 answers

Eugene m · Answer 1 · 2015-04-13T09:46:09+0000

A common problem is that text objects can use embedded font subsets, which contain glyphs only for the letters that are actually used. That is, if a text object contains the text "abcdef", the embedded font may contain glyphs for these letters but not for any others. Therefore, if you replace "abcdef" with "xyz", the PDF will not display "xyz", because the embedded font has no glyphs for these letters.
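
To illustrate the check this implies, here is a minimal sketch in plain C#. It assumes you already know which characters the embedded subset covers (obtaining that set from the PDF is not shown here), and the names GlyphCoverage, MissingGlyphs and CanReuseFont are hypothetical helpers, not part of iTextSharp:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: decides whether replacement text can be drawn
// with a font subset that only covers a known set of characters.
public static class GlyphCoverage
{
    // Returns the distinct characters of the replacement text
    // that are not contained in the available character set.
    public static List<char> MissingGlyphs(IEnumerable<char> availableChars, string replacement)
    {
        var available = new HashSet<char>(availableChars);
        return replacement.Where(c => !available.Contains(c)).Distinct().ToList();
    }

    // True if every character of the replacement text is covered,
    // i.e. the original embedded font could be reused as-is.
    public static bool CanReuseFont(IEnumerable<char> availableChars, string replacement)
    {
        return MissingGlyphs(availableChars, replacement).Count == 0;
    }
}
```

With the example above, GlyphCoverage.CanReuseFont("abcdef", "xyz") is false, because none of the three letters is covered, while GlyphCoverage.CanReuseFont("abcdef", "face") is true. When the check fails, a different font has to be used for the new text.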

So, I would consider the following workflow:

Iterate through all text objects;
For each one, add a new text object created from scratch on top of the PDF page, with the same properties (font, position, etc.) but with the new text. This step may require the fonts used in the original PDF to be installed on your computer; if they are not, you can check the installed fonts and use a different font for the new text object. Either way, iTextSharp or another PDF tool will embed a new font object for the new text object;
Delete the original text object once the duplicate has been created;
Repeat the steps above for every text object;
Save the modified PDF document to a new file.

Pradeep kumar · Answer 2 · 2016-12-14T07:12:21+0000

I have been working on the same requirement, and I was able to achieve it with the following steps.

Step 1: Determine the paths of the source PDF file and the destination file

Step 2: Read the source PDF file and find the location of the text we want to replace

Step 3: Replace the text with the new string.

    using iTextSharp.text;
    using iTextSharp.text.pdf;
    using iTextSharp.text.pdf.parser;
    using PDFExtraction;
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Web;
    using System.Web.UI;
    using System.Web.UI.WebControls;

    namespace PDFReplaceTextUsingItextSharp
    {
        public partial class ExtractPdf : System.Web.UI.Page
        {
            static iTextSharp.text.pdf.PdfStamper stamper = null;

            protected void Page_Load(object sender, EventArgs e)
            {
            }

            protected void Replace_Click(object sender, EventArgs e)
            {
                string ReplacingVariable = txtReplace.Text;
                string sourceFile = "Source File Path";
                string descFile = "Destination File Path";
                PdfReader pReader = new PdfReader(sourceFile);
                stamper = new iTextSharp.text.pdf.PdfStamper(pReader, new System.IO.FileStream(descFile, System.IO.FileMode.Create));
                PDFTextGetter("ExistingVariableinPDF", ReplacingVariable, StringComparison.CurrentCultureIgnoreCase, sourceFile, descFile);
                stamper.Close();
                pReader.Close();
            }

            /// <summary>
            /// Searches the PDF for the locations of the given text and overwrites each match with the text from replacingText
            /// </summary>
            /// <param name="pSearch">String to search for</param>
            /// <param name="replacingText">Replacement string</param>
            /// <param name="SC">String comparison mode (e.g. ignore case)</param>
            /// <param name="SourceFile">Path of the source file</param>
            /// <param name="DestinationFile">Path of the destination file</param>
            public static void PDFTextGetter(string pSearch, string replacingText, StringComparison SC, string SourceFile, string DestinationFile)
            {
                try
                {
                    iTextSharp.text.pdf.PdfContentByte cb = null;
                    iTextSharp.text.pdf.PdfContentByte cb2 = null;
                    iTextSharp.text.pdf.BaseFont bf = null;

                    if (System.IO.File.Exists(SourceFile))
                    {
                        PdfReader pReader = new PdfReader(SourceFile);

                        for (int page = 1; page <= pReader.NumberOfPages; page++)
                        {
                            myLocationTextExtractionStrategy strategy = new myLocationTextExtractionStrategy();
                            cb = stamper.GetOverContent(page);
                            cb2 = stamper.GetOverContent(page);

                            // Pass some data contained in PdfContentByte to the strategy. The first value
                            // always seems to be zero for me and the second 100, but I'm not sure whether
                            // this could change in some cases
                            strategy.UndercontentCharacterSpacing = (int)cb.CharacterSpacing;
                            strategy.UndercontentHorizontalScaling = (int)cb.HorizontalScaling;

                            // We do not really need the extracted text, but this call must ALWAYS be made,
                            // because it triggers the process that feeds all chunks of the page into the strategy object
                            string currentText = PdfTextExtractor.GetTextFromPage(pReader, page, strategy);

                            // The actual search starts here
                            List<iTextSharp.text.Rectangle> MatchesFound = strategy.GetTextLocations(pSearch, SC);

                            // Set the fill color of the covering rectangles. I don't use a border because it would
                            // make the rectangle bigger, but a thin border could help if the current rectangle
                            // is not big enough to cover all the text it should cover
                            cb.SetColorFill(BaseColor.WHITE);

                            // MatchesFound contains the locations of all matches: cover each one with a
                            // white rectangle and draw the replacement text on top of it
                            foreach (iTextSharp.text.Rectangle rect in MatchesFound)
                            {
                                // The width is hard-coded to 60 points; adjust it so the original text is fully covered
                                cb.Rectangle(rect.Left, rect.Bottom, 60, rect.Height);
                                cb.Fill();

                                cb2.SetColorFill(BaseColor.BLACK);
                                bf = BaseFont.CreateFont(BaseFont.HELVETICA_BOLD, BaseFont.CP1252, BaseFont.NOT_EMBEDDED);
                                cb2.SetFontAndSize(bf, 9);
                                cb2.BeginText();
                                cb2.ShowTextAligned(0, replacingText, rect.Left, rect.Bottom, 0);
                                cb2.EndText();
                            }
                        }

                        pReader.Close();
                    }
                }
                catch (Exception)
                {
                    // Errors are silently ignored here; log or rethrow them as appropriate
                }
            }
        }
    }
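
The class myLocationTextExtractionStrategy from the PDFExtraction namespace is not shown in this answer. As a rough idea of what such a strategy could look like, here is a minimal sketch, assuming iTextSharp 5.x. The class ChunkLocationStrategy and its members are hypothetical and not the class used above. It only finds matches that lie within a single text chunk, and it returns the bounding box of the whole chunk rather than of the matched substring:

```csharp
using System;
using System.Collections.Generic;
using iTextSharp.text;
using iTextSharp.text.pdf.parser;

// Hypothetical helper, not the class used in the answer above.
public class ChunkLocationStrategy : LocationTextExtractionStrategy
{
    // Every text chunk seen on the page, together with its bounding box.
    private readonly List<KeyValuePair<string, Rectangle>> chunks =
        new List<KeyValuePair<string, Rectangle>>();

    // Called by the parser for each chunk of text drawn on the page.
    public override void RenderText(TextRenderInfo renderInfo)
    {
        base.RenderText(renderInfo);

        // Lower-left corner from the descent line, upper-right from the ascent line.
        Vector bottomLeft = renderInfo.GetDescentLine().GetStartPoint();
        Vector topRight = renderInfo.GetAscentLine().GetEndPoint();

        var box = new Rectangle(
            bottomLeft[Vector.I1], bottomLeft[Vector.I2],
            topRight[Vector.I1], topRight[Vector.I2]);

        chunks.Add(new KeyValuePair<string, Rectangle>(renderInfo.GetText(), box));
    }

    // Returns the bounding boxes of all chunks whose text contains the search string.
    public List<Rectangle> GetTextLocations(string search, StringComparison comparison)
    {
        var matches = new List<Rectangle>();
        foreach (var chunk in chunks)
        {
            if (chunk.Key.IndexOf(search, comparison) >= 0)
            {
                matches.Add(chunk.Value);
            }
        }
        return matches;
    }
}
```

As in the code above, the strategy has to be passed to PdfTextExtractor.GetTextFromPage for each page before GetTextLocations is called; otherwise no chunks have been collected.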
