The need to edit, delete, or replace text in a PDF is not new. However, PDF was designed as a fixed format for final document versions, which is what originally made it the go-to format for legal documents, so editing text within a PDF is not as cut and dried as one might think. To begin with, text in a PDF is placed by position on the page and is not necessarily stored in linear reading order.
In other words, the way the finished page looks is not necessarily how the text is laid out within the ‘code’ of the PDF document. Add fonts, font sizes, and the question of what to do with the text that is not deleted, and managing text in a PDF can be a daunting task.
So the first question might be: why not just change the original document (Word, text, or other word processing document)? The short answer is that the original may or may not be available. Once the text is put into PDF format, many people consider it the ‘final’ version and sometimes delete the original documents to save space.
This article walks through the basics of the new features of Document Solutions for PDF (DsPdf, previously GcPdf) by showing a couple of examples of deleting and replacing text programmatically using C#.
Ready to test Document Solutions for PDF? Download a FREE trial today!
As mentioned in the opening paragraphs, managing text within a PDF programmatically or otherwise is not as easy as it seems. Before we get into examples, we need to understand what is happening behind the scenes and the limitations and benefits of the new features in this API.
First, when working with text, it is necessary to understand what other text is around the text that is being manipulated. If one deletes a fragment, it’s important to understand what the text after that needs to do. It is necessary to check how the deleted fragment and those that come after it correlate, whether they belong to the same PDF operator or different ones, and whether there are text positioning operators between them or not. When calculating the text position, the current transformation matrices must be considered for the deleted and "shifted" text.
Given all this, it becomes clear that the position of the text cannot always be recalculated exactly, for example when the fragment being deleted is contained within a FormXObject and the text after it is not.
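To make the point about transformation matrices concrete, here is a minimal sketch that does not use DsPdf at all. It relies only on System.Numerics, and the class name ShiftSketch, the method ShiftInPageSpace, and the two matrices are hypothetical values chosen for illustration. It shows that the same deleted width produces a different shift on the page depending on the matrix in effect for the text being moved.

```csharp
using System;
using System.Numerics;

public static class ShiftSketch
{
    // Converts a leftward shift, expressed in a fragment's own text space,
    // into page space using the matrix in effect for that fragment.
    // TransformNormal applies scale and rotation but ignores translation,
    // which is the right behavior for a displacement (not a position).
    public static Vector2 ShiftInPageSpace(float deletedWidth, Matrix3x2 ctm)
    {
        return Vector2.TransformNormal(new Vector2(-deletedWidth, 0f), ctm);
    }

    public static void Main()
    {
        float deletedWidth = 40f; // width of the deleted fragment, in its own units

        // Hypothetical matrices: text drawn directly on the page, and text
        // drawn inside a form that is scaled by 2 and moved to (100, 50).
        Matrix3x2 pageCtm = Matrix3x2.Identity;
        Matrix3x2 formCtm = Matrix3x2.CreateScale(2f) * Matrix3x2.CreateTranslation(100f, 50f);

        Console.WriteLine(ShiftInPageSpace(deletedWidth, pageCtm));
        Console.WriteLine(ShiftInPageSpace(deletedWidth, formCtm));
    }
}
```

With these assumed values, a 40-unit fragment corresponds to a 40-unit shift under the identity matrix but an 80-unit shift under the scaled one. That mismatch is why text on either side of a form boundary cannot simply be moved by one shared amount.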
To handle this, DsPdf implements two text deletion modes, both of which appear in the code samples in this article: DeleteTextMode.Standard and DeleteTextMode.PreserveSpace.
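As a rough mental model of how the two modes differ, the toy sketch below works on a single line of positioned fragments. It is not DsPdf code: Fragment, ToyDeleteMode, and DeleteModeSketch are hypothetical names. It also assumes, going by the mode names Standard and PreserveSpace used in the samples, that the standard mode shifts the following text to close the gap while the space-preserving mode leaves it where it was.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy stand-in, not a DsPdf type: a run of text at a horizontal position.
public record Fragment(string Text, float X, float Width);

public enum ToyDeleteMode { Standard, PreserveSpace }

public static class DeleteModeSketch
{
    // Removes the fragment at 'index' from one line of text.
    // Standard: fragments after it move left by the deleted width.
    // PreserveSpace: fragments after it keep their positions, leaving a gap.
    public static List<Fragment> Delete(IReadOnlyList<Fragment> line, int index, ToyDeleteMode mode)
    {
        float gap = line[index].Width;
        var result = new List<Fragment>();
        for (int i = 0; i < line.Count; i++)
        {
            if (i == index) continue; // drop the deleted fragment
            bool shift = mode == ToyDeleteMode.Standard && i > index;
            result.Add(shift ? line[i] with { X = line[i].X - gap } : line[i]);
        }
        return result;
    }

    public static void Main()
    {
        var line = new List<Fragment>
        {
            new("Protect ", 0f, 50f),
            new("wetlands ", 50f, 60f),
            new("now", 110f, 25f),
        };

        foreach (var mode in new[] { ToyDeleteMode.Standard, ToyDeleteMode.PreserveSpace })
        {
            var kept = Delete(line, 1, mode);
            Console.WriteLine(mode + ": " + string.Join(", ", kept.Select(f => f.Text.Trim() + "@" + f.X)));
        }
    }
}
```

In this toy model the word "now" ends up at x = 50 in the standard mode and stays at x = 110 when space is preserved. A real PDF is harder because fragments may sit in different operators or under different matrices, as described above.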
The new features add several methods and properties to the existing ITextMap interface.
It is important to note that the Page and GcPdfDocument classes have DeleteText() and ReplaceText() methods, but they all work via ITextMap, e.g., Page.DeleteText() creates a text map for the page and calls its DeleteText() method.
Here is a link to the complete API Reference.
The following code example explores the use case of a rental lease. We replace names, addresses, and other pertinent information in one lease so that the lease can be reissued for a different individual. In this case, we are changing “Jane Donahue’s” name and information to “John Doe’s”. The code is heavily commented to help explain how the tasks are accomplished.
using System.IO;
using GrapeCity.Documents.Pdf;
// FindTextParams and DeleteTextMode may need an additional DsPdf namespace import,
// depending on the package version.
// FindTextParams arguments used below: search text, whole-word match, case-sensitive match.

// Delete the word "wetlands" from the first page using DeleteTextMode.Standard
using (FileStream fs = new FileStream("Wetlands.pdf", FileMode.Open, FileAccess.Read, FileShare.Read))
{
    GcPdfDocument doc = new GcPdfDocument();
    doc.Load(fs);
    FindTextParams ftp = new FindTextParams("wetlands", true, false);
    doc.Pages[0].DeleteText(ftp, DeleteTextMode.Standard);
    doc.Save("wetlands_deleted.pdf");
}

// Delete the word "wetlands" from the first page using DeleteTextMode.PreserveSpace
using (FileStream fs = new FileStream("Wetlands.pdf", FileMode.Open, FileAccess.Read, FileShare.Read))
{
    GcPdfDocument doc = new GcPdfDocument();
    doc.Load(fs);
    FindTextParams ftp = new FindTextParams("wetlands", true, false);
    doc.Pages[0].DeleteText(ftp, DeleteTextMode.PreserveSpace);
    doc.Save("wetlands_deleted_PreserveSpace.pdf");
}

// Delete the word "wetlands" from the whole document
using (FileStream fs = new FileStream("Wetlands.pdf", FileMode.Open, FileAccess.Read, FileShare.Read))
{
    GcPdfDocument doc = new GcPdfDocument();
    doc.Load(fs);
    FindTextParams ftp = new FindTextParams("wetlands", true, false);
    doc.DeleteText(ftp, DeleteTextMode.Standard);
    doc.Save("wetlands_deleted_doc.pdf");
}

// Replace the word "wetlands" with "WETLANDS" on the first page
using (FileStream fs = new FileStream("Wetlands.pdf", FileMode.Open, FileAccess.Read, FileShare.Read))
{
    GcPdfDocument doc = new GcPdfDocument();
    doc.Load(fs);
    FindTextParams ftp = new FindTextParams("wetlands", true, false);
    doc.Pages[0].ReplaceText(ftp, "WETLANDS", null, null);
    doc.Save("wetlands_FirstPage.pdf");
}

// Replace the word "wetlands" with "WETLANDS" in the whole document
using (FileStream fs = new FileStream("Wetlands.pdf", FileMode.Open, FileAccess.Read, FileShare.Read))
{
    // The original listing is cut off after the using statement above. This body is an
    // assumed completion that follows the earlier examples; the output file name is hypothetical.
    GcPdfDocument doc = new GcPdfDocument();
    doc.Load(fs);
    FindTextParams ftp = new FindTextParams("wetlands", true, false);
    doc.ReplaceText(ftp, "WETLANDS");
    doc.Save("wetlands_replaced_doc.pdf");
}
This first demonstration replaces a name (and some other items) within a Lease Agreement. Below is a small piece of code showing the replacement names, followed by the original and updated PDF (screenshots only).
// Replace:
// "Jane Donahue" -> "John Doe"
// "(123)098-7654" -> "(007)123-4567"
// "janed@example.com" -> "johnd@example.com"
doc.ReplaceText(new FindTextParams("Jane Donahue", false, true), "John Doe");
doc.ReplaceText(new FindTextParams("(123)098-7654", false, true), "(007)123-4567");
doc.ReplaceText(new FindTextParams("janed@example.com", false, true), "johnd@example.com");

// "13-Dec-20 22:16:00" -> date now
// "13-Dec-22 22:16:00" -> date now + 2 years
var termStart = DateTime.Now;
// AddYears(2) gives an exact two-year term; adding 365 * 2 days would fall
// a day short whenever the term includes a leap day.
var termEnd = termStart.AddYears(2);
doc.ReplaceText(new FindTextParams("13-Dec-20 22:16:00", false, true),
    termStart.ToShortDateString() + " " + termStart.ToShortTimeString());
doc.ReplaceText(new FindTextParams("13-Dec-22 22:16:00", false, true),
    termEnd.ToShortDateString() + " " + termEnd.ToShortTimeString());
To get the complete code for these demonstrations, be sure to download it here. Once downloaded, follow the instructions below to run the demonstration.
Step 1: Unzip the demo package
Step 2: Open the solution (DeleteText.sln or ReplaceText.sln)
Step 5: Run the application and check out the results!
Feel free to contact us with questions or comments, and happy coding!