PDF-Redaction/Redactor is a Java library, built on Apache PDFBox, that physically removes sensitive information, including PII, from PDF documents. Unlike simple annotation tools that merely place a black box over content, this engine intercepts the PDF content stream to modify or delete the underlying text and to mask image data. It also removes all metadata from the PDF document.
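This README does not show the library's own content-stream code. As a rough illustration of why editing the content stream can remove text without shifting the rest of the line, the sketch below rebuilds a line of text as a PDF `TJ` array. Each redacted run is replaced by a number equal to its total glyph width. In a `TJ` array, numbers are in thousandths of a text-space unit and are subtracted from the horizontal position, so a negative number moves the following glyphs right by exactly the width that was removed. `TjRedactionSketch`, `redactToTj` and the flat default width are hypothetical. The sketch also assumes a simple single-byte font where Java characters map directly to glyphs, and character spacing (`Tc`) and word spacing (`Tw`) of zero.

```java
import java.util.Map;

public class TjRedactionSketch {

    // Escapes the characters that are special inside a PDF literal string: \ ( )
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)");
    }

    // Appends one TJ array element, separated from the previous element by a space
    private static void appendToken(StringBuilder out, String token) {
        if (out.length() > 1) {
            out.append(' ');
        }
        out.append(token);
    }

    /**
     * Hypothetical helper: shows `text` as a TJ array with every occurrence of `keyword` removed.
     * Each removed run becomes a negative number equal to its total glyph width
     * (thousandths of a text-space unit), so the glyphs after it keep their original position.
     * Assumes a simple single-byte font and Tc = Tw = 0.
     */
    static String redactToTj(String text, String keyword,
                             Map<Character, Integer> widths, int defaultWidth) {
        if (keyword.isEmpty()) {
            throw new IllegalArgumentException("keyword must not be empty");
        }
        StringBuilder out = new StringBuilder("[");
        int pos = 0;
        while (true) {
            int hit = text.indexOf(keyword, pos);
            int end = (hit < 0) ? text.length() : hit;
            if (end > pos) {
                // Keep the visible text before the match (or the remaining tail)
                appendToken(out, "(" + escape(text.substring(pos, end)) + ")");
            }
            if (hit < 0) {
                break;
            }
            // Sum the widths of the glyphs being removed...
            int removedWidth = 0;
            for (char c : keyword.toCharArray()) {
                removedWidth += widths.getOrDefault(c, defaultWidth);
            }
            // ...and emit them as a negative adjustment, which advances the pen by that amount
            appendToken(out, Integer.toString(-removedWidth));
            pos = hit + keyword.length();
        }
        return out.append("] TJ").toString();
    }

    public static void main(String[] args) {
        // Every glyph is 500 units wide here, so removing six glyphs leaves a gap of -3000
        System.out.println(redactToTj("Pay SECRET now", "SECRET", Map.of(), 500));
        // prints: [(Pay ) -3000 ( now)] TJ
    }
}
```

The sensitive characters are gone from the content stream entirely, while " now" is still drawn where it was before. A real implementation would take the widths from the font dictionary and work on encoded byte strings rather than Java characters.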
PDF-Redaction/Redactor is completely free: it is Apache 2.0 licensed and fully open source, so you can use it as you like. It does not require any online services; it is a pure Java library.
The engine rewrites the content-stream text operators (Tj, TJ, etc.) to strip sensitive characters while maintaining the visual layout.
Add the following dependency to your build file.
Gradle:
implementation 'nz.peter.pdfredaction:pdf-redaction:1.0.2'
Maven:
<dependency>
<groupId>nz.peter.pdfredaction</groupId>
<artifactId>pdf-redaction</artifactId>
<version>1.0.2</version>
</dependency>
The following example demonstrates how to initialize the redactor, define manual regions, set a keyword list, and apply the changes.
import java.io.File;
import java.util.Arrays;
import java.util.Collections;
import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
// PdfRedaction and RectangleOnPage are provided by the pdf-redaction library

// Load your PDF using PDFBox (Loader is the PDFBox 3.x entry point)
PDDocument document = Loader.loadPDF(new File("input.pdf"));
// Instantiate the redactor
PdfRedaction redaction = new PdfRedaction();
// Parameters: (PDDocument, list of words to redact, list of RectangleOnPage regions)
redaction.redact(
    document,
    // look for and remove these words on all pages
    Arrays.asList("confidential", "SECRET_WORD"),
    // redact images inside x=10, y=10, w=100, h=100 on page 1 (the first page)
    Collections.singletonList(new RectangleOnPage(1, 10, 10, 100, 100))
);
// Save the redacted document, with all its metadata removed, to a new PDF
document.save(new File("output_redacted.pdf"));
document.close();
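The README does not show how the masked image data is produced. A minimal sketch of what masking means at the pixel level, using only the JDK's `java.awt.image` API: the pixels inside the region are overwritten, not covered by a drawn box, so the original values no longer exist in the image. `ImageMaskSketch` and `maskRegion` are hypothetical names. The sketch assumes the rectangle is already in image pixel coordinates with a top-left origin. Converting a page rectangle into image pixels, and writing the masked image back into the PDF in place of the original image, are outside its scope.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

public class ImageMaskSketch {

    /**
     * Hypothetical helper: overwrites every pixel inside (x, y, w, h) with opaque black.
     * Coordinates are image pixels with a top-left origin (an assumption; mapping a
     * page rectangle to image pixels depends on how the image is placed on the page).
     */
    static void maskRegion(BufferedImage img, int x, int y, int w, int h) {
        Graphics2D g = img.createGraphics();
        try {
            g.setColor(Color.BLACK);
            // Parts of the rectangle outside the image bounds are simply clipped
            g.fillRect(x, y, w, h);
        } finally {
            g.dispose();
        }
    }

    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(200, 200, BufferedImage.TYPE_INT_RGB);
        // Start from an all-white image
        Graphics2D g = img.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, 200, 200);
        g.dispose();

        maskRegion(img, 10, 10, 100, 100);
        System.out.println(Integer.toHexString(img.getRGB(50, 50)));   // ff000000 (masked)
        System.out.println(Integer.toHexString(img.getRGB(150, 150))); // ffffffff (untouched)
    }
}
```

Because the black pixels replace the originals in the raster itself, re-encoding this image cannot leak the hidden content, unlike an overlay that leaves the original image underneath.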
To build the JAR from source, run:
gradle jar
# output: ./build/libs/pdf-redaction-1.0.2.jar